-
Notifications
You must be signed in to change notification settings - Fork 467
Added On-Demand table feature #3250
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added On-Demand table feature #3250
Conversation
An on-demand table is like an offline table, except that its tablets will be hosted when a client wants to interact with them. A table can be put into an on-demand state using the ondemand shell command or TableOperations.ondemand method. The accumulo.root and accumulo.metadata tables cannot be put into an On-Demand state. Tablets of an on-demand table are hosted when a client fails to find the tablet location of an on-demand table. When this occurs the client makes an RPC call that inserts an "ondemand" column into the Tablet metadata. In the case that an on-demand tablet needs to be hosted for a client operation, the client will wait for the tablet to be hosted and the tablet location to be resolved by the client before proceeding. The Manager will assign tablets that have an "ondemand" column in their metadata to TabletServers and will unassign hosted on-demand tablets when the "ondemand" column does not exist. When setting an online table to an on-demand state, all of its tablets will be unloaded. The new Property MANAGER_TABLET_GROUP_WATCHER_INTERVAL specifies the interval at which the Manager will look for tablet state changes. Lower values for this property will reduce the wait time in the client for an on-demand tablet to be hosted. TabletServers keep track of the last access time for on-demand tablets. Periodically the TabletServer will evaluate which ondemand tablets should be unloaded. The interval for this evaluation is specified by the new Property TABLE_ONDEMAND_UNLOADER_INTERVAL. At this interval the TabletServer will call the OnDemandTabletUnloader class specified by the Property TABLE_ONDEMAND_UNLOADER, which will return the set of tablets to unload. Unloading is performed by removing the "ondemand" column from the tablet metadata, which will cause the Manager to unassign the tablets. In the case that the TabletServer is running low on memory it will not call the OnDemandTabletUnloader, instead unloading the tablet with the oldest access time. New metrics are emitted for on-demand tablets. Specifically, the metric accumulo.tserver.tablets.ondemand.unloaded.lowmem is incremented when an on-demand tablet is unloaded for low memory in the TabletServer, and the metric accumulo.tserver.tablets.ondemand.online is modified when on-demand tablets are hosted or unloaded. Finally, a new utility, ListOnlineOnDemandTablets, has been created to list on-demand tablets that are currently hosted. Closes apache#3210 apache#3211 apache#3212
If you want to test this yourself, you can:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noticed some things that can be changed.
|
||
@Override | ||
public Repo<Manager> call(final long tid, final Manager manager) { | ||
// TODO: How are we treating ONDEMAND tables for BulkImport? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to address this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In V2 bulk import when we switch it to use conditional mutations that may completely address this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened #3264 for this.
test/src/main/java/org/apache/accumulo/test/MultiTableBatchWriterIT.java
Outdated
Show resolved
Hide resolved
test/src/main/java/org/apache/accumulo/test/functional/ManagerAssignmentIT.java
Outdated
Show resolved
Hide resolved
test/src/main/java/org/apache/accumulo/test/functional/ManagerAssignmentIT.java
Outdated
Show resolved
Hide resolved
test/src/main/java/org/apache/accumulo/test/functional/ManagerAssignmentIT.java
Outdated
Show resolved
Hide resolved
test/src/main/java/org/apache/accumulo/test/functional/ManagerAssignmentIT.java
Outdated
Show resolved
Hide resolved
test/src/main/java/org/apache/accumulo/test/functional/ManagerAssignmentIT.java
Outdated
Show resolved
Hide resolved
test/src/main/java/org/apache/accumulo/test/functional/OnDemandIT.java
Outdated
Show resolved
Hide resolved
Some ITs were created before the tablet unloading mechanism was developed. These tests used the offline command to unload the tablets. I removed the calls to offline to ensure that the ondemand call was unloading the tablets in the test.
Code introduced to bring ondemand tablets online was inadvertently throwing a TableNotFoundException in some cases causing ITs to fail because the exception was unexpected. Moved the code that could raise this exception and suppressed it with a log statement.
Ran a full IT build and a single test failed, ConditionalWriterIT.testDeleteTable. It failed because a TableNotFoundException was thrown instead of a TableDeletedException. This was due to a call being added to get |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am mostly finished reviewing this, the only thing I have left that I want to look at are the changes around the tablet state iterator code. Going to come back to that later today.
core/src/main/java/org/apache/accumulo/core/clientImpl/TableOperationsImpl.java
Outdated
Show resolved
Hide resolved
+ " a table to be offline before exporting."); | ||
if (isOnline(tableName) || isOnDemand(tableName)) { | ||
throw new IllegalStateException("The table " + tableName | ||
+ " is not offline; exportTable requires a table to be offline before exporting."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment made me realize investigation may be needed. Created the following item in the elasticity project.
core/src/main/java/org/apache/accumulo/core/spi/ondemand/OnDemandTabletUnloader.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/accumulo/core/spi/ondemand/OnDemandTabletUnloader.java
Outdated
Show resolved
Hide resolved
server/base/src/main/java/org/apache/accumulo/server/tables/TableManager.java
Outdated
Show resolved
Hide resolved
|
||
@Override | ||
public Repo<Manager> call(final long tid, final Manager manager) { | ||
// TODO: How are we treating ONDEMAND tables for BulkImport? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In V2 bulk import when we switch it to use conditional mutations that may completely address this.
private final AtomicLong syncCounter = new AtomicLong(0); | ||
|
||
final OnlineTablets onlineTablets = new OnlineTablets(); | ||
private final Map<KeyExtent,AtomicLong> onDemandTabletAccessTimes = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering if it would simplify things if the last access time were stored in the tablet object. Then this map does not need to be maintained and have the possibility to drift from the set of online tablets. When this map is updated could instead call a method on the tablet.
Also wondering if it would be better for the tablet code to take ownership of updating the last access time rather than the tserver code. Tserver code could read it and tablet code update it as needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no issue with this, and is probably a better implementation. I wonder if this should be done as a follow-on commit because Tablet is going to change, maybe drastically. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A follow on commit after merging SGTM. Could create an issue if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened #3263 for this
server/tserver/src/main/java/org/apache/accumulo/tserver/TabletServer.java
Outdated
Show resolved
Hide resolved
server/tserver/src/main/java/org/apache/accumulo/tserver/metrics/TabletServerMetricsUtil.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/accumulo/core/spi/ondemand/OnDemandTabletUnloader.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/accumulo/core/spi/ondemand/OnDemandTabletUnloader.java
Outdated
Show resolved
Hide resolved
test/src/main/java/org/apache/accumulo/test/functional/OnDemandIT.java
Outdated
Show resolved
Hide resolved
"1.3.5"), | ||
MANAGER_TABLET_GROUP_WATCHER_INTERVAL("manager.tablet.watcher.interval", "60s", | ||
PropertyType.TIMEDURATION, | ||
"Time to wait between scanning tablet states to determine migrations, etc.", "3.1.0"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could make a PR for the 3 branch to add this change. Could still include the change here, we can make the change in main branch and when we merge main into elastic branch just resolve any conflicts. If we get this change in main then it will be one less divergence to worry about as we merge main into elastic branch going forward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened #3256 for this.
final boolean shouldBeOnline = onlineTables.contains(tls.extent.tableId()); | ||
final boolean isOnDemandTable = onDemandTables.contains(tls.extent.tableId()); | ||
final boolean onDemandTabletShouldBeHosted = tls.ondemand; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May be able to compute everything here and then later in the switch stmt only use shouldBeOnline var.
final boolean shouldBeOnline = onlineTables.contains(tls.extent.tableId()); | |
final boolean isOnDemandTable = onDemandTables.contains(tls.extent.tableId()); | |
final boolean onDemandTabletShouldBeHosted = tls.ondemand; | |
final boolean shouldBeOnline = onlineTables.contains(tls.extent.tableId()) || (onDemandTables.contains(tls.extent.tableId()) && tls.ondemand); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
those variables are used in the debug log line that follows.
core/src/main/java/org/apache/accumulo/core/clientImpl/TableOperationsImpl.java
Show resolved
Hide resolved
core/src/main/java/org/apache/accumulo/core/spi/ondemand/DefaultOnDemandTabletUnloader.java
Show resolved
Hide resolved
test/src/main/java/org/apache/accumulo/test/functional/ManagerAssignmentIT.java
Outdated
Show resolved
Hide resolved
test/src/main/java/org/apache/accumulo/test/functional/ManagerAssignmentIT.java
Outdated
Show resolved
Hide resolved
test/src/main/java/org/apache/accumulo/test/functional/ManagerAssignmentIT.java
Outdated
Show resolved
Hide resolved
Kicked off full IT build |
I tested this out quite a bit today using Uno and so far so good. The instructions need to be updated though with new property names. I couldn't figure out why the tables were not unloading and then I did some debugging and it looks like things got renamed. The instructions say to set: manager.tablet.watcher.interval=15s
table.ondemand.tablet.unloader.interval=1m
table.custom.ondemand.unloader.inactivity.threshold=120000 But this should now be: manager.tablet.watcher.interval=15s
#table renamed to tserver
tserver.ondemand.tablet.unloader.interval=1m
#seconds appended to the end
table.custom.ondemand.unloader.inactivity.threshold.seconds=120 |
My latest changes from #3257 caused merged conflicts in this branch since I changed things in tests and especially TabletLocationState. I went ahead and created a PR for this branch that merges in elasticity branch and fixes all the conflicts: dlmarion#39 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks pretty good, I tested everything out like I said earlier and it went fine once I set the correct properties. I made a few comments but most of them are just stylistic or nits so could be ignored.
There is one change for logging through that should be fixed as there is a mismatched number of parameters in TabletLocatorImpl
core/src/main/java/org/apache/accumulo/core/clientImpl/TabletLocatorImpl.java
Outdated
Show resolved
Hide resolved
server/tserver/src/main/java/org/apache/accumulo/tserver/TabletServer.java
Outdated
Show resolved
Hide resolved
server/base/src/main/java/org/apache/accumulo/server/util/ListOnlineOnDemandTablets.java
Outdated
Show resolved
Hide resolved
server/base/src/main/java/org/apache/accumulo/server/util/ListOnlineOnDemandTablets.java
Outdated
Show resolved
Hide resolved
server/tserver/src/main/java/org/apache/accumulo/tserver/TabletClientHandler.java
Outdated
Show resolved
Hide resolved
|
||
@Override | ||
public void bringOnDemandTabletsOnline(TInfo tinfo, TCredentials credentials, String tableId, | ||
List<TKeyExtent> extents) throws ThriftSecurityException, TException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
List<TKeyExtent> extents) throws ThriftSecurityException, TException { | |
List<TKeyExtent> extents) throws TException { |
Merge elasticity branch and resolve conflicts
An on-demand table is like an offline table, except that its tablets will be hosted when a client wants to interact with them. A table can be put into an on-demand state using the ondemand shell command or TableOperations.ondemand method. The accumulo.root and accumulo.metadata tables cannot be put into an On-Demand state.
Tablets of an on-demand table are hosted when a client fails to find the tablet location of an on-demand table. When this occurs the client makes an RPC call that inserts an "ondemand" column into the Tablet metadata. In the case that an on-demand tablet needs to be hosted for a client operation, the client will wait for the tablet to be hosted and the tablet location to be resolved by the client before proceeding.
The Manager will assign tablets that have an "ondemand" column in their metadata to TabletServers and will unassign hosted on-demand tablets when the "ondemand" column does not exist. When setting an online table to an on-demand state, all of its tablets will be unloaded. The new Property MANAGER_TABLET_GROUP_WATCHER_INTERVAL specifies the interval at which the Manager will look for tablet state changes. Lower values for this property will reduce the wait time in the client for an on-demand tablet to be hosted.
TabletServers keep track of the last access time for on-demand tablets. Periodically the TabletServer will evaluate which ondemand tablets should be unloaded. The interval for this evaluation is specified by the new Property
TABLE_ONDEMAND_UNLOADER_INTERVAL. At this interval the TabletServer will call the OnDemandTabletUnloader class specified by the Property TABLE_ONDEMAND_UNLOADER, which will return the set of tablets to unload. Unloading is performed by removing the "ondemand" column from the tablet metadata, which will cause the Manager to unassign the tablets. In the case that the TabletServer is running low on memory it will not call the OnDemandTabletUnloader, instead unloading the tablet with the oldest access time.
New metrics are emitted for on-demand tablets. Specifically, the metric accumulo.tserver.tablets.ondemand.unloaded.lowmem is incremented when an on-demand tablet is unloaded for low memory in the TabletServer, and the metric
accumulo.tserver.tablets.ondemand.online is modified when on-demand tablets are hosted or unloaded.
Finally, a new utility, ListOnlineOnDemandTablets, has been created to list on-demand tablets that are currently hosted.
Closes #3210 #3211 #3212