Support lock-free commit for Iceberg using HMS #22182

Open
oneonestar opened this issue May 29, 2024 · 3 comments
Labels
iceberg Iceberg connector performance

Comments

@oneonestar
Member

With an HMS that includes HIVE-26882, we can avoid taking a table lock when committing to an Iceberg table.
This improves the performance of concurrent writes to Iceberg tables and reduces the chance of an unreleased lock getting stuck in HMS.

finally {
    try {
        thriftMetastore.releaseTableLock(lockId);
    }
    catch (RuntimeException e) {
        // Release lock step has failed. Not throwing this exception, after commit has already succeeded.
        // So, that underlying iceberg API will not do the metadata cleanup, otherwise table will be in unusable state.
        // If configured and supported, the unreleased lock will be automatically released by the metastore after not hearing a heartbeat for a while,
        // or otherwise it might need to be manually deleted from the metastore backend storage.
        log.error(e, "Failed to release lock %s when committing to table %s", lockId, table.getTableName());
    }
}
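
For contrast, here is a rough sketch of the idea behind a lock-free commit: the writer swaps the table's metadata pointer only if it still has the value the writer originally read, so there is no lock to release afterwards. As I understand it, HIVE-26882 lets the metastore perform this kind of check itself when altering the table. Everything below is a hypothetical stand-in (the class, the method names, and the in-memory map standing in for HMS); it is not the Trino or Iceberg implementation.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical illustration only: an in-memory map plays the role of the metastore.
final class LockFreeCommitSketch {
    // table name -> current metadata location
    private final ConcurrentMap<String, String> metadataLocations = new ConcurrentHashMap<>();

    void createTable(String table, String metadataLocation) {
        metadataLocations.put(table, metadataLocation);
    }

    // Atomically replaces the pointer only if it still equals expectedLocation.
    // Returns false when another writer committed first; the caller would then
    // refresh its view of the table and retry. No lock is acquired or released.
    boolean commit(String table, String expectedLocation, String newLocation) {
        return metadataLocations.replace(table, expectedLocation, newLocation);
    }

    String currentLocation(String table) {
        return metadataLocations.get(table);
    }

    public static void main(String[] args) {
        LockFreeCommitSketch metastore = new LockFreeCommitSketch();
        metastore.createTable("db.t", "v1.metadata.json");

        // Two writers both read v1 and then race to commit.
        boolean first = metastore.commit("db.t", "v1.metadata.json", "v2.metadata.json");
        boolean second = metastore.commit("db.t", "v1.metadata.json", "v3.metadata.json");

        System.out.println("first=" + first + ", second=" + second
                + ", current=" + metastore.currentLocation("db.t"));
    }
}
```

The losing writer gets a failed commit rather than a timeout waiting for a lock, and a crashed writer leaves nothing behind in the metastore that has to be cleaned up.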

apache/iceberg#6570 implemented iceberg.engine.hive.lock-enabled = false. All writers, including Trino, Spark, and other engines, should honor this setting so that they do not use different locking mechanisms, which could result in data corruption.

An unreleased lock could result in the following error:

Query 20240528_062551_35616_6hrf3 failed: Timed out waiting for lock 46108 for query 20240528_062551_35616_6hrf3
io.trino.spi.TrinoException: Timed out waiting for lock 46108 for query 20240528_062551_35616_6hrf3
	at io.trino.plugin.hive.metastore.thrift.ThriftHiveMetastore.acquireLock(ThriftHiveMetastore.java:1784)
	at io.trino.plugin.hive.metastore.thrift.ThriftHiveMetastore.acquireTableExclusiveLock(ThriftHiveMetastore.java:1765)
	at io.trino.plugin.iceberg.catalog.hms.HiveMetastoreTableOperations.commitToExistingTable(HiveMetastoreTableOperations.java:66)
	at io.trino.plugin.iceberg.catalog.AbstractIcebergTableOperations.commit(AbstractIcebergTableOperations.java:171)
	at org.apache.iceberg.BaseTransaction.lambda$commitSimpleTransaction$3(BaseTransaction.java:417)
	at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:413)
	at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:219)
	at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:203)
	at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:196)
	at org.apache.iceberg.BaseTransaction.commitSimpleTransaction(BaseTransaction.java:413)
	at org.apache.iceberg.BaseTransaction.commitTransaction(BaseTransaction.java:308)
	at io.trino.plugin.iceberg.IcebergMetadata.finishInsert(IcebergMetadata.java:1016)
...
@findepi
Member

findepi commented May 29, 2024

This is a good idea, but needs to be coordinated across all applications that currently use locks.
Is this related to @pvary's apache/iceberg#6648?

@findepi findepi added iceberg Iceberg connector performance labels May 29, 2024
@pvary

pvary commented May 29, 2024

apache/iceberg#6648 is only the refactoring that makes apache/iceberg#6570 possible. The latter PR is the one that enables the lock-free commit.

If you enable lock-free commits at the table level, you have to make sure that every writer of the table uses Iceberg 1.3.0 or later, so that they all use the appropriate locking mechanism. For more details, see the end of this paragraph: https://iceberg.apache.org/docs/nightly/configuration/#hadoop-configuration

Edit: Don't forget that you need the correct HMS version too.
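
As a minimal sketch of how a writer might decide whether to take the HMS lock, assuming the precedence described in the linked configuration paragraph: a table-level `engine.hive.lock-enabled` property wins when present, otherwise the engine-level `iceberg.engine.hive.lock-enabled` setting applies, and locking stays on when neither is set. The class and method names here are hypothetical, and plain maps stand in for the table properties and the Hadoop configuration.

```java
import java.util.Map;

// Hypothetical helper; the two property names come from the Iceberg docs,
// everything else is an illustrative assumption.
final class HiveLockSettingSketch {
    static final String TABLE_PROPERTY = "engine.hive.lock-enabled";
    static final String ENGINE_CONFIG = "iceberg.engine.hive.lock-enabled";

    static boolean hiveLockEnabled(Map<String, String> tableProperties, Map<String, String> engineConfig) {
        // A table-level property, when set, overrides the engine-level config.
        String tableValue = tableProperties.get(TABLE_PROPERTY);
        if (tableValue != null) {
            return Boolean.parseBoolean(tableValue);
        }
        // Otherwise fall back to the engine-level config; locking is enabled by default.
        String configValue = engineConfig.get(ENGINE_CONFIG);
        return configValue == null || Boolean.parseBoolean(configValue);
    }

    public static void main(String[] args) {
        // Engine config disables locks, but this table opts back in.
        System.out.println(hiveLockEnabled(
                Map.of(TABLE_PROPERTY, "true"),
                Map.of(ENGINE_CONFIG, "false")));
    }
}
```

Because every engine has to reach the same answer for a given table, resolving the setting from the table property first gives writers that cannot be upgraded a way to keep using locks on specific tables.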

@arghya18

arghya18 commented Nov 8, 2024

@findepi any update on this? How can we do lock-free commits using Trino? Also, if an Iceberg table is locked permanently, how can we unlock it?
