-
Notifications
You must be signed in to change notification settings - Fork 248
Fix metadata table names conflicts #1772
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Fix metadata table names conflicts #1772
Conversation
@fivetran-kostaszoumpatianos I think that the fix you are proposing is going to return false for any metadata table, even if the base table exists. Isn't this concerning? I wonder if the fix shouldn't be in Iceberg, rather than in Polaris, wdyt? |
thanks @adutra ! As far as I understand, For example, I can see the following uses (and more but similar in nature):
Indeed, we could fix that in iceberg, that was my first thought, but the fix would be similar in nature. Load table will internally try to resolve a metadata table that doesn't exist as a metadata table, that's ok and not really fixable since we have arbitrary namespace nesting. Instead, iceberg should provide a way of knowing if this table actually exists or not. I think semantically, this was the purpose of What do you think? Do you think that semantically it should also check for metadata table names, and do you know of any such usages? Thanks again for looking at my PR! |
an alternative at the Polaris level would be to add a:
Also, as far as I understand, the |
To be clear, I agree that calling I just think that it would be better to fix that in Iceberg so that all catalog impls benefit from the fix (as it's not specific to the REST catalog). |
import java.util.Map; | ||
import java.util.Optional; | ||
import java.util.UUID; | ||
import java.util.*; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We try to avoid star imports, so let's change this back
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, that looked weird. Either spotlessApply
or intelliJ for some reason thought that this is ok. I will revert it back.
@Override | ||
public boolean tableExists(TableIdentifier identifier) { | ||
if (isValidIdentifier(identifier)) { | ||
return newTableOps(identifier).current() != null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of this, can we just use loadTable
directly like the current code does?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This kind of defies the purpose, as this will call BaseMetastoreCatalog::loadTable and this will try to resolve the table as a metadata table - if it doesn't exist. We should avoid that if we want to allow Polaris to create tables named after these keywords.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. For context, this will force a trip to object storage to load the table's metadata, which #433 (being re-implemented) proposes to skip via a trip to the metastore. I know this method isn't called often, but it's a shame if we have to lose that performance optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
loadTable(...)
currently does exactly the same thing internally.
Once we optimize that and table metadata are promoted to metastore we could adapt this method as well.
Thanks @adutra, Both That said, we could have fixed that at the
Is this what you had in mind? I am wondering if that would create more issues downstream since inheritor implementations might rely on this "broken feature". wdyt? |
Hey @fivetran-kostaszoumpatianos thanks for the pointers. It's interesting to notice that Now I'm trying to reproduce the issue. But so far I can't. Do you have a reproducer? With Spark + Polaris, I can't see any errors, the following commands all succeeded:
|
These commands also work:
|
Thanks @adutra , this is interesting. Something like that fails for me:
|
@fivetran-kostaszoumpatianos I transformed your example above into a test (to be added in @Test
public void createTableFails() throws IOException {
String catalogName = client.newEntityName("createTableFails");
createCatalog(
catalogName,
Catalog.TypeEnum.INTERNAL,
principalRoleName,
FileStorageConfigInfo.builder(StorageConfigInfo.StorageTypeEnum.FILE)
.setAllowedLocations(List.of(baseLocation.toString()))
.build(),
baseLocation.toString());
try (RESTSessionCatalog catalog = newSessionCatalog(catalogName)) {
Namespace ns = Namespace.of("ns1");
SessionCatalog.SessionContext sessionContext = SessionCatalog.SessionContext.createEmpty();
if (!catalog.namespaceExists(sessionContext, ns)) {
catalog.createNamespace(sessionContext, ns);
}
catalog
.buildTable(
sessionContext,
TableIdentifier.of(ns, "history"),
new Schema(
List.of(
Types.NestedField.required(1, "id", Types.IntegerType.get()),
Types.NestedField.required(2, "number", Types.IntegerType.get())),
Set.of(1)))
.withSortOrder(SortOrder.unsorted())
.withPartitionSpec(PartitionSpec.unpartitioned())
.withProperty("stage-create", "true")
.create();
}
} But the test passes 🤷♂️ |
After chatting with @fivetran-kostaszoumpatianos we were finally able to reproduce consistently: the issue only happens with credential vending. The following test, declared in @RestCatalogConfig({"header.X-Iceberg-Access-Delegation", "vended-credentials"})
@Test
public void createTableFails() {
Namespace ns = Namespace.of("ns1");
if (!restCatalog.namespaceExists(ns)) {
restCatalog.createNamespace(ns);
}
restCatalog
.buildTable(
TableIdentifier.of(ns, "history"),
new Schema(
List.of(
Types.NestedField.required(1, "id", Types.IntegerType.get()))))
.withSortOrder(SortOrder.unsorted())
.withPartitionSpec(PartitionSpec.unpartitioned())
.withProperty("stage-create", "true")
.create();
} The error is:
|
It only happens with credentials vending because the Line 426 in 5fe9fd9
And this check is absent from That check is throwing the |
So here is my analyzis:
I still don't know if we should fix something in the @dennishuo @collado-mike WDYT? |
Thank you very much @adutra, this demystifies the issue. My 2 cents 🪙 : maybe we can fix both.
@dennishuo @collado-mike WDYT? |
Fixes: #1771
Fix reserved Iceberg metadata table name conflicts in Polaris (history, entries, etc.)
When we try to create or load tables named after one of the keywords defined here, Iceberg fails since it tries to resolve them as metadata tables.
The reason is that Polaris will call
Catalog::tableExists(...)
in IcebergCatalogHandler.java, which internally callsBaseMetastoreCatalog::loadTable(...)
. If that call returns an exception then the table is considered non-existent. The problem, however, is thatloadTable()
will try to resolve the table as a metadata table if it's not found and if the table name matches one of the reserved metadata keywords. Specifically, it will try to callloadMetadataTable()
using the namespace as a table name, and this will fail.Since we only care about actual tables when we call
tableExists()
and not metadata tables, we can override it and provide an implementation that does not rely onloadTable()
. As a result, it will not conflict with metadata table names anymore.