Fix metadata table names conflicts #1772

Open
wants to merge 5 commits into
base: main
from

Conversation

fivetran-kostaszoumpatianos

@fivetran-kostaszoumpatianos fivetran-kostaszoumpatianos commented Jun 2, 2025

Fixes: #1771

Fix reserved Iceberg metadata table name conflicts in Polaris (history, entries, etc.)

When we try to create or load tables named after one of the keywords defined here, Iceberg fails since it tries to resolve them as metadata tables.

The reason is that Polaris calls Catalog::tableExists(...) in IcebergCatalogHandler.java, which internally calls BaseMetastoreCatalog::loadTable(...). If that call throws an exception, the table is considered non-existent. The problem, however, is that when the table is not found and its name matches one of the reserved metadata keywords, loadTable() tries to resolve it as a metadata table. Specifically, it calls loadMetadataTable() using the namespace as the table name, and this fails.

Since we only care about actual tables when we call tableExists() and not metadata tables, we can override it and provide an implementation that does not rely on loadTable(). As a result, it will not conflict with metadata table names anymore.
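
To make the difference between the two approaches concrete, here is a minimal, self-contained sketch. It is a toy model, not the actual Iceberg or Polaris code: the class name, the `lookup` helper, and the three-entry keyword set are all assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified model of the behaviour described above; not the actual Iceberg or Polaris code.
class TableExistsSketch {
  // Assumed subset of the reserved metadata table names.
  static final Set<String> METADATA_NAMES = Set.of("history", "entries", "snapshots");

  // Fully qualified names of the tables that really exist in this toy catalog.
  final Set<String> tables = new HashSet<>();

  // Stand-in for a backend lookup that only accepts identifiers made of at least
  // a namespace and a table name, and fails hard on anything shorter.
  boolean lookup(List<String> identifier) {
    if (identifier.size() < 2) {
      throw new IllegalStateException("cannot resolve " + identifier);
    }
    return tables.contains(String.join(".", identifier));
  }

  // Existence check built on a loadTable-style call with a metadata-table fallback.
  boolean existsViaLoad(List<String> identifier) {
    if (lookup(identifier)) {
      return true;
    }
    String name = identifier.get(identifier.size() - 1);
    if (METADATA_NAMES.contains(name)) {
      // Fallback: treat the namespace as the base table of a metadata table.
      return lookup(identifier.subList(0, identifier.size() - 1));
    }
    return false;
  }

  // Existence check that never attempts the metadata-table fallback.
  boolean existsDirect(List<String> identifier) {
    return lookup(identifier);
  }

  public static void main(String[] args) {
    TableExistsSketch catalog = new TableExistsSketch();
    List<String> id = List.of("ns1", "history");
    System.out.println("direct: " + catalog.existsDirect(id));
    try {
      catalog.existsViaLoad(id);
    } catch (IllegalStateException e) {
      System.out.println("via load failed: " + e.getMessage());
    }
  }
}
```

In this model, the direct check simply reports that `ns1.history` does not exist, while the load-based check fails before it can answer.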

@adutra
Contributor

adutra commented Jun 2, 2025

Since we only care about actual tables when we call tableExists() and not metadata tables

@fivetran-kostaszoumpatianos I think that the fix you are proposing is going to return false for any metadata table, even if the base table exists. Isn't this concerning?

I wonder if the fix shouldn't be in Iceberg, rather than in Polaris, wdyt?

@fivetran-kostaszoumpatianos
Author

fivetran-kostaszoumpatianos commented Jun 2, 2025

Thanks, @adutra!

As far as I understand, tableExists() should only check if an actual table exists (based on how it's used). I don't think that it was intended behaviour to check for metadata tables.

For example, I can see the following uses (and more but similar in nature):

  • FlinkCatalog::createTableLoader()
  • BaseMetastoreCatalog::registerTable()
  • DynamoDbCatalog::ensureCatalogTableExistsOrCreate()
  • DynamoDbCatalog::ensureLockTableExistsOrCreate()
  • JdbcCatalog::renameTable()
  • JdbcCatalog::renameView()
  • JdbcCatalogOperations::createTable()
  • EcsCatalog::dropTable()
  • EcsCatalog::renameTable()

Indeed, we could fix that in Iceberg. That was my first thought, but the fix would be similar in nature. loadTable() will internally try to resolve a table that doesn't exist as a metadata table; that's OK and not really fixable, since we have arbitrary namespace nesting. Instead, Iceberg should provide a way of knowing whether the table actually exists. I think that, semantically, this was the purpose of tableExists(), and this function should only check for real tables. But since it is implemented in the base class, the only generic way to implement it was to wrap loadTable() in a try/catch. I think that is OK as generic, catalog-agnostic logic, but it breaks in the REST catalog case. So I think the Polaris catalog should provide an implementation that "unbreaks" it.
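
To illustrate why arbitrary namespace nesting makes the name ambiguous, here is a small toy sketch. It is not Iceberg's resolution code; the class name and the keyword subset are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Toy illustration only; not Iceberg's actual resolution code.
class IdentifierAmbiguitySketch {
  // Assumed subset of the reserved metadata table names.
  static final Set<String> METADATA_NAMES = Set.of("history", "entries", "snapshots");

  // Lists every way a dotted identifier could be read.
  static List<String> interpretations(String dotted) {
    String[] parts = dotted.split("\\.");
    String name = parts[parts.length - 1];
    String parent = String.join(".", Arrays.copyOf(parts, parts.length - 1));
    List<String> result = new ArrayList<>();
    // Reading 1: the last part is a table and everything before it is the namespace.
    result.add("table " + name + " in namespace " + parent);
    if (parts.length >= 2 && METADATA_NAMES.contains(name)) {
      // Reading 2: the last part is a metadata table type and the rest is the base table.
      result.add("metadata table " + name + " of table " + parent);
    }
    return result;
  }

  public static void main(String[] args) {
    interpretations("a.b.history").forEach(System.out::println);
    interpretations("a.b.orders").forEach(System.out::println);
  }
}
```

An identifier ending in a reserved name has two valid readings, so the name alone cannot tell a lookup which one the caller meant.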

What do you think? Do you think that semantically it should also check for metadata table names, and do you know of any such usages?

Thanks again for looking at my PR!

@fivetran-kostaszoumpatianos
Author

An alternative at the Polaris level would be to add a boolean realTableExists(...) { ... } method to the IcebergCatalog class.
Then, inside IcebergCatalogHandler, wherever we currently call tableExists(...), we would call that method instead (after checking that the catalog can be cast to an IcebergCatalog). That would be in:

  • createTableDirectWithWriteDelegation
  • stageTableCreateHelper

Also, as far as I understand, the IcebergCatalogHandler::tableExists method that we "expose" in the REST API uses loadTable internally, so it would be affected neither by the change above nor by the change currently proposed in this PR.
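
A rough sketch of what that alternative could look like, using toy stand-ins rather than the real Polaris types. The nested `Catalog` interface, the toy `IcebergCatalog`, the `genericCalls` counter, and the `exists` helper are all hypothetical names introduced for illustration.

```java
import java.util.Set;

// Toy stand-ins; none of these types are the real Iceberg or Polaris classes.
class RealTableExistsSketch {
  // Hypothetical stand-in for the generic catalog interface.
  interface Catalog {
    boolean tableExists(String identifier);
  }

  // Hypothetical stand-in for Polaris' IcebergCatalog with the proposed extra method.
  static class IcebergCatalog implements Catalog {
    private final Set<String> tables;
    int genericCalls = 0; // counts how often the generic path is used

    IcebergCatalog(Set<String> tables) {
      this.tables = tables;
    }

    @Override
    public boolean tableExists(String identifier) {
      // Generic path; in the real code this one goes through loadTable.
      genericCalls++;
      return tables.contains(identifier);
    }

    // Proposed method: checks only for real tables, with no metadata-table fallback.
    boolean realTableExists(String identifier) {
      return tables.contains(identifier);
    }
  }

  // Handler-side helper: prefer realTableExists when the catalog supports it.
  static boolean exists(Catalog catalog, String identifier) {
    if (catalog instanceof IcebergCatalog) {
      return ((IcebergCatalog) catalog).realTableExists(identifier);
    }
    return catalog.tableExists(identifier);
  }

  public static void main(String[] args) {
    IcebergCatalog catalog = new IcebergCatalog(Set.of("ns1.history"));
    System.out.println(exists(catalog, "ns1.history"));
    System.out.println("generic calls: " + catalog.genericCalls);
  }
}
```

Other catalog implementations keep their existing behaviour, since the helper falls back to the generic tableExists whenever the cast is not possible.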

@adutra
Contributor

adutra commented Jun 2, 2025

To be clear, I agree that calling tableExists on a metadata table does not make a ton of sense, and I also agree that the default implementations of tableExists in both Catalog and SessionCatalog are just poor man's implementations.

I just think that it would be better to fix that in Iceberg so that all catalog impls benefit from the fix (as it's not specific to the REST catalog).

import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.*;
Contributor

We try to avoid star imports, so let's change this back


Indeed, that looked weird. Either spotlessApply or IntelliJ for some reason thought that this was OK. I will revert it.

  @Override
  public boolean tableExists(TableIdentifier identifier) {
    if (isValidIdentifier(identifier)) {
      return newTableOps(identifier).current() != null;
Contributor

Instead of this, can we just use loadTable directly like the current code does?


This kind of defeats the purpose, as it will call BaseMetastoreCatalog::loadTable, which will try to resolve the table as a metadata table if it doesn't exist. We should avoid that if we want to allow Polaris to create tables named after these keywords.

Contributor

I see. For context, this will force a trip to object storage to load the table's metadata, which #433 (being re-implemented) proposes to skip via a trip to the metastore. I know this method isn't called often, but it's a shame if we have to lose that performance optimization.

Author

@fivetran-kostaszoumpatianos fivetran-kostaszoumpatianos Jun 3, 2025

loadTable(...) currently does exactly the same thing internally.
Once we optimize that and table metadata is promoted to the metastore, we could adapt this method as well.

@fivetran-kostaszoumpatianos
Author

Thanks @adutra,

Both HiveCatalog and EcsCatalog provide their own implementations of tableExists. GlueCatalog, on the other hand, doesn't, but it still doesn't break when history (for example) is used as a new table name. This, however, is just a client, same as RESTCatalog.
What I want to stress is that it looks like common practice for inheritors of this class to provide their own tableExists implementations. Since the part that breaks is within Polaris itself and not in the client, I was thinking that it probably made more sense to override it in IcebergCatalog in Polaris. It is also faster in terms of release cycles.

That said, we could have fixed that at the BaseMetastoreCatalog level by overriding tableExists there as:

  @Override
  public boolean tableExists(TableIdentifier identifier) {
    if (isValidIdentifier(identifier)) {
      return newTableOps(identifier).current() != null;
    }
    return false;
  }

Is this what you had in mind? I am wondering if that would create more issues downstream since inheritor implementations might rely on this "broken feature".

wdyt?

@adutra
Contributor

adutra commented Jun 3, 2025

Hey @fivetran-kostaszoumpatianos thanks for the pointers. It's interesting to notice that HiveCatalog's implementation of tableExists also considers metadata tables, but EcsCatalog's one doesn't 🤷‍♂️

Now I'm trying to reproduce the issue. But so far I can't. Do you have a reproducer?

With Spark + Polaris, I can't see any errors, the following commands all succeeded:

spark-sql ()> create namespace test;
Time taken: 0.223 seconds
spark-sql ()> create table test.history (c1 int);
Time taken: 2.314 seconds
spark-sql ()> show tables in test;
history
Time taken: 0.192 seconds, Fetched 1 row(s)
spark-sql ()> insert into test.history VALUES (1), (2), (3);
Time taken: 7.436 seconds
spark-sql ()> select * from test.history;
1
2
3
Time taken: 2.833 seconds, Fetched 3 row(s)
spark-sql ()> select * from test.history.history;
2025-06-03 11:57:17.582 7530028985343056875     NULL    true
Time taken: 2.322 seconds, Fetched 1 row(s)
spark-sql ()> select * from test.history.snapshots;
2025-06-03 11:57:17.582 7530028985343056875     NULL    append  file:/var/tmp/quickstart_catalog/test/history/metadata/snap-7530028985343056875-1-0a43529e-1043-4008-820b-37242a6ace67.avro     {"added-data-files":"3","added-files-size":"1221","added-records":"3","app-id":"local-1748944575124","changed-partition-count":"1","engine-name":"spark","engine-version":"3.5.5","iceberg-version":"Apache Iceberg unspecified (commit 7dbafb438ee1e68d0047bebcb587265d7d87d8a1)","spark.app.id":"local-1748944575124","total-data-files":"3","total-delete-files":"0","total-equality-deletes":"0","total-files-size":"1221","total-position-deletes":"0","total-records":"3"}
Time taken: 2.421 seconds, Fetched 1 row(s)

@adutra
Contributor

adutra commented Jun 3, 2025

These commands also work:

spark-sql ()> show create table test.history;
CREATE TABLE polaris.test.history (
  c1 INT)
USING iceberg
LOCATION 'file:///var/tmp/quickstart_catalog/test/history'
TBLPROPERTIES (
  'current-snapshot-id' = 'none',
  'format' = 'iceberg/parquet',
  'format-version' = '2',
  'write.parquet.compression-codec' = 'zstd')

Time taken: 5.194 seconds, Fetched 1 row(s)
spark-sql ()> show create table test.history.history;
CREATE TABLE polaris.test.history.history (
  made_current_at TIMESTAMP NOT NULL,
  snapshot_id BIGINT NOT NULL,
  parent_id BIGINT,
  is_current_ancestor BOOLEAN NOT NULL)
USING iceberg
LOCATION 'file:///var/tmp/quickstart_catalog/test/history'
TBLPROPERTIES (
  'current-snapshot-id' = 'none',
  'format' = 'iceberg/parquet')

Time taken: 29.912 seconds, Fetched 1 row(s)
spark-sql ()> 

@fivetran-kostaszoumpatianos
Author

Thanks @adutra, this is interesting.
I was able to reproduce it via the Iceberg library in Java.

Something like this fails for me:

  Namespace namespace = Namespace.of("test_database");
  TableIdentifier tableId = TableIdentifier.of(namespace.toString(), "history");

  // Set up catalog properties
  var uri = "http://127.0.0.1:8181/api/catalog";
  var catalogName = "quickstart_catalog";
  var clientCredentials = "XXXXX:YYYYYY";

  Map<String, String> properties = new HashMap<>();
  properties.put(CatalogProperties.CATALOG_IMPL, "org.apache.iceberg.rest.RESTCatalog");
  properties.put("header.X-Iceberg-Access-Delegation", "true");
  properties.put(CatalogProperties.URI, uri);
  properties.put("header.Polaris-Client-ID", UUID.randomUUID().toString().replace("-", ""));
  properties.put(CatalogProperties.WAREHOUSE_LOCATION, catalogName);
  properties.put("credential", clientCredentials);
  properties.put("scope", "PRINCIPAL_ROLE:ALL");

  RESTCatalog catalog = new RESTCatalog();
  catalog.initialize("polaris", properties);
  catalog.setConf(new org.apache.hadoop.conf.Configuration());

  if (!catalog.namespaceExists(namespace)) {
    catalog.createNamespace(namespace);
  }

  // Create table
  Map<String, Type> schema = new HashMap<>();
  schema.put("id", Types.IntegerType.get());
  schema.put("number", Types.IntegerType.get());
  Set<String> pks = new HashSet<>();
  pks.add("id");
  catalog.buildTable(
          TableIdentifier.of(namespace.toString(), "history"),
          buildIcebergSchema(schema, pks))
      .withProperty("stage-create", "true")
      .create();

@adutra
Contributor

adutra commented Jun 3, 2025

@fivetran-kostaszoumpatianos I transformed your example above into a test (to be added in PolarisApplicationIntegrationTest):

  @Test
  public void createTableFails() throws IOException {
    String catalogName = client.newEntityName("createTableFails");
    createCatalog(
        catalogName,
        Catalog.TypeEnum.INTERNAL,
        principalRoleName,
        FileStorageConfigInfo.builder(StorageConfigInfo.StorageTypeEnum.FILE)
            .setAllowedLocations(List.of(baseLocation.toString()))
            .build(),
        baseLocation.toString());
    try (RESTSessionCatalog catalog = newSessionCatalog(catalogName)) {
      Namespace ns = Namespace.of("ns1");
      SessionCatalog.SessionContext sessionContext = SessionCatalog.SessionContext.createEmpty();
      if (!catalog.namespaceExists(sessionContext, ns)) {
        catalog.createNamespace(sessionContext, ns);
      }
      catalog
          .buildTable(
              sessionContext,
              TableIdentifier.of(ns, "history"),
              new Schema(
                  List.of(
                      Types.NestedField.required(1, "id", Types.IntegerType.get()),
                      Types.NestedField.required(2, "number", Types.IntegerType.get())),
                  Set.of(1)))
          .withSortOrder(SortOrder.unsorted())
          .withPartitionSpec(PartitionSpec.unpartitioned())
          .withProperty("stage-create", "true")
          .create();
    }
  }

But the test passes 🤷‍♂️
I still cannot reproduce the bug.

@adutra
Contributor

adutra commented Jun 3, 2025

After chatting with @fivetran-kostaszoumpatianos we were finally able to reproduce consistently: the issue only happens with credential vending.

The following test, declared in PolarisRestCatalogIntegrationTest, reproduces the issue:

  @RestCatalogConfig({"header.X-Iceberg-Access-Delegation", "vended-credentials"})
  @Test
  public void createTableFails() {
    Namespace ns = Namespace.of("ns1");
    if (!restCatalog.namespaceExists(ns)) {
      restCatalog.createNamespace(ns);
    }
    restCatalog
        .buildTable(
            TableIdentifier.of(ns, "history"),
            new Schema(
                List.of(
                    Types.NestedField.required(1, "id", Types.IntegerType.get()))))
        .withSortOrder(SortOrder.unsorted())
        .withPartitionSpec(PartitionSpec.unpartitioned())
        .withProperty("stage-create", "true")
        .create();
  }

The error is:

Server error: IllegalStateException: invalid_key_for_passthrough_resolved_path: key={} passthroughPaths={}, [ns1, {ns1.history=entityNames:[ns1, history];lastEntityType:TABLE_LIKE;isOptional:true}]
org.apache.iceberg.exceptions.ServiceFailureException: Server error: IllegalStateException: invalid_key_for_passthrough_resolved_path: key={} passthroughPaths={}, [ns1, {ns1.history=entityNames:[ns1, history];lastEntityType:TABLE_LIKE;isOptional:true}]
	at org.apache.iceberg.rest.ErrorHandlers$DefaultErrorHandler.accept(ErrorHandlers.java:241)
	at org.apache.iceberg.rest.ErrorHandlers$TableErrorHandler.accept(ErrorHandlers.java:123)
	at org.apache.iceberg.rest.ErrorHandlers$TableErrorHandler.accept(ErrorHandlers.java:107)
	at org.apache.iceberg.rest.HTTPClient.throwFailure(HTTPClient.java:215)
	at org.apache.iceberg.rest.HTTPClient.execute(HTTPClient.java:299)
	at org.apache.iceberg.rest.BaseHTTPClient.post(BaseHTTPClient.java:88)
	at org.apache.iceberg.rest.RESTSessionCatalog$Builder.create(RESTSessionCatalog.java:771)
	at org.apache.polaris.service.it.test.PolarisRestCatalogIntegrationTest.createTableFails(PolarisRestCatalogIntegrationTest.java:650)

@adutra
Contributor

adutra commented Jun 3, 2025

It only happens with credential vending because the createTableDirectWithWriteDelegation method has a tableExists check, and this check is absent from createTableDirect.

That check is what triggers the ServiceFailureException.

@adutra
Contributor

adutra commented Jun 3, 2025

So here is my analysis:

  1. tableExists(ns1.history) is called
  2. loadTable(ns1.history) is called
  3. Following the logic in BaseMetastoreCatalog a first TableOperations is created for the identifier ns1.history.
  4. IcebergCatalog.doRefresh() is called for ns1.history, which calls PolarisResolutionManifest.getPassthroughResolvedPath(ns1.history).
  5. PolarisResolutionManifest.passthroughPaths contains the key ns1.history, so that call doesn't throw, and IcebergCatalog.doRefresh() returns null => the table doesn't exist
  6. Still in BaseMetastoreCatalog, another TableOperations is created for identifier ns1 (trying to figure out if this is the base table of a hypothetical metadata table).
  7. IcebergCatalog.doRefresh() is called again, but this time for ns1.
  8. PolarisResolutionManifest.getPassthroughResolvedPath(ns1) is called, and that call throws, because ns1 is not a key in PolarisResolutionManifest.passthroughPaths.
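
The steps above can be replayed with a small toy model. This is only a sketch: the map, the method bodies, and the shortened error message are simplified stand-ins for PolarisResolutionManifest, IcebergCatalog.doRefresh() and BaseMetastoreCatalog.loadTable(), and the keyword subset is an assumption.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy replay of the trace above; not the actual Polaris or Iceberg classes.
class PassthroughTraceSketch {
  // Assumed subset of the reserved metadata table names.
  static final Set<String> METADATA_NAMES = Set.of("history", "entries", "snapshots");

  // Stand-in for passthroughPaths: registered key -> resolved entity (null = nothing found).
  final Map<String, String> passthroughPaths = new HashMap<>();

  // Throws for keys that were never registered, like the strict lookup in the trace.
  String getPassthroughResolvedPath(String key) {
    if (!passthroughPaths.containsKey(key)) {
      throw new IllegalStateException("invalid_key_for_passthrough_resolved_path: key=" + key);
    }
    return passthroughPaths.get(key);
  }

  // Stand-in for doRefresh(): returns the resolved entity, or null if the table is absent.
  String doRefresh(String identifier) {
    return getPassthroughResolvedPath(identifier);
  }

  // Stand-in for the loadTable() logic with its metadata-table fallback.
  boolean loadTableSucceeds(List<String> identifier) {
    // Steps 3 to 5: first attempt with the full identifier.
    if (doRefresh(String.join(".", identifier)) != null) {
      return true;
    }
    String name = identifier.get(identifier.size() - 1);
    if (METADATA_NAMES.contains(name)) {
      // Steps 6 to 8: second attempt with the parent as a hypothetical base table.
      return doRefresh(String.join(".", identifier.subList(0, identifier.size() - 1))) != null;
    }
    return false;
  }

  public static void main(String[] args) {
    PassthroughTraceSketch sketch = new PassthroughTraceSketch();
    sketch.passthroughPaths.put("ns1.history", null); // key registered, table not found
    try {
      sketch.loadTableSucceeds(List.of("ns1", "history"));
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

With a non-reserved name such as `ns1.orders`, the second lookup never happens and the model simply reports that the table does not exist.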

I still don't know if we should fix something in the Resolver or change how tableExists is implemented as @fivetran-kostaszoumpatianos suggested. IMHO getPassthroughResolvedPath should be resilient to "unresolved" paths, or at least be prepared to receive the parent of a resolved path.
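
As a sketch of what a lookup that is resilient to "unresolved" paths might look like, here is a toy comparison of a strict and a lenient variant. The class and method names are hypothetical and do not reflect the actual PolarisResolutionManifest API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Toy comparison of a strict and a lenient lookup; names are hypothetical.
class ResilientLookupSketch {
  private final Map<String, String> passthroughPaths = new HashMap<>();

  void register(String key, String resolvedPath) {
    passthroughPaths.put(key, resolvedPath);
  }

  // Strict variant: an unregistered key is treated as a programming error.
  String strictLookup(String key) {
    if (!passthroughPaths.containsKey(key)) {
      throw new IllegalStateException("unregistered key: " + key);
    }
    return passthroughPaths.get(key);
  }

  // Lenient variant: an unregistered key is reported as "not resolved" instead of failing.
  Optional<String> lenientLookup(String key) {
    return Optional.ofNullable(passthroughPaths.get(key));
  }

  public static void main(String[] args) {
    ResilientLookupSketch manifest = new ResilientLookupSketch();
    manifest.register("ns1.history", "resolved-entity");
    System.out.println(manifest.lenientLookup("ns1.history").isPresent());
    System.out.println(manifest.lenientLookup("ns1").isPresent());
  }
}
```

The trade-off is that a lenient lookup can hide genuine bugs where a path was never registered, so callers would need to treat an empty result as "does not exist" deliberately.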

@dennishuo @collado-mike WDYT?

@fivetran-kostaszoumpatianos
Author

Thank you very much @adutra, this demystifies the issue.

My 2 cents 🪙 : maybe we can fix both.

  • I think we shouldn't be calling loadTable() when it's not required as this complicates a relatively simple operation. At least at the Polaris level, where we can restrict pretty well what it means for a table to exist semantically.
  • I also agree that the resolver should be robust. Maybe we can open a separate PR that solves that at that level as well.

@dennishuo @collado-mike WDYT?

Successfully merging this pull request may close these issues.

Reserved Iceberg metadata table names in Polaris (history, entries, etc.)
3 participants