Discussion: splitting up (or extending) the Catalog trait's tabular methods for tables/views separately #186

@gruuya

The iceberg_rust::catalog::Catalog trait currently treats both tables and views as different flavors of the more general tabular (super) type, e.g.

async fn list_tabulars(&self, namespace: &Namespace) -> Result<Vec<Identifier>, Error>;
async fn load_tabular(self: Arc<Self>, identifier: &Identifier) -> Result<Tabular, Error>;

While I find this to be neat/elegant, I do encounter some problems with it occasionally.

Issues

For one thing, note that this is only the case for the above "read" methods, whereas the "write" methods (create/update/delete) are already split up for tables and views.

In addition, it's not aligned with the REST spec, which defines separate endpoints for tables and views. This is also reflected in the Java (and Python) clients, which have separate client methods/interfaces for tables and views.

While relatively benign, this difference can lead to some more important practical limitations:

  • there's no way to know which identifier returned by list_tabulars is a table and which one is a view; another call to load_tabular per identifier would be needed to discern that
  • since list_tabulars chains identifiers coming from the list-tables and list-views endpoints, access control that e.g. forbids a client from querying views means the client won't be able to list tables either (even though it has permission to do so)
  • load_tabular takes a trial-and-error approach to fetching the tabular: it first tries loading a view, and if that fails with error code 404 it tries to load a proper table. Besides the permissions issue above, note that this also doubles the number of API hits per table identifier. This in turn can become a problem in a scenario where we're e.g. trying to load all tables in a namespace/catalog periodically (think IMPORT FOREIGN SCHEMA, but for iceberg catalogs), as one more easily gets rate-limited/throttled by the catalog server.
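
To make the last two points concrete, here is a minimal, synchronous sketch of the view-first fallback against a mock server that counts requests. Everything in it (MockServer, CatalogError, get_view, get_table, the simplified Tabular enum) is a hypothetical stand-in for illustration, not the actual iceberg_rust implementation.

```rust
use std::cell::Cell;
use std::collections::HashSet;

/// Hypothetical stand-in for the errors a REST catalog might return.
#[derive(Debug, PartialEq)]
enum CatalogError {
    NotFound,  // HTTP 404
    Forbidden, // HTTP 403
}

/// Simplified tabular type: only carries the identifier name.
#[derive(Debug, PartialEq)]
enum Tabular {
    Table(String),
    View(String),
}

/// Mock catalog server that counts every request it receives.
struct MockServer {
    tables: HashSet<String>,
    views: HashSet<String>,
    views_allowed: bool,
    requests: Cell<usize>,
}

impl MockServer {
    fn new(tables: &[&str], views: &[&str], views_allowed: bool) -> Self {
        MockServer {
            tables: tables.iter().map(|s| s.to_string()).collect(),
            views: views.iter().map(|s| s.to_string()).collect(),
            views_allowed,
            requests: Cell::new(0),
        }
    }

    /// Simulates the load-view endpoint.
    fn get_view(&self, name: &str) -> Result<Tabular, CatalogError> {
        self.requests.set(self.requests.get() + 1);
        if !self.views_allowed {
            return Err(CatalogError::Forbidden);
        }
        if self.views.contains(name) {
            Ok(Tabular::View(name.to_string()))
        } else {
            Err(CatalogError::NotFound)
        }
    }

    /// Simulates the load-table endpoint.
    fn get_table(&self, name: &str) -> Result<Tabular, CatalogError> {
        self.requests.set(self.requests.get() + 1);
        if self.tables.contains(name) {
            Ok(Tabular::Table(name.to_string()))
        } else {
            Err(CatalogError::NotFound)
        }
    }
}

/// Trial-and-error lookup: try the view endpoint first and fall back to
/// the table endpoint only on a 404. Any other error is returned as-is.
fn load_tabular(server: &MockServer, name: &str) -> Result<Tabular, CatalogError> {
    match server.get_view(name) {
        Err(CatalogError::NotFound) => server.get_table(name),
        other => other,
    }
}
```

In this sketch, loading a table costs two requests while loading a view costs one, and a client that is forbidden from reading views never reaches the table endpoint at all.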

Proposal

If the above issues resonate with people, I can see two potential resolutions here, either

  1. split up the tabular methods into table/view ones, or
  2. extend the existing interface with table/view specific ones

where the former is simpler, but the latter retains backward compatibility.
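
As a rough illustration of option 2, the sketch below keeps the combined methods and adds table-specific ones with default implementations, so existing implementors would keep compiling. It is synchronous and uses simplified, hypothetical types (string identifiers, a name-only Tabular, a WrongKind error, a MemoryCatalog), and the names list_tables and load_table are assumptions; the real trait is async and takes &Namespace / &Identifier. View-specific methods would follow the same pattern.

```rust
/// Simplified tabular type: only carries the identifier name.
#[derive(Debug, Clone, PartialEq)]
enum Tabular {
    Table(String),
    View(String),
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum Error {
    NotFound,
    WrongKind,
}

trait Catalog {
    // Existing combined methods, kept unchanged.
    fn list_tabulars(&self, namespace: &str) -> Result<Vec<String>, Error>;
    fn load_tabular(&self, identifier: &str) -> Result<Tabular, Error>;

    // New table-specific methods. The defaults fall back to the combined
    // methods, so they are correct but no cheaper; a REST-backed catalog
    // would override them to hit the table endpoints directly.
    fn load_table(&self, identifier: &str) -> Result<Tabular, Error> {
        match self.load_tabular(identifier)? {
            table @ Tabular::Table(_) => Ok(table),
            Tabular::View(_) => Err(Error::WrongKind),
        }
    }

    fn list_tables(&self, namespace: &str) -> Result<Vec<String>, Error> {
        let mut tables = Vec::new();
        for identifier in self.list_tabulars(namespace)? {
            if let Tabular::Table(_) = self.load_tabular(&identifier)? {
                tables.push(identifier);
            }
        }
        Ok(tables)
    }
}

/// An existing implementor that only knows about the combined methods.
struct MemoryCatalog {
    entries: Vec<Tabular>,
}

impl Catalog for MemoryCatalog {
    // The namespace is ignored here; this toy catalog is flat.
    fn list_tabulars(&self, _namespace: &str) -> Result<Vec<String>, Error> {
        Ok(self
            .entries
            .iter()
            .map(|t| match t {
                Tabular::Table(name) | Tabular::View(name) => name.clone(),
            })
            .collect())
    }

    fn load_tabular(&self, identifier: &str) -> Result<Tabular, Error> {
        self.entries
            .iter()
            .find(|t| {
                matches!(t, Tabular::Table(name) | Tabular::View(name)
                    if name.as_str() == identifier)
            })
            .cloned()
            .ok_or(Error::NotFound)
    }
}
```

Under option 1 the combined methods would instead be removed from the trait, which is simpler but would require every implementor and caller to migrate.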