Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Formalize Adapter constructor API #805

Open
danielballan opened this issue Oct 29, 2024 · 3 comments
Open

Formalize Adapter constructor API #805

danielballan opened this issue Oct 29, 2024 · 3 comments
Assignees

Comments

@danielballan
Copy link
Member

danielballan commented Oct 29, 2024

Summary of discussion with @genematx

We have a diverse variety of classmethod constructors for Adapters. These include from_uri, from_uris, from_single_file, from_array and so on. This is in addition to some adapters that have no classmethod constructors and merely use __init__.

These constructors are used in two distinct use cases. These are registered with two registries, respectively, that are each keyed on MIME type.

  1. I have a file, with an identified MIME type, and I want to inspect the file to determine its structure (and metadata...) and index that in the catalog database.
  2. I have information about a file from the catalog database, and I need to open the file and read out some data.

In the first case, we do not know the structure, and in the second case we always know the structure. This points toward having two classmethod constructors with standard names, one per use case. The __init__ can be reserved as an internal API, corresponding to its status as a single privileged entrypoint. (We listen to @dutc sometimes.)

Benefits:

  • The registries can point to classes like CSVAdapter, not a zoo of diverge classmethods like CSVAdapter.from_single_file.
  • We can remove the fuzzy conditional branch on if structure is None. We know in the calling code whether we have a structure or not.
  • For developers, it will more clear what needs to be implemented. At the moment, variety of contructors makes it hard to know what is required.

Downsides:

  • If we use the classes in the registries, then all Adapters must be classes; they cannot be simple functions. (If this is seen as an important, we can still implement the proposed change but just have the registries point the now-consistent method.)
  • Writing an Adapter requires implementing more code.
@dylanmcreynolds
Copy link
Contributor

This is very cool. I have no issues requiring that Adapters be classes.

@genematx
Copy link
Contributor

genematx commented Nov 1, 2024

@danielballan Can structure and metadata be made properties of an Adapter class (instead of proper callable methods)?

@danielballan
Copy link
Member Author

I think they should remain methods. In fact, a year or so ago we changed metadata from a property to a method. Two reasons:

  1. Support optional async and possibility that these methods may be doing I/O rather than returning cached state.
  2. Leave room for additional optional methods, such as predicate-pushdown on filtered metadata.

Both arguments are stronger for metadata than structure but I am in favor of leaving both as methods to keep our options open.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants