Make `nrows` a method separate from `schema` #137
Comments
If you want to introduce |
@bkamins That's great feedback. Will overload DataAPI.jl |
We need to add |
In case JuliaData/DataAPI.jl#40 takes long, probably you can expose |
Recall that `ScientificTypes.schema` is essentially the same as `Tables.schema` but with `scitypes` and `nrows` added as fields.

There are two reasons for separating out `nrows`:

- Computing `nrows` for a general table is typically more expensive than computing the `schema`. In particular, assuming progress towards #127 (Towards a more efficient `schema` methods for row-based tables), the compute times are significant in the extreme case of out-of-memory tables, for example. `Tables.schema` does not include anything about the number of rows for this reason.

- With `nrows` gone, we can view a `schema` as a table (implement the Tables.jl interface for it), which would be convenient. For example, for a large table `DataFrame(schema(X))` would then work.

As an aside, the current implementation of `schema(X).nrows` is poor for row-based tables and @OkonSamuel improved this in MLJBase, where there is already a separate `nrows` method (extended from MLJModelInterface). This implementation could be ported to ScientificTypes to replace the current `schema(X).nrows`.

Sadly, this is breaking. I would propose adding the `nrows` method and deprecating the `nrows` field of `Schema`. We could already implement `Schema` as a table, simply ignoring this field.

Are there any objections or other thoughts regarding this plan?
@OkonSamuel @quinnj @juliohm @bkamins
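
To make the proposal concrete, here is a rough sketch in plain Julia of how the pieces might separate. Everything in it is hypothetical and for illustration only: `ToySchema`, `toy_schema`, `toy_nrows` and `schema_rows` are made-up names, not the ScientificTypes, MLJBase or Tables.jl API, and the "tables" are assumed to be either a `NamedTuple` of vectors (column table) or an iterator of `NamedTuple` rows.

```julia
# Hypothetical stand-in for `Schema`, with no `nrows` field.
struct ToySchema
    names::Vector{Symbol}
    types::Vector{Type}
end

# Schema of a column table (assumed: a NamedTuple of vectors).
# Only column metadata is inspected, so this stays cheap.
toy_schema(X::NamedTuple) =
    ToySchema(collect(keys(X)), [eltype(col) for col in values(X)])

# Row count as its own method, separate from the schema.
# Column table: the length of the first column (0 if there are no columns).
toy_nrows(X::NamedTuple) = length(X) == 0 ? 0 : length(X[1])

# Generic row iterator: every row has to be visited, which is the
# expensive case that motivates keeping the count out of the schema.
toy_nrows(rows) = count(_ -> true, rows)

# With no `nrows` field, the schema itself has a tabular shape:
# one row per column of the original table.
schema_rows(s::ToySchema) =
    [(name = n, coltype = t) for (n, t) in zip(s.names, s.types)]

X = (a = [1, 2, 3], b = ["x", "y", "z"])
s = toy_schema(X)
toy_nrows(X)                          # column table: no iteration needed
toy_nrows(((a = i,) for i in 1:5))    # row iterator: counted by iteration
schema_rows(s)                        # vector of named-tuple rows
```

The vector of named tuples returned by `schema_rows` is the kind of object that row-table consumers can ingest directly, which is the convenience the second reason above is after. A real implementation would implement the Tables.jl interface on `Schema` itself rather than materialising rows like this.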