Make `nrows` a method separate from `schema`

Recall that `ScientificTypes.schema` is essentially the same as `Tables.schema` but with `scitypes` *and* `nrows` added as fields. 

There are two reasons for separating out `nrows`:

1. Computing `nrows` for a general table is typically more expensive than computing the `schema`. In particular, assuming progress towards #127, the compute times are significant in the extreme case of out-of-memory tables, for example.  `Tables.schema` does not include anything about the number of rows for this reason. 

2. With `nrows` gone, we can view a `schema` as a table (implement the Tables.jl interface for it) which would be convenient. For example, for a large table `DataFrame(schema(X))` would then work. 

As an aside, the current implementation of `schema(X).nrows` is poor for row-based tables and @OkonSamuel  improved this in MLJBase, where there is already a separate `nrows` method (extended from MLJModelInterface). This implementation could be ported to ScientificTypes to replace the current `schema(X).nrows`. 

Sadly, this is breaking. I would propose adding the `nrows` method and deprecating the `nrows` field of `Schema`. We could already implement `Schema` as a table, simply ignoring this field. 

Are there any objections or other thoughts regarding this plan?

@OkonSamuel @quinnj @juliohm @bkamins

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Make `nrows` a method separate from `schema` #137

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Make nrows a method separate from schema #137

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Make `nrows` a method separate from `schema` #137