Skip to content

add nrow and ncol #40

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Sep 4, 2021
Merged

add nrow and ncol #40

merged 5 commits into from
Sep 4, 2021

Conversation

bkamins
Copy link
Member

@bkamins bkamins commented Jul 22, 2021

@codecov
Copy link

codecov bot commented Jul 22, 2021

Codecov Report

Merging #40 (0e62aad) into main (708b8ba) will increase coverage by 1.28%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main      #40      +/-   ##
==========================================
+ Coverage   91.30%   92.59%   +1.28%     
==========================================
  Files           1        1              
  Lines          23       27       +4     
==========================================
+ Hits           21       25       +4     
  Misses          2        2              
Impacted Files Coverage Δ
src/DataAPI.jl 92.59% <100.00%> (+1.28%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 708b8ba...0e62aad. Read the comment docs.

@nalimilan
Copy link
Member

Makes sense. Maybe we should also say that Tables.jl is the owner of the function, and add a fallback definition there (that would notably work for vectors of names tuples)? I'm also not sure whether we should provide a docstring here, or in Tables.jl instead, given that we don't define any method here (cf. discussion regarding join functions).

@bkamins
Copy link
Member Author

bkamins commented Jul 23, 2021

@quinnj - is making Tables.jl an owner feasible? I was thinking about it also, but I was not sure.

An alternative would be to make the default implementation in DataAPI.jl that would return missing (signaling that it is unknown)?
Then clearly Tables.jl could add extra methods in cases when nrow or ncol is known.

@nalimilan
Copy link
Member

I'd rather throw a MethodError than return missing for unsupported types. The latter could be useful for table objects for which the number of rows cannot be known in advance (e.g. iterators of named tuples).

@bkamins
Copy link
Member Author

bkamins commented Jul 23, 2021

OK - makes sense. With the definitions I gave now MethodError is thrown appropriately.
So I assume that if we merge and release it Tables.jl should add nrow and ncol definitions to the tables it ships. Right?

@bkamins
Copy link
Member Author

bkamins commented Jul 23, 2021

Ah - and I made the docsting, as as opposed to join* case I think we have a fixed API here (as opposed to join* where different packages can probably have different args/kwargs)

@ablaom
Copy link

ablaom commented Jul 24, 2021

Just FYI: For a Tables.jl implementation one might look at what we have currently in MLJBase.jl (courtesy of @OkonSamuel): https://github.com/JuliaAI/MLJBase.jl/blob/f21d295d3f0714ec8f5dcc93d1aa34ccbf16fdff/src/interface/data_utils.jl#L77 .

@quinnj
Copy link
Member

quinnj commented Aug 3, 2021

I'm fine with adding these definitions here; I've mentioned elsewhere in various Tables.jl issues, however, that nrow and ncol aren't well-defined for all "tables" (in the Tables.jl source definition). You can have a row-iterator table with Base.SizeUnknown() or a filtering generator of NamedTuples and you won't be able to call nrow on those.

Now, I will say that's just the row-based table case. For column-based tables, we do expect the # of rows to be defined, and Tables.jl has Tables.rowcount internally defined that works on any result of Tables.columns(x).

And so, I'd say I don't think I'm on board with having Tables.jl be the owner, because I'm afraid that will give the impression that nrow is now supported on any table. It makes it tricky though because to rename Tables.rowcount to nrow, it kind of needs to be the fallback definition. It's a little messy.

@bkamins
Copy link
Member Author

bkamins commented Aug 3, 2021

So - in summary. What I propose now is just what is needed if I understand the comments correctly. Let us wait for @nalimilan to review.

@bkamins
Copy link
Member Author

bkamins commented Sep 4, 2021

@nalimilan - could you please review. After merging this I would make a 1.9 release. Thank you!

Co-authored-by: Milan Bouchet-Valat <[email protected]>
@bkamins bkamins merged commit 92b0def into main Sep 4, 2021
@bkamins bkamins deleted the bkamins-patch-2 branch September 4, 2021 21:26
@bkamins
Copy link
Member Author

bkamins commented Sep 4, 2021

Thank you! Making a release

@ablaom
Copy link

ablaom commented Sep 4, 2021

cc @OkonSamuel

@matthieugomez
Copy link

matthieugomez commented Sep 6, 2021

Would not nrows and ncols be better names? It would be more consistent with Base (e.g. ndims, nfields, etc). (see also JuliaData/DataFrames.jl#2247)

@bkamins
Copy link
Member Author

bkamins commented Sep 6, 2021

I think it is too late to change this for DataFrames.jl, and given we will stick to nrow and ncol there it is better to use a common naming scheme across packages.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants