Dataset Comparison tool #322

Open
joaquinvanschoren opened this issue Dec 3, 2023 · 1 comment

@joaquinvanschoren
Contributor

Proposed by @ogrisel: a 'comparison' view showing how two datasets differ, for instance:

  • list columns whose names differ between two versions of the same dataset, or between two datasets chosen by the user;
  • list changes in data type for columns with the same name;
  • list, per column, the number of rows with changed values, and show the first 5 differing row values.

Possible approach: the new dataset table view lets users select rows and perform actions on the selected datasets. 'Compare' could be one such action.
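
To make the three checks above concrete, here is a minimal sketch in plain Python. It is not an existing OpenML feature or API: `compare_datasets` and `column_types` are hypothetical names, each dataset is assumed to be a dict mapping column name to a list of values, and rows are assumed to be matched by position.

```python
def column_types(values):
    """Return the sorted names of the Python types observed in a column."""
    return sorted({type(v).__name__ for v in values})


def compare_datasets(old, new, n_examples=5):
    """Summarize how two column-oriented datasets differ (hypothetical helper).

    Both arguments map column name -> list of values. Rows are matched by
    position, which is an assumption; a real tool would likely match on a key.
    """
    report = {
        # Check 1: columns whose names exist in only one of the datasets.
        "only_in_old": sorted(set(old) - set(new)),
        "only_in_new": sorted(set(new) - set(old)),
        "type_changes": {},
        "value_changes": {},
    }
    for col in (c for c in old if c in new):
        # Check 2: type representation changed for a column with the same name.
        old_types, new_types = column_types(old[col]), column_types(new[col])
        if old_types != new_types:
            report["type_changes"][col] = (old_types, new_types)

        # Check 3: rows whose values differ, as (row number, old, new).
        # zip() stops at the shorter column, so extra rows are not compared.
        # float("nan") != float("nan"), so NaN-encoded missing values count
        # as changes in this sketch.
        diffs = [
            (i, a, b)
            for i, (a, b) in enumerate(zip(old[col], new[col]))
            if a != b
        ]
        if diffs:
            report["value_changes"][col] = {
                "n_changed": len(diffs),
                "examples": diffs[:n_examples],
            }
    return report


if __name__ == "__main__":
    # Made-up toy data, for illustration only.
    v1 = {"id": [1, 2, 3], "age": [30, 40, 50], "city": ["a", "b", "c"]}
    v2 = {"id": [1, 2, 3], "age": [30.0, 41.0, 50.0], "town": ["a", "b", "c"]}
    print(compare_datasets(v1, v2))
```

Because values are compared with `!=`, a column that only changes representation (for example `30` to `30.0`) shows up under `type_changes` but not under `value_changes`, which keeps the two kinds of difference separate.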

@ogrisel

ogrisel commented Dec 8, 2023

Thanks for opening this feature request. A related feature request would be to ask dataset uploaders to better trace the lineage of their uploads, for instance by linking to a public git repo with a script that reproduces the version of the data uploaded to openml.org from the original raw data (if that data is publicly available on another website).

Similarly, when uploading a new version, it would be helpful to document the relevant changes in such a script.
