
aggregate the result of DocumentStatistic #313


Merged: ArneBinder merged 8 commits into main from aggregate_statistics on Aug 10, 2023


ArneBinder (Owner) commented Aug 10, 2023

This PR modifies the DocumentStatistic:

  • It returns aggregated results instead of the lists of collected values. The default aggregation functions can be set via the class variable DEFAULT_AGGREGATION_FUNCTIONS: List[str]. Each function name is resolved against some in-house methods (mean, median, std) or Python built-in functions and, if that fails, against the global namespace via utils.hydra.resolve_target.
  • It adds the parameter show_histogram: iff True (default: False), a histogram of the collected values is shown on the console using plotext.
  • It adds the parameter show_as_markdown: iff True (default: False), the result is shown as markdown on the console (requires the tabulate package).
  • It adds the parameter title, which is used as the caption of the histogram and of the markdown table.
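The name resolution described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the PR's code: the in-house mean/median/std helpers are approximated with Python's statistics module (std is assumed to be the population standard deviation), the contents of DEFAULT_AGGREGATION_FUNCTIONS are an assumed example, and a plain dotted import path stands in for utils.hydra.resolve_target.

```python
import builtins
import importlib
import statistics
from typing import Callable, Dict, Iterable, List

# Assumed stand-ins for the "in-house" helpers; the real ones may differ
# (e.g. sample vs. population standard deviation).
IN_HOUSE_FUNCTIONS: Dict[str, Callable] = {
    "mean": statistics.mean,
    "median": statistics.median,
    "std": statistics.pstdev,
}

# Assumed example default; the PR only states that this is a List[str].
DEFAULT_AGGREGATION_FUNCTIONS: List[str] = ["len", "mean", "std", "min", "max"]


def resolve_aggregation_function(name: str) -> Callable:
    """Resolve a name: in-house helpers first, then Python built-ins, then a
    dotted import path (stand-in for utils.hydra.resolve_target)."""
    if name in IN_HOUSE_FUNCTIONS:
        return IN_HOUSE_FUNCTIONS[name]
    if hasattr(builtins, name):
        return getattr(builtins, name)
    module_name, _, attr = name.rpartition(".")
    if not module_name:
        raise ValueError(f"cannot resolve aggregation function: {name!r}")
    return getattr(importlib.import_module(module_name), attr)


def aggregate(values: Iterable[float], function_names: List[str]) -> Dict[str, object]:
    """Apply each resolved aggregation function to the collected values."""
    collected = list(values)
    return {name: resolve_aggregation_function(name)(collected) for name in function_names}
```

For example, `aggregate([3, 5, 4, 8], DEFAULT_AGGREGATION_FUNCTIONS)` yields a length of 4, a mean of 5, a minimum of 3 and a maximum of 8. Checking the in-house names first means that names such as "mean" never fall through to the slower import-based lookup.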

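A rough idea of what show_as_markdown combined with title produces. This sketch deliberately avoids the tabulate package and renders a GitHub-flavored markdown table with the standard library only. The input layout (statistic name mapped to aggregated values) and the bold caption line are assumptions, not the PR's exact output format.

```python
from typing import Dict, List, Optional


def _fmt(value: object) -> str:
    # Floats get two decimals; everything else is printed as-is.
    return f"{value:.2f}" if isinstance(value, float) else str(value)


def to_markdown(result: Dict[str, Dict[str, object]], title: Optional[str] = None) -> str:
    """Render {row_name: {column_name: value}} as a markdown table with an optional caption."""
    columns: List[str] = []
    for row in result.values():
        for column in row:
            if column not in columns:
                columns.append(column)
    lines: List[str] = []
    if title is not None:
        lines += [f"**{title}**", ""]  # assumed caption style
    lines.append("| " + " | ".join([""] + columns) + " |")
    lines.append("|" + "|".join(["---"] * (len(columns) + 1)) + "|")
    for name, row in result.items():
        cells = [_fmt(row[c]) if c in row else "" for c in columns]
        lines.append("| " + " | ".join([name] + cells) + " |")
    return "\n".join(lines)
```

With tabulate installed, `tabulate(rows, headers=..., tablefmt="github")` produces a comparable table. The stdlib version above only illustrates the shape of the output.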
This PR also adds the property current_split to DocumentMetric; it is available while a dataset dict is being processed.
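How such a property can work in principle: the metric records the split name before iterating over that split's documents, so per-document code can read it. This is a hypothetical sketch. SplitAwareMetric, _update, and the per-split counting are illustrative names and logic, not the DocumentMetric API. Resetting the property to None afterwards is an assumption based on "available while a dataset dict is being processed".

```python
from typing import Dict, Iterable, Optional


class SplitAwareMetric:
    """Hypothetical metric exposing `current_split` while a dataset dict
    (split name -> documents) is being processed."""

    def __init__(self) -> None:
        self._current_split: Optional[str] = None
        self.counts: Dict[Optional[str], int] = {}

    @property
    def current_split(self) -> Optional[str]:
        return self._current_split

    def _update(self, document: str) -> None:
        # Per-document logic can read self.current_split, e.g. to key results by split.
        key = self.current_split
        self.counts[key] = self.counts.get(key, 0) + 1

    def __call__(self, dataset_dict: Dict[str, Iterable[str]]) -> Dict[Optional[str], int]:
        self.counts = {}
        try:
            for split, documents in dataset_dict.items():
                self._current_split = split  # set before the split's documents are processed
                for document in documents:
                    self._update(document)
        finally:
            # Assumption: the property is only meaningful during dataset-dict processing.
            self._current_split = None
        return self.counts
```

Calling `SplitAwareMetric()({"train": ["a", "b", "c"], "test": ["d"]})` counts documents per split, and `current_split` is None again once the call returns.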

Note: This is breaking with respect to #312: before this PR, DocumentStatistic returned a dict with list values, whereas now the values are aggregated. We do not label it as breaking because #312 was not yet part of a release. The previous behavior can be more or less restored by passing aggregation_functions=["list"] (only the keys differ slightly).
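Why aggregation_functions=["list"] restores the old behavior: "list" is a Python built-in, so applying it to the collected values simply returns them as a list. In this hypothetical sketch, only built-ins are resolved. The nested result layout ({statistic: {function_name: value}}) and the statistic name num_tokens are assumptions used to illustrate the "keys are a bit different" remark, not the PR's actual keys.

```python
import builtins
from typing import Dict, List


def aggregate_all(collected: Dict[str, List[int]], function_names: List[str]) -> Dict[str, Dict[str, object]]:
    # Hypothetical layout: {statistic: {function_name: aggregated_value}}.
    # Only Python built-ins are resolved here.
    return {
        key: {name: getattr(builtins, name)(values) for name in function_names}
        for key, values in collected.items()
    }


collected = {"num_tokens": [12, 7, 30]}  # illustrative values, not from the PR
print(aggregate_all(collected, ["list"]))  # {'num_tokens': {'list': [12, 7, 30]}}
print(aggregate_all(collected, ["min", "max"]))  # {'num_tokens': {'min': 7, 'max': 30}}
```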

@ArneBinder ArneBinder added the enhancement New feature or request label Aug 10, 2023
@ArneBinder ArneBinder added the breaking Breaking Changes label Aug 10, 2023
@ArneBinder ArneBinder removed the breaking Breaking Changes label Aug 10, 2023
@ArneBinder ArneBinder changed the title from "DocumentStatistic aggregate their results" to "aggregate the result of DocumentStatistic" Aug 10, 2023
@ArneBinder ArneBinder merged commit 93eecec into main Aug 10, 2023
@ArneBinder ArneBinder deleted the aggregate_statistics branch August 10, 2023 17:29
Labels
enhancement New feature or request
1 participant