Add ability to request specific dimension data. #35
Draft
aaronweeden wants to merge 2 commits into ubccr:main
This was referenced Oct 30, 2024
WORK IN PROGRESS
Description
This PR adds the ability to request specific dimension data. For example, a call to `get_data()` (which has the renamed `date_range` and `group_by` parameters) can specify fields for the `group_by` and `filters` parameters.

The `df` variable will then be assigned a data frame like the following (which is in the long format from #32), noting the extra columns for ACCESS ID, ORCID, and Globus ID:

| ACCESS ID | ORCID | Globus ID |
| --- | --- | --- |
| access-example1 | 1234-5678-1234-5678 | globus-example1 |
| access-example2 | 1234-5678-1234-5679 | globus-example2 |

If the `group_by` parameter is given just a single string, then the only dimension columns in the data frame will be ID and Label.
Similarly, if the `filters` parameter does not have a field specified, then the Label field will be used for filtering. Otherwise, the specified field will be used, as in the first example above.
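The filtering rule described above can be sketched in plain Python. This is only an illustration of the "Label by default, otherwise the specified field" behavior, not the implementation in this PR: the `filter_dimension_rows` helper, the row layout, and the sample IDs and labels are all hypothetical.

```python
def filter_dimension_rows(rows, values, field="Label"):
    """Keep the rows whose `field` value is one of `values`.

    Hypothetical helper: `field` defaults to "Label", mirroring the rule
    that the Label field is used when no field is specified.
    """
    wanted = set(values)
    return [row for row in rows if row.get(field) in wanted]


# Made-up dimension rows; only the ORCID values come from the PR text.
rows = [
    {"ID": "1", "Label": "Example User 1", "ORCID": "1234-5678-1234-5678"},
    {"ID": "2", "Label": "Example User 2", "ORCID": "1234-5678-1234-5679"},
]

# No field specified: match against Label.
by_label = filter_dimension_rows(rows, ["Example User 1"])

# Field specified: match against that field instead.
by_orcid = filter_dimension_rows(rows, ["1234-5678-1234-5679"], field="ORCID")
```

Here `by_label` holds only the first row and `by_orcid` only the second, since each call matches against a different field.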
Alternatively, `group_by` can take a dictionary with a single key, where the key is the dimension's ID or label. The value can be a collection (as in the first example above) or a single string such as ORCID, in which case the only dimension columns in the data frame will be ID and ORCID.
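As a rough sketch of the accepted `group_by` shapes, the helper below normalizes a string or a single-key dictionary into a dimension name and a list of dimension columns. The `normalize_group_by` function and the `"User"` dimension name are hypothetical, and the sketch assumes that the ID column is always kept alongside any requested fields, which the description only states explicitly for the single-string cases.

```python
from collections.abc import Iterable

# Columns used when group_by is a plain string, per the description.
DEFAULT_FIELDS = ("ID", "Label")


def normalize_group_by(group_by):
    """Return (dimension, columns) for a hypothetical group_by argument."""
    if isinstance(group_by, str):
        return group_by, list(DEFAULT_FIELDS)
    if isinstance(group_by, dict):
        if len(group_by) != 1:
            raise ValueError("group_by dictionary must have exactly one key")
        dimension, fields = next(iter(group_by.items()))
        if isinstance(fields, str):
            fields = [fields]  # a single field name
        elif isinstance(fields, Iterable):
            fields = list(fields)  # a collection of field names
        else:
            raise TypeError("group_by value must be a string or a collection")
        # Assumption: the ID column is always included, followed by the fields.
        return dimension, ["ID"] + [f for f in fields if f != "ID"]
    raise TypeError("group_by must be a string or a single-key dictionary")


plain = normalize_group_by("User")
single = normalize_group_by({"User": "ORCID"})
several = normalize_group_by({"User": ["ACCESS ID", "ORCID", "Globus ID"]})
```

With these inputs, `plain` asks for the ID and Label columns, `single` for ID and ORCID, and `several` for ID followed by the three requested fields.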
The data frame returned by the `get_dimension_metadata()` method will now include a column listing the additional dimension fields that can be used for grouping or filtering.

The `get_dimension_data()` method will now have an additional `fields` parameter that allows specifying which fields to include in the resulting data frame. The resulting `df` will have a structure like this:

| ACCESS ID | ORCID |
| --- | --- |
| access-example1 | 1234-5678-1234-5678 |
| access-example2 | 1234-5678-1234-5679 |

If the `fields` parameter is not given, then only the ID and Label fields will be included.

Motivation and Context
Some entities have multiple IDs associated with them depending on the context. This PR makes it easier to work with such entities in the Data Analytics Framework.
Tests performed
Types of changes
Checklist:
- `CHANGELOG.md` has been updated
- … `docs/developing.md`) produces no errors
- … `xdmod-notebooks` repository as necessary, and the notebooks all run successfully