Add standard unit of measure support #202
Comments
I don't think pint would go into the standard itself, but hopefully the standard would enable someone to write a library-agnostic version of pint-pandas!
Yep, that's what I mean. It'd be good for the dataframe-api to specify a standard mechanism for transmitting unit-of-measure data (and/or a mechanism for transmitting metadata, plus a mechanism that determines how that metadata can change across operations on dataframes).
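To make the second half of that request concrete, here is a minimal sketch of what "a mechanism that determines how metadata changes across operations" could look like: a table of per-operation rules applied to unit metadata. Everything in it is an assumption for illustration only (the `Unit` representation as a dict of base-unit exponents, the `UNIT_RULES` table, and the `propagate` helper are hypothetical and are not part of the dataframe-api or of pint).

```python
from typing import Callable, Dict

# Hypothetical unit representation: base unit name -> exponent,
# e.g. {"m": 1, "s": -1} for metres per second.
Unit = Dict[str, int]


def same_unit(left: Unit, right: Unit) -> Unit:
    """Addition and subtraction require identical units and keep them."""
    if left != right:
        raise ValueError(f"incompatible units: {left} vs {right}")
    return dict(left)


def multiply_units(left: Unit, right: Unit) -> Unit:
    """Multiplication adds exponents and drops base units that cancel."""
    result = dict(left)
    for base, exponent in right.items():
        result[base] = result.get(base, 0) + exponent
    return {base: exp for base, exp in result.items() if exp != 0}


def divide_units(left: Unit, right: Unit) -> Unit:
    """Division is multiplication by the inverse unit."""
    return multiply_units(left, {base: -exp for base, exp in right.items()})


# Hypothetical rule table: one entry per binary operation that affects units.
UNIT_RULES: Dict[str, Callable[[Unit, Unit], Unit]] = {
    "__add__": same_unit,
    "__sub__": same_unit,
    "__mul__": multiply_units,
    "__truediv__": divide_units,
}


def propagate(operation: str, left: Unit, right: Unit) -> Unit:
    """Look up the rule for an operation and compute the resulting unit."""
    return UNIT_RULES[operation](left, right)
```

With rules like these, dividing a column in metres by a column in seconds yields `{"m": 1, "s": -1}`, while adding the two raises an error. The rules depend only on the operation name and the metadata, not on which dataframe library holds the values.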
It seems to me like this is related to gh-40, which discussed adding a way to incorporate any kind of metadata beyond what was standardized in the interchange protocol. The transmitting or storing part is fairly clear, I think. The second part of your suggestion here is less clear to me @kszlim. That seems to suggest some kind of hook that any dataframe library must call after each method it calls. That could be quite expensive to do, I think, and there may be other/simpler alternatives there (if the dataframe object lives in a
Hmm, I see. I'm not sure how a I guess it's pretty hard if not impossible to make it work agnostically without defining a huge space of operations on the dataframe API itself (which I think you guys are trying to avoid?).
All "base" dataframe objects have the same API, so I imagine you could store it as a private attribute. Something like:

    from __future__ import annotations
    from typing import Any

    class PintDataFrame:
        # The type of units_metadata was left open ("?") in the original sketch.
        def __init__(self, base_dataframe: StandardDataFrame, units_metadata: Any) -> None:
            self._df = base_dataframe
            self.units_metadata = units_metadata

        def sum(self, *, skip_nulls: bool = True) -> PintDataFrame:
            """Reduction returns a 1-row DataFrame."""
            result = self._df.sum(skip_nulls=skip_nulls)
            # If needed, manipulate units metadata here
            result_metadata = self.units_metadata  # or some transformation
            return PintDataFrame(result, units_metadata=result_metadata)

For all methods that don't actually change the units, I imagine there's a way to handle them in an automated/streamlined fashion. And for the ones that do, the custom logic needs to be written once and is independent of what library the base dataframe comes from.
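One way the "automated/streamlined" handling of unit-preserving methods might look is delegation through `__getattr__`, sketched below. This is only an illustration under stated assumptions: `ToyDataFrame` is a stand-in for a standard-compliant dataframe, `UNIT_PRESERVING` is a hypothetical allow-list, and the units metadata is assumed to be a plain dict of column name to unit string. None of these names come from the dataframe-api or from pint.

```python
from __future__ import annotations

from typing import Any

# Hypothetical allow-list of methods assumed to leave units unchanged.
UNIT_PRESERVING = {"sum", "min", "max", "mean"}


class ToyDataFrame:
    """Stand-in for a standard-compliant dataframe (illustration only)."""

    def __init__(self, columns: dict[str, list[float]]) -> None:
        self.columns = columns

    def sum(self, *, skip_nulls: bool = True) -> ToyDataFrame:
        # The toy data has no nulls, so skip_nulls is accepted but unused.
        return ToyDataFrame({k: [sum(v)] for k, v in self.columns.items()})

    def max(self, *, skip_nulls: bool = True) -> ToyDataFrame:
        return ToyDataFrame({k: [max(v)] for k, v in self.columns.items()})


class PintDataFrame:
    def __init__(self, base_dataframe: Any, units_metadata: dict[str, str]) -> None:
        self._df = base_dataframe
        self.units_metadata = units_metadata

    def __getattr__(self, name: str) -> Any:
        # Only reached when normal attribute lookup fails, i.e. for
        # methods that PintDataFrame does not define explicitly.
        if name not in UNIT_PRESERVING:
            raise AttributeError(f"{name!r} needs explicit unit handling")
        method = getattr(self._df, name)

        def wrapper(*args: Any, **kwargs: Any) -> PintDataFrame:
            # Delegate to the base dataframe and carry the units over unchanged.
            result = method(*args, **kwargs)
            return PintDataFrame(result, dict(self.units_metadata))

        return wrapper


df = PintDataFrame(ToyDataFrame({"length": [1.0, 2.0, 3.0]}), {"length": "meter"})
total = df.sum()
```

Here `total` wraps a one-row `ToyDataFrame` and still carries `{"length": "meter"}`. A method outside the allow-list, such as a variance (whose result would be in squared units), raises `AttributeError` until someone writes its unit logic explicitly.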
I see. I guess this will still require a bunch of custom implementations if there are operations that don't delegate to "base" dataframe methods, but I suppose that's probably impossible to avoid altogether.
I don't know if it's possible, but having a standard way to thread through units of measure would be great.
Ideally you could implement something like pint-pandas, but as pint-dataframe, and it would interoperate seamlessly with all dataframe libraries.