-
Notifications
You must be signed in to change notification settings - Fork 22
Refine handling of labels="both" #244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
e469e36 to
10f3358
Compare
|
@aboddie would you mind to give this branch a try? Some context: in #243 I did most of the 'grunt work' I referred to in #242 (comment) and the comment before that. In this PR I've specifically targeted your snippet: import sdmx
dm = smdx.Client("ESTAT").data(
"UNE_RT_A", key={"geo": "EL+ES+IE"}, params={"startPeriod": "2014"},
)
data = sdmx.to_pandas(dm, labels="both") # Note "id", "both", "name" per the standard
print(data.head(5))This revealed some further issues:
So I've corrected this issue. Now, when the data message/data set is read, those Code references (technically, CodedKeyValue) are established right away. Thus, when to_pandas()/PandasConverter receives them, it only needs to format them correctly, and doesn't need to traverse the DSD itself to look up the codes. I see now this is what your code in #242 was doing; but I think I prefer this fix because it generates "more correct" data structures at the moment of reading the message. The above snippet now gives: This is a bit different from your example, which more aligns with labels="name" per the SDMX-CSV 2.0.0 standard. I can try to add that in a later PR or maybe this one, but in the meanwhile if you can please try out the branch and report if it gives roughly the behaviour you expect, that would be much appreciated. |
10f3358 to
9878118
Compare
9878118 to
0aedcf8
Compare
- Rely only on .format_options. - Use base/abstract .csv.common.CSVFormatOptions to indicate "no particular CSV format". - Add ._strict bool attribute.
Replace repeated code with function calls.
0aedcf8 to
3d12dff
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #244 +/- ##
==========================================
- Coverage 99.02% 98.25% -0.77%
==========================================
Files 113 114 +1
Lines 9297 9326 +29
==========================================
- Hits 9206 9163 -43
- Misses 91 163 +72
🚀 New features to boost your workflow:
|
|
Took a brief look at the code, agree this looks like a better approach. Some thoughts:
I can do an in-depth test in two weeks after the global conference, if you want. |
You're right! I misread and thought that labels=both was different in SDMX CSV 1.0 ("ID: Name") versus 2.x ("ID", "Name" in 2 columns). The latter is indeed labels=name as you say, and the former is the same across versions. Thanks for that brief feedback—it's easy to make these small oversights when doing bigger refactoring as in #243. I'll expand the PR to address these points, merge, and then release. There are several other improvements on deck that I'd like to get out the door. If there are further bugs found, those can be fixed in a point release. |
These are already on Lines 9 to 23 in 07683cc
But indeed I can (a) mention in the docs that only key=none is currently supported, (b) validate, and (c) test these. Will do this.
This is something I am sure differs between SDMX-CSV v1.0 and v2.x. In the former, it is only ever "OBS_VALUE", even if labels=both or the primary measure has an ID other than "OBS_VALUE". See:
So I'll have to put in logic that does this only for SDMX-CSV 2.x. |
3d12dff to
72fc370
Compare
- Add tests.
- Simplify Column classes for KeyValue and AttributeValue. - Also write component concept name for measure columns. - Update tests.
72fc370 to
b974f21
Compare
- Update docs.
b974f21 to
1eca925
Compare
Building on #243:
to_pandas(..., labels="both")directly, instead of wrapping in FormatOptions.PR checklist