Refine handling of labels="both" #244

khaeru · 2025-09-24T14:59:20Z

Building on #243:

Allow passing to_pandas(..., labels="both") directly, instead of wrapping in FormatOptions.
Simplify .convert.pandas.
Improve BaseDataStructureDefinition.make_key(): if a dimension is enumerated (by a Codelist), replace a string code ID with a reference to the actual Code.

PR checklist

Checks all ✅
Update documentation
Update doc/whatsnew.rst

Items missed in #243.

khaeru · 2025-09-24T15:29:32Z

@aboddie would you mind giving this branch a try?

Some context: in #243 I did most of the 'grunt work' I referred to in #242 (comment) and the comment before that.

In this PR I've specifically targeted your snippet:

import sdmx

dm = sdmx.Client("ESTAT").data(
    "UNE_RT_A", key={"geo": "EL+ES+IE"}, params={"startPeriod": "2014"},
)
data = sdmx.to_pandas(dm, labels="both")  # Note "id", "both", "name" per the standard
print(data.head(5))

This revealed some further issues:

The specific usage of key= here triggers a pre-request for the DSD, in order to construct/validate the key for the actual data request.
This DSD is passed on when the data message is read.
However, BaseDataStructureDefinition.make_key() was not making complete use of this information. For example, a key/value pair like geo="EL" from the message was stored as Python str ("EL") instead of as a reference to the Code (EL: Greece) from the codelist for the "geo" dimension—even though this latter was already available (attached to the DSD).

So I've corrected this issue. Now, when the data message/data set is read, those Code references (technically, CodedKeyValue) are established right away. Thus, when to_pandas()/PandasConverter receives them, it only needs to format them correctly, and doesn't need to traverse the DSD itself to look up the codes. I see now this is what your code in #242 was doing; but I think I prefer this fix because it generates "more correct" data structures at the moment of reading the message.
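To make the idea concrete, here is a minimal, library-independent sketch of the substitution described above. It is not the actual `BaseDataStructureDefinition.make_key()` code: the `Code` dataclass and the `resolve_key_values()` helper are hypothetical stand-ins, and a plain dict plays the role of the codelist attached to the DSD.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Code:
    """Hypothetical stand-in for an SDMX code: an ID plus a name."""

    id: str
    name: str


def resolve_key_values(
    key: Dict[str, str],
    codelists: Dict[str, Dict[str, Code]],
) -> Dict[str, Union[Code, str]]:
    """Replace string code IDs with Code objects where a codelist is known."""
    resolved: Dict[str, Union[Code, str]] = {}
    for dim_id, value in key.items():
        codelist = codelists.get(dim_id)
        # Dimensions that are not enumerated (no codelist) keep the plain string,
        # as do IDs that do not appear in the codelist.
        resolved[dim_id] = codelist.get(value, value) if codelist else value
    return resolved


# Only the "geo" dimension is enumerated in this toy example
geo = {"EL": Code("EL", "Greece"), "ES": Code("ES", "Spain")}
key = resolve_key_values({"geo": "EL", "TIME_PERIOD": "2014"}, {"geo": geo})
print(key["geo"].name)  # Greece
```

With the lookup done at read time, a converter only has to format `key["geo"]`; it no longer needs access to the DSD.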

The above snippet now gives:

freq  Time frequency  age     Age class            unit    Unit of measure                               sex  Sex      geo  Geopolitical entity (reporting)  TIME_PERIOD  Time
A     Annual          Y15-24  From 15 to 24 years  PC_ACT  Percentage of population in the labour force  F    Females  EL   Greece                           2014         2014    58.5
                                                                                                                                                             2015         2015    54.8
                                                                                                                                                             2016         2016    52.1
                                                                                                                                                             2017         2017    49.0
                                                                                                                                                             2018         2018    45.4
Name: value, dtype: float64

This is a bit different from your example, which aligns more with labels="name" per the SDMX-CSV 2.0.0 standard. I can try to add that in a later PR or maybe this one, but in the meantime, if you can please try out the branch and report whether it gives roughly the behaviour you expect, that would be much appreciated.

codecov · 2025-09-24T16:16:09Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.25%. Comparing base (07683cc) to head (1eca925).
⚠️ Report is 15 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #244      +/-   ##
==========================================
- Coverage   99.02%   98.25%   -0.77%     
==========================================
  Files         113      114       +1     
  Lines        9297     9326      +29     
==========================================
- Hits         9206     9163      -43     
- Misses         91      163      +72

Files with missing lines	Coverage Δ
sdmx/convert/pandas.py	`99.70% <100.00%> (-0.04%)`	⬇️
sdmx/format/csv/common.py	`100.00% <100.00%> (ø)`
sdmx/format/csv/v1.py	`100.00% <100.00%> (ø)`
sdmx/model/common.py	`99.71% <100.00%> (+<0.01%)`	⬆️
sdmx/model/internationalstring.py	`100.00% <100.00%> (ø)`
sdmx/tests/convert/test_pandas.py	`100.00% <100.00%> (ø)`
sdmx/tests/format/test_csv.py	`100.00% <100.00%> (ø)`
sdmx/tests/format/test_xml.py	`100.00% <ø> (ø)`
sdmx/tests/reader/test_csv.py	`100.00% <ø> (ø)`
sdmx/tests/reader/test_json.py	`100.00% <ø> (ø)`
... and 7 more

... and 20 files with indirect coverage changes

aboddie · 2025-09-25T00:07:06Z

Took a brief look at the code, agree this looks like a better approach. Some thoughts:

My understanding, although maybe I misread, is that your output above is actually labels=name. labels=both should put the ID and name in the same column, i.e. "A: Annual", so this line doesn't look right: (csv_v2, Labels.both): [KeyValueID, KeyValueName].
Might want to go on and include formatting options for keys (none, obs, series, and both) for CSV 2.0 even if they are not implemented for now.
The measure(s) should also have headers depending on the label format.

I can do an in-depth test in two weeks after the global conference, if you want.

khaeru · 2025-09-25T05:51:44Z

> My understanding, although maybe I misread, is that your output above is actually labels=name. labels=both should put the ID and name in the same column, i.e. "A: Annual", so this line doesn't look right: (csv_v2, Labels.both): [KeyValueID, KeyValueName].

You're right! I misread and thought that labels=both was different in SDMX CSV 1.0 ("ID: Name") versus 2.x ("ID", "Name" in 2 columns). The latter is indeed labels=name as you say, and the former is the same across versions.
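As a rough illustration of the distinction agreed on here, the sketch below formats one key value under each `labels` setting. The function name `format_key_value()` is hypothetical and not part of the sdmx package; the behaviour for "name" (ID and name in two columns, SDMX-CSV 2.x only) follows the reading given in this thread rather than a check against the standard.

```python
from typing import Tuple


def format_key_value(code_id: str, name: str, labels: str = "id") -> Tuple[str, ...]:
    """Return the cell(s) written for one key value under a `labels` setting."""
    if labels == "id":
        # Only the code ID
        return (code_id,)
    if labels == "both":
        # ID and name together in a single column
        return (f"{code_id}: {name}",)
    if labels == "name":
        # ID and name in two separate columns (SDMX-CSV 2.x only, per this thread)
        return (code_id, name)
    raise ValueError(f"unsupported labels={labels!r}")


print(format_key_value("A", "Annual", labels="both"))  # ('A: Annual',)
```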

Thanks for that brief feedback—it's easy to make these small oversights when doing bigger refactoring as in #243. I'll expand the PR to address these points, merge, and then release. There are several other improvements on deck that I'd like to get out the door.

If there are further bugs found, those can be fixed in a point release.

khaeru · 2025-09-26T21:15:12Z

> Might want to go on and include formatting options for keys (none, obs, series, and both) for CSV 2.0 even if they are not implemented for now.

These are already on main per the last PR:

sdmx/sdmx/format/csv/v2.py

Lines 9 to 23 in 07683cc

    
class Keys(Enum):
    """SDMX-CSV 2.x 'keys' parameter."""

    #: No related columns.
    none = auto()

    #: Both :attr:`obs` and :attr:`series`.
    both = auto()

    #: Include ``OBS_KEY`` column with key values for all dimension(s).
    obs = auto()

    #: Include ``SERIES_KEY`` column with key values for all dimension(s) *except* the
    #: one(s) attached to each observation.
    series = auto()

But indeed I can (a) mention in the docs that only keys=none is currently supported, (b) validate the value, and (c) test these. Will do this.
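A minimal sketch of what step (b) could look like, assuming validation simply rejects every value other than `none`. The `Keys` enum mirrors the excerpt above; `check_keys()` is a hypothetical helper name, not the function in the package.

```python
from enum import Enum, auto


class Keys(Enum):
    """SDMX-CSV 2.x 'keys' parameter (mirrors the excerpt above)."""

    none = auto()
    both = auto()
    obs = auto()
    series = auto()


def check_keys(keys: Keys) -> Keys:
    """Return `keys` unchanged if supported; otherwise raise NotImplementedError."""
    if keys is not Keys.none:
        raise NotImplementedError(
            f"keys={keys.name} is not supported; only keys=none is implemented"
        )
    return keys


check_keys(Keys.none)  # passes silently

try:
    check_keys(Keys.obs)
except NotImplementedError as exc:
    print(exc)
```

Raising `NotImplementedError` rather than `ValueError` signals that the value is valid per the standard but not yet handled.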

> The measure(s) should also have headers depending on the label format.

This is something I am sure differs between SDMX-CSV v1.0 and v2.x. In the former, it is only ever "OBS_VALUE", even if labels=both or the primary measure has an ID other than "OBS_VALUE". See:

So I'll have to put in logic that does this only for SDMX-CSV 2.x.
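The version-dependent branch might be sketched as follows. Everything here is an assumption for illustration: `measure_header()` is a hypothetical name, the "ID: Name" header for labels=both in 2.x is assumed by analogy with the key-value columns, and labels=name is deliberately left unhandled because this thread does not spell out its header.

```python
def measure_header(
    measure_id: str,
    measure_name: str,
    labels: str = "id",
    csv_version: int = 2,
) -> str:
    """Return the column header for a measure (illustrative sketch only)."""
    if csv_version == 1:
        # SDMX-CSV 1.0: always "OBS_VALUE", whatever the labels setting or measure ID
        return "OBS_VALUE"
    if labels == "id":
        return measure_id
    if labels == "both":
        # Assumed format, by analogy with "ID: Name" for key values
        return f"{measure_id}: {measure_name}"
    raise ValueError(f"labels={labels!r} is not handled in this sketch")


print(measure_header("OBS_VALUE", "Observation value", labels="both"))
```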

khaeru added 4 commits September 24, 2025 15:46

Handle {label,time_format} kwargs to to_pandas()

f4b4a15

Handle bare str in KeyValueBoth

9f8b04e

Update documentation with TODOs

ee617a6

Items missed in #243.

Add InternationalString.__bool__

c4c10e2

khaeru self-assigned this Sep 24, 2025

khaeru added the bug, enh (Enhancements & new features), xml (SDMX-ML format), and reader (Read file formats defined by the SDMX standards) labels Sep 24, 2025

khaeru temporarily deployed to publish September 24, 2025 14:59 — with GitHub Actions Inactive

khaeru added a commit that referenced this pull request Sep 24, 2025

Add #244 to doc/whatsnew

10f3358

khaeru force-pushed the enh/to_pandas-labels branch from e469e36 to 10f3358 Compare September 24, 2025 15:11

khaeru temporarily deployed to publish September 24, 2025 15:11 — with GitHub Actions Inactive

khaeru added a commit that referenced this pull request Sep 24, 2025

Add #244 to doc/whatsnew

9878118

khaeru force-pushed the enh/to_pandas-labels branch from 10f3358 to 9878118 Compare September 24, 2025 15:57

khaeru temporarily deployed to publish September 24, 2025 15:57 — with GitHub Actions Inactive

khaeru added a commit that referenced this pull request Sep 24, 2025

Add #244 to doc/whatsnew

0aedcf8

khaeru force-pushed the enh/to_pandas-labels branch from 9878118 to 0aedcf8 Compare September 24, 2025 16:05

khaeru temporarily deployed to publish September 24, 2025 16:05 — with GitHub Actions Inactive

khaeru added 4 commits September 24, 2025 18:13

Add ItemScheme.get() with default

7da305a

Use Code where available in DSD.make_key()

c46d083

Remove PandasConvert.format

d0dfcdf

- Rely only on .format_options. - Use base/abstract .csv.common.CSVFormatOptions to indicate "no particular CSV format". - Add ._strict bool attribute.

Add .pandas._component_to_column_name()

b2f05ae

Replace repeated code with function calls.

khaeru added a commit that referenced this pull request Sep 24, 2025

Add #244 to doc/whatsnew

3d12dff

khaeru force-pushed the enh/to_pandas-labels branch from 0aedcf8 to 3d12dff Compare September 24, 2025 16:14

khaeru temporarily deployed to publish September 24, 2025 16:14 — with GitHub Actions Inactive

khaeru added a commit that referenced this pull request Sep 30, 2025

Add #244 to doc/whatsnew

72fc370

khaeru force-pushed the enh/to_pandas-labels branch from 3d12dff to 72fc370 Compare September 30, 2025 19:45

khaeru temporarily deployed to publish September 30, 2025 19:45 — with GitHub Actions Inactive

khaeru added 5 commits September 30, 2025 21:55

Check invalid labels in csv.v1.FormatOptions

182548b

- Add tests.

Handle Labels=both and Labels=name

5c96c90

- Simplify Column classes for KeyValue and AttributeValue. - Also write component concept name for measure columns. - Update tests.

Move Attributes to .format.csv.common

bdf9915

Shorten .tests.{format,reader,writer} file names

7fe6d27

Raise NIE on Keys.{both,obs,series}; test

f7f79e2

khaeru added a commit that referenced this pull request Sep 30, 2025

Add #244 to doc/whatsnew

b974f21

khaeru force-pushed the enh/to_pandas-labels branch from 72fc370 to b974f21 Compare September 30, 2025 19:55

khaeru temporarily deployed to publish September 30, 2025 19:55 — with GitHub Actions Inactive

Add #244 to doc/whatsnew

1eca925

- Update docs.

khaeru force-pushed the enh/to_pandas-labels branch from b974f21 to 1eca925 Compare September 30, 2025 20:10

khaeru temporarily deployed to publish September 30, 2025 20:10 — with GitHub Actions Inactive

khaeru merged commit 59f003d into main Sep 30, 2025
20 checks passed

khaeru deleted the enh/to_pandas-labels branch September 30, 2025 20:24
