Skip to content

Conversation

@aboddie
Copy link

@aboddie aboddie commented Feb 24, 2025

Update IMF_beta to IMF_DATA to avoid changing code and documentation after launch as well as update documentation on IMF sources as outlined in issue #224

PR checklist

  • Checks all ✅
  • Update documentation
  • Update doc/whatsnew.rst

@codecov
Copy link

codecov bot commented Feb 24, 2025

Codecov Report

Attention: Patch coverage is 70.27027% with 11 lines in your changes missing coverage. Please review.

Project coverage is 98.09%. Comparing base (5a6ea56) to head (c95ed03).
Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
sdmx/source/imf_data3.py 50.00% 6 Missing ⚠️
sdmx/source/imf_data.py 57.14% 3 Missing ⚠️
sdmx/reader/xml/v30.py 80.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #225      +/-   ##
==========================================
- Coverage   98.76%   98.09%   -0.67%     
==========================================
  Files         101      103       +2     
  Lines        8646     8667      +21     
==========================================
- Hits         8539     8502      -37     
- Misses        107      165      +58     
Files with missing lines Coverage Δ
sdmx/tests/test_sources.py 92.99% <100.00%> (-3.29%) ⬇️
sdmx/reader/xml/v30.py 93.69% <80.00%> (-1.68%) ⬇️
sdmx/source/imf_data.py 57.14% <57.14%> (ø)
sdmx/source/imf_data3.py 50.00% <50.00%> (ø)

... and 13 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@khaeru khaeru added the data-source Issues related to specific web services/data source(s) label Feb 24, 2025
Copy link
Owner

@khaeru khaeru left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, thanks for these contributions—the User-Agent info, expected failures, and docs URLs are all information that it would be hard to discover independently, so you've saved a lot of time. Much appreciated.

I have a few questions inline, below—can you please answer these? Then I will help adjust the PR as necessary.

As well, I find that the added tests fail. For example:

pytest -m network -k "IMF_DATA and conceptscheme"

gives:

FAILED sdmx/tests/test_sources.py::TestIMF_DATA::test_endpoint[conceptscheme] - requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://api.imf.org/external/sdmx/2.1/conceptscheme/IMF_DATA/CS_MASTER_SYSTEM/latest

I see two issues here: issue 1 is that IMF_DATA does not appear to be the ID that is used for the maintainer of artefacts provided by this web service. For example, when I open these URLs in a web browser:

  1. https://api.imf.org/external/sdmx/2.1/conceptscheme/IMF_DATA/CS_MASTER_SYSTEM/latest —note "IMF_DATA". This gives an empty StructureMessage.
  2. https://api.imf.org/external/sdmx/2.1/conceptscheme/IMF/CS_MASTER_SYSTEM/latest —note "IMF". This gives the actual artefact in question.

This happens because sdmx1, unless told otherwise, inserts the ID from sources.json into the query string. One way to fix this is to copy what is done for e.g. ESTAT_COMEXT. Here we have to use a different ID in sources.json to distinguish from the ESTAT source at a different base URL. But the artefacts provided by this service still use ESTAT as the maintainer ID. So we have code like this, note the last line:

class Source(ESTAT):
_id = "ESTAT_COMEXT"
def modify_request_args(self, kwargs):
"""Supply explicit provider agency ID for ESTAT_COMEXT.
This hook sets the provider to "ESTAT" for structure queries if it is not given
explicitly.
"""
super().modify_request_args(kwargs)
# NB this is an indirect test for resource_type != 'data'; because of
# the way the hook is called, resource_type is not available
# directly.
if "key" not in kwargs:
kwargs.setdefault("provider", "ESTAT")

(Another way is to insert instead "ALL" so that artefacts with the given ID but with any maintainer are returned.)

For this reason, and also to reduce confusion, I prefer to keep the IDs in sources.json as close to the actual maintainer IDs as possible. To do this, one option would be to use the IDs IMF and IMF3. However, then we would need to change the ID for the older, "SDMX Central" source. Could you suggest a good alternate ID for that source? Or, if there is any reason you would prefer not to change that ID, please explain?

Issue 2 is that, although the above URLs (1) and (2) work in a browser (Firefox in my case), they don't work in code. See the following:

>>> import sdmx
>>> IMF = sdmx.Client("IMF_DATA")

# Fix "issue 1" described above
>>> IMF.source.id = "IMF"

# Request the same URL (2) given above
>>> IMF.conceptscheme("CS_MASTER_SYSTEM")

HTTPError: 403 Client Error: Forbidden for url: https://api.imf.org/external/sdmx/2.1/conceptscheme/IMF/CS_MASTER_SYSTEM/latest

# Inspect a prepared request
>>> req = IMF.conceptscheme("CS_MASTER_SYSTEM", dry_run=True)
>>> req.headers
{'User-Agent': 'python-requests/2.32.3', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}

# Give the user-agent header explicitly for a non-data request
>>> IMF.conceptscheme("CS_MASTER_SYSTEM", headers={"User-Agent": "iscript-data-client"})
<sdmx.StructureMessage>
  <Header>
    id: 'IDREF5229'
    prepared: '2025-02-25T09:17:45.722251+00:00'
    sender: <Agency unknown>
    source: 
    test: False
  response: <Response [200]>
  Codelist (3): CL_FREQ CL_UNIT_MULT CL_DECIMALS
  ConceptScheme (1): CS_MASTER_SYSTEM

I think these happen because you have added the following to sources.json:

"headers": {
  "data": {
    "User-Agent": "idata-script-client"
  }
}

"data" here means that this header is only applied for a data request. For the example ConceptScheme request, it is not applied, and I guess this is why there is a 403 response. Is it correct that this User-Agent is needed for every request—not only data requests?

The fact that such headers are narrowly applied to only 1 request type is a limitation of the Client class; see

sdmx/sdmx/client.py

Lines 233 to 238 in 5588f01

# Headers: use headers from source config if not given by the caller
if not headers:
try:
headers.update(self.source.headers.get(resource_type.name, {}))
except AttributeError:
pass
—I would like to relax this some day, but in the meantime I think the best way to address this is a custom Source subclass that always sets this User-Agent.

Anyway, please let me know the answers to the questions inline and in this comment. Once you do, I can help update the branch so it is ready to merge.

The source IDs used in :mod:`sdmx` may change if and when this source exits beta and enters production, or is designated as the recommended, primary, or sole IMF source.

- The API documentation indicates "Our data are available through SDMX 2.1 and SDMX 3.0 APIs," but the documentation pages mention only the SDMX 2.1 (SDMX-REST 1.x) base URL, https://api.imf.org/external/sdmx/2.1.
The base URL used by :mod:`sdmx` for the SDMX 3.0 (SDMX-REST 2.x) API is inferred.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the first-party documentation now give this URL explicitly, or will it?

If not, we should not remove this line.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The swagger page lists SDMX 2.1 and SDMX 3.0. This page requires creating an account to access. The documentation on the API doesn't list either base url. I'm open to expanding the documentation, but my thinking was if you really want the technical details that is what the swagger page is for.

In the upcoming Global Discovery Service both base URLs are registered.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the upcoming Global Discovery Service both base URLs are registered.

This sounds intriguing and useful! I've long hoped to basically get rid of sources.json entirely and let this information be retrieved from directory servers—in a way that mirrors DNS and other distributed systems.

However, I haven't seen it mentioned anywhere until your comment. Is this something that the TWG is working on?

(This is tangential, so no urgency.)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the TWG has been working on it and it should be released soon. You can find more information about it here so for example a call like https://gds.sdmx.io/agency/IMF/ to get information about the IMF. This may not be the exact format which goes into production. There is also a feature to resolve a URN, i.e. you give it IMF:ECOFIN_DSD(1.0) and it returns https://sdmxcentral.imf.org/sdmx/v2/structure/datastructure/IMF/ECOFIN_DSD/1.0.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

here

That URL doesn't work on my end, but I managed to extract and decode https://cros.ec.europa.eu/system/files/2024-10/3_4_SDMX_Global_Discovery_Service.pptx.

It would be great if the TWG gave public notice that things like this were being worked on, and solicited comments and inputs before finalizing changes/additions to the standards. Hopefully that can still happen.

@khaeru khaeru changed the title Give durable id is IMF_Beta and IMF_Beta3 Use more durable IDs for IMF_Beta and IMF_Beta3 Feb 25, 2025
@khaeru khaeru linked an issue Feb 25, 2025 that may be closed by this pull request
@aboddie
Copy link
Author

aboddie commented Feb 25, 2025

I prefer to keep the IDs in sources.json as close to the actual maintainer IDs as possible. To do this, one option would be to use the IDs IMF and IMF3. However, then we would need to change the ID for the older, "SDMX Central" source. Could you suggest a good alternate ID for that source? Or, if there is any reason you would prefer not to change that ID, please explain?

I would prefer not to change the ID for "SDMX Central" as someone could be using that. In practice, from this tool's standpoint, that service would mainly be useful to look up artefacts used in relation to the data standards initiative which we do not publish to the global registry, i.e. ECOFIN DSD and related artifacts. While consistency between tools probably isn't important, we did use IMF_DATA for RSDMX for the new service and left IMF for SDMX Central.

In the new service, I do not expect any datasets to use the agency IMF. We are using a hierarchical agency so things like the CPI dataset will be published by IMF.STA while the WEO will be published by IMF.RES. Only cross-departmental artefacts we want to govern, such as an official country code list, will have IMF as the agency. So, using all in SDMX 2.1 or * in SDMX 3.0 is probably best.

Is it correct that this User-Agent is needed for every request—not only data requests?

This is true today, but I expect by the end of the week we will have dropped the requirement for User-Agent in the header in production. Probably better that we cleaned everything else up and then wait to merge it until the User-Agent requirement is dropped.

@khaeru
Copy link
Owner

khaeru commented Feb 27, 2025

Great, thanks for these responses and additional info.

Let me quickly add 1 or a few commit(s) with the suggested Source subclasses. Then, as you suggest, we can clean up or drop the bits no longer needed, and merge once things are stable on your side.

@khaeru
Copy link
Owner

khaeru commented Feb 27, 2025

Let me quickly add 1 or a few commit(s) with the suggested Source subclasses.

Okay, now done. I found I had to also add for the SDMX 3.0.0 service an "Accept:" HTTP header to get SDMX-ML responses, instead of SDMX-JSON. (Support in this package for SDMX-JSON 2.0, corresponding to SDMX 3.0.0, is not yet implemented; see the docs.)

I run and see the following:

$ pytest -m network -k "IMF_DATA"

FAILED sdmx/tests/test_sources.py::TestIMF_DATA::test_endpoint[datastructure] - sdmx.exceptions.XMLParseError: AttributeError: 'AttributeDescriptor' object has no attribute 'keys'
FAILED sdmx/tests/test_sources.py::TestIMF_DATA::test_endpoint[hierarchicalcodelist] - sdmx.exceptions.XMLParseError: KeyError: 'leveled'
FAILED sdmx/tests/test_sources.py::TestIMF_DATA::test_endpoint[metadata] - NotImplementedError
FAILED sdmx/tests/test_sources.py::TestIMF_DATA::test_endpoint[metadataflow] - ValueError: can't determine a reader for response content type None
FAILED sdmx/tests/test_sources.py::TestIMF_DATA::test_endpoint[metadatastructure] - ValueError: can't determine a reader for response content type None
FAILED sdmx/tests/test_sources.py::TestIMF_DATA3::test_endpoint[codelist] - sdmx.exceptions.XMLParseError: NotImplementedError: ('{http://www.sdmx.org/resources/sdmxml/schemas/v3_0/common}AnnotationValue', 'start')
FAILED sdmx/tests/test_sources.py::TestIMF_DATA3::test_endpoint[conceptscheme] - sdmx.exceptions.XMLParseError: NotImplementedError: ('{http://www.sdmx.org/resources/sdmxml/schemas/v3_0/common}AnnotationValue', 'start')
FAILED sdmx/tests/test_sources.py::TestIMF_DATA3::test_endpoint[contentconstraint] - requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://api.imf.org/external/sdmx/3.0/structure/contentconstraint/all/all/+
FAILED sdmx/tests/test_sources.py::TestIMF_DATA3::test_endpoint[data] - ValueError: Unexpected/unhandled parameters {'c[TIME_PERIOD]': 'ge:2018'}
FAILED sdmx/tests/test_sources.py::TestIMF_DATA3::test_endpoint[dataflow] - sdmx.exceptions.XMLParseError: NotImplementedError: ('{http://www.sdmx.org/resources/sdmxml/schemas/v3_0/common}AnnotationValue', 'start')
FAILED sdmx/tests/test_sources.py::TestIMF_DATA3::test_endpoint[datastructure] - sdmx.exceptions.XMLParseError: NotImplementedError: ('{http://www.sdmx.org/resources/sdmxml/schemas/v3_0/common}AnnotationValue', 'start')
FAILED sdmx/tests/test_sources.py::TestIMF_DATA3::test_endpoint[hierarchicalcodelist] - requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://api.imf.org/external/sdmx/3.0/structure/hierarchicalcodelist/all/all/+
FAILED sdmx/tests/test_sources.py::TestIMF_DATA3::test_endpoint[metadata] - ValueError: Missing required parameter 'provider_id'
FAILED sdmx/tests/test_sources.py::TestIMF_DATA3::test_endpoint[metadatastructure] - sdmx.exceptions.XMLParseError: NotImplementedError: ('{http://www.sdmx.org/resources/sdmxml/schemas/v3_0/common}AnnotationValue', 'start')
=================================================================== 14 failed, 10 passed, 1890 deselected, 44 xfailed in 27.74s ====================================================================

To be clear, these tests are not part of the main suite that must pass to merge a PR. We don't need to address all these now. They are only used on the scheduled job that populates this matrix (I don't know if anyone ever visits it) and to support development.

Overall this is actually pretty good performance, since all the more popular endpoints (codelist, dataflow, etc.) are passing.

There are a couple things to note:

  • Some of these are just hitting limitations of sdmx1, for instance "NotImplementedError …AnnotationValue" in SDMX-ML 3.0.0 (assuming this validates according to the official XSDs). We should mark these for now as xfail in test_sources.py, and I would make separate issue(s) and PR(s) to address these → Read SDMX-ML returned by IMF sources #226.
  • Others are apparently limitations of the REST services, for instance the 403 and 404 errors, and the SDMX 2.1 responses for "metadata", "metadataflow", and "metadatastructure". These should be marked in sources.json.

@aboddie
Copy link
Author

aboddie commented Mar 3, 2025

  1. User-agent is no longer required in the header, this was removed today.
  2. By default for SDMX 3.0 we return JSON for SDMX 2.1 we return XML
  3. Metadataflows, Metadata sets, and metadata structures are only supported by us for SDMX 3.0
  4. We do not support hierarchicalcodelist, we support simple hierarchies in the codelist artifacts or more complex ones in the SDMX 3.0's hierarchy artefact.
  5. We do not support contentconstraint at this time
  6. I would expect datastructure to work in SDMX 2.1, maybe the issue is how we specify attachment. based on the error message maybe you require AttachmentGroup?

I'll work on this Tuesday/Wednesday.

@khaeru
Copy link
Owner

khaeru commented Mar 12, 2025

Thanks for the additional info and work here—I am tied up with other tasks but should be able to come back to it towards the end of this week, make any small changes, approve, and merge.

@khaeru
Copy link
Owner

khaeru commented Mar 14, 2025

Okay, having now merged #227 to address #226, I think there should now be fewer cases where limitations in the package itself prevent reading SDMX-ML coming from these updated sources.

What I'll do now:

@khaeru
Copy link
Owner

khaeru commented Mar 14, 2025

  • Push those changes.

Now done. To be clear, because of the rebase I did git push --force, so you should try git pull --rebase on your end to get the exact content as associated with the PR. The last commit hash should be c463d78.

@khaeru
Copy link
Owner

khaeru commented Mar 14, 2025

Now I see:

FAILED sdmx/tests/test_sources.py::TestIMF_DATA::test_endpoint[metadataflow] - ValueError: can't determine a reader for response content type None
FAILED sdmx/tests/test_sources.py::TestIMF_DATA::test_endpoint[metadatastructure] - ValueError: can't determine a reader for response content type None
FAILED sdmx/tests/test_sources.py::TestIMF_DATA3::test_endpoint[codelist] - RuntimeError: 1050 uncollected items
FAILED sdmx/tests/test_sources.py::TestIMF_DATA3::test_endpoint[data] - ValueError: Unexpected/unhandled parameters {'c[TIME_PERIOD]': 'ge:2018'}
FAILED sdmx/tests/test_sources.py::TestIMF_DATA3::test_endpoint[datastructure] - sdmx.exceptions.XMLParseError: NotImplementedError: ('{http://www.sdmx.org/resources/sdmxml/schemas/v3_0/structure}Metadat...
FAILED sdmx/tests/test_sources.py::TestIMF_DATA3::test_endpoint[metadata] - ValueError: Missing required parameter 'provider_id'
============================ 6 failed, 12 passed, 1907 deselected, 47 xfailed, 3 xpassed in 15.67s ============================

I categorize these as:

  1. TestIMF_DATA::test_endpoint[metadataflow], TestIMF_DATA::test_endpoint[metadatastructure] —covered by your comment above.
  2. TestIMF_DATA3::test_endpoint[data], TestIMF_DATA3::test_endpoint[datastructure] —these are more limitations of the current package. I'll make separate issues for these and we'll tolerate them for now.
  3. TestIMF_DATA3::test_endpoint[metadata] —this may be remedied by giving the missing parameter.
  4. TestIMF_DATA3::test_endpoint[codelist] —this may indicate that I introduced some issue in Miscellaneous improvements from 2025-W08 #227 that was not covered by existing tests.

I'll investigate (3) and (4) and then push any changes to the branch before approving and merging.

@khaeru
Copy link
Owner

khaeru commented Mar 14, 2025

  1. TestIMF_DATA3::test_endpoint[metadata] —this may be remedied by giving the missing parameter.

I can't easily identify a particular parameter to provide for this request, so will skip this for now. @aboddie if you can provide or point to an example of a metadata request that should be expected to work, I can add that in a subsequent PR.

  1. TestIMF_DATA3::test_endpoint[codelist] —this may indicate that I introduced some issue in Miscellaneous improvements from 2025-W08 #227 that was not covered by existing tests.

This is now fixed.

I've also updated the docs slightly, e.g. to be explicit about "March" of which year.

@khaeru
Copy link
Owner

khaeru commented Mar 14, 2025

Note that we expect the "codecov/patch" check to fail here. This is because the tests of sdmx.source.imf_data and sdmx.source.imf_data3 are only done as part of the scheduled runs of the sources.yaml workflow from the main branch, which will only happen after this PR is merged. So this failure is not an obstacle to merge the PR.

Because the branch history includes several duplicate messages ("Update sources.json" 4 times), I will use the squash-and-merge strategy.

@khaeru khaeru merged commit 55c45fb into khaeru:main Mar 14, 2025
19 of 20 checks passed
@khaeru
Copy link
Owner

khaeru commented Mar 14, 2025

@aboddie many thanks again for all the information and contributions! It makes my work as maintainer much easier to have direct input from people with first-hand knowledge of or directly involved with the data sources we try to support.

Please don't hesitate in the future to open PRs to make further updates to reflect changes on the operation, documentation URLs, etc. for these services.

@khaeru
Copy link
Owner

khaeru commented Mar 26, 2025

Now released with 2.22.0: https://sdmx1.readthedocs.io/en/stable/whatsnew.html#v2-22-0-2025-03-25

@aboddie if you will be linking to these docs from the IMF documentation, I would suggest using /stable/ in the URL, instead of /latest/; or even recommend a specific version with https://sdmx1.readthedocs.io/en/v2.22.0/whatsnew.html. This will probably be less confusing for most users, who will not want to follow unreleased updates.

@aboddie
Copy link
Author

aboddie commented Mar 28, 2025

Thank you for flagging this, updating our documentation shortly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

data-source Issues related to specific web services/data source(s)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Use more durable IDs for IMF_Beta and IMF_Beta3

2 participants