-
Notifications
You must be signed in to change notification settings - Fork 22
Use more durable IDs for IMF_Beta and IMF_Beta3 #225
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #225 +/- ##
==========================================
- Coverage 98.76% 98.09% -0.67%
==========================================
Files 101 103 +2
Lines 8646 8667 +21
==========================================
- Hits 8539 8502 -37
- Misses 107 165 +58
🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, thanks for these contributions—the User-Agent info, expected failures, and docs URLs are all information that it would be hard to discover independently, so you've saved a lot of time. Much appreciated.
I have a few questions inline, below—can you please answer these? Then I will help adjust the PR as necessary.
As well, I find that the added tests fail. For example:
pytest -m network -k "IMF_DATA and conceptscheme"
gives:
FAILED sdmx/tests/test_sources.py::TestIMF_DATA::test_endpoint[conceptscheme] - requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://api.imf.org/external/sdmx/2.1/conceptscheme/IMF_DATA/CS_MASTER_SYSTEM/latest
I see two issues here: issue 1 is that IMF_DATA does not appear to be the ID that is used for the maintainer of artefacts provided by this web service. For example, when I open these URLs in a web browser:
- https://api.imf.org/external/sdmx/2.1/conceptscheme/IMF_DATA/CS_MASTER_SYSTEM/latest —note "IMF_DATA". This gives an empty StructureMessage.
- https://api.imf.org/external/sdmx/2.1/conceptscheme/IMF/CS_MASTER_SYSTEM/latest —note "IMF". This gives the actual artefact in question.
This happens because sdmx1, unless told otherwise, inserts the ID from sources.json into the query string. One way to fix this is to copy what is done for e.g. ESTAT_COMEXT. Here we have to use a different ID in sources.json to distinguish from the ESTAT source at a different base URL. But the artefacts provided by this service still use ESTAT as the maintainer ID. So we have code like this, note the last line:
sdmx/sdmx/source/estat_comext.py
Lines 4 to 19 in 5588f01
| class Source(ESTAT): | |
| _id = "ESTAT_COMEXT" | |
| def modify_request_args(self, kwargs): | |
| """Supply explicit provider agency ID for ESTAT_COMEXT. | |
| This hook sets the provider to "ESTAT" for structure queries if it is not given | |
| explicitly. | |
| """ | |
| super().modify_request_args(kwargs) | |
| # NB this is an indirect test for resource_type != 'data'; because of | |
| # the way the hook is called, resource_type is not available | |
| # directly. | |
| if "key" not in kwargs: | |
| kwargs.setdefault("provider", "ESTAT") |
(Another way is to insert instead "ALL" so that artefacts with the given ID but with any maintainer are returned.)
For this reason, and also to reduce confusion, I prefer to keep the IDs in sources.json as close to the actual maintainer IDs as possible. To do this, one option would be to use the IDs IMF and IMF3. However, then we would need to change the ID for the older, "SDMX Central" source. Could you suggest a good alternate ID for that source? Or, if there is any reason you would prefer not to change that ID, please explain?
Issue 2 is that, although the above URLs (1) and (2) work in a browser (Firefox in my case), they don't work in code. See the following:
>>> import sdmx
>>> IMF = sdmx.Client("IMF_DATA")
# Fix "issue 1" described above
>>> IMF.source.id = "IMF"
# Request the same URL (2) given above
>>> IMF.conceptscheme("CS_MASTER_SYSTEM")
HTTPError: 403 Client Error: Forbidden for url: https://api.imf.org/external/sdmx/2.1/conceptscheme/IMF/CS_MASTER_SYSTEM/latest
# Inspect a prepared request
>>> req = IMF.conceptscheme("CS_MASTER_SYSTEM", dry_run=True)
>>> req.headers
{'User-Agent': 'python-requests/2.32.3', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
# Give the user-agent header explicitly for a non-data request
>>> IMF.conceptscheme("CS_MASTER_SYSTEM", headers={"User-Agent": "iscript-data-client"})
<sdmx.StructureMessage>
<Header>
id: 'IDREF5229'
prepared: '2025-02-25T09:17:45.722251+00:00'
sender: <Agency unknown>
source:
test: False
response: <Response [200]>
Codelist (3): CL_FREQ CL_UNIT_MULT CL_DECIMALS
ConceptScheme (1): CS_MASTER_SYSTEMI think these happen because you have added the following to sources.json:
"headers": {
"data": {
"User-Agent": "idata-script-client"
}
}"data" here means that this header is only applied for a data request. For the example ConceptScheme request, it is not applied, and I guess this is why there is a 403 response. Is it correct that this User-Agent is needed for every request—not only data requests?
The fact that such headers are narrowly applied to only 1 request type is a limitation of the Client class; see
Lines 233 to 238 in 5588f01
| # Headers: use headers from source config if not given by the caller | |
| if not headers: | |
| try: | |
| headers.update(self.source.headers.get(resource_type.name, {})) | |
| except AttributeError: | |
| pass |
Anyway, please let me know the answers to the questions inline and in this comment. Once you do, I can help update the branch so it is ready to merge.
| The source IDs used in :mod:`sdmx` may change if and when this source exits beta and enters production, or is designated as the recommended, primary, or sole IMF source. | ||
|
|
||
| - The API documentation indicates "Our data are available through SDMX 2.1 and SDMX 3.0 APIs," but the documentation pages mention only the SDMX 2.1 (SDMX-REST 1.x) base URL, https://api.imf.org/external/sdmx/2.1. | ||
| The base URL used by :mod:`sdmx` for the SDMX 3.0 (SDMX-REST 2.x) API is inferred. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the first-party documentation now give this URL explicitly, or will it?
If not, we should not remove this line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The swagger page lists SDMX 2.1 and SDMX 3.0. This page requires creating an account to access. The documentation on the API doesn't list either base url. I'm open to expanding the documentation, but my thinking was if you really want the technical details that is what the swagger page is for.
In the upcoming Global Discovery Service both base URLs are registered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the upcoming Global Discovery Service both base URLs are registered.
This sounds intriguing and useful! I've long hoped to basically get rid of sources.json entirely and let this information be retrieved from directory servers—in a way that mirrors DNS and other distributed systems.
However, I haven't seen it mentioned anywhere until your comment. Is this something that the TWG is working on?
(This is tangential, so no urgency.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the TWG has been working on it and it should be released soon. You can find more information about it here so for example a call like https://gds.sdmx.io/agency/IMF/ to get information about the IMF. This may not be the exact format which goes into production. There is also a feature to resolve a URN, i.e. you give it IMF:ECOFIN_DSD(1.0) and it returns https://sdmxcentral.imf.org/sdmx/v2/structure/datastructure/IMF/ECOFIN_DSD/1.0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
That URL doesn't work on my end, but I managed to extract and decode https://cros.ec.europa.eu/system/files/2024-10/3_4_SDMX_Global_Discovery_Service.pptx.
It would be great if the TWG gave public notice that things like this were being worked on, and solicited comments and inputs before finalizing changes/additions to the standards. Hopefully that can still happen.
I would prefer not to change the ID for "SDMX Central" as someone could be using that. In practice, from this tool's standpoint, that service would mainly be useful to look up artefacts used in relation to the data standards initiative which we do not publish to the global registry, i.e. ECOFIN DSD and related artifacts. While consistency between tools probably isn't important, we did use IMF_DATA for RSDMX for the new service and left IMF for SDMX Central. In the new service, I do not expect any datasets to use the agency IMF. We are using a hierarchical agency so things like the CPI dataset will be published by IMF.STA while the WEO will be published by IMF.RES. Only cross-departmental artefacts we want to govern, such as an official country code list, will have IMF as the agency. So, using all in SDMX 2.1 or * in SDMX 3.0 is probably best.
This is true today, but I expect by the end of the week we will have dropped the requirement for User-Agent in the header in production. Probably better that we cleaned everything else up and then wait to merge it until the User-Agent requirement is dropped. |
|
Great, thanks for these responses and additional info. Let me quickly add 1 or a few commit(s) with the suggested Source subclasses. Then, as you suggest, we can clean up or drop the bits no longer needed, and merge once things are stable on your side. |
Okay, now done. I found I had to also add for the SDMX 3.0.0 service an "Accept:" HTTP header to get SDMX-ML responses, instead of SDMX-JSON. (Support in this package for SDMX-JSON 2.0, corresponding to SDMX 3.0.0, is not yet implemented; see the docs.) I run and see the following: To be clear, these tests are not part of the main suite that must pass to merge a PR. We don't need to address all these now. They are only used on the scheduled job that populates this matrix (I don't know if anyone ever visits it) and to support development. Overall this is actually pretty good performance, since all the more popular endpoints (codelist, dataflow, etc.) are passing. There are a couple things to note:
|
I'll work on this Tuesday/Wednesday. |
|
Thanks for the additional info and work here—I am tied up with other tasks but should be able to come back to it towards the end of this week, make any small changes, approve, and merge. |
|
Okay, having now merged #227 to address #226, I think there should now be fewer cases where limitations in the package itself prevent reading SDMX-ML coming from these updated sources. What I'll do now:
|
- Add headers for all requests. - Drop header entries for these sources in sources.json.
Now done. To be clear, because of the rebase I did |
Now I see: I categorize these as:
I'll investigate (3) and (4) and then push any changes to the branch before approving and merging. |
I can't easily identify a particular parameter to provide for this request, so will skip this for now. @aboddie if you can provide or point to an example of a metadata request that should be expected to work, I can add that in a subsequent PR.
This is now fixed. I've also updated the docs slightly, e.g. to be explicit about "March" of which year. |
|
Note that we expect the "codecov/patch" check to fail here. This is because the tests of Because the branch history includes several duplicate messages ("Update sources.json" 4 times), I will use the squash-and-merge strategy. |
|
@aboddie many thanks again for all the information and contributions! It makes my work as maintainer much easier to have direct input from people with first-hand knowledge of or directly involved with the data sources we try to support. Please don't hesitate in the future to open PRs to make further updates to reflect changes on the operation, documentation URLs, etc. for these services. |
|
Now released with 2.22.0: https://sdmx1.readthedocs.io/en/stable/whatsnew.html#v2-22-0-2025-03-25 @aboddie if you will be linking to these docs from the IMF documentation, I would suggest using /stable/ in the URL, instead of /latest/; or even recommend a specific version with https://sdmx1.readthedocs.io/en/v2.22.0/whatsnew.html. This will probably be less confusing for most users, who will not want to follow unreleased updates. |
|
Thank you for flagging this, updating our documentation shortly. |
Update IMF_beta to IMF_DATA to avoid changing code and documentation after launch as well as update documentation on IMF sources as outlined in issue #224
PR checklist