Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

freeze degree string representation #2347

Merged
merged 3 commits into from
Mar 30, 2022

Conversation

volodymyrss
Copy link
Contributor

For some reason, the default representation depends on astropy and/or python version.

@codecov
Copy link

codecov bot commented Mar 29, 2022

Codecov Report

Merging #2347 (a1beaaf) into main (db3f08b) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main    #2347   +/-   ##
=======================================
  Coverage   63.40%   63.40%           
=======================================
  Files         131      131           
  Lines       17044    17044           
=======================================
  Hits        10807    10807           
  Misses       6237     6237           
Impacted Files Coverage Δ
astroquery/heasarc/core.py 73.79% <ø> (ø)

📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

@eerovaher
Copy link
Member

If the failures were indeed caused by using some particular versions of external software then why were the tests failing only in this subpackage?

@volodymyrss
Copy link
Contributor Author

If the failures were indeed caused by using some particular versions of external software then why were the tests failing only in this subpackage?

This subpackage selects mock request data by constructing filename from the request. This way all of the remote tests can have a local version.
In some cases, request is subtly changed due to some external changes. In this case, there are three options:

  • store mock data for the new request.
    • This is possible and may be feasible if we want to handle the case of a potentially meaningfully different request.
  • use the unmodified mock data for the modified request.
    • this may case the request to break on real remote that, since what was used for the mock data is not the same as the one actually sent
  • modify the request so that it is always the same
    • this is what I've done. Formatting of the degrees is more consistent with fixed precision.

@eerovaher
Copy link
Member

I was wondering what makes this subpackage different from all the rest. I think your explanation sounds reasonable.

But if the problem can be solved by improving the robustness of generating filenames from the requests then why is it necessary to update thousands of lines in data files? If the precision of numerical values would be explicitly set to what it is with the combinations of software that did not cause the tests to fail then the generated filenames should be the same as they have been so far, and there should be no need to modify the mock data at all.

@bsipocz
Copy link
Member

bsipocz commented Mar 29, 2022

I share the concern about regenerated data, do you expect them to be stable after this PR? For the long term we need a solution where mock data may change only if there is a server-side API overhaul and not when more data gets added, etc.

@volodymyrss
Copy link
Contributor Author

I was wondering what makes this subpackage different from all the rest. I think your explanation sounds reasonable.

But if the problem can be solved by improving the robustness of generating filenames from the requests then why is it necessary to update thousands of lines in data files? If the precision of numerical values would be explicitly set to what it is with the combinations of software that did not cause the tests to fail then the generated filenames should be the same as they have been so far, and there should be no need to modify the mock data at all.

I think you are referring to the option 2 in my list above: using the unchanged mock data for changed requests, by making filename generation insensitive to some changes of request, right?
Note that I took this approach in couple of cases, see here

patch_get.assume_fileid_for_request(
lambda url, params: f"last-month-{params['tablehead'].split()[-1]}")
. But in general it can be problematic, since it may create divergence between remote tests and local tests of the same functionality.

I instead went for option 3 - making the request the same for the same user input, with different software versions.
In fact, the output could potentially be meaningfully different for different requests. So user might get meaningfully different results for the same user code with different astropy/python versions. It's good to fix this.
Using request hash as filename helps to catch this kind of cases.

For changing files (also responding to @bsipocz), there are several factors:

  • since some requests are now slightly different (and, hopefully, the same for all versions), it's not unreasonable to keep them in different files, since the output for different request is also different. The content is either identical or slightly different (see below). This accounts for the 1000-line changes. I could make sure the
  • In some cases, the metadata at the service changed. E.g. now there is new table "Fermi LAT 12-Year Point Source Catalog (4FGL-DR3)", and there is IXPE. I guess we can assume this does not happen often - with major releases of datasets, new missions, etc. Should such a change be updated in the mock data? I could just revert to the previous file, with the same name.
  • Some responses are actually a bit random. I could keep the previous version with little effort, since the filename is the same.

@bsipocz
Copy link
Member

bsipocz commented Mar 29, 2022

I think all of the above should be covered by the remote data tests, the mock ones is fine if to be limited to ensure _parse_result(), and other astroquery internals work as expected. E.g. throwing a votable or a fits file to it the method processes it properly. It really is irrelevant that the actual content is roughly right for the mock query or exactly right.
Since we already have many files in the repo it's fine to keep them as per your first point, but I don't think we need to track slight changes. A fits file is a fits file in this context, the main points to be tested is that it reads incorrectly, and e.g. units are parsed well (that's a current know bug).

As for your second example, we're in that situation with skyview, and it's an annoyance really. Releases change, not too often, but often enough, so we're in a constant catch-up with our test suite without really serving any real purpose. Would be nice to not be exactly sensitive on exact matches.

And for random responses, I would say the same as above, if the structure is the same, and one that is expected then it doesn't matter. The big things to look for that if we run into an empty return, or a truncated one or a dropped column when we expect full results, etc.

@volodymyrss
Copy link
Contributor Author

volodymyrss commented Mar 29, 2022

Alright, as I try to elaborate in #2348 , there is really no special reason in this parameterized fixture approach I had to change most of the files. I just wanted to make sure that the code will continue to be able to read new responses. But given the disadvantages, I will revert them. Actually, I could just close this PR, and continue in #2348, as you suggested elsewhere. The comments become similar too.

edit: maybe, to fix the tests quickly, we could revert all the data changes in this PR, and but keep the change to the request - it's an actual fix for the module functionality. And then we can proceed in the other PRs.

@eerovaher
Copy link
Member

The commits have to be squashed because otherwise the data files that were added in the first commit will remain in the repository.

@volodymyrss volodymyrss force-pushed the fix-degree-representation branch from 98f4d59 to 8802f84 Compare March 30, 2022 19:09
@volodymyrss volodymyrss force-pushed the fix-degree-representation branch from 8802f84 to 714e6fd Compare March 30, 2022 19:22
@volodymyrss
Copy link
Contributor Author

The commits have to be squashed because otherwise the data files that were added in the first commit will remain in the repository.

Done!

@bsipocz bsipocz added this to the v0.4.7 milestone Mar 30, 2022
Copy link
Member

@bsipocz bsipocz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@@ -171,7 +171,7 @@ def query_region_async(self, position: Union[coordinates.SkyCoord, str],
# Generate the request
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you may want to leave a one-liner comment as a reminder why formatting is applied here, otherwise it's likely someone would wonder a few years down the line 😅

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, done. I did not manage in one line, only two, but if you have better ideas - please suggest!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

multi-liner is fine, too, I just meant that it can be just a very short reminder :)

@bsipocz bsipocz merged commit c081ee4 into astropy:main Mar 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants