Description
One of the MAVEN uri_tests is consistently failing, and I'm not sure why.
def test_load_mag_byorbit_data(self):
    config.CONFIG["local_data_dir"] = f"s3://{bucket_name}"
    #saved_logging_level = pyspedas.logger.getEffectiveLevel()
    #pyspedas.logger.setLevel(logging.DEBUG)
    data = maven.mag(trange=[500, 501], datatype="ss1s")
    self.assertTrue(len(tplot_names("OB_B*"))>0)
    #pyspedas.logger.setLevel(saved_logging_level)
    time.sleep(sleep_time)
This test loads a bunch of MAVEN orbit files from NAIF, merges them into a single file, then reads the merged file to map orbit numbers to dates (eventually requesting mag data for a time range specified as orbit numbers).
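For reference, here is a minimal standalone sketch of the ordering step that the merge depends on. It reuses the filename pattern from merge_orbit_files; the function name `sort_orbit_files` and the sample filenames are only illustrative, not part of pyspedas.

```python
import re

# Filename pattern copied from merge_orbit_files.
PATTERN = "maven_orb_rec(_|)(|.{6})(|_.{9}).orb"

def sort_orbit_files(names):
    """Return matching orbit files ordered by start date (hypothetical helper).

    The undated file (maven_orb_rec.orb) gets the key "999999" so it sorts last.
    """
    keyed = []
    for name in names:
        m = re.match(PATTERN, name)
        if m is None:
            continue  # not an orbit file
        key = m.group(2) if m.group(2) != "" else "999999"
        keyed.append((key, name))
    return [name for _, name in sorted(keyed)]

ordered = sort_orbit_files([
    "maven_orb_rec.orb",
    "readme.txt",
    "maven_orb_rec_210101_210401_v1.orb",
    "maven_orb_rec_201001_210101_v1.orb",
])
```

This ordering is consistent with the Mac log below, where maven_orb_rec.orb is the last file merged.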
I'm seeing different failures on GitHub (Ubuntu, Python 3.12) and on my laptop (Mac M2, Python 3.9).
On GitHub, the merge_orbit_files routine seems to stall partway through:
pattern = "maven_orb_rec(_|)(|.{6})(|_.{9}).orb"
orb_dates = []
orb_files = []
for f in fl:
    x = re.match(pattern, f)
    if x is not None:
        orb_file = os.path.join(orbit_files_path, f) if not is_fsspec_uri(toolkit_path) else "/".join([orbit_files_path, f])
        orb_files.append(orb_file)
        if x.group(2) != "":
            orb_dates.append(x.group(2))
        else:
            orb_dates.append("999999")
sorted_files = [x for (y, x) in sorted(zip(orb_dates, orb_files))]
with fo as code:
    skip_2_lines = False
    for o_file in sorted_files:
        logging.info("merge_orbit_files processing file %s", o_file)
        if is_fsspec_uri(toolkit_path):
            # assumes fsspec filesystem triggered above
            fo_file = fs.open(o_file)
        else:
            fo_file = open(o_file)
        with fo_file as f:
            if skip_2_lines:
                f.readline()
                f.readline()
            skip_2_lines = True
            content=f.read()
            logging.info("writing %d bytes to output file %s",len(content),output_filename)
            if type(content) is bytes:
                code.write(str(content))
            else:
                code.write(content)
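One thing that may be worth checking in the bytes branch above: fsspec's `fs.open()` defaults to binary mode, so `content` is bytes on the S3 path, and `str()` on a bytes object returns its repr (`b'...'` with escaped newlines) rather than the decoded text. I haven't confirmed this is related to either failure. A small sketch of the difference, with made-up sample content and an assumed UTF-8 encoding:

```python
# Made-up sample content; real orbit files are much larger.
content = b"line one\nline two\n"

as_repr = str(content)             # repr of the bytes object, not its text
as_text = content.decode("utf-8")  # decoded text (UTF-8 is an assumption)

has_real_newline_in_repr = "\n" in as_repr
line_count_in_text = len(as_text.splitlines())
```

If the merged file is meant to be read back line by line in text mode, decoding before writing (or writing bytes to a file opened in binary mode) would keep the line breaks intact.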
From the test log:
16-Jan-25 08:00:21: merge_orbit_files processing file s3://test-bucket/orbitfiles/maven_orb_rec_201001_210101_v1.orb
16-Jan-25 08:00:21: writing 82008 bytes to output file s3://test-bucket/orbitfiles/maven_orb_rec_merged.orb
16-Jan-25 08:00:21: merge_orbit_files processing file s3://test-bucket/orbitfiles/maven_orb_rec_210101_210401_v1.orb
16-Jan-25 08:00:21: writing 80266 bytes to output file s3://test-bucket/orbitfiles/maven_orb_rec_merged.orb
16-Jan-25 08:00:21: merge_orbit_files processing file s3://test-bucket/orbitfiles/maven_orb_rec_210401_210701_v1.orb
16-Jan-25 08:00:21: writing 81204 bytes to output file s3://test-bucket/orbitfiles/maven_orb_rec_merged.orb
16-Jan-25 08:00:21: merge_orbit_files processing file s3://test-bucket/orbitfiles/maven_orb_rec_210701_211001_v1.orb
ERROR
It seems to hang while reading the last listed orbit file (which is probably about halfway through the list). Nothing else happens after "ERROR" is printed, until GitHub kills the action after 6 hours of elapsed time.
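To find out where the hang occurs without waiting six hours, one option is the standard library's `faulthandler` module, which can dump every thread's traceback after a timeout and then exit. The sketch below is only an idea for instrumenting the test; the helper name `hang_watchdog` and the log path are hypothetical.

```python
import faulthandler
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def hang_watchdog(seconds, log_path):
    """Dump all thread tracebacks to log_path and exit if the body runs too long."""
    with open(log_path, "w") as log:
        # exit=True terminates the process after the dump instead of letting it hang.
        faulthandler.dump_traceback_later(seconds, exit=True, file=log)
        try:
            yield
        finally:
            # The body finished in time, so disarm the watchdog.
            faulthandler.cancel_dump_traceback_later()

# Usage: wrap the call that stalls; here the body finishes immediately.
log_path = os.path.join(tempfile.mkdtemp(), "hang_traceback.log")
with hang_watchdog(300, log_path):
    result = sum(range(10))  # stand-in for maven.mag(...)
log_size = os.path.getsize(log_path)  # nothing is written when no timeout fires
```

If the wrapped call did stall, the log file would show which read or write each thread was blocked in.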
On my Mac, the symptom is slightly different. It appears to make it through the complete list of orbit files to build the merged file, but when the merged file is read back, it's apparently empty.
16-Jan-25 00:35:36: merge_orbit_files processing file s3://test-bucket/orbitfiles/maven_orb_rec_240701_241001_v1.orb
16-Jan-25 00:35:36: writing 80936 bytes to output file s3://test-bucket/orbitfiles/merged_maven_orbits.orb
16-Jan-25 00:35:36: merge_orbit_files processing file s3://test-bucket/orbitfiles/maven_orb_rec.orb
16-Jan-25 00:35:36: writing 85894 bytes to output file s3://test-bucket/orbitfiles/merged_maven_orbits.orb
16-Jan-25 00:35:36: Getting orbit info from file s3://test-bucket/orbitfiles/merged_maven_orbits.orb
Error
Traceback (most recent call last):
File "/Users/jwl/PycharmProjects/pyspedas/pyspedas/projects/maven/tests/uri_tests.py", line 439, in test_load_mag_byorbit_data
data = maven.mag(trange=[500, 501], datatype="ss1s")
File "/Users/jwl/PycharmProjects/pyspedas/pyspedas/projects/maven/mag.py", line 76, in mag
return maven_load(
File "/Users/jwl/PycharmProjects/pyspedas/pyspedas/projects/maven/maven_load.py", line 350, in load_data
maven_files = maven_filenames(
File "/Users/jwl/PycharmProjects/pyspedas/pyspedas/projects/maven/maven_load.py", line 80, in maven_filenames
start_date = parse(start_date)
File "/Users/jwl/PycharmProjects/pyspedas/venv/lib/python3.9/site-packages/dateutil/parser/_parser.py", line 1368, in parse
return DEFAULTPARSER.parse(timestr, **kwargs)
File "/Users/jwl/PycharmProjects/pyspedas/venv/lib/python3.9/site-packages/dateutil/parser/_parser.py", line 640, in parse
res, skipped_tokens = self._parse(timestr, **kwargs)
File "/Users/jwl/PycharmProjects/pyspedas/venv/lib/python3.9/site-packages/dateutil/parser/_parser.py", line 719, in _parse
l = _timelex.split(timestr) # Splits the timestr into tokens
File "/Users/jwl/PycharmProjects/pyspedas/venv/lib/python3.9/site-packages/dateutil/parser/_parser.py", line 201, in split
return list(cls(s))
File "/Users/jwl/PycharmProjects/pyspedas/venv/lib/python3.9/site-packages/dateutil/parser/_parser.py", line 69, in __init__
raise TypeError('Parser must be a string or character stream, not '
TypeError: Parser must be a string or character stream, not NoneType
As far as I can tell stepping through with a debugger, the merged_maven_orbits.orb file appears to be empty when it's opened in orbit_time.py:
logging.info("Getting orbit info from file %s", orb_file)
if is_fsspec_uri(toolkit_path):
    protocol, path = toolkit_path.split("://")
    fs = fsspec.filesystem(protocol)
    fileobj = fs.open(orb_file, "r")
else:
    fileobj = open(orb_file, "r")
with fileobj as f:
    if end_orbit is None:
        end_orbit = begin_orbit
    orbit_num = []
    time = []
    f.readline()
    f.readline()
    for line in f:
        line = line[0:28]
        line = line.split(" ")
        line = [x for x in line if x != ""]
If I set a breakpoint just after it enters the "with fileobj as f" block, and do
content=f.read()
at the console, I get an empty string.
This all works fine when local_data_dir is not an S3 URL.