Error reading files with pattern-defying CF Conventions value format #295

@sadielbartholomew

Description

I recently encountered a file with the Conventions global attribute set to :Conventions = "CF-1.6/CF-1.7" (checked via ncdump -h). I found it while investigating spammy warnings, which led me to Unidata/cftime#328; the file attached to the opening comment there is an example. This compound form isn't standard, and cfdm can't read the file at all: it processes the version naively, taking whatever follows the first "CF-" match, and then errors:

>>> import cfdm
/home/slb93/git-repos/cfdm/cfdm/read_write/netcdf/netcdfread.py:1028: SyntaxWarning: invalid escape sequence '\s'
  all_conventions = re.split(",\s*", Conventions)
>>> cfdm.read("~/Downloads/subset.nc")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/slb93/git-repos/cfdm/cfdm/read_write/read.py", line 328, in read
    fields = netcdf.read(
             ^^^^^^^^^^^^
  File "/home/slb93/git-repos/cfdm/cfdm/decorators.py", line 171, in verbose_override_wrapper
    return method_with_verbose_kwarg(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/slb93/git-repos/cfdm/cfdm/read_write/netcdf/netcdfread.py", line 1056, in read
    g["file_version"] = Version(file_version)
                        ^^^^^^^^^^^^^^^^^^^^^
  File "/home/slb93/miniconda3/envs/cf-env-312/lib/python3.12/site-packages/packaging/version.py", line 200, in __init__
    raise InvalidVersion(f"Invalid version: '{version}'")
packaging.version.InvalidVersion: Invalid version: '1.6/CF-1.7'

Although an odd value such as 'CF-1.6/CF-1.7' is not a CF-compliant value for that attribute, and we shouldn't have to account for every oddity that data may contain, IMO it shouldn't mean such files can't be read in at all. I looked at the logic that processes the 'Conventions' property and concluded that it isn't very robust. It should be improved so that odd edge cases don't raise an error; instead, any non-standard and therefore ambiguous value such as this should be ignored. The file can then be read in, with the CF version treated as ambiguous and therefore set by our default logic for when no version is known or set.

PR to follow, which makes the Conventions attribute versions processing more robust through regular expressions.
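To illustrate the kind of behaviour proposed here, the following is a minimal sketch only, not the code from the PR or from cfdm itself. The helper name `parse_cf_version` and the pattern `CF_TOKEN` are hypothetical, and splitting the attribute on commas and whitespace is a simplifying assumption. The idea is that a "CF-" token is accepted only if it matches the expected form in full, and anything else is treated as "no known version" instead of raising.

```python
import re

# Hypothetical helper for illustration; this is not cfdm's implementation.
# Raw strings (r"...") are used so that "\s" and "\d" do not trigger the
# "invalid escape sequence" SyntaxWarning shown in the traceback above.
CF_TOKEN = re.compile(r"CF-(\d+)\.(\d+)")


def parse_cf_version(conventions):
    """Return the CF version as "major.minor", or None if absent or ambiguous."""
    # Assumption: conventions are separated by commas and/or whitespace
    tokens = [t for t in re.split(r"[,\s]+", conventions.strip()) if t]

    versions = []
    for token in tokens:
        if not token.startswith("CF-"):
            continue  # some other convention, e.g. "ACDD-1.3"

        # The whole token must match, so "CF-1.6/CF-1.7" is rejected
        match = CF_TOKEN.fullmatch(token)
        if match is None:
            return None  # malformed CF token: treat the version as unknown

        versions.append(".".join(match.groups()))

    # Only a single well-formed CF token is unambiguous
    return versions[0] if len(versions) == 1 else None


print(parse_cf_version("CF-1.8"))
print(parse_cf_version("CF-1.6/CF-1.7"))
```

With this approach a caller would fall back to its default version handling whenever `None` is returned, so the file from this report could still be read.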

Metadata

Labels

bug (Something isn't working), dataset read (Relating to reading datasets)
