Parser: handle empty Produktkategorie and multiple product codes #5

@TomCharlesRousseau

Description:
I tested the parser with a larger Excel file and noticed two issues concerning the parsing of the product category.


Main issue:

If a chemical has no product category (empty Produktkategorie cell), parsing fails with an IndexError.

The root cause appears to be line 101 of parser.py:

pc_code = self.get_value_as_str(
    chemical_row.get("Produktkategorie", "")
).split()[0]

get_value_as_str(...) returns an empty string for empty or NaN cells.

.split()[0] attempts to access the first element of the list.

If the string is empty, split() returns [], so split()[0] raises an IndexError.
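The failure mode can be reproduced without the parser or an Excel file. This is a minimal sketch that assumes only what is stated above, namely that get_value_as_str(...) returns an empty string for empty or NaN cells; the variable name cell_text is hypothetical.

```python
# Stand-in for what get_value_as_str(...) is said to return
# for an empty or NaN Produktkategorie cell (assumption from this issue).
cell_text = ""

# str.split() without arguments splits on runs of whitespace and
# discards empty pieces, so an empty string yields an empty list.
tokens = cell_text.split()
print(tokens)  # []

try:
    pc_code = tokens[0]  # same access pattern as parser.py line 101
except IndexError as exc:
    print(f"IndexError: {exc}")  # IndexError: list index out of range
```

A cell that contains only whitespace behaves the same way, since split() also returns [] for it.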


Minor issue:

If a cell contains multiple product category codes separated by spaces, only the first is taken.

In testing, sometimes no code is assigned at all; the cause is not yet understood.

This does not block parsing and might not need an immediate fix, since openBIS cannot handle multiple PC codes anyway.

Suggested fix:

For the main issue, handle empty strings before accessing [0].

Optional: openBIS can't process multiple PC codes anyway, so we just have to pick the first valid one or none. Maybe add the PC codes as a string in the description/notes, or add a warning?
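One possible shape for both suggestions is sketched below. The helper first_pc_code and its chemical_name parameter are hypothetical and not part of parser.py, and cell_text is assumed to be the string already returned by get_value_as_str(...). The sketch does not check whether a code is valid, because the issue does not define validity. It only guards against the empty case and logs a warning when extra codes are dropped.

```python
import logging

logger = logging.getLogger(__name__)


def first_pc_code(cell_text: str, chemical_name: str = "") -> str:
    """Return the first whitespace-separated code in cell_text, or "" if none.

    Hypothetical helper for illustration; not part of the existing parser.
    """
    codes = cell_text.split()
    if not codes:
        # Empty or whitespace-only cell: no product category to assign.
        return ""
    if len(codes) > 1:
        # Keep the first code, but make the dropped ones visible in the log.
        logger.warning(
            "Multiple product category codes for %r: using %s, ignoring %s",
            chemical_name,
            codes[0],
            ", ".join(codes[1:]),
        )
    return codes[0]
```

In the parser, the call could then replace the direct .split()[0] access, for example pc_code = first_pc_code(self.get_value_as_str(chemical_row.get("Produktkategorie", ""))). Returning "" instead of raising keeps the rest of the row parseable; whether an empty code should be stored or skipped downstream is a separate decision.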

Steps to reproduce:

Run the parser with an Excel file containing at least one chemical with an empty Produktkategorie cell and observe IndexError.
