Description:
I tested the parser with a larger Excel file and noticed two issues concerning the parsing of the product category.
Main issue:
If a chemical has no product category (empty Produktkategorie cell), parsing fails with an IndexError.
The root cause appears to be line 101 of parser.py:
    pc_code = self.get_value_as_str(
        chemical_row.get("Produktkategorie", "")
    ).split()[0]
get_value_as_str(...) returns an empty string for empty or NaN cells.
.split()[0] attempts to access the first element of the list.
If the string is empty => split() returns [] => split()[0] triggers IndexError.
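The failure can be reproduced in isolation. The snippet below is a minimal sketch that uses a plain empty string as a stand-in for what get_value_as_str(...) returns for an empty cell; it does not call the parser itself.

```python
value = ""  # stand-in for the get_value_as_str(...) result on an empty cell

parts = value.split()  # splitting an empty string on whitespace yields no tokens
print(parts)

error = None
try:
    pc_code = parts[0]  # same access pattern as parser.py line 101
except IndexError as exc:
    error = exc
    print(f"IndexError: {exc}")
```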
Minor issue:
If a cell contains multiple product category codes separated by spaces, only the first is taken.
In testing, some chemicals ended up with no code assigned at all; the cause is not yet known.
This does not block parsing and might not need immediate fixing, since openBIS cannot handle multiple PC codes anyway.
Suggested fix:
For the main issue, handle empty strings before accessing [0].
Optional: since openBIS can't process multiple PC codes anyway, we just have to pick the first valid one or none. Maybe add the remaining PC codes as a string in the description/notes, or log a warning?
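One possible shape for the fix, as a rough sketch only: the helper name first_pc_code and the use of the logging module are assumptions for illustration, not existing parser code. It returns None for empty cells and warns when more than one code is present. It does not check whether a code is valid, since the validity rule is not described here.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def first_pc_code(raw: str) -> Optional[str]:
    """Return the first whitespace-separated token of raw, or None if there is none.

    Hypothetical helper; not part of the current parser.
    """
    codes = raw.split()
    if not codes:
        # Empty or whitespace-only cell: no product category to assign.
        return None
    if len(codes) > 1:
        # Only one code can be stored, so report the ones that are dropped.
        logger.warning(
            "Multiple product category codes found %s; using %s", codes, codes[0]
        )
    return codes[0]
```

In parser.py this could replace the .split()[0] access, for example pc_code = first_pc_code(self.get_value_as_str(chemical_row.get("Produktkategorie", ""))), with the caller then handling a None result.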
Steps to reproduce:
Run the parser with an Excel file containing at least one chemical with an empty Produktkategorie cell and observe IndexError.