Normalization step failure-collection/review ideas #49

@colleenXu

Description

[EDITED after discussion with Evan 2025-09-18]

Background

I've previously brought up the complications of the normalization step from my POV, based on my experience doing it in my custom notebooks. I collect data on failures (broken down by failure type) and review/analyze those failures along with other aspects of normalization. I've been asked to share some details on what I'm doing, so that's what this issue is for.

While doing all this is more work, I've found it very helpful for:

  • finding NodeNorm issues/shortcomings
  • finding errors in the resource itself (developers appreciated being informed of specific errors and fixed them!)
  • figuring out what to use as the original ID when the resource contains multiple options

Details

I catch these types of NodeNorm mapping failures:

  • NodeNorm returned None - save the input ID. ORION's current NodeNorm failure file does this
  • NodeNorm clique is the wrong primary category - save the input ID and that NodeNorm category. This requires as input an expected category/list of expected categories (which can be tricky to include)
  • NodeNorm clique doesn't have a primary label - save the input ID. Evan agreed to add this. I've heard that the UI doesn't handle Nodes without a human-readable label well, so this is worth catching
  • Unexpected errors (caught with try-except) - save the input ID and that NodeNorm response
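
The four failure checks above could be sketched roughly like this in Python. The response shape follows NodeNorm's `get_normalized_nodes` output as I understand it (each input CURIE maps to either `None` or an object with `id.identifier`, `id.label`, and a `type` list whose first entry is the primary category); the function and bucket names here are my own, not from any existing codebase:

```python
from collections import defaultdict

def classify_failures(responses, expected_categories):
    """Bucket NodeNorm responses by failure type (hypothetical helper).

    responses: dict of input CURIE -> NodeNorm response object (or None)
    expected_categories: set of acceptable biolink primary categories
    """
    failures = defaultdict(list)
    for input_id, resp in responses.items():
        try:
            # 1. NodeNorm returned None: no clique for this ID
            if resp is None:
                failures["no_mapping"].append(input_id)
                continue
            # 2. Clique's primary category isn't one we expect
            primary_category = resp["type"][0]
            if expected_categories and primary_category not in expected_categories:
                failures["wrong_category"].append((input_id, primary_category))
                continue
            # 3. Clique has no primary (human-readable) label
            if "label" not in resp["id"]:
                failures["no_label"].append(input_id)
        except Exception:
            # 4. Anything unexpected: keep the raw response for review
            failures["unexpected_error"].append((input_id, resp))
    return failures
```

Each bucket keeps enough context (input ID, plus the offending category or raw response) to do the manual review described later.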

Then I save and print summary statistics for each failure type: how many input IDs affected and how many rows removed as a result.
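
The summary step could look something like this (a minimal sketch with hypothetical names, assuming failures have already been bucketed by type and the number of removed rows per type has been tallied):

```python
def summarize_failures(failures, rows_removed):
    """Per failure type, report how many input IDs were affected and
    how many rows were removed as a result.

    failures: dict of failure type -> list of affected input IDs
    rows_removed: dict of failure type -> count of rows dropped
    """
    return {
        failure_type: {
            "ids_affected": len(ids),
            "rows_removed": rows_removed.get(failure_type, 0),
        }
        for failure_type, ids in failures.items()
    }
```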

I've done these kinds of reviews/analyses:

  • A resource provides two ID columns for an entity - compare the NodeNorm mappings from each column for differences.
    • If they differ, manually compare the input IDs, NodeNorm mappings, and original concept (string) to find which column of input IDs best captures the original concept
  • Compare the names provided by the resource to the NodeNorm primary labels
  • For a single ID column from the resource, compare the input IDs, NodeNorm mappings, and original concept (string) to see if the original concept is being accurately represented by the input ID/NodeNorm mapping.
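
As a concrete sketch of the two-ID-column comparison (all field names are hypothetical), assuming each column's input IDs have already been run through NodeNorm into a dict of input ID → primary identifier:

```python
def compare_column_mappings(rows, mapping_a, mapping_b):
    """Flag rows where the two ID columns normalize to different cliques,
    so they can be reviewed manually against the original concept string.

    rows: list of dicts with keys "concept", "id_a", "id_b"
    mapping_a, mapping_b: dict of input ID -> NodeNorm primary ID (or None)
    """
    disagreements = []
    for row in rows:
        norm_a = mapping_a.get(row["id_a"])
        norm_b = mapping_b.get(row["id_b"])
        if norm_a != norm_b:
            # Keep everything needed for manual review: original concept,
            # both input IDs, and both NodeNorm mappings
            disagreements.append({
                "concept": row["concept"],
                "id_a": row["id_a"], "norm_a": norm_a,
                "id_b": row["id_b"], "norm_b": norm_b,
            })
    return disagreements
```

Rows where both columns land in the same clique need no further attention; the disagreements are the ones worth eyeballing to decide which column best captures the original concept.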
