Skip to content

Conversation

@daniel-zamora
Copy link
Contributor

@daniel-zamora daniel-zamora commented Nov 20, 2025

Overview

What is the objective?

Fix KMS keyword validation to properly handle "Not provided" as a valid long-name value for Platforms, Instruments, and Projects. Previously, only nil long-names were handled correctly - any non-nil value (including "Not provided") would fail validation if the KMS entry lacked a long-name field.

What are the changes?

  • Added skip-long-name-validation? helper function that treats nil, empty strings, and "Not provided" (case-insensitive, with trimming) as equivalent
  • Refactored long-name validation logic into a new lookup-by-umm-c-keyword-with-long-name function that:
    • Removes long-name from comparison when it should be skipped
    • Removes long-name from KMS index keys during lookup to enable matching on remaining fields
    • Supports optional transform functions for keyword-specific transformations
  • Added dedicated lookup functions for Instruments and Projects (previously used default path)
  • Updated Platforms lookup to use the new shared helper function

What areas of the application does this impact?

common-app-lib: KMS keyword lookup service and validation logic)

Required Checklist

  • New and existing unit and int tests pass locally and remotely
  • clj-kondo has been run locally and all errors in changed files are corrected
  • I have commented my code, particularly in hard-to-understand areas
  • I have made changes to the documentation (if necessary)
  • My changes generate no new warnings

Additional Checklist

  • I have removed unnecessary/dead code and imports in files I have changed
  • I have cleaned up integration tests by doing one or more of the following:
    • migrated any are2 tests to are3 in files I have changed
    • de-duped, consolidated, removed dead int tests
    • transformed applicable int tests into unit tests
    • reduced number of system state resets by updating fixtures. Ex) (use-fixtures :each (ingest/reset-fixture {})) to be :once instead of :each

Summary by CodeRabbit

  • New Features

    • Added keyword lookup support for instruments and projects in collection metadata.
  • Bug Fixes

    • Improved handling to skip or ignore missing, blank, or "Not provided" long-name values for platforms, instruments, and projects so matching and validation behave correctly.
  • Tests

    • Expanded test coverage for long-name scenarios (nil, blank, "Not provided", invalid) to ensure correct matching and validation.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link

coderabbitai bot commented Nov 20, 2025

Walkthrough

Adds conditional long-name handling to KMS lookups for platforms, instruments, and projects by introducing a generic lookup that can strip ignorable long-names (nil/blank/"Not provided") before cache matching, plus new public wrappers and corresponding unit and integration tests.

Changes

Cohort / File(s) Summary
KMS Lookup Core Logic
common-app-lib/src/cmr/common_app/services/kms_lookup.clj
Adds skip-long-name-validation? and remove-long-name-from-kms-index. Introduces lookup-by-umm-c-keyword-with-long-name (with optional transform-fn) that normalizes keys, conditionally strips :long-name, builds comparison maps, reads cache, and preserves Carmine error handling. Adds public wrappers lookup-by-umm-c-keyword-platforms, -instruments, -projects. Updates lookup-by-umm-c-keyword dispatch to route :instruments and :projects.
KMS Lookup Tests
common-app-lib/test/cmr/common_app/test/services/kms_lookup.clj
Expands sample KMS data (adds PLAT2, INST2, PROJ2). Adds tests for long-name handling across platforms, instruments, projects (nil, blank, empty, "Not provided", invalid) and matching behavior when :long-name is present or skipped.
Integration Tests
system-int-test/test/cmr/system_int_test/ingest/collection/collection_keyword_validation_test.clj
Adds tests "Blank and 'Not provided' long names should not cause validation errors" to ensure collections with ignorable long-names for Platform/Instrument/Project pass validation and invalid long-name combinations fail with expected errors.

Sequence Diagram(s)

sequenceDiagram
    participant Caller as Caller
    participant Dispatcher as lookup-by-umm-c-keyword
    participant Wrapper as wrapper (platforms/instruments/projects)
    participant Generic as lookup-by-umm-c-keyword-with-long-name
    participant Validator as skip-long-name-validation?
    participant Cache as KMS Cache

    Caller->>Dispatcher: lookup request (keyword-scheme, umm-c-keyword)
    Dispatcher->>Wrapper: route to platform/instrument/project wrapper
    Wrapper->>Generic: call with umm-c-keyword + optional transform-fn
    Generic->>Validator: check if :long-name is ignorable
    alt ignorable (nil/blank/"Not provided")
        Generic->>Generic: remove :long-name from comparison/index
    else use long-name
        Generic->>Generic: include :long-name in comparison
    end
    Generic->>Cache: read using prepared comparison map
    Cache-->>Generic: cached match or nil
    Generic-->>Wrapper: return match (or propagate Carmine error)
    Wrapper-->>Dispatcher: propagate result
    Dispatcher-->>Caller: final lookup response
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Review long-name detection logic in skip-long-name-validation? (case-insensitive "Not provided", blank, nil).
  • Verify remove-long-name-from-kms-index preserves index semantics and key normalization.
  • Confirm transform-fn usage (e.g., :type:category) is applied consistently in wrappers.
  • Check Carmine connection/error handling remains correct across new paths.
  • Validate added tests cover edge cases for blank, nil, empty, and "Not provided" long-names.

Poem

🐇 I hop where KMS entries lie,

Long-names whispered, "Not provided" — I sigh.
Nil or blank, I skip with care,
Match platforms, instruments, projects fair.
Hoppity-hop — lookups leap high!

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically references the issue (CMR-10801) and summarizes the main change: avoiding validation errors for keyword long names when not provided.
Description check ✅ Passed The description covers all required template sections with clear explanations of objectives, changes, and impacted areas. All required checklist items are marked as complete.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch CMR-10801

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
common-app-lib/test/cmr/common_app/test/services/kms_lookup.clj (1)

243-310: Long-name tests thoroughly exercise nil/blank/'Not provided' semantics

The new blank-and-not-provided-long-name-validation-test:

  • Verifies that nil/empty/"Not provided" all match KMS entries missing long-names for platforms, instruments, and projects.
  • Confirms that non-blank, non-"Not provided" long-names still fail, even when short-name is valid.
  • Reasserts CMRS-4400 behavior for platforms with long-names in KMS (PLAT1) for both nil and exact long-name cases.

This gives solid unit-level coverage of the new lookup behavior. If we ever need to widen guarantees, optional follow-up would be to add analogous “KMS has long-name, UMM says 'Not provided'” tests for instruments/projects, but that’s not required for this PR.

common-app-lib/src/cmr/common_app/services/kms_lookup.clj (1)

279-333: Generic long-name-aware lookup and wrappers are sound; consider minor doc/perf clarifications

The new lookup-by-umm-c-keyword-with-long-name plus the platform/instrument/project wrappers achieves:

  • Consistent normalization (kebab-case + optional transform) across the three keyword schemes.
  • Conditional stripping of :long-name on both the UMM keyword and KMS index when skip-long-name-validation? is true, so nil/blank/"Not provided" no longer cause false negatives.
  • Preservation of strict matching whenever :long-name is any other non-blank value, as verified by the new tests.

A couple of small, non-blocking follow-ups you might consider:

  • The remove-long-name-from-kms-index docstring still talks only about “nil” long-names; updating it to mention blank/"Not provided" and all three schemes would better reflect current use.
  • remove-long-name-from-kms-index rebuilds the entire scheme index map on each lookup when skipping long-names. This was already true for the older platform path, but it now applies to instruments/projects as well. If lookups on these schemes become hot paths, precomputing or caching a “no-long-name” variant per scheme could avoid repeated work.
  • If KMS ever contained multiple entries that differ only by :long-name for the same short-name (and other fields), stripping :long-name could collapse them; worth confirming that KMS schema/constraints prevent such duplicates.

These are refinements rather than correctness issues; the current behavior matches the tests and PR intent.

Also applies to: 335-357, 367-368

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 21d5106 and 18888db.

📒 Files selected for processing (3)
  • common-app-lib/src/cmr/common_app/services/kms_lookup.clj (3 hunks)
  • common-app-lib/test/cmr/common_app/test/services/kms_lookup.clj (2 hunks)
  • system-int-test/test/cmr/system_int_test/ingest/collection/collection_keyword_validation_test.clj (1 hunks)
🔇 Additional comments (3)
system-int-test/test/cmr/system_int_test/ingest/collection/collection_keyword_validation_test.clj (1)

477-529: New 'Not provided' integration tests are well-targeted and aligned with existing patterns

The added tests cleanly cover:

  • Valid cases for Platforms/Instruments/Projects when long-name is "Not provided".
  • Negative cases ensuring arbitrary non-blank long names still fail even when short names are valid.

Assertions reuse existing helpers and message formats, so they’ll stay in sync with prior behavior. No changes needed here.

common-app-lib/test/cmr/common_app/test/services/kms_lookup.clj (1)

14-19: Sample KMS additions for PLAT2/INST2/PROJ2 look consistent

The extra platform/instrument/project entries cleanly model the “no long-name in KMS” scenarios without disturbing existing tests. Keys follow the same shape and casing as the rest of sample-map.

common-app-lib/src/cmr/common_app/services/kms_lookup.clj (1)

270-277: skip-long-name-validation? correctly captures the intended 'Not provided' semantics

The helper cleanly treats nil, blank, and case/whitespace-insensitive "Not provided" as “no long-name,” which matches the PR objective and keeps the behavior centralized. No changes needed here.

@daniel-zamora daniel-zamora changed the title CMR-10801: refactors longname validation to handle not provided CMR-10801: As a metadata steward, I should not get a CMR validation error for a keyword long name when it’s not provided and not required Nov 20, 2025
@codecov-commenter
Copy link

codecov-commenter commented Nov 20, 2025

Codecov Report

❌ Patch coverage is 96.87500% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 58.18%. Comparing base (d3fbc7b) to head (dc860e9).

Files with missing lines Patch % Lines
...app-lib/src/cmr/common_app/services/kms_lookup.clj 96.87% 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #2350   +/-   ##
=======================================
  Coverage   58.17%   58.18%           
=======================================
  Files        1063     1063           
  Lines       72197    72214   +17     
  Branches     2084     2082    -2     
=======================================
+ Hits        42003    42017   +14     
- Misses      28254    28256    +2     
- Partials     1940     1941    +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
common-app-lib/src/cmr/common_app/services/kms_lookup.clj (1)

270-277: Optional: Redundant nil check.

The explicit (nil? long-name) check on line 274 is technically redundant because string/blank? already returns true for nil values. However, this makes the intent explicit and has no performance impact.

If you prefer a more concise version, you could simplify to:

-(defn- skip-long-name-validation?
-  "Returns true if the long name should be skipped in validation.
-   Long name is skipped if it's nil, empty string, or 'Not provided' (case-insensitive)."
-  [long-name]
-  (or (nil? long-name)
-      (string/blank? long-name)
-      (when (string? long-name)
-        (= "not provided" (string/lower-case (string/trim long-name))))))
+(defn- skip-long-name-validation?
+  "Returns true if the long name should be skipped in validation.
+   Long name is skipped if it's nil, empty string, or 'Not provided' (case-insensitive)."
+  [long-name]
+  (or (string/blank? long-name)
+      (when (string? long-name)
+        (= "not provided" (string/lower-case (string/trim long-name))))))
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 18888db and 55a5a73.

📒 Files selected for processing (3)
  • common-app-lib/src/cmr/common_app/services/kms_lookup.clj (3 hunks)
  • common-app-lib/test/cmr/common_app/test/services/kms_lookup.clj (2 hunks)
  • system-int-test/test/cmr/system_int_test/ingest/collection/collection_keyword_validation_test.clj (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • system-int-test/test/cmr/system_int_test/ingest/collection/collection_keyword_validation_test.clj
🔇 Additional comments (8)
common-app-lib/src/cmr/common_app/services/kms_lookup.clj (6)

279-285: LGTM!

The function correctly removes :long-name from all keys in the KMS index map to enable matching on remaining fields when long-name validation should be skipped. The implementation is straightforward and correct.


308-333: LGTM!

The generic lookup function correctly handles long-name validation by:

  1. Conditionally removing long-name from the input keyword when it should be skipped (nil, blank, or "Not provided")
  2. Conditionally removing long-name from KMS index keys to enable matching on remaining fields
  3. Preserving normal matching behavior when long-name has a real value

The logic correctly handles all scenarios:

  • Accepts nil/blank/"Not provided" long-names even when KMS entry lacks a long-name
  • Accepts exact long-name matches when both are present
  • Rejects invalid long-name values
  • Maintains consistent error handling with other lookup functions

335-345: LGTM!

The refactoring correctly delegates to the new generic function while preserving the existing :type → :category transformation for platforms. This maintains backward compatibility while gaining the long-name validation improvements.


347-351: LGTM!

New dedicated lookup function for instruments that correctly delegates to the generic long-name handler. This enables instruments to accept nil/blank/"Not provided" long-names as intended by the PR objectives.


353-357: LGTM!

New dedicated lookup function for projects that correctly delegates to the generic long-name handler. This enables projects to accept nil/blank/"Not provided" long-names as intended by the PR objectives.


367-368: LGTM!

The dispatch correctly routes instruments and projects to their new dedicated lookup functions, ensuring they benefit from the long-name validation logic. The integration is clean and maintains backward compatibility for other keyword schemes.

common-app-lib/test/cmr/common_app/test/services/kms_lookup.clj (2)

14-19: LGTM!

The test data additions correctly provide KMS entries without long-name fields (PLAT2, INST2, PROJ2) to enable testing the new validation logic. These complement the existing entries with long-names (PLAT1, INST1, PROJ1), providing comprehensive test coverage for both scenarios.


243-310: LGTM!

Comprehensive test coverage for the new long-name validation logic:

  1. Tests all three keyword types (platforms, instruments, projects) with identical scenarios
  2. Covers all skip-long-name cases: nil, empty string, and case-insensitive "Not provided"
  3. Validates rejection of invalid long-names even with valid short-names
  4. Tests CMR-4400 behavior where nil long-name matches KMS entries that have long-names (lines 301-310)

The tests correctly verify that when long-name is skipped, matching occurs on remaining fields regardless of whether the KMS entry has a long-name.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
common-app-lib/test/cmr/common_app/test/services/kms_lookup.clj (1)

243-310: New long-name tests are solid; consider an extra case and minor dedup

The new blank-and-not-provided-long-name-validation-test thoroughly covers nil, empty string, "Not provided" (case-insensitive), and invalid long-name cases for platforms, instruments, and projects, and also reasserts the CMR-4400 behavior for platforms with long-names in KMS. This is good coverage.

Two optional follow-ups:

  • Add a case where KMS has a long-name (e.g., PLAT1) and the UMM long-name is "Not provided" to explicitly lock in that behavior, since the helper treats "Not provided" equivalently to nil/blank.
  • Consider folding the earlier CMR-4400 test (Lines 191–195) into this new group, or vice versa, to avoid testing the same nil-long-name platform behavior in two places.
common-app-lib/src/cmr/common_app/services/kms_lookup.clj (2)

270-278: skip-long-name-validation? matches requirements; minor robustness/doc tweaks possible

The predicate correctly treats nil, blank, and "Not provided" (case-insensitive, trimmed) as “not provided,” which aligns with the PR intent.

If you want to harden this a bit and make the doc more precise, you could:

  • Reflect that whitespace-only strings are also treated as blank.
  • Guard string/blank? behind string? to avoid surprises if a non-string ever sneaks in.

Example tweak:

(defn- skip-long-name-validation?
-  "Returns true if the long name should be skipped in validation.
-   Long name is skipped if it's nil, empty string, or 'Not provided' (case-insensitive)."
+  "Returns true if the long name should be skipped in validation.
+   Long name is skipped if it's nil, blank/whitespace, or 'Not provided' (case-insensitive)."
  [long-name]
-  (or (nil? long-name)
-      (string/blank? long-name)
-      (when (string? long-name)
-        (= "not provided" (string/lower-case (string/trim long-name))))))
+  (or (nil? long-name)
+      (and (string? long-name)
+           (or (string/blank? long-name)
+               (= "not provided"
+                  (-> long-name string/trim string/lower-case))))))

308-368: Generic long-name lookup refactor is correct; watch out for repeated full-map rewrites

The new lookup-by-umm-c-keyword-with-long-name helper plus the platform/instrument/project wrappers look logically sound:

  • UMM keywords are kebab-cased, optionally transformed (e.g., :type:category for platforms), and only include :long-name in the comparison when it should be validated.
  • When long-name is “skipped” (nil/blank/"Not provided"), you drop it from the comparison map and also from the KMS index keys, so KMS entries with and without long-names can match on the remaining fields.
  • Error handling and logging follow the existing pattern, and instruments/projects now get the same long-name behavior that platforms already had.

One thing to keep an eye on: whenever skip-long-name? is true, remove-long-name-from-kms-index walks the entire scheme map and builds a new map on every lookup. If the KMS index for platforms/instruments/projects is large and these lookups are hot, that extra O(N) work per call could add up.

If this becomes a hotspot, consider a follow-up refactor such as:

  • Precomputing and storing a “no-long-name” index variant at index-build time, or
  • Memoizing remove-long-name-from-kms-index per scheme (e.g., keyed by the value identity or a stable hash) so repeated lookups reuse the same transformed map.

Also, the docstring on remove-long-name-from-kms-index (“only want to validate long-name if it is not nil”) is now slightly out of date given the broader skip semantics; updating that comment in a follow-up would avoid confusion.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 55a5a73 and dc860e9.

📒 Files selected for processing (3)
  • common-app-lib/src/cmr/common_app/services/kms_lookup.clj (3 hunks)
  • common-app-lib/test/cmr/common_app/test/services/kms_lookup.clj (2 hunks)
  • system-int-test/test/cmr/system_int_test/ingest/collection/collection_keyword_validation_test.clj (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • system-int-test/test/cmr/system_int_test/ingest/collection/collection_keyword_validation_test.clj
🔇 Additional comments (1)
common-app-lib/test/cmr/common_app/test/services/kms_lookup.clj (1)

14-19: Sample KMS entries for PLAT2/INST2/PROJ2 look consistent with new scenarios

The added platform/instrument/project entries correctly model “no long-name in KMS” cases and align with the new tests (PLAT2 has :category only, INST2/PROJ2 have only :short-name). No issues here.

@daniel-zamora daniel-zamora merged commit 159f56a into master Dec 3, 2025
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants