
CMR-11116 and CMR-11106: Improve error handling for launchpad tokens#2398

Open
daniel-zamora wants to merge 4 commits into master from CMR-11116

@daniel-zamora daniel-zamora commented Mar 9, 2026

Overview

What is the objective?

Fix launchpad token timeout behavior and improve error propagation from EDL (Earthdata Login) to provide more actionable error messages to users.

What are the changes?

  1. CMR-11116: Fixed launchpad invalid token timeout
  • Fixed cache eviction logic so errors are not cached when re-validating expired tokens
  • Added transient error handling for 429 (rate limit) and 504 (gateway timeout) errors - these are no longer cached and are passed through immediately for retry
  • Corrected token expiration behavior to use absolute expiration times instead of relative times
  2. CMR-11106: Added error propagation from EDL for launchpad tokens
  • Added specific error type mappings for different HTTP status codes (408, 429, 500, 502, 503, 504, 401)
  • EDL error messages are now passed through directly to users instead of being wrapped in generic error messages
  • Improved error messages to be more specific about the nature of the failure (e.g., "Rate limit exceeded", "Gateway timeout")
  3. CMR-11116: Added tests for launchpad token caching behavior
  • Added comprehensive tests for token expiration, cache eviction, transient vs non-transient error handling
  • Enhanced mock URS to support stateful token expiration for realistic testing
  • Added cache size assertions and multi-key eviction tests
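
The switch from relative to absolute expiration times listed above can be illustrated with a minimal sketch. It is not the PR's code: the namespace and the function names `with-absolute-expiration` and `expired?` are hypothetical, and `java.time.Instant` stands in for whatever time library the project actually uses. The point is that the expiry instant is computed once, when the entry is cached, so later checks do not drift with the time of the check.

```clojure
(ns example.absolute-expiration
  (:import (java.time Instant)))

;; Hypothetical helper: turn a relative lifetime (seconds) into an
;; absolute Instant, fixed at the moment the entry is cached.
(defn with-absolute-expiration
  [entry expires-in-seconds ^Instant now]
  (assoc entry :expiration-time (.plusSeconds now (long expires-in-seconds))))

;; An entry is expired once `now` has reached its stored expiration instant.
(defn expired?
  [entry ^Instant now]
  (not (.isBefore now ^Instant (:expiration-time entry))))
```

With an entry cached at some instant `t` and a lifetime of 1600 seconds, `expired?` is false at `t + 1599s` and true from `t + 1600s` onward, regardless of how many times the entry was read in between.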

What areas of the application does this impact?

  • Ingest - Launchpad token authentication and caching
  • Access Control - User authentication via Launchpad tokens
  • transmit-lib - URS client and launchpad user cache
  • mock-echo-app - Mock URS for testing

Required Checklist

  • New and existing unit and int tests pass locally and remotely
  • clj-kondo has been run locally and all errors in changed files are corrected
  • I have commented my code, particularly in hard-to-understand areas
  • I have made changes to the documentation (if necessary)
  • My changes generate no new warnings

Additional Checklist

  • I have removed unnecessary/dead code and imports in files I have changed
  • I have cleaned up integration tests by doing one or more of the following:
    • migrated any are2 tests to are3 in files I have changed
    • de-duped, consolidated, removed dead int tests
    • transformed applicable int tests into unit tests
    • reduced number of system state resets by updating fixtures. Ex) (use-fixtures :each (ingest/reset-fixture {})) to be :once instead of :each

Summary by CodeRabbit

  • New Features

    • Enhanced error handling for additional HTTP statuses (408, 429, 500, 502, 503, 504) with clearer responses for timeouts, rate limits, and gateway issues.
    • Improved Launchpad token validation with expiration tracking and deterministic test tokens.
    • Added cache eviction capability for finer cache control and to avoid caching transient failures.
  • Tests

    • Added tests covering transient vs. non-transient error caching and cache eviction behavior.

coderabbitai bot commented Mar 9, 2026

Warning

Rate limit exceeded

@daniel-zamora has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 6 minutes and 38 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7df8a62d-7c15-4687-ae45-b4f714d90e11

📥 Commits

Reviewing files that changed from the base of the PR and between 5ef6ea1 and 81088e0.

📒 Files selected for processing (2)
  • system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj
  • transmit-lib/src/cmr/transmit/launchpad_user_cache.clj
📝 Walkthrough

Walkthrough

Adds cache eviction to the cache API and in-memory implementation; expands HTTP error-to-status mappings; refines Launchpad token validation and caching (transient-error detection, SHA-256 cache keys, eviction on expiration, and test-token behaviors); and adds tests for eviction and Launchpad error caching behavior.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Cache protocol & impl | common-lib/src/cmr/common/cache.clj, common-lib/src/cmr/common/cache/in_memory_cache.clj | Added evict method to CmrCache protocol and implemented evict in InMemoryCache (uses core.cache/evict via atom swap). |
| Cache tests | common-lib/test/cmr/common/test/cache/in_memory_cache.clj | Added tests verifying evict removes keys, triggers reload logic, and preserves unrelated entries. |
| HTTP error mappings & handling | common-lib/src/cmr/common/api/errors.clj, transmit-lib/src/cmr/transmit/urs.clj, transmit-lib/src/cmr/transmit/tokens.clj | Expanded type->http-status-code with :request-timeout, :internal-server-error, :bad-gateway; added specific handling for 408/429/500/502/503/504 in Launchpad response processing and 429 handling in token flow. |
| Launchpad caching & token logic | transmit-lib/src/cmr/transmit/launchpad_user_cache.clj, transmit-lib/src/cmr/transmit/urs.clj | Added transient-error-types, switched cache keys to DigestUtils/sha256Hex(token), evict expired entries, avoid caching transient errors (e.g., 429, 504), and standardize cached error payloads with :error-type and expiration metadata. |
| Mock Echo test tokens & routes | mock-echo-app/src/cmr/mock_echo/api/urs.clj, mock-echo-app/src/cmr/mock_echo/api/routes.clj | Added launchpad-token-validations atom and reset-launchpad-tokens!; deterministic test-token expiration behaviors; route reset now calls reset-launchpad-tokens!; removed unused requires. |
| Integration tests | system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj | Added tests ensuring transient errors (429, 504) are not cached and non-transient errors (401) are cached; changed token-key derivation to SHA-256. |
| Project manifests | project.clj, deps.edn | Manifest files referenced/updated for dependency/import changes (e.g., DigestUtils import usage). |
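
As a rough illustration of the error mapping summarized above, the following sketch maps upstream HTTP statuses to error-type keywords and passes the upstream message through unchanged. The map name `status->error-type`, the function `response->error`, and the fallback message are hypothetical; the keywords for 401, 429, 503 and 504 are assumed rather than quoted from the PR, and the `:error` body key follows the mock URS response shape shown elsewhere in this conversation.

```clojure
(ns example.status-mapping)

;; Hypothetical lookup table; :request-timeout, :internal-server-error and
;; :bad-gateway are named in the walkthrough, the others are assumptions.
(def status->error-type
  {401 :unauthorized
   408 :request-timeout
   429 :too-many-requests
   500 :internal-server-error
   502 :bad-gateway
   503 :service-unavailable
   504 :gateway-timeout})

(defn response->error
  "Builds an error map from an upstream response. The upstream message is
  passed through as-is when present; otherwise a generic one is used."
  [{:keys [status body]}]
  {:error-type (status->error-type status)
   :error-message (or (:error body)
                      (str "Upstream returned HTTP " status))})
```

For example, `(response->error {:status 429 :body {:error "Rate limit exceeded"}})` yields an error of type `:too-many-requests` carrying the upstream text rather than a generic wrapper message.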

Sequence Diagram

sequenceDiagram
    participant Client
    participant Cache as "Token Cache\n(ring/cache)"
    participant EDL as "EDL / Launchpad"
    participant Err as "Error Handler"

    Client->>Cache: validate(token)
    alt Cached & valid
        Cache-->>Client: return cached user
    else Cached & error
        Cache-->>Client: return cached error payload
    else Not cached or expired
        Cache->>Cache: evict(key)
        Client->>EDL: request validation(token)
        alt Success (200)
            EDL-->>Cache: user data
            Cache->>Cache: store user with expiration
            Cache-->>Client: return user
        else Transient (429,504)
            EDL-->>Err: transient error
            Err-->>Client: return error (do not cache)
        else Non-transient (401,408,500,502,503)
            EDL-->>Cache: error payload
            Cache->>Cache: store error with expiration
            Cache-->>Client: return error
        end
    end
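
The flow in the diagram can be approximated in a few lines. This is a sketch under stated assumptions, not the code in launchpad_user_cache.clj: the cache is a plain atom holding a map, `validate-fn` is a hypothetical stand-in for the call to EDL, entries carry an absolute `:expiration-time` in epoch milliseconds, and only 429 and 504 are treated as transient, as in the diagram.

```clojure
(ns example.launchpad-flow)

;; Error types that are returned to the caller but never cached.
(def transient-error-types #{:too-many-requests :gateway-timeout})

(defn validate-token
  "cache is an atom of {cache-key entry}; validate-fn performs the upstream
  validation and returns an entry map; now-ms is the current epoch time."
  [cache cache-key validate-fn now-ms]
  (let [entry (get @cache cache-key)]
    (if (and entry (< now-ms (:expiration-time entry)))
      ;; Cached and still live: serve it, whether it is a user or an error.
      entry
      ;; Missing or expired: evict first, then revalidate upstream.
      (let [_ (swap! cache dissoc cache-key)
            fresh (validate-fn)]
        (when-not (contains? transient-error-types (:error-type fresh))
          (swap! cache assoc cache-key fresh))
        fresh))))
```

Because eviction happens before revalidation, a transient failure during revalidation leaves no entry behind, so the next request goes straight back to the upstream service.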

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • zimzoom
  • jmaeng72

Poem

🐰
I nibbled stale keys beneath the log,
Evicted cobwebs from the cachey bog.
Tokens ticked, some fleeting, some grave,
I keep the swift, but never save the wave.
Hop, validate — a rabbit's rave!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly identifies the main changes: improving error handling for launchpad tokens across two related issues (CMR-11116 and CMR-11106). |
| Description check | ✅ Passed | The PR description comprehensively covers objectives, detailed changes across three main areas (cache eviction, error propagation, tests), and impacted components. Required and additional checklists are included. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@daniel-zamora daniel-zamora changed the title from "Cmr 11116" to "CMR-11116 and CMR-11106: Improve error handling for launchpad tokens" Mar 9, 2026
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj (1)

98-109: Clarify test intent: verifies re-fetch after cache expiry, not cached error persistence.

The test name suggests "Non-transient errors are cached for 5 minutes," but advancing time by 301 seconds exceeds the 5-minute (300s) cache TTL. At that point, the cache entry would have expired, so the second request is actually re-fetching from the mock (which still returns 401).

If the intent is to verify the error remains cached within the 5-minute window, consider advancing time by less than 300 seconds. If the intent is to verify behavior after cache expiry, the test name and comment should reflect that.

💡 If testing cached error persistence within TTL
-      (dev-sys-util/advance-time! 301)
+      (dev-sys-util/advance-time! 200) ;; Still within 5-minute cache TTL
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj`
around lines 98 - 109, The test non-transient-errors-are-cached-test is
ambiguous: either change the TTL advance to stay within 5 minutes (use
dev-sys-util/advance-time! 299) to verify the 401 stays cached and the second
ingest/ingest-concept still hits the cache, or keep the 301s advance but rename
the test (and any assertion text) to indicate "after cache expiry" and add an
assertion using cache-util/list-cache-keys (or check token-key) that the cache
entry is gone before the second ingest/ingest-concept; update the test
name/comment accordingly and adjust assertions to match the chosen intent.
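
The ambiguity described in this comment comes down to which side of the TTL the second request lands on. A small sketch, assuming the 300-second TTL quoted in the comment and a hypothetical predicate `served-from-cache?` (whether the real cache treats exactly 300 seconds as expired is also an assumption):

```clojure
(ns example.ttl-boundary)

;; 5-minute TTL for cached invalid-token results, as quoted in the review.
(def invalid-token-ttl-seconds 300)

(defn served-from-cache?
  "True when a request at now-seconds still falls inside the TTL of an
  entry cached at cached-at-seconds."
  [cached-at-seconds now-seconds]
  (< (- now-seconds cached-at-seconds) invalid-token-ttl-seconds))
```

Advancing time by 299 seconds keeps the second request inside the window, so it exercises the cached error; advancing by 301 seconds puts it outside, so it exercises revalidation instead. Both return 401 against this mock, which is why the status code alone cannot tell the two cases apart.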
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@common-lib/src/cmr/common/cache.clj`:
- Around line 41-43: Implement the missing evict method for each CmrCache
implementation: add an evict implementation to FallbackCache that calls evict on
both primary and fallback caches (or on the underlying cache instances
represented in the FallbackCache record), add an evict to
SingleThreadLookupCache that removes the key from its local lookup map and
delegates to the wrapped cache's evict, and add an evict to RedisCache that
performs the Redis deletion logic currently used for put/remove (using the
existing redis client/connection code) so it deletes the key from Redis; ensure
each method signature matches the CmrCache protocol (evict [cache cache-key])
and delegates to the appropriate existing functions/fields in the FallbackCache,
SingleThreadLookupCache, and RedisCache implementations.

In `@transmit-lib/src/cmr/transmit/launchpad_user_cache.clj`:
- Around line 58-72: When a cached Launchpad entry expires and
get-launchpad-user-fn returns a non-valid result, the code currently throws the
error without re-caching it; change the expired-path so that on a non-valid
fresh-result you set the cache (using cache/set-value with the fresh-result)
before throwing the service error, mirroring the initial cache-miss behavior for
errors (use the same get-launchpad-user-fn handling and caching behavior), and
keep the existing errors/throw-service-error call to raise the error after
caching.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2f90ba1e-3569-4914-8eaa-2a22732dffa3

📥 Commits

Reviewing files that changed from the base of the PR and between 22c44cb and 3d77b74.

📒 Files selected for processing (10)
  • common-lib/src/cmr/common/api/errors.clj
  • common-lib/src/cmr/common/cache.clj
  • common-lib/src/cmr/common/cache/in_memory_cache.clj
  • common-lib/test/cmr/common/test/cache/in_memory_cache.clj
  • mock-echo-app/src/cmr/mock_echo/api/routes.clj
  • mock-echo-app/src/cmr/mock_echo/api/urs.clj
  • system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj
  • transmit-lib/src/cmr/transmit/launchpad_user_cache.clj
  • transmit-lib/src/cmr/transmit/tokens.clj
  • transmit-lib/src/cmr/transmit/urs.clj

Comment on lines +41 to +43
(evict
[cache cache-key]
"Removes the value associated with the given key from the cache.")
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find all files that implement CmrCache protocol
echo "=== Files implementing CmrCache protocol ==="
rg -l 'cache/CmrCache|defrecord.*CmrCache' --type clojure

echo ""
echo "=== Checking for evict implementations ==="
# Search for evict implementations in cache-related files
rg -n '\(evict\b' --type clojure -g '*cache*'

Repository: nasa/Common-Metadata-Repository

Length of output: 38625


🏁 Script executed:

#!/bin/bash
# Check FallbackCache implementation
echo "=== FallbackCache implementation ==="
rg -n "defrecord\|evict" common-lib/src/cmr/common/cache/fallback_cache.clj -A 2

echo ""
echo "=== SingleThreadLookupCache implementation ==="
rg -n "defrecord\|evict" common-lib/src/cmr/common/cache/single_thread_lookup_cache.clj -A 2

echo ""
echo "=== RedisCache implementation ==="
rg -n "defrecord\|evict" redis-utils-lib/src/cmr/redis_utils/redis_cache.clj -A 2

Repository: nasa/Common-Metadata-Repository

Length of output: 194


🏁 Script executed:

#!/bin/bash
# Check FallbackCache - read file and look for extend-type or defrecord and protocol methods
echo "=== FallbackCache file ==="
wc -l common-lib/src/cmr/common/cache/fallback_cache.clj
cat -n common-lib/src/cmr/common/cache/fallback_cache.clj | head -100

echo ""
echo "=== SingleThreadLookupCache file ==="
wc -l common-lib/src/cmr/common/cache/single_thread_lookup_cache.clj
cat -n common-lib/src/cmr/common/cache/single_thread_lookup_cache.clj | head -120

echo ""
echo "=== RedisCache file ==="
wc -l redis-utils-lib/src/cmr/redis_utils/redis_cache.clj
cat -n redis-utils-lib/src/cmr/redis_utils/redis_cache.clj | head -150

Repository: nasa/Common-Metadata-Repository

Length of output: 14945


🏁 Script executed:

cat -n common-lib/src/cmr/common/cache/single_thread_lookup_cache.clj | tail -85

Repository: nasa/Common-Metadata-Repository

Length of output: 3330


🏁 Script executed:

#!/bin/bash
# Check the CmrCache protocol definition
echo "=== CmrCache Protocol ==="
cat -n common-lib/src/cmr/common/cache.clj | head -60

echo ""
echo "=== in_memory_cache evict implementation ==="
cat -n common-lib/src/cmr/common/cache/in_memory_cache.clj | sed -n '117,180p'

Repository: nasa/Common-Metadata-Repository

Length of output: 5082


🏁 Script executed:

#!/bin/bash
# Check the test implementations
echo "=== granule_counts_cache_test ==="
rg -n "defrecord|extend-type.*CmrCache|evict" search-app/test/cmr/search/test/unit/data/granule_counts_cache_test.clj

echo ""
echo "=== has_granules_or_cwic_test ==="
rg -n "defrecord|extend-type.*CmrCache|evict" search-app/test/cmr/search/test/unit/data/complex_to_simple_converters/has_granules_or_cwic_test.clj

Repository: nasa/Common-Metadata-Repository

Length of output: 143


Missing evict implementation in three CmrCache implementations.

The evict method is required by the CmrCache protocol but is missing from FallbackCache, SingleThreadLookupCache, and RedisCache. Calling evict on these implementations will throw AbstractMethodError at runtime.

  • FallbackCache (common-lib/src/cmr/common/cache/fallback_cache.clj)
  • SingleThreadLookupCache (common-lib/src/cmr/common/cache/single_thread_lookup_cache.clj)
  • RedisCache (redis-utils-lib/src/cmr/redis_utils/redis_cache.clj)

Each implementation should delegate evict appropriately to underlying caches or operations.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@common-lib/src/cmr/common/cache.clj` around lines 41 - 43, Implement the
missing evict method for each CmrCache implementation: add an evict
implementation to FallbackCache that calls evict on both primary and fallback
caches (or on the underlying cache instances represented in the FallbackCache
record), add an evict to SingleThreadLookupCache that removes the key from its
local lookup map and delegates to the wrapped cache's evict, and add an evict to
RedisCache that performs the Redis deletion logic currently used for put/remove
(using the existing redis client/connection code) so it deletes the key from
Redis; ensure each method signature matches the CmrCache protocol (evict [cache
cache-key]) and delegates to the appropriate existing functions/fields in the
FallbackCache, SingleThreadLookupCache, and RedisCache implementations.
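
A minimal sketch of the delegation pattern suggested here, using stand-in types rather than the real CMR records: `SketchCache`, `MapCache` and `TwoTierCache` are hypothetical names, and the actual FallbackCache, SingleThreadLookupCache and RedisCache fields and semantics may differ.

```clojure
(ns example.evict-delegation)

;; Stand-in for the cache protocol, reduced to three methods.
(defprotocol SketchCache
  (get-value [cache cache-key])
  (set-value [cache cache-key value])
  (evict [cache cache-key]))

;; Simple map-backed cache; state is an atom holding a map.
(defrecord MapCache [state]
  SketchCache
  (get-value [_ cache-key] (get @state cache-key))
  (set-value [_ cache-key value] (swap! state assoc cache-key value) nil)
  (evict [_ cache-key] (swap! state dissoc cache-key) nil))

;; Wrapper over two caches: evict must reach both tiers, otherwise a
;; lookup could fall through to a stale value in the second tier.
(defrecord TwoTierCache [primary fallback]
  SketchCache
  (get-value [_ cache-key]
    (let [v (get-value primary cache-key)]
      (if (some? v) v (get-value fallback cache-key))))
  (set-value [_ cache-key value]
    (set-value primary cache-key value)
    (set-value fallback cache-key value))
  (evict [_ cache-key]
    (evict primary cache-key)
    (evict fallback cache-key)))
```

If `evict` were left out of `TwoTierCache`, the record would still compile, and the omission would only surface when the method is first called on it, which matches the runtime failure mode described in this comment.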

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

♻️ Duplicate comments (1)
common-lib/src/cmr/common/cache.clj (1)

41-43: ⚠️ Potential issue | 🟠 Major

Finish the evict rollout before extending CmrCache.

FallbackCache, SingleThreadLookupCache, and RedisCache still implement cache/CmrCache without an evict method. Any cache/evict dispatch to one of those types will fail, so this protocol change is still incomplete.

Run this to confirm every CmrCache implementation picked up the new method. Expected result: no MISSING evict lines.

#!/bin/bash
set -euo pipefail

rg -l --type clojure 'cache/CmrCache' common-lib/src redis-utils-lib/src | while read -r file; do
  if ! rg -q '\(evict\b' "$file"; then
    echo "MISSING evict: $file"
  fi
done
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@common-lib/src/cmr/common/cache.clj` around lines 41 - 43, The protocol
CmrCache was extended with an evict method but FallbackCache,
SingleThreadLookupCache, and RedisCache do not implement it, causing dispatch
failures; add an evict implementation to each of these types (implement the
protocol method named evict / cache/evict) that matches their semantics: for
FallbackCache delegate eviction to both/backing caches or the appropriate inner
cache, for SingleThreadLookupCache remove the key from its local lookup store
and delegate to its backing cache, and for RedisCache perform the appropriate
key removal from Redis; after changes run the provided rg script to verify no
"MISSING evict" results.
🧹 Nitpick comments (2)
system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj (1)

98-110: This test still doesn't prove the cached 401 was refreshed after 5 minutes.

The second request returns 401 whether the cache expired and revalidated or the stale error was served again. Capturing the cached :expiration-time before advancing time and asserting it moves forward afterward would make this test cover the timeout behavior it is naming.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj`
around lines 98 - 110, The test non-transient-errors-are-cached-test should
capture the cache entry's expiration timestamp before advancing time and assert
that it is updated after the second ingest; locate the cache entry using the
token-key variable and cache-util/list-cache-keys or a cache-read helper to
fetch the cached map (which contains :expiration-time), store that
:expiration-time in a local binding, call dev-sys-util/advance-time! 301 and
perform the second ingest, then read the cache entry again and assert that the
new :expiration-time is greater than the previously captured value to prove the
entry was refreshed after expiry.
mock-echo-app/src/cmr/mock_echo/api/urs.clj (1)

66-76: Consider using the stored :expires-in value for consistency.

The :expires-in is stored in the atom on line 75 but never read back. Line 70 always uses the local constant expires-in instead of (:expires-in validation-info). While this works correctly for the current test token (always 1600 seconds), it would be more consistent to use the stored value:

♻️ Proposed fix to use stored expires-in
       (let [expires-in 1600
             validation-info (get @launchpad-token-validations token)]
         (if validation-info
-          (let [elapsed-seconds (t/in-seconds (t/interval (:first-validated-at validation-info) (time-keeper/now)))]
-            (if (>= elapsed-seconds expires-in)
+          (let [stored-expires-in (:expires-in validation-info)
+                elapsed-seconds (t/in-seconds (t/interval (:first-validated-at validation-info) (time-keeper/now)))]
+            (if (>= elapsed-seconds stored-expires-in)
               {:status 401 :body {:error (format "Launchpad token (partially redacted) [%s] has expired."
                                                   (common-util/scrub-token token))}}
-              {:status 200 :body {:uid "user1" :lp_token_expires_in (- expires-in elapsed-seconds)}}))
+              {:status 200 :body {:uid "user1" :lp_token_expires_in (- stored-expires-in elapsed-seconds)}}))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mock-echo-app/src/cmr/mock_echo/api/urs.clj` around lines 66 - 76, The code
binds a local expires-in constant but then stores :expires-in in
launchpad-token-validations and never reads it; update the logic in the branch
that reads validation-info to use (:expires-in validation-info) instead of the
local expires-in when comparing elapsed-seconds and when computing
lp_token_expires_in, so the code consistently uses the stored :expires-in value
from validation-info (the atom launchpad-token-validations and the map key
:expires-in) rather than the hardcoded expires-in local.
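
A standalone sketch of the suggested behavior, reading the stored `:expires-in` back instead of the local constant. It is a simplification and not the mock-echo code: time is passed in as plain seconds instead of going through time-keeper, the token is not scrubbed in the message, and the names `token-validations` and `validate-test-token` are hypothetical.

```clojure
(ns example.mock-token)

;; token -> {:first-validated-at seconds, :expires-in seconds}
(def token-validations (atom {}))

(defn validate-test-token
  "Records the first validation of a token, then answers later calls using
  the stored :expires-in so the remaining lifetime counts down."
  [token now-seconds]
  (let [default-expires-in 1600
        info (or (get @token-validations token)
                 (get (swap! token-validations assoc token
                             {:first-validated-at now-seconds
                              :expires-in default-expires-in})
                      token))
        elapsed (- now-seconds (:first-validated-at info))
        expires-in (:expires-in info)]
    (if (>= elapsed expires-in)
      {:status 401 :body {:error "Launchpad token has expired."}}
      {:status 200 :body {:uid "user1"
                          :lp_token_expires_in (- expires-in elapsed)}})))
```

Reading the value from the stored map means a test could seed a token with a different lifetime without touching the handler.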
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj`:
- Around line 83-96: Update the assertions in transient-errors-not-cached-test
to match the new Launchpad messages emitted by
transmit-lib/src/cmr/transmit/urs.clj: replace the re-find checks for "Rate
limit exceeded" and "Gateway timeout" with patterns that match "Launchpad rate
limit exceeded" and "(gateway timeout)" (or use case-insensitive patterns like
(?i)rate limit exceeded and (?i)gateway timeout) so the ingest/ingest-concept
responses asserted in transient-errors-not-cached-test will match the current
URS messages.
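
The case-insensitive matching suggested above can be sketched as follows. The helper name `mentions?` and the sample messages are illustrative assumptions; the exact strings emitted by urs.clj are not reproduced here.

```clojure
(ns example.message-matching)

(defn mentions?
  "True when message contains phrase, ignoring case. The phrase is quoted
  so that characters such as parentheses are matched literally."
  [phrase message]
  (boolean
   (re-find (re-pattern (str "(?i)" (java.util.regex.Pattern/quote phrase)))
            message)))
```

A case-sensitive pattern such as `#"Rate limit exceeded"` would not match a message that begins "Launchpad rate limit exceeded", whereas `(mentions? "Rate limit exceeded" ...)` does.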

In `@transmit-lib/src/cmr/transmit/launchpad_user_cache.clj`:
- Around line 56-57: Currently cache-key is created with (keyword (str (hash
token))) which interns a keyword per token and exposes collision/memory-leak
risks; change cache-key to be a plain string (not a keyword) derived from the
token or, preferably, a collision-resistant digest/HMAC (e.g., SHA-256 or
HMAC-SHA256 of token) and use that string when calling cache/get-value with
get-launchpad-user-fn so tokens are not interned as keywords and collisions are
minimized.
- Around line 61-74: After evicting the stale entry, do not write a failing
revalidation result back into the cache; change the logic in the block using
cache/evict, get-launchpad-user-fn, cache/set-value, and
errors/throw-service-error so that if fresh-result is valid you set
cache/set-value and return it, but if fresh-result is invalid you do NOT call
cache/set-value and instead directly raise the error
(errors/throw-service-error) using the existing (:error-type/:error-message)
(with the scrubbed token message fallback); this preserves eviction on failed
revalidation instead of caching the failure.
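
For the cache-key suggestion, a string digest avoids interning one keyword per token. The walkthrough says the PR uses DigestUtils/sha256Hex; the sketch below produces the same kind of lowercase hex string with only the JDK's `MessageDigest`, so it runs without the commons-codec dependency. The function name `sha256-hex` is hypothetical.

```clojure
(ns example.cache-key
  (:import (java.nio.charset StandardCharsets)
           (java.security MessageDigest)))

(defn sha256-hex
  "Returns the SHA-256 digest of s as a 64-character lowercase hex string."
  [^String s]
  (let [digest (.digest (MessageDigest/getInstance "SHA-256")
                        (.getBytes s StandardCharsets/UTF_8))]
    ;; %02x formats each byte as two hex digits, treating it as unsigned.
    (apply str (map #(format "%02x" %) digest))))
```

The result is a plain string, so it can be used directly as the cache key, and the raw token never appears in the cache's key listing.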


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e071585c-c7b1-4cf8-bfc2-3caf1c259ae8

📥 Commits

Reviewing files that changed from the base of the PR and between 3d77b74 and 392df50.

📒 Files selected for processing (10)
  • common-lib/src/cmr/common/api/errors.clj
  • common-lib/src/cmr/common/cache.clj
  • common-lib/src/cmr/common/cache/in_memory_cache.clj
  • common-lib/test/cmr/common/test/cache/in_memory_cache.clj
  • mock-echo-app/src/cmr/mock_echo/api/routes.clj
  • mock-echo-app/src/cmr/mock_echo/api/urs.clj
  • system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj
  • transmit-lib/src/cmr/transmit/launchpad_user_cache.clj
  • transmit-lib/src/cmr/transmit/tokens.clj
  • transmit-lib/src/cmr/transmit/urs.clj
🚧 Files skipped from review as they are similar to previous changes (1)
  • common-lib/test/cmr/common/test/cache/in_memory_cache.clj

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj (1)

100-112: Consider removing unused token-key variable.

The token-key computed on line 103 is never used in the test assertions. If no cache key verification is intended, this variable can be removed to improve clarity.

♻️ Proposed fix to remove unused variable
 (deftest non-transient-errors-are-cached-test
   (testing "Non-transient errors are cached for 5 minutes"
-    (let [token "ABC-INV-ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ"
-          token-key (DigestUtils/sha256Hex token)]
+    (let [token "ABC-INV-ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ"]
       (let [resp (ingest/ingest-concept (data-umm-c/collection-concept {}) {:token token})]
         (is (= 401 (:status resp))))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj`
around lines 100 - 112, The test non-transient-errors-are-cached-test computes
token-key using (DigestUtils/sha256Hex token) but never uses it; remove the
unused token-key binding to clean up the test, i.e., delete the token-key
binding declaration that references DigestUtils/sha256Hex and keep the rest of
the test (token, ingest/ingest-concept calls, cache-util/list-cache-keys check,
and dev-sys-util/advance-time! invocation) unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 443d2468-6180-4ffb-89de-9406fc1346c5

📥 Commits

Reviewing files that changed from the base of the PR and between 392df50 and faf4a20.

📒 Files selected for processing (2)
  • system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj
  • transmit-lib/src/cmr/transmit/launchpad_user_cache.clj

@daniel-zamora daniel-zamora force-pushed the CMR-11116 branch 2 times, most recently from 5ef6ea1 to 4ee6718 March 10, 2026 04:35
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@transmit-lib/src/cmr/transmit/launchpad_user_cache.clj`:
- Around line 29-31: The transient-error-types set currently contains only
:too-many-requests and :gateway-timeout, but upstream urs/get-launchpad-user can
also throw :request-timeout, :internal-server-error, :bad-gateway, and
:service-unavailable. Update the transient-error-types definition to include
those four keywords so that the cache logic, which treats every other error as
an invalid-token result (the invalid-token-timeout caching path), skips caching
for these transient server-side failures and lets callers retry immediately.

ℹ️ Review info
📥 Commits

Reviewing files that changed from the base of the PR and between faf4a20 and 5ef6ea1.

📒 Files selected for processing (10)
  • common-lib/src/cmr/common/api/errors.clj
  • common-lib/src/cmr/common/cache.clj
  • common-lib/src/cmr/common/cache/in_memory_cache.clj
  • common-lib/test/cmr/common/test/cache/in_memory_cache.clj
  • mock-echo-app/src/cmr/mock_echo/api/routes.clj
  • mock-echo-app/src/cmr/mock_echo/api/urs.clj
  • system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj
  • transmit-lib/src/cmr/transmit/launchpad_user_cache.clj
  • transmit-lib/src/cmr/transmit/tokens.clj
  • transmit-lib/src/cmr/transmit/urs.clj
🚧 Files skipped from review as they are similar to previous changes (3)
  • common-lib/src/cmr/common/cache/in_memory_cache.clj
  • system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj
  • common-lib/src/cmr/common/api/errors.clj

Comment on lines +29 to +31
(def transient-error-types
  "Error types that should not be cached - they are transient and should be retried immediately"
  #{:too-many-requests :gateway-timeout})
⚠️ Potential issue | 🟠 Major

Treat the new upstream availability errors as transient too.

urs/get-launchpad-user now throws :request-timeout, :internal-server-error, :bad-gateway, and :service-unavailable, but this set still exempts only :too-many-requests (429) and :gateway-timeout (504). The catch block below will therefore cache those server-side failures as invalid-token results for the duration of invalid-token-timeout, which can pin a temporary EDL/Launchpad outage onto valid tokens.

Suggested fix
 (def transient-error-types
   "Error types that should not be cached - they are transient and should be retried immediately"
-  #{:too-many-requests :gateway-timeout})
+  #{:request-timeout
+    :too-many-requests
+    :internal-server-error
+    :bad-gateway
+    :service-unavailable
+    :gateway-timeout})
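
The caching behavior this comment asks for can be illustrated with a minimal, self-contained sketch. It is not the code in launchpad_user_cache.clj: the `lookup-user` helper, the plain atom used as a cache, the entry shape, and the `:unauthorized` error type are all assumptions made for illustration. Only the six keywords in the set come from the suggested fix above.

```clojure
(def transient-error-types
  "Error types treated as transient in this sketch (the expanded set from the suggested fix)."
  #{:request-timeout
    :too-many-requests
    :internal-server-error
    :bad-gateway
    :service-unavailable
    :gateway-timeout})

(defn lookup-user
  "Hypothetical helper. Returns the cached entry for token if one exists,
  otherwise calls fetch-fn. Successful lookups and non-transient failures are
  stored in the cache atom; transient failures are rethrown and never cached."
  [cache token fetch-fn]
  (if-let [entry (get @cache token)]
    entry
    (try
      (let [entry {:valid true :user (fetch-fn token)}]
        (swap! cache assoc token entry)
        entry)
      (catch clojure.lang.ExceptionInfo e
        (let [error-type (:type (ex-data e))]
          (if (contains? transient-error-types error-type)
            ;; Transient upstream failure: leave the cache untouched so the
            ;; next call reaches the upstream service again.
            (throw e)
            ;; Anything else is remembered as an invalid-token result.
            (let [entry {:valid false :error-type error-type}]
              (swap! cache assoc token entry)
              entry)))))))
```

With this shape, a call whose `fetch-fn` throws `(ex-info "Service unavailable" {:type :service-unavailable})` leaves the cache empty, so a later call with the same token can still succeed, while a hypothetical `:unauthorized` failure is stored and returned on subsequent lookups. Expiry of cached entries is deliberately left out of the sketch.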

@codecov-commenter
codecov-commenter commented Mar 10, 2026

Codecov Report

❌ Patch coverage is 7.31707% with 76 lines in your changes missing coverage. Please review.
✅ Project coverage is 57.84%. Comparing base (22c44cb) to head (81088e0).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| transmit-lib/src/cmr/transmit/urs.clj | 0.00% | 35 Missing ⚠️ |
| ...smit-lib/src/cmr/transmit/launchpad_user_cache.clj | 8.00% | 23 Missing ⚠️ |
| mock-echo-app/src/cmr/mock_echo/api/urs.clj | 16.66% | 15 Missing ⚠️ |
| mock-echo-app/src/cmr/mock_echo/api/routes.clj | 0.00% | 2 Missing ⚠️ |
| transmit-lib/src/cmr/transmit/tokens.clj | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2398      +/-   ##
==========================================
- Coverage   57.88%   57.84%   -0.05%     
==========================================
  Files        1067     1067              
  Lines       73351    73417      +66     
  Branches     2121     2122       +1     
==========================================
+ Hits        42460    42466       +6     
- Misses      28912    28971      +59     
- Partials     1979     1980       +1     
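
As a quick sanity check on the patch coverage figure, the per-file missing counts can be summed and turned back into a percentage. The report does not state the patch's total line count; the value 82 below is an assumption inferred from 76 missing lines at 7.31707% coverage, and the names are illustrative only.

```clojure
;; Missing-line counts per file, copied from the Codecov table above.
(def missing-per-file [35 23 15 2 1])

(def total-missing (reduce + missing-per-file))

;; Assumption: inferred from the report's percentage, not stated in it.
(def patch-lines 82)

(defn coverage-pct
  "Percentage of covered lines, as a double."
  [covered total]
  (* 100.0 (/ covered total)))

(def patch-coverage
  (coverage-pct (- patch-lines total-missing) patch-lines))
```

Under that assumption, `total-missing` matches the 76 lines the report mentions and `patch-coverage` agrees with the reported 7.31707% to the precision shown.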
