CMR-11116 and CMR-11106: Improve error handling for launchpad tokens by daniel-zamora · Pull Request #2398 · nasa/Common-Metadata-Repository

daniel-zamora · 2026-03-09T20:12:29Z

Overview

What is the objective?

Fix launchpad token timeout behavior and improve error propagation from EDL (Earthdata Login) to provide more actionable error messages to users.

What are the changes?

CMR-11116: Fixed launchpad invalid token timeout

Fixed cache eviction logic so errors are not cached when re-validating expired tokens
Added transient error handling for 429 (rate limit) and 504 (gateway timeout) errors - these are no longer cached and are passed through immediately for retry
Corrected token expiration behavior to use absolute expiration times instead of relative times
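
The caching rules above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's code: `validate-with-cache`, `error-ttl-ms`, the atom-backed cache, and the shape of the result map are all hypothetical. Only the contents of `transient-error-types` mirror what the PR describes. The sketch also simplifies by treating an expired entry exactly like a cache miss.

```clojure
;; Hypothetical stand-ins; the real definitions live in
;; transmit-lib/src/cmr/transmit/launchpad_user_cache.clj.
(def transient-error-types #{:too-many-requests :gateway-timeout})

;; Assumed 5-minute lifetime for cached (non-transient) errors.
(def error-ttl-ms (* 5 60 1000))

(defn validate-with-cache
  "cache is an atom holding {cache-key entry}.
  validate-fn calls the upstream service and returns either
  {:valid true :uid ... :expires-in-ms n} or {:valid false :error-type kw}.
  now-ms is the current time in milliseconds."
  [cache cache-key validate-fn now-ms]
  (let [entry (get @cache cache-key)]
    (if (and entry (< now-ms (:expires-at-ms entry)))
      ;; Still fresh: serve the cached user or cached error.
      entry
      (let [result (validate-fn)]
        ;; Evict first so a stale entry never outlives a revalidation.
        (swap! cache dissoc cache-key)
        (cond
          ;; Success: store with an absolute expiration time.
          (:valid result)
          (let [fresh (assoc result :expires-at-ms (+ now-ms (:expires-in-ms result)))]
            (swap! cache assoc cache-key fresh)
            fresh)

          ;; Transient failure: pass through without caching so callers can retry.
          (contains? transient-error-types (:error-type result))
          result

          ;; Non-transient failure: cache it for the error TTL.
          :else
          (let [fresh (assoc result :expires-at-ms (+ now-ms error-ttl-ms))]
            (swap! cache assoc cache-key fresh)
            fresh))))))
```

Storing `:expires-at-ms` as an absolute timestamp means the freshness check is a single comparison against the clock, regardless of when the entry was written.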

CMR-11106: Added error propagation from EDL for launchpad tokens

Added specific error type mappings for different HTTP status codes (408, 429, 500, 502, 503, 504, 401)
EDL error messages are now passed through directly to users instead of being wrapped in generic error messages
Improved error messages to be more specific about the nature of the failure (e.g., "Rate limit exceeded", "Gateway timeout")
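
As a rough illustration of the status-to-error-type mapping described above, a lookup table plus a small helper might look like this. The names `status->error-type` and `edl-response->error` are hypothetical, the `:unauthorized` keyword for 401 and the fallback to `:internal-server-error` are assumptions, and the `{:error "..."}` body shape is assumed from the mock URS response. The other keywords are the ones named in the review discussion.

```clojure
;; Hypothetical mapping from EDL HTTP status codes to error-type keywords.
(def status->error-type
  {401 :unauthorized            ; assumed keyword for 401
   408 :request-timeout
   429 :too-many-requests
   500 :internal-server-error
   502 :bad-gateway
   503 :service-unavailable
   504 :gateway-timeout})

(defn edl-response->error
  "Builds an error map from an EDL response, passing the upstream message
  through unchanged when one is present."
  [{:keys [status body]}]
  {:error-type (get status->error-type status :internal-server-error)
   :error-message (or (:error body)
                      (str "Unexpected response status " status))})

;; Example usage:
;; (edl-response->error {:status 429 :body {:error "Rate limit exceeded"}})
```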

CMR-11116: Added tests for launchpad token caching behavior

Added comprehensive tests for token expiration, cache eviction, transient vs non-transient error handling
Enhanced mock URS to support stateful token expiration for realistic testing
Added cache size assertions and multi-key eviction tests

What areas of the application does this impact?

Ingest - Launchpad token authentication and caching
Access Control - User authentication via Launchpad tokens
transmit-lib - URS client and launchpad user cache
mock-echo-app - Mock URS for testing

Required Checklist

New and existing unit and int tests pass locally and remotely
clj-kondo has been run locally and all errors in changed files are corrected
I have commented my code, particularly in hard-to-understand areas
I have made changes to the documentation (if necessary)
My changes generate no new warnings

Additional Checklist

I have removed unnecessary/dead code and imports in files I have changed
I have cleaned up integration tests by doing one or more of the following:
- migrated any are2 tests to are3 in files I have changed
- de-duped, consolidated, removed dead int tests
- transformed applicable int tests into unit tests
- reduced number of system state resets by updating fixtures. Ex) (use-fixtures :each (ingest/reset-fixture {})) to be :once instead of :each

Summary by CodeRabbit

New Features
- Enhanced error handling for additional HTTP statuses (408, 429, 500, 502, 503, 504) with clearer responses for timeouts, rate limits, and gateway issues.
- Improved Launchpad token validation with expiration tracking and deterministic test tokens.
- Added cache eviction capability for finer cache control and to avoid caching transient failures.
Tests
- Added tests covering transient vs. non-transient error caching and cache eviction behavior.

coderabbitai · 2026-03-09T20:12:57Z


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 7df8a62d-7c15-4687-ae45-b4f714d90e11

📥 Commits

Reviewing files that changed from the base of the PR and between 5ef6ea1 and 81088e0.

📒 Files selected for processing (2)

system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj
transmit-lib/src/cmr/transmit/launchpad_user_cache.clj

📝 Walkthrough

Walkthrough

Adds cache eviction to the cache API and in-memory implementation; expands HTTP error-to-status mappings; refines Launchpad token validation and caching (transient-error detection, SHA-256 cache keys, eviction on expiration, and test-token behaviors); and adds tests for eviction and Launchpad error caching behavior.

Changes

Cohort / File(s)	Summary
Cache protocol & impl `common-lib/src/cmr/common/cache.clj`, `common-lib/src/cmr/common/cache/in_memory_cache.clj`	Added `evict` method to `CmrCache` protocol and implemented `evict` in `InMemoryCache` (uses `core.cache/evict` via atom swap).
Cache tests `common-lib/test/cmr/common/test/cache/in_memory_cache.clj`	Added tests verifying `evict` removes keys, triggers reload logic, and preserves unrelated entries.
HTTP error mappings & handling `common-lib/src/cmr/common/api/errors.clj`, `transmit-lib/src/cmr/transmit/urs.clj`, `transmit-lib/src/cmr/transmit/tokens.clj`	Expanded `type->http-status-code` with `:request-timeout`, `:internal-server-error`, `:bad-gateway`; added specific handling for 408/429/500/502/503/504 in Launchpad response processing and 429 handling in token flow.
Launchpad caching & token logic `transmit-lib/src/cmr/transmit/launchpad_user_cache.clj`, `transmit-lib/src/cmr/transmit/urs.clj`	Added `transient-error-types`, switched cache keys to `DigestUtils/sha256Hex(token)`, evict expired entries, avoid caching transient errors (e.g., 429, 504), and standardize cached error payloads with `:error-type` and expiration metadata.
Mock Echo test tokens & routes `mock-echo-app/src/cmr/mock_echo/api/urs.clj`, `mock-echo-app/src/cmr/mock_echo/api/routes.clj`	Added `launchpad-token-validations` atom and `reset-launchpad-tokens!`; deterministic test-token expiration behaviors; route reset now calls `reset-launchpad-tokens!`; removed unused requires.
Integration tests `system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj`	Added tests ensuring transient errors (429, 504) are not cached and non-transient errors (401) are cached; changed token-key derivation to SHA-256.
Project manifests `project.clj`, `deps.edn`	Manifest files referenced/updated for dependency/import changes (e.g., DigestUtils import usage).

Sequence Diagram

sequenceDiagram
    participant Client
    participant Cache as "Token Cache\n(ring/cache)"
    participant EDL as "EDL / Launchpad"
    participant Err as "Error Handler"

    Client->>Cache: validate(token)
    alt Cached & valid
        Cache-->>Client: return cached user
    else Cached & error
        Cache-->>Client: return cached error payload
    else Not cached or expired
        Cache->>Cache: evict(key)
        Client->>EDL: request validation(token)
        alt Success (200)
            EDL-->>Cache: user data
            Cache->>Cache: store user with expiration
            Cache-->>Client: return user
        else Transient (429,504)
            EDL-->>Err: transient error
            Err-->>Client: return error (do not cache)
        else Non-transient (401,408,500,502,503)
            EDL-->>Cache: error payload
            Cache->>Cache: store error with expiration
            Cache-->>Client: return error
        end
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

CMR-10969: Cache failed LP token validation #2354: Related changes to launchpad_user_cache behavior and handling of failed Launchpad validations.
CMR-10969: Cache failed LP token validation #2363: Related modifications to transient-error handling and eviction logic in transmit-lib's Launchpad caching.

Suggested reviewers

zimzoom
jmaeng72

Poem

🐰
I nibbled stale keys beneath the log,
Evicted cobwebs from the cachey bog.
Tokens ticked, some fleeting, some grave,
I keep the swift, but never save the wave.
Hop, validate — a rabbit's rave!

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly identifies the main changes: improving error handling for launchpad tokens across two related issues (CMR-11116 and CMR-11106).
Description check	✅ Passed	The PR description comprehensively covers objectives, detailed changes across three main areas (cache eviction, error propagation, tests), and impacted components. Required and additional checklists are included.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.


coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj (1)
98-109: Clarify test intent: verifies re-fetch after cache expiry, not cached error persistence.

The test name suggests "Non-transient errors are cached for 5 minutes," but advancing time by 301 seconds exceeds the 5-minute (300s) cache TTL. At that point, the cache entry would have expired, so the second request is actually re-fetching from the mock (which still returns 401).

If the intent is to verify the error remains cached within the 5-minute window, consider advancing time by less than 300 seconds. If the intent is to verify behavior after cache expiry, the test name and comment should reflect that.
💡 If testing cached error persistence within TTL
-      (dev-sys-util/advance-time! 301)
+      (dev-sys-util/advance-time! 200) ;; Still within 5-minute cache TTL
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj`
around lines 98 - 109, The test non-transient-errors-are-cached-test is
ambiguous: either change the TTL advance to stay within 5 minutes (use
dev-sys-util/advance-time! 299) to verify the 401 stays cached and the second
ingest/ingest-concept still hits the cache, or keep the 301s advance but rename
the test (and any assertion text) to indicate "after cache expiry" and add an
assertion using cache-util/list-cache-keys (or check token-key) that the cache
entry is gone before the second ingest/ingest-concept; update the test
name/comment accordingly and adjust assertions to match the chosen intent.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@common-lib/src/cmr/common/cache.clj`:
- Around line 41-43: Implement the missing evict method for each CmrCache
implementation: add an evict implementation to FallbackCache that calls evict on
both primary and fallback caches (or on the underlying cache instances
represented in the FallbackCache record), add an evict to
SingleThreadLookupCache that removes the key from its local lookup map and
delegates to the wrapped cache's evict, and add an evict to RedisCache that
performs the Redis deletion logic currently used for put/remove (using the
existing redis client/connection code) so it deletes the key from Redis; ensure
each method signature matches the CmrCache protocol (evict [cache cache-key])
and delegates to the appropriate existing functions/fields in the FallbackCache,
SingleThreadLookupCache, and RedisCache implementations.

In `@transmit-lib/src/cmr/transmit/launchpad_user_cache.clj`:
- Around line 58-72: When a cached Launchpad entry expires and
get-launchpad-user-fn returns a non-valid result, the code currently throws the
error without re-caching it; change the expired-path so that on a non-valid
fresh-result you set the cache (using cache/set-value with the fresh-result)
before throwing the service error, mirroring the initial cache-miss behavior for
errors (use the same get-launchpad-user-fn handling and caching behavior), and
keep the existing errors/throw-service-error call to raise the error after
caching.


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2f90ba1e-3569-4914-8eaa-2a22732dffa3

📥 Commits

Reviewing files that changed from the base of the PR and between 22c44cb and 3d77b74.

📒 Files selected for processing (10)

common-lib/src/cmr/common/api/errors.clj
common-lib/src/cmr/common/cache.clj
common-lib/src/cmr/common/cache/in_memory_cache.clj
common-lib/test/cmr/common/test/cache/in_memory_cache.clj
mock-echo-app/src/cmr/mock_echo/api/routes.clj
mock-echo-app/src/cmr/mock_echo/api/urs.clj
system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj
transmit-lib/src/cmr/transmit/launchpad_user_cache.clj
transmit-lib/src/cmr/transmit/tokens.clj
transmit-lib/src/cmr/transmit/urs.clj

coderabbitai · 2026-03-09T20:19:39Z

common-lib/src/cmr/common/cache.clj

+  (evict
+    [cache cache-key]
+    "Removes the value associated with the given key from the cache.")


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find all files that implement CmrCache protocol
echo "=== Files implementing CmrCache protocol ==="
rg -l 'cache/CmrCache|defrecord.*CmrCache' --type clojure
echo ""
echo "=== Checking for evict implementations ==="
# Search for evict implementations in cache-related files
rg -n '\(evict\b' --type clojure -g '*cache*'

Repository: nasa/Common-Metadata-Repository

Length of output: 38625

🏁 Script executed:

#!/bin/bash
# Check FallbackCache implementation
echo "=== FallbackCache implementation ==="
rg -n "defrecord\|evict" common-lib/src/cmr/common/cache/fallback_cache.clj -A 2
echo ""
echo "=== SingleThreadLookupCache implementation ==="
rg -n "defrecord\|evict" common-lib/src/cmr/common/cache/single_thread_lookup_cache.clj -A 2
echo ""
echo "=== RedisCache implementation ==="
rg -n "defrecord\|evict" redis-utils-lib/src/cmr/redis_utils/redis_cache.clj -A 2

Repository: nasa/Common-Metadata-Repository

Length of output: 194

🏁 Script executed:

#!/bin/bash
# Check FallbackCache - read file and look for extend-type or defrecord and protocol methods
echo "=== FallbackCache file ==="
wc -l common-lib/src/cmr/common/cache/fallback_cache.clj
cat -n common-lib/src/cmr/common/cache/fallback_cache.clj | head -100
echo ""
echo "=== SingleThreadLookupCache file ==="
wc -l common-lib/src/cmr/common/cache/single_thread_lookup_cache.clj
cat -n common-lib/src/cmr/common/cache/single_thread_lookup_cache.clj | head -120
echo ""
echo "=== RedisCache file ==="
wc -l redis-utils-lib/src/cmr/redis_utils/redis_cache.clj
cat -n redis-utils-lib/src/cmr/redis_utils/redis_cache.clj | head -150

Repository: nasa/Common-Metadata-Repository

Length of output: 14945

🏁 Script executed:

cat -n common-lib/src/cmr/common/cache/single_thread_lookup_cache.clj | tail -85

Repository: nasa/Common-Metadata-Repository

Length of output: 3330

🏁 Script executed:

#!/bin/bash
# Check the CmrCache protocol definition
echo "=== CmrCache Protocol ==="
cat -n common-lib/src/cmr/common/cache.clj | head -60
echo ""
echo "=== in_memory_cache evict implementation ==="
cat -n common-lib/src/cmr/common/cache/in_memory_cache.clj | sed -n '117,180p'

Repository: nasa/Common-Metadata-Repository

Length of output: 5082

🏁 Script executed:

#!/bin/bash
# Check the test implementations
echo "=== granule_counts_cache_test ==="
rg -n "defrecord|extend-type.*CmrCache|evict" search-app/test/cmr/search/test/unit/data/granule_counts_cache_test.clj
echo ""
echo "=== has_granules_or_cwic_test ==="
rg -n "defrecord|extend-type.*CmrCache|evict" search-app/test/cmr/search/test/unit/data/complex_to_simple_converters/has_granules_or_cwic_test.clj

Repository: nasa/Common-Metadata-Repository

Length of output: 143

Missing evict implementation in three CmrCache implementations.

The evict method is required by the CmrCache protocol but is missing from FallbackCache, SingleThreadLookupCache, and RedisCache. Calling evict on these implementations will throw AbstractMethodError at runtime.

FallbackCache (common-lib/src/cmr/common/cache/fallback_cache.clj)

SingleThreadLookupCache (common-lib/src/cmr/common/cache/single_thread_lookup_cache.clj)

RedisCache (redis-utils-lib/src/cmr/redis_utils/redis_cache.clj)

Each implementation should delegate evict appropriately to underlying caches or operations.
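
To illustrate what delegating `evict` could look like, here is a self-contained sketch. It does not use the real `CmrCache` protocol or the real `FallbackCache` record: `SketchCache`, `AtomCache`, and `TwoTierCache` are hypothetical stand-ins, chosen only to show a wrapper forwarding eviction to both of its underlying caches.

```clojure
;; Hypothetical protocol standing in for CmrCache; only three methods are shown.
(defprotocol SketchCache
  (get-value [cache cache-key] "Returns the cached value, or nil if absent.")
  (set-value [cache cache-key value] "Stores a value under the key.")
  (evict [cache cache-key] "Removes the value associated with the key."))

;; A simple atom-backed cache.
(defrecord AtomCache [state]
  SketchCache
  (get-value [_ cache-key] (get @state cache-key))
  (set-value [_ cache-key value] (swap! state assoc cache-key value) nil)
  (evict [_ cache-key] (swap! state dissoc cache-key) nil))

;; A wrapper over two caches. Every operation, including evict,
;; is forwarded so the two tiers cannot drift apart.
(defrecord TwoTierCache [primary fallback]
  SketchCache
  (get-value [_ cache-key]
    (let [v (get-value primary cache-key)]
      (if (some? v) v (get-value fallback cache-key))))
  (set-value [_ cache-key value]
    (set-value primary cache-key value)
    (set-value fallback cache-key value))
  (evict [_ cache-key]
    (evict primary cache-key)
    (evict fallback cache-key)))
```

If `evict` were forwarded to only one tier, a later `get-value` could still find the stale entry in the other tier, which is the kind of inconsistency the review comment is guarding against.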


transmit-lib/src/cmr/transmit/launchpad_user_cache.clj

coderabbitai

Actionable comments posted: 3

♻️ Duplicate comments (1)

common-lib/src/cmr/common/cache.clj (1)
41-43: ⚠️ Potential issue | 🟠 Major

Finish the evict rollout before extending CmrCache.

FallbackCache, SingleThreadLookupCache, and RedisCache still implement cache/CmrCache without an evict method. Any cache/evict dispatch to one of those types will fail, so this protocol change is still incomplete.

Run this to confirm every CmrCache implementation picked up the new method. Expected result: no MISSING evict lines.
#!/bin/bash
set -euo pipefail

rg -l --type clojure 'cache/CmrCache' common-lib/src redis-utils-lib/src | while read -r file; do
  if ! rg -q '\(evict\b' "$file"; then
    echo "MISSING evict: $file"
  fi
done
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@common-lib/src/cmr/common/cache.clj` around lines 41 - 43, The protocol
CmrCache was extended with an evict method but FallbackCache,
SingleThreadLookupCache, and RedisCache do not implement it, causing dispatch
failures; add an evict implementation to each of these types (implement the
protocol method named evict / cache/evict) that matches their semantics: for
FallbackCache delegate eviction to both/backing caches or the appropriate inner
cache, for SingleThreadLookupCache remove the key from its local lookup store
and delegate to its backing cache, and for RedisCache perform the appropriate
key removal from Redis; after changes run the provided rg script to verify no
"MISSING evict" results.

🧹 Nitpick comments (2)

system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj (1)

98-110: This test still doesn't prove the cached 401 was refreshed after 5 minutes.

The second request returns 401 whether the cache expired and revalidated or the stale error was served again. Capturing the cached :expiration-time before advancing time and asserting it moves forward afterward would make this test cover the timeout behavior it is naming.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj`
around lines 98 - 110, The test non-transient-errors-are-cached-test should
capture the cache entry's expiration timestamp before advancing time and assert
that it is updated after the second ingest; locate the cache entry using the
token-key variable and cache-util/list-cache-keys or a cache-read helper to
fetch the cached map (which contains :expiration-time), store that
:expiration-time in a local binding, call dev-sys-util/advance-time! 301 and
perform the second ingest, then read the cache entry again and assert that the
new :expiration-time is greater than the previously captured value to prove the
entry was refreshed after expiry.

mock-echo-app/src/cmr/mock_echo/api/urs.clj (1)

66-76: Consider using the stored :expires-in value for consistency.

The :expires-in is stored in the atom on line 75 but never read back. Line 70 always uses the local constant expires-in instead of (:expires-in validation-info). While this works correctly for the current test token (always 1600 seconds), it would be more consistent to use the stored value:

♻️ Proposed fix to use stored expires-in

       (let [expires-in 1600
             validation-info (get `@launchpad-token-validations` token)]
         (if validation-info
-          (let [elapsed-seconds (t/in-seconds (t/interval (:first-validated-at validation-info) (time-keeper/now)))]
-            (if (>= elapsed-seconds expires-in)
+          (let [stored-expires-in (:expires-in validation-info)
+                elapsed-seconds (t/in-seconds (t/interval (:first-validated-at validation-info) (time-keeper/now)))]
+            (if (>= elapsed-seconds stored-expires-in)
               {:status 401 :body {:error (format "Launchpad token (partially redacted) [%s] has expired."
                                                   (common-util/scrub-token token))}}
-              {:status 200 :body {:uid "user1" :lp_token_expires_in (- expires-in elapsed-seconds)}}))
+              {:status 200 :body {:uid "user1" :lp_token_expires_in (- stored-expires-in elapsed-seconds)}}))

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@mock-echo-app/src/cmr/mock_echo/api/urs.clj` around lines 66 - 76, The code
binds a local expires-in constant but then stores :expires-in in
launchpad-token-validations and never reads it; update the logic in the branch
that reads validation-info to use (:expires-in validation-info) instead of the
local expires-in when comparing elapsed-seconds and when computing
lp_token_expires_in, so the code consistently uses the stored :expires-in value
from validation-info (the atom launchpad-token-validations and the map key
:expires-in) rather than the hardcoded expires-in local.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj`:
- Around line 83-96: Update the assertions in transient-errors-not-cached-test
to match the new Launchpad messages emitted by
transmit-lib/src/cmr/transmit/urs.clj: replace the re-find checks for "Rate
limit exceeded" and "Gateway timeout" with patterns that match "Launchpad rate
limit exceeded" and "(gateway timeout)" (or use case-insensitive patterns like
(?i)rate limit exceeded and (?i)gateway timeout) so the ingest/ingest-concept
responses asserted in transient-errors-not-cached-test will match the current
URS messages.

In `@transmit-lib/src/cmr/transmit/launchpad_user_cache.clj`:
- Around line 56-57: Currently cache-key is created with (keyword (str (hash
token))) which interns a keyword per token and exposes collision/memory-leak
risks; change cache-key to be a plain string (not a keyword) derived from the
token or, preferably, a collision-resistant digest/HMAC (e.g., SHA-256 or
HMAC-SHA256 of token) and use that string when calling cache/get-value with
get-launchpad-user-fn so tokens are not interned as keywords and collisions are
minimized.
- Around line 61-74: After evicting the stale entry, do not write a failing
revalidation result back into the cache; change the logic in the block using
cache/evict, get-launchpad-user-fn, cache/set-value, and
errors/throw-service-error so that if fresh-result is valid you set
cache/set-value and return it, but if fresh-result is invalid you do NOT call
cache/set-value and instead directly raise the error
(errors/throw-service-error) using the existing (:error-type/:error-message)
(with the scrubbed token message fallback); this preserves eviction on failed
revalidation instead of caching the failure.
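
For the cache-key comment above (lines 56-57), a digest-based string key could be derived as in the sketch below. The PR itself is described as using `DigestUtils/sha256Hex`. This version uses the JDK's `MessageDigest` instead so that it has no external dependency, and the function name `token->cache-key` is hypothetical.

```clojure
(import '(java.security MessageDigest)
        '(java.nio.charset StandardCharsets))

(defn token->cache-key
  "Returns the lowercase hex SHA-256 digest of the token as a plain string.
  Unlike (keyword (str (hash token))), this interns no keywords and is
  collision-resistant."
  [^String token]
  (let [digest (.digest (MessageDigest/getInstance "SHA-256")
                        (.getBytes token StandardCharsets/UTF_8))]
    ;; Mask each signed byte to 0-255 before formatting as two hex digits.
    (apply str (map #(format "%02x" (bit-and % 0xff)) digest))))
```

The key is always 64 hex characters, and the raw token never appears in the cache key listing.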


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e071585c-c7b1-4cf8-bfc2-3caf1c259ae8

📥 Commits

Reviewing files that changed from the base of the PR and between 3d77b74 and 392df50.

📒 Files selected for processing (10)

common-lib/src/cmr/common/api/errors.clj
common-lib/src/cmr/common/cache.clj
common-lib/src/cmr/common/cache/in_memory_cache.clj
common-lib/test/cmr/common/test/cache/in_memory_cache.clj
mock-echo-app/src/cmr/mock_echo/api/routes.clj
mock-echo-app/src/cmr/mock_echo/api/urs.clj
system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj
transmit-lib/src/cmr/transmit/launchpad_user_cache.clj
transmit-lib/src/cmr/transmit/tokens.clj
transmit-lib/src/cmr/transmit/urs.clj

🚧 Files skipped from review as they are similar to previous changes (1)

common-lib/test/cmr/common/test/cache/in_memory_cache.clj

system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj

transmit-lib/src/cmr/transmit/launchpad_user_cache.clj

coderabbitai

🧹 Nitpick comments (1)

system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj (1)

100-112: Consider removing unused token-key variable.

The token-key computed on line 103 is never used in the test assertions. If no cache key verification is intended, this variable can be removed to improve clarity.

♻️ Proposed fix to remove unused variable

 (deftest non-transient-errors-are-cached-test
   (testing "Non-transient errors are cached for 5 minutes"
-    (let [token "ABC-INV-ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ"
-          token-key (DigestUtils/sha256Hex token)]
+    (let [token "ABC-INV-ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ"]
       (let [resp (ingest/ingest-concept (data-umm-c/collection-concept {}) {:token token})]
         (is (= 401 (:status resp))))

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj`
around lines 100 - 112, The test non-transient-errors-are-cached-test computes
token-key using (DigestUtils/sha256Hex token) but never uses it; remove the
unused token-key binding to clean up the test, i.e., delete the token-key
binding declaration that references DigestUtils/sha256Hex and keep the rest of
the test (token, ingest/ingest-concept calls, cache-util/list-cache-keys check,
and dev-sys-util/advance-time! invocation) unchanged.


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 443d2468-6180-4ffb-89de-9406fc1346c5

📥 Commits

Reviewing files that changed from the base of the PR and between 392df50 and faf4a20.

📒 Files selected for processing (2)

system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj
transmit-lib/src/cmr/transmit/launchpad_user_cache.clj

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@transmit-lib/src/cmr/transmit/launchpad_user_cache.clj`:
- Around line 29-31: The transient-error-types set currently only excludes
:too-many-requests and :gateway-timeout but upstream urs/get-launchpad-user can
throw :request-timeout, :internal-server-error, :bad-gateway, and
:service-unavailable; update the transient-error-types definition to include
those four keywords so the cache logic that treats other errors as invalid-token
results (e.g. the invalid-token-timeout/invalid-token caching path) will skip
caching for these transient server-side failures and let callers retry
immediately.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f0de632c-521c-49f9-9bbe-2174a428b74c

📥 Commits

Reviewing files that changed from the base of the PR and between faf4a20 and 5ef6ea1.

📒 Files selected for processing (10)

common-lib/src/cmr/common/api/errors.clj
common-lib/src/cmr/common/cache.clj
common-lib/src/cmr/common/cache/in_memory_cache.clj
common-lib/test/cmr/common/test/cache/in_memory_cache.clj
mock-echo-app/src/cmr/mock_echo/api/routes.clj
mock-echo-app/src/cmr/mock_echo/api/urs.clj
system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj
transmit-lib/src/cmr/transmit/launchpad_user_cache.clj
transmit-lib/src/cmr/transmit/tokens.clj
transmit-lib/src/cmr/transmit/urs.clj

🚧 Files skipped from review as they are similar to previous changes (3)

common-lib/src/cmr/common/cache/in_memory_cache.clj
system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj
common-lib/src/cmr/common/api/errors.clj

coderabbitai · 2026-03-10T04:37:12Z

transmit-lib/src/cmr/transmit/launchpad_user_cache.clj

+(def transient-error-types
+  "Error types that should not be cached - they are transient and should be retried immediately"
+  #{:too-many-requests :gateway-timeout})


⚠️ Potential issue | 🟠 Major

Treat the new upstream availability errors as transient too.

urs/get-launchpad-user now throws :request-timeout, :internal-server-error, :bad-gateway, and :service-unavailable, but this set still only exempts 429 and 504. The catch block below will therefore cache those server-side failures as invalid-token results for invalid-token-timeout, which can pin a temporary EDL/Launchpad outage onto valid tokens.

Suggested fix

 (def transient-error-types
   "Error types that should not be cached - they are transient and should be retried immediately"
-  #{:too-many-requests :gateway-timeout})
+  #{:request-timeout
+    :too-many-requests
+    :internal-server-error
+    :bad-gateway
+    :service-unavailable
+    :gateway-timeout})
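
To make the concern concrete, here is a minimal, self-contained sketch of the caching pattern the comment describes. It is not the code in `launchpad_user_cache.clj`: the `lookup-user` function, the atom-backed cache, the shape of the cached entries, and the assumption that errors arrive as `ex-info` exceptions carrying a `:type` key are all hypothetical and chosen only for illustration.

```clojure
(def transient-error-types
  "Expanded set from the suggested fix: error types that should never be cached."
  #{:request-timeout :too-many-requests :internal-server-error
    :bad-gateway :service-unavailable :gateway-timeout})

(defn lookup-user
  "Hypothetical lookup. Returns the cached entry for token-key if present;
  otherwise calls fetch-fn and caches the outcome. Non-transient failures are
  cached as invalid entries. Transient failures are rethrown without touching
  the cache, so the next call reaches the upstream service again."
  [cache token-key fetch-fn]
  (if-let [entry (get @cache token-key)]
    entry
    (try
      (let [entry {:valid true :user (fetch-fn)}]
        (swap! cache assoc token-key entry)
        entry)
      (catch clojure.lang.ExceptionInfo e
        ;; Assumption: the thrown exception carries its error type under :type.
        (let [error-type (:type (ex-data e))]
          (if (contains? transient-error-types error-type)
            (throw e)
            (let [entry {:valid false
                         :error-type error-type
                         :error-message (.getMessage e)}]
              (swap! cache assoc token-key entry)
              entry)))))))
```

With the original two-keyword set, a `:bad-gateway` failure would take the caching branch in this sketch and be stored as an invalid entry; with the expanded set it is rethrown and the cache stays empty.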

codecov-commenter · 2026-03-10T06:13:22Z

Codecov Report

❌ Patch coverage is 7.31707% with 76 lines in your changes missing coverage. Please review.
✅ Project coverage is 57.84%. Comparing base (22c44cb) to head (81088e0).

Files with missing lines	Patch %	Lines
transmit-lib/src/cmr/transmit/urs.clj	0.00%	35 Missing ⚠️
...smit-lib/src/cmr/transmit/launchpad_user_cache.clj	8.00%	23 Missing ⚠️
mock-echo-app/src/cmr/mock_echo/api/urs.clj	16.66%	15 Missing ⚠️
mock-echo-app/src/cmr/mock_echo/api/routes.clj	0.00%	2 Missing ⚠️
transmit-lib/src/cmr/transmit/tokens.clj	0.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #2398      +/-   ##
==========================================
- Coverage   57.88%   57.84%   -0.05%     
==========================================
  Files        1067     1067              
  Lines       73351    73417      +66     
  Branches     2121     2122       +1     
==========================================
+ Hits        42460    42466       +6     
- Misses      28912    28971      +59     
- Partials     1979     1980       +1

daniel-zamora changed the title ~~Cmr 11116~~ CMR-11116 and CMR-11106: Improve error handling for launchpad tokens Mar 9, 2026

coderabbitai bot reviewed Mar 9, 2026

daniel-zamora force-pushed the CMR-11116 branch from 3d77b74 to 392df50 Compare March 9, 2026 21:08

coderabbitai bot reviewed Mar 9, 2026

daniel-zamora added 3 commits March 10, 2026 00:25

CMR-11116: fixes launchpad invalid token timeout

5afbc78

CMR-11106: adds error propagation from edl for launchpad tokens

bd63dcc

CMR-11116: adds tests for launchpad token caching behavior

5761448

coderabbitai bot reviewed Mar 10, 2026

daniel-zamora force-pushed the CMR-11116 branch 2 times, most recently from 5ef6ea1 to 4ee6718 Compare March 10, 2026 04:35

coderabbitai bot reviewed Mar 10, 2026

daniel-zamora force-pushed the CMR-11116 branch from 4ee6718 to 985b20b Compare March 10, 2026 04:42

CMR-11116: pr comments

81088e0

daniel-zamora force-pushed the CMR-11116 branch from 985b20b to 81088e0 Compare March 10, 2026 04:48
