CMR-11116 and CMR-11106: Improve error handling for launchpad tokens #2398
daniel-zamora wants to merge 4 commits into master from
📝 Walkthrough
Adds cache eviction to the cache API and in-memory implementation; expands HTTP error-to-status mappings; refines Launchpad token validation and caching (transient-error detection, SHA-256 cache keys, eviction on expiration, and test-token behaviors); and adds tests for eviction and Launchpad error caching behavior.
Sequence Diagram
sequenceDiagram
participant Client
participant Cache as "Token Cache\n(ring/cache)"
participant EDL as "EDL / Launchpad"
participant Err as "Error Handler"
Client->>Cache: validate(token)
alt Cached & valid
Cache-->>Client: return cached user
else Cached & error
Cache-->>Client: return cached error payload
else Not cached or expired
Cache->>Cache: evict(key)
Client->>EDL: request validation(token)
alt Success (200)
EDL-->>Cache: user data
Cache->>Cache: store user with expiration
Cache-->>Client: return user
else Transient (429,504)
EDL-->>Err: transient error
Err-->>Client: return error (do not cache)
else Non-transient (401,408,500,502,503)
EDL-->>Cache: error payload
Cache->>Cache: store error with expiration
Cache-->>Client: return error
end
end
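The flow in the diagram can be sketched roughly as follows. This is a minimal sketch, not the PR's code: `validate-token`, `fetch-user`, the atom-backed cache, and the explicit `now-ms`/`ttl-ms` arguments are hypothetical stand-ins, and the cache is keyed by the raw token only for brevity. The transient status set is taken from the diagram.

```clojure
(def transient-statuses
  "Statuses the diagram treats as transient: returned to the caller, never cached."
  #{429 504})

(defn validate-token
  "cache: atom of {key {:result r :expires-at ms}} (hypothetical stand-in for the token cache).
   fetch-user: fn of token returning {:status n ...} (stand-in for the EDL / Launchpad call).
   now-ms: current time in milliseconds; ttl-ms: how long an entry stays valid."
  [cache fetch-user token now-ms ttl-ms]
  (let [entry (get @cache token)]
    (if (and entry (< now-ms (:expires-at entry)))
      ;; Cached and unexpired: serve the stored user or the stored error.
      (:result entry)
      (do
        ;; Missing or expired: evict, then ask the upstream service.
        (swap! cache dissoc token)
        (let [result (fetch-user token)]
          ;; Only successes and non-transient errors are written back.
          (when-not (contains? transient-statuses (:status result))
            (swap! cache assoc token {:result result
                                      :expires-at (+ now-ms ttl-ms)}))
          result)))))
```

Passing the clock in as an argument keeps the sketch easy to exercise at the TTL boundary without a time-keeper.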
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 2
🧹 Nitpick comments (1)
system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj (1)
98-109: Clarify test intent: verifies re-fetch after cache expiry, not cached error persistence.
The test name suggests "Non-transient errors are cached for 5 minutes," but advancing time by 301 seconds exceeds the 5-minute (300s) cache TTL. At that point, the cache entry would have expired, so the second request is actually re-fetching from the mock (which still returns 401).
If the intent is to verify the error remains cached within the 5-minute window, consider advancing time by less than 300 seconds. If the intent is to verify behavior after cache expiry, the test name and comment should reflect that.
💡 If testing cached error persistence within TTL
- (dev-sys-util/advance-time! 301)
+ (dev-sys-util/advance-time! 200) ;; Still within 5-minute cache TTL
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj` around lines 98 - 109, The test non-transient-errors-are-cached-test is ambiguous: either change the TTL advance to stay within 5 minutes (use dev-sys-util/advance-time! 299) to verify the 401 stays cached and the second ingest/ingest-concept still hits the cache, or keep the 301s advance but rename the test (and any assertion text) to indicate "after cache expiry" and add an assertion using cache-util/list-cache-keys (or check token-key) that the cache entry is gone before the second ingest/ingest-concept; update the test name/comment accordingly and adjust assertions to match the chosen intent.
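The TTL boundary this comment turns on can be illustrated with a small predicate. This is a sketch under assumptions: `error-ttl-seconds` and `entry-expired?` are hypothetical names, the 300-second value comes from the comment above, and whether the real cache treats exactly 300 seconds as expired is not stated in the review.

```clojure
(def error-ttl-seconds
  "Assumed 5-minute TTL for cached errors, per the review comment."
  300)

(defn entry-expired?
  "True once at least error-ttl-seconds have passed since the entry was cached.
   The inclusive boundary is an assumption of this sketch."
  [cached-at-seconds now-seconds]
  (>= (- now-seconds cached-at-seconds) error-ttl-seconds))

;; Advancing 200 seconds stays inside the window, so a second request
;; would be answered from the cache.
(entry-expired? 0 200)

;; Advancing 301 seconds leaves the window, so a second request
;; re-fetches from the mock instead of exercising the cached error.
(entry-expired? 0 301)
```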
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@common-lib/src/cmr/common/cache.clj`:
- Around line 41-43: Implement the missing evict method for each CmrCache
implementation: add an evict implementation to FallbackCache that calls evict on
both primary and fallback caches (or on the underlying cache instances
represented in the FallbackCache record), add an evict to
SingleThreadLookupCache that removes the key from its local lookup map and
delegates to the wrapped cache's evict, and add an evict to RedisCache that
performs the Redis deletion logic currently used for put/remove (using the
existing redis client/connection code) so it deletes the key from Redis; ensure
each method signature matches the CmrCache protocol (evict [cache cache-key])
and delegates to the appropriate existing functions/fields in the FallbackCache,
SingleThreadLookupCache, and RedisCache implementations.
In `@transmit-lib/src/cmr/transmit/launchpad_user_cache.clj`:
- Around line 58-72: When a cached Launchpad entry expires and
get-launchpad-user-fn returns a non-valid result, the code currently throws the
error without re-caching it; change the expired-path so that on a non-valid
fresh-result you set the cache (using cache/set-value with the fresh-result)
before throwing the service error, mirroring the initial cache-miss behavior for
errors (use the same get-launchpad-user-fn handling and caching behavior), and
keep the existing errors/throw-service-error call to raise the error after
caching.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2f90ba1e-3569-4914-8eaa-2a22732dffa3
📒 Files selected for processing (10)
common-lib/src/cmr/common/api/errors.clj
common-lib/src/cmr/common/cache.clj
common-lib/src/cmr/common/cache/in_memory_cache.clj
common-lib/test/cmr/common/test/cache/in_memory_cache.clj
mock-echo-app/src/cmr/mock_echo/api/routes.clj
mock-echo-app/src/cmr/mock_echo/api/urs.clj
system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj
transmit-lib/src/cmr/transmit/launchpad_user_cache.clj
transmit-lib/src/cmr/transmit/tokens.clj
transmit-lib/src/cmr/transmit/urs.clj
(evict
  [cache cache-key]
  "Removes the value associated with the given key from the cache.")
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Find all files that implement CmrCache protocol
echo "=== Files implementing CmrCache protocol ==="
rg -l 'cache/CmrCache|defrecord.*CmrCache' --type clojure
echo ""
echo "=== Checking for evict implementations ==="
# Search for evict implementations in cache-related files
rg -n '\(evict\b' --type clojure -g '*cache*'
Repository: nasa/Common-Metadata-Repository
Length of output: 38625
🏁 Script executed:
#!/bin/bash
# Check FallbackCache implementation
echo "=== FallbackCache implementation ==="
rg -n "defrecord\|evict" common-lib/src/cmr/common/cache/fallback_cache.clj -A 2
echo ""
echo "=== SingleThreadLookupCache implementation ==="
rg -n "defrecord\|evict" common-lib/src/cmr/common/cache/single_thread_lookup_cache.clj -A 2
echo ""
echo "=== RedisCache implementation ==="
rg -n "defrecord\|evict" redis-utils-lib/src/cmr/redis_utils/redis_cache.clj -A 2
Repository: nasa/Common-Metadata-Repository
Length of output: 194
🏁 Script executed:
#!/bin/bash
# Check FallbackCache - read file and look for extend-type or defrecord and protocol methods
echo "=== FallbackCache file ==="
wc -l common-lib/src/cmr/common/cache/fallback_cache.clj
cat -n common-lib/src/cmr/common/cache/fallback_cache.clj | head -100
echo ""
echo "=== SingleThreadLookupCache file ==="
wc -l common-lib/src/cmr/common/cache/single_thread_lookup_cache.clj
cat -n common-lib/src/cmr/common/cache/single_thread_lookup_cache.clj | head -120
echo ""
echo "=== RedisCache file ==="
wc -l redis-utils-lib/src/cmr/redis_utils/redis_cache.clj
cat -n redis-utils-lib/src/cmr/redis_utils/redis_cache.clj | head -150
Repository: nasa/Common-Metadata-Repository
Length of output: 14945
🏁 Script executed:
cat -n common-lib/src/cmr/common/cache/single_thread_lookup_cache.clj | tail -85
Repository: nasa/Common-Metadata-Repository
Length of output: 3330
🏁 Script executed:
#!/bin/bash
# Check the CmrCache protocol definition
echo "=== CmrCache Protocol ==="
cat -n common-lib/src/cmr/common/cache.clj | head -60
echo ""
echo "=== in_memory_cache evict implementation ==="
cat -n common-lib/src/cmr/common/cache/in_memory_cache.clj | sed -n '117,180p'
Repository: nasa/Common-Metadata-Repository
Length of output: 5082
🏁 Script executed:
#!/bin/bash
# Check the test implementations
echo "=== granule_counts_cache_test ==="
rg -n "defrecord|extend-type.*CmrCache|evict" search-app/test/cmr/search/test/unit/data/granule_counts_cache_test.clj
echo ""
echo "=== has_granules_or_cwic_test ==="
rg -n "defrecord|extend-type.*CmrCache|evict" search-app/test/cmr/search/test/unit/data/complex_to_simple_converters/has_granules_or_cwic_test.clj
Repository: nasa/Common-Metadata-Repository
Length of output: 143
Missing evict implementation in three CmrCache implementations.
The evict method is required by the CmrCache protocol but is missing from FallbackCache, SingleThreadLookupCache, and RedisCache. Calling evict on these implementations will throw AbstractMethodError at runtime.
- FallbackCache (common-lib/src/cmr/common/cache/fallback_cache.clj)
- SingleThreadLookupCache (common-lib/src/cmr/common/cache/single_thread_lookup_cache.clj)
- RedisCache (redis-utils-lib/src/cmr/redis_utils/redis_cache.clj)
Each implementation should delegate evict appropriately to underlying caches or operations.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@common-lib/src/cmr/common/cache.clj` around lines 41 - 43, Implement the
missing evict method for each CmrCache implementation: add an evict
implementation to FallbackCache that calls evict on both primary and fallback
caches (or on the underlying cache instances represented in the FallbackCache
record), add an evict to SingleThreadLookupCache that removes the key from its
local lookup map and delegates to the wrapped cache's evict, and add an evict to
RedisCache that performs the Redis deletion logic currently used for put/remove
(using the existing redis client/connection code) so it deletes the key from
Redis; ensure each method signature matches the CmrCache protocol (evict [cache
cache-key]) and delegates to the appropriate existing functions/fields in the
FallbackCache, SingleThreadLookupCache, and RedisCache implementations.
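As an illustration of the delegation being asked for, here is a minimal sketch of a two-level cache whose `evict` clears both levels. It does not reproduce the real `CmrCache` protocol or `FallbackCache` record: `SketchCache`, `AtomCache`, and `TwoLevelCache` are hypothetical names, and the protocol is reduced to three methods.

```clojure
(defprotocol SketchCache
  "Hypothetical stand-in for the CmrCache protocol, reduced to three methods."
  (get-value [cache cache-key])
  (set-value [cache cache-key value])
  (evict [cache cache-key]))

(defrecord AtomCache [store]
  SketchCache
  (get-value [_ cache-key] (get @store cache-key))
  (set-value [_ cache-key value] (swap! store assoc cache-key value) value)
  (evict [_ cache-key] (swap! store dissoc cache-key) nil))

(defrecord TwoLevelCache [primary fallback]
  SketchCache
  (get-value [_ cache-key]
    ;; Prefer the primary; fall through to the fallback on a miss.
    (let [v (get-value primary cache-key)]
      (if (some? v) v (get-value fallback cache-key))))
  (set-value [_ cache-key value]
    (set-value primary cache-key value)
    (set-value fallback cache-key value))
  ;; Evict from both levels so a stale value cannot resurface from the fallback.
  (evict [_ cache-key]
    (evict primary cache-key)
    (evict fallback cache-key)))
```

A record that implements a protocol inline but omits one of its methods still compiles, which is why the gap only shows up when `evict` is actually called on that type.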
Actionable comments posted: 3
♻️ Duplicate comments (1)
common-lib/src/cmr/common/cache.clj (1)
41-43: ⚠️ Potential issue | 🟠 Major
Finish the evict rollout before extending CmrCache.
FallbackCache, SingleThreadLookupCache, and RedisCache still implement cache/CmrCache without an evict method. Any cache/evict dispatch to one of those types will fail, so this protocol change is still incomplete.
Run this to confirm every CmrCache implementation picked up the new method. Expected result: no MISSING evict lines.
#!/bin/bash
set -euo pipefail
rg -l --type clojure 'cache/CmrCache' common-lib/src redis-utils-lib/src |
while read -r file; do
  if ! rg -q '\(evict\b' "$file"; then
    echo "MISSING evict: $file"
  fi
done
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@common-lib/src/cmr/common/cache.clj` around lines 41 - 43, The protocol CmrCache was extended with an evict method but FallbackCache, SingleThreadLookupCache, and RedisCache do not implement it, causing dispatch failures; add an evict implementation to each of these types (implement the protocol method named evict / cache/evict) that matches their semantics: for FallbackCache delegate eviction to both/backing caches or the appropriate inner cache, for SingleThreadLookupCache remove the key from its local lookup store and delegate to its backing cache, and for RedisCache perform the appropriate key removal from Redis; after changes run the provided rg script to verify no "MISSING evict" results.
🧹 Nitpick comments (2)
system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj (1)
98-110: This test still doesn't prove the cached 401 was refreshed after 5 minutes.
The second request returns 401 whether the cache expired and revalidated or the stale error was served again. Capturing the cached :expiration-time before advancing time and asserting it moves forward afterward would make this test cover the timeout behavior it is naming.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj` around lines 98 - 110, The test non-transient-errors-are-cached-test should capture the cache entry's expiration timestamp before advancing time and assert that it is updated after the second ingest; locate the cache entry using the token-key variable and cache-util/list-cache-keys or a cache-read helper to fetch the cached map (which contains :expiration-time), store that :expiration-time in a local binding, call dev-sys-util/advance-time! 301 and perform the second ingest, then read the cache entry again and assert that the new :expiration-time is greater than the previously captured value to prove the entry was refreshed after expiry.
mock-echo-app/src/cmr/mock_echo/api/urs.clj (1)
66-76: Consider using the stored :expires-in value for consistency.
The :expires-in is stored in the atom on line 75 but never read back. Line 70 always uses the local constant expires-in instead of (:expires-in validation-info). While this works correctly for the current test token (always 1600 seconds), it would be more consistent to use the stored value:
♻️ Proposed fix to use stored expires-in
 (let [expires-in 1600
       validation-info (get @launchpad-token-validations token)]
   (if validation-info
-    (let [elapsed-seconds (t/in-seconds (t/interval (:first-validated-at validation-info) (time-keeper/now)))]
-      (if (>= elapsed-seconds expires-in)
+    (let [stored-expires-in (:expires-in validation-info)
+          elapsed-seconds (t/in-seconds (t/interval (:first-validated-at validation-info) (time-keeper/now)))]
+      (if (>= elapsed-seconds stored-expires-in)
         {:status 401
          :body {:error (format "Launchpad token (partially redacted) [%s] has expired."
                               (common-util/scrub-token token))}}
-        {:status 200 :body {:uid "user1" :lp_token_expires_in (- expires-in elapsed-seconds)}}))
+        {:status 200 :body {:uid "user1" :lp_token_expires_in (- stored-expires-in elapsed-seconds)}}))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@mock-echo-app/src/cmr/mock_echo/api/urs.clj` around lines 66 - 76, The code binds a local expires-in constant but then stores :expires-in in launchpad-token-validations and never reads it; update the logic in the branch that reads validation-info to use (:expires-in validation-info) instead of the local expires-in when comparing elapsed-seconds and when computing lp_token_expires_in, so the code consistently uses the stored :expires-in value from validation-info (the atom launchpad-token-validations and the map key :expires-in) rather than the hardcoded expires-in local.
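The suggestion amounts to deriving both the expiry check and the remaining lifetime from the stored value. A minimal sketch as a pure function follows; it assumes plain second counts instead of clj-time intervals, and `validation-response`, the shape of `validation-info`, and the shortened error message are hypothetical.

```clojure
(defn validation-response
  "validation-info: {:first-validated-at seconds :expires-in seconds}
   (shape assumed for this sketch). now-seconds: current time in seconds."
  [validation-info now-seconds]
  (let [stored-expires-in (:expires-in validation-info)
        elapsed (- now-seconds (:first-validated-at validation-info))]
    (if (>= elapsed stored-expires-in)
      ;; The token has outlived the lifetime recorded at first validation.
      {:status 401 :body {:error "Launchpad token has expired."}}
      ;; Still valid: report the remaining lifetime from the same stored value.
      {:status 200 :body {:uid "user1"
                          :lp_token_expires_in (- stored-expires-in elapsed)}})))
```

Reading the lifetime from the stored map means a later test token with a different `:expires-in` would not need another code path.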
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj`:
- Around line 83-96: Update the assertions in transient-errors-not-cached-test
to match the new Launchpad messages emitted by
transmit-lib/src/cmr/transmit/urs.clj: replace the re-find checks for "Rate
limit exceeded" and "Gateway timeout" with patterns that match "Launchpad rate
limit exceeded" and "(gateway timeout)" (or use case-insensitive patterns like
(?i)rate limit exceeded and (?i)gateway timeout) so the ingest/ingest-concept
responses asserted in transient-errors-not-cached-test will match the current
URS messages.
In `@transmit-lib/src/cmr/transmit/launchpad_user_cache.clj`:
- Around line 56-57: Currently cache-key is created with (keyword (str (hash
token))) which interns a keyword per token and exposes collision/memory-leak
risks; change cache-key to be a plain string (not a keyword) derived from the
token or, preferably, a collision-resistant digest/HMAC (e.g., SHA-256 or
HMAC-SHA256 of token) and use that string when calling cache/get-value with
get-launchpad-user-fn so tokens are not interned as keywords and collisions are
minimized.
- Around line 61-74: After evicting the stale entry, do not write a failing
revalidation result back into the cache; change the logic in the block using
cache/evict, get-launchpad-user-fn, cache/set-value, and
errors/throw-service-error so that if fresh-result is valid you set
cache/set-value and return it, but if fresh-result is invalid you do NOT call
cache/set-value and instead directly raise the error
(errors/throw-service-error) using the existing (:error-type/:error-message)
(with the scrubbed token message fallback); this preserves eviction on failed
revalidation instead of caching the failure.
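For the cache-key comment above, a plain-string SHA-256 key can be produced with the JDK alone. This is a minimal sketch: `token->cache-key` is a hypothetical helper, and the PR may well use a library call instead (a later review comment references `DigestUtils/sha256Hex`).

```clojure
(import '(java.security MessageDigest)
        '(java.nio.charset StandardCharsets))

(defn token->cache-key
  "Returns the lowercase hex SHA-256 digest of token as a plain string,
   so no keyword is interned per token. Hypothetical helper for illustration."
  [^String token]
  (let [digest (.digest (MessageDigest/getInstance "SHA-256")
                        (.getBytes token StandardCharsets/UTF_8))]
    ;; Render each byte as two hex characters.
    (apply str (map #(format "%02x" %) digest))))
```

The key is a 64-character string that does not reveal the token, and unlike `(keyword (str (hash token)))` it does not depend on a 32-bit hash.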
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e071585c-c7b1-4cf8-bfc2-3caf1c259ae8
📒 Files selected for processing (10)
common-lib/src/cmr/common/api/errors.clj
common-lib/src/cmr/common/cache.clj
common-lib/src/cmr/common/cache/in_memory_cache.clj
common-lib/test/cmr/common/test/cache/in_memory_cache.clj
mock-echo-app/src/cmr/mock_echo/api/routes.clj
mock-echo-app/src/cmr/mock_echo/api/urs.clj
system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj
transmit-lib/src/cmr/transmit/launchpad_user_cache.clj
transmit-lib/src/cmr/transmit/tokens.clj
transmit-lib/src/cmr/transmit/urs.clj
🚧 Files skipped from review as they are similar to previous changes (1)
- common-lib/test/cmr/common/test/cache/in_memory_cache.clj
🧹 Nitpick comments (1)
system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj (1)
100-112: Consider removing unused token-key variable.
The token-key computed on line 103 is never used in the test assertions. If no cache key verification is intended, this variable can be removed to improve clarity.
♻️ Proposed fix to remove unused variable
 (deftest non-transient-errors-are-cached-test
   (testing "Non-transient errors are cached for 5 minutes"
-    (let [token "ABC-INV-ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ"
-          token-key (DigestUtils/sha256Hex token)]
+    (let [token "ABC-INV-ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ"]
       (let [resp (ingest/ingest-concept (data-umm-c/collection-concept {}) {:token token})]
         (is (= 401 (:status resp))))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj` around lines 100 - 112, The test non-transient-errors-are-cached-test computes token-key using (DigestUtils/sha256Hex token) but never uses it; remove the unused token-key binding to clean up the test, i.e., delete the token-key binding declaration that references DigestUtils/sha256Hex and keep the rest of the test (token, ingest/ingest-concept calls, cache-util/list-cache-keys check, and dev-sys-util/advance-time! invocation) unchanged.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 443d2468-6180-4ffb-89de-9406fc1346c5
📒 Files selected for processing (2)
system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj
transmit-lib/src/cmr/transmit/launchpad_user_cache.clj
5ef6ea1 to 4ee6718
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@transmit-lib/src/cmr/transmit/launchpad_user_cache.clj`:
- Around line 29-31: The transient-error-types set currently only excludes
:too-many-requests and :gateway-timeout but upstream urs/get-launchpad-user can
throw :request-timeout, :internal-server-error, :bad-gateway, and
:service-unavailable; update the transient-error-types definition to include
those four symbols so the cache logic that treats other errors as invalid-token
results (e.g. the invalid-token-timeout/invalid-token caching path) will skip
caching for these transient server-side failures and let callers retry
immediately.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: f0de632c-521c-49f9-9bbe-2174a428b74c
📒 Files selected for processing (10)
common-lib/src/cmr/common/api/errors.clj
common-lib/src/cmr/common/cache.clj
common-lib/src/cmr/common/cache/in_memory_cache.clj
common-lib/test/cmr/common/test/cache/in_memory_cache.clj
mock-echo-app/src/cmr/mock_echo/api/routes.clj
mock-echo-app/src/cmr/mock_echo/api/urs.clj
system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj
transmit-lib/src/cmr/transmit/launchpad_user_cache.clj
transmit-lib/src/cmr/transmit/tokens.clj
transmit-lib/src/cmr/transmit/urs.clj
🚧 Files skipped from review as they are similar to previous changes (3)
- common-lib/src/cmr/common/cache/in_memory_cache.clj
- system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj
- common-lib/src/cmr/common/api/errors.clj
(def transient-error-types
  "Error types that should not be cached - they are transient and should be retried immediately"
  #{:too-many-requests :gateway-timeout})
Treat the new upstream availability errors as transient too.
urs/get-launchpad-user now throws :request-timeout, :internal-server-error, :bad-gateway, and :service-unavailable, but this set still only exempts 429 and 504. The catch block below will therefore cache those server-side failures as invalid-token results for invalid-token-timeout, which can pin a temporary EDL/Launchpad outage onto valid tokens.
Suggested fix
 (def transient-error-types
   "Error types that should not be cached - they are transient and should be retried immediately"
-  #{:too-many-requests :gateway-timeout})
+  #{:request-timeout
+    :too-many-requests
+    :internal-server-error
+    :bad-gateway
+    :service-unavailable
+    :gateway-timeout})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@transmit-lib/src/cmr/transmit/launchpad_user_cache.clj` around lines 29 - 31,
The transient-error-types set currently only excludes :too-many-requests and
:gateway-timeout but upstream urs/get-launchpad-user can throw :request-timeout,
:internal-server-error, :bad-gateway, and :service-unavailable; update the
transient-error-types definition to include those four keywords so the cache
logic that treats other errors as invalid-token results (e.g. the
invalid-token-timeout/invalid-token caching path) will skip caching for these
transient server-side failures and let callers retry immediately.
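The effect of the widened set can be sketched as a predicate that decides whether a failure is written to the cache. This is an illustration only: `cacheable-error?` and `cache-failure!` are hypothetical helpers, the atom stands in for the real cache, and `:unauthorized` is used as an assumed example of a non-transient error type.

```clojure
(def transient-error-types
  "Error types that are retried immediately and therefore never cached
   (the widened set from the suggested fix)."
  #{:request-timeout
    :too-many-requests
    :internal-server-error
    :bad-gateway
    :service-unavailable
    :gateway-timeout})

(defn cacheable-error?
  "True when a failure of this type may be stored as an invalid-token result."
  [error-type]
  (not (contains? transient-error-types error-type)))

(defn cache-failure!
  "Stores the failure under cache-key only when it is not transient.
   cache is an atom of a map, a stand-in for the real cache. Returns the failure."
  [cache cache-key failure]
  (when (cacheable-error? (:error-type failure))
    (swap! cache assoc cache-key failure))
  failure)
```

With this shape, a temporary upstream outage leaves the cache untouched, so a valid token is revalidated on the next request instead of waiting out the invalid-token timeout.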
4ee6718 to 985b20b
985b20b to 81088e0
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2398      +/-   ##
==========================================
- Coverage   57.88%   57.84%   -0.05%
==========================================
  Files        1067     1067
  Lines       73351    73417      +66
  Branches     2121     2122       +1
==========================================
+ Hits        42460    42466       +6
- Misses      28912    28971      +59
- Partials     1979     1980       +1
☔ View full report in Codecov by Sentry.
Overview
What is the objective?
Fix launchpad token timeout behavior and improve error propagation from EDL (Earthdata Login) to provide more actionable error messages to users.
What are the changes?
What areas of the application does this impact?
Required Checklist
Additional Checklist
Summary by CodeRabbit
New Features
Tests