-
Notifications
You must be signed in to change notification settings - Fork 101
CMR-10600: Split cluster feature #2349
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
…luster (#2283) * spin up second embedded es cluster in local, update elastic-utils * add unit test * remove unit test comment * update elastic cluster naming and formatting
…lly) (#2284) * access control changes and elastic util changes * clean up unnecessary code for this PR * remove unnecessary changes in this PR * update es config lib name, remove gran cluster access for access control * update elastic-servers func * add system require back in * update gran elastic port to elastic port * remove unnecessary access control file changes * update func name * replace es cluster name by index name for concept id * move msg func to prevent circular dependency * add new line
…sitory into CMR-10600-split-cluster-feature
…nasa/Common-Metadata-Repository into CMR-10600-split-cluster-feature
…sitory into CMR-10600-split-cluster-feature
* add index set api and rebalance api changes * fix formatting * add sys tests for new endpoints * fix kondo errors * more kondo errors * format changes * update elastic cluster api name and fix es-msg import * fix msg call * update formatting again * more formatting * more formatting * fix unit test * partly centralize index names * reverse func name changes for easier comparison * add delete index action and formatting
…sitory into CMR-10600-split-cluster-feature
…sitory into CMR-10600-split-cluster-feature
…sitory into CMR-10600-split-cluster-feature
…sitory into CMR-10600-split-cluster-feature
* update split cluster to account for alias * remove dup def * fix func * add logs for tracking during split in upper envs * minor cosmetic changes
…sitory into CMR-10600-split-cluster-feature
…sitory into CMR-10600-split-cluster-feature
* update reshard funcs with split cluster * remove unnecessary func and add new line
…#2330) * update to db migrate * clean up code * replace ifs with whens, move acl check back to routes, fix import names * add ticket num to comment * fix build issue * add the group5 changes from master
…sitory into CMR-10600-split-cluster-feature
* update elastic-root * pass es cluster to context->conn * fix when statement * update resharding apis to accept elastic name * fix alignment * remove unnecessary comment * fix kondo warning
…sitory into CMR-10600-split-cluster-feature
…sitory into CMR-10600-split-cluster-feature
…sitory into CMR-10600-split-cluster-feature
WalkthroughThreads explicit Elasticsearch cluster names (granule vs non-granule) across configs, system wiring, APIs, services, helpers, and tests; adds cluster-aware plumbing and signatures (es-cluster-name/elastic-name) for bulk indexing, resharding, index-set flows, and search operations. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Client
participant API as "Bootstrap API"
participant Bootstrap
participant Dispatch
participant Indexer
participant ESUtil as "es helper"
participant GranES as "gran-elastic"
participant MainES as "elastic"
Client->>API: POST /reshard (elastic_name param)
API->>Bootstrap: start-reshard-index(context,index,params)
Bootstrap->>Dispatch: migrate-index(context,src,tgt,elastic-name)
Dispatch->>Indexer: migrate-index(context,src,tgt,elastic-name)
Indexer->>ESUtil: bulk-index-documents(context,docs,es-cluster-name)
alt es-cluster-name == gran-elastic
ESUtil->>GranES: send bulk request
else
ESUtil->>MainES: send bulk request
end
ESUtil-->>Indexer: bulk response
Indexer-->>Dispatch: done
Dispatch-->>Bootstrap: result
Bootstrap-->>API: respond
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Areas requiring extra attention:
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
indexer-app/int-test/cmr/indexer/test/valid_data_crud_tests.clj (1)
290-387: LGTM! Comprehensive test validates per-cluster index-set retrieval.This test thoroughly validates the split cluster feature by:
- Creating an index-set and verifying successful creation
- Fetching index-sets filtered by cluster (granule vs non-granule)
- Fetching index-set by ID with cluster filtering
- Validating expected structures with correct concept separation:
- Granule cluster contains only granule-related concepts
- Non-granule cluster contains collection and other non-granule concepts
The detailed expected structures ensure correctness of the cluster split functionality.
Optional: Consider extracting the large expected data structures (lines 308-379) into helper functions or constants to improve readability and reduce inline complexity. This would make the test assertions more concise while maintaining the same validation coverage.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
indexer-app/int-test/cmr/indexer/test/valid_data_crud_tests.clj(6 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-20T16:26:11.404Z
Learnt from: indiejames
Repo: nasa/Common-Metadata-Repository PR: 2328
File: indexer-app/src/cmr/indexer/services/index_set_service.clj:337-349
Timestamp: 2025-10-20T16:26:11.404Z
Learning: In indexer-app/src/cmr/indexer/services/index_set_service.clj, the remove-resharding-index function intentionally returns an empty set when resharding-indexes is nil, rather than throwing an error. This differs from remove-rebalancing-collection and reflects a design preference to avoid nil returns.
Applied to files:
indexer-app/int-test/cmr/indexer/test/valid_data_crud_tests.clj
🔇 Additional comments (5)
indexer-app/int-test/cmr/indexer/test/valid_data_crud_tests.clj (5)
22-33: LGTM! Per-cluster index verification is correct.The test properly validates the split cluster feature by:
- Fetching index names for each cluster separately
- Verifying granule indices exist only in the granule cluster
- Verifying non-granule indices exist only in the non-granule cluster
- Using
doseqfor eager assertion execution
43-59: LGTM! Test correctly validates index-set fetch with new naming conventions.The test properly:
- Uses the updated
collections-v2suffix matching the new naming scheme- Validates both collection and granule concepts are correctly represented in the fetched index-set
- Verifies the expected index structure across both clusters
64-87: LGTM! Delete test correctly validates per-cluster index removal.The test properly:
- Creates indices and verifies they exist in their respective clusters (collection in non-gran, granule in gran)
- Deletes the index-set and confirms removal from the correct clusters
- Uses correct destructuring
{:keys [status response]}(previous bug fixed)- Comment accurately reflects that indices are removed from their respective clusters
94-133: LGTM! Tests correctly validate index-set retrieval and updates across clusters.Both tests properly handle the split cluster architecture:
get-index-sets-testusesdoseqfor eager assertion execution and correctly checks that each index exists in at least one cluster usingorupdate-index-sets-testvalidates the entire update flow and verifies the resulting per-cluster concept structure
154-159: LGTM! Rebalancing assertions correctly use granule cluster.The function properly verifies granule indexes are created in the granule Elasticsearch cluster, which is the correct cluster for granule rebalancing operations.
jceaser
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor comments
| es-cluster-name (if (= :granule concept-type) | ||
| es-config/gran-elastic-name | ||
| es-config/elastic-name) | ||
| total (index/bulk-index {:system (helper/get-indexer system)} concept-batches es-cluster-name)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume the first parameter is a context, and it is now repeated (and I see it below as well). Can you make this a local function in case we have to add something to all of them latter we can do it once?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure if this warrants a separate func since it just calls another func to return a var
|
|
||
| (defmulti create-elastic | ||
| (defmulti create-gran-elastic-server | ||
| "Sets elastic configuration values and returns an instance of an Elasticsearch component to run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the comment for this function and create-elastic-sever should make it clear what is contained in the cluster and what makes the two different. Right now they have the same comment
|
|
||
| (def ^:private elasticsearch-official-docker-image | ||
| "Official docker image." | ||
| "docker.elastic.co/elasticsearch/elasticsearch:7.17.14") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
has anyone ever tried 7.17.28? https://hub.docker.com/layers/library/elasticsearch/7.17.28/ it has less vulnerabilities
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should work without issues but needs to be a different ticket
| (dorun (map #(es/create-index-and-alias non-gran-es-store %) non-gran-indices-w-config)) | ||
| (catch ExceptionInfo e | ||
| ;; TODO: Generic work: why does this fail to roll back with bad generics? | ||
| (println "failed to create index, roll back, this does not always work") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we keeping this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure, this was here before my changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should keep it for now and clean up in a separate ticket. I did not want to change what already existed
bootstrap-app/README.md
Outdated
| the resharding is finalized. | ||
|
|
||
| You MUST give the elastic_name parameter to tell CMR which cluster your index is in that is going to be resharded. | ||
| The options are: gran-elastic and elastic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might want to change these lines to reference the variables instead of the current values. Historically, CMR readme's have often gone neglected through code changes...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but can't variable names change as well? This is more API documentation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (6)
elastic-utils-lib/src/cmr/elastic_utils/test_util.clj (1)
27-29: Remove redundantdoform in finally block.The
doform is unnecessary in afinallyblock, which can contain multiple expressions directly.Apply this diff (only if you don't apply the fix above):
(finally - (do - (l/stop gran-elastic-server nil) - (l/stop elastic-server nil))))))) + (l/stop gran-elastic-server nil) + (l/stop elastic-server nil))))))system-int-test/test/cmr/system_int_test/bootstrap/reshard_index_test.clj (1)
155-209: Track commented test as technical debt.The commented test highlights a real timing issue with Elasticsearch's delete-by-query and version conflicts. While the explanation is clear, consider opening an issue to track this gap in test coverage or explore alternative testing strategies (e.g., polling with retry, or accepting eventual consistency in the test).
Do you want me to open a new issue to track this test coverage gap?
bootstrap-app/README.md (3)
97-110: Resharding start docs look accurate; consider minor wording + lint nits.The new explanation and example for the required
elastic_nameparameter match the code ((:elastic_name params)in the handler) and read clearly. If you want to tighten things further and keep markdownlint happy, you could:
- Rephrase slightly, e.g. “You MUST provide the
elastic_nameparameter to tell CMR which cluster contains the index that will be resharded.”- Add a language to the fenced block here (for example ```bash) and adjust list indentation to match your markdownlint config (MD007/MD040).
Purely optional; behavior is documented correctly as-is.
116-129: Status endpoint docs correctly requireelastic_name; maybe vary the example cluster name.The status section correctly documents
elastic_nameas required and uses the same parameter name and options as the handler (:elastic_name), which is good for consistency. For discoverability, you might consider usingelastic_name=elasticin this example (or one of the others) so the README visibly demonstrates both allowed values (gran-elasticandelastic), but that’s just a usability tweak.
135-148: Finalize resharding docs align with handler behavior; optional markdown cleanup.This section’s description and example match the handler:
elastic_nameis required, and the response body remains a simple “resharding completed” message for the index. As with the earlier sections, if you care about markdownlint you can:
- Add a fenced language (e.g., ```bash) for this curl example.
- Normalize bullet indentation for the “Required params” list.
No functional issues from a docs/API-contract perspective.
bootstrap-app/src/cmr/bootstrap/api/resharding.clj (1)
4-57: ES cluster name validation and handler signatures look correct; only minor polish possible.
- The new
validate-es-cluster-name-not-blankhelper plus the updatedstart,get-status, andfinalizesignatures correctly enforce thatelastic_nameis present and non-blank before delegating to the service layer.- Using
string/blank?is appropriate here since it gracefully treats bothniland empty/whitespace values as invalid.- The parameter name in the HTTP API (
elastic_name) matches what you pull fromparams((:elastic_name params)), which lines up with the README.If you want to tighten it further (optional):
- Consider making the error mention the parameter directly, e.g.
"Missing or empty elastic_name parameter is not allowed.", to be slightly more actionable for API consumers.- Long-term, you might decide to have
validate-num-shardsreturn the parsed shard count sostartdoesn’t need to callparse-longtwice, but that would be a broader refactor touching any other callers.From a behavior and error-handling perspective, this change set is solid.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
bootstrap-app/README.md(3 hunks)bootstrap-app/src/cmr/bootstrap/api/resharding.clj(2 hunks)elastic-utils-lib/src/cmr/elastic_utils/test_util.clj(1 hunks)system-int-test/test/cmr/system_int_test/bootstrap/reshard_index_test.clj(4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-20T16:26:11.404Z
Learnt from: indiejames
Repo: nasa/Common-Metadata-Repository PR: 2328
File: indexer-app/src/cmr/indexer/services/index_set_service.clj:337-349
Timestamp: 2025-10-20T16:26:11.404Z
Learning: In indexer-app/src/cmr/indexer/services/index_set_service.clj, the remove-resharding-index function intentionally returns an empty set when resharding-indexes is nil, rather than throwing an error. This differs from remove-rebalancing-collection and reflects a design preference to avoid nil returns.
Applied to files:
system-int-test/test/cmr/system_int_test/bootstrap/reshard_index_test.cljbootstrap-app/src/cmr/bootstrap/api/resharding.clj
🪛 LanguageTool
bootstrap-app/README.md
[style] ~97-~97: Use ‘will’ instead of ‘going to’ if the following action is certain.
Context: ...CMR which cluster your index is in that is going to be resharded. - Required params: -...
(GOING_TO_WILL)
[style] ~116-~116: Use ‘will’ instead of ‘going to’ if the following action is certain.
Context: ...CMR which cluster your index is in that is going to be resharded. - Required params: -...
(GOING_TO_WILL)
[style] ~135-~135: Use ‘will’ instead of ‘going to’ if the following action is certain.
Context: ...CMR which cluster your index is in that is going to be resharded. - Required params: - e...
(GOING_TO_WILL)
🪛 markdownlint-cli2 (0.18.1)
bootstrap-app/README.md
100-100: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
101-101: Unordered list indentation
Expected: 4; Actual: 8
(MD007, ul-indent)
103-103: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
119-119: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
120-120: Unordered list indentation
Expected: 4; Actual: 8
(MD007, ul-indent)
122-122: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
124-124: Hard tabs
Column: 1
(MD010, no-hard-tabs)
125-125: Hard tabs
Column: 1
(MD010, no-hard-tabs)
141-141: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (6)
elastic-utils-lib/src/cmr/elastic_utils/test_util.clj (2)
3-3: LGTM! Improved namespace alias clarity.The alias change from
configtoes-configmakes the Elasticsearch configuration namespace more explicit and aligns well with the dual-cluster refactoring.
11-14: I'll help you verify whether the connections created in theelastic-running?function need explicit cleanup. Let me search for thetry-connectimplementation and see how it handles resources.
<function_calls>
#!/bin/bashFind try-connect implementation and resource management patterns
Search for try-connect function definition
echo "=== Searching for try-connect definition ==="
rg -n "defn try-connect" --type=clj -A 10echo ""
echo "=== Searching for try-connect usage patterns ==="
rg -n "try-connect" --type=clj -B 2 -A 5 | head -100echo ""
echo "=== Searching for with-open patterns with elastic connections ==="
rg -n "with-open" --type=clj -B 2 -A 5 | head -50
</function_calls>system-int-test/test/cmr/system_int_test/bootstrap/reshard_index_test.clj (4)
27-30: Good validation test coverage for required elastic-name parameter.The new tests appropriately verify that elastic-name is required for all three resharding operations (start, get-status, finalize) with clear, consistent error messages.
Also applies to: 74-77, 86-89
31-73: Parameter updates are consistent with cluster semantics.The error test cases correctly propagate
:elastic-name gran-elastic-namefor granule-related indexes (small_collections), maintaining consistency with the split-cluster architecture.
95-147: Success test workflow correctly implements split-cluster propagation.The test appropriately uses
gran-elastic-namethroughout the resharding workflow for small_collections (a granule index), verifying start, status, finalize, and alias operations with proper cluster context.
24-25: The hardcoded cluster names are correct and match the system configuration—no issue exists.The review comment incorrectly flags the hardcoded cluster names. The values
"gran-elastic"and"elastic"are exact constants defined inelastic-utils-lib/src/cmr/elastic_utils/config.cljas the source of truth for cluster names. These are used consistently throughout the codebase and are properly configured for the test environment, including corresponding port mappings (gran-elastic-port: 9210,elastic-port: 9211) in the same configuration module.
system-int-test/test/cmr/system_int_test/bootstrap/reshard_index_test.clj
Outdated
Show resolved
Hide resolved
TylerHeald1
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments added for feedback
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
indexer-app/src/cmr/indexer/data/elasticsearch.clj (1)
439-459: Avoid logging the full connection object.While the conditional logging at lines 445-446 is gated by
split-cluster-log-toggle, logging theconnobject is risky:(info "CMR-10600 Saving doc for concept " concept-id " under index " es-index " in es cluster " conn)The
connobject fromget-es-cluster-connmay contain connection details (URI, HTTP options, authentication headers/tokens), which could expose credentials in logs.Apply this diff to log only high-level routing info:
(let [conn (get-es-cluster-conn context es-index) - ;; Temporarily putting in this log to check which indexes are going to which cluster during ES cluster split _ (when (es-config/split-cluster-log-toggle) - (info "CMR-10600 Saving doc for concept " concept-id " under index " es-index " in es cluster " conn)) + (info "CMR-10600 Saving doc for concept" concept-id "under index" es-index))This still provides routing visibility without risking credential leakage.
🧹 Nitpick comments (4)
indexer-app/src/cmr/indexer/data/index_set_elasticsearch.clj (2)
118-132: Consider unconditional logging for migration visibility.While the conditional logging at lines 123-124 provides debug visibility when
split-cluster-log-toggleis enabled, this function performs a one-time migration operation that retrieves index-sets from the old pre-split cluster architecture. As jceaser suggested in a previous comment, unconditional logging would provide better audit trail and confirmation that this critical migration step executed successfully.Apply this diff to log unconditionally:
[context es-cluster-name index-set-id] - (when (es-config/split-cluster-log-toggle) - (info "CMR-10600 Getting old index set to copy over to new split cluster index set indexes.")) + (info "CMR-10600 Getting old index set" index-set-id "to copy over to new split cluster index set indexes.") (let [es-cluster-name-keyword (es-config/elastic-name-str->keyword es-cluster-name)Based on learnings, past review feedback from jceaser indicated this action should be logged.
165-180: Update comment to reflect per-cluster index-set indexes.Line 175's comment references:
;; to result in 503 if replicas setting value of 'index-sets' is set to > 0 when running on a single nodeWith the cluster split feature, there are now per-cluster index-set indexes (non-gran and gran variants), so the comment should clarify which index-set index it's referring to or generalize the statement.
Apply this diff:
(when (:error result) ;; service layer to rollback index-set create progress on error - ;; to result in 503 if replicas setting value of 'index-sets' is set to > 0 when running on a single node + ;; may result in 503 if replicas setting value of the index-set index is set to > 0 when running on a single node (throw (Exception. (format "Save to Elasticsearch failed. Reported status: %s and error: %s " status error)))))indexer-app/src/cmr/indexer/data/elasticsearch.clj (2)
284-312: Clarify index-set index placement strategy in comments.The ESstore record now includes
es-cluster-name, and thestartlifecycle creates the index-set index using the per-cluster config. However, the comments at lines 301-302 are somewhat confusing:;; this is creating the index set index with the list of indexes in Elastic... do we want them to be a different list in each cluster? Right now, the non-gran cluster has this index ;; unless we want a copy of this index set in each cluster with it being accurate to that specific index, or it only goes to the non-gran cluster...let's try the non-gran cluster only and see what it doesThese comments suggest uncertainty or work-in-progress thinking. Based on the code, each cluster now gets its own index-set index (via
config/idx-cfg-for-index-setswithes-cluster-name), which makes sense. Consider clarifying or removing the outdated comment.Apply this diff:
- ;; this is creating the index set index with the list of indexes in Elastic... do we want them to be a different list in each cluster? Right now, the non-gran cluster has this index - ;; unless we want a copy of this index set in each cluster with it being accurate to that specific index, or it only goes to the non-gran cluster...let's try the non-gran cluster only and see what it does + ;; Create the index-set index for this cluster, which tracks the indexes present in this specific cluster (index-set-es/create-index this (config/idx-cfg-for-index-sets (:es-cluster-name this)))
412-420: Explicit nil return is redundant but harmless.As TylerHeald1 noted in a past comment, line 420's explicit
nilreturn is unnecessary sincehandle-bulk-index-responseends with adoseqthat already returnsnil. However, making it explicit can improve clarity for readers unfamiliar with Clojure's return semantics, so this is optional.If you prefer to remove the redundant return:
(handle-bulk-index-response response))) - nil)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
dev-system/src/cmr/dev_system/system.clj(4 hunks)elastic-utils-lib/src/cmr/elastic_utils/config.clj(4 hunks)elastic-utils-lib/src/cmr/elastic_utils/es_helper.clj(1 hunks)indexer-app/src/cmr/indexer/data/elasticsearch.clj(7 hunks)indexer-app/src/cmr/indexer/data/index_set_elasticsearch.clj(5 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- elastic-utils-lib/src/cmr/elastic_utils/config.clj
- elastic-utils-lib/src/cmr/elastic_utils/es_helper.clj
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-20T16:26:11.404Z
Learnt from: indiejames
Repo: nasa/Common-Metadata-Repository PR: 2328
File: indexer-app/src/cmr/indexer/services/index_set_service.clj:337-349
Timestamp: 2025-10-20T16:26:11.404Z
Learning: In indexer-app/src/cmr/indexer/services/index_set_service.clj, the remove-resharding-index function intentionally returns an empty set when resharding-indexes is nil, rather than throwing an error. This differs from remove-rebalancing-collection and reflects a design preference to avoid nil returns.
Applied to files:
indexer-app/src/cmr/indexer/data/index_set_elasticsearch.cljindexer-app/src/cmr/indexer/data/elasticsearch.clj
🔇 Additional comments (17)
dev-system/src/cmr/dev_system/system.clj (3)
15-15: LGTM! Namespace alias renamed for consistency.The namespace alias has been updated from
elastic-configtoes-config, which aligns with the broader ES configuration changes in this PR.
303-308: LGTM! Pre-components correctly wired for dual-cluster setup.Both
gran-elastic-serverandelastic-serverare properly added to the:pre-componentsmap, ensuring both clusters are started before the application components. Theu/remove-nil-keyscall appropriately handles the external elastic configuration where the server components return nil.
288-289: All configuration functions are properly defined and verified.The new gran-specific port configurations are correctly implemented:
es-config/gran-elastic-portis defined inelastic-utils-lib/src/cmr/elastic_utils/config.clj(line 52) with default port 9210dev-config/embedded-kibana-gran-portis defined indev-system/src/cmr/dev_system/config.clj(line 27) with default port 5601Both functions are already in use across the codebase (test utilities, integration tests, and now the system component initialization).
indexer-app/src/cmr/indexer/data/index_set_elasticsearch.clj (3)
106-116: LGTM: Per-cluster index-set retrieval correctly implemented.The function now accepts
es-cluster-name, normalizes it usingelastic-name-str->keyword, and fetches the cluster-specific config and connection. The lookup is consistent with the per-cluster routing pattern used elsewhere in the PR.
134-143: Documentation correctly updated for per-cluster semantics.The docstrings now accurately reflect that these functions operate on a specific Elasticsearch cluster.
182-193: LGTM: Per-cluster document deletion correctly implemented.The function now correctly resolves the cluster-specific config using
elastic-name-str->keyword(consistent withsave-document-in-elastic), retrieves the host/port/admin-token from the per-cluster config, and constructs the delete URL accordingly.indexer-app/src/cmr/indexer/data/elasticsearch.clj (11)
114-146: LGTM: Resharded index reconciliation logic is sound.The
reconcile-resharded-indexfunction correctly:
- Iterates over resharded indexes
- Skips updates if the resharded index name already exists
- Otherwise locates the canonical index by
index-keyand updates its:nameand:number_of_shardsThis ensures that the expected index-set reflects resharded indexes without duplicating entries.
148-160: LGTM: Per-cluster expected index-set computation is correct.The function correctly:
- Accepts
elastic-nameto determine which cluster's index-set to compute- Fetches the existing index-set for the specified cluster
- Branches to call
gran-index-setornon-gran-index-setas functions (with proper invocation)- Reconciles resharded indexes
The per-cluster branching logic is sound.
162-179: LGTM: Index-set update detection correctly implemented.Both functions are correct:
index-set-requires-update?compares index-sets after removing:concepts(which shouldn't affect update status)cluster-requires-update?validates the cluster name and delegates tocompute-expected-index-setto get both existing and expected sets for comparisonThe previous issue with passing function vars instead of realized maps has been resolved.
182-225: LGTM: Dual-cluster initialization correctly implemented.The function now handles both non-gran and gran clusters independently:
- Computes expected index-sets for each cluster
- Creates index-sets if they don't exist
- Warns (but doesn't force-update) if existing index-sets don't match expected
- Creates the collections alias only for the non-gran cluster
This approach is safe for startup initialization, deferring updates to explicit admin actions.
227-266: LGTM: Dual-cluster index update with migration support.The function correctly:
- Updates both non-gran and gran cluster index-sets
- Validates requested index-sets before applying
- Creates collection alias for non-gran cluster
- Includes transition logic (lines 260-266) to migrate old index-sets to gran cluster during first split
The migration logic appropriately checks for
nilexisting gran index-set, retrieves the old value, and applies it if found. This provides a smooth migration path.
268-276: LGTM: Granule index deletion correctly routed to gran cluster.The function now correctly uses
es-config/gran-elastic-nameto route granule index deletions to the granule-specific cluster.
309-312: LGTM: ESstore factory correctly parameterized.The
create-elasticsearch-storefunction now acceptses-cluster-nameand passes it to the ESstore record constructor.
332-335: LGTM: Per-cluster config retrieval correctly implemented.The function now accepts
es-cluster-name, normalizes it usingelastic-name-str->keyword, and fetches the cluster-specific config from context.
422-431: LGTM: Bulk indexing correctly supports per-cluster routing.The function now accepts
es-cluster-namein both arities and correctly routes bulk operations to the specified cluster viacontext->conn.
433-437: LGTM: Per-cluster connection routing helper is correct.The
get-es-cluster-connfunction correctly determines the cluster name from the index/alias name and retrieves the appropriate connection from context.
461-501: LGTM: Document retrieval and deletion correctly route to per-cluster.Both functions correctly:
- Derive the cluster name from the concept-id using
es-index/get-es-cluster-name-from-concept-id- Route operations to the appropriate cluster
- Preserve existing error handling and conflict resolution logic
The conditional logging doesn't expose sensitive information (just cluster names).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
indexer-app/src/cmr/indexer/data/elasticsearch.clj (1)
442-446: Avoid logging full ES connection map, even behind a toggleThe debug log still prints
conn, which likely containsuri, HTTP options, and possibly auth headers/tokens. Even though it’s guarded bysplit-cluster-log-toggle, enabling that toggle in any non‑local environment risks leaking credentials into logs.Prefer logging only high‑level routing info (cluster name / index), not the full connection object. For example:
- (let [conn (get-es-cluster-conn context es-index) - ;; Temporarily putting in this log to check which indexes are going to which cluster during ES cluster split - _ (when (es-config/split-cluster-log-toggle) - (info "CMR-10600 Saving doc for concept " concept-id " under index " es-index " in es cluster " conn)) + (let [conn (get-es-cluster-conn context es-index) + es-cluster-name (indexer-util/get-es-cluster-name-by-index-or-alias es-index) + ;; Debug which cluster an index routes to without exposing connection details + _ (when (es-config/split-cluster-log-toggle) + (info "CMR-10600 Saving doc for concept" concept-id "under index" es-index "in ES cluster" es-cluster-name))This preserves the routing visibility you need while keeping sensitive connection details out of logs.
🧹 Nitpick comments (7)
dev-system/src/cmr/dev_system/system.clj (1)
289-291: Consider renaming the localelastic-serverbinding for clarity.Within
create-system, theelastic-serverlet-binding shares a name with theelastic-servernamespace alias, which can be slightly confusing when editing this function later. Renaming the local value to something likenongran-elastic-serverwhile keeping the pre-component key:elastic-serverunchanged would make the intent clearer.You could apply a small refactor like:
- gran-elastic-server (create-elastic-server elastic (es-config/gran-elastic-port) (dev-config/embedded-kibana-gran-port)) - elastic-server (create-elastic-server elastic (es-config/elastic-port) (dev-config/embedded-kibana-port)) + gran-elastic-server (create-elastic-server elastic (es-config/gran-elastic-port) (dev-config/embedded-kibana-gran-port)) + nongran-elastic-server (create-elastic-server elastic (es-config/elastic-port) (dev-config/embedded-kibana-port)) @@ - :pre-components (u/remove-nil-keys - {:gran-elastic-server gran-elastic-server - :elastic-server elastic-server + :pre-components (u/remove-nil-keys + {:gran-elastic-server gran-elastic-server + :elastic-server nongran-elastic-serverAlso applies to: 305-306
indexer-app/src/cmr/indexer/data/elasticsearch.clj (5)
150-160:compute-expected-index-setcluster split behavior looks correctUsing
elastic-nameto:
- Fetch the existing index-set per cluster,
- Derive extra granule indexes,
- Select
gran-index-setvsnon-gran-index-set,- Then reconcile resharded indexes,
matches the PR intent of per-cluster expectations and preserving existing resharding. Consider updating the docstring to mention
elastic-name/cluster explicitly for clarity, but behavior looks good.
162-170: Consider stripping:conceptsfrom both maps for clarityThe intent is to ignore
:conceptswhen deciding if an update is required, but currently onlyexisting-index-sethas:conceptsremoved before comparison. Ifexpected-index-setever contains:concepts, it will still influence equality.To make the intent explicit and symmetrical, you could do:
-(defn index-set-requires-update? - [existing-index-set expected-index-set] - (let [updated-value (update-in existing-index-set [:index-set] dissoc :concepts) - result (not= updated-value expected-index-set)] - result)) +(defn index-set-requires-update? + [existing-index-set expected-index-set] + (let [strip-concepts #(update-in % [:index-set] dissoc :concepts)] + (not= (strip-concepts existing-index-set) + (strip-concepts expected-index-set)))This keeps behavior aligned with the docstring even if expected sets later grow a
:conceptskey.
172-179:cluster-requires-update?fix correctly delegates tocompute-expected-index-setThis now validates the cluster name and uses
compute-expected-index-setto obtain realized maps before callingindex-set-requires-update?, addressing the earlier issue where function vars were bound instead of index-set maps. The docstring still mentions “takes either the context or existing/expected sets,” but the function is now definitively[context es-cluster-name]only; updating the comment would avoid confusion.
417-417: Autocomplete suggestions are explicitly pinned to the non‑gran clusterUsing
(indexer-util/context->conn context es-config/elastic-name)here makes it clear that autocomplete indexing happens in the non‑gran cluster only. That’s reasonable, but if you ever need granule-based suggestions, you may need a follow-up overload that acceptses-cluster-name.
475-481:delete-documentcluster routing and config lookup are coherentDeriving
es-cluster-namefrom theconcept-id, then:
- Using
context->es-configwithelastic-name-str->keywordto fetchadmin-token, and- Using
context->connforuri/http-opts,is consistent with the rest of the split-cluster design and keeps the authorization header appropriately cluster-specific. The debug log only includes the concept-id, index, and cluster name, which is safe.
Given how central this path is, it would be good to have targeted tests exercising both clusters’ delete flows (with and without
elastic-version) to guard against regressions.indexer-app/src/cmr/indexer/data/index_set.clj (1)
1069-1082: Simplify index-set ID lookup and filter nil generics when building mapping typesThe multi-cluster fetch logic itself looks good:
fetch-concept-type-index-namesdeep-merges the gran and non-gran index-sets and exposes:index-names, resharding indices, and rebalancing collections.fetch-concept-mapping-typesmerges the per-cluster:index-setmaps for mapping-type discovery.fetch-rebalancing-collection-infocorrectly restricts itself to the granule half of the index-set.Two small refactors would improve robustness and clarity:
- Avoid misusing
gran-index-setjust to recover:idThe 0-arity variants of these fns do:
(let [index-set-id (get-in (gran-index-set context) [:index-set :id])] ...)but
gran-index-setis defined as[extra-granule-indexes]and doesn’t conceptually depend oncontext. Passingcontexthere is confusing and would become fragile ifgran-index-setever assumed a specific type/shape forextra-granule-indexes. Sinceindex-set-idis already a constant, you can simplify:(defn fetch-concept-type-index-names ... ([context] - (let [index-set-id (get-in (gran-index-set context) [:index-set :id])] - (fetch-concept-type-index-names context index-set-id))) + (fetch-concept-type-index-names context index-set-id))and make the analogous change in the 0-arity arities of
fetch-concept-mapping-typesandfetch-rebalancing-collection-info.
- Filter out nil generics before
into {}
get-concept-mapping-types-for-genericsintentionally returnsnilwhen a generic concept has no mapping.fetch-concept-mapping-typesthen does:(into {} (map #(get-concept-mapping-types-for-generics % all-index-set) (cs/get-generic-concept-types-array)))That means
intowill try toconjnilinto a map when a generic is present but has no mapping, which is surprising and may be invalid depending on the Clojure version/implementation. To match the docstring (“Any generic that doesn't have a map is omitted”) and avoid depending on that behavior, I’d filter nils explicitly:- (into {} - (map #(get-concept-mapping-types-for-generics % all-index-set) - (cs/get-generic-concept-types-array))))))) + (into {} + (keep #(get-concept-mapping-types-for-generics % all-index-set) + (cs/get-generic-concept-types-array)))))))Please double-check
conj/intobehavior withnilon your Clojure version in a REPL to confirm there’s no hidden edge case before/after this change.Also applies to: 1085-1096, 1097-1123, 1125-1132
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
dev-system/src/cmr/dev_system/system.clj(4 hunks)indexer-app/src/cmr/indexer/data/elasticsearch.clj(7 hunks)indexer-app/src/cmr/indexer/data/index_set.clj(7 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-20T16:26:11.404Z
Learnt from: indiejames
Repo: nasa/Common-Metadata-Repository PR: 2328
File: indexer-app/src/cmr/indexer/services/index_set_service.clj:337-349
Timestamp: 2025-10-20T16:26:11.404Z
Learning: In indexer-app/src/cmr/indexer/services/index_set_service.clj, the remove-resharding-index function intentionally returns an empty set when resharding-indexes is nil, rather than throwing an error. This differs from remove-rebalancing-collection and reflects a design preference to avoid nil returns.
Applied to files:
indexer-app/src/cmr/indexer/data/elasticsearch.cljindexer-app/src/cmr/indexer/data/index_set.clj
🔇 Additional comments (16)
dev-system/src/cmr/dev_system/system.clj (2)
15-15: Alias rename toes-configis consistent and clear.The new
es-configalias reads well and matches the usages increate-system(Lines 289–290); no issues from this change.
103-118:create-elastic-servermultimethod refactor and docstring look correct.The updated dispatch signature
[type elastic-port kibana-port]matches the call sites, and the docstring now accurately describes the granule vs non‑granule cluster usage. The in‑memory method cleanly threads the ports and log level intoelastic-server/create-server.indexer-app/src/cmr/indexer/data/elasticsearch.clj (11)
11-13: New ES config/index dependencies look appropriateAdding
es-configandes-indexhere matches the new cluster-aware routing and index-set logic further down in this file; usage appears consistent.
127-146: Resharded index reconciliation logic is structurally soundThe reducer correctly:
- Locates the per-concept
:indexesvector.- No‑ops when a resharded name already exists.
- Updates matching canonical entries’
:nameand:settings :index :number_of_shards.
get-in/update-inbehavior with missing paths ornilindexesis safe (falls back to empty vectors), so this should be robust to partially populated index-sets.
230-266:update-indexesnow handles both clusters consistently; transition block looks intactThe refactor to:
- Reuse
compute-expected-index-setfor both non‑gran and gran clusters,- Gate updates on
:forceorindex-set-requires-update?,- Call
validate-requested-index-setwith the appropriate cluster name,is consistent and keeps the “manual update” behavior (logging mismatches rather than auto-updating). The transition block that seeds the new gran index-set from
get-old-index-setis preserved and now explicitly scoped to the gran cluster, which matches the migration story.
273-273:delete-granule-indexcorrectly targets the granule clusterSwitching this delete path to use the granule ES cluster via
es-config/gran-elastic-namealigns with the split-cluster design and keeps granule indexes isolated.
282-282:reset-es-storecorrectly reusescreate-default-indexesCalling
create-default-indexesafterindex-set-svc/resetcentralizes the index-set initialization logic and ensures both clusters are re-created consistently during a reset.
289-293: ES store now being parameterized byes-cluster-nameis a good improvementAdding
es-cluster-nametoESstore, threading it viacreate-elasticsearch-store, and using it instartto derive the index-set index config (idx-cfg-for-index-sets) makes the per-cluster store behavior explicit and aligns with the dual-cluster system wiring.Also applies to: 301-304, 311-312
334-335:context->es-configstring→keyword conversion is appropriateUsing
es-config/elastic-name-str->keywordto derive the key intocontext[:system]matches the pattern of getting cluster names as strings (e.g., fromes-index/get-es-cluster-name-from-concept-id) and keeps lookups robust even if the external representation stays string-based.
432-437:get-es-cluster-connhelper is a reasonable abstractionDeriving the cluster name from
get-es-cluster-name-by-index-or-aliasand then callingcontext->conncentralizes the index/alias→cluster routing. Usingkeywordon the returned name is safe whether the function returns a string or keyword and keeps the key shape consistent forcontext->conn.
182-225: I'll verify the review comment's concern about type consistency for cluster-name values passed tocontext->conn. Let me search the codebase to understand the expected types and current usage.
<function_calls>
<invoke_name>shell
<invoke_description>First, find the context->conn function definition to see what type it expects</invoke_description>
<invoke_code>
rg -n "defn context->conn" indexer-app/src
</invoke_code>
</function_calls>
<invoke_name>shell
<invoke_description>Search for es-config definitions to understand the types of cluster-name values</invoke_description>
<invoke_code>
rg -n "elastic-name|gran-elastic-name" indexer-app/src -C2
</invoke_code>
</function_calls>
<invoke_name>shell
<invoke_description>Find all context->conn calls in the elasticsearch.clj file</invoke_description>
<invoke_code>
rg -n "context->conn" indexer-app/src/cmr/indexer/data/elasticsearch.clj -C2
</invoke_code>
</function_calls>
463-467: Type consistency is confirmed—no issues found.The review comment correctly identifies the routing pattern but raises a concern that is already resolved. The flow is:
get-es-cluster-name-from-concept-idreturns strings ("elastic"or"gran-elastic")indexer-util/context->conncallselastic-name-str->keywordon the parameterelastic-name-str->keywordexplicitly handles both strings and keywords, converting strings to their keyword equivalentsThe code is type-consistent by design—the
elastic-name-str->keywordfunction (elastic-utils-lib/src/cmr/elastic_utils/config.clj:139–148) accepts either type and normalizes them. No type mismatch exists.
423-425: All callsites ofbulk-index-documentsin indexer-app are correctly updated for the new arities.Both calls in
indexer-app/src/cmr/indexer/services/index_service.clj(lines 153 and 182) pass four arguments:context, batch,es-cluster-name, andoptions. This matches the new arity 2 signature[context docs es-cluster-name {:keys [all-revisions-index?]}]. The arity 1 delegation[context docs es-cluster-name]→ arity 2 withniloptions is properly implemented for backward compatibility. The separate elastic-utils module used by access-control-app retains its original 3-argument signature and is unaffected.indexer-app/src/cmr/indexer/data/index_set.clj (3)
4-11: Require additions are consistent with later usageThe added aliases (
cmr.common.utilandcmr.elastic-utils.config) are used below, and the expandedcmr.common.logrequire is consistent. No issues from this hunk.
18-20: Index-set ID and derived index-name constants look coherentCentralizing
index-set-idand derivingsmall-collections-*,deleted-granule-index-name, andgranule-index-name-prefixfrom it matches the existing naming scheme (e.g.,small_collections/deleted_granulescanonicalization) and should make future ID/name changes easier. No functional problems spotted here.Also applies to: 77-95
954-989:gran-index-setstructure aligns with granule/deleted-granule definitionsThe granule-side index-set template:
- Uses
index-set-idconsistently for the set metadata.- Correctly wires
deleted-granule-setting/deleted-granule-mapping.- Builds
:granule :indexesassmall_collectionsplusextra-granule-indexes, with appropriate per-index settings and shared:individual-index-settings.This matches the earlier settings/mappings and the documented CRUD impact. No changes requested.
…sitory into CMR-10600-split-cluster-feature
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
indexer-app/src/cmr/indexer/data/index_set.clj(7 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-20T16:26:11.404Z
Learnt from: indiejames
Repo: nasa/Common-Metadata-Repository PR: 2328
File: indexer-app/src/cmr/indexer/services/index_set_service.clj:337-349
Timestamp: 2025-10-20T16:26:11.404Z
Learning: In indexer-app/src/cmr/indexer/services/index_set_service.clj, the remove-resharding-index function intentionally returns an empty set when resharding-indexes is nil, rather than throwing an error. This differs from remove-rebalancing-collection and reflects a design preference to avoid nil returns.
Applied to files:
indexer-app/src/cmr/indexer/data/index_set.clj
🔇 Additional comments (6)
indexer-app/src/cmr/indexer/data/index_set.clj (6)
18-20: LGTM!Centralizing
index-set-idas a top-level def is a good refactor for maintainability and consistency across index naming functions.
77-95: LGTM!The index naming defs provide a clean, consistent pattern derived from
index-set-id. This centralizes the naming convention and makes the codebase easier to maintain.
954-988: LGTM!The
gran-index-setfunction cleanly encapsulates granule-specific index configuration. The docstring with the IMPORTANT note about CRUD impact is helpful for maintainers.
990-1059: LGTM!The
non-gran-index-setfunction properly encapsulates non-granule index configuration and correctly includes generic document mappings. The separation fromgran-index-setsupports the dual-cluster architecture.
1127-1131: Logic is correct for fetching from granule cluster only.Rebalancing is a granule-specific operation, so fetching only from
es-config/gran-elastic-nameis appropriate.Note: The
gran-index-set contextissue on line 1127 was already flagged above.
4-4: Remove unused log function imports.The
info,warn, anderrorfunctions imported fromcmr.common.logare not used anywhere in this file. Remove them to keep imports clean.⛔ Skipped due to learnings
Learnt from: indiejames Repo: nasa/Common-Metadata-Repository PR: 2328 File: indexer-app/src/cmr/indexer/services/index_set_service.clj:337-349 Timestamp: 2025-10-20T16:26:11.404Z Learning: In indexer-app/src/cmr/indexer/services/index_set_service.clj, the remove-resharding-index function intentionally returns an empty set when resharding-indexes is nil, rather than throwing an error. This differs from remove-rebalancing-collection and reflects a design preference to avoid nil returns.
Overview
What is the objective?
Allow CMR to point to two different elastic search clusters (gran and non-gran) AND/OR point to one cluster as well
What are the changes?
Update all funcs related to elasticsearch
What areas of the application does this impact?
All
Required Checklist
Additional Checklist
Summary by CodeRabbit
New Features
Bug Fixes & Improvements
Documentation
Tests
Chores
✏️ Tip: You can customize this high-level summary in your review settings.