CASSSIDECAR-226 Adding endpoint for verifying files post data copy during live migration #309

Open
nvharikrishna wants to merge 3 commits into apache:trunk from
nvharikrishna:226-lm-file-digests-trunk

Conversation

@nvharikrishna nvharikrishna commented Jan 25, 2026

CASSSIDECAR-226 Adding an endpoint for verifying files between source and destination post data copy.

This implementation uses a two-task approach (data copy + file verification) rather than inline digest verification during data copy (as originally proposed in CEP-40). This design choice is motivated by:

  1. Performance Efficiency: The data copy task executes multiple iterations internally. Even with successThreshold=1.0, the task might require at least two internal iterations (iteration 0: download → DOWNLOAD_COMPLETE, iteration 1: verify threshold → SUCCESS). Inline digest verification would calculate digests twice per file (once in each iteration), doubling the I/O cost. With separate tasks, digests are calculated once after data stabilizes.
  2. Code Simplicity: Separating digest verification from file copying provides clear separation of concerns, making each task easier to understand, test, and maintain.
  3. Operational Flexibility: Users can run verification independently, repeat it if needed, or skip it for non-critical migrations. Inline verification would make this mandatory overhead.
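The I/O argument in point 1 can be made concrete with a small counting sketch. It assumes a deliberately simplified model in which inline verification digests every file once per internal iteration; the class name, method names, and file count are hypothetical and not taken from the PR.

```java
public class DigestCostSketch {
    // Assumed model: inline verification recomputes a digest for every file
    // in every internal iteration of the data copy task.
    public static long inlineDigestCount(int files, int iterations) {
        return (long) files * iterations;
    }

    // A separate verification task digests each file once, after copying settles.
    public static long separateDigestCount(int files) {
        return files;
    }

    public static void main(String[] args) {
        int files = 1000;      // hypothetical number of files to migrate
        int iterations = 2;    // iteration 0 downloads, iteration 1 checks the threshold
        System.out.println("inline digests:   " + inlineDigestCount(files, iterations));
        System.out.println("separate digests: " + separateDigestCount(files));
    }
}
```

Under this model the inline approach computes twice as many digests for two iterations, and the gap grows with every extra iteration the copy task needs.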

Here are the endpoint details:

Sample files verification task submission request:

curl -X POST http://dest-host.example.com:9043/api/v1/live-migration/files-verification-tasks \
  -H "Content-Type: application/json" \
  -d '{
    "maxConcurrency": 10,
    "digestAlgorithm": "MD5"
  }'

It also supports the XXHash32 algorithm, with a seed as an additional input in the payload.

Sample response:

{
  "taskId": "b8e4f3d2-5c6b-5d9e-0f2g-3b4c5d6e7f8g",
  "statusUrl": "/api/v1/live-migration/files-verification-tasks/b8e4f3d2-5c6b-5d9e-0f2g-3b4c5d6e7f8g"
}
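For callers that prefer Java over curl, the submission request above could be built with the JDK's `java.net.http` API roughly as follows. This is a minimal sketch: the host, port, path, and payload fields are copied from the sample, while the class and method names are hypothetical. The request is only constructed here, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class SubmitVerificationTaskSketch {
    // Builds the JSON payload by hand; field names follow the sample request.
    public static String payload(int maxConcurrency, String digestAlgorithm) {
        return String.format("{\"maxConcurrency\": %d, \"digestAlgorithm\": \"%s\"}",
                             maxConcurrency, digestAlgorithm);
    }

    // Creates a POST request against the files verification task endpoint.
    public static HttpRequest buildRequest(String baseUrl, String jsonBody) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/api/v1/live-migration/files-verification-tasks"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildRequest("http://dest-host.example.com:9043", payload(10, "MD5"));
        System.out.println(request.method() + " " + request.uri());
        // To send it: HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())
    }
}
```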

Fetching files verification task status:

curl -X GET http://dest-host.example.com:9043/api/v1/live-migration/files-verification-tasks/b8e4f3d2-5c6b-5d9e-0f2g-3b4c5d6e7f8g

Sample response:

{
  "id": "b8e4f3d2-5c6b-5d9e-0f2g-3b4c5d6e7f8g",
  "digestAlgorithm": "md5",
  "seed": null,
  "state": "COMPLETED",
  "source": "localhost1",
  "port": 9043,
  "filesNotFoundAtSource": 0,
  "filesNotFoundAtDestination": 0,
  "metadataMatched": 379,
  "metadataMismatches": 0,
  "digestMismatches": 0,
  "digestVerificationFailures": 0,
  "filesMatched": 323
}
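A client might reduce the status response above to a single pass/fail decision. The sketch below assumes, purely from the field names, that a run is clean when the state is COMPLETED and every not-found, mismatch, and failure counter is zero; the PR does not spell out these semantics, and the class and method names are hypothetical.

```java
public class VerificationStatusSketch {
    // Assumption: "COMPLETED" with all problem counters at zero means the
    // source and destination files agree. Field meanings are inferred from names.
    public static boolean looksClean(String state,
                                     int filesNotFoundAtSource,
                                     int filesNotFoundAtDestination,
                                     int metadataMismatches,
                                     int digestMismatches,
                                     int digestVerificationFailures) {
        return "COMPLETED".equals(state)
                && filesNotFoundAtSource == 0
                && filesNotFoundAtDestination == 0
                && metadataMismatches == 0
                && digestMismatches == 0
                && digestVerificationFailures == 0;
    }

    public static void main(String[] args) {
        // Values taken from the sample response.
        System.out.println(looksClean("COMPLETED", 0, 0, 0, 0, 0));
    }
}
```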

I also made additional changes to ensure that only one task, either data copy or file verification, can execute at any given time.
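One plausible reading of this constraint is that the data copy task and the file verification task are mutually exclusive. A minimal sketch of such a guard, assuming that reading, could use a single atomic slot; the class, enum, and method names are hypothetical and are not the PR's implementation.

```java
import java.util.concurrent.atomic.AtomicReference;

public class SingleTaskGuardSketch {
    public enum TaskType { DATA_COPY, FILES_VERIFICATION }

    // Holds the task currently running, or null when the slot is free.
    private final AtomicReference<TaskType> running = new AtomicReference<>();

    // Claims the slot atomically; returns false if any task already holds it.
    public boolean tryStart(TaskType type) {
        return running.compareAndSet(null, type);
    }

    // Releases the slot, but only if it is held by the given task type.
    public void finish(TaskType type) {
        running.compareAndSet(type, null);
    }

    public static void main(String[] args) {
        SingleTaskGuardSketch guard = new SingleTaskGuardSketch();
        System.out.println(guard.tryStart(TaskType.DATA_COPY));          // slot was free
        System.out.println(guard.tryStart(TaskType.FILES_VERIFICATION)); // rejected while copy runs
        guard.finish(TaskType.DATA_COPY);
        System.out.println(guard.tryStart(TaskType.FILES_VERIFICATION)); // slot free again
    }
}
```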

Comment on lines 64 to 65
String fullURI = seed != null
? String.format("%s?%s=%s&%s=%d", requestURI, DIGEST_ALGORITHM_PARAM, digestAlgorithm, SEED_PARAM, seed)

@yifan-c yifan-c Feb 15, 2026

One learning from the RestoreJob work is that the custom seed does not provide any benefit for data integrity validation and only adds code complexity. I would just drop custom seed support to simplify the implementation and use the fixed seed 0, which also makes the client-server communication simpler.
I don't feel strongly about removing seed support, but I think it would be ideal.

Contributor Author

I agree with the points about code complexity and simplification. I can remove seed support for live migration.
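With the seed dropped, the URI construction discussed in this thread no longer needs a conditional. The following is only a sketch of that simplification: the value of `DIGEST_ALGORITHM_PARAM` and the example path are assumptions, since neither is shown in the quoted lines.

```java
public class DigestUriSketch {
    // Assumed query parameter name; the actual constant value is not shown in the diff excerpt.
    static final String DIGEST_ALGORITHM_PARAM = "digestAlgorithm";

    // Without a seed, a single format string covers every case.
    public static String fullUri(String requestURI, String digestAlgorithm) {
        return String.format("%s?%s=%s", requestURI, DIGEST_ALGORITHM_PARAM, digestAlgorithm);
    }

    public static void main(String[] args) {
        // "/hypothetical/file/path" is a placeholder, not a real Sidecar route.
        System.out.println(fullUri("/hypothetical/file/path", "MD5"));
    }
}
```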

@nvharikrishna nvharikrishna force-pushed the 226-lm-file-digests-trunk branch from 9e9da17 to 39cb915 on February 20, 2026 19:16