Skip to content

Thawing implementation in the new logic #28

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Jun 30, 2025
Merged

Thawing implementation in the new logic #28

merged 12 commits into from
Jun 30, 2025

Conversation

felipeji
Copy link
Member

This PR addresses Issue #13 and introduces a parallel thawing implementation to support copying objects stored in cold storage.

Cold objects are now identified and split into two copysets based on size: smallThaw and largeThaw (just as regular, hot data based in the 5 MiB threshold ). Each copyset is processed independently through its own thaw + copy map chain. Once thawed, files are copied using either Lambda (for small files) or Fargate (for large files), same that hot data.

This is implemented via two new constructs:

thaw-small-copy-map-construct.ts
thaw-large-copy-map-construct.ts

Each of these chains together thaw-objects-map-construct with the corresponding copy construct (small-objects-copy-map-construct or large-objects-copy-map-construct). These thawing chains are integrated into steps-s3-copy-construct.ts, running in parallel with hot data copiers.

In addition to the two new constructs, six other files were updated to integrate thawing and fix some issues encountered during implementation, as detailed below:

  1. dev-testsuite/test-e2e-thawing.ts : the test timeout was increased by 2 minutes. Some thawing runs were cutting it close or failing under the original 5-minute limit.

  2. lambda/coordinate-copy-lambda.ts new copy sets (smallThaw and largeThaw) are now created based on whether the object’s storage class is in COLD_STORAGE_CLASSES, and whether its size is below or above 5 MiB.
    Minor TypeScript-related refactors:

  • Switched to ESM-style import for path/posix
  • Added non-null assertions (!) where needed to satisfy the compiler

(These two were done mostly to silence TS warnings— please, review it.)

  1. small-objects-copy-map-construct.ts made the construct reusable by allowing lambdaStateName to be passed in rather than hardcoding it. There's still a bit of mixing between dynamic and hardcoded names—I might clean that up further in a follow-up ?

4&5. thaw-objects-lambda-step-construct.ts & thaw-objects-map-construct.ts: refactored to support reusable thawing logic.

Key changes:

  • Introduced dynamic naming for the lambdaFunction

  • Now using S3CsvDistributedMap instead of S3JsonlDistributedMap, since we're now reading thaw instructions directly from JSONL (via the copySets.smallThaw / largeThaw input)

  1. steps-s3-copy-construct.ts: Integrated calls to the new thawing constructs using the same naming conventions and parallel execution logic as for hot data copiers.
    NOTE: I’m manually granting invoke permissions to the thawing Lambdas (line ~350). It works, but I’m not entirely sure it’s the cleanest/correct way to do it.

@felipeji felipeji requested a review from andrewpatto June 13, 2025 02:53
inputPath: props.inputPath,
mapStateName: id,
});

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be the ramifications if one thaw completed quickly - and the others took 48 hours?

Might the quick thaw "re-expire"??

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought potentially for (large or small) thawing we would have only one distributed map - but that within that map would be a single chain for each object doing the thaw->wait->copy loop

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean this is fine - and is effectively what we had before in that we separated out the thawing map from the copying map. But I wonder if the other is more efficient/neater?

Felipe Jimenez-Ibarra added 4 commits June 23, 2025 13:56
…r reuse. Refactored file names removing "object" from for consistency and improved readability.
…small object thawing still needs refactor to match this refactor.
…h small and large object copy constructs. Cleaned up and unified naming conventions across all related constructs for consistency.
@felipeji
Copy link
Member Author

The previous implementation of thawing was not optimal, as it performed all thawing operations before initiating the copy tasks instead of in the same map, which is inefficiency.

To address this, I’ve moved the thawing logic into the copy map constructs themselves—SmallObjectsCopyMapConstruct and CopyMapConstruct, so that thawing is now triggered via the thaw flag when they are instantiated. This new approach allows each object to be thawed just before it is copied.

The logic is now implemented in:

packages/steps-s3-copy/src/lib/copy-map-construct.ts
packages/steps-s3-copy/src/lib/small-copy-map-construct.ts

So, as a result, the following thawing-specific constructs are no longer needed and have been removed:

packages/steps-s3-copy/src/lib/thaw-large-copy-map-construct.ts
packages/steps-s3-copy/src/lib/thaw-objects-map-construct.ts
packages/steps-s3-copy/src/lib/thaw-small-copy-map-construct.ts

Other changes include:

In packages/steps-s3-copy/lambda/can-read-objects-lambda/can-read-objects-lambda.ts, the original event is now passed downstream so that it remains available to the copy step after thawing.

In packages/steps-s3-copy/src/lib/thaw-lambda-step-construct.ts, I added outputPath: "$.Payload" to handling of InputPath in downstream copy steps.

I also cleaned up the state and step names for clarity, while still using CamelCase for naming. We could consider setting more human-readable names for those steps in the future (?)

return {};
}
// Return original input so downstream steps (e.g. ECS copy) get expected fields
return event;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just as a note - you can also pass through to downstream steps with config in the Steps. As in - you can ignore the the return value of the lambda and have Steps itself pass through values.

No need to change here - is really possibly to do with us progressing to full JSONata at some point

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to learn more on this. I was dealing quit a bit with this, and was the best solution I found (or at least the one I actually understood) cos the copy step brokes when trying to pass the inputs needed for thawing. I’m happy to work on trying the full implementation JSONata, so I can learn.

} else {
entryState = delayStep;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

@@ -42,6 +43,9 @@ type Props = {

readonly taskDefinition: TaskDefinition;
readonly containerDefinition: ContainerDefinition;

readonly thaw?: boolean;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would clarfiy the name of this - to make it clear that this thaw changes the Construct and is not dynamic behaviour.

So change it to be a verb about the construct. Maybe addThawStep?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

intelligentTieringDeepArchiveThawSpeed: props.aggressiveTimes
? "Standard"
: "Bulk",
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will put in a new issue - so not for this PR - but we will eventually want these parameters to be allowed to be specified by the invoker of the copy (so possibly bringing this entire data structure up to the Steps invoke parameter level)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, definitely. Just keep it hard coded as it was but would be nice to having it as accessible params.

@felipeji felipeji merged commit 70e196c into main Jun 30, 2025
6 checks passed
@felipeji felipeji deleted the thawing-new-logic branch June 30, 2025 01:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants