-
Notifications
You must be signed in to change notification settings - Fork 0
Thawing implementation in the new logic #28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
inputPath: props.inputPath, | ||
mapStateName: id, | ||
}); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would be the ramifications if one thaw completed quickly - and the others took 48 hours?
Might the quick thaw "re-expire"??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought potentially for (large or small) thawing we would have only one distributed map - but that within that map would be a single chain for each object doing the thaw->wait->copy loop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean this is fine - and is effectively what we had before in that we separated out the thawing map from the copying map. But I wonder if the other is more efficient/neater?
…r reuse. Refactored file names removing "object" from for consistency and improved readability.
…small object thawing still needs refactor to match this refactor.
…h small and large object copy constructs. Cleaned up and unified naming conventions across all related constructs for consistency.
The previous implementation of thawing was not optimal, as it performed all thawing operations before initiating the copy tasks instead of in the same map, which is inefficiency. To address this, I’ve moved the thawing logic into the copy map constructs themselves—SmallObjectsCopyMapConstruct and CopyMapConstruct, so that thawing is now triggered via the The logic is now implemented in:
So, as a result, the following thawing-specific constructs are no longer needed and have been removed:
Other changes include: In In I also cleaned up the state and step names for clarity, while still using CamelCase for naming. We could consider setting more human-readable names for those steps in the future (?) |
return {}; | ||
} | ||
// Return original input so downstream steps (e.g. ECS copy) get expected fields | ||
return event; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just as a note - you can also pass through to downstream steps with config in the Steps. As in - you can ignore the the return value of the lambda and have Steps itself pass through values.
No need to change here - is really possibly to do with us progressing to full JSONata at some point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to learn more on this. I was dealing quit a bit with this, and was the best solution I found (or at least the one I actually understood) cos the copy step brokes when trying to pass the inputs needed for thawing. I’m happy to work on trying the full implementation JSONata, so I can learn.
} else { | ||
entryState = delayStep; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice!
@@ -42,6 +43,9 @@ type Props = { | |||
|
|||
readonly taskDefinition: TaskDefinition; | |||
readonly containerDefinition: ContainerDefinition; | |||
|
|||
readonly thaw?: boolean; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would clarfiy the name of this - to make it clear that this thaw
changes the Construct and is not dynamic behaviour.
So change it to be a verb about the construct. Maybe addThawStep
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
intelligentTieringDeepArchiveThawSpeed: props.aggressiveTimes | ||
? "Standard" | ||
: "Bulk", | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will put in a new issue - so not for this PR - but we will eventually want these parameters to be allowed to be specified by the invoker of the copy (so possibly bringing this entire data structure up to the Steps invoke parameter level)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, definitely. Just keep it hard coded as it was but would be nice to having it as accessible params.
This PR addresses Issue #13 and introduces a parallel thawing implementation to support copying objects stored in cold storage.
Cold objects are now identified and split into two copysets based on size:
smallThaw
andlargeThaw
(just as regular, hot data based in the 5 MiB threshold ). Each copyset is processed independently through its own thaw + copy map chain. Once thawed, files are copied using either Lambda (for small files) or Fargate (for large files), same that hot data.This is implemented via two new constructs:
thaw-small-copy-map-construct.ts
thaw-large-copy-map-construct.ts
Each of these chains together
thaw-objects-map-construct
with the corresponding copy construct (small-objects-copy-map-construct
orlarge-objects-copy-map-construct
). These thawing chains are integrated intosteps-s3-copy-construct.ts,
running in parallel with hot data copiers.In addition to the two new constructs, six other files were updated to integrate thawing and fix some issues encountered during implementation, as detailed below:
dev-testsuite/test-e2e-thawing.ts
: the test timeout was increased by 2 minutes. Some thawing runs were cutting it close or failing under the original 5-minute limit.lambda/coordinate-copy-lambda.ts
new copy sets (smallThaw and largeThaw) are now created based on whether the object’s storage class is inCOLD_STORAGE_CLASSES
, and whether its size is below or above 5 MiB.Minor TypeScript-related refactors:
(These two were done mostly to silence TS warnings— please, review it.)
small-objects-copy-map-construct.ts
made the construct reusable by allowinglambdaStateName
to be passed in rather than hardcoding it. There's still a bit of mixing between dynamic and hardcoded names—I might clean that up further in a follow-up ?4&5.
thaw-objects-lambda-step-construct.ts
&thaw-objects-map-construct.ts
: refactored to support reusable thawing logic.Key changes:
Introduced dynamic naming for the
lambdaFunction
Now using
S3CsvDistributedMap
instead ofS3JsonlDistributedMap
, since we're now reading thaw instructions directly from JSONL (via thecopySets.smallThaw
/largeThaw
input)NOTE: I’m manually granting invoke permissions to the thawing Lambdas (line ~350). It works, but I’m not entirely sure it’s the cleanest/correct way to do it.