Skip to content

Conversation

@William-Hill
Copy link

Description

Currently the S3 destination connector creates the object's name in the S3 bucket using a timestamp. This change uses the record's key as the object name if available. Otherwise it falls back to the previous method of using the timestamp

Fixes # (issue)

Quick checks:

  • I have followed the Code Guidelines.
  • There is no other pull request for the same update/change.
  • I have written unit tests.
  • I have made sure that the PR is of reasonable size and can be easily reviewed.

@William-Hill William-Hill requested a review from a team as a code owner March 25, 2025 23:47
Copy link
Contributor

@raulb raulb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only need to fix CI

var key string
if len(batch.Records) > 0 {
// Convert the key bytes to string
keyBytes := batch.Records[0].Key.Bytes()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oof, that could go wrong. For instance, the key could be structured data, in which case you'd get a JSON string (which is not in line with the key naming guidelines). Also, the key is related to the first record, the rest of the batch might not be related to that. I'm not a fan of this change, tbh.

I'm thinking how else we could achieve what you're trying to do. We could introduce a metadata field (e.g. s3.key) and use that as the object key, if it's present. You could then use a processor to set that metadata field. The issue remains that the first record in the batch is the only record for which this field would be respected, unless we change the logic to group records based on the metadata field and create multiple objects.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lovromazgon this makes sense. I actually run into that exact issue of the key being structured data when the source was a Postgres database (I previously was testing with a Generator source). I can try the metadata approach to get by for the demo

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants