Hashed Partition Numbers in Staged File Names (SNOW-1642799) #1100

@alonpr

If I understand correctly, part of this change introduced hashing of the partition number in the staged file name. I'm not entirely clear on why this was necessary. Even in n:1 topic-to-table ingestion scenarios using topic2TableMap, the combination of topic and partition should already ensure uniqueness. I noticed a comment suggesting the hash was added because of file name length limitations, but I'm not certain that is the reason.

That said, hashing the partition number seems to have led to the following issues:

  1. It's no longer possible to identify the originating partition from the file name.

  2. These files are not being cleaned up automatically and must be removed manually.

Could you please confirm if these observations are accurate?

I can somewhat understand the reasoning behind the approach taken, but it feels odd that a broader breaking change was avoided at the cost of breaking the expected meaning of the file name. Replacing what is defined as the "partition number" portion of a staged file name with a hash seems more like a bug than a feature.

It's true that before the change, when using topic2TableMap, the originating topic couldn't be identified from the file name, but at least the originating partition was still provided. Now, it's no longer possible to identify either.
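
To make the concern concrete, here is a minimal sketch of the difference. Everything in it is an assumption for illustration: the file name layout (`<app>/<table>/<partition field>/<start>_<end>_<timestamp>.json.gz`), the helper names, and the CRC32-based hash, which merely stands in for whatever hashing the connector actually applies.

```python
import zlib


def staged_file_name(app, table, partition_field, start_offset, end_offset, ts_ms):
    # Hypothetical layout, assumed for illustration only.
    return f"{app}/{table}/{partition_field}/{start_offset}_{end_offset}_{ts_ms}.json.gz"


def hashed_partition_field(topic, partition):
    # Hypothetical stand-in for the hashing step: CRC32 over "topic:partition".
    # This is NOT the connector's real hash function.
    return zlib.crc32(f"{topic}:{partition}".encode("utf-8"))


def partition_field_from_name(file_name):
    # Read back the third path segment, which is nominally the partition number.
    return int(file_name.split("/")[2])


# Before the change (as assumed here): the field is the raw partition number.
plain = staged_file_name("app", "tbl", 3, 0, 99, 1700000000000)

# After the change (as assumed here): the field is a hash of topic and partition.
hashed = staged_file_name(
    "app", "tbl", hashed_partition_field("topic_a", 3), 0, 99, 1700000000000
)

# The plain name still yields the partition (though not the topic).
print(partition_field_from_name(plain))

# The hashed name yields only an opaque number; a hash cannot be inverted,
# so neither the topic nor the partition can be read back from it.
print(partition_field_from_name(hashed))
```

Under these assumptions, any tooling that parses the partition out of the file name, including cleanup logic that matches files to partitions, would stop recognizing the hashed names.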
