Description
If I understand correctly, part of this change introduced hashing of the partition number in the staged file name. I'm not entirely clear on why this was necessary. Even in n:1 topic-to-table ingestion scenarios using topic2TableMap, the combination of topic and partition should already ensure uniqueness. I noticed a comment suggesting the hash was added due to filename length limitations, but I'm not certain.
That said, when the partition number is hashed, this change seems to cause the following issues:
- It's no longer possible to identify the originating partition from the file name.
- These files are not being cleaned up automatically and must be removed manually.
Could you please confirm if these observations are accurate?
I can somewhat understand the reasoning behind the approach taken, but it feels odd that a broader breaking change was avoided at the cost of introducing a break in expected logic. Replacing what’s defined as the "partition number" portion of a staged file with a hash seems more like a bug than a feature.
It's true that before the change, when using topic2TableMap, the originating topic couldn't be identified from the file name, but at least the originating partition was still provided. Now, it's no longer possible to identify either.
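To illustrate the concern, here is a minimal sketch in Python. The file name layout, the helper names (`plain_name`, `hashed_name`, `parse_partition`), and the use of CRC32 as the hash are all assumptions made for illustration; this is not the connector's actual code or naming scheme.

```python
import zlib


def plain_name(table: str, partition: int, start: int, end: int) -> str:
    # Hypothetical layout: the partition number appears verbatim.
    return f"{table}/{partition}/{start}_{end}.json.gz"


def hashed_name(table: str, topic: str, partition: int, start: int, end: int) -> str:
    # Hypothetical layout: the partition slot holds a hash of topic and
    # partition, so files from different topics cannot collide.
    token = zlib.crc32(f"{topic}:{partition}".encode("utf-8"))
    return f"{table}/{token}/{start}_{end}.json.gz"


def parse_partition(name: str) -> int:
    # Reads whatever sits in the partition slot of the file name.
    return int(name.split("/")[1])


plain = plain_name("my_table", 3, 100, 199)
hashed_a = hashed_name("my_table", "topic_a", 3, 100, 199)
hashed_b = hashed_name("my_table", "topic_b", 3, 100, 199)

print(parse_partition(plain))     # the real partition number
print(parse_partition(hashed_a))  # an opaque token, not the partition
print(parse_partition(hashed_b))  # a different token for the same partition
```

In this sketch the hashed names stay unique per topic, but anything that reads the partition slot as a partition number (for example, a cleanup routine that matches files by partition) would no longer find the files it expects.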