Determine Azure directories more accurately using metadata and blob properties#6428
Determine Azure directories more accurately using metadata and blob properties#6428adamrtalbot wants to merge 10 commits intomasterfrom
Conversation
…roperties - Refactor AzFileAttributes to use the `hdi_isfolder` metadata key as the primary indicator for directories. - Fallback to blob name ending with '/' only if metadata is not present. - Add and update tests to cover edge cases for directory detection. - Improves compatibility with Azure Data Lake Storage Gen2 and hierarchical namespace scenarios. Signed-off-by: adamrtalbot <[email protected]>
✅ Deploy Preview for nextflow-docs-staging canceled.
|
- Improved logic to determine if a blob is a directory by checking for the "hdi_isfolder" metadata key and its value. - If the blob is a directory, set size to 0 and mark as directory. - If not, treat as file and set size and timestamps from blob properties. - This ensures compatibility with Azure's explicit directory markers and avoids misclassification. Signed-off-by: adamrtalbot <[email protected]>
Signed-off-by: adamrtalbot <[email protected]>
…adata Signed-off-by: adamrtalbot <[email protected]>
|
|
||
| AzFileAttributes(String containerName, BlobItem item) { | ||
| objectId = "/${containerName}/${item.name}" | ||
| directory = item.name.endsWith('/') |
There was a problem hiding this comment.
@bentsherman what will break if we incorrectly classify something as a file when it's a pseudo-directory? What's the impact of getting it wrong?
…tory bug (#6427) Add a testingest to document and demonstrate the bug in AzFileAttributes where any blobName ending with '/' is treated as a directory, regardless of actual Azure semantics. This test will help verify and prevent regressions when fixing the directory detection logic. Signed-off-by: adamrtalbot <[email protected]>
…directory
- Only treat paths ending with a trailing slash as directories (e.g. `/container/dir/`).
- Paths without a trailing slash (even if they exist as directories in Azure) are treated as files unless Azure metadata explicitly marks them as folders.
- This fixes inconsistent behavior in Nextflow pipelines where `file("az://container/dir")` and `file("az://container/dir/")` were detected differently.
- Updates `AzFileAttributes` logic and adds/updates tests to ensure correct directory detection for all path formats.
Signed-off-by: adamrtalbot <[email protected]>
| } | ||
| } | ||
|
|
||
| return false |
There was a problem hiding this comment.
Bug: Azure API Calls Cause Performance Issues
The isDirectory() method now makes multiple uncached Azure API calls (exists(), getProperties(), listBlobs()) on every invocation when the directory field is false. This causes a significant performance regression. It also has a logic flaw that can lead to incorrect directory detection and lacks exception handling for these network calls.
| } | ||
| } | ||
|
|
||
| return false |
There was a problem hiding this comment.
Bug: Trailing Slash Bug Affects Directory Detection
The AzPath constructor's handling of trailing slashes causes isDirectory() to return early, bypassing its new Azure-querying logic for directory detection. This also creates unreachable code in the AzFileAttributes constructor and leads to conflicting test expectations regarding paths ending with a slash.
Additional Locations (4)
plugins/nf-azure/src/test/nextflow/cloud/azure/nio/AzPathTest.groovy#L43-L44plugins/nf-azure/src/test/nextflow/cloud/azure/nio/AzPathTest.groovy#L65-L66plugins/nf-azure/src/main/nextflow/cloud/azure/nio/AzFileAttributes.groovy#L136-L137plugins/nf-azure/src/test/nextflow/cloud/azure/nio/AzPathTest.groovy#L161-L185
plugins/nf-azure/src/main/nextflow/cloud/azure/nio/AzFileAttributes.groovy
Show resolved
Hide resolved
Enhance the logic for determining if a blob represents a directory in Azure. Now considers both the `hdi_isfolder` metadata and the presence of child blobs, providing more accurate directory status detection. Updates related tests to reflect the improved behavior. Signed-off-by: adamrtalbot <[email protected]>
|
I have a branch that breaks out these functions into a separate AzDirectoryUtils class for DRY...too much complexity? |
d9fa5cd to
d752bc2
Compare
hdi_isfoldermetadata key as the primary indicator for directories.hdi_isfoldermetadata is not present or blob does not exist, it will list blobs at path. If there are "sub blobs", it will determine it is a directory.Fixes #6427
Signed-off-by: adamrtalbot [email protected]
Hi! Thanks for contributing to Nextflow.
When submitting a Pull Request, please sign-off the DCO [1] to certify that you are the author of the contribution and you adhere to Nextflow's open source license [2] by adding a
Signed-off-byline to the contribution commit message. See [3] for more details.Note
Cursor Bugbot is generating a summary for commit 4dfec66. Configure here.