[release test] move azure related functions to cloud_util.py
#57675
Conversation
Force-pushed from ff1b565 to 51474ad
Code Review
This pull request refactors the codebase by moving Azure-related utility functions from `util.py` into a new, dedicated `cloud_util.py` module. This is a good improvement for code organization and modularity. The corresponding `BUILD.bazel` files and tests are updated accordingly. My review includes a couple of suggestions to improve consistency in URL handling for Azure and to ensure temporary files are properly cleaned up.
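For context, the `BUILD.bazel` change presumably adds a dedicated `py_library` target for the new module. A rough sketch of what such a target could look like (the target name, sources, and visibility below are illustrative guesses, not taken from the PR):

```starlark
# Hypothetical py_library target for the new module; name, srcs, and
# visibility are illustrative and may not match the PR's actual BUILD.bazel.
py_library(
    name = "cloud_util",
    srcs = ["cloud_util.py"],
    visibility = ["//visibility:private"],
)
```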
```python
archived_file_path = archive_directory(working_dir)
archived_filename = os.path.basename(archived_file_path)
azure_file_path = f"{azure_directory_uri}/{archived_filename}"
upload_file_to_azure(
    local_file_path=archived_file_path, azure_file_path=azure_file_path
)
return azure_file_path
```
The `upload_working_dir_to_azure` function calls `archive_directory` to create a temporary zip file, but this file is not deleted after being uploaded to Azure. This will leave temporary files in the execution environment. It's a good practice to ensure temporary resources are cleaned up. You can use a `try...finally` block to ensure the file is removed even if the upload fails.
```python
archived_file_path = archive_directory(working_dir)
try:
    archived_filename = os.path.basename(archived_file_path)
    azure_file_path = f"{azure_directory_uri}/{archived_filename}"
    upload_file_to_azure(
        local_file_path=archived_file_path, azure_file_path=azure_file_path
    )
    return azure_file_path
finally:
    if os.path.exists(archived_file_path):
        os.remove(archived_file_path)
```
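As a follow-up thought, the same cleanup could be wrapped in a small context manager so every caller gets it for free. This is only a sketch that assumes the `archive_directory` and `upload_file_to_azure` helpers shown above; the `upload_working_dir_to_azure` signature here is a guess, not the one in the PR:

```python
import contextlib
import os


@contextlib.contextmanager
def _temporary_archive(working_dir: str):
    """Yield a zip of working_dir and delete it when the block exits."""
    archived_file_path = archive_directory(working_dir)  # helper from the PR
    try:
        yield archived_file_path
    finally:
        if os.path.exists(archived_file_path):
            os.remove(archived_file_path)


def upload_working_dir_to_azure(working_dir: str, azure_directory_uri: str) -> str:
    # Assumed signature; the real function in cloud_util.py may differ.
    with _temporary_archive(working_dir) as archived_file_path:
        archived_filename = os.path.basename(archived_file_path)
        azure_file_path = f"{azure_directory_uri}/{archived_filename}"
        upload_file_to_azure(
            local_file_path=archived_file_path, azure_file_path=azure_file_path
        )
        return azure_file_path
```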
```python
def convert_abfss_uri_to_https(uri: str) -> str:
    """Convert ABFSS URI to HTTPS URI.

    ABFSS URI format: abfss://[email protected]/path
    Returns: HTTPS URI format: https://account.dfs.core.windows.net/container/path
    """
    account, container, path = _parse_abfss_uri(uri)
    return f"https://{account}.dfs.core.windows.net/{container}/{path}"
```
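For reference, `_parse_abfss_uri` is not shown in this diff; a hypothetical sketch of what such a helper could look like, based only on the URI format documented in the docstring above:

```python
from urllib.parse import urlparse


def _parse_abfss_uri(uri: str):
    """Split abfss://container@account.<suffix>/path into (account, container, path)."""
    parsed = urlparse(uri)
    if parsed.scheme != "abfss":
        raise ValueError(f"Not an ABFSS URI: {uri}")
    container, _, host = parsed.netloc.partition("@")
    account = host.split(".")[0]  # account.dfs.core.windows.net -> account
    return account, container, parsed.path.lstrip("/")
```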
The function `convert_abfss_uri_to_https` returns a URL with `dfs.core.windows.net`, while `upload_file_to_azure` in the same file uses `blob.core.windows.net` when creating a `BlobServiceClient`. This inconsistency can be confusing and lead to potential issues. For better maintainability and clarity, it's recommended to consistently use the blob storage endpoint (`blob.core.windows.net`) throughout, especially since you are using `azure.storage.blob.BlobServiceClient`.
```python
def convert_abfss_uri_to_https(uri: str) -> str:
    """Convert ABFSS URI to HTTPS URI.

    ABFSS URI format: abfss://[email protected]/path
    Returns: HTTPS URI format: https://account.blob.core.windows.net/container/path
    """
    account, container, path = _parse_abfss_uri(uri)
    return f"https://{account}.blob.core.windows.net/{container}/{path}"
```
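To illustrate why the blob endpoint is the consistent choice here: `azure.storage.blob.BlobServiceClient` is constructed against `https://<account>.blob.core.windows.net`, so an HTTPS URI built the same way lines up with the client's `account_url`. A minimal sketch (the account, container, and blob names are made up, and the PR's actual `upload_file_to_azure` may be implemented differently):

```python
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

account = "myaccount"
client = BlobServiceClient(
    account_url=f"https://{account}.blob.core.windows.net",
    credential=DefaultAzureCredential(),
)
blob = client.get_blob_client(container="mycontainer", blob="release/working_dir.zip")
with open("working_dir.zip", "rb") as data:
    blob.upload_blob(data, overwrite=True)

# With the suggested change, converting
# "abfss://[email protected]/release/working_dir.zip"
# yields "https://myaccount.blob.core.windows.net/mycontainer/release/working_dir.zip",
# which matches the endpoint used by the client above.
```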
Force-pushed from 51474ad to 927a938:

out of `util.py`. also adding its own `py_library`
Signed-off-by: Lonnie Liu <[email protected]>
Force-pushed from 927a938 to 12abb3a:

out of `util.py`. also adding its own `py_library`