Skip to content

Conversation

@ruokun-niu
Copy link
Contributor

This pull request introduces several improvements and fixes to the test data store framework, focusing on supporting script-based test sources, enhancing Azure Storage authentication, and implementing missing GitHub script file download logic. This PR also add support for downloading dataset from huggingface

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request adds support for HuggingFace dataset repositories, implements the previously stubbed GitHub script file download functionality, and makes Azure Storage authentication more flexible by supporting identity-based authentication alongside access key authentication.

  • Implements complete GitHub repo folder and file download logic (replaces todo!() placeholders)
  • Adds new HuggingFace test repository client with dataset download support
  • Makes Azure Storage access key optional, enabling identity-based authentication (managed identity, Azure CLI, etc.)

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
e2e-test-framework/test-data-store/src/test_repo_storage/repo_clients/mod.rs Adds HuggingFace config variant, makes Azure access key optional, adds helper for masking optional secrets, adds comprehensive Azure auth documentation
e2e-test-framework/test-data-store/src/test_repo_storage/repo_clients/huggingface_test_repo_client.rs New client implementation for downloading test data from HuggingFace datasets
e2e-test-framework/test-data-store/src/test_repo_storage/repo_clients/github_test_repo_client.rs Implements previously stubbed GitHub folder/file download methods with recursive directory traversal
e2e-test-framework/test-data-store/src/test_repo_storage/repo_clients/azure_storage_blob_test_repo_client.rs Adds identity-based authentication support, improves error handling for strip_prefix
e2e-test-framework/test-data-store/src/test_repo_storage/models.rs Adds test for script-based test configuration parsing
e2e-test-framework/test-data-store/Cargo.toml Adds azure_identity dependency for identity-based authentication

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 112 to 116
let data_type_name = file_path.parent().unwrap().file_name().unwrap().to_str().unwrap().to_string();
if !file_path_map.contains_key(&data_type_name) {
file_path_map.insert(data_type_name.clone(), vec![]);
}
file_path_map.get_mut(&data_type_name).unwrap().push(file_path);
Copy link

Copilot AI Dec 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multiple unwrap() calls in a chain can cause panic if any intermediate value is None. Consider using proper error handling with Result or providing more descriptive error messages. For example, if the file path has no parent, or the parent has no file name, or the OS string cannot be converted to UTF-8, this will panic rather than returning a proper error.

Suggested change
let data_type_name = file_path.parent().unwrap().file_name().unwrap().to_str().unwrap().to_string();
if !file_path_map.contains_key(&data_type_name) {
file_path_map.insert(data_type_name.clone(), vec![]);
}
file_path_map.get_mut(&data_type_name).unwrap().push(file_path);
let data_type_name = file_path
.parent()
.and_then(|p| p.file_name())
.and_then(|name| name.to_str())
.ok_or_else(|| anyhow::anyhow!("Invalid bootstrap script path: {:?}", file_path))?
.to_string();
file_path_map
.entry(data_type_name.clone())
.or_insert_with(Vec::new)
.push(file_path);

Copilot uses AI. Check for mistakes.
ruokun-niu and others added 7 commits December 19, 2025 09:01
Signed-off-by: ruokun-niu <[email protected]>
…clients/github_test_repo_client.rs

Co-authored-by: Copilot <[email protected]>
…clients/huggingface_test_repo_client.rs

Co-authored-by: Copilot <[email protected]>
…clients/github_test_repo_client.rs

Co-authored-by: Copilot <[email protected]>
…clients/huggingface_test_repo_client.rs

Co-authored-by: Copilot <[email protected]>
…clients/huggingface_test_repo_client.rs

Co-authored-by: Copilot <[email protected]>
…clients/github_test_repo_client.rs

Co-authored-by: Copilot <[email protected]>
@ruokun-niu ruokun-niu marked this pull request as ready for review December 19, 2025 21:22
@ruokun-niu ruokun-niu changed the title Updated test repo clients Add support for github and hugginface clients; add identity support for the storage blob client Dec 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant