-
Notifications
You must be signed in to change notification settings - Fork 4
Add support for github and hugginface clients; add identity support for the storage blob client #60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: feature-lib
Are you sure you want to change the base?
Conversation
Signed-off-by: ruokun-niu <[email protected]>
…to storage blob client Signed-off-by: ruokun-niu <[email protected]>
Signed-off-by: ruokun-niu <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This pull request adds support for HuggingFace dataset repositories, implements the previously stubbed GitHub script file download functionality, and makes Azure Storage authentication more flexible by supporting identity-based authentication alongside access key authentication.
- Implements complete GitHub repo folder and file download logic (replaces
todo!()placeholders) - Adds new HuggingFace test repository client with dataset download support
- Makes Azure Storage access key optional, enabling identity-based authentication (managed identity, Azure CLI, etc.)
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| e2e-test-framework/test-data-store/src/test_repo_storage/repo_clients/mod.rs | Adds HuggingFace config variant, makes Azure access key optional, adds helper for masking optional secrets, adds comprehensive Azure auth documentation |
| e2e-test-framework/test-data-store/src/test_repo_storage/repo_clients/huggingface_test_repo_client.rs | New client implementation for downloading test data from HuggingFace datasets |
| e2e-test-framework/test-data-store/src/test_repo_storage/repo_clients/github_test_repo_client.rs | Implements previously stubbed GitHub folder/file download methods with recursive directory traversal |
| e2e-test-framework/test-data-store/src/test_repo_storage/repo_clients/azure_storage_blob_test_repo_client.rs | Adds identity-based authentication support, improves error handling for strip_prefix |
| e2e-test-framework/test-data-store/src/test_repo_storage/models.rs | Adds test for script-based test configuration parsing |
| e2e-test-framework/test-data-store/Cargo.toml | Adds azure_identity dependency for identity-based authentication |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
...test-framework/test-data-store/src/test_repo_storage/repo_clients/github_test_repo_client.rs
Outdated
Show resolved
Hide resolved
...framework/test-data-store/src/test_repo_storage/repo_clients/huggingface_test_repo_client.rs
Outdated
Show resolved
Hide resolved
...framework/test-data-store/src/test_repo_storage/repo_clients/huggingface_test_repo_client.rs
Outdated
Show resolved
Hide resolved
...test-framework/test-data-store/src/test_repo_storage/repo_clients/github_test_repo_client.rs
Outdated
Show resolved
Hide resolved
...framework/test-data-store/src/test_repo_storage/repo_clients/huggingface_test_repo_client.rs
Outdated
Show resolved
Hide resolved
...test-framework/test-data-store/src/test_repo_storage/repo_clients/github_test_repo_client.rs
Outdated
Show resolved
Hide resolved
| let data_type_name = file_path.parent().unwrap().file_name().unwrap().to_str().unwrap().to_string(); | ||
| if !file_path_map.contains_key(&data_type_name) { | ||
| file_path_map.insert(data_type_name.clone(), vec![]); | ||
| } | ||
| file_path_map.get_mut(&data_type_name).unwrap().push(file_path); |
Copilot
AI
Dec 16, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Multiple unwrap() calls in a chain can cause panic if any intermediate value is None. Consider using proper error handling with Result or providing more descriptive error messages. For example, if the file path has no parent, or the parent has no file name, or the OS string cannot be converted to UTF-8, this will panic rather than returning a proper error.
| let data_type_name = file_path.parent().unwrap().file_name().unwrap().to_str().unwrap().to_string(); | |
| if !file_path_map.contains_key(&data_type_name) { | |
| file_path_map.insert(data_type_name.clone(), vec![]); | |
| } | |
| file_path_map.get_mut(&data_type_name).unwrap().push(file_path); | |
| let data_type_name = file_path | |
| .parent() | |
| .and_then(|p| p.file_name()) | |
| .and_then(|name| name.to_str()) | |
| .ok_or_else(|| anyhow::anyhow!("Invalid bootstrap script path: {:?}", file_path))? | |
| .to_string(); | |
| file_path_map | |
| .entry(data_type_name.clone()) | |
| .or_insert_with(Vec::new) | |
| .push(file_path); |
Signed-off-by: ruokun-niu <[email protected]>
…clients/github_test_repo_client.rs Co-authored-by: Copilot <[email protected]>
…clients/huggingface_test_repo_client.rs Co-authored-by: Copilot <[email protected]>
…clients/github_test_repo_client.rs Co-authored-by: Copilot <[email protected]>
…clients/huggingface_test_repo_client.rs Co-authored-by: Copilot <[email protected]>
…clients/huggingface_test_repo_client.rs Co-authored-by: Copilot <[email protected]>
…clients/github_test_repo_client.rs Co-authored-by: Copilot <[email protected]>
This pull request introduces several improvements and fixes to the test data store framework, focusing on supporting script-based test sources, enhancing Azure Storage authentication, and implementing missing GitHub script file download logic. This PR also add support for downloading dataset from huggingface