Add ACL support for cloud ingestion pipeline #2917
Open
+780
−404
Purpose
Fixes #2831
This PR extends the cloud ingestion pipeline to automatically extract and apply Access Control Lists (ACLs) from Azure Data Lake Storage Gen2, enabling document-level security filtering in Azure AI Search.
Notably, we could not use the built-in indexer capability to extract ACLs, because the indexer is not compatible with the Custom Web API skill in our skillset. We've requested support for that combination from Azure AI Search, but in the meantime, we implemented ACL extraction using the ADLS Gen2 client in the document extractor function.
As part of this change, we are also deprecating the ADLS local file strategy. Developers who want ADLS support will need to use cloud ingestion. The local strategy had numerous issues, and going forward we think it's better to maintain only the cloud option, which is also more scalable.
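To illustrate the ACL extraction described above, here is a minimal sketch of how a POSIX-style ACL string could be turned into the oids and groups values used for security filtering. It is not the code in this PR: the function name `parse_adls_acl`, the `ParsedAcl` type, and the sample IDs are hypothetical, and the sketch assumes the ACL arrives as a comma-separated string of `scope:identity:permissions` entries. It also ignores the mask entry, which a real implementation may need to apply.

```python
from dataclasses import dataclass, field


@dataclass
class ParsedAcl:
    """Hypothetical container for the values stored on each indexed document."""
    oids: list[str] = field(default_factory=list)    # named users with read access
    groups: list[str] = field(default_factory=list)  # named groups with read access
    is_public: bool = False                           # "other" entry grants read


def parse_adls_acl(acl: str) -> ParsedAcl:
    """Parse an ACL string such as 'user:<oid>:r-x,group:<oid>:r--,other::r--'."""
    result = ParsedAcl()
    for entry in acl.split(","):
        parts = entry.strip().split(":")
        # Skip malformed entries and 'default:...' entries (four parts),
        # which only affect newly created child items.
        if len(parts) != 3:
            continue
        scope, identity, perms = parts
        if not perms.startswith("r"):
            continue  # no read permission, so not relevant for search filtering
        if scope == "user" and identity:
            result.oids.append(identity)
        elif scope == "group" and identity:
            result.groups.append(identity)
        elif scope in ("other", "o") and not identity:
            # Assumption: the short form 'o' mirrors the 'o::r--' notation above.
            result.is_public = True
    return result
```

Entries for the owning user and owning group (empty identity) are skipped here because they carry no object ID; whether to resolve them is a design choice left out of this sketch.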
Features
Cloud ingestion with ACLs
oids and groups fields for security filtering
[...] o::r-- are accessible to all authenticated users)
Bring your own (BYO) ADLS storage account
New environment variables
USE_CLOUD_INGESTION_ACLS
USE_EXISTING_ADLS_STORAGE
AZURE_ADLS_GEN2_STORAGE_ACCOUNT
AZURE_ADLS_GEN2_STORAGE_RESOURCE_GROUP
Documentation
Does this introduce a breaking change?
When developers merge from main and run the server, azd up, or azd deploy, will this produce an error?
If you're not sure, try it out on an old environment.
Does this require changes to learn.microsoft.com docs?
This repository is referenced by this tutorial
which includes deployment, settings and usage instructions. If text or screenshot need to change in the tutorial,
check the box below and notify the tutorial author. A Microsoft employee can do this for you if you're an external contributor.
Type of change
Code quality checklist
See CONTRIBUTING.md for more details.
[...] python -m pytest).
[...] python -m pytest --cov to verify 100% coverage of added lines
[...] python -m mypy to check for type errors
[...] ruff and black manually on my code.