@pamelafox
Collaborator

Purpose

Fixes #2831

This PR extends the cloud ingestion pipeline to automatically extract and apply Access Control Lists (ACLs) from Azure Data Lake Storage Gen2, enabling document-level security filtering in Azure AI Search.
Notably, we could not use the indexer's built-in capability to extract ACLs, because the indexer does not allow it when the skillset includes a Custom Web API skill, as ours does. We've requested that as a feature from Azure AI Search; in the meantime, we implemented ACL extraction using the ADLS Gen2 client in the document extractor function.
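
For readers unfamiliar with document-level security filtering, here is a minimal sketch of how a query-time filter might be built from the signed-in user's identity. It assumes the index stores allowed identities in collection fields named oids and groups (the field names listed under Features); the function name `build_security_filter` is hypothetical and this is not necessarily how the app constructs its filter.

```python
def build_security_filter(oid: str, group_ids: list[str]) -> str:
    """Build an OData filter that matches documents shared with the user or any of their groups.

    Hypothetical helper: assumes index fields named "oids" and "groups".
    """

    def escape(value: str) -> str:
        # OData string literals escape a single quote by doubling it.
        return value.replace("'", "''")

    clauses = []
    if oid:
        clauses.append(f"oids/any(o: search.in(o, '{escape(oid)}'))")
    if group_ids:
        joined = ", ".join(escape(g) for g in group_ids)
        clauses.append(f"groups/any(g: search.in(g, '{joined}'))")
    if not clauses:
        # Without any identity there is nothing to match against.
        raise ValueError("At least one of oid or group_ids is required")
    return "(" + " or ".join(clauses) + ")"


print(build_security_filter("user-oid", ["group-a", "group-b"]))
```

The resulting string would be passed as the filter of a search request, so that only documents whose ACL fields contain the user's object ID or one of their group IDs are returned.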

As part of this change, we are also deprecating the ADLS local file strategy. Developers who want ADLS support will need to use cloud ingestion. The local strategy had numerous issues, and going forward we think it is better to maintain only the cloud option, which is also more scalable.

Features

Cloud ingestion with ACLs

  • Extract user and group ACLs from ADLS Gen2 files during indexing
  • Index documents with oids and groups fields for security filtering
  • Support for global document access via the ADLS "other" ACL entry (files with o::r-- are accessible to all authenticated users)
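
The mapping from an ADLS Gen2 ACL to index fields can be sketched roughly as follows. This is an illustrative approximation, not the code in this PR: it assumes the ACL arrives as a POSIX-style comma-separated string (for example `user::rwx,user:<object-id>:r--,group::r-x,other::r--`), and the names `ParsedAcl` and `parse_acl` are hypothetical. It also ignores the `mask` entry, which in a full implementation can limit the effective permissions of named users and groups.

```python
from dataclasses import dataclass, field


@dataclass
class ParsedAcl:
    """Hypothetical container for the values that would be indexed."""

    oids: list[str] = field(default_factory=list)
    groups: list[str] = field(default_factory=list)
    is_global: bool = False


def parse_acl(acl: str) -> ParsedAcl:
    """Collect the user and group IDs with read access from an ACL string."""
    parsed = ParsedAcl()
    for entry in acl.split(","):
        parts = entry.strip().split(":")
        # Expect scope:qualifier:permissions; this skips malformed entries and
        # four-part "default:..." entries, which only apply to directories.
        if len(parts) != 3:
            continue
        scope, qualifier, permissions = parts
        if not permissions.startswith("r"):
            # Only read access matters for search visibility.
            continue
        if scope in ("user", "u") and qualifier:
            parsed.oids.append(qualifier)
        elif scope in ("group", "g") and qualifier:
            parsed.groups.append(qualifier)
        elif scope in ("other", "o") and not qualifier:
            # Readable "other" entry: treat the file as globally accessible.
            parsed.is_global = True
    return parsed


acl = parse_acl("user::rwx,user:1111-aaaa:r--,group::r-x,group:2222-bbbb:r-x,other::---")
print(acl.oids, acl.groups, acl.is_global)
```

Entries with an empty qualifier (the owning user and owning group) are skipped here because the entry itself carries no object ID; how the actual function treats owners is not stated in this description.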

Bring your own (BYO) ADLS storage account

  • Use an existing ADLS Gen2 account instead of provisioning a new one
  • Support for ADLS accounts in different resource groups
  • Automatic RBAC role assignment at the storage account level

New environment variables

| Variable | Description |
| --- | --- |
| USE_CLOUD_INGESTION_ACLS | Enable ACL extraction from ADLS Gen2 |
| USE_EXISTING_ADLS_STORAGE | Use an existing ADLS account |
| AZURE_ADLS_GEN2_STORAGE_ACCOUNT | Name of the ADLS storage account |
| AZURE_ADLS_GEN2_STORAGE_RESOURCE_GROUP | Resource group for BYO ADLS (optional) |
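
As a rough illustration of how these variables fit together, the sketch below reads them into a settings object. It is hypothetical: the names `AdlsIngestionSettings` and `load_settings` are not from this PR, the sketch assumes the two flags are enabled by setting them to `true`, and the check that BYO storage requires an account name is an assumption rather than documented behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class AdlsIngestionSettings:
    """Hypothetical view of the ADLS-related environment variables."""

    use_cloud_ingestion_acls: bool
    use_existing_adls_storage: bool
    storage_account: Optional[str]
    storage_resource_group: Optional[str]


def load_settings(env: Optional[Mapping[str, str]] = None) -> AdlsIngestionSettings:
    """Read the settings from env, defaulting to the process environment."""
    env = os.environ if env is None else env

    def flag(name: str) -> bool:
        # Assumption: a flag is on only when its value is "true" (case-insensitive).
        return env.get(name, "").strip().lower() == "true"

    settings = AdlsIngestionSettings(
        use_cloud_ingestion_acls=flag("USE_CLOUD_INGESTION_ACLS"),
        use_existing_adls_storage=flag("USE_EXISTING_ADLS_STORAGE"),
        storage_account=env.get("AZURE_ADLS_GEN2_STORAGE_ACCOUNT") or None,
        storage_resource_group=env.get("AZURE_ADLS_GEN2_STORAGE_RESOURCE_GROUP") or None,
    )
    # Assumed validation: an existing account cannot be used without its name.
    if settings.use_existing_adls_storage and not settings.storage_account:
        raise ValueError("AZURE_ADLS_GEN2_STORAGE_ACCOUNT is required for BYO ADLS storage")
    return settings


example = load_settings({
    "USE_CLOUD_INGESTION_ACLS": "true",
    "USE_EXISTING_ADLS_STORAGE": "true",
    "AZURE_ADLS_GEN2_STORAGE_ACCOUNT": "mystorageaccount",
})
print(example)
```

The resource group is left optional in the sketch, matching the table above.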

Documentation

  • Updated login_and_acl.md with cloud ingestion ACL setup instructions
  • Added section for BYO ADLS storage account configuration

Does this introduce a breaking change?

When developers merge from main and run the server, azd up, or azd deploy, will this produce an error?
If you're not sure, try it out on an old environment.

[X] Yes - Removed ADLS file strategy
[ ] No

Does this require changes to learn.microsoft.com docs?

This repository is referenced by this tutorial
which includes deployment, settings, and usage instructions. If text or screenshots need to change in the tutorial,
check the box below and notify the tutorial author. A Microsoft employee can do this for you if you're an external contributor.

[ ] Yes
[X] No

Type of change

[X] Bugfix
[X] Feature
[ ] Code style update (formatting, local variables)
[ ] Refactoring (no functional changes, no api changes)
[ ] Documentation content changes
[ ] Other... Please describe:

Code quality checklist

See CONTRIBUTING.md for more details.

  • The current tests all pass (python -m pytest).
  • I added tests that prove my fix is effective or that my feature works
  • I ran python -m pytest --cov to verify 100% coverage of added lines
  • I ran python -m mypy to check for type errors
  • I either used the pre-commit hooks or ran ruff and black manually on my code.

Development

Successfully merging this pull request may close these issues.

Add support for ACLs on the indexer in cloud ingestion
