@phact commented Jan 22, 2026

This pull request implements comprehensive extraction, propagation, and storage of Access Control List (ACL) information for documents ingested from Google Drive, OneDrive, and SharePoint connectors. It introduces connector-specific logic to fetch detailed user and group permissions from each provider's API and ensures that this ACL data is consistently passed through the document processing pipeline and indexed in OpenSearch. This enables fine-grained access control and auditing for ingested documents.

The most important changes are:

Connector-specific ACL Extraction:

  • Added _extract_google_drive_acl to google_drive/connector.py to fetch and parse user/group permissions from the Google Drive API for each file, and propagate this ACL into ConnectorDocument instances.
  • Added _extract_onedrive_acl to onedrive/connector.py to retrieve permissions from the Microsoft Graph API for OneDrive items, and use this ACL in document creation.
  • Added _extract_sharepoint_acl to sharepoint/connector.py to obtain permissions from the Microsoft Graph API for SharePoint files, and use this ACL in document creation.
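
As a rough illustration of what this kind of extraction involves, the sketch below maps raw permission entries onto users and groups. It is not the code from this PR: DocumentACL, parse_drive_permissions, and parse_graph_permissions are hypothetical names, and the functions take already-fetched permission dictionaries rather than calling either API. The Drive fields (type, role, emailAddress) and the Graph fields (roles, grantedToV2) follow the public permission resources; reading an email key from a Graph identity is an assumption, with id as the fallback.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DocumentACL:
    """Hypothetical ACL container; the PR's real type is not shown."""
    owner: Optional[str] = None
    allowed_users: list[str] = field(default_factory=list)
    allowed_groups: list[str] = field(default_factory=list)


def parse_drive_permissions(permissions: list[dict]) -> DocumentACL:
    """Map Google Drive permission resources onto users and groups."""
    acl = DocumentACL()
    for perm in permissions:
        kind = perm.get("type")
        email = perm.get("emailAddress")
        if kind == "user" and email:
            if perm.get("role") == "owner":
                acl.owner = email
            acl.allowed_users.append(email)
        elif kind == "group" and email:
            acl.allowed_groups.append(email)
        # "domain" and "anyone" grants are skipped in this sketch.
    return acl


def _identity_key(identity: dict) -> Optional[str]:
    # Assumption: prefer an email if the response carries one, else the id.
    return identity.get("email") or identity.get("id")


def parse_graph_permissions(permissions: list[dict]) -> DocumentACL:
    """Map Microsoft Graph permission resources onto users and groups."""
    acl = DocumentACL()
    for perm in permissions:
        granted = perm.get("grantedToV2") or {}
        user_key = _identity_key(granted.get("user") or {})
        group_key = _identity_key(granted.get("group") or {})
        if user_key:
            if "owner" in (perm.get("roles") or []):
                acl.owner = user_key
            acl.allowed_users.append(user_key)
        if group_key:
            acl.allowed_groups.append(group_key)
        # Sharing links and site-level identities are skipped in this sketch.
    return acl
```

Keeping the parsing separate from the API call makes it possible to test each provider's mapping against recorded responses.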

Pipeline and Metadata Propagation:

  • Modified the document processing pipeline (service.py and processors.py) to accept and propagate the acl field from connectors through to chunk indexing, ensuring ACLs are stored with each chunk in OpenSearch.
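
To make the propagation concrete, here is a minimal sketch of a document's ACL being copied onto every chunk record before indexing. Only the ConnectorDocument name comes from this PR; its fields, the build_chunk_records helper, the fixed-size chunking, and the record field names (document_id, allowed_users, allowed_groups) are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ConnectorDocument:
    """Simplified stand-in; the real class has more fields."""
    id: str
    text: str
    acl: dict = field(default_factory=dict)


def build_chunk_records(doc: ConnectorDocument, chunk_size: int = 500) -> list[dict]:
    """Split the text into fixed-size chunks and attach the ACL to each one."""
    chunks = [
        doc.text[start:start + chunk_size]
        for start in range(0, len(doc.text), chunk_size)
    ]
    return [
        {
            "document_id": doc.id,
            "chunk_index": index,
            "text": chunk,
            # Copy the lists so records do not share mutable state.
            "allowed_users": list(doc.acl.get("allowed_users", [])),
            "allowed_groups": list(doc.acl.get("allowed_groups", [])),
        }
        for index, chunk in enumerate(chunks)
    ]
```

Storing the ACL on each chunk, rather than only on a parent document, lets a search query filter on permissions directly against the chunk index.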

Efficient ACL Indexing and Updates:

  • Refactored _update_connector_metadata in service.py to call a dedicated update_document_acl utility, which hashes each ACL and skips the write when the hash is unchanged. Other metadata is now updated via a single update_by_query call for efficiency.
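
The hash-and-skip idea can be sketched as follows. This is an illustration under stated assumptions, not the utility from this PR: update_document_acl_sketch uses an in-memory dict in place of OpenSearch, SHA-256 over canonical JSON is one possible hashing choice, and the field names in the update_by_query body (document_id, acl, acl_hash) are hypothetical.

```python
from __future__ import annotations

import hashlib
import json


def acl_hash(acl: dict) -> str:
    """Hash an ACL so that key order and list order do not affect the result."""
    canonical = {
        key: sorted(value) if isinstance(value, list) else value
        for key, value in acl.items()
    }
    payload = json.dumps(canonical, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def update_document_acl_sketch(store: dict, document_id: str, acl: dict) -> bool:
    """Write the ACL only if its hash changed; return True when a write happened."""
    new_hash = acl_hash(acl)
    current = store.get(document_id)
    if current is not None and current["acl_hash"] == new_hash:
        return False  # unchanged ACL, skip the write
    store[document_id] = {"acl": acl, "acl_hash": new_hash}
    return True


def build_acl_update_body(document_id: str, acl: dict) -> dict:
    """Build an update_by_query request body that rewrites the ACL on all chunks."""
    return {
        "query": {"term": {"document_id": document_id}},
        "script": {
            "lang": "painless",
            "source": "ctx._source.acl = params.acl; ctx._source.acl_hash = params.acl_hash",
            "params": {"acl": acl, "acl_hash": acl_hash(acl)},
        },
    }
```

Sorting list values before hashing means a provider returning the same permissions in a different order does not trigger a rewrite of every chunk.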

These changes collectively provide end-to-end support for extracting, storing, and updating document-level ACLs from external storage providers, improving security and compliance in the document indexing pipeline.
