
feat(policy): cache entitlement policy in auth service v2 #2326


Closed
wants to merge 227 commits into from

Conversation

jakedoublev (Contributor)

Proposed Changes

Checklist

  • I have added or updated unit tests
  • I have added or updated integration tests (if appropriate)
  • I have added or updated documentation

Testing Instructions

@github-actions bot added the external-contributor (External Org Member), comp:policy (Policy Configuration: attributes, subject mappings, resource mappings, kas registry), comp:authorization, and size/m labels, and removed the comp:policy and comp:authorization labels on May 28, 2025

@gemini-code-assist bot left a comment


Hello @jakedoublev, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

Hello! Gemini here, providing a summary of this pull request to help everyone get up to speed quickly.

This PR introduces a significant performance improvement for the authorization service (v2) by implementing an in-memory cache for entitlement policy data. Specifically, it caches attribute definitions and subject mappings, which are frequently needed for policy decisions. Instead of fetching this data from the policy service on every request, the authorization service will now load it into memory upon startup and periodically refresh it based on a configurable interval. This change aims to reduce latency and load on the policy service, leading to faster authorization decisions.

Highlights

  • Entitlement Policy Caching: An in-memory cache for policy attributes and subject mappings is introduced in the authorization service (v2).
  • Periodic Cache Refresh: The cache is configured to periodically refresh its data from the policy service to stay up-to-date. The refresh interval is configurable.
  • Just-In-Time PDP Integration: The Just-In-Time Policy Decision Point (PDP) within the authorization service is updated to utilize the new cache for fetching policy data instead of making direct calls to the policy service for each request.
  • Service Registration Hook: A new mechanism is added to the service registry and server startup process to allow services to register hooks that run after all services have been registered. This is used here to initialize and start the policy cache and the cached PDP instance.
  • Configuration Option: A new configuration option (cache_refresh_interval_seconds) is added to control the cache refresh frequency.

Changelog

  • opentdf-dev.yaml
    • Enabled the policy configuration section (lines 45-48).
    • Added cache_refresh_interval_seconds configuration option with a default value of 60 seconds (line 49).
  • opentdf-example.yaml
    • Added a commented-out example for cache_refresh_interval_seconds (line 40).
  • service/authorization/v2/authorization.go
    • Imported the new policy cache package (line 16).
    • Added a pdp field of type *access.JustInTimePDP to the Service struct (line 27).
    • Added OnCompleteServiceRegistration function to create and start the EntitlementPolicyCache and initialize the JustInTimePDP using the cache (lines 34-54).
    • Registered the OnCompleteServiceRegistration hook in the NewRegistration function (line 65).
    • Modified GetEntitlements to use the cached PDP instance (as.pdp) instead of creating a new one on each call (line 109).
    • Modified GetDecision to use the cached PDP instance (as.pdp) instead of creating a new one on each call (line 139).
    • Modified GetDecisionMultiResource to use the cached PDP instance (as.pdp) instead of creating a new one on each call (line 169).
    • Corrected the error return type in GetDecisionMultiResource from connect.NewError to the raw error (line 179).
  • service/internal/access/v2/just_in_time_pdp.go
    • Imported the new policy cache package (line 19).
    • Added a policyCache field of type *policyCache.EntitlementPolicyCache to the JustInTimePDP struct (line 32).
    • Added NewJustInTimePDPWithCachedEntitlementPolicy function to create a PDP that uses the provided cache for fetching policy data (lines 77-128).
    • Modified the logic within NewJustInTimePDPWithCachedEntitlementPolicy to fetch attributes and subject mappings from the cache if it's provided, otherwise fall back to fetching directly (lines 106-119).
  • service/pkg/config/config.go
    • Added ServiceRegistrationCompleteHook type (line 18).
    • Added onServiceRegistrationCompleteHooks slice to the Config struct (line 54).
    • Added AddOnServiceRegistrationCompleteHook method to append hooks (lines 119-121).
    • Added RunRegistrationCompleteHooks method to execute the registered hooks (lines 137-146).
  • service/pkg/server/services.go
    • Added logic to register the OnCompleteServiceRegistrationHook from the service options during service startup (lines 195-197).
  • service/pkg/server/start.go
    • Added a call to cfg.RunRegistrationCompleteHooks after all services have been registered and before the server starts (lines 317-319).
  • service/pkg/serviceregistry/serviceregistry.go
    • Added OnCompleteServiceRegistrationHook type (line 60).
    • Added RegisterOnCompleteServiceRegistrationHook method to the IService interface (line 83).
    • Added OnCompleteServiceRegistration field to the ServiceOptions struct (line 114).
    • Implemented RegisterOnCompleteServiceRegistrationHook method for the Service struct (lines 198-211).
  • service/policy/cache/cache.go
    • New file: Implements the EntitlementPolicyCache struct and its methods for caching policy attributes and subject mappings.
    • Includes methods for creating (NewEntitlementPolicyCache), starting (Start), stopping (Stop), refreshing (Refresh), and retrieving data (GetAttributes, GetSubjectMappings).
    • Implements periodic refresh using a ticker and goroutine (periodicRefresh).
    • Includes helper methods (fetchAllDefinitions, fetchAllSubjectMappings) to fetch data from the SDK with pagination handling.
  • service/policy/config/config.go
    • Added configuredRefreshInterval variable with a default value (line 14).
    • Added CacheRefreshIntervalSeconds field to the Config struct (line 24).
    • Modified GetSharedPolicyConfig to set the global configuredRefreshInterval based on the config value (line 52).
    • Added GetPolicyEntitlementCacheRefreshInterval function to expose the configured interval (lines 59-61).
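The registration-hook mechanism threaded through `config.go`, `services.go`, and `start.go` above can be sketched as follows. This is a simplified illustration of the `AddOnServiceRegistrationCompleteHook` / `RunRegistrationCompleteHooks` flow named in the changelog; the zero-argument hook signature is an assumption (the real hooks likely receive a context and logger).

```go
package main

import "fmt"

// ServiceRegistrationCompleteHook runs after all services are registered
// and before the server starts. Signature is illustrative.
type ServiceRegistrationCompleteHook func() error

type Config struct {
	onServiceRegistrationCompleteHooks []ServiceRegistrationCompleteHook
}

// AddOnServiceRegistrationCompleteHook queues a hook for startup.
func (c *Config) AddOnServiceRegistrationCompleteHook(h ServiceRegistrationCompleteHook) {
	c.onServiceRegistrationCompleteHooks = append(c.onServiceRegistrationCompleteHooks, h)
}

// RunRegistrationCompleteHooks executes hooks in registration order,
// stopping at the first error.
func (c *Config) RunRegistrationCompleteHooks() error {
	for _, h := range c.onServiceRegistrationCompleteHooks {
		if err := h(); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	cfg := &Config{}
	// A service (e.g. authorization v2) registers a hook that starts its
	// policy cache once every other service is available.
	cfg.AddOnServiceRegistrationCompleteHook(func() error {
		fmt.Println("starting policy cache")
		return nil
	})
	// The server runs this after registration, before serving traffic.
	if err := cfg.RunRegistrationCompleteHooks(); err != nil {
		panic(err)
	}
}
```

The design lets a service defer expensive initialization (here, the initial cache load) until its dependencies are guaranteed to be registered.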
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.


Policy data flows,
Cached in memory it goes,
Decisions are fast.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist bot left a comment


Code Review

This pull request introduces a caching mechanism for entitlement policy (attributes and subject mappings) within the authorization service. This is a significant improvement for performance, as it avoids fetching policy data from the policy service on every decision/entitlement request. The implementation involves creating a new policy cache package, integrating it into the authorization service, and updating the service registration flow to initialize the cache and the PDP using the cached data. The changes also include adding configuration options for the cache refresh interval and updating the example YAML files.

The overall approach seems sound and addresses a potential performance bottleneck. However, there are a few areas that require attention, particularly regarding error handling, service readiness, and potential issues with the cache implementation.

Summary of Findings

  • Cache Data Mutability: The GetAttributes and GetSubjectMappings methods in the policy cache return internal slices directly. This allows external callers to modify the cached data, which can lead to race conditions and incorrect behavior. It is recommended to return copies of the slices to prevent external modification.
  • Error Handling Consistency: The error handling in GetDecisionMultiResource was changed to return the raw error from a helper function instead of wrapping it in a connect.NewError with CodeInternal. This alters the gRPC status code returned to the client for this specific error path, potentially breaking clients that expect CodeInternal.
  • Service Readiness Check: The readiness check registration for the authorization service is commented out. A proper readiness check is essential to ensure the service is fully initialized (including the policy cache and PDP) before it is considered ready to handle requests. This should be restored and implemented to reflect the actual readiness state.
  • Efficiency of Fetching All Policy: The cache implementation fetches all active attributes and all subject mappings from the policy service. While the pagination logic correctly handles fetching large datasets, the total volume of policy data could become very large in some deployments, potentially impacting memory usage and refresh times. This is a scalability consideration inherent to caching all policy data.
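The mutability finding above can be demonstrated with a short, self-contained sketch. `cacheDemo`, `unsafeGet`, and `safeGet` are illustrative names, not the PR's actual types: handing out the internal slice aliases the cache's state, while a defensive copy isolates it.

```go
package main

import "fmt"

// cacheDemo stands in for a cache holding a shared internal slice.
type cacheDemo struct {
	values []string
}

// unsafeGet hands out the internal slice directly (the flagged pattern).
func (c *cacheDemo) unsafeGet() []string { return c.values }

// safeGet returns a defensive copy (the recommended fix).
func (c *cacheDemo) safeGet() []string {
	out := make([]string, len(c.values))
	copy(out, c.values)
	return out
}

func main() {
	c := &cacheDemo{values: []string{"attr-a", "attr-b"}}

	got := c.unsafeGet()
	got[0] = "mutated"        // writes through to the shared cache state
	fmt.Println(c.values[0])  // → mutated

	c.values[0] = "attr-a"    // reset
	got = c.safeGet()
	got[0] = "mutated"        // only the caller's copy changes
	fmt.Println(c.values[0])  // → attr-a
}
```

Combined with concurrent writes from a periodic refresher, the unsafe variant is also a data race, not just a correctness hazard.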

Merge Readiness

This pull request introduces a valuable caching feature for the authorization service, which should improve performance. However, the identified high severity issues related to cache data mutability, error handling consistency, and the missing readiness check should be addressed before merging. These issues could impact the correctness and reliability of the service. I am unable to approve this pull request directly; please have other reviewers assess the changes after the suggested modifications are made.

```diff
@@ -166,7 +176,7 @@ func (as *Service) GetDecisionMultiResource(ctx context.Context, req *connect.Re
 	resourceDecisions, err := rollupMultiResourceDecision(decisions)
 	if err != nil {
 		as.logger.ErrorContext(ctx, "failed to rollup multi resource decision", slog.String("error", err.Error()))
-		return nil, connect.NewError(connect.CodeInternal, err)
+		return nil, err
```

Severity: high

This change modifies the error returned by the GetDecisionMultiResource handler. Previously, it wrapped the error from rollupMultiResourceDecision in a connect.NewError with connect.CodeInternal. Now, it returns the raw error. If rollupMultiResourceDecision returns a standard Go error, the Connect handler will wrap it in connect.NewError(connect.CodeUnknown, err). This changes the gRPC status code returned to the client from Internal to Unknown for this specific error path. Is this change in API behavior intended? It might be better to maintain consistency and return CodeInternal.

Suggested change:

```diff
-return nil, err
+return nil, connect.NewError(connect.CodeInternal, err)
```

Comment on lines +131 to +132:

```go
// TODO: we may want to copy this so callers cannot modify the cached data
return c.attributes
```

Severity: high

The TODO comment here points out a potential issue. Returning the internal slice directly allows callers to modify the cached data, which could lead to race conditions or unexpected behavior if multiple goroutines are reading and writing to the cache concurrently or if a caller unintentionally modifies the shared state. It's generally safer to return a copy of the slice.

Suggested change:

```go
// Return a copy so callers cannot modify the cached data.
attributesCopy := make([]*policy.Attribute, len(c.attributes))
copy(attributesCopy, c.attributes)
return attributesCopy
```

Comment on lines +140 to +141:

```go
// TODO: we may want to copy this so callers cannot modify the cached data
return c.subjectMappings
```

Severity: high

Similar to the GetAttributes method, returning the internal slice of subject mappings directly allows callers to modify the cached data. This is a potential source of bugs and race conditions. Returning a copy is safer.

Suggested change:

```go
// Return a copy so callers cannot modify the cached data.
subjectMappingsCopy := make([]*policy.SubjectMapping, len(c.subjectMappings))
copy(subjectMappingsCopy, c.subjectMappings)
return subjectMappingsCopy
```


Benchmark results

Benchmark authorization.GetDecisions Results:

| Metric | Value |
| --- | --- |
| Approved Decision Requests | 5000 |
| Denied Decision Requests | 0 |
| Total Time | 454.519222ms |

Benchmark authorization.v2.GetMultiResourceDecision Results:

| Metric | Value |
| --- | --- |
| Approved Decision Requests | 5000 |
| Denied Decision Requests | 0 |
| Total Time | 272.522271ms |

Bulk Benchmark Results:

| Metric | Value |
| --- | --- |
| Total Decrypts | 100 |
| Successful Decrypts | 100 |
| Failed Decrypts | 0 |
| Total Time | 343.064291ms |
| Throughput | 291.49 requests/second |

TDF3 Benchmark Results:

| Metric | Value |
| --- | --- |
| Total Requests | 5000 |
| Successful Requests | 5000 |
| Failed Requests | 0 |
| Concurrent Requests | 50 |
| Total Time | 48.29579536s |
| Average Latency | 479.994809ms |
| Throughput | 103.53 requests/second |

NANOTDF Benchmark Results:

| Metric | Value |
| --- | --- |
| Total Requests | 5000 |
| Successful Requests | 4947 |
| Failed Requests | 53 |
| Concurrent Requests | 50 |
| Total Time | 39.110817494s |
| Average Latency | 386.890343ms |
| Throughput | 126.49 requests/second |

Error Summary:

| Error Message | Occurrences |
| --- | --- |
| ReadNanoTDF error: getNanoRewrapKey: rewrapError: internal: internal error; rpc error: code = Internal desc = could not perform access | 53 |

Standard Benchmark Metrics Skipped or Failed

@jakedoublev (Contributor, Author)

Closing for #2327
