feat(policy): policy should maintain a cache of entitlement policy #2327

jakedoublev · 2025-05-28T17:49:31Z

Proposed Changes

Configurable refresh interval (set it to 0 to bypass caching)
Proof-of-concept implementation directly on go-cache/ristretto, rather than waiting for the shared cache work
ListAttributes and ListSubjectMappings return results from the cache when the requested state is active and the cache is enabled

Checklist

I have added or updated unit tests
I have added or updated integration tests (if appropriate)
I have added or updated documentation

Testing Instructions

gemini-code-assist

Hello @jakedoublev, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

Gemini here, providing a summary of this pull request. This PR introduces caching for entitlement policy data (attributes and subject mappings) within the Authorization v2 service. The primary goal is to improve performance by reducing the need for the Authorization service to fetch this data from the Policy service on every decision request. A new EntitlementPolicyCache component is added which periodically fetches and stores this data in memory. The Authorization v2 service is updated to utilize this cache via a new constructor for the JustInTimePDP. A new service registration hook mechanism is implemented to ensure the cache is initialized and started only after all necessary services (like Policy) are registered and available. A configuration option is added to control the cache refresh interval.

Highlights

Entitlement Policy Caching: Introduces an in-memory cache for policy entitlement data (attributes and subject mappings) within the Authorization v2 service.
New Cache Component: Adds a new EntitlementPolicyCache component responsible for fetching and storing policy data periodically.
Authorization Service Integration: Modifies the Authorization v2 service to use the new cache when making policy decisions, replacing direct fetches for attributes and subject mappings.
Service Registration Hook: Implements a new hook mechanism (OnCompleteServiceRegistration) in the service registry and config handling to allow services to perform actions (like starting the cache) after all services are registered.
Configurable Cache Refresh: Adds a configuration option (cache_refresh_interval_seconds) to control how often the entitlement policy cache is refreshed.

Changelog


opentdf-dev.yaml
- Uncommented and enabled the policy service configuration.
- Added cache_refresh_interval_seconds: 60 to the policy configuration.
opentdf-example.yaml
- Added a commented-out example for cache_refresh_interval_seconds: 300 to the policy configuration.
service/authorization/v2/authorization.go
- Imported the new policy/cache package.
- Added a pdp field of type *access.JustInTimePDP to the Service struct (line 27).
- Added OnCompleteServiceRegistration function (lines 34-54) to create and start the policy cache and initialize the JustInTimePDP using the cache.
- Added the OnCompleteServiceRegistration hook to the ServiceOptions (line 65).
- Removed the creation of a new JustInTimePDP within GetEntitlements (lines 82-87 removed, now uses as.pdp on line 109).
- Removed the creation of a new JustInTimePDP within GetDecision (lines 114-118 removed, now uses as.pdp on line 139).
- Removed the creation of a new JustInTimePDP within GetDecisionMultiResource (lines 149-153 removed, now uses as.pdp on line 169).
- Changed the error return in GetDecisionMultiResource from connect.NewError(connect.CodeInternal, err) to just err (line 179).
service/internal/access/v2/just_in_time_pdp.go
- Imported the new policy/cache package.
- Added a policyCache field of type *policyCache.EntitlementPolicyCache to the JustInTimePDP struct (line 32).
- Added NewJustInTimePDPWithCachedEntitlementPolicy function (lines 76-128) which accepts a cache and uses it to fetch policy data before creating the embedded PolicyDecisionPoint.
service/pkg/config/config.go
- Added ServiceRegistrationCompleteHook type (line 18).
- Added onServiceRegistrationCompleteHooks slice to the Config struct (line 54).
- Added AddOnServiceRegistrationCompleteHook method (lines 118-121) to register hooks.
- Added RunRegistrationCompleteHooks method (lines 136-146) to execute registered hooks.
service/pkg/server/services.go
- Added a call to svc.RegisterOnCompleteServiceRegistrationHook during service registration (lines 195-197).
service/pkg/server/start.go
- Added a call to cfg.RunRegistrationCompleteHooks(ctx) after service registration is complete and before the server starts (lines 316-319).
service/policy/cache/cache.go
- Added a new file cache.go implementing the EntitlementPolicyCache struct and its methods (NewEntitlementPolicyCache, Start, Stop, periodicRefresh, Refresh, GetAttributes, GetSubjectMappings, fetchAllDefinitions, fetchAllSubjectMappings).
service/policy/config/config.go
- Added CacheRefreshIntervalSeconds field to the Config struct with a default of 300 (line 24).
- Added a global configuredRefreshInterval variable (line 14).
- Updated GetSharedPolicyConfig to read CacheRefreshIntervalSeconds and update the global configuredRefreshInterval (line 52).
- Added GetPolicyEntitlementCacheRefreshInterval function (lines 59-61) to expose the configured interval.


A cache for policy,
Fetched from the service, you see.
Decisions are fast,
The performance will last,
From calls to the Policy, set free.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request introduces a caching mechanism for entitlement policies (attributes and subject mappings) within the Authorization v2 service. The goal is to improve performance by reducing repeated fetches of policy data. The core changes involve:

A new EntitlementPolicyCache in service/policy/cache/ for storing and periodically refreshing policy data.
Modifications to the service lifecycle and registration (service/pkg/config, service/pkg/server, service/pkg/serviceregistry) to initialize this cache after all services are registered but before the server starts handling requests.
Updates to service/authorization/v2/authorization.go and service/internal/access/v2/just_in_time_pdp.go to utilize this cached policy data.

The overall approach seems sound and addresses a potential performance bottleneck. The introduction of a lifecycle hook for post-registration, pre-start initialization is a good pattern for this kind of setup. Configuration for the cache refresh interval has also been added.

However, there are a few areas that could be improved, particularly around error propagation and potential data mutability in the cache. A significant concern is the lack of unit tests for the new caching logic, which is critical for ensuring its correctness and reliability.

Summary of Findings

Missing Unit Tests: The pull request introduces significant new functionality with the EntitlementPolicyCache and its integration. However, the checklist in the PR description indicates that unit tests have not yet been added or updated. Comprehensive unit tests are crucial for this caching logic to ensure its correctness, especially regarding concurrency, refresh behavior, and edge cases. It's highly recommended to add these tests.
Error Propagation in Authorization Service: In service/authorization/v2/authorization.go, a change in error handling for rollupMultiResourceDecision might lead to losing gRPC-specific error codes, potentially affecting client-side error interpretation.
Cache Data Mutability: The GetAttributes and GetSubjectMappings methods in service/policy/cache/cache.go return direct slices to the cached data. This could allow callers to inadvertently modify the cache's internal state. Returning copies would enhance data integrity.
Configuration Management for Cache Interval: The service/policy/config/config.go uses a package-level variable (configuredRefreshInterval) to store the cache refresh interval. This variable is updated by GetSharedPolicyConfig. While functional, this introduces a form of global state. This was not commented on directly due to review settings but is noted for awareness.

Merge Readiness

This pull request makes a valuable contribution by introducing caching for entitlement policies, which should enhance performance. The core logic and integration into the service lifecycle are well-thought-out.

However, before this PR is merged, I strongly recommend addressing the following:

Add Comprehensive Unit Tests: The new caching mechanism (EntitlementPolicyCache) and its integration points need thorough unit testing to ensure reliability and correctness under various conditions (e.g., concurrent access, refresh failures, empty policy data).
Review Error Propagation: The change in error handling in service/authorization/v2/authorization.go should be revisited to ensure gRPC error codes are consistently propagated to clients.
Consider Cache Data Immutability: The potential for callers to modify cached data via direct slice returns in service/policy/cache/cache.go should be addressed, perhaps by returning copies.
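Returning copies could look like the sketch below, which uses `slices.Clone` (Go 1.21+). `Attribute` and `attrCache` are hypothetical stand-ins for the cached proto type and the cache struct. The clone is shallow: callers get their own slice and backing array, so appending, reordering, or replacing elements cannot touch the cache, and a long-lived caller no longer pins the cache's old array after a refresh. The `*Attribute` values themselves are still shared; for protobuf messages a deep copy would clone each element with `proto.Clone`.

```go
package main

import (
	"slices"
	"sync"
)

// Attribute is a hypothetical stand-in for the cached policy Attribute proto.
type Attribute struct{ FQN string }

type attrCache struct {
	mu    sync.RWMutex
	attrs []*Attribute
}

// GetAttributes returns a shallow copy of the cached slice. Callers may modify
// the returned slice freely but must not mutate the pointed-to Attributes.
func (c *attrCache) GetAttributes() []*Attribute {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return slices.Clone(c.attrs)
}
```

A shallow copy costs one allocation of pointer-sized slots per call, which is usually cheap compared with deep-cloning every message on the hot decision path.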

Due to the high severity concern regarding the lack of unit tests and the medium severity issues identified, I am requesting changes. Addressing these points will significantly improve the robustness and maintainability of this feature. I am not authorized to approve pull requests, so further review and approval by other maintainers will be necessary after these changes are considered.

service/authorization/v2/authorization.go

service/policy/cache/cache.go

github-actions · 2025-05-28T17:53:02Z

Benchmark results

Benchmark authorization.GetDecisions Results:

Metric	Value
Approved Decision Requests	5000
Denied Decision Requests	0
Total Time	466.684799ms

Benchmark authorization.v2.GetMultiResourceDecision Results:

Metric	Value
Approved Decision Requests	5000
Denied Decision Requests	0
Total Time	287.863794ms

Bulk Benchmark Results

Metric	Value
Total Decrypts	100
Successful Decrypts	100
Failed Decrypts	0
Total Time	335.163614ms
Throughput	298.36 requests/second

TDF3 Benchmark Results:

Metric	Value
Total Requests	5000
Successful Requests	5000
Failed Requests	0
Concurrent Requests	50
Total Time	47.468095037s
Average Latency	471.84216ms
Throughput	105.33 requests/second

NANOTDF Benchmark Results:

Metric	Value
Total Requests	5000
Successful Requests	4939
Failed Requests	61
Concurrent Requests	50
Total Time	38.177281913s
Average Latency	376.906651ms
Throughput	129.37 requests/second

Error Summary:

Error Message	Occurrences
ReadNanoTDF error: getNanoRewrapKey: rewrapError: internal: internal error rpc error: code = Internal desc = could not perform access	61 occurrences

Standard Benchmark Metrics Skipped or Failed

github-actions · 2025-05-28T17:58:40Z

Benchmark results

Benchmark authorization.GetDecisions Results:

Metric	Value
Approved Decision Requests	5000
Denied Decision Requests	0
Total Time	474.806668ms

Benchmark authorization.v2.GetMultiResourceDecision Results:

Metric	Value
Approved Decision Requests	5000
Denied Decision Requests	0
Total Time	281.163692ms

Bulk Benchmark Results

Metric	Value
Total Decrypts	100
Successful Decrypts	100
Failed Decrypts	0
Total Time	324.08774ms
Throughput	308.56 requests/second

TDF3 Benchmark Results:

Metric	Value
Total Requests	5000
Successful Requests	5000
Failed Requests	0
Concurrent Requests	50
Total Time	45.89274452s
Average Latency	457.06466ms
Throughput	108.95 requests/second

NANOTDF Benchmark Results:

Metric	Value
Total Requests	5000
Successful Requests	4948
Failed Requests	52
Concurrent Requests	50
Total Time	37.599055917s
Average Latency	371.712354ms
Throughput	131.60 requests/second

Error Summary:

Error Message	Occurrences
ReadNanoTDF error: getNanoRewrapKey: rewrapError: internal: internal error rpc error: code = Internal desc = could not perform access	52 occurrences

Standard Benchmark Metrics Skipped or Failed

github-actions · 2025-05-28T19:31:52Z

Benchmark results

Benchmark authorization.GetDecisions Results:

Metric	Value
Approved Decision Requests	5000
Denied Decision Requests	0
Total Time	468.637562ms

Benchmark authorization.v2.GetMultiResourceDecision Results:

Metric	Value
Approved Decision Requests	5000
Denied Decision Requests	0
Total Time	289.213579ms

Bulk Benchmark Results

Metric	Value
Total Decrypts	100
Successful Decrypts	100
Failed Decrypts	0
Total Time	340.258643ms
Throughput	293.89 requests/second

TDF3 Benchmark Results:

Metric	Value
Total Requests	5000
Successful Requests	5000
Failed Requests	0
Concurrent Requests	50
Total Time	59.128841266s
Average Latency	589.513841ms
Throughput	84.56 requests/second

NANOTDF Benchmark Results:

Metric	Value
Total Requests	5000
Successful Requests	4947
Failed Requests	53
Concurrent Requests	50
Total Time	53.529524842s
Average Latency	530.194509ms
Throughput	92.42 requests/second

Error Summary:

Error Message	Occurrences
ReadNanoTDF error: getNanoRewrapKey: rewrapError: internal: internal error rpc error: code = Internal desc = could not perform access	53 occurrences

Standard Benchmark Metrics Skipped or Failed

github-actions · 2025-05-28T19:34:05Z

Benchmark results

Benchmark authorization.GetDecisions Results:

Metric	Value
Approved Decision Requests	5000
Denied Decision Requests	0
Total Time	472.167216ms

Benchmark authorization.v2.GetMultiResourceDecision Results:

Metric	Value
Approved Decision Requests	5000
Denied Decision Requests	0
Total Time	276.669118ms

Bulk Benchmark Results

Metric	Value
Total Decrypts	100
Successful Decrypts	100
Failed Decrypts	0
Total Time	346.867306ms
Throughput	288.29 requests/second

TDF3 Benchmark Results:

Metric	Value
Total Requests	5000
Successful Requests	5000
Failed Requests	0
Concurrent Requests	50
Total Time	58.400501487s
Average Latency	581.743211ms
Throughput	85.62 requests/second

NANOTDF Benchmark Results:

Metric	Value
Total Requests	5000
Successful Requests	4952
Failed Requests	48
Concurrent Requests	50
Total Time	49.708482247s
Average Latency	491.590463ms
Throughput	99.62 requests/second

Error Summary:

Error Message	Occurrences
ReadNanoTDF error: getNanoRewrapKey: rewrapError: internal: internal error rpc error: code = Internal desc = could not perform access	48 occurrences

Standard Benchmark Metrics Skipped or Failed

github-actions · 2025-05-28T19:50:07Z

Benchmark results

Benchmark authorization.GetDecisions Results:

Metric	Value
Approved Decision Requests	5000
Denied Decision Requests	0
Total Time	473.085791ms

Benchmark authorization.v2.GetMultiResourceDecision Results:

Metric	Value
Approved Decision Requests	5000
Denied Decision Requests	0
Total Time	286.506701ms

Bulk Benchmark Results

Metric	Value
Total Decrypts	100
Successful Decrypts	100
Failed Decrypts	0
Total Time	337.176117ms
Throughput	296.58 requests/second

TDF3 Benchmark Results:

Metric	Value
Total Requests	5000
Successful Requests	5000
Failed Requests	0
Concurrent Requests	50
Total Time	57.937139034s
Average Latency	577.322854ms
Throughput	86.30 requests/second

NANOTDF Benchmark Results:

Metric	Value
Total Requests	5000
Successful Requests	4964
Failed Requests	36
Concurrent Requests	50
Total Time	49.368893455s
Average Latency	489.856379ms
Throughput	100.55 requests/second

Error Summary:

Error Message	Occurrences
ReadNanoTDF error: getNanoRewrapKey: rewrapError: internal: internal error rpc error: code = Internal desc = could not perform access	36 occurrences

Standard Benchmark Metrics Skipped or Failed

…Hooks

github-actions · 2025-05-28T21:49:01Z

Benchmark results

Benchmark authorization.GetDecisions Results:

Metric	Value
Approved Decision Requests	5000
Denied Decision Requests	0
Total Time	457.136397ms

Benchmark authorization.v2.GetMultiResourceDecision Results:

Metric	Value
Approved Decision Requests	5000
Denied Decision Requests	0
Total Time	277.459197ms

Bulk Benchmark Results

Metric	Value
Total Decrypts	100
Successful Decrypts	100
Failed Decrypts	0
Total Time	348.916258ms
Throughput	286.60 requests/second

TDF3 Benchmark Results:

Metric	Value
Total Requests	5000
Successful Requests	5000
Failed Requests	0
Concurrent Requests	50
Total Time	55.952761575s
Average Latency	556.759918ms
Throughput	89.36 requests/second

NANOTDF Benchmark Results:

Metric	Value
Total Requests	5000
Successful Requests	4946
Failed Requests	54
Concurrent Requests	50
Total Time	46.780561041s
Average Latency	462.029165ms
Throughput	105.73 requests/second

Error Summary:

Error Message	Occurrences
ReadNanoTDF error: getNanoRewrapKey: rewrapError: internal: internal error rpc error: code = Internal desc = could not perform access	54 occurrences

Standard Benchmark Metrics Skipped or Failed

github-actions · 2025-05-29T00:37:16Z

Benchmark results

Benchmark authorization.GetDecisions Results:

Metric	Value
Approved Decision Requests	5000
Denied Decision Requests	0
Total Time	451.044132ms

Benchmark authorization.v2.GetMultiResourceDecision Results:

Metric	Value
Approved Decision Requests	5000
Denied Decision Requests	0
Total Time	273.43791ms

Bulk Benchmark Results

Metric	Value
Total Decrypts	100
Successful Decrypts	100
Failed Decrypts	0
Total Time	332.005638ms
Throughput	301.20 requests/second

TDF3 Benchmark Results:

Metric	Value
Total Requests	5000
Successful Requests	5000
Failed Requests	0
Concurrent Requests	50
Total Time	48.235077958s
Average Latency	480.010907ms
Throughput	103.66 requests/second

NANOTDF Benchmark Results:

Metric	Value
Total Requests	5000
Successful Requests	4938
Failed Requests	62
Concurrent Requests	50
Total Time	38.100363999s
Average Latency	375.737804ms
Throughput	129.61 requests/second

Error Summary:

Error Message	Occurrences
ReadNanoTDF error: getNanoRewrapKey: rewrapError: internal: internal error rpc error: code = Internal desc = could not perform access	62 occurrences

Standard Benchmark Metrics Skipped or Failed

github-actions · 2025-05-29T00:40:24Z

Benchmark results

Benchmark authorization.GetDecisions Results:

Metric	Value
Approved Decision Requests	5000
Denied Decision Requests	0
Total Time	489.38833ms

Benchmark authorization.v2.GetMultiResourceDecision Results:

Metric	Value
Approved Decision Requests	5000
Denied Decision Requests	0
Total Time	268.680595ms

Bulk Benchmark Results

Metric	Value
Total Decrypts	100
Successful Decrypts	100
Failed Decrypts	0
Total Time	325.34703ms
Throughput	307.36 requests/second

TDF3 Benchmark Results:

Metric	Value
Total Requests	5000
Successful Requests	5000
Failed Requests	0
Concurrent Requests	50
Total Time	48.270898203s
Average Latency	480.919354ms
Throughput	103.58 requests/second

NANOTDF Benchmark Results:

Metric	Value
Total Requests	5000
Successful Requests	4936
Failed Requests	64
Concurrent Requests	50
Total Time	37.374699769s
Average Latency	367.922437ms
Throughput	132.07 requests/second

Error Summary:

Error Message	Occurrences
ReadNanoTDF error: getNanoRewrapKey: rewrapError: internal: internal error rpc error: code = Internal desc = could not perform access	64 occurrences

Standard Benchmark Metrics Skipped or Failed

jakedoublev · 2025-05-29T19:51:12Z

Moving back to draft until the go-cache is ready

jakedoublev · 2025-05-29T21:24:21Z

/gemini could any of these changes result in or cause a memory leak?

gemini-code-assist · 2025-05-29T21:24:50Z

That's a great question, and it's important to consider potential resource leaks when introducing caching mechanisms.

The changes in this pull request introduce an in-memory cache for entitlement policies (attributes and subject mappings) within the policy service. This cache periodically refreshes its data from the database and also refreshes whenever a policy mutation (such as creating or deactivating an attribute or subject mapping) occurs.

Here's a breakdown of potential memory leak considerations:

Goroutine Leak: The cache uses a background goroutine (periodicRefresh) to handle timed refreshes. While the EntitlementPolicyCache struct has a Stop() method designed to signal this goroutine to exit and wait for its completion, I don't see a corresponding call to cache.Stop() during the application's shutdown sequence in the provided patches. If the application exits without calling Stop(), the periodicRefresh goroutine could potentially continue running until the main context is cancelled, which might not happen cleanly in all shutdown scenarios. This could lead to a goroutine leak, which includes the memory allocated for the goroutine's stack and any objects it holds references to.
Cached Data Memory: The cache stores slices of pointers to protobuf messages ([]*policy.Attribute and []*policy.SubjectMapping). The Refresh method replaces these slices entirely with fresh data from the database. In Go, when a slice is replaced, the old underlying array becomes eligible for garbage collection unless other parts of the program still hold references to it (e.g., if ListCachedAttributes returned a sub-slice that is held onto long-term by a caller). The current implementation of ListCachedAttributes and ListCachedSubjectMappings returns sub-slices, and there's a TODO comment about potentially copying the data. If callers hold onto these returned sub-slices indefinitely, the underlying arrays from previous cache refreshes could be retained, leading to increased memory usage over time. However, the primary cache storage itself is bounded by the size of the data in the database, as it replaces the full dataset on each refresh.
Frequent Refreshes: The cache refreshes on a configured interval and also on every policy mutation. If policy mutations are very frequent, this could lead to rapid and potentially large memory allocations and deallocations as the cache is constantly being rebuilt. While this isn't strictly a "leak" in the sense of unreleasable memory, it could put significant pressure on the garbage collector and lead to higher peak memory usage and potential performance degradation under heavy write loads.

Recommendation:

The most direct potential leak is the goroutine leak. To mitigate this, I recommend ensuring that the cache.Stop() method is called as part of the application's graceful shutdown process. This will signal the periodic refresh goroutine to exit cleanly.

Regarding the cached data and frequent refreshes, the current approach of refreshing the entire cache on any mutation might be inefficient for large datasets and high mutation rates. Depending on performance requirements and expected data size/mutation frequency, you might consider alternative cache invalidation strategies (e.g., invalidating specific entries on mutation rather than a full refresh) or exploring different caching libraries that offer more sophisticated memory management or eviction policies.

Overall, the core cache logic seems reasonable for bounded memory usage related to the data itself, but the goroutine shutdown needs to be handled explicitly.

github-actions · 2025-05-30T19:46:59Z

Benchmark results

Benchmark authorization.GetDecisions Results:

Metric	Value
Approved Decision Requests	5000
Denied Decision Requests	0
Total Time	466.860007ms

Benchmark authorization.v2.GetMultiResourceDecision Results:

Metric	Value
Approved Decision Requests	5000
Denied Decision Requests	0
Total Time	274.985638ms

Bulk Benchmark Results

Metric	Value
Total Decrypts	100
Successful Decrypts	100
Failed Decrypts	0
Total Time	351.536859ms
Throughput	284.47 requests/second

TDF3 Benchmark Results:

Metric	Value
Total Requests	5000
Successful Requests	5000
Failed Requests	0
Concurrent Requests	50
Total Time	56.467346627s
Average Latency	562.246521ms
Throughput	88.55 requests/second

NANOTDF Benchmark Results:

Metric	Value
Total Requests	5000
Successful Requests	4952
Failed Requests	48
Concurrent Requests	50
Total Time	48.632754037s
Average Latency	482.243973ms
Throughput	101.82 requests/second

Error Summary:

Error Message	Occurrences
ReadNanoTDF error: getNanoRewrapKey: rewrapError: internal: internal error rpc error: code = Internal desc = could not perform access	48 occurrences

Standard Benchmark Metrics Skipped or Failed

github-actions · 2025-05-30T20:01:00Z

Benchmark results

Benchmark authorization.GetDecisions Results:

Metric	Value
Approved Decision Requests	5000
Denied Decision Requests	0
Total Time	460.586215ms

Benchmark authorization.v2.GetMultiResourceDecision Results:

Metric	Value
Approved Decision Requests	5000
Denied Decision Requests	0
Total Time	280.627438ms

Bulk Benchmark Results

Metric	Value
Total Decrypts	100
Successful Decrypts	100
Failed Decrypts	0
Total Time	354.523438ms
Throughput	282.07 requests/second

TDF3 Benchmark Results:

Metric	Value
Total Requests	5000
Successful Requests	5000
Failed Requests	0
Concurrent Requests	50
Total Time	55.517263782s
Average Latency	552.056049ms
Throughput	90.06 requests/second

NANOTDF Benchmark Results:

Metric	Value
Total Requests	5000
Successful Requests	4947
Failed Requests	53
Concurrent Requests	50
Total Time	46.058992272s
Average Latency	455.930149ms
Throughput	107.41 requests/second

Error Summary:

Error Message	Occurrences
ReadNanoTDF error: getNanoRewrapKey: rewrapError: internal: internal error rpc error: code = Internal desc = could not perform access	53 occurrences

Standard Benchmark Metrics Skipped or Failed

github-actions · 2025-05-30T20:04:29Z

Benchmark results

Benchmark authorization.GetDecisions Results:

Metric	Value
Approved Decision Requests	5000
Denied Decision Requests	0
Total Time	431.082536ms

Benchmark authorization.v2.GetMultiResourceDecision Results:

Metric	Value
Approved Decision Requests	5000
Denied Decision Requests	0
Total Time	282.262555ms

Bulk Benchmark Results

Metric	Value
Total Decrypts	100
Successful Decrypts	100
Failed Decrypts	0
Total Time	341.034188ms
Throughput	293.23 requests/second

TDF3 Benchmark Results:

Metric	Value
Total Requests	5000
Successful Requests	5000
Failed Requests	0
Concurrent Requests	50
Total Time	52.75078501s
Average Latency	525.354033ms
Throughput	94.79 requests/second

NANOTDF Benchmark Results:

Metric	Value
Total Requests	5000
Successful Requests	4945
Failed Requests	55
Concurrent Requests	50
Total Time	43.678547615s
Average Latency	432.659398ms
Throughput	113.21 requests/second

Error Summary:

Error Message	Occurrences
ReadNanoTDF error: getNanoRewrapKey: rewrapError: internal: internal error rpc error: code = Internal desc = could not perform access	55 occurrences

Standard Benchmark Metrics Skipped or Failed

github-actions · 2025-06-02T16:39:35Z

Benchmark results

Benchmark authorization.GetDecisions Results:

Metric	Value
Approved Decision Requests	5000
Denied Decision Requests	0
Total Time	446.462294ms

Benchmark authorization.v2.GetMultiResourceDecision Results:

Metric	Value
Approved Decision Requests	5000
Denied Decision Requests	0
Total Time	275.610997ms

Bulk Benchmark Results

Metric	Value
Total Decrypts	100
Successful Decrypts	100
Failed Decrypts	0
Total Time	340.204808ms
Throughput	293.94 requests/second

TDF3 Benchmark Results:

Metric	Value
Total Requests	5000
Successful Requests	5000
Failed Requests	0
Concurrent Requests	50
Total Time	52.977095581s
Average Latency	527.635249ms
Throughput	94.38 requests/second

NANOTDF Benchmark Results:

Metric	Value
Total Requests	5000
Successful Requests	4943
Failed Requests	57
Concurrent Requests	50
Total Time	43.568290891s
Average Latency	430.578756ms
Throughput	113.45 requests/second

Error Summary:

Error Message	Occurrences
ReadNanoTDF error: getNanoRewrapKey: rewrapError: internal: internal error rpc error: code = Internal desc = could not perform access	57 occurrences

Standard Benchmark Metrics Skipped or Failed

github-actions · 2025-06-02T17:03:32Z

Benchmark results

Benchmark authorization.GetDecisions Results:

Metric	Value
Approved Decision Requests	5000
Denied Decision Requests	0
Total Time	450.608567ms

Benchmark authorization.v2.GetMultiResourceDecision Results:

Metric	Value
Approved Decision Requests	5000
Denied Decision Requests	0
Total Time	275.113554ms

Bulk Benchmark Results

Metric	Value
Total Decrypts	100
Successful Decrypts	100
Failed Decrypts	0
Total Time	336.306019ms
Throughput	297.35 requests/second

TDF3 Benchmark Results:

Metric	Value
Total Requests	5000
Successful Requests	5000
Failed Requests	0
Concurrent Requests	50
Total Time	54.390681556s
Average Latency	541.286112ms
Throughput	91.93 requests/second

NANOTDF Benchmark Results:

Metric	Value
Total Requests	5000
Successful Requests	4943
Failed Requests	57
Concurrent Requests	50
Total Time	45.46605111s
Average Latency	449.563623ms
Throughput	108.72 requests/second

Error Summary:

Error Message	Occurrences
ReadNanoTDF error: getNanoRewrapKey: rewrapError: internal: internal error rpc error: code = Internal desc = could not perform access	57 occurrences

Standard Benchmark Metrics Skipped or Failed

github-actions · 2025-06-02T17:17:27Z

Benchmark results

Benchmark authorization.GetDecisions Results:

Metric	Value
Approved Decision Requests	5000
Denied Decision Requests	0
Total Time	459.140692ms

Benchmark authorization.v2.GetMultiResourceDecision Results:

Metric	Value
Approved Decision Requests	5000
Denied Decision Requests	0
Total Time	281.209539ms

Bulk Benchmark Results

Metric	Value
Total Decrypts	100
Successful Decrypts	100
Failed Decrypts	0
Total Time	346.861038ms
Throughput	288.30 requests/second

TDF3 Benchmark Results:

Metric	Value
Total Requests	5000
Successful Requests	5000
Failed Requests	0
Concurrent Requests	50
Total Time	53.992591889s
Average Latency	538.287347ms
Throughput	92.61 requests/second

NANOTDF Benchmark Results:

Metric	Value
Total Requests	5000
Successful Requests	4948
Failed Requests	52
Concurrent Requests	50
Total Time	45.743480309s
Average Latency	451.649511ms
Throughput	108.17 requests/second

Error Summary:

Error Message	Occurrences
ReadNanoTDF error: getNanoRewrapKey: rewrapError: internal: internal error rpc error: code = Internal desc = could not perform access	52 occurrences

Standard Benchmark Metrics Skipped or Failed

github-actions · 2025-06-02T17:28:33Z

Benchmark results

Benchmark authorization.GetDecisions Results:

Metric	Value
Approved Decision Requests	5000
Denied Decision Requests	0
Total Time	460.079977ms

Benchmark authorization.v2.GetMultiResourceDecision Results:

Metric	Value
Approved Decision Requests	5000
Denied Decision Requests	0
Total Time	288.681795ms

Bulk Benchmark Results

Metric	Value
Total Decrypts	100
Successful Decrypts	100
Failed Decrypts	0
Total Time	335.786923ms
Throughput	297.81 requests/second

TDF3 Benchmark Results:

Metric	Value
Total Requests	5000
Successful Requests	5000
Failed Requests	0
Concurrent Requests	50
Total Time	53.046228291s
Average Latency	527.456774ms
Throughput	94.26 requests/second

NANOTDF Benchmark Results:

Metric	Value
Total Requests	5000
Successful Requests	4969
Failed Requests	31
Concurrent Requests	50
Total Time	43.712056477s
Average Latency	432.967303ms
Throughput	113.68 requests/second

Error Summary:

Error Message	Occurrences
ReadNanoTDF error: getNanoRewrapKey: rewrapError: internal: internal error rpc error: code = Internal desc = could not perform access	31 occurrences

Standard Benchmark Metrics Skipped or Failed

Copilot

Pull Request Overview

Adds an in-memory entitlement policy cache with configurable refresh, integrates it into subject mapping and attribute services, and wires up service‐started hooks to initialize the cache.

Introduce CacheRefreshIntervalSeconds config and shared EntitlementPolicyCache
Implement caching in ListAttributes and ListSubjectMappings when state is active
Wire up an on‐services‐started hook through service registry and startup

Reviewed Changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 6 comments.


File	Description
service/policy/unsafe/unsafe.go	Minor formatting adjustment in `NewRegistration`
service/policy/subjectmapping/subject_mapping.go	Add cache field, on‐start hook, and caching in list method
service/policy/db/subject_mappings.go	Add `ListAllSubjectMappings` helper for cache refresh
service/policy/db/attributes.go	Restrict `ListAllAttributes` to active state
service/policy/config/config.go	Add `CacheRefreshIntervalSeconds` to config
service/policy/config/cache.go	New `EntitlementPolicyCache` implementation
service/policy/attributes/attributes.go	Wire cache into attributes service and caching logic
service/pkg/serviceregistry/serviceregistry.go	Add `OnServicesStarted` hook support
service/pkg/server/start.go	Invoke on‐services‐started hooks during startup
service/pkg/server/services.go	Register on‐services‐started hooks per service
service/pkg/config/config.go	Store and run on‐services‐started hooks
opentdf-example.yaml, opentdf-dev.yaml	Update example config to show cache option

Comments suppressed due to low confidence (1)

service/policy/subjectmapping/subject_mapping.go:138

Add unit and integration tests to cover the cache-enabled code path in ListSubjectMappings, ensuring correct pagination and error handling.

// If caching enabled, return from cache instead of DB

Copilot · 2025-06-02T18:36:33Z

service/policy/unsafe/unsafe.go

@@ -61,6 +61,7 @@ func NewRegistration(ns string, dbRegister serviceregistry.DBRegister) *servicer
 				unsafeSvc.logger = logger
 				unsafeSvc.dbClient = policydb.NewClient(srp.DBClient, logger, int32(cfg.ListRequestLimitMax), int32(cfg.ListRequestLimitDefault))
 				unsafeSvc.config = cfg
+


[nitpick] Remove the extra blank line before `return unsafeSvc, nil` to keep the surrounding code block compact.

Suggested change

Copilot · 2025-06-02T18:36:34Z

service/policy/db/subject_mappings.go

+			},
+		})
+		if err != nil {
+			return nil, fmt.Errorf("failed to list all attributes: %w", err)


The error message refers to "attributes" but this method lists subject mappings; update it to "failed to list all subject mappings" for accuracy.

Suggested change

return nil, fmt.Errorf("failed to list all attributes: %w", err)

return nil, fmt.Errorf("failed to list all subject mappings: %w", err)

Copilot · 2025-06-02T18:36:34Z

service/policy/db/attributes.go

@@ -198,7 +198,7 @@ func (c PolicyDBClient) ListAllAttributes(ctx context.Context) ([]*policy.Attrib

 	for {
 		listed, err := c.ListAttributes(ctx, &attributes.ListAttributesRequest{
-			State: common.ActiveStateEnum_ACTIVE_STATE_ENUM_ANY,
+			State: common.ActiveStateEnum_ACTIVE_STATE_ENUM_ACTIVE,


The ListAllAttributes helper is intended to load all states, but filtering to active only contradicts its name. Either rename the method or include all states as originally documented.

Suggested change

State: common.ActiveStateEnum_ACTIVE_STATE_ENUM_ACTIVE,

State: common.ActiveStateEnum_ACTIVE_STATE_ENUM_ANY,
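The renaming option the comment offers can be sketched as a loader that takes the state filter explicitly, plus an active-only wrapper whose name says what it loads. This is a hypothetical in-memory sketch. `attrState` only mirrors the idea of `ActiveStateEnum_ACTIVE_STATE_ENUM_ACTIVE` and `_ANY`; it is not the generated protobuf enum, and the real helper queries the database rather than filtering a slice.

```go
package main

type attrState int

const (
	attrStateAny attrState = iota
	attrStateActive
	attrStateInactive
)

type attribute struct {
	FQN    string
	Active bool
}

// matches reports whether an attribute satisfies the requested state filter.
func matches(a attribute, s attrState) bool {
	switch s {
	case attrStateActive:
		return a.Active
	case attrStateInactive:
		return !a.Active
	default:
		return true
	}
}

// listAttributesByState makes the filter visible at every call site.
func listAttributesByState(all []attribute, s attrState) []attribute {
	var out []attribute
	for _, a := range all {
		if matches(a, s) {
			out = append(out, a)
		}
	}
	return out
}

// listAllActiveAttributes states in its name that only active attributes
// are returned, which is what the entitlement cache needs.
func listAllActiveAttributes(all []attribute) []attribute {
	return listAttributesByState(all, attrStateActive)
}
```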

Copilot · 2025-06-02T18:36:34Z

service/policy/config/cache.go

+			BufferItems: bufferItems,
+		})
+		if err != nil {
+			panic(err)


[nitpick] Avoid using panic for initialization failures; return an error or log the failure to prevent crashing the entire service at startup.

Suggested change

panic(err)

l.ErrorContext(ctx, "Failed to initialize ristretto cache", "error", err)

initErr = err

return

Copilot · 2025-06-02T18:36:34Z

service/pkg/server/start.go

@@ -300,6 +300,11 @@ func Start(f ...StartOptions) error {
 	}
 	defer cfg.Close(ctx)

+	// Run the services started hooks
+	if err := cfg.RunServicesStartedHooks(ctx); err != nil {


The RunServicesStartedHooks call appears twice in this function; remove the earlier invocation to avoid running the hooks twice.

Copilot · 2025-06-02T18:36:35Z

service/policy/attributes/attributes.go

@@ -120,6 +135,50 @@ func (s *AttributesService) ListAttributes(ctx context.Context,
 	state := req.Msg.GetState().String()
 	s.logger.Debug("listing attribute definitions", slog.String("state", state))

+	// If active state and caching enabled, return from cache instead of DB


[nitpick] The caching logic in ListAttributes closely mirrors ListSubjectMappings; consider extracting shared pagination+cache handling into a helper to reduce duplication.
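A shared helper of the kind the comment suggests could take the cached slice plus the request's limit and offset, and return one page and the next offset. This is a hypothetical sketch. Three details are assumptions: the name `page`, the `int32` types (chosen to match the `int32` list limits passed to `policydb.NewClient` above), and the rule that a non-positive limit returns the remainder. The real services would substitute a configured default limit instead.

```go
package main

// page returns the items for a limit/offset request and the offset of the
// next page, or 0 when there is no further page.
func page[T any](items []T, limit, offset int32) ([]T, int32) {
	total := int32(len(items))
	if offset < 0 {
		offset = 0
	}
	if offset >= total {
		return []T{}, 0
	}
	end := total
	// Compare in int64 so a very large limit cannot overflow int32.
	if limit > 0 && int64(offset)+int64(limit) < int64(total) {
		end = offset + limit
	}
	var next int32
	if end < total {
		next = end
	}
	return items[offset:end], next
}
```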

jakedoublev added 4 commits May 28, 2025 10:47

feat(policy): cache entitlement policy in authz v2

4923115

hook to run after services are started

c86d9d7

policy config

12e2c4d

consume policy cache

9096218

github-actions bot added the external-contributor (External Org Member), comp:policy (Policy Configuration: attributes, subject mappings, resource mappings, kas registry), and comp:authorization labels May 28, 2025

jakedoublev mentioned this pull request May 28, 2025

feat(policy): cache entitlement policy in auth service v2 #2326

Closed

3 tasks

github-actions bot added the size/m label May 28, 2025

gemini-code-assist bot reviewed May 28, 2025


gemini-code-assist bot suggested changes May 28, 2025


service/authorization/v2/authorization.go (outdated, resolved)

service/policy/cache/cache.go (outdated, resolved)

service/policy/cache/cache.go (outdated, resolved)

lint fixes

0bfd2e3

jakedoublev added 2 commits May 28, 2025 12:27

fix cache utilization in auth service and lower defaults

603fb63

lint fixes

cc99503

test: sleep to ensure roundtrip tests hit cached policy

5747db1

jakedoublev added 3 commits May 28, 2025 13:48

put back rttests

281f41d

put back authz doing any caching

a5859e1

set cache on relevant policy services and refactor to servicesStartedHooks

a14f9ae

jakedoublev added 3 commits May 28, 2025 15:49

improve log

a668dd7

working cache

9505dd0

lint fixes

3ba044c

fix limit/offset

b1ec09a

pflynn-virtru marked this pull request as ready for review May 29, 2025 19:33

pflynn-virtru requested review from a team as code owners May 29, 2025 19:33

pflynn-virtru previously approved these changes May 29, 2025


jakedoublev marked this pull request as draft May 29, 2025 19:51

jakedoublev added 4 commits May 30, 2025 12:29

Merge branch 'main' into feat/authz-v2-cache-policy

5abe1fa

rm mutation cache refreshes

ffa528c

disable caching by default

e18c04a

fixes

0103264

jakedoublev dismissed pflynn-virtru’s stale review via 0103264 May 30, 2025 19:42

jakedoublev added 3 commits May 30, 2025 12:56

ensure close is called to shut down db clients and caches

f0a56a6

lint fixes

184cfff

lint fixes

f864a33

Merge remote-tracking branch 'origin' into feat/authz-v2-cache-policy

07d6849

ristretto cache/go-cache implementation

a1f8ac6

lint fix

b7cd66c

fix shutdown panic

57873d8

jrschumacher requested a review from Copilot June 2, 2025 18:33

Copilot AI reviewed Jun 2, 2025



jakedoublev commented May 28, 2025 (edited)



github-actions bot commented May 28, 2025

jakedoublev commented May 28, 2025 (edited)