feat(policy): policy should maintain a cache of entitlement policy #2327

Draft: wants to merge 28 commits into main

jakedoublev commented May 28, 2025

Proposed Changes

  • configurable refresh interval (set it to 0 to bypass caching)
  • go-cache/ristretto direct implementation as a POC instead of waiting for the shared cache work
  • ListAttributes/ListSubjectMappings return results from the cache when the requested state is active and caching is enabled

Checklist

  • I have added or updated unit tests
  • I have added or updated integration tests (if appropriate)
  • I have added or updated documentation

Testing Instructions

github-actions bot added the external-contributor (External Org Member), comp:policy (Policy Configuration: attributes, subject mappings, resource mappings, kas registry), and comp:authorization labels May 28, 2025

gemini-code-assist bot left a comment

Hello @jakedoublev, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

Gemini here, providing a summary of this pull request. This PR introduces caching for entitlement policy data (attributes and subject mappings) within the Authorization v2 service. The primary goal is to improve performance by reducing the need for the Authorization service to fetch this data from the Policy service on every decision request. A new EntitlementPolicyCache component is added which periodically fetches and stores this data in memory. The Authorization v2 service is updated to utilize this cache via a new constructor for the JustInTimePDP. A new service registration hook mechanism is implemented to ensure the cache is initialized and started only after all necessary services (like Policy) are registered and available. A configuration option is added to control the cache refresh interval.

Highlights

  • Entitlement Policy Caching: Introduces an in-memory cache for policy entitlement data (attributes and subject mappings) within the Authorization v2 service.
  • New Cache Component: Adds a new EntitlementPolicyCache component responsible for fetching and storing policy data periodically.
  • Authorization Service Integration: Modifies the Authorization v2 service to use the new cache when making policy decisions, replacing direct fetches for attributes and subject mappings.
  • Service Registration Hook: Implements a new hook mechanism (OnCompleteServiceRegistration) in the service registry and config handling to allow services to perform actions (like starting the cache) after all services are registered.
  • Configurable Cache Refresh: Adds a configuration option (cache_refresh_interval_seconds) to control how often the entitlement policy cache is refreshed.

Changelog

  • opentdf-dev.yaml
    • Uncommented and enabled the policy service configuration.
    • Added cache_refresh_interval_seconds: 60 to the policy configuration.
  • opentdf-example.yaml
    • Added a commented-out example for cache_refresh_interval_seconds: 300 to the policy configuration.
  • service/authorization/v2/authorization.go
    • Imported the new policy/cache package.
    • Added a pdp field of type *access.JustInTimePDP to the Service struct (line 27).
    • Added OnCompleteServiceRegistration function (lines 34-54) to create and start the policy cache and initialize the JustInTimePDP using the cache.
    • Added the OnCompleteServiceRegistration hook to the ServiceOptions (line 65).
    • Removed the creation of a new JustInTimePDP within GetEntitlements (lines 82-87 removed, now uses as.pdp on line 109).
    • Removed the creation of a new JustInTimePDP within GetDecision (lines 114-118 removed, now uses as.pdp on line 139).
    • Removed the creation of a new JustInTimePDP within GetDecisionMultiResource (lines 149-153 removed, now uses as.pdp on line 169).
    • Changed the error return in GetDecisionMultiResource from connect.NewError(connect.CodeInternal, err) to just err (line 179).
  • service/internal/access/v2/just_in_time_pdp.go
    • Imported the new policy/cache package.
    • Added a policyCache field of type *policyCache.EntitlementPolicyCache to the JustInTimePDP struct (line 32).
    • Added NewJustInTimePDPWithCachedEntitlementPolicy function (lines 76-128) which accepts a cache and uses it to fetch policy data before creating the embedded PolicyDecisionPoint.
  • service/pkg/config/config.go
    • Added ServiceRegistrationCompleteHook type (line 18).
    • Added onServiceRegistrationCompleteHooks slice to the Config struct (line 54).
    • Added AddOnServiceRegistrationCompleteHook method (lines 118-121) to register hooks.
    • Added RunRegistrationCompleteHooks method (lines 136-146) to execute registered hooks.
  • service/pkg/server/services.go
    • Added a call to svc.RegisterOnCompleteServiceRegistrationHook during service registration (lines 195-197).
  • service/pkg/server/start.go
    • Added a call to cfg.RunRegistrationCompleteHooks(ctx) after service registration is complete and before the server starts (lines 316-319).
  • service/policy/cache/cache.go
    • Added a new file cache.go implementing the EntitlementPolicyCache struct and its methods (NewEntitlementPolicyCache, Start, Stop, periodicRefresh, Refresh, GetAttributes, GetSubjectMappings, fetchAllDefinitions, fetchAllSubjectMappings).
  • service/policy/config/config.go
    • Added CacheRefreshIntervalSeconds field to the Config struct with a default of 300 (line 24).
    • Added a global configuredRefreshInterval variable (line 14).
    • Updated GetSharedPolicyConfig to read CacheRefreshIntervalSeconds and update the global configuredRefreshInterval (line 52).
    • Added GetPolicyEntitlementCacheRefreshInterval function (lines 59-61) to expose the configured interval.


A cache for policy,
Fetched from the service, you see.
Decisions are fast,
The performance will last,
From calls to the Policy, set free.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist bot left a comment

Code Review

This pull request introduces a caching mechanism for entitlement policies (attributes and subject mappings) within the Authorization v2 service. The goal is to improve performance by reducing repeated fetches of policy data. The core changes involve:

  • A new EntitlementPolicyCache in service/policy/cache/ for storing and periodically refreshing policy data.
  • Modifications to the service lifecycle and registration (service/pkg/config, service/pkg/server, service/pkg/serviceregistry) to initialize this cache after all services are registered but before the server starts handling requests.
  • Updates to service/authorization/v2/authorization.go and service/internal/access/v2/just_in_time_pdp.go to utilize this cached policy data.

The overall approach seems sound and addresses a potential performance bottleneck. The introduction of a lifecycle hook for post-registration, pre-start initialization is a good pattern for this kind of setup. Configuration for the cache refresh interval has also been added.

However, there are a few areas that could be improved, particularly around error propagation and potential data mutability in the cache. A significant concern is the lack of unit tests for the new caching logic, which is critical for ensuring its correctness and reliability.

Summary of Findings

  • Missing Unit Tests: The pull request introduces significant new functionality with the EntitlementPolicyCache and its integration. However, the checklist in the PR description indicates that unit tests have not yet been added or updated. Comprehensive unit tests are crucial for this caching logic to ensure its correctness, especially regarding concurrency, refresh behavior, and edge cases. It's highly recommended to add these tests.
  • Error Propagation in Authorization Service: In service/authorization/v2/authorization.go, a change in error handling for rollupMultiResourceDecision might lead to losing gRPC-specific error codes, potentially affecting client-side error interpretation.
  • Cache Data Mutability: The GetAttributes and GetSubjectMappings methods in service/policy/cache/cache.go return direct slices to the cached data. This could allow callers to inadvertently modify the cache's internal state. Returning copies would enhance data integrity.
  • Configuration Management for Cache Interval: The service/policy/config/config.go uses a package-level variable (configuredRefreshInterval) to store the cache refresh interval. This variable is updated by GetSharedPolicyConfig. While functional, this introduces a form of global state. This was not commented on directly due to review settings but is noted for awareness.
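Returning copies from the getters could look like the following minimal sketch (Go 1.21+ for the `slices` package; `cache` and `SubjectMapping` are simplified, hypothetical stand-ins). Note that `slices.Clone` is a shallow copy: callers can no longer replace, reorder, or append into the cached backing array, but they still share the underlying messages, so a full deep copy of protobuf values would need something like `proto.Clone`, at a higher cost per call.

```go
package main

import (
	"slices"
	"sync"
)

// SubjectMapping is a hypothetical stand-in for the policy.SubjectMapping protobuf.
type SubjectMapping struct{ ID string }

type cache struct {
	mu              sync.RWMutex
	subjectMappings []*SubjectMapping
}

// GetSubjectMappings returns a new slice, so writes to the result cannot
// reach the cache's own slice. The element pointers are still shared.
func (c *cache) GetSubjectMappings() []*SubjectMapping {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return slices.Clone(c.subjectMappings)
}
```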

Merge Readiness

This pull request makes a valuable contribution by introducing caching for entitlement policies, which should enhance performance. The core logic and integration into the service lifecycle are well-thought-out.

However, before this PR is merged, I strongly recommend addressing the following:

  1. Add Comprehensive Unit Tests: The new caching mechanism (EntitlementPolicyCache) and its integration points need thorough unit testing to ensure reliability and correctness under various conditions (e.g., concurrent access, refresh failures, empty policy data).
  2. Review Error Propagation: The change in error handling in service/authorization/v2/authorization.go should be revisited to ensure gRPC error codes are consistently propagated to clients.
  3. Consider Cache Data Immutability: The potential for callers to modify cached data via direct slice returns in service/policy/cache/cache.go should be addressed, perhaps by returning copies.

Due to the high severity concern regarding the lack of unit tests and the medium severity issues identified, I am requesting changes. Addressing these points will significantly improve the robustness and maintainability of this feature. I am not authorized to approve pull requests, so further review and approval by other maintainers will be necessary after these changes are considered.

Benchmark results

Benchmark authorization.GetDecisions Results:

Metric Value
Approved Decision Requests 5000
Denied Decision Requests 0
Total Time 466.684799ms

Benchmark authorization.v2.GetMultiResourceDecision Results:

Metric Value
Approved Decision Requests 5000
Denied Decision Requests 0
Total Time 287.863794ms

Bulk Benchmark Results

Metric Value
Total Decrypts 100
Successful Decrypts 100
Failed Decrypts 0
Total Time 335.163614ms
Throughput 298.36 requests/second

TDF3 Benchmark Results:

Metric Value
Total Requests 5000
Successful Requests 5000
Failed Requests 0
Concurrent Requests 50
Total Time 47.468095037s
Average Latency 471.84216ms
Throughput 105.33 requests/second

NANOTDF Benchmark Results:

Metric Value
Total Requests 5000
Successful Requests 4939
Failed Requests 61
Concurrent Requests 50
Total Time 38.177281913s
Average Latency 376.906651ms
Throughput 129.37 requests/second

Error Summary:

Error Message Occurrences
ReadNanoTDF error: getNanoRewrapKey: rewrapError: internal: internal error
rpc error: code = Internal desc = could not perform access
61 occurrences

Standard Benchmark Metrics Skipped or Failed

Benchmark results

Benchmark authorization.GetDecisions Results:

Metric Value
Approved Decision Requests 5000
Denied Decision Requests 0
Total Time 474.806668ms

Benchmark authorization.v2.GetMultiResourceDecision Results:

Metric Value
Approved Decision Requests 5000
Denied Decision Requests 0
Total Time 281.163692ms

Bulk Benchmark Results

Metric Value
Total Decrypts 100
Successful Decrypts 100
Failed Decrypts 0
Total Time 324.08774ms
Throughput 308.56 requests/second

TDF3 Benchmark Results:

Metric Value
Total Requests 5000
Successful Requests 5000
Failed Requests 0
Concurrent Requests 50
Total Time 45.89274452s
Average Latency 457.06466ms
Throughput 108.95 requests/second

NANOTDF Benchmark Results:

Metric Value
Total Requests 5000
Successful Requests 4948
Failed Requests 52
Concurrent Requests 50
Total Time 37.599055917s
Average Latency 371.712354ms
Throughput 131.60 requests/second

Error Summary:

Error Message Occurrences
ReadNanoTDF error: getNanoRewrapKey: rewrapError: internal: internal error
rpc error: code = Internal desc = could not perform access
52 occurrences

Standard Benchmark Metrics Skipped or Failed

Benchmark results

Benchmark authorization.GetDecisions Results:

Metric Value
Approved Decision Requests 5000
Denied Decision Requests 0
Total Time 468.637562ms

Benchmark authorization.v2.GetMultiResourceDecision Results:

Metric Value
Approved Decision Requests 5000
Denied Decision Requests 0
Total Time 289.213579ms

Bulk Benchmark Results

Metric Value
Total Decrypts 100
Successful Decrypts 100
Failed Decrypts 0
Total Time 340.258643ms
Throughput 293.89 requests/second

TDF3 Benchmark Results:

Metric Value
Total Requests 5000
Successful Requests 5000
Failed Requests 0
Concurrent Requests 50
Total Time 59.128841266s
Average Latency 589.513841ms
Throughput 84.56 requests/second

NANOTDF Benchmark Results:

Metric Value
Total Requests 5000
Successful Requests 4947
Failed Requests 53
Concurrent Requests 50
Total Time 53.529524842s
Average Latency 530.194509ms
Throughput 92.42 requests/second

Error Summary:

Error Message Occurrences
ReadNanoTDF error: getNanoRewrapKey: rewrapError: internal: internal error
rpc error: code = Internal desc = could not perform access
53 occurrences

Standard Benchmark Metrics Skipped or Failed

Benchmark results

Benchmark authorization.GetDecisions Results:

Metric Value
Approved Decision Requests 5000
Denied Decision Requests 0
Total Time 472.167216ms

Benchmark authorization.v2.GetMultiResourceDecision Results:

Metric Value
Approved Decision Requests 5000
Denied Decision Requests 0
Total Time 276.669118ms

Bulk Benchmark Results

Metric Value
Total Decrypts 100
Successful Decrypts 100
Failed Decrypts 0
Total Time 346.867306ms
Throughput 288.29 requests/second

TDF3 Benchmark Results:

Metric Value
Total Requests 5000
Successful Requests 5000
Failed Requests 0
Concurrent Requests 50
Total Time 58.400501487s
Average Latency 581.743211ms
Throughput 85.62 requests/second

NANOTDF Benchmark Results:

Metric Value
Total Requests 5000
Successful Requests 4952
Failed Requests 48
Concurrent Requests 50
Total Time 49.708482247s
Average Latency 491.590463ms
Throughput 99.62 requests/second

Error Summary:

Error Message Occurrences
ReadNanoTDF error: getNanoRewrapKey: rewrapError: internal: internal error
rpc error: code = Internal desc = could not perform access
48 occurrences

Standard Benchmark Metrics Skipped or Failed

Benchmark results

Benchmark authorization.GetDecisions Results:

Metric Value
Approved Decision Requests 5000
Denied Decision Requests 0
Total Time 473.085791ms

Benchmark authorization.v2.GetMultiResourceDecision Results:

Metric Value
Approved Decision Requests 5000
Denied Decision Requests 0
Total Time 286.506701ms

Bulk Benchmark Results

Metric Value
Total Decrypts 100
Successful Decrypts 100
Failed Decrypts 0
Total Time 337.176117ms
Throughput 296.58 requests/second

TDF3 Benchmark Results:

Metric Value
Total Requests 5000
Successful Requests 5000
Failed Requests 0
Concurrent Requests 50
Total Time 57.937139034s
Average Latency 577.322854ms
Throughput 86.30 requests/second

NANOTDF Benchmark Results:

Metric Value
Total Requests 5000
Successful Requests 4964
Failed Requests 36
Concurrent Requests 50
Total Time 49.368893455s
Average Latency 489.856379ms
Throughput 100.55 requests/second

Error Summary:

Error Message Occurrences
ReadNanoTDF error: getNanoRewrapKey: rewrapError: internal: internal error
rpc error: code = Internal desc = could not perform access
36 occurrences

Standard Benchmark Metrics Skipped or Failed

Benchmark results

Benchmark authorization.GetDecisions Results:

Metric Value
Approved Decision Requests 5000
Denied Decision Requests 0
Total Time 457.136397ms

Benchmark authorization.v2.GetMultiResourceDecision Results:

Metric Value
Approved Decision Requests 5000
Denied Decision Requests 0
Total Time 277.459197ms

Bulk Benchmark Results

Metric Value
Total Decrypts 100
Successful Decrypts 100
Failed Decrypts 0
Total Time 348.916258ms
Throughput 286.60 requests/second

TDF3 Benchmark Results:

Metric Value
Total Requests 5000
Successful Requests 5000
Failed Requests 0
Concurrent Requests 50
Total Time 55.952761575s
Average Latency 556.759918ms
Throughput 89.36 requests/second

NANOTDF Benchmark Results:

Metric Value
Total Requests 5000
Successful Requests 4946
Failed Requests 54
Concurrent Requests 50
Total Time 46.780561041s
Average Latency 462.029165ms
Throughput 105.73 requests/second

Error Summary:

Error Message Occurrences
ReadNanoTDF error: getNanoRewrapKey: rewrapError: internal: internal error
rpc error: code = Internal desc = could not perform access
54 occurrences

Standard Benchmark Metrics Skipped or Failed

Benchmark results

Benchmark authorization.GetDecisions Results:

Metric Value
Approved Decision Requests 5000
Denied Decision Requests 0
Total Time 451.044132ms

Benchmark authorization.v2.GetMultiResourceDecision Results:

Metric Value
Approved Decision Requests 5000
Denied Decision Requests 0
Total Time 273.43791ms

Bulk Benchmark Results

Metric Value
Total Decrypts 100
Successful Decrypts 100
Failed Decrypts 0
Total Time 332.005638ms
Throughput 301.20 requests/second

TDF3 Benchmark Results:

Metric Value
Total Requests 5000
Successful Requests 5000
Failed Requests 0
Concurrent Requests 50
Total Time 48.235077958s
Average Latency 480.010907ms
Throughput 103.66 requests/second

NANOTDF Benchmark Results:

Metric Value
Total Requests 5000
Successful Requests 4938
Failed Requests 62
Concurrent Requests 50
Total Time 38.100363999s
Average Latency 375.737804ms
Throughput 129.61 requests/second

Error Summary:

Error Message Occurrences
ReadNanoTDF error: getNanoRewrapKey: rewrapError: internal: internal error
rpc error: code = Internal desc = could not perform access
62 occurrences

Standard Benchmark Metrics Skipped or Failed

Benchmark results

Benchmark authorization.GetDecisions Results:

Metric Value
Approved Decision Requests 5000
Denied Decision Requests 0
Total Time 489.38833ms

Benchmark authorization.v2.GetMultiResourceDecision Results:

Metric Value
Approved Decision Requests 5000
Denied Decision Requests 0
Total Time 268.680595ms

Bulk Benchmark Results

Metric Value
Total Decrypts 100
Successful Decrypts 100
Failed Decrypts 0
Total Time 325.34703ms
Throughput 307.36 requests/second

TDF3 Benchmark Results:

Metric Value
Total Requests 5000
Successful Requests 5000
Failed Requests 0
Concurrent Requests 50
Total Time 48.270898203s
Average Latency 480.919354ms
Throughput 103.58 requests/second

NANOTDF Benchmark Results:

Metric Value
Total Requests 5000
Successful Requests 4936
Failed Requests 64
Concurrent Requests 50
Total Time 37.374699769s
Average Latency 367.922437ms
Throughput 132.07 requests/second

Error Summary:

Error Message Occurrences
ReadNanoTDF error: getNanoRewrapKey: rewrapError: internal: internal error
rpc error: code = Internal desc = could not perform access
64 occurrences

Standard Benchmark Metrics Skipped or Failed

pflynn-virtru marked this pull request as ready for review May 29, 2025 19:33
pflynn-virtru requested review from a team as code owners May 29, 2025 19:33
pflynn-virtru previously approved these changes May 29, 2025
jakedoublev commented:

Moving back to draft until the go-cache is ready

jakedoublev marked this pull request as draft May 29, 2025 19:51

jakedoublev commented:

/gemini could any of these changes result in or cause a memory leak?

That's a great question, and it's important to consider potential resource leaks when introducing caching mechanisms.

The changes in this pull request introduce an in-memory cache for entitlement policies (attributes and subject mappings) within the policy service. This cache is designed to periodically refresh its data from the database and also refreshes whenever a policy mutation (like creating or deactivating an attribute/subject mapping) occurs.

Here's a breakdown of potential memory leak considerations:

  1. Goroutine Leak: The cache uses a background goroutine (periodicRefresh) to handle timed refreshes. The EntitlementPolicyCache struct has a Stop() method that signals this goroutine to exit and waits for it to finish, but I don't see a corresponding call to cache.Stop() during the application's shutdown sequence in the provided patches. All goroutines end when the process exits, so the risk is not memory that outlives the process; it is a goroutine that keeps running, holding its stack and any data it references, while the process stays alive (for example in tests, or in any code path that creates a new cache without stopping the old one). Without Stop(), the goroutine ends only when the context passed to it is cancelled, which might not happen cleanly in all shutdown scenarios.

  2. Cached Data Memory: The cache stores slices of pointers to protobuf messages ([]*policy.Attribute and []*policy.SubjectMapping). The Refresh method replaces these slices entirely with fresh data from the database. In Go, when a slice is replaced, the old underlying array becomes eligible for garbage collection unless other parts of the program still hold references to it (e.g., if ListCachedAttributes returned a sub-slice that is held onto long-term by a caller). The current implementation of ListCachedAttributes and ListCachedSubjectMappings returns sub-slices, and there's a TODO comment about potentially copying the data. If callers hold onto these returned sub-slices indefinitely, the underlying arrays from previous cache refreshes could be retained, leading to increased memory usage over time. However, the primary cache storage itself is bounded by the size of the data in the database, as it replaces the full dataset on each refresh.

  3. Frequent Refreshes: The cache refreshes on a configured interval and also on every policy mutation. If policy mutations are very frequent, this could lead to rapid and potentially large memory allocations and deallocations as the cache is constantly being rebuilt. While this isn't strictly a "leak" in the sense of unreleasable memory, it could put significant pressure on the garbage collector and lead to higher peak memory usage and potential performance degradation under heavy write loads.

Recommendation:

The most direct potential leak is the goroutine leak. To mitigate this, I recommend ensuring that the cache.Stop() method is called as part of the application's graceful shutdown process. This will signal the periodic refresh goroutine to exit cleanly.

Regarding the cached data and frequent refreshes, the current approach of refreshing the entire cache on any mutation might be inefficient for large datasets and high mutation rates. Depending on performance requirements and expected data size/mutation frequency, you might consider alternative cache invalidation strategies (e.g., invalidating specific entries on mutation rather than a full refresh) or exploring different caching libraries that offer more sophisticated memory management or eviction policies.

Overall, the core cache logic seems reasonable for bounded memory usage related to the data itself, but the goroutine shutdown needs to be handled explicitly.

Benchmark results

Benchmark authorization.GetDecisions Results:

Metric Value
Approved Decision Requests 5000
Denied Decision Requests 0
Total Time 466.860007ms

Benchmark authorization.v2.GetMultiResourceDecision Results:

Metric Value
Approved Decision Requests 5000
Denied Decision Requests 0
Total Time 274.985638ms

Bulk Benchmark Results

Metric Value
Total Decrypts 100
Successful Decrypts 100
Failed Decrypts 0
Total Time 351.536859ms
Throughput 284.47 requests/second

TDF3 Benchmark Results:

Metric Value
Total Requests 5000
Successful Requests 5000
Failed Requests 0
Concurrent Requests 50
Total Time 56.467346627s
Average Latency 562.246521ms
Throughput 88.55 requests/second

NANOTDF Benchmark Results:

Metric Value
Total Requests 5000
Successful Requests 4952
Failed Requests 48
Concurrent Requests 50
Total Time 48.632754037s
Average Latency 482.243973ms
Throughput 101.82 requests/second

Error Summary:

Error Message Occurrences
ReadNanoTDF error: getNanoRewrapKey: rewrapError: internal: internal error
rpc error: code = Internal desc = could not perform access
48 occurrences

Standard Benchmark Metrics Skipped or Failed

Benchmark results

Benchmark authorization.GetDecisions Results:

Metric Value
Approved Decision Requests 5000
Denied Decision Requests 0
Total Time 460.586215ms

Benchmark authorization.v2.GetMultiResourceDecision Results:

Metric Value
Approved Decision Requests 5000
Denied Decision Requests 0
Total Time 280.627438ms

Bulk Benchmark Results

Metric Value
Total Decrypts 100
Successful Decrypts 100
Failed Decrypts 0
Total Time 354.523438ms
Throughput 282.07 requests/second

TDF3 Benchmark Results:

Metric Value
Total Requests 5000
Successful Requests 5000
Failed Requests 0
Concurrent Requests 50
Total Time 55.517263782s
Average Latency 552.056049ms
Throughput 90.06 requests/second

NANOTDF Benchmark Results:

Metric Value
Total Requests 5000
Successful Requests 4947
Failed Requests 53
Concurrent Requests 50
Total Time 46.058992272s
Average Latency 455.930149ms
Throughput 107.41 requests/second

Error Summary:

Error Message Occurrences
ReadNanoTDF error: getNanoRewrapKey: rewrapError: internal: internal error
rpc error: code = Internal desc = could not perform access
53 occurrences

Standard Benchmark Metrics Skipped or Failed

Benchmark results

Benchmark authorization.GetDecisions Results:

Metric Value
Approved Decision Requests 5000
Denied Decision Requests 0
Total Time 431.082536ms

Benchmark authorization.v2.GetMultiResourceDecision Results:

Metric Value
Approved Decision Requests 5000
Denied Decision Requests 0
Total Time 282.262555ms

Bulk Benchmark Results

Metric Value
Total Decrypts 100
Successful Decrypts 100
Failed Decrypts 0
Total Time 341.034188ms
Throughput 293.23 requests/second

TDF3 Benchmark Results:

Metric Value
Total Requests 5000
Successful Requests 5000
Failed Requests 0
Concurrent Requests 50
Total Time 52.75078501s
Average Latency 525.354033ms
Throughput 94.79 requests/second

NANOTDF Benchmark Results:

Metric Value
Total Requests 5000
Successful Requests 4945
Failed Requests 55
Concurrent Requests 50
Total Time 43.678547615s
Average Latency 432.659398ms
Throughput 113.21 requests/second

Error Summary:

Error Message Occurrences
ReadNanoTDF error: getNanoRewrapKey: rewrapError: internal: internal error
rpc error: code = Internal desc = could not perform access
55 occurrences

Standard Benchmark Metrics Skipped or Failed

github-actions bot commented Jun 2, 2025

Benchmark results

Benchmark authorization.GetDecisions Results:

Metric Value
Approved Decision Requests 5000
Denied Decision Requests 0
Total Time 446.462294ms

Benchmark authorization.v2.GetMultiResourceDecision Results:

Metric Value
Approved Decision Requests 5000
Denied Decision Requests 0
Total Time 275.610997ms

Bulk Benchmark Results

Metric Value
Total Decrypts 100
Successful Decrypts 100
Failed Decrypts 0
Total Time 340.204808ms
Throughput 293.94 requests/second

TDF3 Benchmark Results:

Metric Value
Total Requests 5000
Successful Requests 5000
Failed Requests 0
Concurrent Requests 50
Total Time 52.977095581s
Average Latency 527.635249ms
Throughput 94.38 requests/second

NANOTDF Benchmark Results:

Metric Value
Total Requests 5000
Successful Requests 4943
Failed Requests 57
Concurrent Requests 50
Total Time 43.568290891s
Average Latency 430.578756ms
Throughput 113.45 requests/second

Error Summary:

Error Message Occurrences
ReadNanoTDF error: getNanoRewrapKey: rewrapError: internal: internal error
rpc error: code = Internal desc = could not perform access
57 occurrences

Standard Benchmark Metrics Skipped or Failed

github-actions bot commented Jun 2, 2025

Benchmark results

Benchmark authorization.GetDecisions Results:

Metric Value
Approved Decision Requests 5000
Denied Decision Requests 0
Total Time 450.608567ms

Benchmark authorization.v2.GetMultiResourceDecision Results:

Metric Value
Approved Decision Requests 5000
Denied Decision Requests 0
Total Time 275.113554ms

Bulk Benchmark Results

Metric Value
Total Decrypts 100
Successful Decrypts 100
Failed Decrypts 0
Total Time 336.306019ms
Throughput 297.35 requests/second

TDF3 Benchmark Results:

Metric Value
Total Requests 5000
Successful Requests 5000
Failed Requests 0
Concurrent Requests 50
Total Time 54.390681556s
Average Latency 541.286112ms
Throughput 91.93 requests/second

NANOTDF Benchmark Results:

Metric Value
Total Requests 5000
Successful Requests 4943
Failed Requests 57
Concurrent Requests 50
Total Time 45.46605111s
Average Latency 449.563623ms
Throughput 108.72 requests/second

Error Summary:

Error Message Occurrences
ReadNanoTDF error: getNanoRewrapKey: rewrapError: internal: internal error
rpc error: code = Internal desc = could not perform access
57 occurrences

Standard Benchmark Metrics Skipped or Failed

github-actions bot commented Jun 2, 2025

Benchmark results

Benchmark authorization.GetDecisions Results:

Metric Value
Approved Decision Requests 5000
Denied Decision Requests 0
Total Time 459.140692ms

Benchmark authorization.v2.GetMultiResourceDecision Results:

Metric Value
Approved Decision Requests 5000
Denied Decision Requests 0
Total Time 281.209539ms

Bulk Benchmark Results

Metric Value
Total Decrypts 100
Successful Decrypts 100
Failed Decrypts 0
Total Time 346.861038ms
Throughput 288.30 requests/second

TDF3 Benchmark Results:

Metric Value
Total Requests 5000
Successful Requests 5000
Failed Requests 0
Concurrent Requests 50
Total Time 53.992591889s
Average Latency 538.287347ms
Throughput 92.61 requests/second

NANOTDF Benchmark Results:

Metric Value
Total Requests 5000
Successful Requests 4948
Failed Requests 52
Concurrent Requests 50
Total Time 45.743480309s
Average Latency 451.649511ms
Throughput 108.17 requests/second

Error Summary:

Error Message Occurrences
ReadNanoTDF error: getNanoRewrapKey: rewrapError: internal: internal error
rpc error: code = Internal desc = could not perform access
52 occurrences

Standard Benchmark Metrics Skipped or Failed

github-actions bot commented Jun 2, 2025

Benchmark results

Benchmark authorization.GetDecisions Results:

Metric Value
Approved Decision Requests 5000
Denied Decision Requests 0
Total Time 460.079977ms

Benchmark authorization.v2.GetMultiResourceDecision Results:

Metric Value
Approved Decision Requests 5000
Denied Decision Requests 0
Total Time 288.681795ms

Bulk Benchmark Results

Metric Value
Total Decrypts 100
Successful Decrypts 100
Failed Decrypts 0
Total Time 335.786923ms
Throughput 297.81 requests/second

TDF3 Benchmark Results:

Metric Value
Total Requests 5000
Successful Requests 5000
Failed Requests 0
Concurrent Requests 50
Total Time 53.046228291s
Average Latency 527.456774ms
Throughput 94.26 requests/second

NANOTDF Benchmark Results:

Metric Value
Total Requests 5000
Successful Requests 4969
Failed Requests 31
Concurrent Requests 50
Total Time 43.712056477s
Average Latency 432.967303ms
Throughput 113.68 requests/second

Error Summary:

Error Message Occurrences
ReadNanoTDF error: getNanoRewrapKey: rewrapError: internal: internal error
rpc error: code = Internal desc = could not perform access
31 occurrences

Standard Benchmark Metrics Skipped or Failed

jrschumacher requested a review from Copilot June 2, 2025 18:33
Copilot AI left a comment

Pull Request Overview

Adds an in-memory entitlement policy cache with configurable refresh, integrates it into subject mapping and attribute services, and wires up service‐started hooks to initialize the cache.

  • Introduce CacheRefreshIntervalSeconds config and shared EntitlementPolicyCache
  • Implement caching in ListAttributes and ListSubjectMappings when state is active
  • Wire up an on‐services‐started hook through service registry and startup

Reviewed Changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 6 comments.

File Description
service/policy/unsafe/unsafe.go Minor formatting adjustment in NewRegistration
service/policy/subjectmapping/subject_mapping.go Add cache field, on‐start hook, and caching in list method
service/policy/db/subject_mappings.go Add ListAllSubjectMappings helper for cache refresh
service/policy/db/attributes.go Restrict ListAllAttributes to active state
service/policy/config/config.go Add CacheRefreshIntervalSeconds to config
service/policy/config/cache.go New EntitlementPolicyCache implementation
service/policy/attributes/attributes.go Wire cache into attributes service and caching logic
service/pkg/serviceregistry/serviceregistry.go Add OnServicesStarted hook support
service/pkg/server/start.go Invoke on‐services‐started hooks during startup
service/pkg/server/services.go Register on‐services‐started hooks per service
service/pkg/config/config.go Store and run on‐services‐started hooks
opentdf-example.yaml, opentdf-dev.yaml Update example config to show cache option
Comments suppressed due to low confidence (1)

service/policy/subjectmapping/subject_mapping.go:138

  • Add unit and integration tests to cover the cache-enabled code path in ListSubjectMappings, ensuring correct pagination and error handling.
// If caching enabled, return from cache instead of DB

@@ -61,6 +61,7 @@ func NewRegistration(ns string, dbRegister serviceregistry.DBRegister) *servicer
unsafeSvc.logger = logger
unsafeSvc.dbClient = policydb.NewClient(srp.DBClient, logger, int32(cfg.ListRequestLimitMax), int32(cfg.ListRequestLimitDefault))
unsafeSvc.config = cfg

Copilot AI Jun 2, 2025

[nitpick] Remove the extra blank line before the return unsafeSvc, nil to keep the surrounding code block compact.

Suggested change

Copilot uses AI. Check for mistakes.

},
})
if err != nil {
return nil, fmt.Errorf("failed to list all attributes: %w", err)

Copilot AI Jun 2, 2025

The error message refers to "attributes" but this method lists subject mappings; update it to "failed to list all subject mappings" for accuracy.

Suggested change
-return nil, fmt.Errorf("failed to list all attributes: %w", err)
+return nil, fmt.Errorf("failed to list all subject mappings: %w", err)

@@ -198,7 +198,7 @@ func (c PolicyDBClient) ListAllAttributes(ctx context.Context) ([]*policy.Attrib

for {
listed, err := c.ListAttributes(ctx, &attributes.ListAttributesRequest{
-State: common.ActiveStateEnum_ACTIVE_STATE_ENUM_ANY,
+State: common.ActiveStateEnum_ACTIVE_STATE_ENUM_ACTIVE,

Copilot AI Jun 2, 2025

The ListAllAttributes helper is intended to load all states, but filtering to active only contradicts its name. Either rename the method or include all states as originally documented.

Suggested change
-State: common.ActiveStateEnum_ACTIVE_STATE_ENUM_ACTIVE,
+State: common.ActiveStateEnum_ACTIVE_STATE_ENUM_ANY,

BufferItems: bufferItems,
})
if err != nil {
panic(err)

Copilot AI Jun 2, 2025

[nitpick] Avoid using panic for initialization failures; return an error or log the failure to prevent crashing the entire service at startup.

Suggested change
-panic(err)
+l.ErrorContext(ctx, "Failed to initialize ristretto cache", "error", err)
+initErr = err
+return
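The suggested change (recording initErr and returning) implies the initialization runs inside a once-style closure. A self-contained sketch of that pattern with sync.Once follows; the store type and newStore constructor are hypothetical stand-ins for the ristretto cache so the example has no external dependency.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// store is a hypothetical stand-in for the ristretto-backed cache handle.
type store struct{ maxCost int64 }

// newStore mimics a cache constructor that can fail on bad configuration.
func newStore(maxCost int64) (*store, error) {
	if maxCost <= 0 {
		return nil, errors.New("MaxCost must be positive")
	}
	return &store{maxCost: maxCost}, nil
}

var (
	initOnce sync.Once
	shared   *store
	initErr  error
)

// getStore initializes the shared store at most once. A failure is recorded in
// initErr and returned to every caller instead of panicking.
func getStore(maxCost int64) (*store, error) {
	initOnce.Do(func() {
		s, err := newStore(maxCost)
		if err != nil {
			initErr = fmt.Errorf("failed to initialize cache: %w", err)
			return
		}
		shared = s
	})
	return shared, initErr
}

func main() {
	if _, err := getStore(0); err != nil {
		// The caller decides how to degrade, e.g. serve from the DB without caching.
		fmt.Println("continuing without cache:", err)
	}
}
```

Because sync.Once never re-runs its function, a failed initialization is sticky: every later call gets the same error. If retries are wanted, a mutex-guarded lazy initializer that only records success would be needed instead.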

@@ -300,6 +300,11 @@ func Start(f ...StartOptions) error {
}
defer cfg.Close(ctx)

// Run the services started hooks
if err := cfg.RunServicesStartedHooks(ctx); err != nil {

Copilot AI Jun 2, 2025

The RunServicesStartedHooks call appears twice in this function; remove the earlier invocation to avoid running the hooks twice.

@@ -120,6 +135,50 @@ func (s *AttributesService) ListAttributes(ctx context.Context,
state := req.Msg.GetState().String()
s.logger.Debug("listing attribute definitions", slog.String("state", state))

// If active state and caching enabled, return from cache instead of DB

Copilot AI Jun 2, 2025

[nitpick] The caching logic in ListAttributes closely mirrors ListSubjectMappings; consider extracting shared pagination+cache handling into a helper to reduce duplication.
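One possible shape for the shared helper Copilot suggests is a generic function that serves an offset/limit page from the cached slice (Go 1.18+ generics). Page and paginate are hypothetical names. The defaultLimit and maxLimit parameters are intended to play the role of the ListRequestLimitDefault and ListRequestLimitMax settings seen in the unsafe.go hunk; rejecting, rather than clamping, an oversized limit is an assumption, not confirmed behavior of the service.

```go
package main

import "fmt"

// Page is a hypothetical offset/limit page, loosely shaped like a list response.
type Page[T any] struct {
	Items         []T
	CurrentOffset int
	NextOffset    int // 0 means there are no further pages
	Total         int
}

// paginate serves one page from an in-memory list. A zero limit falls back to
// defaultLimit; limits above maxLimit are rejected.
func paginate[T any](all []T, offset, limit, defaultLimit, maxLimit int) (Page[T], error) {
	if offset < 0 || limit < 0 {
		return Page[T]{}, fmt.Errorf("offset and limit must be non-negative")
	}
	if limit == 0 {
		limit = defaultLimit
	}
	if limit > maxLimit {
		return Page[T]{}, fmt.Errorf("limit %d exceeds maximum %d", limit, maxLimit)
	}
	total := len(all)
	if offset > total {
		return Page[T]{}, fmt.Errorf("offset %d is past the end (%d items)", offset, total)
	}
	end := offset + limit
	if end > total {
		end = total
	}
	next := 0
	if end < total {
		next = end
	}
	// Copy so callers cannot mutate the shared cached slice.
	items := append([]T(nil), all[offset:end]...)
	return Page[T]{Items: items, CurrentOffset: offset, NextOffset: next, Total: total}, nil
}

func main() {
	cached := []string{"a", "b", "c", "d", "e"}
	p, err := paginate(cached, 2, 2, 10, 100)
	fmt.Println(p.Items, p.NextOffset, p.Total, err) // [c d] 4 5 <nil>
}
```

Returning a copy costs one allocation per request but prevents a handler from mutating the slice that every other request shares.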

Labels
comp:authorization comp:policy Policy Configuration ( attributes, subject mappings, resource mappings, kas registry) external-contributor External Org Member size/m