-
Notifications
You must be signed in to change notification settings - Fork 18
feat(policy): policy should maintain a cache of entitlement policy #2327
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @jakedoublev, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
Gemini here, providing a summary of this pull request. This PR introduces caching for entitlement policy data (attributes and subject mappings) within the Authorization v2 service. The primary goal is to improve performance by reducing the need for the Authorization service to fetch this data from the Policy service on every decision request. A new EntitlementPolicyCache
component is added which periodically fetches and stores this data in memory. The Authorization v2 service is updated to utilize this cache via a new constructor for the JustInTimePDP
. A new service registration hook mechanism is implemented to ensure the cache is initialized and started only after all necessary services (like Policy) are registered and available. A configuration option is added to control the cache refresh interval.
Highlights
- Entitlement Policy Caching: Introduces an in-memory cache for policy entitlement data (attributes and subject mappings) within the Authorization v2 service.
- New Cache Component: Adds a new
EntitlementPolicyCache
component responsible for fetching and storing policy data periodically. - Authorization Service Integration: Modifies the Authorization v2 service to use the new cache when making policy decisions, replacing direct fetches for attributes and subject mappings.
- Service Registration Hook: Implements a new hook mechanism (
OnCompleteServiceRegistration
) in the service registry and config handling to allow services to perform actions (like starting the cache) after all services are registered. - Configurable Cache Refresh: Adds a configuration option (
cache_refresh_interval_seconds
) to control how often the entitlement policy cache is refreshed.
Changelog
Click here to see the changelog
- opentdf-dev.yaml
- Uncommented and enabled the
policy
service configuration. - Added
cache_refresh_interval_seconds: 60
to the policy configuration.
- Uncommented and enabled the
- opentdf-example.yaml
- Added a commented-out example for
cache_refresh_interval_seconds: 300
to the policy configuration.
- Added a commented-out example for
- service/authorization/v2/authorization.go
- Imported the new
policy/cache
package. - Added a
pdp
field of type*access.JustInTimePDP
to theService
struct (line 27). - Added
OnCompleteServiceRegistration
function (lines 34-54) to create and start the policy cache and initialize theJustInTimePDP
using the cache. - Added the
OnCompleteServiceRegistration
hook to theServiceOptions
(line 65). - Removed the creation of a new
JustInTimePDP
withinGetEntitlements
(lines 82-87 removed, now usesas.pdp
on line 109). - Removed the creation of a new
JustInTimePDP
withinGetDecision
(lines 114-118 removed, now usesas.pdp
on line 139). - Removed the creation of a new
JustInTimePDP
withinGetDecisionMultiResource
(lines 149-153 removed, now usesas.pdp
on line 169). - Changed the error return in
GetDecisionMultiResource
fromconnect.NewError(connect.CodeInternal, err)
to justerr
(line 179).
- Imported the new
- service/internal/access/v2/just_in_time_pdp.go
- Imported the new
policy/cache
package. - Added a
policyCache
field of type*policyCache.EntitlementPolicyCache
to theJustInTimePDP
struct (line 32). - Added
NewJustInTimePDPWithCachedEntitlementPolicy
function (lines 76-128) which accepts a cache and uses it to fetch policy data before creating the embeddedPolicyDecisionPoint
.
- Imported the new
- service/pkg/config/config.go
- Added
ServiceRegistrationCompleteHook
type (line 18). - Added
onServiceRegistrationCompleteHooks
slice to theConfig
struct (line 54). - Added
AddOnServiceRegistrationCompleteHook
method (lines 118-121) to register hooks. - Added
RunRegistrationCompleteHooks
method (lines 136-146) to execute registered hooks.
- Added
- service/pkg/server/services.go
- Added a call to
svc.RegisterOnCompleteServiceRegistrationHook
during service registration (lines 195-197).
- Added a call to
- service/pkg/server/start.go
- Added a call to
cfg.RunRegistrationCompleteHooks(ctx)
after service registration is complete and before the server starts (lines 316-319).
- Added a call to
- service/policy/cache/cache.go
- Added a new file
cache.go
implementing theEntitlementPolicyCache
struct and its methods (NewEntitlementPolicyCache
,Start
,Stop
,periodicRefresh
,Refresh
,GetAttributes
,GetSubjectMappings
,fetchAllDefinitions
,fetchAllSubjectMappings
).
- Added a new file
- service/policy/config/config.go
- Added
CacheRefreshIntervalSeconds
field to theConfig
struct with a default of 300 (line 24). - Added a global
configuredRefreshInterval
variable (line 14). - Updated
GetSharedPolicyConfig
to readCacheRefreshIntervalSeconds
and update the globalconfiguredRefreshInterval
(line 52). - Added
GetPolicyEntitlementCacheRefreshInterval
function (lines 59-61) to expose the configured interval.
- Added
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command>
or @gemini-code-assist <command>
. Below is a summary of the supported commands.
Feature | Command | Description |
---|---|---|
Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/
folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
A cache for policy,
Fetched from the service, you see.
Decisions are fast,
The performance will last,
From calls to the Policy, set free.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a caching mechanism for entitlement policies (attributes and subject mappings) within the Authorization v2 service. The goal is to improve performance by reducing repeated fetches of policy data. The core changes involve:
- A new
EntitlementPolicyCache
inservice/policy/cache/
for storing and periodically refreshing policy data. - Modifications to the service lifecycle and registration (
service/pkg/config
,service/pkg/server
,service/pkg/serviceregistry
) to initialize this cache after all services are registered but before the server starts handling requests. - Updates to
service/authorization/v2/authorization.go
andservice/internal/access/v2/just_in_time_pdp.go
to utilize this cached policy data.
The overall approach seems sound and addresses a potential performance bottleneck. The introduction of a lifecycle hook for post-registration, pre-start initialization is a good pattern for this kind of setup. Configuration for the cache refresh interval has also been added.
However, there are a few areas that could be improved, particularly around error propagation and potential data mutability in the cache. A significant concern is the lack of unit tests for the new caching logic, which is critical for ensuring its correctness and reliability.
Summary of Findings
- Missing Unit Tests: The pull request introduces significant new functionality with the
EntitlementPolicyCache
and its integration. However, the checklist in the PR description indicates that unit tests have not yet been added or updated. Comprehensive unit tests are crucial for this caching logic to ensure its correctness, especially regarding concurrency, refresh behavior, and edge cases. It's highly recommended to add these tests. - Error Propagation in Authorization Service: In
service/authorization/v2/authorization.go
, a change in error handling forrollupMultiResourceDecision
might lead to losing gRPC-specific error codes, potentially affecting client-side error interpretation. - Cache Data Mutability: The
GetAttributes
andGetSubjectMappings
methods inservice/policy/cache/cache.go
return direct slices to the cached data. This could allow callers to inadvertently modify the cache's internal state. Returning copies would enhance data integrity. - Configuration Management for Cache Interval: The
service/policy/config/config.go
uses a package-level variable (configuredRefreshInterval
) to store the cache refresh interval. This variable is updated byGetSharedPolicyConfig
. While functional, this introduces a form of global state. This was not commented on directly due to review settings but is noted for awareness.
Merge Readiness
This pull request makes a valuable contribution by introducing caching for entitlement policies, which should enhance performance. The core logic and integration into the service lifecycle are well-thought-out.
However, before this PR is merged, I strongly recommend addressing the following:
- Add Comprehensive Unit Tests: The new caching mechanism (
EntitlementPolicyCache
) and its integration points need thorough unit testing to ensure reliability and correctness under various conditions (e.g., concurrent access, refresh failures, empty policy data). - Review Error Propagation: The change in error handling in
service/authorization/v2/authorization.go
should be revisited to ensure gRPC error codes are consistently propagated to clients. - Consider Cache Data Immutability: The potential for callers to modify cached data via direct slice returns in
service/policy/cache/cache.go
should be addressed, perhaps by returning copies.
Due to the high
severity concern regarding the lack of unit tests and the medium
severity issues identified, I am requesting changes. Addressing these points will significantly improve the robustness and maintainability of this feature. I am not authorized to approve pull requests, so further review and approval by other maintainers will be necessary after these changes are considered.
Benchmark results, click to expandBenchmark authorization.GetDecisions Results:
Benchmark authorization.v2.GetMultiResourceDecision Results:
Bulk Benchmark Results
TDF3 Benchmark Results:
NANOTDF Benchmark Results:
Error Summary:
Standard Benchmark Metrics Skipped or Failed |
Benchmark results, click to expandBenchmark authorization.GetDecisions Results:
Benchmark authorization.v2.GetMultiResourceDecision Results:
Bulk Benchmark Results
TDF3 Benchmark Results:
NANOTDF Benchmark Results:
Error Summary:
Standard Benchmark Metrics Skipped or Failed |
Benchmark results, click to expandBenchmark authorization.GetDecisions Results:
Benchmark authorization.v2.GetMultiResourceDecision Results:
Bulk Benchmark Results
TDF3 Benchmark Results:
NANOTDF Benchmark Results:
Error Summary:
Standard Benchmark Metrics Skipped or Failed |
Benchmark results, click to expandBenchmark authorization.GetDecisions Results:
Benchmark authorization.v2.GetMultiResourceDecision Results:
Bulk Benchmark Results
TDF3 Benchmark Results:
NANOTDF Benchmark Results:
Error Summary:
Standard Benchmark Metrics Skipped or Failed |
Benchmark results, click to expandBenchmark authorization.GetDecisions Results:
Benchmark authorization.v2.GetMultiResourceDecision Results:
Bulk Benchmark Results
TDF3 Benchmark Results:
NANOTDF Benchmark Results:
Error Summary:
Standard Benchmark Metrics Skipped or Failed |
Benchmark results, click to expandBenchmark authorization.GetDecisions Results:
Benchmark authorization.v2.GetMultiResourceDecision Results:
Bulk Benchmark Results
TDF3 Benchmark Results:
NANOTDF Benchmark Results:
Error Summary:
Standard Benchmark Metrics Skipped or Failed |
Benchmark results, click to expandBenchmark authorization.GetDecisions Results:
Benchmark authorization.v2.GetMultiResourceDecision Results:
Bulk Benchmark Results
TDF3 Benchmark Results:
NANOTDF Benchmark Results:
Error Summary:
Standard Benchmark Metrics Skipped or Failed |
Benchmark results, click to expandBenchmark authorization.GetDecisions Results:
Benchmark authorization.v2.GetMultiResourceDecision Results:
Bulk Benchmark Results
TDF3 Benchmark Results:
NANOTDF Benchmark Results:
Error Summary:
Standard Benchmark Metrics Skipped or Failed |
Moving back to draft until the |
/gemini could any of these changes result or cause a memory leak? |
That's a great question, and it's important to consider potential resource leaks when introducing caching mechanisms. The changes in this pull request introduce an in-memory cache for entitlement policies (attributes and subject mappings) within the policy service. This cache is designed to periodically refresh its data from the database and also refreshes whenever a policy mutation (like creating or deactivating an attribute/subject mapping) occurs. Here's a breakdown of potential memory leak considerations:
Recommendation: The most direct potential leak is the goroutine leak. To mitigate this, I recommend ensuring that the Regarding the cached data and frequent refreshes, the current approach of refreshing the entire cache on any mutation might be inefficient for large datasets and high mutation rates. Depending on performance requirements and expected data size/mutation frequency, you might consider alternative cache invalidation strategies (e.g., invalidating specific entries on mutation rather than a full refresh) or exploring different caching libraries that offer more sophisticated memory management or eviction policies. Overall, the core cache logic seems reasonable for bounded memory usage related to the data itself, but the goroutine shutdown needs to be handled explicitly. |
Benchmark results, click to expandBenchmark authorization.GetDecisions Results:
Benchmark authorization.v2.GetMultiResourceDecision Results:
Bulk Benchmark Results
TDF3 Benchmark Results:
NANOTDF Benchmark Results:
Error Summary:
Standard Benchmark Metrics Skipped or Failed |
Benchmark results, click to expandBenchmark authorization.GetDecisions Results:
Benchmark authorization.v2.GetMultiResourceDecision Results:
Bulk Benchmark Results
TDF3 Benchmark Results:
NANOTDF Benchmark Results:
Error Summary:
Standard Benchmark Metrics Skipped or Failed |
Benchmark results, click to expandBenchmark authorization.GetDecisions Results:
Benchmark authorization.v2.GetMultiResourceDecision Results:
Bulk Benchmark Results
TDF3 Benchmark Results:
NANOTDF Benchmark Results:
Error Summary:
Standard Benchmark Metrics Skipped or Failed |
Benchmark results, click to expandBenchmark authorization.GetDecisions Results:
Benchmark authorization.v2.GetMultiResourceDecision Results:
Bulk Benchmark Results
TDF3 Benchmark Results:
NANOTDF Benchmark Results:
Error Summary:
Standard Benchmark Metrics Skipped or Failed |
Benchmark results, click to expandBenchmark authorization.GetDecisions Results:
Benchmark authorization.v2.GetMultiResourceDecision Results:
Bulk Benchmark Results
TDF3 Benchmark Results:
NANOTDF Benchmark Results:
Error Summary:
Standard Benchmark Metrics Skipped or Failed |
Benchmark results, click to expandBenchmark authorization.GetDecisions Results:
Benchmark authorization.v2.GetMultiResourceDecision Results:
Bulk Benchmark Results
TDF3 Benchmark Results:
NANOTDF Benchmark Results:
Error Summary:
Standard Benchmark Metrics Skipped or Failed |
Benchmark results, click to expandBenchmark authorization.GetDecisions Results:
Benchmark authorization.v2.GetMultiResourceDecision Results:
Bulk Benchmark Results
TDF3 Benchmark Results:
NANOTDF Benchmark Results:
Error Summary:
Standard Benchmark Metrics Skipped or Failed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Adds an in-memory entitlement policy cache with configurable refresh, integrates it into subject mapping and attribute services, and wires up service‐started hooks to initialize the cache.
- Introduce
CacheRefreshIntervalSeconds
config and sharedEntitlementPolicyCache
- Implement caching in
ListAttributes
andListSubjectMappings
when state is active - Wire up an on‐services‐started hook through service registry and startup
Reviewed Changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 6 comments.
Show a summary per file
File | Description |
---|---|
service/policy/unsafe/unsafe.go | Minor formatting adjustment in NewRegistration |
service/policy/subjectmapping/subject_mapping.go | Add cache field, on‐start hook, and caching in list method |
service/policy/db/subject_mappings.go | Add ListAllSubjectMappings helper for cache refresh |
service/policy/db/attributes.go | Restrict ListAllAttributes to active state |
service/policy/config/config.go | Add CacheRefreshIntervalSeconds to config |
service/policy/config/cache.go | New EntitlementPolicyCache implementation |
service/policy/attributes/attributes.go | Wire cache into attributes service and caching logic |
service/pkg/serviceregistry/serviceregistry.go | Add OnServicesStarted hook support |
service/pkg/server/start.go | Invoke on‐services‐started hooks during startup |
service/pkg/server/services.go | Register on‐services‐started hooks per service |
service/pkg/config/config.go | Store and run on‐services‐started hooks |
opentdf-example.yaml, opentdf-dev.yaml | Update example config to show cache option |
Comments suppressed due to low confidence (1)
service/policy/subjectmapping/subject_mapping.go:138
- Add unit and integration tests to cover the cache-enabled code path in
ListSubjectMappings
, ensuring correct pagination and error handling.
// If caching enabled, return from cache instead of DB
@@ -61,6 +61,7 @@ func NewRegistration(ns string, dbRegister serviceregistry.DBRegister) *servicer | |||
unsafeSvc.logger = logger | |||
unsafeSvc.dbClient = policydb.NewClient(srp.DBClient, logger, int32(cfg.ListRequestLimitMax), int32(cfg.ListRequestLimitDefault)) | |||
unsafeSvc.config = cfg | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Remove the extra blank line before the return unsafeSvc, nil
to keep the surrounding code block compact.
Copilot uses AI. Check for mistakes.
}, | ||
}) | ||
if err != nil { | ||
return nil, fmt.Errorf("failed to list all attributes: %w", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message refers to "attributes" but this method lists subject mappings; update it to "failed to list all subject mappings" for accuracy.
return nil, fmt.Errorf("failed to list all attributes: %w", err) | |
return nil, fmt.Errorf("failed to list all subject mappings: %w", err) |
Copilot uses AI. Check for mistakes.
@@ -198,7 +198,7 @@ func (c PolicyDBClient) ListAllAttributes(ctx context.Context) ([]*policy.Attrib | |||
|
|||
for { | |||
listed, err := c.ListAttributes(ctx, &attributes.ListAttributesRequest{ | |||
State: common.ActiveStateEnum_ACTIVE_STATE_ENUM_ANY, | |||
State: common.ActiveStateEnum_ACTIVE_STATE_ENUM_ACTIVE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ListAllAttributes
helper is intended to load all states, but filtering to active only contradicts its name. Either rename the method or include all states as originally documented.
State: common.ActiveStateEnum_ACTIVE_STATE_ENUM_ACTIVE, | |
State: common.ActiveStateEnum_ACTIVE_STATE_ENUM_ANY, |
Copilot uses AI. Check for mistakes.
BufferItems: bufferItems, | ||
}) | ||
if err != nil { | ||
panic(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] Avoid using panic
for initialization failures; return an error or log the failure to prevent crashing the entire service at startup.
panic(err) | |
l.ErrorContext(ctx, "Failed to initialize ristretto cache", "error", err) | |
initErr = err | |
return |
Copilot uses AI. Check for mistakes.
@@ -300,6 +300,11 @@ func Start(f ...StartOptions) error { | |||
} | |||
defer cfg.Close(ctx) | |||
|
|||
// Run the services started hooks | |||
if err := cfg.RunServicesStartedHooks(ctx); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The RunServicesStartedHooks
call appears twice in this function; remove the earlier invocation to avoid running the hooks twice.
Copilot uses AI. Check for mistakes.
@@ -120,6 +135,50 @@ func (s *AttributesService) ListAttributes(ctx context.Context, | |||
state := req.Msg.GetState().String() | |||
s.logger.Debug("listing attribute definitions", slog.String("state", state)) | |||
|
|||
// If active state and caching enabled, return from cache instead of DB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The caching logic in ListAttributes
closely mirrors ListSubjectMappings
; consider extracting shared pagination+cache handling into a helper to reduce duplication.
Copilot uses AI. Check for mistakes.
Proposed Changes
Checklist
Testing Instructions