Skip to content

Feat/scan url cache #5625

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
Apr 22, 2025
Merged

Feat/scan url cache #5625

merged 18 commits into from
Apr 22, 2025

Conversation

AugmentedMode
Copy link
Contributor

@AugmentedMode AugmentedMode commented Apr 10, 2025

Explanation

The PhishingController's scanUrl method currently makes an API call for every URL scan request, even when we've recently scanned the same URL.

To address this, I've added a caching layer that:

  1. Stores scan results in a persistent cache
  2. Checks the cache before making API calls
  3. Automatically expires cached entries after a configurable TTL (default: 5 minutes)
  4. Manages cache size with an LRU-based eviction policy when it exceeds the maximum size (default: 100 entries)

This should reduce redundant API calls while maintaining security by respecting TTL for cached entries.

References

Checklist

  • I've updated the test suite for new or updated code as appropriate
  • I've updated documentation (JSDoc, Markdown, etc.) for new or updated code as appropriate
  • I've communicated my changes to consumers by updating changelogs for packages I've changed, highlighting breaking changes as necessary
  • I've prepared draft pull requests for clients and consumer packages to resolve any breaking changes

Sorry, something went wrong.

@AugmentedMode AugmentedMode requested a review from a team as a code owner April 10, 2025 04:07
@AugmentedMode AugmentedMode self-assigned this Apr 10, 2025
@@ -233,24 +235,46 @@ export type PhishingControllerState = {
hotlistLastFetched: number;
stalelistLastFetched: number;
c2DomainBlocklistLastFetched: number;
urlScanCache: Record<string, UrlScanCacheEntry>;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be put into a class that is defined in a separate file.

All 5 of your functions (setUrlScanCacheTTL, setUrlScanCacheMaxSize, clearUrlScanCache, getFromUrlScanCache, addToUrlScanCache) should really just be methods of the cache class. The PhishingController should not be responsible for caching logic, which is why I think it makes sense to abstract this. It also organizes the files in a significant way which makes understanding the implementation a lot easier.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done!

@@ -304,6 +328,10 @@ export class PhishingController extends BaseController<

#c2DomainBlocklistRefreshInterval: number;

#urlScanCacheTTL: number;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same benefit here as described below - I think you can remove params like this from the Phishing Controller (and should not be its responsibility to be initialized with it) so that these are scoped to the cache itself

* @param config.messenger - The controller restricted messenger.
* @param config.state - Initial state to set on this controller.
*/
constructor({
stalelistRefreshInterval = STALELIST_REFRESH_INTERVAL,
hotlistRefreshInterval = HOTLIST_REFRESH_INTERVAL,
c2DomainBlocklistRefreshInterval = C2_DOMAIN_BLOCKLIST_REFRESH_INTERVAL,
urlScanCacheTTL = DEFAULT_URL_SCAN_CACHE_TTL,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I've already made my point, but I think this is unnecessary to define the cache values within the constructor here. The Cache is part of the PhishingControllerState so it feels the PhishingController should be in charge of setting those values, rather than letting them be defined within initialization.

Copy link
Contributor

@imblue-dabadee imblue-dabadee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can agree with @mindofmar's suggestions re: cache. I would add that two improvements for the caching logic could include:

  1. tracking whether a url is in-flight and have that request await the promise rather than send out another request
  2. having a more efficient add/remove pattern for the cache e.g. LRU or the cache having two states (the map + the array) so that clients are not constantly reprocessing this array.

@AugmentedMode AugmentedMode requested a review from a team as a code owner April 22, 2025 04:37
imblue-dabadee
imblue-dabadee previously approved these changes Apr 22, 2025
@AugmentedMode AugmentedMode enabled auto-merge (squash) April 22, 2025 14:33
@AugmentedMode AugmentedMode merged commit d91e7b0 into main Apr 22, 2025
202 checks passed
@AugmentedMode AugmentedMode deleted the feat/scan-url-cache branch April 22, 2025 14:46
AugmentedMode added a commit that referenced this pull request Apr 22, 2025
#5688)

### Explanation

The bulk URL scanning functionality in PhishingController previously
didn't leverage the URL scan cache that was already implemented for
single URL scanning. This meant that even if a URL had been recently
scanned, it would be scanned again when included in a bulk scan request,
causing unnecessary API calls and increased response times.

This PR extends the caching functionality to the `bulkScanUrls` method,
allowing it to:
1. Check the cache for each URL before sending API requests
2. Only scan URLs that aren't already in the cache
3. Add newly scanned results to the cache for future use
4. Return a combined response of both cached and newly scanned results

This optimization significantly reduces API calls and improves response
times for bulk scan operations, especially when the same URLs are
frequently scanned.

### References

Related to #5625 (Add URL scan cache functionality)
Extends functionality from #5682 (Add bulk scan functionality)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants