Skip to content

Provide a way to get a read-only snapshot of the FrequencySketch of Cache #313

@tatsuya6502

Description

@tatsuya6502

#311 (comment) by @peter-scholtens

Similar as mentioned in #235, I try to store and read-in my Cache content locally during a re-start of my server. However, this means the expiration state will be reset: the frequency is set to zero and the TTL/TTI maximized. Would it be possible to iterate with Expiration state? So this state information can be stored too.

Instead of providing the read frequency of a key in a Cache, we will provide a read-only snapshot of the FrequencySketch, the historic popularity estimator, of the Cache.

  • This will be preferable because we can preserve all the information in the FrequencyScketch.
    • FrequencySketch tracks the access counts of not only the keys that are currently in the cache, but also all the keys that have been attempted to get from the cache. (frequency is incremented for a key for both cache hit and miss).
    • We do not know what the all keys are as the FrequencySketch does not store the keys.
  • The FrequencySketch is a part of the Cache state and guarded by a mutex.
    • We do not want to expose it directly, which may cause lock contentions.
    • Instead, we will provide a read-only snapshot (clone) of it.

By #314, we can recreate a Cache with a given FrequencySketch snapshot and BuildHasher.

For the FrequencySketch snapshot, we will provide the following methods and functions:

  • Get an estimated frequency (u8) of a given key:
    • The stored frequency is unsigned 4-bit integer.
  • Convert a FrequencySketch into a serialized form Box<[u8]>.
    • The Box<[u8]> should contain all the information of the FrequencySketch snapshot, plus:
      • A hard-coded UUIDv4 as the serialize format version identifier.
      • A check sum, which is the hash value of the serialized data calculated by the hasher of the cache.
  • Recreate it from a serialized form.
    • Verify the followings:
      • The hard-coded UUIDv4.
    • Provide validate method, which takes a hasher and verify the check sum.
      • CacheBuilder will call it when building a CacheLoader with this frequency sketch snapshot and a BuildHasher.
use ahash::RandomState;
use moka::sync::Cache;

fn generate_high_qualyty_seeds() -> [u64; 4] {
    // ...
}

let seeds = generate_high_quality_seeds();
let build_hasher = RandomState::with_seeds(seeds[0], seeds[1], seeds[2], seeds[3]);

let cache = Cache::builder()
    .max_capacity(10_000)
    .build_with_hasher(build_hasher);

// Force the cache to enable the frequency sketch because it
// should be disabled when the cache is less than 50% full.
//
// NOTE: If the max capacity of the cache is not set, this will
// be no-op.
cache.policy().enable_frequency_sketch();

cache.get(&0); // => frequency 1
cache.insert(0, 'a');
cache.get(&0); // => frequency 2

// Get a read only snapshot of the frequency sketch.
let sketch = cache.policy().frequency_sketch().unwrap();

sketch.estimated_frequency(&0); // => maybe 2u8
sketch.estimated_frequency(&1); // => maybe 0u8

let serialized = sketch.into_bytes(); // Box<[u8]>

// Rebuild the cache with the serialized frequency sketch.

let deserialized = FrequencySketch::from_bytes(serialized);
// Ensure to use the exact same hasher, because the sketch
// will return correct estimates only when the correct hasher
// is used.
let build_hasher = RandomState::with_seeds(seeds[0], seeds[1], seeds[2], seeds[3]);
let cache = Cache::builder()
    // See https://github.com/moka-rs/moka/issues/314
    .loader_with_frequency_sketch(deserialized, build_hasher)
    .unwrap()
    .finish();

// The cache is empty but it should have the same frequency sketch
// as the previous one.
assert_eq!(cache.entry_count(), 0);
let sketch = cache.policy().frequency_sketch().unwrap();
sketch.estimated_frequency(&0); // => maybe 2u8
sketch.estimated_frequency(&1); // => maybe 0u8

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions