A priority cache with a (better?) api #171
Conversation
@@ -0,0 +1,84 @@
## What's up

So far, we've used the components in this library to successfully tackle some visualization problems, and we have a small collection of examples of how to render different types of data (DZI, OMEZARR, scatterplots, etc.). The problems so far all tend to have the following characteristics:
maybe start here?
high: Set<string>
notify: undefined | ((cacheKey: string) => void)
}
export class FancySharedCache { |
here is the shared cache - I should rename it... maybe just SharedPriorityCache?
renamed
return pri - (high.has(key) ? 2 : (low.has(key) ? 1 : 0))
}
const setPriorities = (low: Set<Key>, high: Set<Key>) => {
const newLow = new Set<string>() |
here is the meat of the new approach - every time a client updates its priorities, the system has to re-balance the heap of all priorities. This also has a bug due to how Set() deals with equality, and thus membership
todo: rewrite and test more
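(For context on that membership bug, a minimal sketch of the pitfall — illustrative, not the PR's code:)

```ts
// Set uses reference equality (SameValueZero) for objects, so two structurally
// identical keys are distinct members - priority updates keyed on freshly-built
// objects would never match the entries already in the set.
const a = { tile: 'x/0/0', layer: 0 };
const b = { tile: 'x/0/0', layer: 0 };
const pending = new Set([a]);
console.log(pending.has(b)); // false - b is a different reference

// one common workaround: derive a stable string key from the object
const byKey = new Set([JSON.stringify(a)]);
console.log(byKey.has(JSON.stringify(b))); // true
```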
I've re-written this - now I'm going to try to write some tests for it
// they expect that the things coming out of the cache are the type they expect (what they put in it)
// this is not strictly true, as the cache is shared, and other clients may use different types
// also also - there will not be a 1:1 relation between items and
type CacheInterface<Key, Value extends Record<string, Resource>> = {
todo: rename generic parameter Key here - like our current cache, we let the users of the cache deal in objects which can require multiple, distinct entries in the cache itself.
renamed
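(To illustrate the one-key-to-many-entries relationship described above — the names here are hypothetical, not the PR's actual API:)

```ts
// A client-level key can expand into several distinct cache entries,
// e.g. one texture per channel of a tile.
type TileKey = { tile: string; channels: string[] };

function cacheKeysFor(key: TileKey): string[] {
    return key.channels.map((channel) => `${key.tile}/${channel}`);
}

console.log(cacheKeysFor({ tile: 'z0/x3/y7', channels: ['dapi', 'gfp'] }));
// -> ['z0/x3/y7/dapi', 'z0/x3/y7/gfp']
```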
});
return {
render: (target: REGL.Framebuffer2D | null, dataset: OmeZarrMetadata, settings: RenderSettings) => {
const items = renderer.getVisibleItems(dataset, settings);
here is one way we might use the new cache -
- get the items we want to draw
return {
render: (target: REGL.Framebuffer2D | null, dataset: OmeZarrMetadata, settings: RenderSettings) => {
const items = renderer.getVisibleItems(dataset, settings);
const baselayer = renderer.getVisibleItems(dataset, {
- get the lowest resolution data for the same view
camera: { ...settings.camera, screenSize: [1, 1] },
});
regl.clear({ framebuffer: target, color: [0, 0, 0, 1], depth: 1 });
client.setPriorities(
mark each set as low or high priority
new Set(items.map((tile) => ({ tile, dataset, settings }))),
new Set(baselayer.map((tile) => ({ tile, dataset, settings })))
);
for (const tile of [...baselayer, ...items]) {
draw everything, in order, if we have it
Seeing no major issues with the approach taken here! I like it and think it will make our life easier going forward as we do more complex rendering approaches.
new Set(items.map((tile) => ({ tile, dataset, settings }))),
new Set(baselayer.map((tile) => ({ tile, dataset, settings })))
);
for (const tile of [...baselayer, ...items]) {
THIS is where it clicked for me.
With this sort of flexibility we should be able to get rid of the current major pixelation on zoom because we can always make sure to draw the existing zoomed out layer first while the others are loading, correct?
yup - flexibility is exactly the goal - you're still stuck if you don't have what you want in the cache, but other than that you can do whatever you want - you could draw dots only if you have the pixels that go with them, you could draw some, clear the depth buffer, go get coffee, then come back and render the dots - whatever you can think of!
I think my current plan (for that pixelation issue) would be to keep a list of tiles drawn recently - and use them to fill in holes while we wait for new stuff to load. In addition, this makes it easy to pre-fetch things - we could pre-fetch slices N-1 and N+1 simply by listing them as low-priority (and then simply not drawing them!)
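(A hypothetical sketch of that pre-fetch idea, reusing the shapes from the render snippet above — `sliceIndex` is an assumed settings field, not something shown in this PR:)

```ts
// Request neighboring slices at low priority so the cache warms them,
// without drawing them this frame. The { tile, dataset, settings } key shape
// matches the snippets in this PR; sliceIndex is an assumption.
const current = renderer.getVisibleItems(dataset, settings);
const prev = renderer.getVisibleItems(dataset, { ...settings, sliceIndex: settings.sliceIndex - 1 });
const next = renderer.getVisibleItems(dataset, { ...settings, sliceIndex: settings.sliceIndex + 1 });
client.setPriorities(
    new Set([...prev, ...next].map((tile) => ({ tile, dataset, settings }))), // low: prefetch only
    new Set(current.map((tile) => ({ tile, dataset, settings }))) // high: drawn this frame
);
```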
@@ -225,18 +226,20 @@ export class AsyncDataCache<SemanticKey extends RecordKey, CacheKey extends Reco
return resolvedCacheData;
}
} else {
promise = getter();
const abort = new AbortController();
I don't think AbortControllers are all that expensive, but this will be called often.
Should be fine, but if they are expensive, that could be an issue!
they have not seemed to be expensive in my testing so far - this change does use a lot more of them though - the old system could only cancel on a per-item basis (one Item -> many cache entries), but for the fine-grained sharing and prioritization, using an abort signal per cache entry made more sense
also also - even though we supported aborting fetches in the previous cache, it often wasn't used, due to its poor behavior. You'd cancel a frame, that would abort a fetch for item X, and then the very next frame would fire up a fetch for item X :( ... that should not happen under this cache system!
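(A minimal sketch of the per-entry cancellation shape being described — the names here are illustrative, not the PR's exact implementation:)

```ts
// Each cache entry owns its own AbortController, so cancelling one entry's
// fetch never disturbs any other entry - and a key that is still wanted
// next frame is simply never aborted in the first place.
const pendingFetches = new Map<string, AbortController>();

function beginFetch(key: string, fetcher: (signal: AbortSignal) => Promise<unknown>) {
    const abort = new AbortController();
    pendingFetches.set(key, abort);
    fetcher(abort.signal).finally(() => pendingFetches.delete(key));
}

function cancel(key: string) {
    pendingFetches.get(key)?.abort(); // only this entry's request is cancelled
}
```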
Good arguments - they set the stage well for why this change makes sense, so the PR review has been much easier. Thanks!
I think this could be moved into the /vis/packages/core/ documentation site with a couple of small tweaks and be great vis library developer documentation.
This duplicates the existing `render-server-provder.tsx`, so that's a good sign that this proposal doesn't result in sweeping breaking API changes 😁
Looking awesome! A few thoughts for ya to look over, but I'm definitely excited to see this in action!
if (this.store.has(key)) {
return false;
}
const size = item.sizeInBytes?.() ?? 1;
Would it make sense to use a larger number as the default here, or perhaps fail the `put` entirely if no size could be calculated for the item? Seems like having it default to `1` (which I assume means 1 byte in this case) could be risky from a memory management perspective 🤔
hmmm - so because the items in the cache come from users of the cache, we can't figure out their size for realsies - we have to ask the client to tell us the size. They could of course lie and just return 1, or zero, or -50.
- we should certainly sanitize the value that comes out of this to prevent negative or non-finite values.
- I think there are things you could reasonably put in the cache that you don't really want to count against a memory limit - regular js objects that will get GC'd normally, for example. This cache isn't really for that, but I still think it's a valid use case.
- I guess I'd prefer a more accepting api (with perhaps heavy sanitization) over cache puts that can fail...
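(A sketch of the sanitization mentioned in the first bullet — an assumed helper, matching the `sizeInBytes?.() ?? 1` call in the diff:)

```ts
// Clamp client-reported sizes instead of failing the put:
// negative or non-finite values fall back to safe defaults.
function sanitizedSize(item: { sizeInBytes?: () => number }): number {
    const raw = item.sizeInBytes?.() ?? 1;
    return Number.isFinite(raw) ? Math.max(0, raw) : 1;
}
```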
Hmm, okay, makes sense 👍🏽 My gut's telling me that this is potentially going to end up being a site of the classic tradeoff between flexibility and performance, but since we're not at that point yet I see no need to worry overly much about it now 🙂 And certainly better to start with flexibility before optimizing!
}
const size = item.sizeInBytes?.() ?? 1;
if (this.used + size > this.limit) {
this.evictUntil(Math.max(0, this.limit - size));
Likewise, would it maybe be a good idea to evict a little more than was strictly needed, if we assume that puts will happen with significant frequency? Having to go through the eviction process more frequently feels like it might be more costly.
evicting more than needed - I see what you mean, but it tends to work out as the equivalent of lowering the cache limit, which is confusing, right? if the limit is the limit, it's better to just use the limit :)
as far as the cost of eviction - in terms of overhead, it's very low. in terms of thrashiness, who's to say the thing you aggressively evicted won't be needed the very next frame? caches with smaller limits will churn more than caches with higher limits (the risk being of course that you actually do run out of memory)
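(A rough sketch of the "the limit is the limit" eviction being argued for here — the entry shape and lowest-priority-first ordering are assumptions, not the PR's exact code:)

```ts
type Entry = { key: string; size: number; priority: number };

// Free just enough bytes to get under the target - no aggressive over-eviction.
// Lowest-priority entries go first; returns the new used-byte count.
function evictUntil(store: Map<string, Entry>, usedBytes: number, targetBytes: number): number {
    const victims = [...store.values()].sort((a, b) => a.priority - b.priority);
    for (const victim of victims) {
        if (usedBytes <= targetBytes) break;
        store.delete(victim.key);
        usedBytes -= victim.size;
    }
    return usedBytes;
}
```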
think of it this way also - yes, we would like put() to happen frequently (that means data is arriving, and yes, that means going through the eviction logic repeatedly if the cache is nearly full). The secret we know, though, is that puts only happen at the end of a fetch(s3://huge-pile-of-data.zarr)! that means they're actually super duper rare. even though the upper levels of the system know that 50 puts are on the way, each one takes a lifetime. Wouldn't it be better to be able to have data (low-priority data, granted) in the cache to serve up while we wait for each of those 50 `fetch().then(put)`s? If we delete 50 entries from the cache up front, when we know we want to put 50 entries in, there's a large period of time where cache misses are more likely.
I wrote some tests - over 1 million puts and 999,000 evictions, the cost of a single put (which can and does call evict - that time is included in the count) is, on average, less than 10 microseconds on my computer.
All very good points! I would argue that since all caches have to operate on a heuristic (no crystal balls here!), the most efficient management of the cache would probably be based on recent history -- like if there were a lot of evictions happening in the last, say, 100ms, then it isn't unreasonable to assume that there will be a lot more evictions happening in the near future, so evicting a larger number of entries at once to reduce the (admittedly slight) overhead would potentially be faster overall.
That said, you make a good point about how long it takes (from the system's perspective) for the fetches to complete before the eviction is needed. Better to have the data available for re-renderings for as long as possible. Arguably, evicting relatively rarely-used or low-priority items shouldn't be a major issue (assuming the users have prioritized their chunks correctly), since that's a part of what that prioritization represents, at least by my read of your design.
With all that said, the numbers speak for themselves 🙂 Seeing as evictions appear to be very minimal in their performance impact, there's very little to worry about!
registerClient<Key, Value extends Record<string, Resource>>(
spec: ClientSpec<Key, Value>,
): CacheInterface<Key, Value> {
const id = uniqueId('client');
Pondering this, I'm wondering if -- given the complexity and significance of the closure being produced within `registerClient` -- it might make sense to instead have Clients be another type of class? I'm thinking in terms of both code structure/extensibility and memory management -- closures, as a pattern, at least as I understand them, do have some memory management implications that bear keeping an eye on.
That said, I do see that the pattern in use here involves the returned Client object having access to various private methods and objects within the `SharedPriorityCache`, and I'd much rather a closure-based approach than something involving lots of weird back-and-forth between different classes and unnecessary exposure of private fields/methods.
yeah - weird back-and-forth between actually independent objects (like ye olde friend classes) is messy and error-prone. I'd like to avoid that. I see what you mean, but I think the ad-hoc object we return here is a good thing - in a way it's just an extension of the shared cache, not really independent at all, right?
as for memory management - closures follow the same rules as classes, but it's harder (for humans, not for V8) to track the lifetimes of things in lexical scopes vs. where it's spelled out in a class. I really don't think you'd see any GC / perf improvement with a class here.
as for what is closed over by these functions: it's the cache itself, `const id`, and the functions themselves. not really much to worry about to my eye.
so:
- thanks for the careful thoughts! All of these are good points worth gnawing on!
- the main feeling I'm getting from your comments is concern about memory and performance. That is valid - perhaps we could set up some perf tests to observe behavior in a few of these scenarios?
Good thing I recently added #172 😁
I have no experience with these tools though so I don't have any concrete recommendations, but I think getting some performance testing set up for the repo will be an excellent long-term decision
@froyo-np Yep, that's a fair read! Looking at how this fits into the longer-term picture, my sense is that memory and performance are going to be sticking points for folks who are inclined to compare Vis to NG (alongside rendering accuracy, but I consider that a completely separate concern, at least right now). It may not be front of mind for everyone, but when it comes down to it, people judge a product by how seamless it appears to be, and loading performance is one of the most visible things that people key off of for that.
And yes, performance tests would be great! 😀
… which could easily be busted when given items that were == but not ===
…Institute/vis into noah/shared-priority-cache
…pattern in a shared data context
Alright, a variety of comments. Some are minor items to fix, some are questions, some are me being happy to see cool stuff, and one is how to get them all rendering on the first page load!
this.items = new Map();
}

addItem(item: T) {
Just spitballing: would it be useful to return whether the item was added or not? At the moment, in the way it's used, that's not a useful property for this to have, though.
hmmm yeah, happy to make the change if we see a good reason for it - but let's leave it be for now
};
type ObjectValue<T extends Record<string, any>> = T extends Record<string, infer Value> ? Value : never;

type KV<T extends Record<string, any>> = readonly [keyof T, ObjectValue<T>];
Couple of `any`s here that would be nice to resolve. TypeScript and the linter seemed perfectly happy if they're `unknown`s instead!
so since the `any`s are in the type constraint, they don't end up poisoning everything... - I'm certain I used unknowns to start... I'll give it a shot anyway
unknowns work great - not sure what was up when I switched away from them
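(For reference, a sketch of the `unknown` version the thread lands on — same types as the diff above with the constraint swapped:)

```ts
// The conditional inference still works, and nothing gets poisoned by `any`.
type ObjectValue<T extends Record<string, unknown>> = T extends Record<string, infer Value> ? Value : never;
type KV<T extends Record<string, unknown>> = readonly [keyof T, ObjectValue<T>];

type V = ObjectValue<{ a: string; b: number }>; // string | number
type K = KV<{ a: string; b: number }>; // readonly ['a' | 'b', string | number]
```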
{ timeout: 10000 },
);
test(
'enqueue with instant fetching - overall speed',
This one fails intermittently for me. I've been getting between 8500ms and 9100ms. Maybe we bump up the time a little bit more?
regarding the timings and expectations of these - they are completely dependent on the machine on which you run them, and the circumstances of that machine! they're easy to make, and they can help track down or think through bottlenecks... but they are pretty hard to use as something like a perf/regression test
// a photon travels about 1000 feet in that time
expect(numEvicted).toBe(999000);
});
test(
This `test` and the other one with the timeout below are telling me they're deprecated. Looks like it just wants the `options` parameter to come second instead of third though, easy fix!
oh is that what it was saying - I was very confused about that deprecation message, so thanks!
fixed
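(For the record, a sketch of the non-deprecated shape Vitest wants — options object second, callback last:)

```ts
import { expect, test } from 'vitest';

// options come second; the callback is the last argument
test('enqueue with instant fetching - overall speed', { timeout: 10000 }, async () => {
    // ... test body elided ...
    expect(true).toBe(true);
});
```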
...ePri('q', 2),
});
});
test('repeat items have priorities summed', () => {
This is great. Another "ah ha!" moment for me where the utility of this really clicks. We'll get the stuff we need across all priorities first if both need it. Love it.
C.setPriorities(Items('a', 'b', 'c'), Items('p', 'q'));
// this should immediately enqueue p,q,a,b,c
// it would be great to be able to peek at the priorities!
const spyCache = cache as any as { importance: Record<string, number> };
Regular reminder that TypeScript is all smoke and mirrors. Took me a minute to figure out what was going on here since I didn't see a test framework spy utility lol
Maybe we start using actual private elements in classes one of these days, although then we couldn't cheat our way into testing!
haha yeah, this is gross on a few levels. in the other test, I did this by observing the order, and it was very tedious
cache: SharedPriorityCache;
} | null>(null);

export function SharedCacheProvider(props: PropsWithChildren) {
Found the issue! On first render, `state` is `undefined`, and then it's not rendering again because nothing is happening. Hence why it pops up on reload.
Instead, we can do it like this, and it all renders!
export function SharedCacheProvider(props: PropsWithChildren) {
const [state] = useState(() => {
const newState = new StateHelper(2000 * 1024 * 1024, 50, ['oes_texture_float']);
logger.info('server started...');
return newState;
});
const { children } = props;
useEffect(() => {
return () => {
logger.info('shared cache disposed');
state.destroy();
};
}, [state]);
return (
<SharedCacheContext.Provider
value={{ regl: state.regl, cache: state.cache }}
>
{children}
</SharedCacheContext.Provider>
);
}
I totally forgot the `useState` initializing function was a thing... We can clean up SO MANY `useRef`s with that now!
After sleeping on it, I'm not sure that `useState` is the best option, since the render server is one of those things we don't want being deleted or recreated with any of React's lifecycles.
To keep it a `useRef` but fix it, we just take the `state.current =` stuff out of the `useEffect` and do an `if` statement checking if it's null, setting it if it is.
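(A sketch of that `useRef`-with-null-check version, reusing `StateHelper` and `SharedCacheContext` from the snippet above; the destroy-on-unmount effect is omitted for brevity:)

```tsx
import { useRef, type PropsWithChildren } from 'react';

export function SharedCacheProvider(props: PropsWithChildren) {
    const state = useRef<StateHelper | null>(null);
    // lazily create exactly once, outside of any effect, so React's
    // lifecycles never recreate the render server
    if (state.current === null) {
        state.current = new StateHelper(2000 * 1024 * 1024, 50, ['oes_texture_float']);
    }
    return (
        <SharedCacheContext.Provider value={{ regl: state.current.regl, cache: state.current.cache }}>
            {props.children}
        </SharedCacheContext.Provider>
    );
}
```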
hmmm - this does work, but I'm noticing it seems to rebuild itself every frame... surely I did something wrong
What's rebuilding itself? I'm playing with it and the provider looks to be stable. Logs in the constructor only fire once, and the destructor isn't firing at all.
I do see lots of uncaught abort errors in the console though as I pan and zoom 😢
somehow the zoomHandler was constantly a new object - anyway, fixed it, and this is working well now
}, {});
},
fetch: (item) => {
// the Renderer<...> type obscures the fact that these are always cached textures... TODO fix that?
A TODO to fix?
nah, let's leave it for now if you don't mind
// };
const [width, height] = screenSize;
const target: REGL.Framebuffer2D = regl.framebuffer({ width, height });
return {
Woooo loving this. More straightforward for sure!
Co-authored-by: Lane Sawyer <[email protected]>
…Institute/vis into noah/shared-priority-cache
Looking good! One question that I think was an oversight on a refactor?
I'm also seeing lots of the "AbortError: the operation was aborted" and "uncaught exception: cancelled" in the dev console. It would be nice to catch those and log them with our logger if they're needed, or just suppress them if they're part of the normal flow (which is my assumption, since we're aborting things a lot more smartly now)
The error cleanup doesn't have to happen now, since it's not integrated with any apps yet, but it'd be nice to not fill up the console with errors if we don't need to
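(One possible shape for that cleanup — a sketch treating AbortError as normal flow and only surfacing real failures; `fetchQuietly` is a hypothetical helper:)

```ts
async function fetchQuietly(url: string, signal: AbortSignal): Promise<ArrayBuffer | undefined> {
    try {
        const resp = await fetch(url, { signal });
        return await resp.arrayBuffer();
    } catch (reason) {
        if (reason instanceof DOMException && reason.name === 'AbortError') {
            return undefined; // expected cancellation under the new cache - stay quiet
        }
        throw reason; // real failures still propagate (or go to our logger)
    }
}
```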
.catch((_reason) => {
//ignore
.catch((reason) => {
this.notify?.(key, { status: 'failure', reason });
})
.finally(() => {
this.pendingFetches.delete(key);
this.fetchToLimit();
Should we notify success?
we do - on line 88, which you can't see in this diff :)
What
Add a new cache, intended for sharing. Users are not required to couple fetching-order with rendering-order, and the amount of life-cycle callback wrangling is much reduced.