Investigate flushing Onyx cache to disk on idle #281
Hi @marcaaron. Technically, it's not so much that IndexedDB itself is slow; much of the cost comes from the layer we use on top of it. To illustrate, as noted in this thread: https://expensify.slack.com/archives/C01GTK53T8Q/p1689267792230089?thread_ts=1689036681.307159&cid=C01GTK53T8Q:
In light of this, I propose that we consider implementing a storage provider that circumvents the use of localforage. Alternatively, we could explore other storage driver libraries that might be a better fit for our needs. There are libraries that streamline the IndexedDB interface, offering ease of use without forfeiting granular control.

I like the idea of batching a few changes and applying them at the best time. It appears that the app does not require immediate data writes. In the worst-case scenario we might "skip a beat," and some information might need to be fetched remotely due to delayed persistence.
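The batching idea above could be sketched roughly like this (all names here are hypothetical illustrations, not Onyx's actual API): individual writes land in a buffer, and a short timer flushes them as a single batched storage operation.

```js
// Sketch of batching a few changes and applying them together.
// `persistBatch` stands in for a single multiSet/setMany storage call.
function createBatchedWriter(persistBatch, delayMs = 50) {
    let buffer = new Map();
    let timer = null;

    const flush = () => {
        if (timer) {
            clearTimeout(timer);
        }
        timer = null;
        if (buffer.size === 0) {
            return;
        }
        const pairs = [...buffer.entries()];
        buffer = new Map();
        persistBatch(pairs); // one storage operation for the whole batch
    };

    return {
        set(key, value) {
            buffer.set(key, value); // later writes to the same key win
            if (!timer) {
                timer = setTimeout(flush, delayMs);
            }
        },
        flush, // allow forcing a flush, e.g. before the page unloads
    };
}
```

A nice property of the Map-based buffer is that rapid repeated writes to the same key collapse into one stored value.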
Great thoughts @kidroca. So it seems like our options are:

I do like this 3rd option since it does not seem too difficult to implement something like:

```js
let timer;
const idleTime = 2000; // or whatever we want to use for this

function resetTimer() {
    pauseSetItemQueue();
    clearTimeout(timer);
    timer = setTimeout(processSetItemQueue, idleTime);
}

document.addEventListener('mousemove', resetTimer);
document.addEventListener('keypress', resetTimer);
```
@marcaaron I think this library might work better for our case: https://www.npmjs.com/package/idb-keyval. The following methods are noteworthy:

Regarding deferring until idle, there's this API available to browsers as well as react-native: https://caniuse.com/requestidlecallback. It accepts a callback function that will be executed when the browser becomes idle. Safari lacks support for `requestIdleCallback`.
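The defer-until-idle idea could look roughly like this (`flushQueue` is a hypothetical stand-in for Onyx's write queue): prefer `requestIdleCallback` where it exists, and fall back to `setTimeout` on Safari, which doesn't implement it.

```js
// Schedule a flush of pending writes during an idle period.
function scheduleIdleFlush(flushQueue, timeout = 2000) {
    if (typeof requestIdleCallback === 'function') {
        // The `timeout` option guarantees the flush eventually runs
        // even if the browser never reports an idle period
        requestIdleCallback(flushQueue, {timeout});
    } else {
        // Fallback for environments without requestIdleCallback (e.g. Safari)
        setTimeout(flushQueue, 1);
    }
}
```

Either way the flush is asynchronous, so callers never block on it.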
Hey, I somehow overlooked this ticket. I was doing a performance audit on a guides account and ran into the exact same issue: the massive amount of data we are persisting at app init.
```js
// Helper that yields control back to the main thread so the
// browser gets a chance to handle pending input between tasks:
function yieldToMain() {
    return new Promise((resolve) => {
        setTimeout(resolve, 0);
    });
}

async function saveSettings() {
    // A task queue of functions.
    // Imagine these being tasks e.g. inside Onyx where we execute database operations.
    const tasks = [
        validateForm,
        showSpinner,
        saveToDatabase,
        updateUI,
        sendAnalytics,
    ];

    // The value of 50 is important here.
    // Any task that takes longer than 50ms is considered
    // too long and blocking, so we don't want our tasks to
    // take longer than that (so the browser has a chance to
    // squeeze in other tasks, e.g. handling user interactions).
    let deadline = performance.now() + 50;

    while (tasks.length > 0) {
        // The optional chaining operator used here helps to avoid
        // errors in browsers that don't support `isInputPending`:
        if (navigator.scheduling?.isInputPending() || performance.now() >= deadline) {
            // There's pending user input, or the deadline
            // has been reached. Yield here:
            await yieldToMain();

            // Extend the deadline:
            deadline = performance.now() + 50;

            // Stop the execution of the current loop and
            // move onto the next iteration:
            continue;
        }

        // Shift the task out of the queue:
        const task = tasks.shift();

        // Run the task:
        task();
    }
}
```
I agree with @kidroca about using https://www.npmjs.com/package/idb-keyval as a new provider. I have tested this package before, and it was significantly faster compared to `localforage`:

```js
import {
    get, set, createStore, clear, keys as getKeys, del, getMany, setMany,
} from 'idb-keyval';
import _ from 'underscore';
import fastMerge from '../../fastMerge';
import SyncQueue from '../../SyncQueue';

const dbName = 'OnyxDB';
const storeName = 'keyvaluepairs';
const OnyxStore = createStore(dbName, storeName);

const syncQueue = new SyncQueue(({keyVal, command}) => {
    switch (command) {
        case 'multiSet': {
            const label = `multiSet ${keyVal.length}`;

            // eslint-disable-next-line no-console
            console.time(label);
            return setMany(keyVal, OnyxStore).then(() => {
                // eslint-disable-next-line no-console
                console.timeEnd(label);
            });
        }
        case 'multiMerge': {
            const label = `multiMerge ${keyVal.length}`;

            // eslint-disable-next-line no-console
            console.time(label);
            const keys = _.map(keyVal, ([key]) => key);

            // We use getMany to get all existing data in one transaction,
            // then we merge the existing data with the new data,
            // and finally we use setMany to store the new data in one transaction
            return getMany(keys, OnyxStore)
                .then(values => _.object(_.map(keys, (key, i) => [key, values[i]])))
                .then((existingPairsObject) => {
                    const newPairs = _.map(
                        keyVal,
                        ([key, value]) => {
                            // If the value is not an object or an array, overwrite the stored value
                            if (!_.isObject(value) && !_.isArray(value)) {
                                return [key, value];
                            }
                            const existingValue = existingPairsObject[key];
                            const newValue = _.isObject(existingValue) ? ({...fastMerge(existingValue, value)}) : value;
                            return ([key, newValue]);
                        },
                    );
                    return setMany(newPairs, OnyxStore).then(() => {
                        // eslint-disable-next-line no-console
                        console.timeEnd(label);
                    });
                });
        }
        case 'setItem': return set(keyVal[0], keyVal[1], OnyxStore);
        default: return set(keyVal[0], keyVal[1], OnyxStore);
    }
});

const provider = {
    getItem: key => get(key, OnyxStore),
    setItem: (key, value) => syncQueue.push({keyVal: [key, value], command: 'setItem'}),
    clear: () => clear(OnyxStore),
    getAllKeys: () => getKeys(OnyxStore),
    removeItem: key => del(key, OnyxStore),
    multiGet: keys => getMany(keys, OnyxStore).then(values => _.map(keys, (key, i) => [key, values[i]])),
    multiSet: pairs => syncQueue.push({keyVal: pairs, command: 'multiSet'}),
    multiMerge: pairs => syncQueue.push({keyVal: pairs, command: 'multiMerge'}),
};

export default provider;
```
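The provider above imports `SyncQueue` from Onyx's internals, which isn't shown in the snippet. As an assumption about its behavior (not the actual implementation), a minimal sketch would be a queue that runs async tasks strictly one at a time, in push order, so storage operations never interleave:

```js
// Minimal sketch of a serializing task queue.
class SyncQueue {
    constructor(run) {
        this.run = run; // async handler invoked once per pushed item
        this.tail = Promise.resolve(); // end of the current chain
    }

    push(item) {
        // Chain each item onto the previous one so tasks run in order
        const result = this.tail.then(() => this.run(item));

        // Keep the chain alive even if a task rejects
        this.tail = result.catch(() => {});
        return result;
    }
}
```

Each `push` returns a promise for that specific item, which is what lets the provider's `setItem`/`multiSet`/`multiMerge` hand a completion promise back to callers.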
Curious to hear more about this. So, would we just defer writes to the worker thread by posting data to the worker and then have those messages get written to IndexedDB? Does it work the same when reading data?
I am not sure how to get around that. We either have to save the data faster or expect to lose some of it 😄 Losing it might not be much of a concern depending on what is lost, when it would be refreshed again, and what bad things could happen if the data is not persisted (maybe nothing).
Maybe as a next step we can try to test idb-keyval. That would probably give us an idea of what kind of performance improvements we might see from getting off localforage (which I think is kind of a separate issue). Then we can explore throttling storage after?

FWIW, I think web workers are pretty cool, but I'm hesitant about this idea since it will add some build complexity.
Hey 👋 I'd like to take the initiative to test with idb-keyval, as we are currently hired to improve the performance and this would just be my next step 😊 I'd also still like to test an off-main-thread (OMT) performance PoC.

Yes! (You are right, this is a bit of a separate issue we are talking about atm, but I think that should come first.)

Yes, the worker thread would be concerned with writing the data. I also thought about it reading the data. It might add a tiny bit of overhead, but then we are keeping more work away from the main thread, which is the whole idea of the change.
Results for 1,000 items: Onyx right now: 6500ms
Wow. What are we waiting for??? 😄 Seems clearly much better. Next thought would be... Can we use the same DB? Or do we need some migration to move off the old storage? Looks very promising though, so I think we should go for it.
I think no migration is needed. I will write a PR today and check the results for when it's used in the actual product 😊

Edit: result shared here https://expensify.slack.com/archives/C035J5C9FAP/p1691063111819169
PR is up 😊

On another note, in regards to "Flush onyx cache to disk on idle": with the advent of idb-keyval, where operations usually take ~10ms, and pagination reducing the size of the network responses, I am not sure how big the benefit of these changes would actually be 🤔
Hmm, if it's true that we wait on storage writes, then I agree, and I'm not sure I see a clear reason why we would do this. The storage should be able to be "fire and forget"; nothing needs to "wait" on it. FWIW, I don't think we were always doing this, and if we started at some point it might not have been intentional. I think we are talking about this code here? If these "update promises" would resolve after the data has gone into the cache, then we should not be concerned with whether it is written to storage.
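The "fire and forget" idea could be sketched like this (the `cache`/`storage` objects are stand-ins, not Onyx's actual internals): the promise returned to callers resolves as soon as the in-memory cache is updated, while the disk write runs in the background without blocking anything.

```js
const cache = new Map();

function set(key, value, storage) {
    cache.set(key, value); // readers see the new value immediately

    // Kick off persistence but do NOT await it; just surface failures
    storage.setItem(key, value)
        .catch(e => console.error('storage write failed', e));

    return Promise.resolve(); // resolves without waiting for the disk write
}
```

With this shape, a slow storage backend cannot hold up the network's sequential queue; the only cost is the (already discussed) risk of losing a not-yet-persisted write.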
Yes, I am talking about that code, because then we have the network's sequential queue here, which basically waits until the request is done, including processing all middlewares, which includes writing to disk. Only once that's done does the queue move on to the next request.
I don't have a great understanding of why we are waiting there. I think we were not always waiting and the change was introduced in this PR. Gonna tag in @neil-marcellini and @jasperhuangg to see if they are able to help.
Hmm, I'm honestly not too familiar with why we're waiting on the storage write to finish before continuing down the SequentialQueue. Correct me if I'm wrong, but I feel like once a request is queued its params can't change, even if those params depend on Onyx values that were changed by a previous request that wrote to Onyx. I think @neil-marcellini would probably understand this issue a bit better. But I read through the thread and I think it makes sense to use the new library if it provides such dramatic performance improvements. Is there any reason why we would forgo using the new library and remove this "wait for write" stuff instead?
That is correct. But would you mind clarifying how it relates to this convo? I am having trouble connecting the dots. Thanks!
They're loosely related. We're really discussing two separate considerations in this issue thread. |
Yeah, that's been in place for a while; it was actually introduced in this PR: Expensify/App#9306. As explained in the comment, I implemented that bit of code so that getCurrentRequest works. I also added the queueOnyxUpdates updateHandler. I don't know if we could safely continue with the next request before the data is written to disk, but I'm guessing we probably could.
@marcaaron I was responding to this secondary consideration:

I was considering the ramifications if we ignored waiting for writes to finish before unblocking the queue.
Ok, so... is everyone saying they agree that we don't actually need to wait for the writes to happen? 😄
We have observed that IndexedDB can be super slow with writes and cause major lag and jank in the app when there is a huge amount of data to store on App init.
Our solution so far has been to simply stop storing certain keys entirely. But another potentially viable solution is to keep the data in memory for some time and then persist it to disk when the user is "idling".
cc @hannojg @kidroca curious if you guys have considered this one yet.
h/t @quinthar for the idea