Automate a flush of the name lookup cache #1059
Comments
Alternatively, we could create a new RabbitMQ message and a flushing listener, and emit these messages from the CLB CLIs. The listener would be configured with a list of dataset keys to be flushed.
If we do the message approach, I'd suggest the subscriber decides whether the cache should be flushed, as it's the pipeline that decides if the short-circuit IDs should be used, not CLB. Knowing we're moving off this edition of CLB and it'll all change soon, I wonder if it'd be sufficient to simply flush the cache daily, given that the lookup is now fast and rebuilding is not a big cost. What do you think?
Another option:
I wonder whether we would face the same problems in the future too. We will still integrate COL, IUCN, WoRMS and maybe more checklists supplying identifiers, just through a different system. But we'll be doing this more regularly, on a monthly basis, so that might simply be the time to flush and start from scratch without the need to inform about changed lists? Actually, we have the messaging in place already: the CLB CLIs emit a ChecklistSyncedMessage once done. The only thing needed would be a listener that knows which datasets to watch out for and then flushes the cache.
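To make the listener idea concrete, here is a minimal sketch of the decision logic only, with no messaging library involved. The `dataset_key` field, the `CacheFlushListener` class and the `flush` callback are assumptions made for illustration; the actual shape of ChecklistSyncedMessage is not shown in this thread and may differ.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class ChecklistSyncedMessage:
    # Hypothetical stand-in: the real message's fields are not shown in this thread.
    dataset_key: str


class CacheFlushListener:
    """Flushes the cache only when a watched checklist has been synced."""

    def __init__(self, watched_keys: Iterable[str], flush: Callable[[], None]) -> None:
        # Dataset keys whose changes can make cached lookups stale (assumed config).
        self._watched = frozenset(watched_keys)
        # Callback that performs the actual flush; injected so it can be swapped or tested.
        self._flush = flush

    def on_message(self, message: ChecklistSyncedMessage) -> bool:
        """Return True if the message triggered a flush, False if it was ignored."""
        if message.dataset_key not in self._watched:
            return False
        self._flush()
        return True
```

Keeping the watched keys on the subscriber side matches the suggestion above that the pipeline, not CLB, decides when a flush is needed.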
(Not a pipeline specific issue, but tracking in this repo as it's connected to pipeline function)
Now that the name lookup cache holds decorated records of lookups (e.g. IUCN) and has the scientificNameID etc. short-circuit lookup, we need to flush the cache. It can go stale when the IUCN Red List, the backbone, or any checklist configured to short-circuit with ID lookup (e.g. WoRMS LSIDs) changes.
portal-feedback/#5239 is an example of cached responses causing confusion.
I suggest simply running `truncate_preserve` on the HBase table weekly (edit: or daily) after verifying that nothing is running.
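A scheduled job along these lines could look roughly like the sketch below. It is only an illustration under assumptions: the `is_idle` check is a hypothetical callback (how to verify that nothing is running is not specified here), the table name is supplied by the caller, and the default runner assumes the `hbase` command is on the PATH and is used in non-interactive mode.

```python
import subprocess
from typing import Callable


def run_hbase_shell(command: str) -> None:
    """Pipe one command into a non-interactive HBase shell (assumes `hbase` is on PATH)."""
    subprocess.run(["hbase", "shell", "-n"], input=command + "\n", text=True, check=True)


def flush_cache_if_idle(
    table: str,
    is_idle: Callable[[], bool],
    run_shell: Callable[[str], None] = run_hbase_shell,
) -> bool:
    """Truncate the cache table, keeping its region splits, only when nothing is running.

    Returns True if the truncate command was issued, False if it was skipped.
    """
    # Hypothetical guard: the caller decides what "nothing is running" means.
    if not is_idle():
        return False
    # truncate_preserve empties the table but keeps its region boundaries.
    run_shell(f"truncate_preserve '{table}'")
    return True
```

Run from cron or a similar scheduler, this gives the weekly or daily flush; skipping when busy simply defers the flush to the next run.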