Push Solr Indexed Flavors into a Frictionless data package #14
Comments
@DiegoPino some "philosophical" thoughts about this, on the premise that I agree with everything you wrote and probably still need to analyze the individual steps in more depth.
@giancarlobi I totally agree. Interestingly enough, we already have almost all of the data; we just need to code it, and since we are building AMI this may be a good moment to start doing that. PS: later this week I will copy your thoughts and this post into their own issues to complement what is missing: Original Data:
Solr: if we have Data Packages, reindexing is a breeze. It is also important that we have kill switches for the SBR processors; we do not want to re-process data when doing a full restore.
@alliomeria @dmer any thoughts on this? All this looks like code we can get rolling quite fast, but then again we have our hands so full. Should we add this to the roadmap in a more concrete fashion?
@DiegoPino I really like the ideas here (as far as I understand them :)). It sounds like this would make for a much more secure and robust backup and, more importantly, restore. Having time-based and/or single-item-level restore available via the UI would be a huge improvement on the current restore capabilities that I'm used to w/ Islandora. As to your question about when: my main input is that I want to start working with the AMI tools asap, so I'm suspicious of anything that sounds like it might delay that! Perhaps I'll be able to answer better after our briefing.
Why?
Our Strawberryfield Flavor data source is totally virtual. During a processing chain we use local Key/Value storage to allow Search API to fetch the recently ingested data. But for a longer/complete reindex we want to have that data in a more stable place, especially for longer running/expensive operations like HOCR.
The logic we want is that, after a Processor's output has been tracked, we push the data into a (new or existing) Frictionless Data Package file managed by us. The idea is: if the file exists and the content for a certain Flavor ID is already inside, we update it; if not, we create the file and add the content.
Then the Flavor data source can always try to fetch from the less expensive Key/Value store first and, if nothing is found there, check whether the Node itself has one of the packages corresponding to the same Flavor ID.
Flavors indexed into Solr have the id pattern (Flavor ID)
"ss_search_api_id":"strawberryfield_flavor_datasource/2017:1:en:1d9ae1cd-b3d0-477c-8061-313bb1bc9273:ocr",
Which means:
strawberryfield_flavor_datasource => the data source
2017 => the Node ID
1 => the sequence (remember this is one Node to many files to many sequences)
1d9ae1cd-b3d0-477c-8061-313bb1bc9273 => the File UUID that was processed
ocr => the Plugin type that generated this
Depending on how well I can deal with this issue esmero/strawberryfield#115 we may want to have many Frictionless Data Packages or a single one
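To make the id pattern concrete, here is a minimal Python sketch that splits a Flavor ID into the parts listed above. The names `FlavorId` and `parse_flavor_id` are hypothetical, and treating the `en` segment as a language code is an assumption, since that segment is not described in the breakdown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlavorId:
    """Parts of a Flavor ID as described in this issue (hypothetical helper)."""
    datasource: str
    node_id: int
    sequence: int
    langcode: str  # assumption: the "en" segment is a language code
    file_uuid: str
    plugin_type: str


def parse_flavor_id(search_api_id: str) -> FlavorId:
    """Split 'datasource/nid:sequence:lang:file_uuid:plugin' into its parts."""
    datasource, separator, raw_id = search_api_id.partition("/")
    parts = raw_id.split(":")
    if not separator or not datasource or len(parts) != 5:
        raise ValueError(f"Unexpected Flavor ID format: {search_api_id!r}")
    node_id, sequence, langcode, file_uuid, plugin_type = parts
    return FlavorId(
        datasource=datasource,
        node_id=int(node_id),
        sequence=int(sequence),
        langcode=langcode,
        file_uuid=file_uuid,
        plugin_type=plugin_type,
    )


flavor = parse_flavor_id(
    "strawberryfield_flavor_datasource/"
    "2017:1:en:1d9ae1cd-b3d0-477c-8061-313bb1bc9273:ocr"
)
print(flavor.node_id, flavor.sequence, flavor.plugin_type)
```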
The operation would be (pseudo buggy code)
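A rough sketch of the "update if present, otherwise create and add" logic, written in Python for illustration only (the real code would live in the Drupal module). It treats the data package as a plain `datapackage.json`-style descriptor with a `resources` list. The function names, the custom `flavor_id` property, the inline `data` storage and the default package name `flavors` are all assumptions, not an agreed design.

```python
import json
from pathlib import Path


def resource_name(flavor_id: str) -> str:
    # Data Package resource names should be lowercase and URL friendly,
    # so the "/" and ":" separators of the Flavor ID are replaced here.
    return flavor_id.replace("/", "-").replace(":", "-").lower()


def upsert_flavor(package: dict, flavor_id: str, data: dict) -> str:
    """Update the resource for flavor_id if it exists, otherwise add it."""
    resources = package.setdefault("resources", [])
    for resource in resources:
        # "flavor_id" is a hypothetical custom property holding the original ID.
        if resource.get("flavor_id") == flavor_id:
            resource["data"] = data
            return "updated"
    resources.append(
        {
            "name": resource_name(flavor_id),
            "flavor_id": flavor_id,
            "format": "json",
            "data": data,
        }
    )
    return "created"


def push_flavor(package_path: Path, flavor_id: str, data: dict) -> str:
    """Load (or start) the package file, upsert one Flavor, write it back."""
    if package_path.exists():
        package = json.loads(package_path.read_text(encoding="utf-8"))
    else:
        package = {"name": "flavors", "resources": []}
    outcome = upsert_flavor(package, flavor_id, data)
    package_path.write_text(json.dumps(package, indent=2), encoding="utf-8")
    return outcome
```

Whether each Node gets one package or several (see esmero/strawberryfield#115) only changes which `package_path` is passed in; the upsert itself stays the same.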
On reindexing/indexing/update from Search API:
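A possible shape for that lookup, sketched in Python for illustration: try the cheap Key/Value store first, then fall back to the data packages attached to the Node. Here `key_value` is a plain dict standing in for the Key/Value store and `node_packages` is a list of already loaded package descriptors. Both, along with the custom `flavor_id` resource property and the function name `fetch_flavor`, are assumptions.

```python
from typing import Optional


def fetch_flavor(
    flavor_id: str,
    key_value: dict,
    node_packages: list,
) -> Optional[dict]:
    """Return Flavor data from Key/Value if present, else from a data package."""
    # 1. Cheapest source: the temporary Key/Value store filled during processing.
    cached = key_value.get(flavor_id)
    if cached is not None:
        return cached

    # 2. Fallback: look inside the packages attached to the Node.
    for package in node_packages:
        for resource in package.get("resources", []):
            # "flavor_id" is a hypothetical custom property on each resource.
            if resource.get("flavor_id") == flavor_id:
                return resource.get("data")

    # 3. Nothing found: the caller decides whether to re-run the processor.
    return None
```

Returning `None` instead of triggering processing keeps this path safe for a full restore, where re-processing is exactly what should be avoided.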
@giancarlobi ideas/thoughts?