Search in shared files using a single index #10
Description
Originally opened as owncloud-archive/apps#1464
Steps to reproduce
- Alice shares a file containing the word 'secret' with Bob
- Bob searches for 'secret'
- He gets a search result for the occurrence in the file shared by Alice.
Expected behaviour
Users should be able to find files that have been shared with them by searching in the content.
Actual behaviour
Currently, only the user's own files are indexed, so Bob's search does not find the file Alice shared with him.
Technical background
The Lucene index is stored per user and resides in /<userhome>/lucene_index. It is not encrypted, for performance reasons. Encrypting it would be possible, but it would prevent using another user's index for the full-text search, because we cannot access their encrypted index without their secret key.
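To make the current behaviour concrete, here is a toy Python model of per-user indexing. It is not Lucene or ownCloud code; `per_user_index`, `index_file` and `search` are hypothetical names. Each file lands only in its owner's index, and a search reads only the searcher's index, so a file Alice shares with Bob never shows up in Bob's results.

```python
from collections import defaultdict

# Hypothetical stand-in for the per-user layout: one inverted index per user,
# keyed by user id (in ownCloud it lives under /<userhome>/lucene_index).
# Each index maps a term to the set of file paths containing it.
per_user_index: dict[str, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))


def index_file(owner: str, path: str, content: str) -> None:
    """Index a file into its owner's index only (current behaviour)."""
    for term in content.lower().split():
        per_user_index[owner][term].add(path)


def search(user: str, term: str) -> set[str]:
    """A user's search consults only that user's own index."""
    return set(per_user_index[user].get(term.lower(), set()))


index_file("alice", "/alice/files/plan.txt", "the secret plan")
# Alice shares plan.txt with Bob, but nothing is written to Bob's index,
# so Bob's search misses the shared file.
print(search("alice", "secret"))  # {'/alice/files/plan.txt'}
print(search("bob", "secret"))    # set()
```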
Planned Approach
The current plan is to make each document in the Lucene index contain the names of the users and groups allowed to access the file. Whenever a file is shared or unshared, the corresponding document in the Lucene index must be updated. Unfortunately, Lucene by design only allows adding or deleting documents in the index, not modifying them. Initial testing indicates that query hits can be used to obtain the original document, update it with the new list of users / groups who can access it, and then delete and reinsert it into the index, all without having to reindex the original file, which would take far too long.
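The update step can be sketched with a toy append-only index in Python. This is a model under stated assumptions, not Zend_Search_Lucene's API: `Document`, `AppendOnlyIndex`, `find_by_path` and `update_permissions` are hypothetical names. It mirrors the constraint described above: there is no in-place update, so changing the permission fields means fetching the document from a hit, copying it with the new users / groups, deleting the old document and adding the copy. One caveat: Lucene only returns *stored* fields from a hit, so this works only if every field needed for reinsertion (including the extracted content or its terms) is stored; the toy simply keeps everything.

```python
from collections.abc import Iterable
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Document:
    """Hypothetical document layout; all fields are treated as stored."""
    path: str
    terms: frozenset[str]   # extracted content, stored so it can be re-added
    users: frozenset[str]   # users allowed to read the file
    groups: frozenset[str]  # groups allowed to read the file


class AppendOnlyIndex:
    """Toy index mimicking Lucene's constraint: documents can only be
    added or deleted, never modified in place."""

    def __init__(self) -> None:
        self._docs: dict[int, Document] = {}
        self._next_id = 0

    def add(self, doc: Document) -> int:
        doc_id = self._next_id
        self._docs[doc_id] = doc
        self._next_id += 1
        return doc_id

    def delete(self, doc_id: int) -> None:
        del self._docs[doc_id]

    def find_by_path(self, path: str) -> list[tuple[int, Document]]:
        """Stand-in for a query on the path field that returns hits."""
        return [(i, d) for i, d in self._docs.items() if d.path == path]


def update_permissions(index: AppendOnlyIndex, path: str,
                       users: Iterable[str], groups: Iterable[str]) -> None:
    """Rewrite the permission fields without reindexing the file content:
    copy the hit's document with new users/groups, delete, reinsert."""
    new_users, new_groups = frozenset(users), frozenset(groups)
    for doc_id, doc in index.find_by_path(path):  # list is built before deleting
        updated = replace(doc, users=new_users, groups=new_groups)
        index.delete(doc_id)
        index.add(updated)


idx = AppendOnlyIndex()
idx.add(Document("/alice/files/plan.txt", frozenset({"the", "secret", "plan"}),
                 frozenset({"alice"}), frozenset()))
# Alice shares the file with Bob: only the permission fields change.
update_permissions(idx, "/alice/files/plan.txt", {"alice", "bob"}, set())
[(_, doc)] = idx.find_by_path("/alice/files/plan.txt")
print(sorted(doc.users))      # ['alice', 'bob']
print("secret" in doc.terms)  # True
```

The reinserted document receives a new internal id, just as a deleted and re-added Lucene document does, so ids should not be cached across permission changes.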
Maintaining the permissions this way is described in http://www.lucenetutorial.com/techniques/permission-filtering.html, and we can add the querying user to the query as a subquery, as shown in http://framework.zend.com/manual/1.12/en/zend.search.lucene.searching.html.
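The permission subquery can be sketched as plain query-string construction in Python. The field names `users` and `groups` and the helper names are hypothetical, and the sketch assumes user and group ids are indexed as single untokenized terms containing no whitespace. The user's query and the permission clause are both required (`+`); inside the permission clause the terms are optional, so a document matches when at least one of the user's id or group ids is listed on it.

```python
from collections.abc import Iterable

# Characters that Lucene's classic query syntax treats as special.
_SPECIAL = set('\\+-!():^[]"{}~*?|&/')


def escape(term: str) -> str:
    """Backslash-escape query-syntax characters in a user or group id."""
    return "".join("\\" + c if c in _SPECIAL else c for c in term)


def permission_filtered_query(user_query: str, user: str,
                              groups: Iterable[str]) -> str:
    """Combine the user's own query with a permission subquery.

    Both top-level clauses are required (+); the terms inside the
    permission clause are optional, so at least one of them must match.
    """
    allowed = [f"users:{escape(user)}"] + [f"groups:{escape(g)}" for g in groups]
    return f"+({user_query}) +({' '.join(allowed)})"


print(permission_filtered_query("secret", "bob", ["admins"]))
```

For Bob in group `admins`, this yields `+(secret) +(users:bob groups:admins)`.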
Further thoughts
Once documents carry user / group permissions, we could create a single global index and query it instead of each individual user index. Whether that improves performance (because only one index needs to be accessed) or degrades it (because the index might grow very large) remains to be tested.
Using a single index simplifies the whole architecture, and it is the way to go.
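A single global index with permission filtering could look like the following toy Python model (hypothetical names `GlobalIndex`, `index_file` and `search`; not Lucene's API). Every file is indexed once, together with its allowed users and groups, and each search keeps only the hits the searcher may read. For brevity the filter runs after term matching here; in the plan above it would be the required permission subquery inside the Lucene query itself.

```python
from collections import defaultdict
from collections.abc import Iterable
from dataclasses import dataclass, field


@dataclass
class GlobalIndex:
    """Toy single index shared by all users (hypothetical layout).

    postings: term -> paths containing it
    acl: path -> (allowed users, allowed groups)
    """
    postings: dict[str, set[str]] = field(default_factory=lambda: defaultdict(set))
    acl: dict[str, tuple[frozenset[str], frozenset[str]]] = field(default_factory=dict)

    def index_file(self, path: str, content: str,
                   users: Iterable[str], groups: Iterable[str]) -> None:
        for term in content.lower().split():
            self.postings[term].add(path)
        self.acl[path] = (frozenset(users), frozenset(groups))

    def search(self, term: str, user: str, user_groups: Iterable[str]) -> set[str]:
        """Match the term, then keep only paths this user may read."""
        hits = self.postings.get(term.lower(), set())
        mine = set(user_groups)
        return {p for p in hits
                if user in self.acl[p][0] or mine & self.acl[p][1]}


idx = GlobalIndex()
idx.index_file("/alice/files/plan.txt", "the secret plan",
               users={"alice", "bob"}, groups=set())
idx.index_file("/carol/files/memo.txt", "secret memo",
               users={"carol"}, groups={"finance"})
print(idx.search("secret", "bob", []))            # {'/alice/files/plan.txt'}
print(idx.search("secret", "dave", ["finance"]))  # {'/carol/files/memo.txt'}
```

Bob now finds the file Alice shared with him, while Dave reaches Carol's memo only through his group membership.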