-
Notifications
You must be signed in to change notification settings - Fork 196
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: allow scaling of the search service #11029
base: master
Are you sure you want to change the base?
Conversation
Thanks for opening this pull request! The maintainers of this repository would appreciate it if you would create a changelog item based on your changes. |
services/search/pkg/config/engine.go
Outdated
@@ -9,4 +9,5 @@ type Engine struct { | |||
// EngineBleve configures the bleve engine | |||
type EngineBleve struct { | |||
Datapath string `yaml:"data_path" env:"SEARCH_ENGINE_BLEVE_DATA_PATH" desc:"The directory where the filesystem will store search data. If not defined, the root directory derives from $OCIS_BASE_DATA_PATH/search." introductionVersion:"pre5.0"` | |||
Scale bool `yaml:"scale" env:"SEARCH_ENGINE_BLEVE_SCALE" desc:"Enable scaling of the bleve index. If set to false, the service will have exclusive write access to the index as long as the service is running, locking out other processes. Defaults to false." introductionVersion:"%%NEXT%%"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Scale bool `yaml:"scale" env:"SEARCH_ENGINE_BLEVE_SCALE" desc:"Enable scaling of the bleve index. If set to false, the service will have exclusive write access to the index as long as the service is running, locking out other processes. Defaults to false." introductionVersion:"%%NEXT%%"` | |
Scale bool `yaml:"scale" env:"SEARCH_ENGINE_BLEVE_SCALE" desc:"Enable scaling of the search index (bleve). If set to 'true', the instance of the search service will no longer have exclusive write access to the index. Note when scaling search, all instances of the search service must be set to true! For 'false', which is the default, the running search service will lock out other processes trying to access the index as long it is running." introductionVersion:"%%NEXT%%"` |
Proposal: as we are enable scaling, we should make the path to scaling more clear, order the cases and align the description.
|
@@ -9,4 +9,5 @@ type Engine struct { | |||
// EngineBleve configures the bleve engine | |||
type EngineBleve struct { | |||
Datapath string `yaml:"data_path" env:"SEARCH_ENGINE_BLEVE_DATA_PATH" desc:"The directory where the filesystem will store search data. If not defined, the root directory derives from $OCIS_BASE_DATA_PATH/search." introductionVersion:"pre5.0"` | |||
Scale bool `yaml:"scale" env:"SEARCH_ENGINE_BLEVE_SCALE" desc:"Enable scaling of the search index (bleve). If set to 'true', the instance of the search service will no longer have exclusive write access to the index. Note when scaling search, all instances of the search service must be set to true! For 'false', which is the default, the running search service will lock out other processes trying to access the index as long it is running." introductionVersion:"%%NEXT%%"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Scale bool `yaml:"scale" env:"SEARCH_ENGINE_BLEVE_SCALE" desc:"Enable scaling of the search index (bleve). If set to 'true', the instance of the search service will no longer have exclusive write access to the index. Note when scaling search, all instances of the search service must be set to true! For 'false', which is the default, the running search service will lock out other processes trying to access the index as long it is running." introductionVersion:"%%NEXT%%"` | |
Scale bool `yaml:"scale" env:"SEARCH_ENGINE_BLEVE_SCALE" desc:"Enable scaling of the search index (bleve). If set to 'true', the instance of the search service will no longer have exclusive write access to the index. Note when scaling search, all instances of the search service must be set to true! For 'false', which is the default, the running search service will lock out other processes trying to access the index exclusively as long it is running." introductionVersion:"%%NEXT%%"` |
I missed one word 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure... with the new wording, it seems other processes might be able to access the index as long as they aren't requesting exclusive access.
To clarify, oCIS' search service will request exclusive access (write mode), which will cause other processes to be locked out regardless of the access type.
Also, with the Scale
flag active, the difference is that oCIS will connect and disconnect with different accesses (including exclusive access) multiple times. Exclusive access will be requested (depending on the operation), and it will be granted if possible, but the connection won't last long, so other connections will be able to access the index afterwards.
Based on data from owncloud/cdperf#73 With scaling active and 3 search replicas:
with no scaling (exclusive access)
Note that the tests were run on a single machine with limited memory, so the timings shown above might be worse than the actual ones on a real environment. In any case, having exclusive access is preferred if it's possible |
Question regarding the Loadtest results: What is the point in scaling search service if performance gets worse? Is there some amount of load where we achieve better results with scaling activated? |
I think scaling is always fine. While it's true that offloading the search service into a different host could work fine, we might still hit some limits, specially in huge environments. My main selling point, however, is that we can access the bleve index with the bleve cli without stopping the service. We can debug things in the index without bothering the users too much. Right now, the bleve cli gets stuck waiting for the search service's connection to finish (which doesn't happen until the service stops). Regarding the results, in general I don't think it will be that bad. I had everything (tests included) running on a quite limited VM, and I even had to reduce the test users because I was breaking the VM (not enough memory). Running the tests on a "real" test environment should show better results, even though not scaling might still be the preferred choice. |
// closed and reopened safely at any time, IndexCanBeClosed should | ||
// return true. | ||
type IndexGetter interface { | ||
GetIndex(opts ...GetIndexOption) (bleve.Index, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could change the interface slightly here. If we return the closer, we get around the IndexCanBeClosed
method
GetIndex(opts ...GetIndexOption) (bleve.Index, error) | |
GetIndex(opts ...GetIndexOption) (bleve.Index, func(), error) |
We can then always do
index, close, err := b.GetIndex()
if err != nil {
return
}
defer close()
This would allow us to skip all the bool checks
@@ -0,0 +1,21 @@ | |||
package bleve | |||
|
|||
type GetIndexOption func(o *GetIndexOptions) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exported type GetIndexOption
should have comment or be unexported
|
||
type GetIndexOption func(o *GetIndexOptions) | ||
|
||
type GetIndexOptions struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exported type GetIndexOptions
should have comment or be unexported
ReadOnly bool | ||
} | ||
|
||
func ReadOnly(b bool) GetIndexOption { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exported func ReadOnly
should have comment or be unexported
Fair point. Maybe we can see performance improvements if we hit the platform really hard. If not we still have a maintenance improvement 👍 |
abc113b
to
3b038d9
Compare
The closeFn implicitly checks the value as the implementation decides what to do (either do nothing or close the index). Calling the closeFn is safe and we don't need conditions.
3b038d9
to
6df657b
Compare
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice and clean 👍 Please wait for @mmattel approve before merging
Description
Previously, the search service opened a write connection to the bleve index and kept that connection opened as long as the service was running. This means that the search service was locking out other processes from accessing the index, including the bleve cli and other replicas of the search service. Replicas of the service can't be spawned because they won't be able to access the index.
We've added a new flag to the search service in order to scale the service.
For environments where scaling isn't needed, it's recommended to set the
SEARCH_ENGINE_BLEVE_SCALE
as false (default value). This will keep the current behavior of locking the bleve index for exclusive access. This is expected to have a better performance because we won't create new connections every time, and the load in the index should be reduced due to less locking overhead.Setting the
SEARCH_ENGINE_BLEVE_SCALE
to true will cause connections to be opened per "engine operation" (search, index, delete, prune...). Those connections will be closed as soon as the operation finishes.Concurrency control is delegated to the bleve index, which will need to handle the operations concurrently from multiple replicas.
On heavy workloads, it's expected operations to have delays because they'll have to wait for other operations to finish and release the lock. Read-only operations (search and count) are expected to run through the index in parallel as long as there is no other write operation on going.
Related Issue
#11008
Motivation and Context
Scaling the service should be possible regardless of the specific setup.
How Has This Been Tested?
SEARCH_ENGINE_BLEVE_SCALE=true
on all of them.You'll see logs coming from different replicas, and the UX isn't impacted in any way (all the replicas are being used and returning results).
In addition, it's possible to run the bleve cli against the index oCIS is using. Previously, the cli waited indefinitely.
Screenshots (if appropriate):
Types of changes
Checklist: