@paulirwin
Contributor

  • You've read the Contributor Guide and Code of Conduct.
  • You've included unit or integration tests for your change, where applicable.
  • You've included inline docs for your change, where applicable.
  • There's an open issue for the PR that you are making. If you'd like to propose a change, please open an issue to discuss the change or find an existing issue.

Fix MMapDirectory performance issue by using BufferedIndexInput instead of ByteBufferIndexInput.

Fixes #1151

Description

This is a draft PR to run tests on all platforms and gather feedback on this approach.

As noted in #1151, MMapDirectory has a severe performance problem that makes it as much as 51x slower than SimpleFSDirectory in my testing. I was able to narrow down the root cause: reading a single byte from a MemoryMappedViewAccessor is very slow in .NET. I believe this is because each call has to acquire and release a pointer, in addition to range checking. Reading a single byte at a time is very common in our codebase, for example in LZ4 decompression, which was a hotspot in the profile. The performance difference between .NET and Java memory-mapped files disappears when you read multiple bytes at once via ReadArray.

While it might be possible to create a ByteBuffer implementation for memory-mapped files that also has an internal buffer to read 1 kB at a time, I decided to take a stab at making MMapDirectory use an approach similar to SimpleFSDirectory: its IndexInput implementation now inherits from BufferedIndexInput, which maintains a 1 kB buffer over the input. When it needs to refill that buffer, it reads from the memory-mapped file view accessor. Because ReadByte is then served from the already-filled buffer, which is a fast array operation, the result is a massive speed-up.
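To make the buffering idea concrete, here is a minimal, language-neutral sketch in Python. It is not the code in this PR and not the Lucene.NET API: `CountingSource` and `BufferedInput` are hypothetical names, and `CountingSource` merely stands in for a memory-mapped view so that the number of bulk reads can be counted. The 1 kB buffer size matches the one described above.

```python
class CountingSource:
    """Hypothetical stand-in for a memory-mapped view; counts bulk reads."""

    def __init__(self, data: bytes):
        self._data = data
        self.calls = 0  # number of times the underlying source was touched

    def __len__(self) -> int:
        return len(self._data)

    def read_into(self, position: int, dest: bytearray, count: int) -> int:
        """Copy up to `count` bytes starting at `position` into `dest`."""
        self.calls += 1
        chunk = self._data[position:position + count]
        dest[:len(chunk)] = chunk
        return len(chunk)


class BufferedInput:
    """Hypothetical buffered reader: single-byte reads hit an in-memory array."""

    BUFFER_SIZE = 1024  # 1 kB, as described in the PR text

    def __init__(self, source: CountingSource):
        self._source = source
        self._buffer = bytearray(self.BUFFER_SIZE)
        self._buffer_start = 0   # offset in the source of buffer[0]
        self._buffer_length = 0  # number of valid bytes in the buffer
        self._buffer_pos = 0     # index of the next byte to return

    def read_byte(self) -> int:
        # Fast path: serve from the buffer; refill only when it is exhausted.
        if self._buffer_pos >= self._buffer_length:
            self._refill()
        value = self._buffer[self._buffer_pos]
        self._buffer_pos += 1
        return value

    def _refill(self) -> None:
        # Slow path: one bulk read from the underlying source.
        start = self._buffer_start + self._buffer_length
        remaining = len(self._source) - start
        if remaining <= 0:
            raise EOFError("read past end of input")
        count = min(self.BUFFER_SIZE, remaining)
        self._buffer_length = self._source.read_into(start, self._buffer, count)
        self._buffer_start = start
        self._buffer_pos = 0


data = bytes(range(256)) * 10          # 2560 bytes of sample input
source = CountingSource(data)
reader = BufferedInput(source)
result = bytes(reader.read_byte() for _ in range(len(data)))
```

In this sketch, reading all 2560 bytes one at a time touches the underlying source only three times (two full 1 kB refills and one partial refill) instead of once per byte, which is the effect the change relies on.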

Initial performance results (compare to results in #1151), macOS arm64, .NET 9:

  • SimpleFS: 3.96s
  • NIOFS: 6.01s
  • MMap: 2.47s (0.62x SimpleFS)

Development

Successfully merging this pull request may close these issues.

Search performance issue with MMapDirectory under load
