-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce raw fwd index version V5 containing implicit num doc length, improving space efficiency #14105
Merged
Jackie-Jiang
merged 104 commits into
apache:master
from
y-scope:master-improved-MV-fixed-byte-index
Oct 17, 2024
Merged
Introduce raw fwd index version V5 containing implicit num doc length, improving space efficiency #14105
Changes from 12 commits
Commits
Show all changes
104 commits
Select commit
Hold shift + click to select a range
84987ed
Initial implementation of toggling explicit MV entry size for MVFixed…
jackluo923 d654fd9
Fixed uncovered code paths exposed via unit test
jackluo923 3d4b99b
Fix style issue
jackluo923 8c967b5
Refactored code to use new class versions.
jackluo923 c2359ec
Fixed style.
jackluo923 dd3410f
Refactored MultiValueFixedByteRawIndexCreatorTest.java
jackluo923 0c0df84
Fix style.
jackluo923 e7e091b
Modified existing unit test and extended it for MultiValueFixedByteRa…
jackluo923 153be16
Improved unit test for MultiValueFixedByteRawIndexCreatorTest and Mul…
jackluo923 0233905
Remove redundant blank line
jackluo923 69defe1
Adjusted comments content
jackluo923 e1173c0
Removed redundant constructor missed during refactoring.
jackluo923 34ac786
Upgrade MVFixedByteRawIndex reader and writer from V4 to V5, retain f…
jackluo923 b090676
Minor changes in MultiValueFixedByteRawIndexCreator
jackluo923 2ff1914
Fix minor style issue.
jackluo923 54b2709
Refactored FixByteChunkMVForwardIndexReader
jackluo923 318b826
Deleted FixByteChunkMVForwardIndexReaderV2
jackluo923 a9170b7
Deleted FixByteChunkMVForwardIndexReaderV2Test
jackluo923 d699c2a
Add VarByteChunkV5Test unit test
jackluo923 7137792
Add license to VarByteChunkV5Test unit test
jackluo923 6452c79
Improved unit test
jackluo923 9bfbd22
Refactored unit test
jackluo923 fac46c5
Add blank line
jackluo923 29a9fdb
Remove blank line
jackluo923 8abf7fe
Add blank line
jackluo923 1c877c6
Remove blank line
jackluo923 ce6870b
Rebase with lastest master
jackluo923 0d71f91
Fixed uncovered code paths exposed via unit test
jackluo923 736f23f
Fix style issue
jackluo923 f2faece
Refactored code to use new class versions.
jackluo923 01cbf56
Fixed style.
jackluo923 f8c2f24
Refactored MultiValueFixedByteRawIndexCreatorTest.java
jackluo923 a6ca351
Fix style.
jackluo923 61fee18
Modified existing unit test and extended it for MultiValueFixedByteRa…
jackluo923 500dae6
Improved unit test for MultiValueFixedByteRawIndexCreatorTest and Mul…
jackluo923 427bdb5
Remove redundant blank line
jackluo923 0cb705d
Adjusted comments content
jackluo923 1617ebd
Removed redundant constructor missed during refactoring.
jackluo923 813d360
Upgrade MVFixedByteRawIndex reader and writer from V4 to V5, retain f…
jackluo923 7a565e4
Fix minor style issue.
jackluo923 296959d
Deleted FixByteChunkMVForwardIndexReaderV2
jackluo923 0d7073b
Deleted FixByteChunkMVForwardIndexReaderV2Test
jackluo923 427af58
Add VarByteChunkV5Test unit test
jackluo923 6e8c3ae
Add license to VarByteChunkV5Test unit test
jackluo923 52976a7
Improved unit test
jackluo923 9bb453a
Refactored unit test
jackluo923 93d3100
Add blank line
jackluo923 79e91e9
Remove blank line
jackluo923 9a7676f
Add blank line
jackluo923 aa9eb74
Remove blank line
jackluo923 4ce5280
Refactored code to utilize changes from Extract common MV ser/de logi…
jackluo923 0d88f2f
Merge remote-tracking branch 'origin/master-improved-MV-fixed-byte-in…
jackluo923 ec1b628
Removed redundant RuntimeException from method signature
jackluo923 87dd327
Merge branch 'apache:master' into master-improved-MV-fixed-byte-index
jackluo923 e7f645c
Rebase with lastest master
jackluo923 5cb4575
Fixed uncovered code paths exposed via unit test
jackluo923 1ece331
Fix style issue
jackluo923 4c35683
Refactored code to use new class versions.
jackluo923 bd1da13
Fixed style.
jackluo923 2637e2d
Refactored MultiValueFixedByteRawIndexCreatorTest.java
jackluo923 731906e
Fix style.
jackluo923 171aaf4
Modified existing unit test and extended it for MultiValueFixedByteRa…
jackluo923 bca5eda
Improved unit test for MultiValueFixedByteRawIndexCreatorTest and Mul…
jackluo923 ce5eb7b
Remove redundant blank line
jackluo923 7274f4c
Adjusted comments content
jackluo923 ea29a13
Removed redundant constructor missed during refactoring.
jackluo923 256d774
Upgrade MVFixedByteRawIndex reader and writer from V4 to V5, retain f…
jackluo923 acfe864
Fix minor style issue.
jackluo923 a4751b6
Deleted FixByteChunkMVForwardIndexReaderV2
jackluo923 32062a1
Deleted FixByteChunkMVForwardIndexReaderV2Test
jackluo923 ef3f663
Add VarByteChunkV5Test unit test
jackluo923 38a8cb6
Add license to VarByteChunkV5Test unit test
jackluo923 b8dfacd
Improved unit test
jackluo923 bd9bdee
Refactored unit test
jackluo923 5b6e29e
Add blank line
jackluo923 597762f
Remove blank line
jackluo923 ff21345
Add blank line
jackluo923 cde2a6d
Remove blank line
jackluo923 d430d61
Refactored code to utilize changes from Extract common MV ser/de logi…
jackluo923 3f68f75
Rebased to latest
jackluo923 27328bf
Rebase to latest
jackluo923 79c6f66
Refactored code to use new class versions.
jackluo923 e9778d3
Fixed style.
jackluo923 b43f676
Refactored MultiValueFixedByteRawIndexCreatorTest.java
jackluo923 c6033b9
Fix style.
jackluo923 3f654e4
Modified existing unit test and extended it for MultiValueFixedByteRa…
jackluo923 12af1ce
Improved unit test for MultiValueFixedByteRawIndexCreatorTest and Mul…
jackluo923 063c5b4
Adjusted comments content
jackluo923 b8794f2
Upgrade MVFixedByteRawIndex reader and writer from V4 to V5, retain f…
jackluo923 9812e3e
Deleted FixByteChunkMVForwardIndexReaderV2
jackluo923 085aed6
Deleted FixByteChunkMVForwardIndexReaderV2Test
jackluo923 7e1d10c
Improved unit test
jackluo923 d06dda4
Refactored unit test
jackluo923 e9835c5
Add blank line
jackluo923 06c0b95
Remove blank line
jackluo923 07d6f75
Add blank line
jackluo923 680fc24
Remove blank line
jackluo923 1b22234
Removed redundant RuntimeException from method signature
jackluo923 cfbc9ee
Merge remote-tracking branch 'origin/master-improved-MV-fixed-byte-in…
jackluo923 0da0ca7
Updated javadoc for VarByteChunkForwardIndexWriterV5
jackluo923 89ec8af
Addressed code review comments to use `getVersion()` in forward index…
jackluo923 592967a
Addressed final minor code review suggestion.
jackluo923 44b0df8
Change getConcreteClassVersion back to getVersion
jackluo923 6fe4517
Adjusted member variable scope in VarByteChunkForwardIndexWriterV4
jackluo923 File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
116 changes: 116 additions & 0 deletions
116
...he/pinot/segment/local/segment/creator/impl/fwd/MultiValueFixedByteRawIndexCreatorV2.java
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,116 @@ | ||
/** | ||
* Licensed to the Apache Software Foundation (ASF) under one | ||
* or more contributor license agreements. See the NOTICE file | ||
* distributed with this work for additional information | ||
* regarding copyright ownership. The ASF licenses this file | ||
* to you under the Apache License, Version 2.0 (the | ||
* "License"); you may not use this file except in compliance | ||
* with the License. You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, | ||
* software distributed under the License is distributed on an | ||
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
* KIND, either express or implied. See the License for the | ||
* specific language governing permissions and limitations | ||
* under the License. | ||
*/ | ||
package org.apache.pinot.segment.local.segment.creator.impl.fwd; | ||
|
||
import java.io.File; | ||
import java.io.IOException; | ||
import java.nio.ByteBuffer; | ||
import org.apache.pinot.segment.spi.compression.ChunkCompressionType; | ||
import org.apache.pinot.spi.data.FieldSpec.DataType; | ||
|
||
|
||
/** | ||
Same as MultiValueFixedByteRawIndexCreator, but without storing the number of elements for each row. | ||
*/ | ||
public class MultiValueFixedByteRawIndexCreatorV2 extends MultiValueFixedByteRawIndexCreator { | ||
/** | ||
* Create a var-byte raw index creator for the given column | ||
* | ||
* @param baseIndexDir Index directory | ||
* @param compressionType Type of compression to use | ||
* @param column Name of column to index | ||
* @param totalDocs Total number of documents to index | ||
* @param valueType Type of the values | ||
* @param deriveNumDocsPerChunk true if writer should auto-derive the number of rows per chunk | ||
* @param writerVersion writer format version | ||
* @param targetMaxChunkSizeBytes target max chunk size in bytes, applicable only for V4 or when | ||
* deriveNumDocsPerChunk is true | ||
*/ | ||
public MultiValueFixedByteRawIndexCreatorV2(File baseIndexDir, ChunkCompressionType compressionType, String column, | ||
int totalDocs, DataType valueType, int maxNumberOfMultiValueElements, boolean deriveNumDocsPerChunk, | ||
int writerVersion, int targetMaxChunkSizeBytes, int targetDocsPerChunk) | ||
throws IOException { | ||
super(baseIndexDir, compressionType, column, totalDocs, valueType, maxNumberOfMultiValueElements, | ||
deriveNumDocsPerChunk, writerVersion, targetMaxChunkSizeBytes, targetDocsPerChunk); | ||
} | ||
|
||
public MultiValueFixedByteRawIndexCreatorV2(File indexFile, ChunkCompressionType compressionType, int totalDocs, | ||
DataType valueType, int maxNumberOfMultiValueElements, boolean deriveNumDocsPerChunk, int writerVersion) | ||
throws IOException { | ||
super(indexFile, compressionType, totalDocs, valueType, maxNumberOfMultiValueElements, deriveNumDocsPerChunk, | ||
writerVersion); | ||
} | ||
|
||
public MultiValueFixedByteRawIndexCreatorV2(File indexFile, ChunkCompressionType compressionType, int totalDocs, | ||
DataType valueType, int maxNumberOfMultiValueElements, boolean deriveNumDocsPerChunk, int writerVersion, | ||
int targetMaxChunkSizeBytes, int targetDocsPerChunk) | ||
throws IOException { | ||
super(indexFile, compressionType, totalDocs, valueType, maxNumberOfMultiValueElements, deriveNumDocsPerChunk, | ||
writerVersion, targetMaxChunkSizeBytes, targetDocsPerChunk); | ||
} | ||
|
||
@Override | ||
protected int computeTotalMaxLength(int maxNumberOfMultiValueElements, DataType valueType) { | ||
return maxNumberOfMultiValueElements * valueType.getStoredType().size(); | ||
} | ||
|
||
@Override | ||
public void putIntMV(int[] values) { | ||
byte[] bytes = new byte[values.length * Integer.BYTES]; | ||
ByteBuffer byteBuffer = ByteBuffer.wrap(bytes); | ||
//write the content of each element | ||
for (int value : values) { | ||
byteBuffer.putInt(value); | ||
} | ||
_indexWriter.putBytes(bytes); | ||
} | ||
|
||
@Override | ||
public void putLongMV(long[] values) { | ||
byte[] bytes = new byte[values.length * Long.BYTES]; | ||
ByteBuffer byteBuffer = ByteBuffer.wrap(bytes); | ||
//write the content of each element | ||
for (long value : values) { | ||
byteBuffer.putLong(value); | ||
} | ||
_indexWriter.putBytes(bytes); | ||
} | ||
|
||
@Override | ||
public void putFloatMV(float[] values) { | ||
byte[] bytes = new byte[values.length * Float.BYTES]; | ||
ByteBuffer byteBuffer = ByteBuffer.wrap(bytes); | ||
//write the content of each element | ||
for (float value : values) { | ||
byteBuffer.putFloat(value); | ||
} | ||
_indexWriter.putBytes(bytes); | ||
} | ||
|
||
@Override | ||
public void putDoubleMV(double[] values) { | ||
byte[] bytes = new byte[values.length * Double.BYTES]; | ||
ByteBuffer byteBuffer = ByteBuffer.wrap(bytes); | ||
//write the content of each element | ||
for (double value : values) { | ||
byteBuffer.putDouble(value); | ||
} | ||
_indexWriter.putBytes(bytes); | ||
} | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
45 changes: 45 additions & 0 deletions
45
...not/segment/local/segment/index/readers/forward/FixedByteChunkMVForwardIndexReaderV2.java
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,45 @@ | ||
/** | ||
* Licensed to the Apache Software Foundation (ASF) under one | ||
* or more contributor license agreements. See the NOTICE file | ||
* distributed with this work for additional information | ||
* regarding copyright ownership. The ASF licenses this file | ||
* to you under the Apache License, Version 2.0 (the | ||
* "License"); you may not use this file except in compliance | ||
* with the License. You may obtain a copy of the License at | ||
* | ||
* http://www.apache.org/licenses/LICENSE-2.0 | ||
* | ||
* Unless required by applicable law or agreed to in writing, | ||
* software distributed under the License is distributed on an | ||
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
* KIND, either express or implied. See the License for the | ||
* specific language governing permissions and limitations | ||
* under the License. | ||
*/ | ||
package org.apache.pinot.segment.local.segment.index.readers.forward; | ||
|
||
import java.nio.ByteBuffer; | ||
import org.apache.pinot.segment.spi.memory.PinotDataBuffer; | ||
import org.apache.pinot.spi.data.FieldSpec.DataType; | ||
|
||
|
||
/** | ||
Same as FixedByteChunkMVForwardIndexReader, but the number of elements for each row is inferred | ||
*/ | ||
public final class FixedByteChunkMVForwardIndexReaderV2 extends FixedByteChunkMVForwardIndexReader { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Let's call it V5 to be consistent with index version There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done |
||
|
||
public FixedByteChunkMVForwardIndexReaderV2(PinotDataBuffer dataBuffer, DataType storedType) { | ||
super(dataBuffer, storedType); | ||
} | ||
|
||
@Override | ||
public int getNumValuesMV(int docId, ChunkReaderContext context) { | ||
ByteBuffer byteBuffer = slice(docId, context); | ||
jackluo923 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
return getNumValuesMV(byteBuffer); | ||
} | ||
|
||
@Override | ||
protected int getNumValuesMV(ByteBuffer byteBuffer) { | ||
return byteBuffer.remaining() / _storedType.size(); | ||
} | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't want to add a new creator because creator is used to handle creation of different version of forward index. Instead, we want to add a new raw index version
v5
for this new formatThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done