Create Dataclasses and Builder for GPU Index Build Config #16

Merged
merged 15 commits into from
Mar 24, 2025

Conversation

Rajrahane
Member

@Rajrahane Rajrahane commented Feb 28, 2025

Description

Implements

  1. Base interfaces for the IndexBuildService orchestrator and its workflow tasks: GPUIndexBuildService, GPUToCPUIndexConverter, CPUIndexWriter.
  2. Base response and config models for the GPU and CPU index.
  3. Concrete Faiss-specific data models, including response models for the GPU and CPU index.
  4. Faiss Index Build Service build_index method, which creates a GPU index, converts it to a CPU index, writes the CPU index to a local file path on disk, and deletes both indexes after writing the file.
  5. Utils to create the params required for GPUIndexConfig from the API IndexBuildParameters.
Screenshot 2025-03-21 at 11 28 31 AM

Issues Resolved

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@Rajrahane
Member Author

Still a WIP
Here's how you can initialize the config:

def create_index_config(**kwargs) -> GPUIndexBuildConfig:
    builder = IndexConfigBuilder()
    director = IndexConfigDirector(builder)
    return director.construct_config(kwargs)

print(create_index_config(metric='cosinesimil', gpu_config={'graph_build_algo': 'NN_DECENT', 'ivf_pq_build_params': {'n_lists': 1040}}))
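For readers unfamiliar with the builder/director split used above, here is a minimal sketch of how the two classes could fit together. It is not the PR's code: the fields on GPUIndexBuildConfig, the setter names, and the defaults are all assumptions made for illustration. Only the class names and the construct_config(kwargs) call come from the snippet above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class GPUIndexBuildConfig:
    # Hypothetical fields and defaults, chosen only for this sketch.
    metric: str = "l2"
    gpu_config: Dict[str, Any] = field(default_factory=dict)


class IndexConfigBuilder:
    """Accumulates settings step by step, then hands back the config."""

    def __init__(self) -> None:
        self._config = GPUIndexBuildConfig()

    def set_metric(self, metric: str) -> "IndexConfigBuilder":
        self._config.metric = metric
        return self

    def set_gpu_config(self, gpu_config: Dict[str, Any]) -> "IndexConfigBuilder":
        # Copy so later changes to the caller's dict do not leak in.
        self._config.gpu_config = dict(gpu_config)
        return self

    def build(self) -> GPUIndexBuildConfig:
        return self._config


class IndexConfigDirector:
    """Decides which builder steps to run for a given params dict."""

    def __init__(self, builder: IndexConfigBuilder) -> None:
        self._builder = builder

    def construct_config(self, params: Dict[str, Any]) -> GPUIndexBuildConfig:
        # Only call a setter when the key is present, so defaults survive.
        if "metric" in params:
            self._builder.set_metric(params["metric"])
        if "gpu_config" in params:
            self._builder.set_gpu_config(params["gpu_config"])
        return self._builder.build()
```

The director owns the "which keys are present" logic, so the builder stays a plain set of setters.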

@Rajrahane Rajrahane marked this pull request as ready for review March 3, 2025 20:21
@Rajrahane Rajrahane force-pushed the build_gpu_index branch 5 times, most recently from fffcd66 to 66aa62a Compare March 5, 2025 23:03
@rchitale7
Member

@Rajrahane For general feedback - I feel that the index_config_builder.py and index_config_director.py classes are unnecessary, and can be consolidated into faiss_index_builder.py and faiss_index_config_builder.py. It will help make the code more understandable.

@Rajrahane
Member Author

As suggested by Rohan, I've tried consolidating the creation of the GPUIndexBuildConfig into a single, simpler function instead of the elaborate index_config_builder and index_config_director.

The issue is that it was difficult to parse a complex multilevel dict with this structure:
GPUIndexBuildConfig
    -> GPUIndexCagraConfig (with IVFPQBuild and IVFPQSearch configs)
    -> IndexHNSWConfig

Because the incoming dict is complex and can have missing fields, it threw a lot of NULL issues, so I had to create the inner objects manually using helpers.
E.g.:
ivf_pq_build_config = (
    IVFPQBuildCagraConfig(**ivf_pq_build_params)
    if ivf_pq_build_params
    else IVFPQBuildCagraConfig()
)
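To make the missing-field problem concrete, here is a minimal sketch of building nested dataclasses from a partial dict. The class names follow the comment above, but every field, default value, and the cagra_config_from_dict helper are assumptions for illustration, not the PR's actual definitions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class IVFPQBuildCagraConfig:
    n_lists: int = 1024  # illustrative default, not the project's


@dataclass
class IVFPQSearchCagraConfig:
    n_probes: int = 20  # hypothetical field and default


@dataclass
class GPUIndexCagraConfig:
    graph_build_algo: str = "IVF_PQ"
    ivf_pq_build_config: IVFPQBuildCagraConfig = field(
        default_factory=IVFPQBuildCagraConfig
    )
    ivf_pq_search_config: IVFPQSearchCagraConfig = field(
        default_factory=IVFPQSearchCagraConfig
    )


def cagra_config_from_dict(params: Optional[Dict[str, Any]]) -> GPUIndexCagraConfig:
    """Build the nested config, falling back to defaults for absent keys."""
    params = dict(params or {})  # copy; tolerate None
    build_params = params.pop("ivf_pq_build_params", None)
    search_params = params.pop("ivf_pq_search_params", None)
    return GPUIndexCagraConfig(
        ivf_pq_build_config=(
            IVFPQBuildCagraConfig(**build_params)
            if build_params
            else IVFPQBuildCagraConfig()
        ),
        ivf_pq_search_config=(
            IVFPQSearchCagraConfig(**search_params)
            if search_params
            else IVFPQSearchCagraConfig()
        ),
        **params,  # remaining top-level keys, e.g. graph_build_algo
    )
```

Popping the nested keys first means the inner dicts never reach the outer constructor, which is where passing the raw dict straight through with ** breaks down.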

I also put the builder pattern in a separate file just to make it more structured; otherwise the single function would be very large.

I also had to preserve the single responsibility of the main function, build_gpu_index in the FaissIndexBuilder.py file, so that it only calls each of the steps:

  1. build the core dataclass from dict
  2. build the faiss binding config version from the dataclass
  3. create the gpu index
  4. create the cpu index and write to disk
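The four steps above can be sketched as a thin orchestration function. This is only an illustration of the single-responsibility shape being described: the stand-in GPUIndexBuildConfig, the injected callables, and their signatures are hypothetical, and no Faiss calls are made.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class GPUIndexBuildConfig:
    # Stand-in for the real dataclass; the single field is an assumption.
    metric: str = "l2"


def build_gpu_index(
    params: Dict[str, Any],
    output_path: str,
    *,
    to_binding: Callable[[GPUIndexBuildConfig], Any],
    create_gpu_index: Callable[[Any], Any],
    create_and_write_cpu_index: Callable[[Any, str], str],
) -> str:
    """Only sequences the steps; each step's logic lives in its helper."""
    config = GPUIndexBuildConfig(**params)          # 1. core dataclass from dict
    binding_config = to_binding(config)             # 2. library binding config
    gpu_index = create_gpu_index(binding_config)    # 3. create the GPU index
    # 4. convert to a CPU index and write it to disk
    return create_and_write_cpu_index(gpu_index, output_path)
```

Passing the helpers in as arguments is a choice made for this sketch so that it runs without a GPU; in the PR they are described as helper methods on the builder.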

@Rajrahane
Member Author

Also, the problem to solve, in a nutshell:
The goal is to completely configure the GPUIndexConfig from the IndexBuildParameters and convert it into Faiss IndexConfig bindings before feeding it to the _create_gpu_index and _create_and_write_cpu_index_to_file helper methods.

@Rajrahane Rajrahane force-pushed the build_gpu_index branch 2 times, most recently from a143bfd to 9a2cb6b Compare March 18, 2025 17:21
Signed-off-by: Rajvaibhav Rahane <[email protected]>
@Rajrahane Rajrahane marked this pull request as draft March 18, 2025 18:13
@Rajrahane Rajrahane force-pushed the build_gpu_index branch 3 times, most recently from b98d948 to ae5d420 Compare March 18, 2025 22:09
Signed-off-by: Rajvaibhav Rahane <[email protected]>
@Rajrahane Rajrahane marked this pull request as ready for review March 21, 2025 18:09
Signed-off-by: Rajvaibhav Rahane <[email protected]>
@Rajrahane
Member Author

Screenshot 2025-03-21 at 11 28 31 AM

Updated Design

rchitale7
rchitale7 previously approved these changes Mar 21, 2025
Collaborator

@jed326 jed326 left a comment

Thanks @Rajrahane , this PR mostly LGTM, just a few spots of discussion/nits

Signed-off-by: Rajvaibhav Rahane <[email protected]>
@Rajrahane Rajrahane merged commit 93a4558 into opensearch-project:main Mar 24, 2025
4 checks passed
@Rajrahane Rajrahane deleted the build_gpu_index branch June 3, 2025 18:04
4 participants