[Implementation] Create Docker image for Remote Vector Index Builder Core library #20

@rchitale7

Description

This issue tracks the creation of the Remote Vector Index Builder core Docker image, which will be published to the opensearch-staging Docker Hub. The image will contain the core tasks needed to construct an index on a fleet of hardware accelerators (such as GPUs) for an OpenSearch cluster. The tasks are:

  • create_vectors_dataset - Downloads the vector and doc id blobs from a Remote Store repository. These blobs have previously been uploaded by the OpenSearch vector engine.
  • build_index - Creates an index on a hardware accelerator fleet from the vector and doc id blobs.
  • upload_index - Uploads the index to the Remote Store repository. The OpenSearch vector engine then downloads this index and stores it on disk for subsequent search requests.
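
The three tasks form a simple pipeline. The sketch below only illustrates how data might flow between them. The function signatures, the `BuildRequest` fields, and the in-memory `FakeRemoteStore` are assumptions made for illustration, not the core library's actual API, and the "index" here is a plain mapping standing in for a real accelerator build.

```python
from dataclasses import dataclass, field


@dataclass
class FakeRemoteStore:
    """Hypothetical in-memory stand-in for a Remote Store repository."""
    blobs: dict = field(default_factory=dict)

    def read(self, key):
        return self.blobs[key]

    def write(self, key, data):
        self.blobs[key] = data


@dataclass
class BuildRequest:
    """Hypothetical request parameters; the real field names may differ."""
    vector_path: str
    doc_id_path: str
    index_path: str


def create_vectors_dataset(store, request):
    # Download the vector and doc id blobs uploaded earlier by the vector engine.
    vectors = store.read(request.vector_path)
    doc_ids = store.read(request.doc_id_path)
    if len(vectors) != len(doc_ids):
        raise ValueError("vector and doc id counts differ")
    return vectors, doc_ids


def build_index(vectors, doc_ids):
    # Placeholder for the accelerator build: just a doc id -> vector mapping.
    return dict(zip(doc_ids, vectors))


def upload_index(store, request, index):
    # Upload the built index so the vector engine can download it later.
    store.write(request.index_path, index)
    return request.index_path


def run_build(store, request):
    # Call the three tasks in sequence for a single build request.
    vectors, doc_ids = create_vectors_dataset(store, request)
    index = build_index(vectors, doc_ids)
    return upload_index(store, request, index)
```

Keeping each task as a separate function mirrors the description above: a worker can call them one after another, and a failure in any step surfaces before the upload happens.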

The core Docker image does not offer out-of-the-box APIs that integrate directly with the OpenSearch vector engine. Instead, the user is responsible for implementing the control plane and data plane that provide these APIs and that integrate with core under the hood. The vector engine sends build requests to the control plane, which then schedules them on data plane workers. For a given build request, the worker reads the request parameters and calls the core tasks in sequence. One possible worker implementation is given in opensearch-project/k-NN#2545; however, users are free to choose a different one.
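
To make the split of responsibilities concrete, here is a minimal sketch of a control plane that queues build requests and a data plane worker that drains them. Everything in it is hypothetical: the `ControlPlane` class, the status strings, the dictionary-shaped request, and `run_core_tasks` (a placeholder for calling the three core tasks in sequence) are illustrative assumptions, not part of the core image or of the implementation linked above.

```python
import queue
import threading


def run_core_tasks(request):
    # Hypothetical placeholder for calling create_vectors_dataset, build_index,
    # and upload_index in sequence; returns the path of the uploaded index.
    return request["index_path"]


class ControlPlane:
    """Hypothetical control plane: accepts build requests and tracks status."""

    def __init__(self):
        self.pending = queue.Queue()
        self.status = {}
        self._lock = threading.Lock()

    def submit(self, job_id, request):
        # Called on behalf of the vector engine to schedule a build.
        with self._lock:
            self.status[job_id] = "RUNNING"
        self.pending.put((job_id, request))

    def complete(self, job_id, outcome):
        # Called by a worker once the build has finished or failed.
        with self._lock:
            self.status[job_id] = outcome


def worker_loop(control_plane, run_tasks):
    # Data plane worker: take requests off the queue and run the core tasks.
    while True:
        item = control_plane.pending.get()
        if item is None:  # sentinel value: stop the worker
            break
        job_id, request = item
        try:
            run_tasks(request)
            control_plane.complete(job_id, "COMPLETED")
        except Exception:
            control_plane.complete(job_id, "FAILED")
```

Passing `run_tasks` in as a parameter keeps the worker independent of how the core tasks are invoked, which fits the note that users may choose their own implementation.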

For now, the core image will only support building faiss indices on GPUs, with S3 as the Remote Store. Future enhancements will be tracked in separate issues.

The GitHub CI workflow that publishes the core image to the opensearch-staging Docker Hub will use the GPU 2xlarge runners, so that it can reuse the cached layers of the faiss-base image and speed up builds. However, if we are later capacity constrained by integration tests that use those runners, we can fall back to the default ubuntu-latest runners for building the core image.
