Descriptor Memory Usage

Chaunte W. Lacewell edited this page Oct 4, 2023 · 1 revision

Memory Usage of Different VCL Indexing Engines

The user can select different engines to build an index file for descriptors. The indices are used for similarity search (KNN) queries using an input descriptor with D dimensions. Choosing the right engine and the index to use depends on the available system memory, total number of descriptors, the dimension of the descriptor, the overhead for insert and search process, and the target accuracy the user is interested in for that specific use case.

  • FAISS Engine: The VCL library supports the FLAT-L2, FLAT-IP, IVFFLAT-L2, and IVFFLAT-IP indices.

    • The FLAT index is the only index that guarantees exact results (for both the L2 and IP metrics), and it provides the baseline results against which the other indices are compared. It does not compress the vectors and therefore has the largest memory footprint of all the indices: for N descriptors with D dimensions each, the index memory footprint is N x D x 4 bytes (the size of a float). For example, for 10,000 embeddings of dimension 1K each, the memory footprint of the FLAT index will be 40MB.

    • The IVFFLAT index returns approximate results for similarity search queries. It speeds up search by clustering the descriptors (organizing them into buckets) and returning the nearest neighbors found within the clusters searched. The nprobe parameter determines the number of clusters searched before the nearest-neighbor results are returned (increasing nprobe increases search latency linearly). As for the memory footprint, IVFFLAT stores the exact descriptors uncompressed (like the FLAT index), so it uses N x D x 4 bytes to store N descriptors with D dimensions each. For example, for 10,000 embeddings of dimension 1K each, the memory footprint of the IVFFLAT index will be 40MB.
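As a quick check on the arithmetic above, the footprint of the uncompressed FAISS indices can be estimated directly. This is a minimal sketch; the helper name is illustrative and not part of the VCL API:

```python
def flat_footprint_bytes(n_descriptors: int, dimension: int) -> int:
    """Both FLAT and IVFFLAT store raw float32 vectors,
    so the index footprint is N x D x 4 bytes."""
    return n_descriptors * dimension * 4

# The example from the text: 10,000 embeddings of dimension 1K
print(flat_footprint_bytes(10_000, 1_000))  # 40,000,000 bytes, i.e. 40 MB
```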

  • FLINNG Engine: The VCL library supports FLINNG-L2 and FLINNG-IP. FLINNG is an approximate index that returns approximate results for nearest-neighbor queries. It does not store the exact descriptors but instead represents the index with randomized hash-based data structures. FLINNG is attractive for datasets with a large number of dimensions (e.g., >1000, where the curse of dimensionality applies) and when the total number of descriptors is also very large (millions). FLINNG will have lower accuracy than the other engines when the number of dimensions is small. FLINNG uses an almost fixed amount of memory irrespective of the total number of vectors. The parameter num_rows (the number of hash tables used) is important for performance: query time (and, similarly, indexing time) is linearly proportional to it, and query accuracy also increases asymptotically with num_rows. The total memory footprint of FLINNG is approximately (2^cells_per_row) x num_rows x 8B + 2^num_hash_tables x 8B + N x 8B. The default parameters are set in src/vcl/DescriptorParams.h and can be overridden in the function call that creates the FLINNG index. The default parameters should support 10M embeddings and should consume around 4GB of memory, irrespective of the dimensionality of the descriptors.
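The FLINNG footprint formula above can likewise be sketched in code. The helper below is illustrative, not part of the VCL API, and the parameter values in the usage line are hypothetical; the actual defaults live in src/vcl/DescriptorParams.h:

```python
def flinng_footprint_bytes(n_descriptors: int,
                           cells_per_row: int,
                           num_rows: int,
                           num_hash_tables: int) -> int:
    """Approximate FLINNG footprint per the formula above:
    (2^cells_per_row) x num_rows x 8B + 2^num_hash_tables x 8B + N x 8B.
    Note that the descriptor dimension D does not appear: only the
    per-descriptor 8-byte term grows with the dataset size."""
    return ((2 ** cells_per_row) * num_rows * 8
            + (2 ** num_hash_tables) * 8
            + n_descriptors * 8)

# Hypothetical parameter values, purely for illustration:
print(flinng_footprint_bytes(10_000_000, 14, 100, 18))  # 95,204,352 bytes
```

Because the hash-table terms are fixed once the parameters are chosen, doubling N only grows the footprint by N x 8 bytes, in contrast to the N x D x 4 growth of the FAISS indices.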
