# Descriptor Memory Usage
The user can select different engines to build an index file for descriptors. The indices are used for similarity search (KNN) queries using an input descriptor with `D` dimensions. Choosing the right engine and index depends on the available system memory, the total number of descriptors, the descriptor dimension, the insert and search overhead, and the target accuracy required for the specific use case.
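The engine and metric are chosen when the descriptor set is created. Below is a minimal, hedged sketch of such a request, assuming the VDMS JSON query style with `name`, `dimensions`, `metric`, and `engine` fields; the engine string used here is illustrative, so check the AddDescriptorSet documentation for the exact supported values.

```python
import json

# Hypothetical AddDescriptorSet command that selects the index engine and
# distance metric at creation time. The field names follow the VDMS JSON
# query style; the engine string below is an assumption -- check the
# AddDescriptorSet documentation for the exact supported values.
add_descriptor_set = {
    "AddDescriptorSet": {
        "name": "example_set",     # illustrative set name
        "dimensions": 1000,        # descriptor dimension D
        "metric": "L2",            # L2 or IP
        "engine": "FaissIVFFlat",  # assumed engine identifier
    }
}

# The command would be sent to the server as a one-element transaction,
# e.g. via the VDMS client's query call.
print(json.dumps([add_descriptor_set], indent=2))
```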
- FAISS Engine: The VCL library supports `FLAT-L2`, `FLAT-IP`, `IVFFLAT-L2`, and `IVFFLAT-IP` indices.
  - The `FLAT` index is the only index that guarantees exact results (for both `L2` and `IP` metrics). It provides the baseline results for the other indices. It does not compress the vectors and has the largest memory footprint of all the indices. For `N` descriptors with `D` dimensions each, the index memory footprint is `N x D x 4` bytes (the size of the float datatype). For example, for 10,000 embeddings, each of dimension 1K, the memory footprint of the `FLAT` index is 40 MB (see the memory-footprint sketch after this list).
  - The `IVFFLAT` index returns approximate results for similarity search queries. It speeds up search by clustering the descriptors (organizing them into buckets) and returning the nearest neighbors found within the searched clusters. The `nprobe` parameter determines the number of clusters searched before the nearest-neighbor results are returned (increasing `nprobe` increases search latency linearly). As for the memory footprint, since `IVFFLAT` stores the exact descriptors uncompressed (like the `FLAT` index), it uses `N x D x 4` bytes to store `N` descriptors of dimension `D` each. For example, for 10,000 embeddings, each of dimension 1K, the memory footprint of the `IVFFLAT` index is 40 MB.
- FLINNG Engine: The VCL library supports `FLINNG-L2` and `FLINNG-IP`. FLINNG is an approximate index that returns approximate results for nearest-neighbor queries. It does not store the exact descriptors but uses randomized hash-based data structures to represent the index. FLINNG is attractive for datasets with a large number of dimensions (e.g., more than 1000, where the curse of dimensionality applies) and when the total number of descriptors is also very large (millions). FLINNG has lower accuracy than the other libraries when the number of dimensions is small. FLINNG uses an almost fixed amount of memory irrespective of the total number of vectors. The `num_rows` parameter (the number of hash tables used) is important for query time: the query time (and, similarly, the indexing time) is linearly proportional to it, and the query accuracy also increases asymptotically with `num_rows`. The total memory footprint of FLINNG is approximately `(2^cells_per_row) x num_rows x 8B + 2^num_hash_tables x 8B + N x 8B`. The default parameters of the library are set in `src/vcl/DescriptorParams.h` and can be overridden in the function call that creates the FLINNG index. The default parameters should support 10M embeddings and should consume around 4 GB of memory, irrespective of the dimensionality of the descriptors (the sketch below evaluates this formula).
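To make the footprint comparison concrete, here is a small Python sketch (not part of VCL) that evaluates the formulas above: `N x D x 4` bytes for `FLAT`/`IVFFLAT`, and the approximate FLINNG expression. The FLINNG parameter values are assumptions chosen only so the result lands near the ~4 GB figure quoted above; the actual defaults are defined in `src/vcl/DescriptorParams.h`.

```python
def flat_bytes(num_descriptors: int, dimensions: int) -> int:
    """FAISS FLAT / IVFFLAT footprint: uncompressed float32 vectors."""
    return num_descriptors * dimensions * 4


def flinng_bytes(num_descriptors: int, cells_per_row: int,
                 num_rows: int, num_hash_tables: int) -> int:
    """Approximate FLINNG footprint (dimension-independent):
    (2^cells_per_row) * num_rows * 8B + 2^num_hash_tables * 8B + N * 8B."""
    return ((2 ** cells_per_row) * num_rows * 8
            + (2 ** num_hash_tables) * 8
            + num_descriptors * 8)


# Example from the text: 10,000 descriptors of dimension 1K -> 40 MB.
print(f"FLAT/IVFFLAT, 10K x 1K: {flat_bytes(10_000, 1_000) / 1e6:.0f} MB")

# Illustrative FLINNG parameters (assumptions, not the defaults from
# src/vcl/DescriptorParams.h); 10M descriptors, any dimensionality.
print(f"FLINNG, 10M vectors: {flinng_bytes(10_000_000, 27, 4, 14) / 1e9:.1f} GB")
```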