|
1 | | -# Backoff Logic when encountering Resource Exhausted |
2 | | - |
3 | | -This only applies when "ignoreExhaustedEvent" is true in the config.yml (default is false). When this value is True, any exhausted resource event will be handled by the healer. |
4 | | - |
5 | | -# The “back-off” logic is as follows: |
6 | | - |
7 | | -- When the exception is received: |
8 | | - - First record in error will perform the following actions: |
9 | | - - Signal the “main” populating task to go into “sleep mode” so that additional records will not be upserted. |
10 | | - - A warning message is logged. |
11 | | - - All records in error will do the following: |
12 | | - - Call “wait for index completion.” |
13 | | - - Once the index is built the following occurs: |
14 | | - - Re-upsert the error records |
15 | | - - If successful, signal the “main” populating task to re-start populating. |
16 | | - - A warning message is logged, stating population has re-started. |
17 | | - |
18 | | -# Environmental Variables |
19 | | - |
20 | | -Below are the Environmental Variables: |
21 | | - |
22 | | -- AVS_LOGLEVEL -- The Vector Client API's Log Level. Defaults "WARNING" |
23 | | - - Possible Values: |
24 | | - - CRITICAL |
25 | | - - FATAL |
26 | | - - ERROR |
27 | | - - WARNING |
28 | | - - WARN |
29 | | - - INFO |
30 | | - - DEBUG |
31 | | - - NOTSET |
32 | | - |
33 | | - Note: The logging file is determined by "APP_LOGFILE" |
34 | | - |
35 | | -- AVS_HOST -- The AVS Server's Address. Defaults to "localhost" |
36 | | -- AVS_PORT -- The AVS Server's Port. Defaults to 5000 |
37 | | -- AVS_USELOADBALANCER -- The AVS Server's Address is a Load Balancer. Default False. |
38 | | -- AVS_NAMESPACE -- The Vector's Namespace. Defaults to "test" |
39 | | -- AVS_SET -- The Vector's Set name. Defaults to "ANN-data" |
40 | | - |
41 | | - This behavior is determined by the "uniqueSetIdxName" argument defined in the config.yml file. |
42 | | - |
43 | | - The default (True) behavior is to create a unique Set name where this is the prefix to that name. |
44 | | - |
45 | | -The name has the following parts: |
46 | | - |
47 | | -``` |
48 | | -{AVS_SET}_{ANN Distance Type}_{AVS Idx Type}_{Dimension}_{hnsw m}_{hnsw ef construction}_{hnsw ef} |
49 | | -``` |
50 | | - |
51 | | -Example: |
52 | | - |
53 | | -ANN-data_angular_COSINE_20_16_100_100 |
54 | | - |
55 | | -If "uniqueSetIdxName" is false, the Set name is as follows: |
56 | | - |
57 | | -``` |
58 | | -{AVS_SET}__{ANN Distance Type}_{AVS Idx Type} |
59 | | -``` |
60 | | - |
61 | | -Example: |
62 | | - |
63 | | -ANN-data_angular_COSINE |
64 | | - |
65 | | -- APP_LOGFILE -- The Aerospike's ANN Logging file. Default is "AerospikeANN.log". |
66 | | - |
67 | | - The folder is always the current working directory. |
68 | | - |
69 | | -- APP_LOGLEVEL -- The Aerospike's ANN Log Level. Defaults "INFO" |
70 | | - - Possible Values: |
71 | | - - CRITICAL |
72 | | - - FATAL |
73 | | - - ERROR |
74 | | - - WARNING |
75 | | - - WARN |
76 | | - - INFO |
77 | | - - DEBUG |
78 | | - - NOTSET |
79 | | - |
80 | | -Note: For performance testing this should be set to "NOTSET". |
81 | | - |
82 | | -When running in a docker container, logging is disabled. |
83 | | - |
84 | | -- APP_DROP_IDX -- A Boolean value that will determine if the Vector index is dropped if it already exists. The default is to use "dropIdx" argument in the config.yml file. |
85 | | -- APP_INDEX_SLEEP -- The amount of time to sleep after the index is dropped. The default is 0. |
86 | | - |
87 | | - Possible values are: |
88 | | - |
89 | | - - 0 -- Don't Sleep |
90 | | - - \< 0 -- The number of seconds to sleep |
91 | | -- APP_POPULATE_TASKS -- The number of concurrent records upserted (put) tasks that are performed during the index population phase. When this number of records are upserted, the app will wait until all upserts are completed and then process the next set of records. The default is 5000. |
92 | | - |
93 | | - Values: |
94 | | - |
95 | | - - \< 0 -- All records are upserted, concurrently, and the app will only wait for the upsert completion before waiting for index completion. |
96 | | - - 0 or 1 -- One record is upserted at a time (sync) |
97 | | - - \> 1 -- The number of records upserted, concurrently (async), before the app waits for the upserts to complete. |
98 | | -- APP_PINGAVS -- Checks to determine if the AVS server is reachable via ping. Default is False. |
99 | | -- APP_CHECKRESULT -- Checks the Vector Search results for failed results or Zero Distance. Default is True |
100 | | - |
101 | | - Note: This value is always false if running in a docker container. |
102 | | - |
103 | | - This should be set to False when conducting performance testing! |
104 | | - |
105 | | -The default bin name for the vectors is always "ANN_embedding". |
106 | | - |
107 | | -# config.yml file |
108 | | - |
109 | | -Using the config.yml file. The Aerospike ANN config.yml file can support the different ANN run group configurations. It is suggested that the Aerospike ANN application is ran using the ANN Distance Type configuration. |
110 | | - |
111 | | -Using this configuration, we can match each ANN distance type (i.e., Angular, Euclidean, Jaccard, etc.) to the "best" Aerospike Vector index type. Below is an example of this configuration with comments regarding the behavior of each parameter: |
112 | | - |
113 | | -``` |
114 | | -float: |
115 | | -#This defines a run group based on the ANN angular datasets. |
116 | | - angular: |
117 | | -#All entries to “run_groups” keyword are required as-is (cannot change the values or structure)! |
118 | | - - base_args: ['@metric', '@dimension'] |
119 | | - constructor: Aerospike |
120 | | - disabled: false #can change to true to disable this run-group |
121 | | - docker_tag: ann-benchmarks-aerospike |
122 | | - module: ann_benchmarks.algorithms.aerospike |
123 | | - name: aerospike |
124 | | - run_groups: |
125 | | - cosine: #Should match Idx Type |
126 | | -#This grouping is reqired |
127 | | - args: [ |
128 | | - [cosine], #Idx Type, any Aerospike Index Type, case insensitive). This is required… |
129 | | -#A collection of HnswParams where each param is ran as a separate ran for this Idx Type. This is required and must have at least one item. |
130 | | - [{m: 8, ef_construction: 64, ef: 8}, |
131 | | - {m: 16, ef_construction: 128, ef: 8} ], |
132 | | -#Unique Set/Index Name (optional, default True). See the “AVS_SET” environment variable above. |
133 | | - [True], |
134 | | -#True to Drop Idx and Re-Populate, optional default true. See “APP_DROP_IDX” environment variable above. |
135 | | - [True], |
136 | | -#Determines what phases are executed. Values are: |
137 | | -# IdxPopulateOnly – only conduct the populate index phase, |
138 | | -# QueryOnly – only perform the vector search phase, |
139 | | -# AllOps – All phases (optional default value) |
140 | | - [AllOps] |
141 | | - ] |
142 | | -#This grouping is required |
143 | | - query_args: [ |
144 | | -# If provided (optional), overrides the HnswParams defined above for the vector search phase |
145 | | - [null, #Uses default defined above |
146 | | - {ef: 10} #Override “ef” above |
147 | | - ] |
148 | | - ] |
149 | | -#This defines another run group based on the ANN Euclidean datasets. |
150 | | -#This show using the required params. |
151 | | - euclidean: |
152 | | - - base_args: ['@metric', '@dimension'] |
153 | | - constructor: Aerospike |
154 | | - disabled: false |
155 | | - docker_tag: ann-benchmarks-aerospike |
156 | | - module: ann_benchmarks.algorithms.aerospike |
157 | | - name: aerospike |
158 | | - run_groups: |
159 | | - SQUARED_EUCLIDEAN: |
160 | | - args: [ |
161 | | - [SQUARED_EUCLIDEAN], #Idx Type |
162 | | - [{m: 16, ef_construction: 100, ef: 100}] |
163 | | - ] |
164 | | - query_args: [ |
165 | | - [] |
166 | | - ] |
167 | | -``` |
| 1 | +# Backoff Logic when encountering Resource Exhausted |
| 2 | + |
| 3 | +This only applies when "ignoreExhaustedEvent" is false in the config.yml (the default). When this value is true, any exhausted resource event is instead handled by the healer. |
| 4 | + |
| 5 | +# The “back-off” logic is as follows: |
| 6 | + |
| 7 | +- When the exception is received: |
| 8 | + - First record in error will perform the following actions: |
| 9 | + - Signal the “main” populating task to go into “sleep mode” so that additional records will not be upserted. |
| 10 | + - A warning message is logged. |
| 11 | + - All records in error will do the following: |
| 12 | + - Call “wait for index completion.” |
| 13 | + - Once the index is built the following occurs: |
| 14 | + - Re-upsert the error records |
| 15 | + - If successful, signal the “main” populating task to re-start populating. |
| 16 | + - A warning message is logged, stating population has re-started. |
| 17 | + |
| 18 | +# Environmental Variables |
| 19 | + |
| 20 | +Below are the Environmental Variables: |
| 21 | + |
| 22 | +- AVS_LOGLEVEL -- The Vector Client API's Log Level. Defaults to "WARNING" |
| 23 | + - Possible Values: |
| 24 | + - CRITICAL |
| 25 | + - FATAL |
| 26 | + - ERROR |
| 27 | + - WARNING |
| 28 | + - WARN |
| 29 | + - INFO |
| 30 | + - DEBUG |
| 31 | + - NOTSET |
| 32 | + |
| 33 | + Note: The logging file is determined by "APP_LOGFILE" |
| 34 | + |
| 35 | +- AVS_HOST -- The AVS Server's Address. Defaults to "localhost" |
| 36 | +- AVS_PORT -- The AVS Server's Port. Defaults to 5000 |
| 37 | +- AVS_USELOADBALANCER -- Whether the AVS Server's Address is a Load Balancer. Defaults to False. |
| 38 | +- AVS_NAMESPACE -- The Vector's Namespace. Defaults to "test" |
| 39 | +- AVS_SET -- The Vector's Set name. Defaults to "ANN-data" |
| 40 | + |
| 41 | + This behavior is determined by the "uniqueSetIdxName" argument defined in the config.yml file. |
| 42 | + |
| 43 | + By default (True), a unique Set name is created, with this value as the prefix of that name. |
| 44 | + |
| 45 | +The name has the following parts: |
| 46 | + |
| 47 | +``` |
| 48 | +{AVS_SET}_{ANN Distance Type}_{AVS Idx Type}_{Dimension}_{hnsw m}_{hnsw ef construction}_{hnsw ef} |
| 49 | +``` |
| 50 | + |
| 51 | +Example: |
| 52 | + |
| 53 | +ANN-data_angular_COSINE_20_16_100_100 |
| 54 | + |
| 55 | +If "uniqueSetIdxName" is false, the Set name is as follows: |
| 56 | + |
| 57 | +``` |
| 58 | +{AVS_SET}__{ANN Distance Type}_{AVS Idx Type} |
| 59 | +``` |
| 60 | + |
| 61 | +Example: |
| 62 | + |
| 63 | +ANN-data_angular_COSINE |
| 64 | + |
| 65 | +- APP_LOGFILE -- The Aerospike's ANN Logging file. Default is "AerospikeANN.log". |
| 66 | + |
| 67 | + The folder is always the current working directory. |
| 68 | + |
| 69 | +- APP_LOGLEVEL -- The Aerospike's ANN Log Level. Defaults to "INFO" |
| 70 | + - Possible Values: |
| 71 | + - CRITICAL |
| 72 | + - FATAL |
| 73 | + - ERROR |
| 74 | + - WARNING |
| 75 | + - WARN |
| 76 | + - INFO |
| 77 | + - DEBUG |
| 78 | + - NOTSET |
| 79 | + |
| 80 | +Note: For performance testing this should be set to "NOTSET". |
| 81 | + |
| 82 | +When running in a docker container, logging is disabled. |
| 83 | + |
| 84 | +- APP_DROP_IDX -- A Boolean value that will determine if the Vector index is dropped if it already exists. The default is to use "dropIdx" argument in the config.yml file. |
| 85 | +- APP_INDEX_SLEEP -- The amount of time to sleep after the index is dropped. The default is 0. |
| 86 | + |
| 87 | + Possible values are: |
| 88 | + |
| 89 | + - 0 -- Don't Sleep |
| 90 | + - \> 0 -- The number of seconds to sleep |
| 91 | +- APP_POPULATE_TASKS -- The number of concurrent record upsert (put) tasks performed during the index population phase. Once this number of records has been submitted, the app waits until all of those upserts complete and then processes the next set of records. The default is 5000. |
| 92 | + |
| 93 | + Values: |
| 94 | + |
| 95 | + - \< 0 -- All records are upserted, concurrently, and the app will only wait for the upsert completion before waiting for index completion. |
| 96 | + - 0 or 1 -- One record is upserted at a time (sync) |
| 97 | + - \> 1 -- The number of records upserted, concurrently (async), before the app waits for the upserts to complete. |
| 98 | +- APP_PINGAVS -- Checks to determine if the AVS server is reachable via ping. Default is False. |
| 99 | +- APP_CHECKRESULT -- Checks the Vector Search results for failed results or Zero Distance. Default is True |
| 100 | + |
| 101 | + Note: This value is always false if running in a docker container. |
| 102 | + |
| 103 | + This should be set to False when conducting performance testing! |
| 104 | + |
| 105 | +The default bin name for the vectors is always "ANN_embedding". |
| 106 | + |
| 107 | +# config.yml file |
| 108 | + |
| 109 | +The Aerospike ANN config.yml file supports the different ANN run group configurations. It is suggested that the Aerospike ANN application be run using the ANN Distance Type configuration. |
| 110 | + |
| 111 | +Using this configuration, we can match each ANN distance type (e.g., Angular, Euclidean, Jaccard) to the "best" Aerospike Vector index type. Below is an example of this configuration with comments regarding the behavior of each parameter: |
| 112 | + |
| 113 | +``` |
| 114 | +float: |
| 115 | +#This defines a run group based on the ANN angular datasets. |
| 116 | +  angular: |
| 117 | +#All entries up to the “run_groups” keyword are required as-is (cannot change the values or structure)! |
| 118 | +  - base_args: ['@metric', '@dimension'] |
| 119 | +    constructor: Aerospike |
| 120 | +    disabled: false #can change to true to disable this run-group |
| 121 | +    docker_tag: ann-benchmarks-aerospike |
| 122 | +    module: ann_benchmarks.algorithms.aerospike |
| 123 | +    name: aerospike |
| 124 | +    run_groups: |
| 125 | +      cosine: #Should match Idx Type |
| 126 | +#This grouping is required |
| 127 | +        args: [ |
| 128 | +          [cosine], #Idx Type (any Aerospike Index Type, case insensitive). This is required… |
| 129 | +#A collection of HnswParams where each param is run as a separate run for this Idx Type. This is required and must have at least one item. |
| 130 | +          [{m: 8, ef_construction: 64, ef: 8}, |
| 131 | +           {m: 16, ef_construction: 128, ef: 8} ], |
| 132 | +#Unique Set/Index Name (optional, default True). See the “AVS_SET” environment variable above. |
| 133 | +          [True], |
| 134 | +#True to Drop Idx and Re-Populate (optional, default True). See the “APP_DROP_IDX” environment variable above. |
| 135 | +          [True], |
| 136 | +#Determines what phases are executed. Values are: |
| 137 | +# IdxPopulateOnly – only conduct the populate index phase, |
| 138 | +# QueryOnly – only perform the vector search phase, |
| 139 | +# AllOps – All phases (optional default value) |
| 140 | +          [AllOps] |
| 141 | +        ] |
| 142 | +#This grouping is required |
| 143 | +        query_args: [ |
| 144 | +# If provided (optional), overrides the HnswParams defined above for the vector search phase |
| 145 | +          [null, #Uses default defined above |
| 146 | +           {ef: 10} #Override “ef” above |
| 147 | +          ] |
| 148 | +        ] |
| 149 | +#This defines another run group based on the ANN Euclidean datasets. |
| 150 | +#This shows a run group that uses only the required params. |
| 151 | +  euclidean: |
| 152 | +  - base_args: ['@metric', '@dimension'] |
| 153 | +    constructor: Aerospike |
| 154 | +    disabled: false |
| 155 | +    docker_tag: ann-benchmarks-aerospike |
| 156 | +    module: ann_benchmarks.algorithms.aerospike |
| 157 | +    name: aerospike |
| 158 | +    run_groups: |
| 159 | +      SQUARED_EUCLIDEAN: |
| 160 | +        args: [ |
| 161 | +          [SQUARED_EUCLIDEAN], #Idx Type |
| 162 | +          [{m: 16, ef_construction: 100, ef: 100}] |
| 163 | +        ] |
| 164 | +        query_args: [ |
| 165 | +          [] |
| 166 | +        ] |
| 167 | +``` |
| 168 | + |
| 169 | +# HDF5 Dataset Additional Attributes |
| 170 | + |
| 171 | +The following attributes are added in the resulting ANN HDF5 dataset (note that all added attributes are prefixed with "as_"): |
| 172 | + |
| 173 | +- as_indockercontainer – True if this run was within a docker container. False if it was run natively |
| 174 | +- as_idx_name – The name of the index |
| 175 | +- as_idx_type – The Aerospike Vector Index Type |
| 176 | +- as_idx_binname – The Index’s Bin name |
| 177 | +- as_idx_hnswparams – The index’s “hnsw” parameters as passed into this run via the config file. Any missing or None values will use the default values defined by Aerospike Vector client/server. |
| 178 | +- as_idx_drop – True if the index will be dropped |
| 179 | +- as_idx_ignoreexhuseevents – True to ignore any “Exhausted Resource” errors and let the Aerospike Vector Healer reconcile the index. If false, the internal “back-off” logic is used. |
| 180 | +- as_idx_definition_built - Only available when the database is populated. The actual Vector Index's definitions with default values. |
| 181 | +- as_actions – The actions performed in this run (e.g., All actions, Populate Index Only, Query Only, etc.) |
| 182 | +- as_host – The Aerospike Vector server |
| 183 | +- as_isloadbalancer – If present, the as_host is a load balancer |
| 184 | +- as_namespace – The Aerospike Namespace used for the run |
| 185 | +- as_set – The Aerospike Set name used for the run |
| 186 | +- as_train_shape - The dimensions of the training dataset which is used to populate the database. |
| 187 | +- as_query_hnswsearchparams – The Query’s “hnsw” parameters as passed into this run via the config file. Any missing or None values will use the default values defined by Aerospike Vector client/server. |
| 188 | +- as_query_checkresults – If true the query results are checked/validated. This should be false for timing runs. |
| 189 | +- as_query_no_result_cnt - The number of queries that returned empty results. Only available if the query check results are true. |
| 190 | +- as_query_no_neighbors_fnd - The number of queries that returned no neighbors. Only available if the query check results are true. |
| 191 | +- as_upserted_vectors – The number of vectors inserted |
| 192 | +- as_upserted_time_secs – The amount of time to perform all the inserts in seconds. This doesn’t include index build completion. |
| 193 | +- as_idx_completion_secs – The number of seconds to complete the index build. Does not include insert time. |
| 194 | +- as_total_polulation_time_secs – The complete time to insert and build the index. |
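
The back-off steps listed in the "Backoff Logic" section can be easier to follow as code. The following is only a minimal sketch of that flow using `asyncio`, not the application's actual implementation. `ResourceExhausted`, `BackoffPopulator`, and the `upsert` and `wait_for_index_completion` callables are hypothetical names introduced for illustration; the real client calls are assumed to be passed in by the caller.

```python
import asyncio


class ResourceExhausted(Exception):
    """Hypothetical stand-in for the client's resource-exhausted error."""


class BackoffPopulator:
    """Sketch of the documented back-off flow (all names are illustrative)."""

    def __init__(self, upsert, wait_for_index_completion):
        self._upsert = upsert                          # async callable(record)
        self._wait_for_index = wait_for_index_completion  # async callable()
        self._may_populate = asyncio.Event()           # cleared == "sleep mode"
        self._may_populate.set()
        self.log = []                                  # stands in for the logger

    async def put(self, record):
        # The "main" populating task blocks here while in sleep mode.
        await self._may_populate.wait()
        try:
            await self._upsert(record)
        except ResourceExhausted:
            await self._recover(record)

    async def _recover(self, record):
        if self._may_populate.is_set():
            # First record in error: pause population and log a warning.
            self._may_populate.clear()
            self.log.append("WARNING: resource exhausted, population paused")
        # Every record in error waits for the index build, then is re-upserted.
        # A second failure here is not handled and propagates to the caller.
        await self._wait_for_index()
        await self._upsert(record)
        if not self._may_populate.is_set():
            # Re-upsert succeeded: resume population and log a warning.
            self._may_populate.set()
            self.log.append("WARNING: population re-started")


async def demo():
    """Upsert four records; record 2 fails once with ResourceExhausted."""
    failed_once, stored = set(), []

    async def upsert(rec):
        if rec == 2 and rec not in failed_once:
            failed_once.add(rec)
            raise ResourceExhausted()
        stored.append(rec)

    async def wait_for_index_completion():
        await asyncio.sleep(0)  # placeholder for the real index wait

    populator = BackoffPopulator(upsert, wait_for_index_completion)
    for rec in range(4):
        await populator.put(rec)
    return stored, populator.log
```

Running `asyncio.run(demo())` exercises one failure and one recovery. An `asyncio.Event` is used here because it lets any failing upsert pause the main task without the two needing a direct reference to each other.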
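
The Set naming rule described under the AVS_SET variable can be illustrated with a short helper. This is a sketch under stated assumptions, not the application's code: `build_set_name` and its parameters are hypothetical, the index type is upper-cased because the README's examples show `COSINE`, and parts are joined with a single underscore to match those examples.

```python
import os


def build_set_name(distance_type, idx_type, dimension, hnsw,
                   unique=True, prefix=None):
    """Build a Set name following the pattern shown in the README examples.

    All parameter names are illustrative. `hnsw` is assumed to be a dict with
    the keys "m", "ef_construction" and "ef", as in the config.yml example.
    """
    if prefix is None:
        # AVS_SET supplies the prefix; "ANN-data" is the documented default.
        prefix = os.environ.get("AVS_SET", "ANN-data")
    parts = [prefix, distance_type, idx_type.upper()]
    if unique:
        # uniqueSetIdxName == True appends the dimension and hnsw parameters.
        parts += [dimension, hnsw["m"], hnsw["ef_construction"], hnsw["ef"]]
    return "_".join(str(part) for part in parts)


if __name__ == "__main__":
    params = {"m": 16, "ef_construction": 100, "ef": 100}
    print(build_set_name("angular", "cosine", 20, params, prefix="ANN-data"))
    print(build_set_name("angular", "cosine", 20, params,
                         unique=False, prefix="ANN-data"))
```

With the default prefix, the two calls reproduce the README's example names for the unique and non-unique cases.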