| Type | Description |
|---|---|
| boolean | boolean values |
| number | any numeric value (int, float, ...) |
| string | any text |
| compressed_string | compressed text, queryable only for equality |
| datetime | `datetime.datetime` objects |
| blob | files as bytes, queryable for equality and sortable by size |
| json | any JSON-serialisable object |
| normalized_embedding | normalized float32 NumPy array, queryable for equality, sorting, or scoring by similarity to a query embedding |
| other | any object; stored internally as a pickled blob, queryable for equality and sortable by size |

- A schema has to be specified when the index is first initialised and cannot be modified later on
- Data is accessed or updated as a dict of the format `{id: record}`, where `record` is a dict following the schema and `id` is a string
- Keys of a record/schema can be anything that can be a key in a Python dict, e.g. `schema_1 = {0: "string", "a": "number"}`
- An in-memory index is created by default, i.e. when no `db_path` is specified; it cannot be accessed from other processes and threads
- If `db_path` is specified, a disk-based index is created, which is persistent and accessible from all processes and threads
- `None` is allowed as a value for all keys and is the default value for all keys
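
The rules above can be illustrated with a small stand-alone sketch. It does not use liteindex: `TYPE_CHECKS` and `validate_record` are hypothetical names introduced here, and only a few of the types are covered. It shows how a partial record, possibly containing `None`, might be checked against a schema whose keys are arbitrary hashable objects.

```python
import datetime

# Hypothetical mapping from type names to Python checks.
# This is an illustration only, not liteindex's own validation code.
TYPE_CHECKS = {
    "boolean": lambda v: isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "string": lambda v: isinstance(v, str),
    "datetime": lambda v: isinstance(v, datetime.datetime),
    "blob": lambda v: isinstance(v, bytes),
}


def validate_record(schema, record):
    """Return a list of problems; an empty list means the record fits the schema."""
    problems = []
    for key, value in record.items():
        if key not in schema:
            problems.append(f"unknown key: {key!r}")
        elif value is not None and not TYPE_CHECKS[schema[key]](value):
            problems.append(f"bad value for {key!r}: expected {schema[key]}")
    return problems


schema_1 = {0: "string", "a": "number"}  # keys can be any hashable object

ok = validate_record(schema_1, {0: "hello"})               # partial record
with_none = validate_record(schema_1, {0: None, "a": 3})   # None is allowed
bad = validate_record(schema_1, {"a": "three", "b": 1})    # wrong type, unknown key
print(ok, with_none, bad)
```

Treating `bool` as distinct from `number` is a choice made for this sketch; the table does not say how the library handles that case.
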
- Initialize
- Insert/Update
- Get
- Drop or clear Index
- Search
- Distinct
- Group by
- Pop
- Delete
- Count
- Optimize for Search
- Math
- Trigger
- Vacuum
- Export
params
- `name`: name of the index. No default, has to be specified; cannot begin with `__`
- `schema`: schema of the index. Defaults to `None`; has to be specified if the index does not exist, and cannot be modified later on
- `db_path`: path to the index file. Defaults to `None`, in which case an in-memory index is created
- `ram_cache_mb`: size of the RAM cache in MB. Defaults to `64`
- `compression_level`: compression level for strings, blobs etc. Defaults to `-1`; `None` for no compression

example use

```python
from liteindex import DefinedIndex, DefinedTypes

schema = {
    "name": DefinedTypes.string,
    "age": DefinedTypes.number,
    "password": DefinedTypes.string,
    "verified": DefinedTypes.boolean,
    "birthday": DefinedTypes.datetime,
    "profile_picture": DefinedTypes.blob,
    "nicknames": DefinedTypes.json,
    "user_embedding": DefinedTypes.normalized_embedding,
    "user_bio": DefinedTypes.compressed_string,
}

index = DefinedIndex(
    name="user_details",
    schema=schema,
    db_path="./test.liteindex"
)
```

- Insert or update single or multiple, partial or full records at once
- Is an atomic operation
- Input format: `{id: record, id1: record, ....}`

params
- `data`: dict of format `{id1: record1, id2: partial_record2, ....}`
- `return`: `None`

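
To make the partial-record and atomicity semantics concrete, here is a minimal sketch on a plain dict. It is not liteindex code: `apply_update` is a hypothetical helper, and staging the changes on a copy is just one simple way to get all-or-nothing behaviour.

```python
import copy


def apply_update(store, data):
    """Merge {id: record} into store; partial records only touch the keys they name.

    Changes are staged on a copy, so a failure part-way through leaves `store` untouched.
    """
    staged = copy.deepcopy(store)
    for record_id, record in data.items():
        if not isinstance(record_id, str):
            raise TypeError(f"id must be a string, got {record_id!r}")
        staged.setdefault(record_id, {}).update(record)
    # Only reached when every record was merged successfully.
    store.clear()
    store.update(staged)


store = {}
apply_update(store, {"john_doe": {"name": "John Doe", "age": 25}})
apply_update(store, {"john_doe": {"age": 26}, "jane_doe": {"name": "Jane Doe"}})

try:
    apply_update(store, {"new_user": {"age": 1}, 42: {"age": 2}})
except TypeError:
    pass  # nothing from the failed call was applied, not even "new_user"
print(store)
```
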
example use

```python
import datetime

import numpy as np

index.update(
    {
        "john_doe": {
            "name": "John Doe",
            "age": 25,
            "password": "password",
            "verified": True,
            "birthday": datetime.datetime(1995, 1, 1),
            "profile_picture": b"......",
            "nicknames": ["John", "Doe"],
            "user_embedding": np.array([1, 2, 3]),  # should be a normalized float32 array
            "user_bio": "This is a long string that will be compressed and stored"
        },
        "jane_doe": {
            "name": "Jane Doe",
            "age": 28,
            "verified": True,
        }
    }
)
```

- Accepts a single id or a list of ids
- Always returns a dict of format `{id: record, id1: record, ....}`

params
- `ids`: key or list of keys to get. No default
- `select_keys`: list of keys to include in the returned record. Defaults to `None`, which selects all keys
- `return`: dict of format `{id: record, id1: record, ....}`

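
The return shape described above can be sketched on a plain dict. `get_records` is a hypothetical stand-in, not the library's implementation, and skipping ids that are not present is an assumption made here; the text does not say how missing ids are handled.

```python
def get_records(store, ids, select_keys=None):
    """Return {id: record} whether `ids` is a single id or a list of ids."""
    if isinstance(ids, str):
        ids = [ids]
    result = {}
    for record_id in ids:
        if record_id not in store:
            continue  # assumption: ids that do not exist are skipped
        record = store[record_id]
        if select_keys is not None:
            # Keys absent from the record come back as None.
            record = {key: record.get(key) for key in select_keys}
        result[record_id] = record
    return result


store = {
    "john_doe": {"name": "John Doe", "age": 25},
    "jane_doe": {"name": "Jane Doe", "age": 28},
}
single = get_records(store, "john_doe")
several = get_records(store, ["john_doe", "jane_doe"], select_keys=["name"])
print(single)
print(several)
```
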
example use

```python
index.get("john_doe")
# {"john_doe": record_for_john_doe}

index.get(["john_doe", "jane_doe"])
# {"john_doe": record_for_john_doe, "jane_doe": record_for_jane_doe}
```

- Can delete by a single id, a list of ids, or a query

params
- `ids`: key or list of keys to delete. No default
- `query`: query dictionary; records matching the query are deleted. Defaults to `None`, which deletes no records
- `return`: `None`

```python
# delete a single record or multiple records or by query
# returns dict of format {key: record} of deleted records
index.delete("john_doe")
index.delete(["john_doe", "jane_doe"])
index.delete(query={"name": "John Doe"})
```

- No params

```python
# clear the index
index.clear()

# drop/delete the index completely
index.drop()
```

- `query`: query dictionary. Defaults to `{}`, which returns all records. Full list of queries supported
- `sort_by`: a key from the schema. Defaults to `None`, which returns records in insertion order. If `sort_by` is a key of type `normalized_embedding`, a NumPy array has to be provided in `sort_by_embedding` to sort by similarity to this array; scores are returned under the `__meta` key of the record
- `reversed_sort`: defaults to `False`. If `True`, returns records in reverse order
- `n`: defaults to `None`, which returns all records
- `page_no`: defaults to `1`, which returns the first page of `n` records
- `offset`: defaults to `0`, which returns the first page of `n` records. Either `page_no` or `offset` can be used, not both
- `select_keys`: a list of keys from the schema. Defaults to `None`, which returns all keys
- `update`: optional dictionary of format `{key: value}`, e.g. `{"verified": True}`. If provided, updates the records in the index that match the query and returns the updated records
- `return_metadata`:
- `metadata_key_name`: defaults to `__meta`. Under this key will be a dict with `{"integer_id": unique_integer_id, "updated_at": last update time from epoch, "score": if doing embedding sort}`
- `sort_by_embedding`: if `sort_by` is a key of type `normalized_embedding`, a NumPy array has to be provided here to sort by similarity to this array
- `return`: dict of format `{id: record, id1: record, ....}`

Full list of queries supported
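
The interaction between similarity sorting and paging can be sketched in plain Python. Nothing here is liteindex code: `normalize` and `search_by_embedding` are hypothetical helpers, the score is taken to be the dot product of unit-length vectors (which equals cosine similarity), and both "most similar first" and "`page_no=1` corresponds to `offset=0`" are assumptions.

```python
import math


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def search_by_embedding(embeddings, query_vec, n=None, page_no=1):
    """Rank ids by dot product with query_vec, then return one page of (id, score)."""
    q = normalize(query_vec)
    scored = [
        (record_id, sum(a * b for a, b in zip(vec, q)))
        for record_id, vec in embeddings.items()
    ]
    # Assumption: the most similar record comes first.
    scored.sort(key=lambda item: item[1], reverse=True)
    if n is None:
        return scored
    offset = (page_no - 1) * n  # assumption: page_no=1 is the same as offset=0
    return scored[offset:offset + n]


embeddings = {
    "a": normalize([1.0, 0.0]),
    "b": normalize([1.0, 1.0]),
    "c": normalize([0.0, 1.0]),
}
first_page = search_by_embedding(embeddings, [1.0, 0.0], n=2, page_no=1)
second_page = search_by_embedding(embeddings, [1.0, 0.0], n=2, page_no=2)
print([record_id for record_id, _ in first_page])
print([record_id for record_id, _ in second_page])
```

Because the stored vectors and the query are all unit length, no per-record norm has to be computed at query time, which is the usual reason for requiring normalized embeddings.
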

```python
index.search()
# Returns {id_1: record_1, id_2: record_2, ...}
# Each record is a dict of format following schema, as inserted.
# By default ordered by insertion order, and returns all records.

index.search(
    query={"name": "John Doe"},
    sort_by="age",
    reversed_sort=True,
    n=10,
    page_no=1,
    select_keys=["name", "age"],
    update={"verified": True}
)

index.search(
    query={"name": "John Doe"},
    sort_by="user_embedding",
    sort_by_embedding=np.array([1, 2, 3]),  # should be normalized and same size as user_embedding
    reversed_sort=True,
    n=10,
    page_no=1,
    select_keys=["name", "age"],
    update={"verified": True}
)
```

params
- `query`: query dictionary. Defaults to `{}`, which returns the count of all records
- `return`: int or `None`

example use

```python
index.count()
index.count({"name": "Joe Biden"})
```

params
- `key`: key from the schema to get distinct values for. No default
- `query`: query dictionary; distinct values are taken from records matching the query. Defaults to `{}`, which uses all records
- `return`: set of distinct values

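
As a rough picture of what these parameters mean, the following sketch computes distinct values over a plain dict of records. `distinct_values` is a hypothetical helper, not the library's implementation, and it only handles equality conditions in `query`.

```python
def distinct_values(store, key, query=None):
    """Collect the distinct values of `key` over records matching an equality query."""
    query = query or {}
    return {
        record.get(key)  # values must be hashable to go into a set
        for record in store.values()
        if all(record.get(k) == v for k, v in query.items())
    }


store = {
    "u1": {"name": "Ann", "gender": "female"},
    "u2": {"name": "Bob", "gender": "male"},
    "u3": {"name": "Ann", "gender": "female"},
}
all_names = distinct_values(store, "name")
female_names = distinct_values(store, "name", query={"gender": "female"})
print(all_names, female_names)
```
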
example use

```python
index.distinct("name")
index.distinct("name", query={"gender": "female"})
```

params
- `keys`: single key or list of keys to group by. No default
- `query`: query dictionary; only records matching the query are grouped. Defaults to `{}`, which groups all records

```python
index.group()
```

- Optimizes the index for search on a key

params
- `key`: key from the schema to optimize for search. No default
- `is_unique`: defaults to `False`. If `True`, duplicate values for the key are not allowed

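
What `is_unique` implies can be shown with a generic database index. The sketch below uses Python's built-in `sqlite3` purely as an illustration: the text above does not say which storage engine liteindex uses, and the table and index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT)")

# An index speeds up lookups on `name`; UNIQUE additionally rejects duplicates.
conn.execute("CREATE UNIQUE INDEX idx_users_name ON users (name)")

conn.execute("INSERT INTO users VALUES (?, ?)", ("john_doe", "John Doe"))
try:
    conn.execute("INSERT INTO users VALUES (?, ?)", ("john_2", "John Doe"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Ask the query planner whether an equality lookup on `name` uses the index.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM users WHERE name = ?", ("John Doe",)
).fetchall()
uses_index = any("idx_users_name" in row[-1] for row in plan)
print(duplicate_rejected, uses_index)
```

The trade-off is the usual one for indexes: faster reads on the chosen key in exchange for extra work on every write.
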
```python
index.optimize_for_query(key="name", is_unique=True)
```

params

- `return`: `{key: {"is_unique": bool}}`

```python
index.list_optimized_keys()
```