Commit 2c49c47

Merge 'add some docs for index method' from Nikita Sivukhin
Reviewed-by: Jussi Saurio <[email protected]>
Closes #3909
2 parents 2bf5eb8 + 94a39ce commit 2c49c47

1 file changed

+70
-0
lines changed

docs/manual.md

Lines changed: 70 additions & 0 deletions
@@ -42,6 +42,7 @@ Welcome to Turso database manual!
- [Encryption](#encryption)
- [Vector search](#vector-search)
- [CDC](#cdc-early-preview)
- [Index Method](#index-method-experimental)
- [Appendix A: Turso Internals](#appendix-a-turso-internals)
- [Frontend](#frontend)
- [Parser](#parser)
@@ -878,6 +879,75 @@ turso>
```

If you modify your table schema (adding/dropping columns), the `table_columns_json_array()` function returns the current schema, not the historical one. This can lead to incorrect results when decoding older CDC records. Manually track schema versions by storing the output of `table_columns_json_array()` before making schema changes.

## Index Method (Experimental)

`tursodb` allows developers to implement custom data access methods and integrate them seamlessly with the query planner. This feature is conceptually similar to [VTable](https://www.sqlite.org/vtab.html) but provides greater flexibility and automatic query planner integration. The feature is experimental and currently gated behind the `--experimental-index-method` flag.

### DDL

Index Methods can be created using standard `CREATE INDEX` statements by specifying a custom module name:

```sql
CREATE INDEX t_idx ON t USING index_method_name (column1, column2);
```

Index Methods can also include optional parameters whose values may be integer, floating-point, string, or blob literals:

```sql
CREATE INDEX t_idx ON t USING index_method_name (c) WITH (a = 1, b = 1.2, c = 'text', d = x'deadbeef');
```

To remove an index, use the standard `DROP INDEX t_idx` statement.

### DML

Data modification operations for Index Methods are executed implicitly for every modification of the base table (as with native B-tree indices):

1. Each `INSERT` operation on the table executes an `IdxInsert` opcode for the Index Method, passing the relevant column values and the `rowid` of the inserted row.
2. Each `DELETE` operation executes an `IdxDelete` opcode with the corresponding column values and the deleted row's `rowid`.
3. Each `UPDATE` operation is internally translated into a pair of `DELETE` + `INSERT` operations.
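
The three rules above can be illustrated with a small in-memory model. This is a sketch only: `ToyIndex`, `idx_insert`, `idx_delete`, and `apply_update` are hypothetical names chosen for illustration and are not part of Turso's API. The model assumes an index entry is a `(key, rowid)` pair.

```rust
use std::collections::BTreeSet;

// Hypothetical toy model, not Turso's actual API: the index stores
// (key, rowid) pairs, much like an entry in a B-tree index.
#[derive(Default)]
pub struct ToyIndex {
    entries: BTreeSet<(String, i64)>,
}

impl ToyIndex {
    // Models the data an IdxInsert opcode passes to the Index Method.
    pub fn idx_insert(&mut self, key: &str, rowid: i64) {
        self.entries.insert((key.to_string(), rowid));
    }

    // Models the data an IdxDelete opcode passes to the Index Method.
    pub fn idx_delete(&mut self, key: &str, rowid: i64) {
        self.entries.remove(&(key.to_string(), rowid));
    }

    pub fn contains(&self, key: &str, rowid: i64) -> bool {
        self.entries.contains(&(key.to_string(), rowid))
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }
}

// An UPDATE of an indexed column: delete the old entry, then insert the new one.
pub fn apply_update(index: &mut ToyIndex, rowid: i64, old_key: &str, new_key: &str) {
    index.idx_delete(old_key, rowid);
    index.idx_insert(new_key, rowid);
}
```

Because an update is expressed as a delete followed by an insert, an Index Method in this model only needs to handle two kinds of write.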

### DQL

At present, Index Methods can only be used implicitly if the query planner decides to apply them. This decision depends on whether the query matches one of the suitable patterns provided by the Index Method implementation. If parts of a query align with a registered pattern, the planner may replace the default table access method with the Index Method.

For example, an Index Method can define the following query pattern:

```sql
SELECT vector_distance_jaccard(embedding, ?) AS distance FROM documents ORDER BY distance LIMIT ?;
```

This pattern describes the shape of the output (a single `distance` column), the parameter placeholders (query embedding and limit), and the type of query it can optimize (an ordered retrieval by distance).

The planner can match this pattern against a user query like:

```sql
SELECT id, content, created_at FROM documents ORDER BY vector_distance_jaccard(embedding, ?) LIMIT 10;
```

Because the query is a *superset* of the pattern, the planner can safely apply the Index Method, enriching its output (`distance`) with data from the main table (`id`, `content`, `created_at`), using the `rowid` that the Index Method returns with each row.
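
The enrichment step can be pictured as a lookup by `rowid`. The sketch below is illustrative only: `Document` and `enrich` are hypothetical names, the base table is modelled as a `HashMap` keyed by `rowid`, and the Index Method output is assumed to be a list of `(rowid, distance)` pairs already ordered by distance.

```rust
use std::collections::HashMap;

// Hypothetical row shape for the `documents` table used in the example query.
#[derive(Debug, Clone, PartialEq)]
pub struct Document {
    pub id: i64,
    pub content: String,
    pub created_at: String,
}

// `index_rows` stands in for the Index Method output: (rowid, distance)
// pairs, already ordered by distance. `table` stands in for the base table.
pub fn enrich(
    index_rows: &[(i64, f64)],
    table: &HashMap<i64, Document>,
    limit: usize,
) -> Vec<(Document, f64)> {
    index_rows
        .iter()
        // Apply LIMIT to the ordered index output first.
        .take(limit)
        // Fetch the remaining columns by rowid; a rowid that is missing
        // from the table is skipped, so fewer rows may be returned.
        .filter_map(|(rowid, distance)| table.get(rowid).map(|doc| (doc.clone(), *distance)))
        .collect()
}
```

The ordering comes entirely from the index output here; the base table contributes only the extra columns.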

The query planner is conservative and will avoid using an Index Method if doing so would alter the query's semantics. Consider:

```sql
SELECT id, content, created_at FROM documents WHERE user = ? ORDER BY vector_distance_jaccard(embedding, ?) LIMIT 10;
```

The additional filter `WHERE user = ?` does not fit the Index Method's query pattern, so the planner correctly falls back to the default plan.
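
A heavily simplified model of this decision is sketched below. It is not how the Turso planner is implemented: `QueryShape`, `AccessPlan`, and `choose_plan` are hypothetical, and the comparison works on plain strings where a real planner would compare parsed query structures.

```rust
// Hypothetical, heavily simplified query description; a real planner
// compares parsed ASTs, not strings.
#[derive(Debug, Clone, PartialEq)]
pub struct QueryShape {
    pub table: String,
    pub order_by: String,
    pub where_clause: Option<String>,
    pub has_limit: bool,
}

#[derive(Debug, PartialEq)]
pub enum AccessPlan {
    IndexMethod,
    DefaultPlan,
}

pub fn choose_plan(pattern: &QueryShape, query: &QueryShape) -> AccessPlan {
    // Projected columns are deliberately not compared: extra columns can be
    // fetched from the base table by rowid. Everything else must agree, so a
    // filter the pattern does not describe forces the default plan.
    let same_shape = query.table == pattern.table
        && query.order_by == pattern.order_by
        && query.has_limit == pattern.has_limit
        && query.where_clause == pattern.where_clause;
    if same_shape {
        AccessPlan::IndexMethod
    } else {
        AccessPlan::DefaultPlan
    }
}
```

In this model, a query with no `WHERE` clause and the same ordering expression selects `AccessPlan::IndexMethod`, while adding `where_clause: Some("user = ?")` selects `AccessPlan::DefaultPlan`.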

### Internals

Each Index Method consists of three traits that work together (for details, see the index method module [root](../core/index_method/mod.rs)):

* **`IndexMethod`**: the root trait for all Index Methods, responsible for creating `IndexMethodAttachment` instances for a given table.
* **`IndexMethodAttachment`**: represents an Index Method instance bound to a specific table. It can create cursors for query execution and defines the metadata needed for integration with the query planner.
* **`IndexMethodCursor`**: provides methods for accessing and updating data, as well as for managing the underlying storage during `CREATE INDEX` and `DROP INDEX` operations.
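
The layering of the three roles can be sketched with toy traits. The names `ToyMethod`, `ToyAttachment`, `ToyCursor`, and `ExactMatch`, and every method signature below, are assumptions made for illustration; the actual trait definitions are in the module linked above and will differ.

```rust
// Hypothetical trait shapes for illustration only; they mirror the three
// roles described above but are not the real Turso trait definitions.
pub trait ToyMethod {
    // Bind the method to a table, producing an attachment.
    fn attach(&self, table: &str) -> Box<dyn ToyAttachment>;
}

pub trait ToyAttachment {
    fn table(&self) -> &str;
    // Create a cursor used to read and write index data.
    fn open_cursor(&self) -> Box<dyn ToyCursor>;
}

pub trait ToyCursor {
    fn insert(&mut self, key: &str, rowid: i64);
    fn delete(&mut self, key: &str, rowid: i64);
    fn lookup(&self, key: &str) -> Vec<i64>;
}

// A trivial exact-match method backed by an in-memory vector.
pub struct ExactMatch;

struct ExactMatchAttachment {
    table: String,
}

struct ExactMatchCursor {
    entries: Vec<(String, i64)>,
}

impl ToyMethod for ExactMatch {
    fn attach(&self, table: &str) -> Box<dyn ToyAttachment> {
        Box::new(ExactMatchAttachment { table: table.to_string() })
    }
}

impl ToyAttachment for ExactMatchAttachment {
    fn table(&self) -> &str {
        &self.table
    }

    fn open_cursor(&self) -> Box<dyn ToyCursor> {
        // Each cursor owns its own storage in this toy model.
        Box::new(ExactMatchCursor { entries: Vec::new() })
    }
}

impl ToyCursor for ExactMatchCursor {
    fn insert(&mut self, key: &str, rowid: i64) {
        self.entries.push((key.to_string(), rowid));
    }

    fn delete(&mut self, key: &str, rowid: i64) {
        self.entries.retain(|(k, r)| !(k.as_str() == key && *r == rowid));
    }

    fn lookup(&self, key: &str) -> Vec<i64> {
        self.entries
            .iter()
            .filter(|(k, _)| k.as_str() == key)
            .map(|(_, r)| *r)
            .collect()
    }
}
```

A caller in this sketch goes from method to attachment to cursor: `ExactMatch.attach("documents").open_cursor()`.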

While Index Methods can implement arbitrary logic internally, it's generally recommended to use a B-tree as the underlying storage mechanism. To support this, `tursodb` provides a special `backing_btree` Index Method that other Index Methods can use to create auxiliary tables for storing supporting data.

For more details, see the [`toy_vector_sparse_ivf`](../core/index_method/toy_vector_sparse_ivf.rs) implementation.

## Appendix A: Turso Internals

Turso's architecture resembles SQLite's but differs primarily in its
