Commit a01ccd9

doc: add MkDocs documentation (#94)
1 parent 2a9bb86 commit a01ccd9

8 files changed (+211 -0 lines changed)

docs/about.md

+1
@@ -0,0 +1 @@
# About

docs/explanation.md

+1
@@ -0,0 +1 @@
# Explanation

docs/howto.md

+46
@@ -0,0 +1,46 @@
# How-to Guides

## Installation

tantivy-py can be installed from [PyPI](https://pypi.org) using pip:

    pip install tantivy

If no binary wheel is available for your operating system, the bindings will be
built from source, which means that Rust needs to be installed before the build
can succeed.

Note that the bindings use [PyO3](https://github.com/PyO3/pyo3), which
only supports Python 3.

## Set up a development environment to work on tantivy-py itself

A development environment can be set up either in a virtual environment using
[`nox`](https://nox.thea.codes) or with local packages using the provided `Makefile`.

For the `nox` setup, install the virtual environment and build the bindings using:

    python3 -m pip install nox
    nox

For the `Makefile`-based setup, run:

    make

Run the tests using:

    make test

## Working on tantivy-py documentation

Please be aware that this documentation is structured using the [Diátaxis](https://diataxis.fr/) framework. In very simple terms, this framework suggests the correct location for different kinds of documentation. Please make sure you gain a basic understanding of the goals of the framework before making large pull requests with new documentation.

This documentation uses the [MkDocs](https://mkdocs.readthedocs.io/en/stable/) framework. This package is specified as an optional dependency in the `pyproject.toml` file. To install all optional dev dependencies into your virtual env, run the following command:

    pip install ".[dev]"

The [MkDocs](https://mkdocs.readthedocs.io/en/stable/) documentation itself is comprehensive. It provides some additional context and help around [writing with markdown](https://mkdocs.readthedocs.io/en/stable/user-guide/writing-your-docs/#writing-with-markdown).

If all you want to do is make a few edits right away, the documentation content is in the `/docs` directory and consists of [Markdown](https://www.markdownguide.org/) files, which can be edited with any text editor.

The most efficient way to work is to run a MkDocs livereload server in the background with `mkdocs serve`. This launches a local web server on your dev machine, serves the docs (by default at `http://localhost:8000`), and automatically reloads the page after you save any changes to the documentation files.

docs/index.md

+22
@@ -0,0 +1,22 @@
# Welcome to tantivy-py

tantivy-py is a Python wrapper for the [tantivy](https://github.com/quickwit-oss/tantivy) full-text search engine, which is inspired by Apache Lucene.

tantivy-py is [licensed](https://github.com/quickwit-oss/tantivy-py/blob/master/LICENSE) under the [MIT License](https://www.tldrlegal.com/license/mit-license).

## Important links

- [tantivy-py code repository](https://github.com/quickwit-oss/tantivy-py)
- [tantivy code repository](https://github.com/quickwit-oss/tantivy)
- [tantivy Documentation](https://docs.rs/crate/tantivy/latest)
- [tantivy query language](https://docs.rs/tantivy/latest/tantivy/query/struct.QueryParser.html#method.parse_query)

## How to use this documentation

This documentation uses the [Diátaxis](https://diataxis.fr/) framework. The following sections are clearly separated:

- [Tutorials](tutorials.md): when you want to learn
- [How-to Guides](howto.md): when you need to accomplish a task
- [Explanation](explanation.md): when you need a broader understanding and the thinking behind why certain things are set up in a particular way
- [Reference](reference.md): when you need precise, detailed information

docs/reference.md

+38
@@ -0,0 +1,38 @@
# Reference

## Valid Query Formats

tantivy-py supports the [query language](https://docs.rs/tantivy/latest/tantivy/query/struct.QueryParser.html#method.parse_query) used in tantivy.
A few basic query formats are shown below:

- AND and OR conjunctions.
```python
query = index.parse_query('(Old AND Man) OR Stream', ["title", "body"])
(best_score, best_doc_address) = searcher.search(query, 3).hits[0]
best_doc = searcher.doc(best_doc_address)
```

- + (includes) and - (excludes) operators.
```python
query = index.parse_query('+Old +Man chef -fished', ["title", "body"])
(best_score, best_doc_address) = searcher.search(query, 3).hits[0]
best_doc = searcher.doc(best_doc_address)
```
Note: in a query like the one above, a word with no +/- prefix acts like an OR clause.

- phrase search.
```python
query = index.parse_query('"eighty-four days"', ["title", "body"])
(best_score, best_doc_address) = searcher.search(query, 3).hits[0]
best_doc = searcher.doc(best_doc_address)
```

- integer search.
```python
query = index.parse_query('1', ["doc_id"])
(best_score, best_doc_address) = searcher.search(query, 3).hits[0]
best_doc = searcher.doc(best_doc_address)
```
Note: for integer search, the integer field must be indexed.

For more query formats and query options, see the [Tantivy Query Parser Docs](https://docs.rs/tantivy/latest/tantivy/query/struct.QueryParser.html).
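
The note on `+`/`-` operators above can be illustrated with a toy model in plain Python. This is only a sketch of the usual Lucene-style matching rule, assuming that bare terms become optional once a `+` term is present; it is not tantivy's implementation, it ignores scoring, phrases, and field names, and the function name `matches` is hypothetical.

```python
from typing import Iterable, List, Set


def matches(query: str, doc_tokens: Iterable[str]) -> bool:
    """Toy model of `+term`, `-term`, and bare-term matching (single words only)."""
    tokens: Set[str] = {t.lower() for t in doc_tokens}
    required: List[str] = []
    excluded: List[str] = []
    optional: List[str] = []
    for word in query.split():
        if word.startswith("+"):
            required.append(word[1:].lower())
        elif word.startswith("-"):
            excluded.append(word[1:].lower())
        else:
            optional.append(word.lower())

    # Any excluded term present in the document rules it out.
    if any(term in tokens for term in excluded):
        return False
    if required:
        # Assumption: with required terms present, bare terms are optional.
        return all(term in tokens for term in required)
    # With no required terms, at least one bare term has to be present.
    return any(term in tokens for term in optional)


doc = "he was an old man who fished alone in a skiff".split()
print(matches("+Old +Man chef -fished", doc))  # excluded term is present
print(matches("+Old +Man chef", doc))          # both required terms are present
```

In this model the first query does not match because `fished` occurs in the document, while the second matches even though `chef` is absent.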

docs/tutorials.md

+82
@@ -0,0 +1,82 @@
# Tutorials

## Building an index and populating it

```python
import tantivy

# Declaring our schema.
schema_builder = tantivy.SchemaBuilder()
schema_builder.add_text_field("title", stored=True)
schema_builder.add_text_field("body", stored=True)
# indexed=True makes the integer field searchable.
schema_builder.add_integer_field("doc_id", stored=True, indexed=True)
schema = schema_builder.build()

# Creating our index (in memory)
index = tantivy.Index(schema)
```

To have a persistent index, use the `path`
parameter to store the index on disk, e.g.:

```python
import os

# The target directory is assumed to exist already.
index = tantivy.Index(schema, path=os.path.join(os.getcwd(), "index"))
```

By default, tantivy offers the following tokenizers,
which can be used in tantivy-py:
- `default`
`default` is the tokenizer that will be used if you do not
assign a specific tokenizer to your text field.
It chops your text on punctuation and whitespace,
removes tokens that are longer than 40 characters, and lowercases your text.

- `raw`
Does not actually tokenize your text. It keeps it entirely unprocessed.
It can be useful for indexing UUIDs or URLs, for instance.

- `en_stem`

In addition to what `default` does, the `en_stem` tokenizer also
applies stemming to your tokens. Stemming consists of trimming words to
remove their inflection. This tokenizer is slower than the default one,
but is recommended to improve recall.

To use one of the above tokenizers, pass its name as the `tokenizer_name` parameter of `add_text_field`, e.g.:
```python
schema_builder.add_text_field("body", stored=True, tokenizer_name='en_stem')
```
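
The description of the `default` tokenizer above can be approximated in plain Python. The following is a minimal sketch, not tantivy's implementation: the function name `approximate_default_tokenizer` is hypothetical, and the exact boundary of the 40-character limit is an assumption taken from the wording above.

```python
import re
from typing import List

MAX_TOKEN_LEN = 40  # assumed limit, taken from the description of `default`


def approximate_default_tokenizer(text: str) -> List[str]:
    """Rough approximation: split, drop over-long tokens, lowercase."""
    # Split on runs of characters that are neither letters nor digits
    # (punctuation, whitespace, underscores).
    raw_tokens = re.split(r"[\W_]+", text)
    # Drop empty strings and tokens longer than the limit, then lowercase.
    return [t.lower() for t in raw_tokens if t and len(t) <= MAX_TOKEN_LEN]


print(approximate_default_tokenizer("He had gone eighty-four days, now!"))
```

Under these assumptions, a hyphenated word such as `eighty-four` ends up as two separate tokens, which is why a query for `days` or `four` can match the example sentence.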

## Adding one document

```python
writer = index.writer()
writer.add_document(tantivy.Document(
    doc_id=1,
    title=["The Old Man and the Sea"],
    body=["""He was an old man who fished alone in a skiff in the Gulf Stream and he had gone eighty-four days now without taking a fish."""],
))
# ... and committing
writer.commit()
```

## Building and Executing Queries

First, you need to get a searcher for the index:

```python
# Reload the index to ensure it points to the last commit.
index.reload()
searcher = index.searcher()
```

Then you need to get a valid query object by parsing your query on the index:

```python
query = index.parse_query("fish days", ["title", "body"])
(best_score, best_doc_address) = searcher.search(query, 3).hits[0]
best_doc = searcher.doc(best_doc_address)
assert best_doc["title"] == ["The Old Man and the Sea"]
print(best_doc)
```
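
The query example above indexes `hits[0]` directly, which raises an `IndexError` when nothing matches. A small guard can avoid that. This is a sketch under the assumption that `hits` is a best-first sequence of `(score, doc_address)` pairs, as the tuple unpacking in the tutorial suggests; the helper name `best_hit` is hypothetical and not part of tantivy-py.

```python
from typing import Optional, Sequence, Tuple, TypeVar

Address = TypeVar("Address")


def best_hit(hits: Sequence[Tuple[float, Address]]) -> Optional[Tuple[float, Address]]:
    """Return the first (score, address) pair, or None if there are no hits."""
    return hits[0] if hits else None


# Stand-in data; with tantivy-py this would be `searcher.search(query, 3).hits`.
example_hits = [(1.7, "address-a"), (0.4, "address-b")]

top = best_hit(example_hits)
if top is None:
    print("no matching documents")
else:
    best_score, best_doc_address = top
    print(best_score, best_doc_address)
```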

mkdocs.yml

+15
@@ -0,0 +1,15 @@
site_name: tantivy-py
# site_url: https://example.com
nav:
  - Home: index.md
  - Tutorials: tutorials.md
  - How-to Guides: howto.md
  - Explanation: explanation.md
  - Reference: reference.md
  - About: about.md
theme: readthedocs

# Can nest documents under above sections
#   - 'User Guide':
#       - 'Writing your docs': 'writing-your-docs.md'
#       - 'Styling your docs': 'styling-your-docs.md'

pyproject.toml

+6
@@ -6,5 +6,11 @@ build-backend = "maturin"
 name = "tantivy"
 requires-python = ">=3.7"
 
+[project.optional-dependencies]
+dev = [
+    "nox",
+    "mkdocs",
+]
+
 [tool.maturin]
 bindings = "pyo3"
