Infrastructure for Sanskrit software. For Python bindings, see `vidyut-py`.

ambuda-org/vidyut

विद्युत्

Vidyut provides reliable infrastructure for Sanskrit software.

Vidyut compiles to fast and efficient native code, and it can be bound to other programming languages with minimal work. We provide first-class support for Python and are eager to support other bindings as well.

Vidyut is under active development as part of the Ambuda project.

License: MIT

Installation

Vidyut is implemented in Rust, which provides low-level control with high-level ergonomics. We also provide first-class Python bindings through the vidyut Python package. This section describes how to use Vidyut through either Rust or Python.

Through Rust

First, install Rust on your computer by following the instructions here.

Once you've done so, create a new project with cargo new and install Vidyut's packages:

cargo add vidyut-prakriya
cargo add vidyut-kosha
cargo add vidyut-lipi
# ... and so on

You can also install directly from this repository:

cargo add vidyut-prakriya --git https://github.com/ambuda-org/vidyut.git
cargo add vidyut-kosha --git https://github.com/ambuda-org/vidyut.git
cargo add vidyut-lipi --git https://github.com/ambuda-org/vidyut.git
# ... and so on

We recommend using our pre-built linguistic data, which is available as a ZIP file here.

For more information, see our Rust documentation.

Through Python

First, install Python on your computer. There are many ways to do so, but we recommend installing uv and then running uv init my-project to create a Python project.

Once your setup is ready, you can install the vidyut package:

# With pip
$ pip install vidyut

# With uv
$ uv add vidyut

You can also install directly from this repository. Doing so compiles the repository from scratch and might take several minutes, so we strongly suggest using our latest PyPI release instead.

# The command is very slow, so pass `--verbose` to monitor its status.
pip install -e "git+https://github.com/ambuda-org/vidyut.git#egg=vidyut&subdirectory=bindings-python" --verbose

We recommend using our pre-built linguistic data, which is available as a ZIP file here.

For more information, see our Python documentation.

Building from source

Building from source lets you work with Vidyut as a developer and contributor.

Through Rust

(This setup requires cargo. Confirm that you have cargo installed by running cargo --version.)

Once you download the repo, you can run cargo test --all to run unit tests.

$ git clone https://github.com/ambuda-org/vidyut.git
$ cd vidyut
$ cargo test --all

(If you install cargo-nextest, you can also run make test for a nicer testing experience.)

Your first build will likely take a few minutes, but future builds will be much faster.

We recommend using our pre-built linguistic data, which is available as a ZIP file here. Or if you prefer, you can build this data for yourself:

$ cd vidyut-data
$ make create_all_data

Output will be written to data/build/vidyut-latest.

NOTE: this command is resource-intensive and might stall on slower machines.

Through Python

(This setup requires uv. Confirm that you have uv installed by running uv --version.)

Once you download the repo, you can run make test in the bindings-python directory to run Python-specific unit tests:

$ git clone https://github.com/ambuda-org/vidyut.git
$ cd vidyut/bindings-python
$ make test

make test uses a development build, which compiles more quickly but has worse runtime performance. To create a release build instead, run make release.

Components

Vidyut contains several standard components for common Sanskrit processing tasks. These components work together well, but you can also use them independently depending on your use case.

In Rust, components of this kind are called crates.

vidyut-chandas identifies the meter in some piece of Sanskrit text. This crate is experimental, and while it is useful for common and basic use cases, it is not a state-of-the-art solution.

For details, see the vidyut-chandas README.
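Meter identification usually starts with scansion, which marks each syllable as guru (heavy) or laghu (light). The Python sketch below illustrates that first step on SLP1 text under simplified rules. It is not vidyut-chandas's implementation or API; the name scan is hypothetical, and punctuation and line-final conventions are ignored.

```python
# Illustrative sketch only: simplified scansion of SLP1 text.
# `scan` is a hypothetical helper, not part of vidyut-chandas.
LONG_VOWELS = set("AIUFXeEoO")
SHORT_VOWELS = set("aiufx")
VOWELS = LONG_VOWELS | SHORT_VOWELS


def scan(text: str) -> str:
    """Return one mark per syllable: G for guru (heavy), L for laghu (light)."""
    # Word boundaries do not matter for syllable weight, so drop whitespace.
    sounds = [c for c in text if not c.isspace()]
    weights = []
    for i, sound in enumerate(sounds):
        if sound not in VOWELS:
            continue
        following = sounds[i + 1 : i + 3]
        is_long = sound in LONG_VOWELS
        # A short vowel followed by anusvara (M) or visarga (H) is heavy.
        before_m_or_h = bool(following) and following[0] in "MH"
        # A short vowel followed by two consonants is heavy.
        before_cluster = len(following) == 2 and all(
            c not in VOWELS for c in following
        )
        heavy = is_long or before_m_or_h or before_cluster
        weights.append("G" if heavy else "L")
    return "".join(weights)


print(scan("Darmakzetre kurukzetre"))  # GGGGLGGG
```

A meter identifier would then compare the resulting pattern against known meter definitions, which this sketch leaves out.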

vidyut-cheda segments Sanskrit expressions into words, then annotates those words with their morphological data. Our segmenter is optimized for real-time and interactive usage: it is fast, low-memory, and capably handles pathological input.

For details, see the vidyut-cheda README.

vidyut-kosha defines a key-value store that can compactly map tens of millions of Sanskrit words to their inflectional data. Depending on the application, storage costs can be as low as 1 byte per word. This storage efficiency comes at the cost of increased lookup time, but in practice, we have found that this increase is negligible and well worth the efficiency gains elsewhere.

For details, see the vidyut-kosha README.
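To make the storage trade-off concrete, here is a minimal Python sketch of one general technique for compact storage: packing a word's inflectional data into a single byte with bit fields. It is an illustration only and does not describe vidyut-kosha's actual data format or API. The names LINGAS, VIBHAKTIS, VACANAS, pack, and unpack are hypothetical.

```python
# Illustrative sketch only: bit-packing nominal inflection data into one byte.
# This is NOT vidyut-kosha's format; all names here are hypothetical.
LINGAS = ("pum", "stri", "napumsaka")  # gender: 2 bits
VIBHAKTIS = (
    "prathama", "dvitiya", "trtiya", "caturthi",
    "pancami", "sasthi", "saptami", "sambodhana",
)  # case: 3 bits
VACANAS = ("eka", "dvi", "bahu")  # number: 2 bits


def pack(linga: str, vibhakti: str, vacana: str) -> int:
    """Pack three categories into 7 bits, laid out as [linga|vibhakti|vacana]."""
    return (
        (LINGAS.index(linga) << 5)
        | (VIBHAKTIS.index(vibhakti) << 2)
        | VACANAS.index(vacana)
    )


def unpack(byte: int) -> tuple[str, str, str]:
    """Decode a packed byte back into its three categories."""
    return (
        LINGAS[(byte >> 5) & 0b11],
        VIBHAKTIS[(byte >> 2) & 0b111],
        VACANAS[byte & 0b11],
    )


packed = pack("pum", "dvitiya", "bahu")
assert 0 <= packed < 256  # the value fits in a single byte
print(unpack(packed))  # ('pum', 'dvitiya', 'bahu')
```

Every lookup pays a small decoding cost in unpack, which is the general shape of the trade-off described above: less storage in exchange for slightly more work per query.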

vidyut-lipi is a transliteration library for Sanskrit and Pali that also supports many of the scripts used within the Indosphere. Our goal is to provide a standard transliterator that is easy to bind to other programming languages.

For details, see the vidyut-lipi README.
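As a rough picture of what transliteration involves, the Python sketch below converts text from the SLP1 scheme to IAST with a lookup table. It works character by character because SLP1 uses exactly one character per sound. This is a toy illustration, not vidyut-lipi's implementation or API; the name slp1_to_iast is hypothetical, and real transliteration between Indic scripts needs considerably more logic.

```python
# Illustrative sketch only: table-driven SLP1 -> IAST transliteration.
# `slp1_to_iast` is a hypothetical helper, not part of vidyut-lipi.
SLP1 = (
    "a A i I u U f F x X e E o O M H "
    "k K g G N c C j J Y w W q Q R "
    "t T d D n p P b B m y r l v S z s h"
).split()
IAST = (
    "a ā i ī u ū ṛ ṝ ḷ ḹ e ai o au ṃ ḥ "
    "k kh g gh ṅ c ch j jh ñ ṭ ṭh ḍ ḍh ṇ "
    "t th d dh n p ph b bh m y r l v ś ṣ s h"
).split()
SLP1_TO_IAST = dict(zip(SLP1, IAST))


def slp1_to_iast(text: str) -> str:
    """Map each SLP1 character to IAST; pass through anything unknown."""
    return "".join(SLP1_TO_IAST.get(char, char) for char in text)


print(slp1_to_iast("saMskftam"))  # saṃskṛtam
```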

vidyut-prakriya generates Sanskrit words with their prakriyās (derivations) according to the rules of Paninian grammar. Our long-term goal is to provide a complete implementation of the Ashtadhyayi.

For details, see the vidyut-prakriya README.

vidyut-sandhi contains various utilities for working with sandhi changes between words. It is fast, simple, and appropriate for most use cases.

For details, see the vidyut-sandhi README.
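For readers new to sandhi, the Python sketch below applies a handful of vowel sandhi rules when joining two words written in SLP1. It is a toy illustration of the kind of change involved, not vidyut-sandhi's implementation or API. The names RESULT_AFTER_A and join are hypothetical, and any word pair the table does not cover is returned unchanged even where real sandhi would apply.

```python
# Illustrative sketch only: a few vowel sandhi rules for words ending in a/A.
# `RESULT_AFTER_A` and `join` are hypothetical, not part of vidyut-sandhi.
RESULT_AFTER_A = {
    "a": "A", "A": "A",  # a + a -> A
    "i": "e", "I": "e",  # a + i -> e
    "u": "o", "U": "o",  # a + u -> o
    "e": "E", "E": "E",  # a + e -> E (ai)
    "o": "O", "O": "O",  # a + o -> O (au)
}


def join(first: str, second: str) -> str:
    """Join two SLP1 words, merging a final a/A with a following vowel."""
    if first and second and first[-1] in "aA" and second[0] in RESULT_AFTER_A:
        return first[:-1] + RESULT_AFTER_A[second[0]] + second[1:]
    # Outside the rules above, leave the words as they are.
    return f"{first} {second}"


print(join("ca", "iti"))       # ceti
print(join("sUrya", "udaya"))  # sUryodaya
```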

Documentation

To view documentation for all crates (including private modules and structs), run make docs. This command will generate Rust's standard documentation and open it in your default web browser.

Contributing

Thank you for considering a contribution to Vidyut! Vidyut is an ambitious and transformative project, and it can grow only with your help.

For all of the details, see our CONTRIBUTING.md file.

Community

If you're excited about our work on Vidyut, we would love to have you join our community.

  • Most of our conversation occurs on Ambuda's Discord server on the #vidyut channel, where you can chat directly with our team and get fast answers to your questions. We also schedule time to spend together virtually, usually weekly.

  • Occasional discussion related to Vidyut might also appear on ambuda-discuss or on standard mailing lists like sanskrit-programmers.

  • You can also follow along with project announcements on ambuda-announce.

  • More technical discussions will appear on our issues page.

बलमिति विद्युति