Skip to content

Latest commit

 

History

History
328 lines (211 loc) · 10.3 KB

README.md

File metadata and controls

328 lines (211 loc) · 10.3 KB

विद्युत्

Reliable infrastructure for Sanskrit software

Vidyut aims to provide performant and high-quality solutions for the common problems that Sanskrit programmers face. Some of these problems include:

  • Word generation, or converting bases and suffixes into complete words. (भू → भवति)

  • Word lookup, or mapping a complete word back to its bases and suffixes. (भवति → भू)

  • Transliteration, or conversion of Sanskrit text from one script to another. (भू → bhū)

  • Metrical analysis, or understanding the meter used by a piece of Sanskrit text.

  • Sandhi changes, or applying and undoing the sound changes that occur between pieces of Sanskrit text. (चैव → च एव)

  • Segmentation, or splitting a piece of Sanskrit text into distinct words. (भवत्येव → भवति एव)

Vidyut compiles to fast and efficient native code, and it can be bound to other programming languages with minimal work. We provide first-class support for Python and are eager to support other bindings as well.

Vidyut is under active development as part of the Ambuda project.

License: MIT

Build status

Contents

Installation

Vidyut is implemented in Rust, which provides low-level control with high-level ergonomics. For your convenience, we also provide first-class support for Python bindings through the vidyut Python package. This section describes how to use Vidyut either through Rust or through Python.

Through Rust

First, install Rust on your computer by following the instructions here.

Once you've done so, create a new project with cargo new and install Vidyut's packages:

cargo add vidyut-prakriya
cargo add vidyut-kosha
cargo add vidyut-lipi
# ... and so on

You can also install directly from this repository:

cargo add vidyut-prakriya --git https://github.com/ambuda-org/vidyut.git
cargo add vidyut-kosha --git https://github.com/ambuda-org/vidyut.git
cargo add vidyut-lipi --git https://github.com/ambuda-org/vidyut.git
# ... and so on

We recommend using our pre-built linguistic data, which is available as a ZIP file here.

For more information, see our Rust documentation.

Through Python

First, install Python on your computer. There are many ways to do so, but we recommend installing uv then running uv init my-project to create a Python project.

Once your setup is ready, you can install the vidyut package:

# With uv
$ uv add vidyut

# With pip
$ pip install vidyut

You can also install directly from this repository. Doing so compiles the repository from scratch and might take several minutes, so we strongly suggest using our latest PyPI release instead.

# Building from scratch is slow, so we pass `--verbose` to monitor its status.

# With uv
$ uv add "git+https://github.com/ambuda-org/vidyut.git#subdirectory=bindings-python" --verbose

# With pip
$ pip install -e "git+https://github.com/ambuda-org/vidyut.git#egg=vidyut&subdirectory=bindings-python" --verbose

We recommend using our pre-built linguistic data, which is available as a ZIP file here.

For more information, see our Python documentation.

Building from source

Building from source lets you work with Vidyut as a developer and contributor.

Through Rust

(This setup requires cargo. Confirm that you have cargo installed by running cargo --version.)

Once you download the repo, you can run cargo test --all to run unit tests.

$ git clone https://github.com/ambuda-org/vidyut.git
$ cd vidyut
$ cargo test --all

(If you install cargo-nextest, you can also run make test for a nicer testing experience.)

Your first build will likely take a few minutes, but future builds will be much faster.

We recommend using our pre-built linguistic data, which is available as a ZIP file here. Or if you prefer, you can build this data for yourself:

$ cd vidyut-data
$ make create_all_data

Output will be written to data/build/vidyut-latest.

NOTE: this command is resource-intensive and might stall on slower machines.

Through Python

(This setup requires uv. Confirm that you have uv installed by running uv --version.)

Once you download the repo, you can run make test in the bindings-python directory to run Python-specific unit tests:

$ git clone https://github.com/ambuda-org/vidyut.git
$ cd vidyut/bindings-python
$ make test

make test uses a development build, which compiles more quickly but has worse runtime performance. To create a release build instead, run make release.

Components

Vidyut contains several standard components for common Sanskrit processing tasks. These components work together well, but you can also use them independently depending on your use case.

In Rust, components of this kind are called crates.

vidyut-chandas identifies the meter in some piece of Sanskrit text. This crate is experimental, and while it is useful for common and basic use cases, it is not a state-of-the-art solution.

For details, see the vidyut-chandas README.

vidyut-cheda segments Sanskrit expressions into words then annotates those words with their morphological data. Our segmenter is optimized for real-time and interactive usage: it is fast, low-memory, and capably handles pathological input.

For details, see the vidyut-cheda README.

vidyut-kosha defines a key-value store that can compactly map tens of millions of Sanskrit words to their inflectional data. Depending on the application, storage costs can be as low as 1 byte per word. This storage efficiency comes at the cost of increased lookup time, but in practice, we have found that this increase is negligible and well worth the efficiency gains elsewhere.

For details, see the vidyut-kosha README.

vidyut-lipi is a transliteration library for Sanskrit and Pali that also supports many of the scripts used within the Indosphere. Our goal is to provide a standard transliterator that is easy to bind to other programming languages.

For details, see the vidyut-lipi README.

vidyut-prakriya generates Sanskrit words with their prakriyās (derivations) according to the rules of Paninian grammar. Our long-term goal is to provide a complete implementation of the Ashtadhyayi.

For details, see the vidyut-prakriya README.

vidyut-sandhi contains various utilities for working with sandhi changes between words. It is fast, simple, and appropriate for most use cases.

For details, see the vidyut-sandhi README.

Documentation

Our Rust documentation is available on docs.rs, and our Python documentation is available on readthedocs.org. You can also build our documentation from scratch:

  • (Rust) To view documentation for all crates (including private modules and structs), run make docs from the repository root. This command will generate Rust's standard documentation and open it in your default web browser.

  • (Python) To view the latest build of our Python documentation, run make docs from the bindings-python directory. This command will write our Python docs to local HTML files, which you should then open manually.

Contributing

Thank you for considering a contribution to Vidyut! Vidyut is an ambitious and transformative project, and it can grow only with your help.

For all of the details, see our CONTRIBUTING.md file.

Community

If you're excited about our work on Vidyut, we would love to have you join our community.

  • Most of our conversation occurs on Ambuda's Discord server on the #vidyut channel, where you can chat directly with our team and get fast answers to your questions. We also schedule time to spend together virtually, usually on a weekly frequency.

  • Occasional discussion related to Vidyut might also appear on ambuda-discuss or on standard mailing lists like sanskrit-programmers.

  • You can also follow along with project announcements on ambuda-announce.

  • More technical discussions will appear on our issues page.

बलमिति विद्युति