deploy: e24c637
arxyzan committed Aug 28, 2023
0 parents commit a94ec24
Showing 475 changed files with 122,450 additions and 0 deletions.
4 changes: 4 additions & 0 deletions .buildinfo
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: 7254a81eeaa30845194afa3d93625a5e
tags: 645f666f9bcd5a90fca523b33c5a78b7
Binary file added .doctrees/contribute/add_datasets.doctree
Binary file not shown.
Binary file added .doctrees/contribute/add_docs.doctree
Binary file not shown.
Binary file added .doctrees/contribute/add_models.doctree
Binary file not shown.
Binary file added .doctrees/contribute/add_tests.doctree
Binary file not shown.
Binary file added .doctrees/contribute/contribute_to_hezar.doctree
Binary file not shown.
Binary file added .doctrees/contribute/index.doctree
Binary file not shown.
Binary file added .doctrees/contribute/pull_requests.doctree
Binary file not shown.
Binary file added .doctrees/environment.pickle
Binary file not shown.
Binary file added .doctrees/get_started/index.doctree
Binary file not shown.
Binary file added .doctrees/get_started/installation.doctree
Binary file not shown.
Binary file added .doctrees/get_started/overview.doctree
Binary file not shown.
Binary file added .doctrees/get_started/quick_tour.doctree
Binary file not shown.
Binary file added .doctrees/guide/advanced_training.doctree
Binary file not shown.
Binary file added .doctrees/guide/hezar_architecture.doctree
Binary file not shown.
Binary file added .doctrees/guide/index.doctree
Binary file not shown.
Binary file added .doctrees/guide/models_advanced.doctree
Binary file not shown.
Binary file added .doctrees/index.doctree
Binary file not shown.
Binary file added .doctrees/source/hezar.builders.doctree
Binary file not shown.
Binary file added .doctrees/source/hezar.configs.doctree
Binary file not shown.
Binary file added .doctrees/source/hezar.constants.doctree
Binary file not shown.
Binary file added .doctrees/source/hezar.data.datasets.doctree
Binary file not shown.
Binary file added .doctrees/source/hezar.data.doctree
Binary file not shown.
Binary file added .doctrees/source/hezar.doctree
Binary file not shown.
Binary file added .doctrees/source/hezar.embeddings.doctree
Binary file not shown.
Binary file added .doctrees/source/hezar.integrations.doctree
Binary file not shown.
Binary file added .doctrees/source/hezar.metrics.doctree
Binary file not shown.
Binary file added .doctrees/source/hezar.metrics.f1.doctree
Binary file not shown.
Binary file added .doctrees/source/hezar.metrics.metric.doctree
Binary file not shown.
Binary file added .doctrees/source/hezar.metrics.recall.doctree
Binary file not shown.
Binary file added .doctrees/source/hezar.metrics.seqeval.doctree
Binary file not shown.
Binary file added .doctrees/source/hezar.models.doctree
Binary file not shown.
Binary file added .doctrees/source/hezar.models.image2text.doctree
Binary file not shown.
Binary file added .doctrees/source/hezar.models.model.doctree
Binary file not shown.
Binary file added .doctrees/source/hezar.models.text2text.doctree
Binary file not shown.
Binary file added .doctrees/source/hezar.preprocessors.doctree
Binary file not shown.
Binary file added .doctrees/source/hezar.registry.doctree
Binary file not shown.
Binary file added .doctrees/source/hezar.trainers.doctree
Binary file not shown.
Binary file added .doctrees/source/hezar.trainers.trainer.doctree
Binary file not shown.
Binary file added .doctrees/source/hezar.utils.audio_utils.doctree
Binary file not shown.
Binary file added .doctrees/source/hezar.utils.common_utils.doctree
Binary file not shown.
Binary file added .doctrees/source/hezar.utils.core_utils.doctree
Binary file not shown.
Binary file added .doctrees/source/hezar.utils.data_utils.doctree
Binary file not shown.
Binary file added .doctrees/source/hezar.utils.doctree
Binary file not shown.
Binary file added .doctrees/source/hezar.utils.file_utils.doctree
Binary file not shown.
Binary file added .doctrees/source/hezar.utils.hub_utils.doctree
Binary file not shown.
Binary file added .doctrees/source/hezar.utils.logging.doctree
Binary file not shown.
Binary file added .doctrees/source/index.doctree
Binary file not shown.
Binary file added .doctrees/source/modules.doctree
Binary file not shown.
Binary file added .doctrees/tutorial/datasets.doctree
Binary file not shown.
Binary file added .doctrees/tutorial/index.doctree
Binary file not shown.
Binary file added .doctrees/tutorial/models.doctree
Binary file not shown.
Binary file added .doctrees/tutorial/preprocessors.doctree
Binary file not shown.
Binary file added .doctrees/tutorial/training.doctree
Binary file not shown.
Empty file added .nojekyll
Empty file.
614 changes: 614 additions & 0 deletions _modules/hezar/builders.html

Large diffs are not rendered by default.

843 changes: 843 additions & 0 deletions _modules/hezar/configs.html

Large diffs are not rendered by default.

552 changes: 552 additions & 0 deletions _modules/hezar/constants.html

Large diffs are not rendered by default.

625 changes: 625 additions & 0 deletions _modules/hezar/data/data_collators.html

Large diffs are not rendered by default.

517 changes: 517 additions & 0 deletions _modules/hezar/data/datasets/dataset.html

Large diffs are not rendered by default.

586 changes: 586 additions & 0 deletions _modules/hezar/data/datasets/sequence_labeling_dataset.html

Large diffs are not rendered by default.

568 changes: 568 additions & 0 deletions _modules/hezar/data/datasets/text_classification_dataset.html

Large diffs are not rendered by default.

568 changes: 568 additions & 0 deletions _modules/hezar/data/datasets/text_summarization_dataset.html

Large diffs are not rendered by default.

673 changes: 673 additions & 0 deletions _modules/hezar/embeddings/embedding.html

Large diffs are not rendered by default.

627 changes: 627 additions & 0 deletions _modules/hezar/embeddings/fasttext.html

Large diffs are not rendered by default.

628 changes: 628 additions & 0 deletions _modules/hezar/embeddings/word2vec.html

Large diffs are not rendered by default.

473 changes: 473 additions & 0 deletions _modules/hezar/integrations.html

Large diffs are not rendered by default.

508 changes: 508 additions & 0 deletions _modules/hezar/metrics/f1.html

Large diffs are not rendered by default.

486 changes: 486 additions & 0 deletions _modules/hezar/metrics/metric.html

Large diffs are not rendered by default.

511 changes: 511 additions & 0 deletions _modules/hezar/metrics/recall.html

Large diffs are not rendered by default.

533 changes: 533 additions & 0 deletions _modules/hezar/metrics/seqeval.html

Large diffs are not rendered by default.

534 changes: 534 additions & 0 deletions _modules/hezar/models/language_modeling/bert/bert_lm.html

Large diffs are not rendered by default.

481 changes: 481 additions & 0 deletions _modules/hezar/models/language_modeling/bert/bert_lm_config.html

Large diffs are not rendered by default.

522 changes: 522 additions & 0 deletions _modules/hezar/models/language_modeling/distilbert/distilbert_lm.html

Large diffs are not rendered by default.

533 changes: 533 additions & 0 deletions _modules/hezar/models/language_modeling/roberta/roberta_lm.html

Large diffs are not rendered by default.

483 changes: 483 additions & 0 deletions _modules/hezar/models/language_modeling/roberta/roberta_lm_config.html

Large diffs are not rendered by default.

843 changes: 843 additions & 0 deletions _modules/hezar/models/model.html

Large diffs are not rendered by default.

552 changes: 552 additions & 0 deletions _modules/hezar/models/model_outputs.html

Large diffs are not rendered by default.

562 changes: 562 additions & 0 deletions _modules/hezar/models/text2text/t5/t5_text2text.html

Large diffs are not rendered by default.

483 changes: 483 additions & 0 deletions _modules/hezar/models/text2text/t5/t5_text2text_config.html

Large diffs are not rendered by default.

608 changes: 608 additions & 0 deletions _modules/hezar/preprocessors/preprocessor.html

Large diffs are not rendered by default.

585 changes: 585 additions & 0 deletions _modules/hezar/preprocessors/text_normalizer.html

Large diffs are not rendered by default.

601 changes: 601 additions & 0 deletions _modules/hezar/preprocessors/tokenizers/bpe.html

Large diffs are not rendered by default.

606 changes: 606 additions & 0 deletions _modules/hezar/preprocessors/tokenizers/sentencepiece_bpe.html

Large diffs are not rendered by default.

604 changes: 604 additions & 0 deletions _modules/hezar/preprocessors/tokenizers/sentencepiece_unigram.html

Large diffs are not rendered by default.

1,143 changes: 1,143 additions & 0 deletions _modules/hezar/preprocessors/tokenizers/tokenizer.html

Large diffs are not rendered by default.

1,083 changes: 1,083 additions & 0 deletions _modules/hezar/preprocessors/tokenizers/whisper_bpe.html

Large diffs are not rendered by default.

577 changes: 577 additions & 0 deletions _modules/hezar/preprocessors/tokenizers/wordpiece.html

Large diffs are not rendered by default.

717 changes: 717 additions & 0 deletions _modules/hezar/registry.html

Large diffs are not rendered by default.

984 changes: 984 additions & 0 deletions _modules/hezar/trainers/trainer.html

Large diffs are not rendered by default.

536 changes: 536 additions & 0 deletions _modules/hezar/trainers/trainer_utils.html

Large diffs are not rendered by default.

1,010 changes: 1,010 additions & 0 deletions _modules/hezar/utils/audio_utils.html

Large diffs are not rendered by default.

489 changes: 489 additions & 0 deletions _modules/hezar/utils/common_utils.html

Large diffs are not rendered by default.

478 changes: 478 additions & 0 deletions _modules/hezar/utils/context_managers.html

Large diffs are not rendered by default.

675 changes: 675 additions & 0 deletions _modules/hezar/utils/core_utils.html

Large diffs are not rendered by default.

537 changes: 537 additions & 0 deletions _modules/hezar/utils/data_utils.html

Large diffs are not rendered by default.

482 changes: 482 additions & 0 deletions _modules/hezar/utils/file_utils.html

Large diffs are not rendered by default.

598 changes: 598 additions & 0 deletions _modules/hezar/utils/hub_utils.html

Large diffs are not rendered by default.

474 changes: 474 additions & 0 deletions _modules/hezar/utils/logging.html

Large diffs are not rendered by default.

508 changes: 508 additions & 0 deletions _modules/hezar/utils/registry_utils.html

Large diffs are not rendered by default.

516 changes: 516 additions & 0 deletions _modules/index.html

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions _sources/contribute/add_datasets.md.txt
@@ -0,0 +1 @@
# Add a Dataset
1 change: 1 addition & 0 deletions _sources/contribute/add_docs.md.txt
@@ -0,0 +1 @@
# Contribute to Docs
1 change: 1 addition & 0 deletions _sources/contribute/add_models.md.txt
@@ -0,0 +1 @@
# Add a Model
1 change: 1 addition & 0 deletions _sources/contribute/add_tests.md.txt
@@ -0,0 +1 @@
# Add Tests
1 change: 1 addition & 0 deletions _sources/contribute/contribute_to_hezar.md.txt
@@ -0,0 +1 @@
# Contribute to Hezar
10 changes: 10 additions & 0 deletions _sources/contribute/index.md.txt
@@ -0,0 +1,10 @@
# Contribute

```{toctree}
contribute_to_hezar.md
add_models.md
add_datasets.md
add_docs.md
add_tests.md
pull_requests.md
```
1 change: 1 addition & 0 deletions _sources/contribute/pull_requests.md.txt
@@ -0,0 +1 @@
# Sending a Pull Request
8 changes: 8 additions & 0 deletions _sources/get_started/index.md.txt
@@ -0,0 +1,8 @@
# Get Started
```{toctree}
:maxdepth: 1

overview.md
installation.md
quick_tour.md
```
25 changes: 25 additions & 0 deletions _sources/get_started/installation.md.txt
@@ -0,0 +1,25 @@
# Installation

#### Install from PyPI
Installing Hezar is as easy as installing any other Python library. Most of its requirements are cross-platform, so they
should install without trouble on most machines.

```
pip install hezar
```
#### Install from source
You can also install the development version of the library directly from source:
```
pip install git+https://github.com/hezarai/hezar.git
```

#### Test installation
From a Python console (or a script), import `hezar` and check the version:
```python
import hezar

print(hezar.__version__)
```
```
0.23.1
```
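If `hezar` might not be installed in the current environment, the version can also be checked without importing the package. The sketch below relies only on the standard library's `importlib.metadata`; the helper name `installed_version` is made up for this example and is not part of Hezar.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("hezar")
    print(found if found is not None else "hezar is not installed")
```

Returning `None` instead of raising keeps the check usable in setup scripts that want to decide whether to run `pip install hezar`.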
20 changes: 20 additions & 0 deletions _sources/get_started/overview.md.txt
@@ -0,0 +1,20 @@
# Overview

Welcome to Hezar, a library built by the Persian community that makes state-of-the-art machine learning for the Persian
language as easy as possible!

Hezar's primary goal is to provide plug-and-play AI/ML utilities so that you don't need to know much about what's
going on under the hood. Hezar is not just a model library: it also covers the other parts of an ML pipeline, such as
datasets, trainers, preprocessors, and feature extractors.

Hezar is a library that:
- brings together the best work in AI for Persian
- makes using AI models as easy as a couple of lines of code
- seamlessly integrates with the Hugging Face Hub for all of its models
- has a highly developer-friendly interface
- has a task-based model interface that is more convenient for general users
- is packed with additional tools like word embeddings, tokenizers, feature extractors, etc.
- comes with a lot of supplementary ML tools for deployment, benchmarking, optimization, etc.
- and more!

To find out more, just take the [quick tour](quick_tour.md)!
151 changes: 151 additions & 0 deletions _sources/get_started/quick_tour.md.txt
@@ -0,0 +1,151 @@
# Quick Tour
Let's take a quick tour of some of Hezar's most important features!

### Models
A number of ready-to-use trained models for different tasks are available on the Hub. The full list is [here](https://huggingface.co/hezarai).

- **Text classification (sentiment analysis, categorization, etc.)**
```python
from hezar import Model

example = ["هزار، کتابخانه‌ای کامل برای به کارگیری آسان هوش مصنوعی"]
model = Model.load("hezarai/bert-fa-sentiment-dksf")
outputs = model.predict(example)
print(outputs)
```
```
{'labels': ['positive'], 'probs': [0.812910258769989]}
```
- **Sequence labeling (POS, NER, etc.)**
```python
from hezar import Model

pos_model = Model.load("hezarai/bert-fa-pos-lscp-500k") # Part-of-speech
ner_model = Model.load("hezarai/bert-fa-ner-arman") # Named entity recognition
inputs = ["شرکت هوش مصنوعی هزار"]
pos_outputs = pos_model.predict(inputs)
ner_outputs = ner_model.predict(inputs)
print(f"POS: {pos_outputs}")
print(f"NER: {ner_outputs}")
```
```
POS: [[{'token': 'شرکت', 'tag': 'Ne'}, {'token': 'هوش', 'tag': 'Ne'}, {'token': 'مصنوعی', 'tag': 'AJe'}, {'token': 'هزار', 'tag': 'NUM'}]]
NER: [[{'token': 'شرکت', 'tag': 'B-org'}, {'token': 'هوش', 'tag': 'I-org'}, {'token': 'مصنوعی', 'tag': 'I-org'}, {'token': 'هزار', 'tag': 'I-org'}]]
```
- **Speech recognition**
```python
from hezar import Model
from datasets import load_dataset

ds = load_dataset("mozilla-foundation/common_voice_11_0", "fa", split="test")
sample = ds[1001]
whisper = Model.load("hezarai/whisper-small-fa")
transcript = whisper.predict(sample["path"]) # or pass `sample["audio"]["array"]` (with the right sample rate)
print(transcript)
```
```
{'transcription': ['و این تنها محدود به محیط کار نیست']}
```
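The NER output shown in the sequence labeling example tags each token separately with `B-`/`I-` prefixes. A small post-processing step can merge consecutive tokens into entity spans. This is a minimal sketch that assumes the output shape printed in that example and the usual BIO convention (an `O` tag for non-entity tokens is an assumption, since none appears in the sample); `group_entities` is a hypothetical helper, not a Hezar function.

```python
def group_entities(tagged_tokens):
    """Merge B-/I- tagged tokens into (entity_type, text) spans."""
    entities = []
    current_type, current_tokens = None, []

    def flush():
        # Store the span collected so far, if there is one.
        if current_tokens:
            entities.append((current_type, " ".join(current_tokens)))

    for item in tagged_tokens:
        prefix, _, label = item["tag"].partition("-")
        if prefix == "B" or (prefix == "I" and label != current_type):
            # A new entity starts (a stray I- tag is treated like B-).
            flush()
            current_type, current_tokens = label, [item["token"]]
        elif prefix == "I":
            # Continuation of the current entity.
            current_tokens.append(item["token"])
        else:
            # "O" or any tag without a B-/I- prefix ends the current entity.
            flush()
            current_type, current_tokens = None, []
    flush()
    return entities


# One sentence from the NER output printed in the example.
sample = [
    {"token": "شرکت", "tag": "B-org"},
    {"token": "هوش", "tag": "I-org"},
    {"token": "مصنوعی", "tag": "I-org"},
    {"token": "هزار", "tag": "I-org"},
]
print(group_entities(sample))
```

For the sample sentence this yields a single `org` span covering all four tokens. Since `predict` returns one list per input, the helper would be applied to each inner list in turn.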

### Word Embeddings
- **FastText**
```python
from hezar import Embedding

fasttext = Embedding.load("hezarai/fasttext-fa-300")
most_similar = fasttext.most_similar("هزار")
print(most_similar)
```
```
[{'score': 0.7579, 'word': 'میلیون'},
{'score': 0.6943, 'word': '21هزار'},
{'score': 0.6861, 'word': 'میلیارد'},
{'score': 0.6825, 'word': '26هزار'},
{'score': 0.6803, 'word': '٣هزار'}]
```
- **Word2Vec (Skip-gram)**
```python
from hezar import Embedding

word2vec = Embedding.load("hezarai/word2vec-skipgram-fa-wikipedia")
most_similar = word2vec.most_similar("هزار")
print(most_similar)
```
```
[{'score': 0.7885, 'word': 'چهارهزار'},
{'score': 0.7788, 'word': '۱۰هزار'},
{'score': 0.7727, 'word': 'دویست'},
{'score': 0.7679, 'word': 'میلیون'},
{'score': 0.7602, 'word': 'پانصد'}]
```
- **Word2Vec (CBOW)**
```python
from hezar import Embedding

word2vec = Embedding.load("hezarai/word2vec-cbow-fa-wikipedia")
most_similar = word2vec.most_similar("هزار")
print(most_similar)
```
```
[{'score': 0.7407, 'word': 'دویست'},
{'score': 0.7400, 'word': 'میلیون'},
{'score': 0.7326, 'word': 'صد'},
{'score': 0.7276, 'word': 'پانصد'},
{'score': 0.7011, 'word': 'سیصد'}]
```
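The scores returned by `most_similar` look like cosine similarities, which is the usual measure for word-embedding neighbours; the docs here do not state this, so treat it as an assumption. As a rough illustration of what such a score measures, here is a dependency-free sketch (the function name `cosine_similarity` is chosen for this example only):

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # parallel vectors: close to 1
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors
```

Under that assumption, a score near 1 means two word vectors point in almost the same direction, while a score near 0 means they are unrelated.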

### Datasets
You can load any of the datasets on the [Hub](https://huggingface.co/hezarai) as shown below:
```python
from hezar import Dataset

sentiment_dataset = Dataset.load("hezarai/sentiment-dksf") # A TextClassificationDataset instance
lscp_dataset = Dataset.load("hezarai/lscp-pos-500k") # A SequenceLabelingDataset instance
xlsum_dataset = Dataset.load("hezarai/xlsum-fa") # A TextSummarizationDataset instance
...
```

### Training
Hezar makes it easy to train models using the out-of-the-box models and datasets provided in the library.
```python
from hezar import (
    BertSequenceLabeling,
    BertSequenceLabelingConfig,
    TrainerConfig,
    SequenceLabelingTrainer,
    Dataset,
    Preprocessor,
)

base_model_path = "hezarai/bert-base-fa"
dataset_path = "hezarai/lscp-pos-500k"

train_dataset = Dataset.load(dataset_path, split="train", tokenizer_path=base_model_path)
eval_dataset = Dataset.load(dataset_path, split="test", tokenizer_path=base_model_path)

model = BertSequenceLabeling(BertSequenceLabelingConfig(id2label=train_dataset.config.id2label))
preprocessor = Preprocessor.load(base_model_path)

train_config = TrainerConfig(
    device="cuda",
    init_weights_from=base_model_path,
    batch_size=8,
    num_epochs=5,
    checkpoints_dir="checkpoints/",
    metrics=["seqeval"],
)

trainer = SequenceLabelingTrainer(
    config=train_config,
    model=model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    data_collator=train_dataset.data_collator,
    preprocessor=preprocessor,
)
trainer.train()

trainer.push_to_hub("bert-fa-pos-lscp-500k") # push model, config, preprocessor, trainer files and configs
```
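
The training example hard-codes `device="cuda"`, which presumably fails on a machine without a GPU. One way to make the script portable is to pick the device at runtime. The sketch below is an assumption-laden illustration: `pick_device` is a hypothetical helper, and it assumes `TrainerConfig` also accepts `"cpu"` as a device value, which these docs do not confirm.

```python
def pick_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is no GPU backend to query.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device())
```

The result could then be passed as `device=pick_device()` in place of the literal `"cuda"`.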

Want to go deeper? Check out the [guides](../guide/index.md).
2 changes: 2 additions & 0 deletions _sources/guide/advanced_training.md.txt
@@ -0,0 +1,2 @@
# Advanced Training
Docs coming soon, stay tuned!