
Commit 5e9757b

Merge pull request #3 from centre-for-humanities-computing/cli
Release 1.0.0
2 parents ddd2430 + 774d37c commit 5e9757b


89 files changed: +1026, -22085 lines

.github/workflows/static.yml

Lines changed: 22 additions & 30 deletions
@@ -1,42 +1,34 @@
-# Simple workflow for deploying static content to GitHub Pages
-name: Deploy static content to Pages
+# creates the documentation on pushes it to the gh-pages branch
+name: Documentation
 
 on:
-  # Runs on pushes targeting the default branch
+  pull_request:
+    branches: [main]
   push:
-    branches: ["main"]
+    branches: [main]
 
-  # Allows you to run this workflow manually from the Actions tab
-  workflow_dispatch:
 
-# Sets permissions of the GITHUB_TOKEN to allow deployment to GitHub Pages
 permissions:
-  contents: read
-  pages: write
-  id-token: write
-
-# Allow one concurrent deployment
-concurrency:
-  group: "pages"
-  cancel-in-progress: true
+  contents: write
 
 jobs:
-  # Single deploy job since we're just deploying
   deploy:
-    environment:
-      name: github-pages
-      url: ${{ steps.deployment.outputs.page_url }}
     runs-on: ubuntu-latest
     steps:
-      - name: Checkout
-        uses: actions/checkout@v3
-      - name: Setup Pages
-        uses: actions/configure-pages@v2
-      - name: Upload artifact
-        uses: actions/upload-pages-artifact@v1
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v4
         with:
-          # Upload entire repository
-          path: './docs/_build/html'
-      - name: Deploy to GitHub Pages
-        id: deployment
-        uses: actions/deploy-pages@v1
+          python-version: '3.10'
+
+      - name: Dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install "stormtrooper[docs,openai]"
+
+      - name: Build and Deploy
+        if: github.event_name == 'push'
+        run: mkdocs gh-deploy --force
+
+      - name: Build
+        if: github.event_name == 'pull_request'
+        run: mkdocs build

.github/workflows/tests.yml

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
+name: Tests
+on:
+  push:
+    branches: [main]
+  pull_request:
+    branches: [main]
+
+jobs:
+  pytest:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: ["3.10"]
+    #
+    # This allows a subsequently queued workflow run to interrupt previous runs
+    concurrency:
+      group: "${{ github.workflow }}-${{ matrix.python-version}}-${{ matrix.os }} @ ${{ github.ref }}"
+      cancel-in-progress: true
+
+    steps:
+      - uses: actions/checkout@v4
+      - name: Set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v4
+        with:
+          python-version: ${{ matrix.python-version }}
+          cache: "pip"
+      # You can test your matrix by printing the current Python version
+      - name: Display Python version
+        run: python3 -c "import sys; print(sys.version)"
+
+      - name: Install dependencies
+        run: python3 -m pip install --upgrade stormtrooper[docs,openai] pandas pytest "sentence-transformers>=3.0.0" "accelerate>=0.25.0" "datasets>=2.14.0"
+
+      - name: Run tests
+        run: python3 -m pytest tests/
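
The test files themselves are not part of this commit; the workflow only assumes a `tests/` directory exists. A minimal sketch of the kind of smoke test it would pick up, written against the `Trooper` interface documented in the README below (the file name, model name, and labels here are illustrative assumptions, not taken from the repository):

```python
# tests/test_trooper_smoke.py -- hypothetical example, not a file from this commit
from stormtrooper import Trooper


def test_zero_shot_predictions_stay_within_labels():
    labels = ["dog", "cat"]
    # Passing X=None fits the model as a zero-shot classifier.
    model = Trooper("google/flan-t5-base").fit(None, labels)
    predictions = model.predict(["he was a good boy"])
    # Whatever the model decides, the answer should be one of the given labels.
    assert all(prediction in labels for prediction in predictions)
```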

README.md

Lines changed: 57 additions & 88 deletions
@@ -7,117 +7,77 @@ Zero/few shot learning components for scikit-learn pipelines with large-language
 
 [Documentation](https://centre-for-humanities-computing.github.io/stormtrooper/)
 
-## Why stormtrooper?
+## New in 1.0.0
 
-Other packages promise to provide at least similar functionality (scikit-llm), why should you choose stormtrooper instead?
+### `Trooper`
+The brand new `Trooper` interface allows you not to have to specify what model type you wish to use.
+Stormtrooper will automatically detect the model type from the specified name.
 
-1. Fine-grained control over you pipeline.
-  - Variety: stormtrooper allows you to use virtually all canonical approaches for zero and few-shot classification including NLI, Seq2Seq and Generative open-access models from Transformers, SetFit and even OpenAI's large language models.
-  - Prompt engineering: You can adjust prompt templates to your hearts content.
-2. Performance
-  - Easy inference on GPU if you have access to it.
-  - Interfacing HuggingFace's TextGenerationInference API, the most efficient way to host models locally.
-  - Async interaction with external APIs, this can speed up inference with OpenAI's models quite drastically.
-3. Extensive [Documentation](https://centre-for-humanities-computing.github.io/stormtrooper/)
-  - Throrough API reference and loads of examples to get you started.
-3. Battle-hardened
-  - We at the Center For Humanities Computing are making extensive use of this package. This means you can rest assured that the package works under real-world pressure. As such you can expect regular updates and maintance.
-4. Simple
-  - We opted for as bare-bones of an implementation and little coupling as possible. The library works at the lowest level of abstraction possible, and we hope our code will be rather easy for others to understand and contribute to.
+```python
+from stormtrooper import Trooper
+
+# This loads a setfit model
+model = Trooper("all-MiniLM-L6-v2")
+
+# This loads an OpenAI model
+model = Trooper("gpt-4")
 
+# This loads a Text2Text model
+model = Trooper("google/flan-t5-base")
+```
 
-## New in version 0.5.0
+### Unified zero and few-shot classification
 
-stormtrooper now uses chat templates from HuggingFace transformers for generative models.
-This means that you no longer have to pass model-specific prompt templates to these and can define system and user prompts separately.
+You no longer have to specify whether a model should be a few or a zero-shot classifier when initialising it.
+If you do not pass any training examples, it will be automatically assumed that the model should be zero-shot.
 
 ```python
-from stormtrooper import GenerativeZeroShotClassifier
+# This is a zero-shot model
+model.fit(None, ["dog", "cat"])
 
-system_prompt = "You're a helpful assistant."
-user_prompt = """
-Classify a text into one of the following categories: {classes}
-Text to clasify:
-"{X}"
-"""
+# This is a few-shot model
+model.fit(["he was a good boy", "just lay down on my laptop"], ["dog", "cat"])
 
-model = GenerativeZeroShotClassifier().fit(None, ["political", "not political"])
-model.predict("Joe Biden is no longer the candidate of the Democrats.")
 ```
+## Model types
+
+You can use all sorts of transformer models for few and zero-shot classification in Stormtrooper.
 
+1. Instruction fine-tuned generative models, e.g. `Trooper("HuggingFaceH4/zephyr-7b-beta")`
+2. Encoder models with SetFit, e.g. `Trooper("all-MiniLM-L6-v2")`
+3. Text2Text models e.g. `Trooper("google/flan-t5-base")`
+4. OpenAI models e.g. `Trooper("gpt-4")`
+5. NLI models e.g. `Trooper("facebook/bart-large-mnli")`
 
-## Examples
+## Example usage
 
-Here are a couple of motivating examples to get you hooked. Find more in our [docs](https://centre-for-humanities-computing.github.io/stormtrooper/).
+Find more in our [docs](https://centre-for-humanities-computing.github.io/stormtrooper/).
 
 ```bash
 pip install stormtrooper
 ```
 
 ```python
+from stormtrooper import Trooper
+
 class_labels = ["atheism/christianity", "astronomy/space"]
 example_texts = [
     "God came down to earth to save us.",
     "A new nebula was recently discovered in the proximity of the Oort cloud."
 ]
-```
-
-
-### Zero-shot learning
-
-For zero-shot learning you can use zero-shot models:
-```python
-from stormtrooper import ZeroShotClassifier
-classifier = ZeroShotClassifier().fit(None, class_labels)
-```
-
-Generative models (GPT, Llama):
-```python
-from stormtrooper import GenerativeZeroShotClassifier
-classifier = GenerativeZeroShotClassifier("meta-llama/Meta-Llama-3.1-8B-Instruct").fit(None, class_labels)
-```
-
-Text2Text models (T5):
-If you are running low on resources I would personally recommend T5.
-```python
-from stormtrooper import Text2TextZeroShotClassifier
-# You can define a custom prompt, but a default one is available
-prompt = "..."
-classifier =Text2TextZeroShotClassifier(prompt=prompt).fit(None, class_labels)
-```
-
-```python
-predictions = classifier.predict(example_texts)
-
-assert list(predictions) == ["atheism/christianity", "astronomy/space"]
-```
-
-OpenAI models:
-You can now use OpenAI's chat LLMs in stormtrooper workflows.
-
-```python
-from stormtrooper import OpenAIZeroShotClassifier
-
-classifier = OpenAIZeroShotClassifier("gpt-4").fit(None, class_labels)
-```
-
-```python
-predictions = classifier.predict(example_texts)
-
-assert list(predictions) == ["atheism/christianity", "astronomy/space"]
-```
-
-### Few-Shot Learning
-
-For few-shot tasks you can only use Generative, Text2Text, OpenAI (aka. promptable) or SetFit models.
-
-```python
-from stormtrooper import GenerativeFewShotClassifier, Text2TextFewShotClassifier, SetFitFewShotClassifier
-
-classifier = SetFitFewShotClassifier().fit(example_texts, class_labels)
-predictions = model.predict(["Calvinists believe in predestination."])
-
-assert list(predictions) == ["atheism/christianity"]
+new_texts = ["God bless the reailway workers", "The frigate is ready to launch from the spaceport"]
+
+# Zero-shot classification
+model = Trooper("google/flan-t5-base")
+model.fit(None, class_labels)
+model.predict(new_texts)
+# ["atheism/christianity", "astronomy/space"]
+
+# Few-shot classification
+model = Trooper("google/flan-t5-base")
+model.fit(example_texts, class_labels)
+model.predict(new_texts)
+# ["atheism/christianity", "astronomy/space"]
 ```
 
 ### Fuzzy Matching

@@ -133,5 +93,14 @@ From version 0.2.2 you can run models on GPU.
 You can specify the device when initializing a model:
 
 ```python
-classifier = Text2TextZeroShotClassifier(device="cuda:0")
+classifier = Trooper("all-MiniLM-L6-v2", device="cuda:0")
+```
+
+### Inference on multiple GPUs
+
+You can run a model on multiple devices in order of device priority `GPU -> CPU + Ram -> Disk` and on multiple devices by using the `device_map` argument.
+Note that this only works with text2text and generative models.
+
+```
+model = Trooper("HuggingFaceH4/zephyr-7b-beta", device_map="auto")
 ```
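
The README's tagline describes these classifiers as components for scikit-learn pipelines. A small sketch of that composition, assuming `Trooper` follows the usual scikit-learn `fit`/`predict` estimator protocol (the `Pipeline` wrapper is an illustration, not code from this commit):

```python
# Hypothetical composition; assumes Trooper behaves as a scikit-learn estimator.
from sklearn.pipeline import Pipeline

from stormtrooper import Trooper

class_labels = ["atheism/christianity", "astronomy/space"]
example_texts = [
    "God came down to earth to save us.",
    "A new nebula was recently discovered in the proximity of the Oort cloud.",
]

# Few-shot: example texts and their labels are passed straight through to fit().
pipeline = Pipeline([("classifier", Trooper("google/flan-t5-base"))])
pipeline.fit(example_texts, class_labels)
print(pipeline.predict(["The probe sent back images of the outer planets."]))
```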

docs/Makefile

Lines changed: 0 additions & 20 deletions
This file was deleted.
Binary file not shown (-22.6 KB).
Binary file not shown (-5.3 KB).

docs/_build/doctrees/index.doctree

Binary file not shown (-6.54 KB).
Binary file not shown (-4.17 KB).
Binary file not shown (-7.78 KB).
Binary file not shown (-6.51 KB).
