
Commit becbff8: Merge branch 'main' into fix_hqq_tests
2 parents: 5a557f4 + 4b8d1f7


54 files changed (+2608, -163 lines)

docker/transformers-all-latest-gpu/Dockerfile (+3)

```diff
@@ -65,6 +65,9 @@ RUN python3 -m pip install --no-cache-dir python-Levenshtein
 # For `FastSpeech2ConformerTokenizer` tokenizer
 RUN python3 -m pip install --no-cache-dir g2p-en

+# For some bitsandbytes tests
+RUN python3 -m pip install --no-cache-dir einops
+
 # When installing in editable mode, `transformers` is not recognized as a package.
 # this line must be added in order for python to be aware of transformers.
 RUN cd transformers && python3 setup.py develop
```

docs/source/ar/_toctree.yml (+2, -2)

This uncomments the Arabic token classification entry ("تصنيف الرموز", i.e. token classification), whose translated guide is added below.

```diff
@@ -35,8 +35,8 @@
     sections:
     # - local: tasks/sequence_classification
     #   title: تصنيف النصوص
-    # - local: tasks/token_classification
-    #   title: تصنيف الرموز
+    - local: tasks/token_classification
+      title: تصنيف الرموز
     - local: tasks/question_answering
       title: الإجابة على الأسئلة
     # - local: tasks/language_modeling
```

docs/source/ar/tasks/token_classification.md (+550)

Large diff, not rendered by default.

docs/source/en/_toctree.yml (+2)

```diff
@@ -452,6 +452,8 @@
       title: Granite
     - local: model_doc/granitemoe
       title: GraniteMoe
+    - local: model_doc/helium
+      title: Helium
     - local: model_doc/herbert
       title: HerBERT
     - local: model_doc/ibert
```

docs/source/en/index.md (+1)

```diff
@@ -173,6 +173,7 @@ Flax), PyTorch, and/or TensorFlow.
 | [Graphormer](model_doc/graphormer) ||||
 | [Grounding DINO](model_doc/grounding-dino) ||||
 | [GroupViT](model_doc/groupvit) ||||
+| [Helium](model_doc/helium) ||||
 | [HerBERT](model_doc/herbert) ||||
 | [Hiera](model_doc/hiera) ||||
 | [Hubert](model_doc/hubert) ||||
```

docs/source/en/model_doc/helium.md (+158, new file; full contents below)

<!--Copyright 2024 Kyutai and The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# Helium

## Overview

Helium was proposed in [Announcing Helium-1 Preview](https://kyutai.org/2025/01/13/helium.html) by the Kyutai Team.

Helium-1 Preview is a lightweight language model with 2B parameters, targeting edge and mobile devices.
It supports the following languages: English, French, German, Italian, Portuguese, Spanish.

- **Developed by:** Kyutai
- **Model type:** Large Language Model
- **Language(s) (NLP):** English, French, German, Italian, Portuguese, Spanish
- **License:** CC-BY 4.0

## Evaluation

### Testing Data

The model was evaluated on MMLU, TriviaQA, NaturalQuestions, ARC Easy & Challenge, Open Book QA, Common Sense QA,
Physical Interaction QA, Social Interaction QA, HellaSwag, WinoGrande, Multilingual Knowledge QA, and FLORES 200.

### Metrics

We report accuracy on MMLU, ARC, OBQA, CSQA, PIQA, SIQA, HellaSwag, and WinoGrande.
We report exact match on TriviaQA, NQ, and MKQA.
We report BLEU on FLORES.

### English Results

| Benchmark | Helium-1 Preview | HF SmolLM2 (1.7B) | Gemma-2 (2.6B) | Llama-3.2 (3B) | Qwen2.5 (1.5B) |
|--------------|--------|--------|--------|--------|--------|
| MMLU | 51.2 | 50.4 | 53.1 | 56.6 | 61.0 |
| NQ | 17.3 | 15.1 | 17.7 | 22.0 | 13.1 |
| TQA | 47.9 | 45.4 | 49.9 | 53.6 | 35.9 |
| ARC E | 80.9 | 81.8 | 81.1 | 84.6 | 89.7 |
| ARC C | 62.7 | 64.7 | 66.0 | 69.0 | 77.2 |
| OBQA | 63.8 | 61.4 | 64.6 | 68.4 | 73.8 |
| CSQA | 65.6 | 59.0 | 64.4 | 65.4 | 72.4 |
| PIQA | 77.4 | 77.7 | 79.8 | 78.9 | 76.0 |
| SIQA | 64.4 | 57.5 | 61.9 | 63.8 | 68.7 |
| HS | 69.7 | 73.2 | 74.7 | 76.9 | 67.5 |
| WG | 66.5 | 65.6 | 71.2 | 72.0 | 64.8 |
| **Average** | 60.7 | 59.3 | 62.2 | 64.7 | 63.6 |

### Multilingual Results

| Language | Benchmark | Helium-1 Preview | HF SmolLM2 (1.7B) | Gemma-2 (2.6B) | Llama-3.2 (3B) | Qwen2.5 (1.5B) |
|-----|--------------|--------|--------|--------|--------|--------|
| German | MMLU | 45.6 | 35.3 | 45.0 | 47.5 | 49.5 |
| | ARC C | 56.7 | 38.4 | 54.7 | 58.3 | 60.2 |
| | HS | 53.5 | 33.9 | 53.4 | 53.7 | 42.8 |
| | MKQA | 16.1 | 7.1 | 18.9 | 20.2 | 10.4 |
| Spanish | MMLU | 46.5 | 38.9 | 46.2 | 49.6 | 52.8 |
| | ARC C | 58.3 | 43.2 | 58.8 | 60.0 | 68.1 |
| | HS | 58.6 | 40.8 | 60.5 | 61.1 | 51.4 |
| | MKQA | 16.0 | 7.9 | 18.5 | 20.6 | 10.6 |

## Technical Specifications

### Model Architecture and Objective

| Hyperparameter | Value |
|--------------|--------|
| Layers | 24 |
| Heads | 20 |
| Model dimension | 2560 |
| MLP dimension | 7040 |
| Context size | 4096 |
| RoPE theta | 100,000 |

Tips:

- This model was contributed by [Laurent Mazare](https://huggingface.co/lmz).

## Usage tips

`Helium` can be found on the [Hugging Face Hub](https://huggingface.co/collections/kyutai/helium-1-preview).

In the following, we demonstrate how to use Helium-1 Preview for inference.

```python
>>> from transformers import AutoModelForCausalLM, AutoTokenizer
>>> device = "cuda"  # the device to load the model onto

>>> model = AutoModelForCausalLM.from_pretrained("kyutai/helium-1-preview-2b", device_map="auto")
>>> tokenizer = AutoTokenizer.from_pretrained("kyutai/helium-1-preview-2b")

>>> prompt = "Give me a short introduction to large language models."

>>> messages = [{"role": "user", "content": prompt}]

>>> text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

>>> model_inputs = tokenizer([text], return_tensors="pt").to(device)

>>> generated_ids = model.generate(model_inputs.input_ids, max_new_tokens=512, do_sample=True)

>>> generated_ids = [output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)]

>>> response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```

## HeliumConfig

[[autodoc]] HeliumConfig

## HeliumModel

[[autodoc]] HeliumModel
    - forward

## HeliumForCausalLM

[[autodoc]] HeliumForCausalLM
    - forward

## HeliumForSequenceClassification

[[autodoc]] HeliumForSequenceClassification
    - forward

## HeliumForTokenClassification

[[autodoc]] HeliumForTokenClassification
    - forward
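The architecture table above maps directly onto the model configuration. A minimal sketch of that mapping, assuming `HeliumConfig` uses the usual Llama-style field names (an assumption; this diff does not show the config source):

```python
from transformers import HeliumConfig, HeliumModel

# Hypothetical mapping of the hyperparameter table to config fields;
# field names are assumed Llama-style, not confirmed by this diff.
config = HeliumConfig(
    num_hidden_layers=24,           # Layers
    num_attention_heads=20,         # Heads
    hidden_size=2560,               # Model dimension
    intermediate_size=7040,         # MLP dimension
    max_position_embeddings=4096,   # Context size
    rope_theta=100_000.0,           # RoPE theta
)
model = HeliumModel(config)  # randomly initialized, for shape-checking only
```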

docs/source/en/model_doc/siglip.md (+1, -1)

```diff
@@ -102,7 +102,7 @@ If you want to do the pre- and postprocessing yourself, here's how to do that:
 A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with SigLIP.

-- [Zero-shot image classification task guide](../tasks/zero_shot_image_classification_md)
+- [Zero-shot image classification task guide](../tasks/zero_shot_image_classification)
 - Demo notebooks for SigLIP can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/SigLIP). 🌎

 If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
```

docs/source/en/perf_infer_gpu_one.md (+2)

```diff
@@ -109,6 +109,7 @@ FlashAttention-2 is currently supported for the following architectures:
 * [SigLIP](https://huggingface.co/docs/transformers/model_doc/siglip)
 * [UniSpeech](https://huggingface.co/docs/transformers/v4.39.3/en/model_doc/unispeech#transformers.UniSpeechModel)
 * [unispeech_sat](https://huggingface.co/docs/transformers/v4.39.3/en/model_doc/unispeech-sat#transformers.UniSpeechSatModel)
+* [helium](https://huggingface.co/docs/transformers/main/en/model_doc/helium#transformers.HeliumModel)

 You can request to add FlashAttention-2 support for another model by opening a GitHub Issue or Pull Request.

@@ -324,6 +325,7 @@ For now, Transformers supports SDPA inference and training for the following architectures:
 * [XLM-RoBERTa](https://huggingface.co/docs/transformers/model_doc/xlm-roberta#transformers.XLMRobertaModel)
 * [XLM-RoBERTa-XL](https://huggingface.co/docs/transformers/model_doc/xlm-roberta-xl#transformers.XLMRobertaXLModel)
 * [YOLOS](https://huggingface.co/docs/transformers/model_doc/yolos#transformers.YolosModel)
+* [helium](https://huggingface.co/docs/transformers/main/en/model_doc/helium#transformers.HeliumModel)

 <Tip>
```
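With Helium on both lists, either attention backend can be selected through the standard `attn_implementation` argument to `from_pretrained`. A minimal sketch, assuming the `kyutai/helium-1-preview-2b` checkpoint from the Helium docs above; the FlashAttention-2 variant additionally requires the `flash-attn` package and a supported GPU:

```python
import torch
from transformers import AutoModelForCausalLM

# FlashAttention-2 (needs flash-attn installed and fp16/bf16 weights)
model = AutoModelForCausalLM.from_pretrained(
    "kyutai/helium-1-preview-2b",  # assumed checkpoint
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)

# PyTorch scaled dot-product attention (no extra dependency)
model = AutoModelForCausalLM.from_pretrained(
    "kyutai/helium-1-preview-2b",
    torch_dtype=torch.bfloat16,
    attn_implementation="sdpa",
)
```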

src/transformers/__init__.py (+18)

```diff
@@ -498,6 +498,7 @@
         "GroupViTTextConfig",
         "GroupViTVisionConfig",
     ],
+    "models.helium": ["HeliumConfig"],
     "models.herbert": ["HerbertTokenizer"],
     "models.hiera": ["HieraConfig"],
     "models.hubert": ["HubertConfig"],
@@ -2506,6 +2507,15 @@
         "GroupViTVisionModel",
     ]
 )
+_import_structure["models.helium"].extend(
+    [
+        "HeliumForCausalLM",
+        "HeliumForSequenceClassification",
+        "HeliumForTokenClassification",
+        "HeliumModel",
+        "HeliumPreTrainedModel",
+    ]
+)
 _import_structure["models.hiera"].extend(
     [
         "HieraBackbone",
@@ -5529,6 +5539,7 @@
         GroupViTTextConfig,
         GroupViTVisionConfig,
     )
+    from .models.helium import HeliumConfig
     from .models.herbert import HerbertTokenizer
     from .models.hiera import HieraConfig
     from .models.hubert import HubertConfig
@@ -7371,6 +7382,13 @@
         GroupViTTextModel,
         GroupViTVisionModel,
     )
+    from .models.helium import (
+        HeliumForCausalLM,
+        HeliumForSequenceClassification,
+        HeliumForTokenClassification,
+        HeliumModel,
+        HeliumPreTrainedModel,
+    )
     from .models.hiera import (
         HieraBackbone,
         HieraForImageClassification,
```
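These entries register the Helium classes in both the lazy `_import_structure` map and the `TYPE_CHECKING` import block, which is what makes them importable from the top-level package; the `models.helium` module itself is only materialized on first attribute access via `_LazyModule`. A quick check of the result, as a sketch:

```python
from transformers import HeliumConfig, HeliumForCausalLM

# Builds a randomly initialized model from the default config; enough to
# verify that the new top-level exports resolve. No weights are downloaded.
config = HeliumConfig()
model = HeliumForCausalLM(config)
print(type(model).__name__)  # HeliumForCausalLM
```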

src/transformers/configuration_utils.py (+5, -2)

```diff
@@ -994,8 +994,11 @@ def dict_torch_dtype_to_str(self, d: Dict[str, Any]) -> None:
         converts torch.dtype to a string of just the type. For example, `torch.float32` get converted into *"float32"*
         string, which can then be stored in the json format.
         """
-        if d.get("torch_dtype", None) is not None and not isinstance(d["torch_dtype"], str):
-            d["torch_dtype"] = str(d["torch_dtype"]).split(".")[1]
+        if d.get("torch_dtype", None) is not None:
+            if isinstance(d["torch_dtype"], dict):
+                d["torch_dtype"] = {k: str(v).split(".")[-1] for k, v in d["torch_dtype"].items()}
+            elif not isinstance(d["torch_dtype"], str):
+                d["torch_dtype"] = str(d["torch_dtype"]).split(".")[1]
         for value in d.values():
             if isinstance(value, dict):
                 self.dict_torch_dtype_to_str(value)
```
self.dict_torch_dtype_to_str(value)

src/transformers/convert_slow_tokenizer.py (+89)

```diff
@@ -1446,6 +1446,95 @@ def pre_tokenizer(self, replacement, add_prefix_space):
         return pre_tokenizers.Metaspace(replacement=replacement, prepend_scheme=prepend_scheme, split=False)


+class HeliumConverter(SpmConverter):
+    handle_byte_fallback = True
+
+    def __init__(self, vocab_file=None, *args):
+        requires_backends(self, "protobuf")
+
+        Converter.__init__(self, vocab_file)
+
+        model_pb2 = import_protobuf()
+
+        m = model_pb2.ModelProto()
+        with open(vocab_file, "rb") as f:
+            m.ParseFromString(f.read())
+        self.proto = m
+
+    def tokenizer(self, proto):
+        vocab_scores = self.vocab(proto)
+        tokenizer = Tokenizer(
+            Unigram(
+                vocab_scores,
+                unk_id=self.unk_id(proto),
+                byte_fallback=self.handle_byte_fallback,
+            )
+        )
+        # control tokens are special
+        # user defined symbols are not
+        # both user and control tokens are AddedTokens
+        # Add user defined symbols (type == 4) from sentencepiece (https://github.com/google/sentencepiece/blob/6225e08edb2577757163b3f5dbba4c0b670ef445/src/sentencepiece_model.proto#L299C29-L299C33)
+        spm_added_tokens = [
+            (id, p.piece, p.type == 3 or p.piece in self.special_tokens)
+            for id, p in enumerate(proto.pieces)
+            if p.type in [3, 4]
+        ]
+        tokenizer.add_tokens(
+            [
+                AddedToken(token, normalized=False, special=special, single_word=True)
+                for id, token, special in sorted(spm_added_tokens, key=lambda x: x[0])
+            ]
+        )
+        tokenizer.add_tokens([AddedToken("\n", normalized=False, special=False)])
+        tokenizer.enable_padding(pad_token="<pad>", pad_id=3)
+        return tokenizer
+
+    def vocab(self, proto):
+        vocab = []
+        for piece in proto.pieces:
+            if piece.piece == "<0x0A>":
+                vocab += [("\n", piece.score)]
+            else:
+                vocab += [(piece.piece, piece.score)]
+        return vocab
+
+    def unk_id(self, proto):
+        unk_id = 0
+        return unk_id
+
+    def decoder(self, replacement, add_prefix_space):
+        sequence = [
+            decoders.Replace("▁", " "),
+            decoders.ByteFallback(),
+            decoders.Fuse(),
+        ]
+        sequence += [decoders.Strip(content=" ", left=1)]
+        return decoders.Sequence(sequence)
+
+    def normalizer(self, proto):
+        return normalizers.Sequence([normalizers.Prepend(" "), normalizers.Replace(r" ", "▁")])
+
+    def pre_tokenizer(self, replacement, add_prefix_space):
+        return pre_tokenizers.Sequence([pre_tokenizers.Split("\n", "contiguous")])
+
+    def post_processor(self):
+        return processors.TemplateProcessing(
+            single=[
+                "<s>",
+                "$A",
+            ],
+            pair=[
+                "<s>",
+                "$A",
+                "<s>",
+                "$B",
+            ],
+            special_tokens=[
+                ("<s>", 1),
+            ],
+        )
+
+
 # Copied from transformers.models.gpt2.tokenization_gpt2.bytes_to_unicode
 def bytes_to_unicode():
     """
```
