feat: Donut, Swin, and BART (models and examples) #3265
Open
danielclough wants to merge 6 commits into huggingface:main from
Conversation
BART:
- Full encoder-decoder architecture
- Beam search decoding with configurable parameters
- Causal language modeling head
- Support for BART, mBART, and MBart50 variants

Swin:
- Shifted window multi-head self-attention
- Patch merging for hierarchical feature maps
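The beam search decoding listed among the BART features can be illustrated with a small pure-Python sketch. This is a toy illustration of the idea, not candle's actual Rust implementation: `score_fn` is a stand-in for the decoder's next-token log-probabilities, and the token values are arbitrary.

```python
import math

def beam_search(score_fn, start_token, end_token, beam_width=3, max_len=10):
    """Toy beam search: keep the `beam_width` highest-scoring partial
    sequences at each step. `score_fn(seq)` returns a dict mapping each
    candidate next token to its log-probability given `seq`."""
    beams = [([start_token], 0.0)]  # (sequence, cumulative log-prob)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == end_token:          # finished beams carry over
                candidates.append((seq, score))
                continue
            for tok, logp in score_fn(seq).items():
                candidates.append((seq + [tok], score + logp))
        # Prune to the top-`beam_width` candidates by cumulative score.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
        if all(seq[-1] == end_token for seq, _ in beams):
            break
    return beams[0][0]  # best-scoring sequence

# A deterministic toy "model": prefers token 1, forces the end token
# (9) once the sequence reaches length 3.
def toy_scores(seq):
    if len(seq) >= 3:
        return {9: 0.0}
    return {1: math.log(0.6), 2: math.log(0.4)}

print(beam_search(toy_scores, start_token=0, end_token=9))  # [0, 1, 1, 9]
```

The "configurable parameters" in the feature list map onto knobs like `beam_width` and `max_len` here; a real decoder would also apply length penalties and batched tensor scoring.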
ivarflakstad reviewed Jan 3, 2026
Comment on lines +4 to +6
mBART models use SentencePiece tokenization which isn't directly supported by the Rust tokenizers crate. This script converts the tokenizer to the tokenizer.json format that can be loaded by the Rust example.
Member
Is there a specific model you've encountered where they don't provide a tokenizer.json?
Contributor
Author
Take a look at: https://huggingface.co/facebook/mbart-large-50-many-to-many-mmt/tree/main
The README.md explains that the SentencePiece tokenizer must be converted first:
```bash
# Step 1: Convert tokenizer
cd candle-examples/examples/bart
pip install transformers sentencepiece
python convert_mbart_tokenizer.py --model-id facebook/mbart-large-50-many-to-many-mmt

# Step 2: Run translation (English to French)
cargo run --example bart --release -- \
  --model-id facebook/mbart-large-50-many-to-many-mmt \
  --prompt "Hello, how are you today?" \
  --source-lang en_XX \
  --target-lang fr_XX \
  --sample-len 50
```
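A conversion script along these lines can be sketched with the `transformers` library. This is a hypothetical reconstruction, not the PR's actual `convert_mbart_tokenizer.py`: the `--output-dir` flag and the function names are assumptions, though `--model-id` matches the usage shown above.

```python
import argparse

def build_parser():
    # Hypothetical CLI mirroring the README usage; the real
    # convert_mbart_tokenizer.py may differ.
    parser = argparse.ArgumentParser(
        description="Convert a SentencePiece mBART tokenizer to tokenizer.json"
    )
    parser.add_argument("--model-id", required=True,
                        help="Hugging Face model id, e.g. an mbart-large-50 variant")
    parser.add_argument("--output-dir", default=".",
                        help="Where to write tokenizer.json (assumed flag)")
    return parser

def convert(model_id, output_dir):
    # Requires `pip install transformers sentencepiece`. AutoTokenizer
    # loads a "fast" tokenizer backed by the Rust `tokenizers` crate,
    # converting from SentencePiece; save_pretrained then writes
    # tokenizer.json alongside the other tokenizer files.
    from transformers import AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    tokenizer.save_pretrained(output_dir)

if __name__ == "__main__":
    args = build_parser().parse_args()
    convert(args.model_id, args.output_dir)
```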
Contributor
Author
I noticed I forgot to run clippy with `-D warnings`. 🤦 I'll fix that.
Co-authored-by: ivarflakstad <69173633+ivarflakstad@users.noreply.github.com>
Donut, Swin, and BART (models and examples)
This PR adds three interconnected models for document understanding and text generation:
- BART (`candle-transformers/src/models/bart/`) - Encoder-decoder transformer for summarization and translation
- Swin (`candle-transformers/src/models/swin.rs`) - Hierarchical vision transformer for image processing
- Donut (`candle-transformers/src/models/donut.rs`) - OCR-free document understanding (Swin encoder + BART decoder)

I am submitting the PR together because Donut relies upon Swin (encoder) and BART (decoder).
Each model also comes with examples demonstrating essential features.
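The hierarchy in Swin comes from patch merging: each 2×2 neighborhood of patches is concatenated along the channel dimension, halving spatial resolution while quadrupling channels. A small pure-Python sketch of the idea (an illustration, not candle's Rust implementation):

```python
def patch_merge(feature_map):
    """Merge each 2x2 block of patches by concatenating their channel
    vectors, as in Swin's patch-merging step (before the linear
    projection that typically reduces 4C channels to 2C).
    `feature_map` is an H x W x C nested list; H and W must be even."""
    h, w = len(feature_map), len(feature_map[0])
    merged = []
    for i in range(0, h, 2):
        row = []
        for j in range(0, w, 2):
            # Concatenate the four patches of the 2x2 neighborhood.
            row.append(feature_map[i][j]
                       + feature_map[i + 1][j]
                       + feature_map[i][j + 1]
                       + feature_map[i + 1][j + 1])
        merged.append(row)
    return merged

# A 4x4 grid of 1-channel patches becomes a 2x2 grid of 4-channel patches.
fm = [[[r * 4 + c] for c in range(4)] for r in range(4)]
out = patch_merge(fm)
print(len(out), len(out[0]), len(out[0][0]))  # 2 2 4
```

Stacking several such stages gives the hierarchical feature maps that let Donut's Swin encoder feed progressively coarser representations to the BART decoder.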
Features
Key Implementation Details