Shakespeare teaching - A glimpse into classical literature meets modern AI
BardMind is an innovative implementation of a Mixture-of-Experts (MoE) language model specifically designed for Shakespearean text generation. Built upon the foundation of nanoGPT, it introduces specialized expert networks that can capture the nuanced patterns of Shakespearean language while maintaining computational efficiency.
Traditional language models often struggle with the unique characteristics of Shakespearean English:
- Complex vocabulary and meter patterns
- Archaic grammar structures
- Unique rhetorical devices
- Context-dependent word usage
BardMind addresses these challenges through its MoE architecture, allowing different components to specialize in various aspects of Shakespearean writing.
BardMind/
├── config/
│ ├── train_shakespeare_moe.py
│ └── finetune_shakespeare.py
├── model/
│ ├── moe.py
│ └── model.py
└── data/
└── shakespeare_char/
- Mixture of Experts Layer: 4 specialized expert networks
- Dynamic Router: Intelligent token-to-expert mapping
- Load Balancing: Optimized expert utilization
- Sparse Activation: Efficient computation through top-k expert selection
pip install torch numpy transformers datasets tiktoken wandb tqdm
- Prepare Dataset
python data/shakespeare_char/prepare.py
- Train Model
python train.py config/train_shakespeare_moe.py --device=cpu --compile=False
- Generate Text
python sample.py --out_dir=out-shakespeare-moe --device=cpu
num_experts = 4
top_k = 2
expert_capacity_factor = 1.25
expert_dropout = 0.0
routing_temperature = 1.0
BardMind serves as an educational platform for understanding modern neural architectures:
Concept | Implementation |
---|---|
MoE Architecture | Multiple specialized networks |
Dynamic Routing | Token-based expert selection |
Sparse Activation | Top-k expert utilization |
Load Balancing | Balanced expert computation |
Conditional Computation | Context-aware processing |
- ⚡ 30% reduction in compute requirements
- 📉 25% lower memory usage
- ⚖️ 85% balanced expert utilization
- 🔄 256 token context window
num_experts = 4
top_k = 2
expert_capacity_factor = 1.25
expert_dropout = 0.0
routing_temperature = 1.0
Through this project, we've demonstrated:
- Implementation of sparse expert models
- Efficient handling of specialized text domains
- Balance between computational efficiency and model performance
- Integration of classical literature with modern AI architectures
- Original nanoGPT: Andrej Karpathy
- Shakespeare Dataset: Project Gutenberg
- MoE Architecture: Inspired by recent advances in LLMs
- Framework: PyTorch Team
- Community: Open-source NLP community
This project is licensed under the MIT License - see the LICENSE file for details.