
Conversation

@danielclough
Contributor

Add xLSTM (Extended LSTM) Model and Example

Adds support for xLSTM (Extended Long Short-Term Memory), a modernized LSTM architecture that achieves performance competitive with Transformers while keeping inference complexity linear in sequence length.

Implementation

Model architecture (candle-transformers/src/models/xlstm.rs):

  • mLSTM blocks with matrix memory and exponential gating
  • Covariance update rule using the outer product of key-value pairs (see the recurrence sketch after this list)
  • Stabilized gates with soft-capping and log-space computation
  • GroupNorm without bias for multi-head normalization
  • SwiGLU FFN blocks with pre-norm residual connections
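
For context, the recurrence at the heart of the mLSTM block can be summarized as below. This is a minimal single-head sketch written against candle's tensor ops, not the code in xlstm.rs: the helper names (`soft_cap`, `mlstm_step`), the shapes, and the cap value of 15.0 are assumptions for illustration, and the output gate and multi-head GroupNorm applied in the full block are omitted.

```rust
use candle_core::{Result, Tensor};

/// Soft-capping: cap * tanh(x / cap), keeps pre-activations bounded.
fn soft_cap(x: &Tensor, cap: f64) -> Result<Tensor> {
    x.affine(1.0 / cap, 0.0)?.tanh()?.affine(cap, 0.0)
}

/// One recurrent step. Shapes (single head): q, k, v are (d,), the matrix
/// memory `c` is (d, d), the normalizer `n` is (d,), and `m`, `i_pre`,
/// `f_pre` are (1,) gate values in log space.
fn mlstm_step(
    c: &Tensor,
    n: &Tensor,
    m: &Tensor,
    q: &Tensor,
    k: &Tensor,
    v: &Tensor,
    i_pre: &Tensor,
    f_pre: &Tensor,
) -> Result<(Tensor, Tensor, Tensor, Tensor)> {
    // Soft-cap the gate pre-activations (cap value is illustrative).
    let i_pre = soft_cap(i_pre, 15.0)?;
    let f_pre = soft_cap(f_pre, 15.0)?;
    // Stabilizer: m_t = max(f_pre + m_{t-1}, i_pre), so the exponentials below
    // stay bounded even though the gates are exponential rather than sigmoid.
    let m_new = (&f_pre + m)?.maximum(&i_pre)?;
    let i_gate = (&i_pre - &m_new)?.exp()?;
    let f_gate = ((&f_pre + m)? - &m_new)?.exp()?;
    // Covariance update: C_t = f_t * C_{t-1} + i_t * v k^T (outer product).
    let outer = v.unsqueeze(1)?.matmul(&k.unsqueeze(0)?)?;
    let c_new = (c.broadcast_mul(&f_gate)? + outer.broadcast_mul(&i_gate)?)?;
    // Normalizer state: n_t = f_t * n_{t-1} + i_t * k.
    let n_new = (n.broadcast_mul(&f_gate)? + k.broadcast_mul(&i_gate)?)?;
    // Readout: h_t = (C_t q) / max(|n_t . q|, 1).
    let num = c_new.matmul(&q.unsqueeze(1)?)?.squeeze(1)?;
    let dot = n_new.unsqueeze(0)?.matmul(&q.unsqueeze(1)?)?.squeeze(1)?;
    let h = num.broadcast_div(&dot.abs()?.maximum(1.0)?)?;
    Ok((c_new, n_new, m_new, h))
}
```

Because the state is a fixed-size (d, d) matrix rather than a growing KV cache, each step costs the same regardless of how many tokens have been generated, which is where the linear inference complexity comes from.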

Text generation example (candle-examples/examples/xlstm/):

  • Single-token recurrent inference with stateful generation (see the loop sketch after this list)
  • Supports NX-AI/xLSTM-7b (~14GB VRAM in bf16, ~28GB in f32)
  • Configurable sampling (temperature, top-p, repeat penalty)
  • BOS token handling per model requirements
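
A hedged sketch of the stateful, single-token generation loop follows. The model's forward pass is abstracted as a closure because the exact API added by this PR is not reproduced here; `LogitsProcessor` and `apply_repeat_penalty` are existing candle-transformers utilities, while the seed, sampling values, and repeat-penalty window below are illustrative rather than the example's defaults.

```rust
use anyhow::{Error as E, Result};
use candle_core::{Device, Tensor};
use candle_transformers::generation::LogitsProcessor;
use candle_transformers::utils::apply_repeat_penalty;

fn generate(
    // One recurrent forward pass: token ids in, next-token logits (vocab,) out.
    mut step: impl FnMut(&Tensor) -> candle_core::Result<Tensor>,
    tokenizer: &tokenizers::Tokenizer,
    prompt: &str,
    max_tokens: usize,
    device: &Device,
) -> Result<Vec<u32>> {
    // Temperature / top-p sampling, matching the example's CLI flags.
    let mut sampler = LogitsProcessor::new(299792458, Some(0.8), Some(0.9));
    // `add_special_tokens = true` lets the tokenizer prepend BOS as the model requires.
    let mut tokens: Vec<u32> = tokenizer
        .encode(prompt, true)
        .map_err(E::msg)?
        .get_ids()
        .to_vec();
    for index in 0..max_tokens {
        // The prompt is fed once; afterwards only the latest token is passed,
        // since the recurrent state already summarizes the context.
        let ctx = if index == 0 { &tokens[..] } else { &tokens[tokens.len() - 1..] };
        let input = Tensor::new(ctx, device)?.unsqueeze(0)?;
        let logits = step(&input)?;
        // Penalize tokens seen in a recent window (window size is illustrative).
        let start = tokens.len().saturating_sub(64);
        let logits = apply_repeat_penalty(&logits, 1.1, &tokens[start..])?;
        tokens.push(sampler.sample(&logits)?);
    }
    Ok(tokens)
}
```

The actual example additionally decodes and prints tokens as they are sampled; that plumbing is omitted here.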

Usage

# Generate with default prompt (bf16, requires ~14GB VRAM)
cargo run --example xlstm --release --features cuda -- --prompt "Once upon a time" -n 50

# Use f32 precision
cargo run --example xlstm --release --features metal,accelerate -- --dtype f32 --prompt "The meaning of life is"
