@RexBearIU commented Jan 16, 2026

Description

This pull request modernizes the knowledge distillation tutorial for MaxText and aligns it with current best practices and tooling. The guide now uses Qwen3-32B as the teacher model (served via vLLM) and Llama-3.1-8B as the student, streamlines the setup with Hyperdisk storage, and provides new scripts and commands for dataset generation and fine-tuning. The instructions have been clarified, the now-unnecessary conversion steps for the teacher have been removed, and the fine-tuning process has been updated for the latest MaxText and vLLM workflows.
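
For context on the new workflow, here is a minimal sketch of the teacher-generation step: serving Qwen3-32B through vLLM's offline API and writing prompt/completion pairs that the Llama-3.1-8B student is later fine-tuned on. This is not the tutorial's actual script; the prompt list, output path, and sampling settings are illustrative placeholders.

```python
# Minimal sketch (not the tutorial's script): generate teacher completions with
# vLLM and dump them as JSONL for a distillation dataset. The prompt list,
# output path, and sampling settings below are placeholders.
import json

from vllm import LLM, SamplingParams

prompts = [
    "Explain knowledge distillation in two sentences.",
    "Summarize the attention mechanism in one paragraph.",
]

# Load the teacher once; tensor_parallel_size depends on the accelerator topology.
teacher = LLM(model="Qwen/Qwen3-32B", tensor_parallel_size=8)
sampling = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=512)

# Each record pairs a prompt with the teacher's completion; this is the kind of
# dataset the Llama-3.1-8B student is later fine-tuned on in MaxText.
with open("distillation_dataset.jsonl", "w") as f:
    for out in teacher.generate(prompts, sampling):
        record = {"prompt": out.prompt, "completion": out.outputs[0].text}
        f.write(json.dumps(record) + "\n")
```

Serving the teacher with vLLM directly from the Hugging Face checkpoint is also what makes the teacher-side conversion step unnecessary, since no MaxText-format checkpoint is needed for generation.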

Tests

Manually triggered the distillation pipeline and monitored the execution flow step-by-step. Confirmed that the training loop finished and resources were released.

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have added the necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.
