A comprehensive guide to Reinforcement Learning from Human Feedback (and a broad introduction to post-training language models).
This book is my attempt to open-source all the knowledge I've gained working at the frontier of open models during the post-ChatGPT takeoff of language models. When I started, many established methods like rejection sampling had no canonical reference. On the other hand, industry practices for making models more personable -- colloquially called Character Training -- had no open research. It was obvious to me that documenting the methods, learning the fundamentals, carefully curating the references (in an era of AI slop), and everything in between would pay off and serve as a wonderful starting point for people.
Today, I'm adding code and seeing this as a home base for people who want to learn. You should use coding assistants to ask questions. You should buy the physical book because the real world matters. You should read the specific AI outputs tailored to you.
In the future I want to add more educational resources to this, such as open-source slide decks and more ways to learn. In the end, given how impossible it is to measure human preferences, RLHF will never be a solved problem.
Thank you for reading. Thank you for contributing any feedback or engaging with the community.
-- Nathan Lambert, @natolambert
rlhf-book/
├── book/ # Book source and build files
│ ├── chapters/ # Markdown source (01-introduction.md, etc.)
│ ├── images/ # Figures referenced in chapters
│ ├── assets/ # Brand assets (covers, logos)
│ ├── templates/ # Pandoc templates (HTML, PDF, EPUB)
│ ├── scripts/ # Build utilities
│ └── data/ # Library data
├── code/ # Reference implementations
│ ├── policy_gradients/ # PPO, REINFORCE, GRPO, RLOO
│ ├── reward_models/ # Preference RM, ORM, PRM training
│ └── direct_alignment/ # DPO and variants
├── diagrams/ # Diagram source files
│ ├── scripts/ # Python generation scripts
│ ├── tikz/ # LaTeX/TikZ sources
│ └── specs/ # YAML specifications
├── build/ # Generated output (git-ignored)
└── Makefile # Build system
Reference implementations for RLHF algorithms in code/:
- Policy gradient methods (PPO, REINFORCE, GRPO, RLOO, etc.)
- Reward model training (preference RM, ORM, PRM)
- Direct alignment methods
See code/README.md for setup and usage.
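To give a flavor of what these implementations cover, here is a minimal, illustrative sketch of two of the core losses: a Bradley-Terry preference reward model loss and the DPO objective from the direct alignment family. This is not the repository's code; the function names and the toy tensors are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

def preference_rm_loss(chosen_scores, rejected_scores):
    """Bradley-Terry loss: push the reward model to score chosen responses above rejected ones."""
    return -F.logsigmoid(chosen_scores - rejected_scores).mean()

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO loss on per-sequence log-probabilities from the policy and a frozen reference model."""
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between the implicit rewards of chosen and rejected responses.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy usage with made-up values for a batch of two preference pairs.
rm = preference_rm_loss(torch.tensor([1.2, 0.4]), torch.tensor([-0.3, 0.1]))
dpo = dpo_loss(torch.tensor([-12.0, -9.5]), torch.tensor([-14.0, -11.0]),
               torch.tensor([-12.5, -9.8]), torch.tensor([-13.5, -10.5]))
```

The reference implementations in code/ handle the details these sketches skip, such as computing per-token log-probabilities, masking prompts, and batching.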
Book source files are in book/. Build locally:
make html # Build HTML site
make pdf # Build PDF (requires LaTeX)
See book/README.md for detailed build instructions.
The diagrams/ directory contains source files for figures used in the book. These are designed to be reusable for presentations, blog posts, or your own learning materials. Generate them with:
cd diagrams && make all
To cite this book, please use the following format:
@book{rlhf2025,
author = {Nathan Lambert},
title = {Reinforcement Learning from Human Feedback},
year = {2025},
publisher = {Online},
url = {https://rlhfbook.com},
}
- Code: MIT
- Chapters: CC-BY-NC-SA-4.0
While I get the credit as the sole "author" and creator of this project, I've been super lucky to have many contributions from early readers. These have massively accelerated the editing process and flat-out added meaningful content to the book. I'm happy to send substantive contributors free copies of the book and expect the internet goodwill to pay them back in unexpected ways.
See all contributors.