Pinned

  1. understand-r1-zero (Public)

    Understanding R1-Zero-Like Training: A Critical Perspective

    Python, 1.2k stars, 56 forks

  2. zero-bubble-pipeline-parallelism (Public)

    Forked from NVIDIA/Megatron-LM

    Zero Bubble Pipeline Parallelism

    Python, 449 stars, 31 forks

  3. lorahub (Public)

    [COLM 2024] LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition

    Python, 666 stars, 39 forks

  4. oat (Public)

    🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.

    Python, 623 stars, 58 forks

  5. stde (Public)

    Official implementation of the Stochastic Taylor Derivative Estimator (STDE), NeurIPS 2024

    Python, 128 stars, 10 forks

  6. feedback-conditional-policy (Public)

    Code for "Language Models Can Learn from Verbal Feedback Without Scalar Rewards"

    Python, 58 stars, 2 forks

Repositories

Showing 10 of 99 repositories
