    Repositories list

    • Python
      11310 Updated Jan 15, 2026
    • SII-CLI

      Public
      03000 Updated Jan 13, 2026
    • ResearcherBench: Evaluating Deep AI Research Systems on the Frontiers of Scientific Inquiry
      Python
      54121 Updated Jan 5, 2026
    • LiveTalk

      Public
      Python
      1722170 Updated Jan 2, 2026
    • ASI-Arch

      Public
      AlphaGo Moment for Model Architecture Discovery.
      Python
      2161.1k90 Updated Dec 3, 2025
    • MegaScience: Pushing the Frontiers of Post-Training Datasets for Science Reasoning
      Python
      611000 Updated Nov 23, 2025
    • InnovatorBench

      Public
      A benchmark for LLMs on complicated long-horizon tasks that last for days.
      Jupyter Notebook
      01200 Updated Nov 12, 2025
    • SR-Scientist: Scientific Equation Discovery With Agentic AI
      Python
      02900 Updated Nov 7, 2025
    • Context-Engineering-2.0

      Public
      2126100 Updated Nov 6, 2025
    • Scaling Deep Research via Reinforcement Learning in Real-world Environments.
      Python
      4668690 Updated Oct 15, 2025
    • LIMI

      Public
      LIMI: Less is More for Agency
      Python
      715860 Updated Oct 14, 2025
    • [AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy
      Python
      47620 Updated Oct 9, 2025
    • DatasetResearch: Benchmarking Agent Systems for Demand-Driven Dataset Discovery
      Python
      02000 Updated Sep 24, 2025
    • Python
      0300 Updated Sep 9, 2025
    • Efficient Agent Training for Computer Use
      Python
      813500 Updated Sep 5, 2025
    • LIMO

      Public
      [COLM 2025] LIMO: Less is More for Reasoning
      Python
      521.1k60 Updated Jul 30, 2025
    • ASI4AI

      Public
      JavaScript
      1700 Updated Jul 23, 2025
    • Reproducible and flexible LLM evaluations for scientific reasoning.
      Python
      02600 Updated Jul 23, 2025
    • Revisiting Mid-training in the Era of Reinforcement Learning Scaling
      Jupyter Notebook
      1418250 Updated Jul 23, 2025
    • ProX

      Public
      [ICML 2025] Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale
      Python
      1826420 Updated Jul 8, 2025
    • anole

      Public
      Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation
      Python
      50818351 Updated Jun 16, 2025
    • Doodling our way to AGI ✏️ 🖼️ 🧠
      Python
      412010 Updated May 29, 2025
    • LIMOPro

      Public
      Python
      01310 Updated May 27, 2025
    • DynToM

      Public
      Python
      01000 Updated May 26, 2025
    • ToRL

      Public
      Python
      17327230 Updated May 24, 2025
    • PC-Agent

      Public
      PC Agent: While You Sleep, AI Works - A Cognitive Journey into Digital World
      Python
      2930821 Updated May 21, 2025
    • Generative AI Act II: Test Time Scaling Drives Cognition Engineering
      Python
      920910 Updated Apr 22, 2025
    • MAYE

      Public
      Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme
      Python
      814630 Updated Apr 9, 2025
    • Python
      0100 Updated Apr 5, 2025
    • MathPile

      Public
      [NeurIPS D&B 2024] Generative AI for Math: MathPile
      Python
      2241900 Updated Apr 4, 2025