- Institute of Science Tokyo
- Japan
- https://taishi-n324.github.io/
- @Setuna7777_2
- in/taishi-nakamura
Highlights
- Pro
Stars
🪐 The Sebulba architecture to scale reinforcement learning on Cloud TPUs in JAX
Official codebase for "SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution"
A project to improve the skills of large language models
Letting Claude Code develop its own MCP tools :)
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
Search-R1: An Efficient, Scalable RL Training Framework for LLMs that Interleave Reasoning with Search Engine Calls, based on veRL
Autonomous coding agent right in your IDE, capable of creating/editing files, executing commands, using the browser, and more with your permission every step of the way.
Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers
Official Repo for Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning
Computer gaming agents that run on your PC or laptop.
Official PyTorch implementation for "Large Language Diffusion Models"
Tailor-made Pokémon themes for your Hyper terminal
EasyR1: An Efficient, Scalable, Multi-Modality RL Training Framework based on veRL
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
Playing Pokemon Red with Reinforcement Learning
Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym
DeepEP: an efficient expert-parallel communication library
Everything you want to know about Google Cloud TPU
[ICLR2025 Spotlight🔥] Official Implementation of TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters