CVPR 2025 Robotics Papers

A curated list of ~150 robotics / robotics-adjacent research papers presented at CVPR 2025.

This year’s themes include robot manipulation, vision-language models, simulation, human-robot interaction, 3D perception, motion prediction, and embodied AI agents. Explore interactive papers on Bytez or browse the virtual posters on CVPR.

Note: Bytez is working to make CVPR vision models accessible for free through our Inference API. More to come ✨

Paper List

Paper Title	TLDR	Virtual Poster	Interactive Paper (Bytez)
robotwin: dual-arm robot benchmark with generative digital twins	Introduces a benchmark using generative digital twins to evaluate dual-arm robotic manipulation.	CVPR	Bytez
prof. robot: differentiable robot rendering without static and self-collisions	Proposes a differentiable rendering method to simulate robot motion avoiding collisions.	CVPR	Bytez
lift3d policy: lifting 2d foundation models for robust 3d robotic manipulation	Adapts 2D pretrained models to improve robustness in 3D robotic manipulation tasks.	CVPR	Bytez
robospatial: teaching spatial understanding to 2d and 3d vision-language models for robotics	Enhances vision-language models with spatial reasoning capabilities for robotics.	CVPR	Bytez
a data-centric revisit of pre-trained vision models for robot learning	Re-examines the impact of data-centric approaches on pre-trained vision models for robots.	CVPR	Bytez
autourdf: unsupervised robot modeling from point cloud frames using cluster registration	Uses unsupervised cluster registration of point clouds for robot modeling.	CVPR	Bytez
robopepp: vision-based robot pose and joint angle estimation through embedding predictive pre-training	Embedding predictive pre-training enhances vision-based robot pose and joint angle estimation.	CVPR	Bytez
3d-mvp: 3d multiview pretraining for manipulation	Introduces a multiview 3D pretraining method to improve manipulation skills.	CVPR	Bytez
mitigating the human-robot domain discrepancy in visual pre-training for robotic manipulation	Addresses domain gap in visuals between humans and robots to enhance manipulation pre-training.	CVPR	Bytez
spatial-temporal graph diffusion policy with kinematic modeling for bimanual robotic manipulation	Combines graph diffusion and kinematic modeling for coordinated bimanual manipulation.	CVPR	Bytez
omnimanip: towards general robotic manipulation via object-centric interaction primitives as spatial constraints	Develops general manipulation techniques using object-centric spatial interaction primitives.	CVPR	Bytez
robobrain: a unified brain model for robotic manipulation from abstract to concrete	Proposes a unified brain-inspired model linking abstract reasoning to concrete manipulation actions.	CVPR	Bytez
phoenix: a motion-based self-reflection framework for fine-grained robotic action correction	Introduces a framework where robots self-reflect on motion for precise action correction.	CVPR	Bytez
let humanoids hike! integrative skill development on complex trails	Explores humanoid skill development for navigating complex hiking trails.	CVPR	Bytez
towards autonomous micromobility through scalable urban simulation	Develops scalable urban simulations to aid autonomous micromobility systems.	CVPR	Bytez
flowram: grounding flow matching policy with region-aware mamba framework for robotic manipulation	Introduces a flow matching policy grounded with region-aware frameworks for manipulation.	CVPR	Bytez
g3flow: generative 3d semantic flow for pose-aware and generalizable object manipulation	Proposes a generative 3D semantic flow model for pose-aware object manipulation.	CVPR	Bytez
vidbot: learning generalizable 3d actions from in-the-wild 2d human videos for zero-shot robotic manipulation	Leverages 2D human videos to learn generalizable 3D robotic actions without retraining.	CVPR	Bytez
genmanip: llm-driven simulation for generalizable instruction-following manipulation	Uses large language model driven simulation for instruction-following robotic manipulation.	CVPR	Bytez
robotic visual instruction	Develops systems enabling robots to learn tasks from visual instructions.	CVPR	Bytez
reasoning in visual navigation of end-to-end trained agents: a dynamical systems approach	Applies dynamical systems theory to improve reasoning in end-to-end visual navigation agents.	CVPR	Bytez
maniptrans: efficient dexterous bimanual manipulation transfer via residual learning	Uses residual learning to transfer dexterous bimanual manipulation skills efficiently.	CVPR	Bytez
two by two: learning multi-task pairwise objects assembly for generalizable robot manipulation	Proposes multi-task learning for pairwise object assembly to improve manipulation generalization.	CVPR	Bytez
think small, act big: primitive prompt learning for lifelong robot manipulation	Introduces primitive prompt learning to enable lifelong robot manipulation skills.	CVPR	Bytez
universal actions for enhanced embodied foundation models	Defines universal actions to improve embodied AI foundation models for robotics.	CVPR	Bytez
physvlm: enabling visual language models to understand robotic physical reachability	Enhances visual language models to predict physically reachable areas for robots.	CVPR	Bytez
unigrasptransformer: simplified policy distillation for scalable dexterous robotic grasping	Simplifies policy distillation to scale dexterous robotic grasping with transformers.	CVPR	Bytez
roboground: robotic manipulation with grounded vision-language priors	Integrates grounded vision-language priors to enhance robotic manipulation tasks.	CVPR	Bytez
cap-net: a unified network for 6d pose and size estimation of categorical articulated parts from a single rgb-d image	Estimates 6D pose and size of articulated object parts from RGB-D images using a unified network.	CVPR	Bytez
pdfactor: learning tri-perspective view policy diffusion field for multi-task robotic manipulation	Learns policies using tri-perspective view diffusion fields for diverse robotic tasks.	CVPR	Bytez
mobileh2r: learning generalizable human to mobile robot handover exclusively from scalable and diverse synthetic data	Trains human-to-robot handover skills using diverse synthetic data for generalization.	CVPR	Bytez
skillmimic: learning basketball interaction skills from demonstrations	Uses demonstration learning to acquire basketball interaction skills for robots.	CVPR	Bytez
object-centric prompt-driven vision-language-action model for robotic manipulation	Employs object-centric prompts to drive vision-language-action models in robotics.	CVPR	Bytez
momanipvla: transferring vision-language-action models for general mobile manipulation	Transfers vision-language-action models to improve mobile robot manipulation.	CVPR	Bytez
tartan imu: a light foundation model for inertial positioning in robotics	Proposes a lightweight foundation model for inertial positioning in robotics.	CVPR	Bytez
grove: a generalized reward for learning open-vocabulary physical skill	Develops a generalized reward system to learn open-vocabulary physical skills.	CVPR	Bytez
neural motion simulator pushing the limit of world models in reinforcement learning	Presents a neural motion simulator advancing world models for reinforcement learning.	CVPR	Bytez
intermimic: towards universal whole-body control for physics-based human-object interactions	Works toward universal whole-body control for human-object interaction in physics simulations.	CVPR	Bytez
zerograsp: zero-shot shape reconstruction enabled robotic grasping	Enables zero-shot robotic grasping by reconstructing object shape on the fly.	CVPR	Bytez
code-as-monitor: constraint-aware visual programming for reactive and proactive robotic failure detection	Uses visual programming to monitor and detect robotic failures proactively.	CVPR	Bytez
graphmimic: graph-to-graphs generative modeling from videos for policy learning	Generates graph representations from videos to aid policy learning.	CVPR	Bytez
robosense: large-scale dataset and benchmark for egocentric robot perception and navigation in crowded and unstructured environments	Provides a large dataset and benchmark for egocentric robot perception/navigation in complex settings.	CVPR	Bytez
magma: a foundation model for multimodal ai agents	Introduces a foundation model supporting multimodal AI agents for robotics.	CVPR	Bytez
dynscene: scalable generation of dynamic robotic manipulation scenes for embodied ai	Scales generation of dynamic scenes for embodied AI robotic manipulation.	CVPR	Bytez
dexhanddiff: interaction-aware diffusion planning for adaptive dexterous manipulation	Uses diffusion planning aware of interactions for adaptive dexterous manipulation.	CVPR	Bytez
r2c: mapping room to chessboard to unlock llm as low-level action planner	Maps spatial environments to chessboard representations to enable LLMs as action planners.	CVPR	Bytez
tra-moe: learning trajectory prediction model from multiple domains for adaptive policy conditioning	Learns multi-domain trajectory prediction for adaptive robotic policy conditioning.	CVPR	Bytez
towards visual discrimination and reasoning of real-world physical dynamics: physics-grounded anomaly detection	Uses physics-grounded models to detect anomalies in physical dynamics visually.	CVPR	Bytez
afforddp: generalizable diffusion policy with transferable affordance	Develops a diffusion policy leveraging transferable affordances for generalization.	CVPR	Bytez
generating 6dof object manipulation trajectories from action description in egocentric vision	Generates 6DoF manipulation trajectories based on egocentric action descriptions.	CVPR	Bytez
taste-rob: advancing video generation of task-oriented hand-object interaction for generalizable robotic manipulation	Improves video generation of task-specific hand-object interactions for manipulation.	CVPR	Bytez
partrm: modeling part-level dynamics with large cross-state reconstruction model	Models dynamics at part-level using large reconstruction models across states.	CVPR	Bytez
meshart: generating articulated meshes with structure-guided transformers	Uses transformers to generate articulated 3D meshes guided by structure.	CVPR	Bytez
iaao: interactive affordance learning for articulated objects in 3d environments	Learns interactive affordances for articulated objects in 3D spaces.	CVPR	Bytez
tokenhsi: unified synthesis of physical human-scene interactions through task tokenization	Synthesizes human-scene interactions using task tokenization in a unified framework.	CVPR	Bytez
cot-vla: visual chain-of-thought reasoning for vision-language-action models	Introduces chain-of-thought reasoning in vision-language-action models.	CVPR	Bytez
pidloc: cross-view pose optimization network inspired by pid controllers	Proposes a pose optimization network inspired by PID control theory for localization.	CVPR	Bytez
fiction: 4d future interaction prediction from video	Predicts future 4D human-object interactions from video data.	CVPR	Bytez
how do i do that? synthesizing 3d hand motion and contacts for everyday interactions	Synthesizes 3D hand motion and contact points for everyday tasks.	CVPR	Bytez
from multimodal llms to generalist embodied agents: methods and lessons	Surveys methods to build generalist embodied agents from multimodal LLMs.	CVPR	Bytez
bimart: a unified approach for the synthesis of 3d bimanual interaction with articulated objects	Synthesizes 3D bimanual interactions with articulated objects using a unified approach.	CVPR	Bytez
category-agnostic neural object rigging	Develops neural rigging methods that work across object categories.	CVPR	Bytez
tango: training-free embodied ai agents for open-world tasks	Proposes training-free AI agents for open-world embodied tasks.	CVPR	Bytez
rocket-1: mastering open-world interaction with visual-temporal context prompting	Uses visual-temporal prompts to master interactions in open-world environments.	CVPR	Bytez
crocodl: cross-device collaborative dataset for localization	Provides a cross-device dataset to improve collaborative localization.	CVPR	Bytez
handos: 3d hand reconstruction in one stage	Proposes a one-stage method for 3D hand reconstruction.	CVPR	Bytez
multi-modal knowledge distillation-based human trajectory forecasting	Uses multimodal knowledge distillation to forecast human trajectories.	CVPR	Bytez
citywalker: learning embodied urban navigation from web-scale videos	Learns urban navigation skills from large-scale video data for embodied agents.	CVPR	Bytez
ske-layout: spatial knowledge enhanced layout generation with llms	Enhances layout generation using spatial knowledge integrated with large language models.	CVPR	Bytez
rethinking correspondence-based category-level object pose estimation	Revisits category-level object pose estimation via correspondence learning.	CVPR	Bytez
structure-aware correspondence learning for relative pose estimation	Incorporates structure-awareness into correspondence learning for pose estimation.	CVPR	Bytez
learning physics-based full-body human reaching and grasping from brief walking references	Learns full-body reaching and grasping motions from brief reference motions.	CVPR	Bytez
leveraging global stereo consistency for category-level shape and 6d pose estimation from stereo images	Uses stereo image consistency to estimate shape and 6D pose at category level.	CVPR	Bytez
semgeomo: dynamic contextual human motion generation with semantic and geometric guidance	Generates human motion dynamically with semantic and geometric context.	CVPR	Bytez
chainhoi: joint-based kinematic chain modeling for human-object interaction generation	Models human-object interactions with joint-based kinematic chains.	CVPR	Bytez
lal: enhancing 3d human motion prediction with latency-aware auxiliary learning	Improves 3D human motion prediction using latency-aware auxiliary learning.	CVPR	Bytez
poly-autoregressive prediction for modeling interactions	Uses poly-autoregressive models for capturing complex interactions.	CVPR	Bytez
hand-held object reconstruction from rgb video with dynamic interaction	Reconstructs handheld objects from RGB video considering dynamic interactions.	CVPR	Bytez
dyn-hamr: recovering 4d interacting hand motion from a dynamic camera	Recovers 4D hand motion from videos captured by dynamic cameras.	CVPR	Bytez
gce-pose: global context enhancement for category-level object pose estimation	Enhances object pose estimation by integrating global context.	CVPR	Bytez
guiding human-object interactions with rich geometry and relations	Uses detailed geometry and relational data to guide human-object interactions.	CVPR	Bytez
monotakd: teaching assistant knowledge distillation for monocular 3d object detection	Applies knowledge distillation to improve monocular 3D object detection.	CVPR	Bytez
comrope: scalable and robust rotary position embedding parameterized by trainable commuting angle matrices	Introduces a novel rotary position embedding method using angle matrices.	CVPR	Bytez
boe-vit: boosting orientation estimation with equivariance in self-supervised 3d subtomogram alignment	Improves orientation estimation by enforcing equivariance in self-supervised alignment.	CVPR	Bytez
activating sparse part concepts for 3d class incremental learning	Activates sparse part concepts to enable incremental learning in 3D classification.	CVPR	Bytez
eee-bench: a comprehensive multimodal electrical and electronics engineering benchmark	Presents a multimodal benchmark dataset focused on electrical engineering tasks.	CVPR	Bytez
neuron: learning context-aware evolving representations for zero-shot skeleton action recognition	Learns evolving, context-aware representations to enable zero-shot skeleton action recognition.	CVPR	Bytez
open-vocabulary functional 3d scene graphs for real-world indoor spaces	Builds functional 3D scene graphs with open vocabulary understanding for indoor spaces.	CVPR	Bytez
escape: equivariant shape completion via anchor point encoding	Uses anchor point encoding to perform equivariant 3D shape completion.	CVPR	Bytez
unigoal: towards universal zero-shot goal-oriented navigation	Aims for universal zero-shot goal-directed navigation across environments.	CVPR	Bytez
functionality understanding and segmentation in 3d scenes	Studies segmentation and understanding of functionality in 3D scenes.	CVPR	Bytez
mast3r-slam: real-time dense slam with 3d reconstruction priors	Proposes a real-time dense SLAM system enhanced by 3D reconstruction priors.	CVPR	Bytez
recovering dynamic 3d sketches from videos	Recovers dynamic 3D sketches of scenes from video inputs.	CVPR	Bytez
latte-mv: learning to anticipate table tennis hits from monocular videos	Anticipates table tennis hits by learning from monocular video footage.	CVPR	Bytez
probing the mid-level vision capabilities of self-supervised learning	Investigates mid-level visual features learned via self-supervision.	CVPR	Bytez
horp: human-object relation priors guided hoi detection	Enhances human-object interaction detection with relation priors.	CVPR	Bytez
ecbench: can multi-modal foundation models understand the egocentric world? a holistic embodied cognition benchmark	Benchmarks multimodal foundation models on egocentric embodied cognition tasks.	CVPR	Bytez
cross-modal distillation for 2d/3d multi-object discovery from 2d motion	Uses cross-modal distillation to detect multiple objects in 2D/3D from motion cues.	CVPR	Bytez
crisp: object pose and shape estimation with test-time adaptation	Improves pose and shape estimation by adapting models at test time.	CVPR	Bytez
exploration-driven generative interactive environments	Generates interactive environments driven by exploration objectives.	CVPR	Bytez
on-device self-supervised learning of low-latency monocular depth from only events	Enables low-latency monocular depth learning on-device from event data.	CVPR	Bytez
drawer: digital reconstruction and articulation with environment realism	Performs digital reconstruction with articulated realism in environments.	CVPR	Bytez
crossover: 3d scene cross-modal alignment	Aligns 3D scenes across multiple modalities.	CVPR	Bytez
timotion: temporal and interactive framework for efficient human-human motion generation	Creates efficient temporal models for interactive human-human motion synthesis.	CVPR	Bytez
finephys: fine-grained human action generation by explicitly incorporating physical laws for effective skeletal guidance	Generates human actions guided by physical laws for skeletal realism.	CVPR	Bytez
homogeneous dynamics space for heterogeneous humans	Models human motion in a homogeneous dynamics space despite human heterogeneity.	CVPR	Bytez
from sparse signal to smooth motion: real-time motion generation with rolling prediction models	Generates smooth motions in real time from sparse input signals using rolling prediction.	CVPR	Bytez
videoworld: exploring knowledge learning from unlabeled videos	Explores how AI can learn world knowledge from unlabeled video data.	CVPR	Bytez
bigs: bimanual category-agnostic interaction reconstruction from monocular videos via 3d gaussian splatting	Reconstructs bimanual interactions from monocular video using 3D Gaussian splatting.	CVPR	Bytez
microvqa: a multimodal reasoning benchmark for microscopy-based scientific research	Provides a benchmark for multimodal reasoning in microscopy scientific tasks.	CVPR	Bytez
spatial457: a diagnostic benchmark for 6d spatial reasoning of large multimodal models	Offers a benchmark for 6D spatial reasoning ability in large multimodal models.	CVPR	Bytez
artformer: controllable generation of diverse 3d articulated objects	Enables controllable generation of diverse articulated 3D objects.	CVPR	Bytez
revealing key details to see differences: a novel prototypical perspective for skeleton-based action recognition	Proposes a novel prototypical method highlighting key details for skeleton action recognition.	CVPR	Bytez
gazing into missteps: leveraging eye-gaze for unsupervised mistake detection in egocentric videos of skilled human activities	Uses eye-gaze data for unsupervised mistake detection in egocentric videos of skilled human activities.	CVPR	Bytez
learning partonomic 3d reconstruction from image collections	Learns part-level 3D reconstruction from collections of images.	CVPR	Bytez
vid2sim: realistic and interactive simulation from video for urban navigation	Converts video data into realistic interactive urban navigation simulations.	CVPR	Bytez
easyhoi: unleashing the power of large models for reconstructing hand-object interactions in the wild	Uses large models to reconstruct hand-object interactions in natural settings.	CVPR	Bytez
certified human trajectory prediction	Provides certified guarantees on predicted human trajectories.	CVPR	Bytez
4dtam: non-rigid tracking and mapping via dynamic surface gaussians	Tracks and maps non-rigid scenes using dynamic surface Gaussian models.	CVPR	Bytez
sasep: saliency-aware structured separation of geometry and feature for open set learning on point clouds	Separates geometry and features in point clouds for open set learning with saliency awareness.	CVPR	Bytez
cholectrack20: a multi-perspective tracking dataset for surgical tools	Provides a multi-perspective dataset for tracking surgical tools.	CVPR	Bytez
alien: implicit neural representations for human motion prediction under arbitrary latency	Predicts human motion using implicit neural representations handling latency.	CVPR	Bytez
ua-pose: uncertainty-aware 6d object pose estimation and online object completion with partial references	Estimates object pose with uncertainty and completes partial objects online.	CVPR	Bytez
magic-slam: multi-agent gaussian globally consistent slam	Introduces a multi-agent SLAM approach with globally consistent Gaussian modeling.	CVPR	Bytez
humocon: concept discovery for human motion understanding	Discovers concepts to enhance human motion understanding.	CVPR	Bytez
cospace: benchmarking continuous space perception ability for vision-language models	Benchmarks continuous space perception for vision-language models.	CVPR	Bytez
human motion instruction tuning	Tunes models to follow human motion instructions more effectively.	CVPR	Bytez
beyond human perception: understanding multi-object world from monocular view	Understands multi-object scenes from monocular visual input beyond human perception.	CVPR	Bytez
tokenmotion: decoupled motion control via token disentanglement for human-centric video generation	Controls human-centric video generation through disentangled motion tokens.	CVPR	Bytez
checkmanual: a new challenge and benchmark for manual-based appliance manipulation	Introduces a benchmark for manipulation of appliances using manuals.	CVPR	Bytez
vsnet: focusing on the linguistic characteristics of sign language	Analyzes linguistic features in sign language via VSNet.	CVPR	Bytez
collaborative tree search for enhancing embodied multi-agent collaboration	Uses collaborative tree search to improve multi-agent embodied collaboration.	CVPR	Bytez
continuous 3d perception model with persistent state	Develops a continuous 3D perception model maintaining persistent state.	CVPR	Bytez
vision-guided action: enhancing 3d human motion prediction with gaze-informed affordance in 3d scenes	Improves 3D human motion prediction using gaze-informed affordances.	CVPR	Bytez
interdyn: controllable interactive dynamics with video diffusion models	Models controllable interactive dynamics via video diffusion.	CVPR	Bytez
dragin3d: image editing by dragging in 3d space	Enables 3D image editing by dragging controls within 3D space.	CVPR	Bytez
unistd: towards unified spatio-temporal learning across diverse disciplines	Proposes a unified model for spatio-temporal learning across domains.	CVPR	Bytez
h-more: learning human-centric motion representation for action analysis	Learns motion representations centered on humans for action analysis.	CVPR	Bytez
closed-loop supervised fine-tuning of tokenized traffic models	Fine-tunes traffic prediction models in closed-loop supervision with tokenized inputs.	CVPR	Bytez
grae-3dmot: geometry relation-aware encoder for online 3d multi-object tracking	Encodes geometry relations to improve online 3D multi-object tracking.	CVPR	Bytez
echoworld: learning motion-aware world models for echocardiography probe guidance	Learns motion-aware models to guide echocardiography probes effectively.	CVPR	Bytez
videogem: training-free action grounding in videos	Grounds actions in video data without additional training.	CVPR	Bytez
reanimating images using neural representations of dynamic stimuli	Uses neural representations to reanimate static images dynamically.	CVPR	Bytez
uncertainty meets diversity: a comprehensive active learning framework for indoor 3d object detection	Combines uncertainty and diversity for active learning in 3D object detection.	CVPR	Bytez
hsi-gpt: a general-purpose large scene-motion-language model for human scene interaction	Proposes a large model combining scene, motion, and language for human-scene interactions.	CVPR	Bytez
simulator hc: regression-based online simulation of starting problem-solution pairs for homotopy continuation in geometric vision	Uses regression to simulate homotopy continuation in geometric vision online.	CVPR	Bytez
gem: a generalizable ego-vision multimodal world model for fine-grained ego-motion, object dynamics, and scene composition control	Introduces an ego-vision multimodal world model for fine-grained control.	CVPR	Bytez
gigahands: a massive annotated dataset of bimanual hand activities	Releases a large dataset of annotated bimanual hand activities.	CVPR	Bytez
posetraj: pose-aware trajectory control in video diffusion	Controls trajectory in video diffusion models using pose information.	CVPR	Bytez
logosp: local-global grouping of superpoints for unsupervised semantic segmentation of 3d point clouds	Proposes a method for semantic segmentation by grouping superpoints locally and globally.	CVPR	Bytez
pomp: physics-consistent motion generative model through phase manifolds	Generates motions consistent with physics using phase manifold modeling.	CVPR	Bytez
pose priors from language models	Extracts pose priors from large language models.	CVPR	Bytez
articulatedgs: self-supervised digital twin modeling of articulated objects using 3d gaussian splatting	Creates self-supervised digital twins of articulated objects with 3D Gaussian splatting.	CVPR	Bytez
dexgrasp anything: towards universal robotic dexterous grasping with physics awareness	Proposes a physics-aware approach for universal dexterous robotic grasping across diverse objects and scenarios.	CVPR	Bytez

smallfryy/cvpr2025-robotics-papers
