smallfryy/cvpr2025-robotics-papers

CVPR 2025 Robotics Papers

A curated list of ~100 robotics / robotics-adjacent research papers presented at CVPR 2025.

This year’s themes include robot manipulation, vision-language models, simulation, human-robot interaction, 3D perception, motion prediction, and embodied AI agents. Explore the interactive papers on Bytez or browse the virtual posters on the CVPR site.

Note: Bytez is working to make CVPR vision models accessible for free through our Inference API. More to come ✨

Paper List

| Paper Title | TLDR | Virtual Poster | Interactive Paper (Bytez) |
| --- | --- | --- | --- |
| robotwin: dual-arm robot benchmark with generative digital twins | Introduces a benchmark using generative digital twins to evaluate dual-arm robotic manipulation. | CVPR | Bytez |
| prof. robot: differentiable robot rendering without static and self-collisions | Proposes a differentiable rendering method to simulate robot motion avoiding collisions. | CVPR | Bytez |
| lift3d policy: lifting 2d foundation models for robust 3d robotic manipulation | Adapts 2D pretrained models to improve robustness in 3D robotic manipulation tasks. | CVPR | Bytez |
| robospatial: teaching spatial understanding to 2d and 3d vision-language models for robotics | Enhances vision-language models with spatial reasoning capabilities for robotics. | CVPR | Bytez |
| a data-centric revisit of pre-trained vision models for robot learning | Re-examines the impact of data-centric approaches on pre-trained vision models for robots. | CVPR | Bytez |
| autourdf: unsupervised robot modeling from point cloud frames using cluster registration | Uses unsupervised cluster registration of point clouds for robot modeling. | CVPR | Bytez |
| robopepp: vision-based robot pose and joint angle estimation through embedding predictive pre-training | Embedding predictive pre-training enhances vision-based robot pose and joint angle estimation. | CVPR | Bytez |
| 3d-mvp: 3d multiview pretraining for manipulation | Introduces a multiview 3D pretraining method to improve manipulation skills. | CVPR | Bytez |
| mitigating the human-robot domain discrepancy in visual pre-training for robotic manipulation | Addresses domain gap in visuals between humans and robots to enhance manipulation pre-training. | CVPR | Bytez |
| spatial-temporal graph diffusion policy with kinematic modeling for bimanual robotic manipulation | Combines graph diffusion and kinematic modeling for coordinated bimanual manipulation. | CVPR | Bytez |
| omnimanip: towards general robotic manipulation via object-centric interaction primitives as spatial constraints | Develops general manipulation techniques using object-centric spatial interaction primitives. | CVPR | Bytez |
| robobrain: a unified brain model for robotic manipulation from abstract to concrete | Proposes a unified brain-inspired model linking abstract reasoning to concrete manipulation actions. | CVPR | Bytez |
| phoenix: a motion-based self-reflection framework for fine-grained robotic action correction | Introduces a framework where robots self-reflect on motion for precise action correction. | CVPR | Bytez |
| let humanoids hike! integrative skill development on complex trails | Explores humanoid skill development for navigating complex hiking trails. | CVPR | Bytez |
| towards autonomous micromobility through scalable urban simulation | Develops scalable urban simulations to aid autonomous micromobility systems. | CVPR | Bytez |
| flowram: grounding flow matching policy with region-aware mamba framework for robotic manipulation | Introduces a flow matching policy grounded with region-aware frameworks for manipulation. | CVPR | Bytez |
| g3flow: generative 3d semantic flow for pose-aware and generalizable object manipulation | Proposes a generative 3D semantic flow model for pose-aware object manipulation. | CVPR | Bytez |
| vidbot: learning generalizable 3d actions from in-the-wild 2d human videos for zero-shot robotic manipulation | Leverages 2D human videos to learn generalizable 3D robotic actions without retraining. | CVPR | Bytez |
| genmanip: llm-driven simulation for generalizable instruction-following manipulation | Uses large language model driven simulation for instruction-following robotic manipulation. | CVPR | Bytez |
| robotic visual instruction | Develops systems enabling robots to learn tasks from visual instructions. | CVPR | Bytez |
| reasoning in visual navigation of end-to-end trained agents: a dynamical systems approach | Applies dynamical systems theory to improve reasoning in end-to-end visual navigation agents. | CVPR | Bytez |
| maniptrans: efficient dexterous bimanual manipulation transfer via residual learning | Uses residual learning to transfer dexterous bimanual manipulation skills efficiently. | CVPR | Bytez |
| two by two: learning multi-task pairwise objects assembly for generalizable robot manipulation | Proposes multi-task learning for pairwise object assembly to improve manipulation generalization. | CVPR | Bytez |
| think small, act big: primitive prompt learning for lifelong robot manipulation | Introduces primitive prompt learning to enable lifelong robot manipulation skills. | CVPR | Bytez |
| universal actions for enhanced embodied foundation models | Defines universal actions to improve embodied AI foundation models for robotics. | CVPR | Bytez |
| physvlm: enabling visual language models to understand robotic physical reachability | Enhances visual language models to predict physically reachable areas for robots. | CVPR | Bytez |
| unigrasptransformer: simplified policy distillation for scalable dexterous robotic grasping | Simplifies policy distillation to scale dexterous robotic grasping with transformers. | CVPR | Bytez |
| roboground: robotic manipulation with grounded vision-language priors | Integrates grounded vision-language priors to enhance robotic manipulation tasks. | CVPR | Bytez |
| cap-net: a unified network for 6d pose and size estimation of categorical articulated parts from a single rgb-d image | Estimates 6D pose and size of articulated object parts from RGB-D images using a unified network. | CVPR | Bytez |
| pdfactor: learning tri-perspective view policy diffusion field for multi-task robotic manipulation | Learns policies using tri-perspective view diffusion fields for diverse robotic tasks. | CVPR | Bytez |
| mobileh2r: learning generalizable human to mobile robot handover exclusively from scalable and diverse synthetic data | Trains human-to-robot handover skills using diverse synthetic data for generalization. | CVPR | Bytez |
| skillmimic: learning basketball interaction skills from demonstrations | Uses demonstration learning to acquire basketball interaction skills for robots. | CVPR | Bytez |
| object-centric prompt-driven vision-language-action model for robotic manipulation | Employs object-centric prompts to drive vision-language-action models in robotics. | CVPR | Bytez |
| momanipvla: transferring vision-language-action models for general mobile manipulation | Transfers vision-language-action models to improve mobile robot manipulation. | CVPR | Bytez |
| tartan imu: a light foundation model for inertial positioning in robotics | Proposes a lightweight foundation model for inertial positioning in robotics. | CVPR | Bytez |
| grove: a generalized reward for learning open-vocabulary physical skill | Develops a generalized reward system to learn open-vocabulary physical skills. | CVPR | Bytez |
| neural motion simulator pushing the limit of world models in reinforcement learning | Presents a neural motion simulator advancing world models for reinforcement learning. | CVPR | Bytez |
| intermimic: towards universal whole-body control for physics-based human-object interactions | Works toward universal whole-body control for human-object interaction in physics simulations. | CVPR | Bytez |
| zerograsp: zero-shot shape reconstruction enabled robotic grasping | Enables zero-shot robotic grasping by reconstructing object shape on the fly. | CVPR | Bytez |
| code-as-monitor: constraint-aware visual programming for reactive and proactive robotic failure detection | Uses visual programming to monitor and detect robotic failures proactively. | CVPR | Bytez |
| graphmimic: graph-to-graphs generative modeling from videos for policy learning | Generates graph representations from videos to aid policy learning. | CVPR | Bytez |
| robosense: large-scale dataset and benchmark for egocentric robot perception and navigation in crowded and unstructured environments | Provides a large dataset and benchmark for egocentric robot perception/navigation in complex settings. | CVPR | Bytez |
| magma: a foundation model for multimodal ai agents | Introduces a foundation model supporting multimodal AI agents for robotics. | CVPR | Bytez |
| dynscene: scalable generation of dynamic robotic manipulation scenes for embodied ai | Scales generation of dynamic scenes for embodied AI robotic manipulation. | CVPR | Bytez |
| dexhanddiff: interaction-aware diffusion planning for adaptive dexterous manipulation | Uses diffusion planning aware of interactions for adaptive dexterous manipulation. | CVPR | Bytez |
| r2c: mapping room to chessboard to unlock llm as low-level action planner | Maps spatial environments to chessboard representations to enable LLMs as action planners. | CVPR | Bytez |
| tra-moe: learning trajectory prediction model from multiple domains for adaptive policy conditioning | Learns multi-domain trajectory prediction for adaptive robotic policy conditioning. | CVPR | Bytez |
| towards visual discrimination and reasoning of real-world physical dynamics: physics-grounded anomaly detection | Uses physics-grounded models to detect anomalies in physical dynamics visually. | CVPR | Bytez |
| afforddp: generalizable diffusion policy with transferable affordance | Develops a diffusion policy leveraging transferable affordances for generalization. | CVPR | Bytez |
| generating 6dof object manipulation trajectories from action description in egocentric vision | Generates 6DoF manipulation trajectories based on egocentric action descriptions. | CVPR | Bytez |
| taste-rob: advancing video generation of task-oriented hand-object interaction for generalizable robotic manipulation | Improves video generation of task-specific hand-object interactions for manipulation. | CVPR | Bytez |
| partrm: modeling part-level dynamics with large cross-state reconstruction model | Models dynamics at part-level using large reconstruction models across states. | CVPR | Bytez |
| meshart: generating articulated meshes with structure-guided transformers | Uses transformers to generate articulated 3D meshes guided by structure. | CVPR | Bytez |
| iaao: interactive affordance learning for articulated objects in 3d environments | Learns interactive affordances for articulated objects in 3D spaces. | CVPR | Bytez |
| tokenhsi: unified synthesis of physical human-scene interactions through task tokenization | Synthesizes human-scene interactions using task tokenization in a unified framework. | CVPR | Bytez |
| cot-vla: visual chain-of-thought reasoning for vision-language-action models | Introduces chain-of-thought reasoning in vision-language-action models. | CVPR | Bytez |
| pidloc: cross-view pose optimization network inspired by pid controllers | Proposes a pose optimization network inspired by PID control theory for localization. | CVPR | Bytez |
| fiction: 4d future interaction prediction from video | Predicts future 4D human-object interactions from video data. | CVPR | Bytez |
| how do i do that? synthesizing 3d hand motion and contacts for everyday interactions | Synthesizes 3D hand motion and contact points for everyday tasks. | CVPR | Bytez |
| from multimodal llms to generalist embodied agents: methods and lessons | Surveys methods to build generalist embodied agents from multimodal LLMs. | CVPR | Bytez |
| bimart: a unified approach for the synthesis of 3d bimanual interaction with articulated objects | Synthesizes 3D bimanual interactions with articulated objects using a unified approach. | CVPR | Bytez |
| category-agnostic neural object rigging | Develops neural rigging methods that work across object categories. | CVPR | Bytez |
| tango: training-free embodied ai agents for open-world tasks | Proposes training-free AI agents for open-world embodied tasks. | CVPR | Bytez |
| rocket-1: mastering open-world interaction with visual-temporal context prompting | Uses visual-temporal prompts to master interactions in open-world environments. | CVPR | Bytez |
| crocodl: cross-device collaborative dataset for localization | Provides a cross-device dataset to improve collaborative localization. | CVPR | Bytez |
| handos: 3d hand reconstruction in one stage | Proposes a one-stage method for 3D hand reconstruction. | CVPR | Bytez |
| multi-modal knowledge distillation-based human trajectory forecasting | Uses multimodal knowledge distillation to forecast human trajectories. | CVPR | Bytez |
| citywalker: learning embodied urban navigation from web-scale videos | Learns urban navigation skills from large-scale video data for embodied agents. | CVPR | Bytez |
| ske-layout: spatial knowledge enhanced layout generation with llms | Enhances layout generation using spatial knowledge integrated with large language models. | CVPR | Bytez |
| rethinking correspondence-based category-level object pose estimation | Revisits category-level object pose estimation via correspondence learning. | CVPR | Bytez |
| structure-aware correspondence learning for relative pose estimation | Incorporates structure-awareness into correspondence learning for pose estimation. | CVPR | Bytez |
| learning physics-based full-body human reaching and grasping from brief walking references | Learns full-body reaching and grasping motions from brief reference motions. | CVPR | Bytez |
| leveraging global stereo consistency for category-level shape and 6d pose estimation from stereo images | Uses stereo image consistency to estimate shape and 6D pose at category level. | CVPR | Bytez |
| semgeomo: dynamic contextual human motion generation with semantic and geometric guidance | Generates human motion dynamically with semantic and geometric context. | CVPR | Bytez |
| chainhoi: joint-based kinematic chain modeling for human-object interaction generation | Models human-object interactions with joint-based kinematic chains. | CVPR | Bytez |
| lal: enhancing 3d human motion prediction with latency-aware auxiliary learning | Improves 3D human motion prediction using latency-aware auxiliary learning. | CVPR | Bytez |
| poly-autoregressive prediction for modeling interactions | Uses poly-autoregressive models for capturing complex interactions. | CVPR | Bytez |
| hand-held object reconstruction from rgb video with dynamic interaction | Reconstructs handheld objects from RGB video considering dynamic interactions. | CVPR | Bytez |
| dyn-hamr: recovering 4d interacting hand motion from a dynamic camera | Recovers 4D hand motion from videos captured by dynamic cameras. | CVPR | Bytez |
| gce-pose: global context enhancement for category-level object pose estimation | Enhances object pose estimation by integrating global context. | CVPR | Bytez |
| guiding human-object interactions with rich geometry and relations | Uses detailed geometry and relational data to guide human-object interactions. | CVPR | Bytez |
| monotakd: teaching assistant knowledge distillation for monocular 3d object detection | Applies knowledge distillation to improve monocular 3D object detection. | CVPR | Bytez |
| comrope: scalable and robust rotary position embedding parameterized by trainable commuting angle matrices | Introduces a novel rotary position embedding method using angle matrices. | CVPR | Bytez |
| boe-vit: boosting orientation estimation with equivariance in self-supervised 3d subtomogram alignment | Improves orientation estimation by enforcing equivariance in self-supervised alignment. | CVPR | Bytez |
| activating sparse part concepts for 3d class incremental learning | Activates sparse part concepts to enable incremental learning in 3D classification. | CVPR | Bytez |
| eee-bench: a comprehensive multimodal electrical and electronics engineering benchmark | Presents a multimodal benchmark dataset focused on electrical engineering tasks. | CVPR | Bytez |
| neuron: learning context-aware evolving representations for zero-shot skeleton action recognition | Learns evolving, context-aware representations to enable zero-shot skeleton action recognition. | CVPR | Bytez |
| open-vocabulary functional 3d scene graphs for real-world indoor spaces | Builds functional 3D scene graphs with open vocabulary understanding for indoor spaces. | CVPR | Bytez |
| escape: equivariant shape completion via anchor point encoding | Uses anchor point encoding to perform equivariant 3D shape completion. | CVPR | Bytez |
| unigoal: towards universal zero-shot goal-oriented navigation | Aims for universal zero-shot goal-directed navigation across environments. | CVPR | Bytez |
| functionality understanding and segmentation in 3d scenes | Studies segmentation and understanding of functionality in 3D scenes. | CVPR | Bytez |
| mast3r-slam: real-time dense slam with 3d reconstruction priors | Proposes a real-time dense SLAM system enhanced by 3D reconstruction priors. | CVPR | Bytez |
| recovering dynamic 3d sketches from videos | Recovers dynamic 3D sketches of scenes from video inputs. | CVPR | Bytez |
| latte-mv: learning to anticipate table tennis hits from monocular videos | Anticipates table tennis hits by learning from monocular video footage. | CVPR | Bytez |
| probing the mid-level vision capabilities of self-supervised learning | Investigates mid-level visual features learned via self-supervision. | CVPR | Bytez |
| horp: human-object relation priors guided hoi detection | Enhances human-object interaction detection with relation priors. | CVPR | Bytez |
| ecbench: can multi-modal foundation models understand the egocentric world? a holistic embodied cognition benchmark | Benchmarks multimodal foundation models on egocentric embodied cognition tasks. | CVPR | Bytez |
| cross-modal distillation for 2d/3d multi-object discovery from 2d motion | Uses cross-modal distillation to detect multiple objects in 2D/3D from motion cues. | CVPR | Bytez |
| crisp: object pose and shape estimation with test-time adaptation | Improves pose and shape estimation by adapting models at test time. | CVPR | Bytez |
| exploration-driven generative interactive environments | Generates interactive environments driven by exploration objectives. | CVPR | Bytez |
| on-device self-supervised learning of low-latency monocular depth from only events | Enables low-latency monocular depth learning on-device from event data. | CVPR | Bytez |
| drawer: digital reconstruction and articulation with environment realism | Performs digital reconstruction with articulated realism in environments. | CVPR | Bytez |
| crossover: 3d scene cross-modal alignment | Aligns 3D scenes across multiple modalities. | CVPR | Bytez |
| timotion: temporal and interactive framework for efficient human-human motion generation | Creates efficient temporal models for interactive human-human motion synthesis. | CVPR | Bytez |
| finephys: fine-grained human action generation by explicitly incorporating physical laws for effective skeletal guidance | Generates human actions guided by physical laws for skeletal realism. | CVPR | Bytez |
| homogeneous dynamics space for heterogeneous humans | Models human motion in a homogeneous dynamics space despite human heterogeneity. | CVPR | Bytez |
| from sparse signal to smooth motion: real-time motion generation with rolling prediction models | Generates smooth motions in real time from sparse input signals using rolling prediction. | CVPR | Bytez |
| videoworld: exploring knowledge learning from unlabeled videos | Explores how AI can learn world knowledge from unlabeled video data. | CVPR | Bytez |
| bigs: bimanual category-agnostic interaction reconstruction from monocular videos via 3d gaussian splatting | Reconstructs bimanual interactions from monocular video using 3D Gaussian splatting. | CVPR | Bytez |
| microvqa: a multimodal reasoning benchmark for microscopy-based scientific research | Provides a benchmark for multimodal reasoning in microscopy scientific tasks. | CVPR | Bytez |
| spatial457: a diagnostic benchmark for 6d spatial reasoning of large multimodal models | Offers a benchmark for 6D spatial reasoning ability in large multimodal models. | CVPR | Bytez |
| artformer: controllable generation of diverse 3d articulated objects | Enables controllable generation of diverse articulated 3D objects. | CVPR | Bytez |
| revealing key details to see differences: a novel prototypical perspective for skeleton-based action recognition | Proposes a novel prototypical method highlighting key details for skeleton action recognition. | CVPR | Bytez |
| gazing into missteps: leveraging eye-gaze for unsupervised mistake detection in egocentric videos of skilled human activities | Uses eye-gaze data to detect mistakes without supervision in skilled human activity videos. | CVPR | Bytez |
| learning partonomic 3d reconstruction from image collections | Learns part-level 3D reconstruction from collections of images. | CVPR | Bytez |
| vid2sim: realistic and interactive simulation from video for urban navigation | Converts video data into realistic interactive urban navigation simulations. | CVPR | Bytez |
| easyhoi: unleashing the power of large models for reconstructing hand-object interactions in the wild | Uses large models to reconstruct hand-object interactions in natural settings. | CVPR | Bytez |
| certified human trajectory prediction | Provides certified guarantees on predicted human trajectories. | CVPR | Bytez |
| 4dtam: non-rigid tracking and mapping via dynamic surface gaussians | Tracks and maps non-rigid scenes using dynamic surface Gaussian models. | CVPR | Bytez |
| sasep: saliency-aware structured separation of geometry and feature for open set learning on point clouds | Separates geometry and features in point clouds for open set learning with saliency awareness. | CVPR | Bytez |
| cholectrack20: a multi-perspective tracking dataset for surgical tools | Provides a multi-perspective dataset for tracking surgical tools. | CVPR | Bytez |
| alien: implicit neural representations for human motion prediction under arbitrary latency | Predicts human motion using implicit neural representations handling latency. | CVPR | Bytez |
| ua-pose: uncertainty-aware 6d object pose estimation and online object completion with partial references | Estimates object pose with uncertainty and completes partial objects online. | CVPR | Bytez |
| magic-slam: multi-agent gaussian globally consistent slam | Introduces a multi-agent SLAM approach with globally consistent Gaussian modeling. | CVPR | Bytez |
| humocon: concept discovery for human motion understanding | Discovers concepts to enhance human motion understanding. | CVPR | Bytez |
| cospace: benchmarking continuous space perception ability for vision-language models | Benchmarks continuous space perception for vision-language models. | CVPR | Bytez |
| human motion instruction tuning | Tunes models to follow human motion instructions more effectively. | CVPR | Bytez |
| beyond human perception: understanding multi-object world from monocular view | Understands multi-object scenes from monocular visual input beyond human perception. | CVPR | Bytez |
| tokenmotion: decoupled motion control via token disentanglement for human-centric video generation | Controls human-centric video generation through disentangled motion tokens. | CVPR | Bytez |
| checkmanual: a new challenge and benchmark for manual-based appliance manipulation | Introduces a benchmark for manipulation of appliances using manuals. | CVPR | Bytez |
| vsnet: focusing on the linguistic characteristics of sign language | Analyzes linguistic features in sign language via VSNet. | CVPR | Bytez |
| collaborative tree search for enhancing embodied multi-agent collaboration | Uses collaborative tree search to improve multi-agent embodied collaboration. | CVPR | Bytez |
| continuous 3d perception model with persistent state | Develops a continuous 3D perception model maintaining persistent state. | CVPR | Bytez |
| vision-guided action: enhancing 3d human motion prediction with gaze-informed affordance in 3d scenes | Improves 3D human motion prediction using gaze-informed affordances. | CVPR | Bytez |
| interdyn: controllable interactive dynamics with video diffusion models | Models controllable interactive dynamics via video diffusion. | CVPR | Bytez |
| dragin3d: image editing by dragging in 3d space | Enables 3D image editing by dragging controls within 3D space. | CVPR | Bytez |
| unistd: towards unified spatio-temporal learning across diverse disciplines | Proposes a unified model for spatio-temporal learning across domains. | CVPR | Bytez |
| h-more: learning human-centric motion representation for action analysis | Learns motion representations centered on humans for action analysis. | CVPR | Bytez |
| closed-loop supervised fine-tuning of tokenized traffic models | Fine-tunes traffic prediction models in closed-loop supervision with tokenized inputs. | CVPR | Bytez |
| grae-3dmot: geometry relation-aware encoder for online 3d multi-object tracking | Encodes geometry relations to improve online 3D multi-object tracking. | CVPR | Bytez |
| echoworld: learning motion-aware world models for echocardiography probe guidance | Learns motion-aware models to guide echocardiography probes effectively. | CVPR | Bytez |
| videogem: training-free action grounding in videos | Grounds actions in video data without additional training. | CVPR | Bytez |
| reanimating images using neural representations of dynamic stimuli | Uses neural representations to reanimate static images dynamically. | CVPR | Bytez |
| uncertainty meets diversity: a comprehensive active learning framework for indoor 3d object detection | Combines uncertainty and diversity for active learning in 3D object detection. | CVPR | Bytez |
| hsi-gpt: a general-purpose large scene-motion-language model for human scene interaction | Proposes a large model combining scene, motion, and language for human-scene interactions. | CVPR | Bytez |
| simulator hc: regression-based online simulation of starting problem-solution pairs for homotopy continuation in geometric vision | Uses regression to simulate homotopy continuation in geometric vision online. | CVPR | Bytez |
| gem: a generalizable ego-vision multimodal world model for fine-grained ego-motion, object dynamics, and scene composition control | Introduces an ego-vision multimodal world model for fine-grained control. | CVPR | Bytez |
| gigahands: a massive annotated dataset of bimanual hand activities | Releases a large dataset of annotated bimanual hand activities. | CVPR | Bytez |
| posetraj: pose-aware trajectory control in video diffusion | Controls trajectory in video diffusion models using pose information. | CVPR | Bytez |
| logosp: local-global grouping of superpoints for unsupervised semantic segmentation of 3d point clouds | Proposes a method for semantic segmentation by grouping superpoints locally and globally. | CVPR | Bytez |
| pomp: physics-consistent motion generative model through phase manifolds | Generates motions consistent with physics using phase manifold modeling. | CVPR | Bytez |
| pose priors from language models | Extracts pose priors from large language models for robotics. | CVPR | Bytez |
| articulatedgs: self-supervised digital twin modeling of articulated objects using 3d gaussian splatting | Creates self-supervised digital twins of articulated objects with 3D Gaussian splatting. | CVPR | Bytez |
| dexgrasp anything: towards universal robotic dexterous grasping with physics awareness | Proposes a physics-aware approach for universal dexterous robotic grasping across diverse objects and scenarios. | CVPR | Bytez |

About

100 robotics papers from CVPR 2025 - TLDRs, virtual posters, and interactive papers on Bytez
