This is the official repository for Detect Anything 3D in the Wild (DetAny3D), a promptable 3D detection foundation model capable of detecting any novel object under arbitrary camera configurations using only monocular inputs.
- 📌 TODO
- 🚀 Getting Started
- 📦 Checkpoints
- 📁 Dataset Preparation
- 🏋️‍♂️ Training
- 🔍 Inference
- 🌐 Launch Online Demo
- 📚 Citation
- [x] Release full code
- [x] Provide training and inference scripts
- [ ] Release the model weights
- [ ] Provide full conversion scripts for constructing DA3D locally
- [ ] Simplify the inference process
- [ ] Provide a tutorial for creating customized datasets and fine-tuning
conda create -n detany3d python=3.8
conda activate detany3d
✅ (1) Install Segment Anything (SAM)
Follow the official instructions to install SAM and download its checkpoints.
✅ (2) Install UniDepth
Follow the UniDepth setup guide to compile and install all necessary packages.
✅ (3) Clone and configure GroundingDINO
git clone https://github.com/IDEA-Research/GroundingDINO.git
cd GroundingDINO
pip install -e .
👉 The exact dependency versions are listed in our `requirements.txt`.
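Putting the steps above together, a minimal install sequence might look like the sketch below. The UniDepth repository URL is an assumption; follow each project's official guide if anything differs on your system.

```bash
# create and activate the environment
conda create -n detany3d python=3.8
conda activate detany3d

# (1) Segment Anything (SAM): pip install from the SAM repository
pip install git+https://github.com/facebookresearch/segment-anything.git

# (2) UniDepth: repository URL assumed; follow its setup guide to build any extra ops
git clone https://github.com/lpiccinelli-eth/UniDepth.git
cd UniDepth && pip install -e . && cd ..

# (3) GroundingDINO
git clone https://github.com/IDEA-Research/GroundingDINO.git
cd GroundingDINO && pip install -e . && cd ..

# remaining pinned dependencies
pip install -r requirements.txt
```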
Please download third-party checkpoints from the following sources:
- SAM checkpoint: please download `sam_vit_h.pth` from the official SAM GitHub releases.
- UniDepth / DINO checkpoints: available via Google Drive.
detany3d_private/
├── checkpoints/
│ ├── sam_ckpts/
│ │ └── sam_vit_h.pth
│ ├── unidepth_ckpts/
│ │ └── unidepth.pth
│ ├── dino_ckpts/
│ │ └── dino_swin_large.pth
│ └── detany3d_ckpts/
│ └── detany3d.pth
GroundingDINO's checkpoint should be downloaded from its official repo and placed as instructed in their documentation.
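A quick way to lay out the expected checkpoint folders is sketched below. The SAM ViT-H URL is, to our knowledge, the one published in the SAM repository (verify against the official releases page); `unidepth.pth` and `dino_swin_large.pth` must be placed manually from the Google Drive link above.

```bash
mkdir -p checkpoints/{sam_ckpts,unidepth_ckpts,dino_ckpts,detany3d_ckpts}

# SAM ViT-H weights, renamed to the filename expected above
wget -O checkpoints/sam_ckpts/sam_vit_h.pth \
    https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth

# unidepth.pth and dino_swin_large.pth come from the Google Drive link, e.g.:
# mv ~/Downloads/unidepth.pth        checkpoints/unidepth_ckpts/
# mv ~/Downloads/dino_swin_large.pth checkpoints/dino_ckpts/
```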
📩 The pretrained DetAny3D model weights (`detany3d_ckpts`) are not publicly released at this time. If you are interested in using the model and collaborating, please contact us via email.
📬 Contact: [email protected]
The `data/` directory should follow the structure below:
data/
├── DA3D_pkls/ # DA3D processed pickle files
├── kitti/
│ ├── test_depth_front/
│ ├── ImageSets/
│ ├── training/
│ └── testing/
├── nuscenes/
│ ├── nuscenes_depth/
│ └── samples/
├── 3RScan/
│ └── <token folders>/ # e.g., 10b17940-3938-...
├── hypersim/
│ ├── depth_in_meter/
│ └── ai_XXX_YYY/ # e.g., ai_055_009
├── waymo/
│ └── kitti_format/ # KITTI-format data for Waymo
│ ├── validation_depth_front/
│ ├── ImageSets/
│ ├── training/
│ └── testing/
├── objectron/
│ ├── train/
│ └── test/
├── ARKitScenes/
│ ├── Training/
│ └── Validation/
├── cityscapes3d/
│ ├── depth/
│ └── leftImg8bit/
├── SUNRGBD/
│ ├── realsense/
│ ├── xtion/
│ ├── kv1/
│ └── kv2/
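If the raw datasets already live elsewhere on disk, symlinking them into `data/` is usually sufficient. The source paths below are hypothetical placeholders:

```bash
mkdir -p data
# hypothetical source locations; adjust to wherever your datasets actually live
ln -s /datasets/kitti        data/kitti
ln -s /datasets/nuscenes     data/nuscenes
ln -s /datasets/ARKitScenes  data/ARKitScenes
# ...and so on for the remaining datasets in the tree above
```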
The downloads for `kitti`, `nuscenes`, `hypersim`, `objectron`, `arkitscenes`, and `sunrgbd` follow the Omni3D convention. Please refer to the Omni3D repository for details on how to organize and preprocess these datasets.
🗂️ The `DA3D_pkls` (minimal metadata for inference) can be downloaded from Google Drive.
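Once downloaded, the pickles just need to sit under `data/DA3D_pkls/`. The archive name below is a hypothetical placeholder for whatever the Google Drive download is called:

```bash
# hypothetical archive name; substitute the actual file from Google Drive
unzip DA3D_pkls.zip -d data/
ls data/DA3D_pkls | head
```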
🧩 Note: This release currently supports a minimal inference-only setup. Conversion scripts for the full dataset and all depth-related files will be provided later.
⚠️ Depth files are not required for inference. You can safely set `depth_path = None` in `detany3d_dataset.py` to bypass depth loading.
torchrun \
--nproc_per_node=8 \
--master_addr=${MASTER_ADDR} \
--master_port=${MASTER_PORT} \
--nnodes=8 \
--node_rank=${RANK} \
./train.py \
--config_path \
./detect_anything/configs/train.yaml
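The command above assumes an 8-node, 8-GPU-per-node setup driven by environment variables. For a single-machine run, a sketch like the following (with placeholder address/port and an assumed 8 local GPUs) should be sufficient:

```bash
torchrun \
    --nproc_per_node=8 \
    --nnodes=1 \
    --node_rank=0 \
    --master_addr=localhost \
    --master_port=29500 \
    ./train.py \
    --config_path ./detect_anything/configs/train.yaml
```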
torchrun \
--nproc_per_node=8 \
--master_addr=${MASTER_ADDR} \
--master_port=${MASTER_PORT} \
--nnodes=1 \
--node_rank=${RANK} \
./train.py \
--config_path \
./detect_anything/configs/inference_indomain_gt_prompt.yaml
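For a quick single-GPU run of the same inference config, the launch can be reduced accordingly (address and port values are placeholders; multi-GPU remains preferable for speed):

```bash
torchrun \
    --nproc_per_node=1 \
    --nnodes=1 \
    --node_rank=0 \
    --master_addr=localhost \
    --master_port=29501 \
    ./train.py \
    --config_path ./detect_anything/configs/inference_indomain_gt_prompt.yaml
```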
After inference, a file named `{dataset}_output_results.json` will be generated in the `exps/<your_exp_dir>/` directory.
⚠️ Due to compatibility issues between `pytorch3d` and the current environment, we recommend copying the output JSON file into the evaluation scripts of repositories such as Omni3D or OVMono3D for standardized metric evaluation.
TODO: Evaluation for zero-shot datasets currently requires manual modification of the Omni3D or OVMono3D repositories and is not yet fully supported here.
We plan to release a merged evaluation script in this repository to make direct evaluation more convenient in the future.
python ./deploy.py
If you find this repository useful, please consider citing:
@article{zhang2025detect,
title={Detect Anything 3D in the Wild},
author={Zhang, Hanxue and Jiang, Haoran and Yao, Qingsong and Sun, Yanan and Zhang, Renrui and Zhao, Hao and Li, Hongyang and Zhu, Hongzi and Yang, Zetong},
journal={arXiv preprint arXiv:2504.07958},
year={2025}
}