Hi, thanks for the great work on MapAnything!
I'm trying to reproduce the camera pose estimation results on the ScanNet test set and noticed some unexpected behavior. I'm using the Apache model (facebook/map-anything-apache) with the following setup:
    model = MapAnything.from_pretrained("facebook/map-anything-apache").to(device)
    predictions = model.infer(views, memory_efficient_inference=True, use_amp=True, amp_dtype="bf16")
During visualization, the predicted depth maps appear fragmentary/incomplete in some regions. The ATE on ScanNet test scenes is also higher than expected compared to other methods. Are these expected?
I also noticed in Table 4 of the paper that the ScanNet results appear lower than the baseline (shown in grey).
I'd appreciate any guidance on reproducing the reported results. Happy to share more details about my evaluation setup if helpful.
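For concreteness, ATE is commonly computed as the RMSE of camera positions after aligning the predicted trajectory to ground truth with a similarity transform (Umeyama alignment). The sketch below assumes that definition. The helper names `umeyama_alignment` and `ate_rmse` are made up for illustration, and this is not necessarily the protocol used in the paper or in the MapAnything benchmark code.

```python
import numpy as np


def umeyama_alignment(src, dst):
    """Least-squares similarity transform (s, R, t) such that dst ~ s * R @ src + t.

    src, dst: (N, 3) arrays of corresponding camera positions.
    """
    mu_src = src.mean(axis=0)
    mu_dst = dst.mean(axis=0)
    src_c = src - mu_src
    dst_c = dst - mu_dst

    # Cross-covariance between the centered point sets.
    cov = dst_c.T @ src_c / src.shape[0]
    U, D, Vt = np.linalg.svd(cov)

    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0

    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / src.shape[0]
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_dst - s * R @ mu_src
    return s, R, t


def ate_rmse(pred_positions, gt_positions):
    """RMSE of position error after Sim(3) alignment of prediction to ground truth."""
    s, R, t = umeyama_alignment(pred_positions, gt_positions)
    aligned = s * (R @ pred_positions.T).T + t
    return float(np.sqrt(((aligned - gt_positions) ** 2).sum(axis=1).mean()))
```

Here `pred_positions` and `gt_positions` would be the translation parts of the predicted and ground-truth camera-to-world poses for the same frames. Whether the alignment includes scale (Sim(3) vs. SE(3)) can change the reported number noticeably, so it is worth checking which variant a given baseline uses.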
Thanks!