
About the PV branch and BEV module #27

Open
Zhenghao97 opened this issue Sep 20, 2023 · 3 comments

Comments

@Zhenghao97

No description provided.

@Zhenghao97 Zhenghao97 changed the title About the PB About the PV branch and BEV module Sep 20, 2023
@Zhenghao97
Author

Hi Sarlin,

Recently, many BEV methods for perception tasks have appeared at various conferences, such as BEVFormer. These methods generally use 6 or more PV images from different views to enlarge the FoV of the BEV representation. Could OrienterNet also take in more PV images to lift model performance? However, I am not sure how compatible the BEV part of methods like BEVFormer is with the registration part of OrienterNet.

Alternatively, I could keep OrienterNet's original method, but how can I leverage the BEV features from the other PV images? Simply stacking these BEV features does not sound like a good idea.

I would appreciate hearing some of your ideas, thanks!

@sarlinpe
Collaborator

You could infer a BEV for each image independently and stitch them into a single local map based on the relative poses between the cameras, averaging the features in overlapping regions. This should give results similar to fusing the likelihoods (as in our sequential localization experiments) but be computationally lighter. I'm happy to integrate this into the repo if you have a working example.
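For reference, here is a minimal NumPy sketch of this stitch-and-average scheme. It assumes each per-camera BEV is a (C, H, W) feature grid at a known metric resolution, and that the relative poses are given as 2D rigid transforms (yaw plus translation) in a shared ground-plane frame; the names here (`Pose2D`, `stitch_bevs`) are hypothetical and not part of the OrienterNet codebase.

```python
import numpy as np
from dataclasses import dataclass


@dataclass
class Pose2D:
    """2D rigid transform taking points from a camera's BEV frame to the reference frame."""
    yaw: float     # rotation about the vertical axis, in radians
    t: np.ndarray  # (2,) translation in meters


def stitch_bevs(bevs, poses, resolution, canvas_size):
    """Warp each per-camera BEV into a shared canvas, averaging overlaps.

    bevs:        list of (C, H, W) feature grids, one per camera
    poses:       list of Pose2D, camera BEV frame -> reference frame
    resolution:  meters per grid cell (same for inputs and canvas)
    canvas_size: (Hc, Wc) of the output grid
    """
    C = bevs[0].shape[0]
    Hc, Wc = canvas_size
    acc = np.zeros((C, Hc, Wc), dtype=np.float32)  # running feature sum
    cnt = np.zeros((Hc, Wc), dtype=np.float32)     # number of cameras hitting each cell

    for bev, pose in zip(bevs, poses):
        _, H, W = bev.shape
        # Metric coordinates of each cell center, origin at the grid center.
        ys, xs = np.mgrid[0:H, 0:W]
        pts = np.stack([xs - W / 2 + 0.5, ys - H / 2 + 0.5], axis=-1) * resolution
        # Apply the 2D rigid transform into the shared reference frame.
        c, s = np.cos(pose.yaw), np.sin(pose.yaw)
        R = np.array([[c, -s], [s, c]])
        pts = pts @ R.T + pose.t
        # Nearest-neighbor splat into the canvas (bilinear would be smoother).
        u = np.round(pts[..., 0] / resolution + Wc / 2 - 0.5).astype(int)
        v = np.round(pts[..., 1] / resolution + Hc / 2 - 0.5).astype(int)
        valid = (u >= 0) & (u < Wc) & (v >= 0) & (v < Hc)
        for ch in range(C):
            np.add.at(acc[ch], (v[valid], u[valid]), bev[ch][valid])
        np.add.at(cnt, (v[valid], u[valid]), 1.0)

    # Average the features wherever several cameras overlap.
    return acc / np.maximum(cnt, 1.0)
```

Since each BEV is still inferred independently and the warp itself is cheap, the per-camera inference cost is unchanged, which is why this is lighter than fusing full pose likelihoods.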

This would not leverage multi-view information, though. In our latest paper, SNAP, we have a solution to transparently fuse information from one or multiple images into a BEV. The code is not yet public but it shouldn't be hard to implement.

What kinds of datasets provide calibrated and time-synchronized multi-camera images with ground-truth geolocation?

@Zhenghao97
Author

Thanks for your ideas! I will try to build on them in OrienterNet.

As far as I know, the nuScenes dataset satisfies all of the above conditions, and the recent surround-view BEV methods run their experiments on it.
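For example, a quick sketch of pulling one calibrated, time-synchronized 6-camera sample (with its ground-truth ego pose) using the official nuscenes-devkit; the dataroot path is a placeholder.

```python
# pip install nuscenes-devkit
from nuscenes.nuscenes import NuScenes

nusc = NuScenes(version='v1.0-mini', dataroot='/data/nuscenes', verbose=False)
sample = nusc.sample[0]  # one keyframe: all 6 cameras captured together

cameras = ['CAM_FRONT', 'CAM_FRONT_LEFT', 'CAM_FRONT_RIGHT',
           'CAM_BACK', 'CAM_BACK_LEFT', 'CAM_BACK_RIGHT']
for cam in cameras:
    sd = nusc.get('sample_data', sample['data'][cam])
    # Per-camera calibration (camera -> ego vehicle frame, plus intrinsics).
    calib = nusc.get('calibrated_sensor', sd['calibrated_sensor_token'])
    # Ground-truth ego pose in the global map frame at capture time.
    ego = nusc.get('ego_pose', sd['ego_pose_token'])
    print(cam, nusc.get_sample_data_path(sd['token']),
          calib['translation'], ego['translation'])
```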
