With advances in the field of ego vehicles, there has been substantial research on perception for downstream motion planning. To that end, Bird's Eye View (BEV) representations help autonomous vehicles understand their surroundings and make decisions. A BEV map is a top-down view of a portion of the world, as if the observer were a bird. Information from multiple sensors is used to generate maps of the environment that describe where objects are positioned in space and how the vehicle is oriented relative to them. BEV maps have the distinctive characteristic of collapsing vertical information onto a flat surface.
In this project, we focus on generating BEV maps and detecting surrounding vehicles in adverse weather and lighting conditions such as rain, snow, fog and night.
Goals:
- Image pre-processing with a transformer model that restores degraded images.
- Early object detection and transformation of images to BEV.
- Early RADAR-LiDAR fusion for BEV map generation, late camera fusion for object detection.
- Weather conditions: night, rain, snow and fog.
A big thanks to Heriot-Watt University for creating the RADIATE (RAdar Dataset In Adverse weaThEr) dataset, which includes Radar, Lidar, Stereo Camera and GPS/IMU.
The data was collected in different weather scenarios (sunny, overcast, night, fog, rain and snow) to help the research community develop new methods of vehicle perception.
Sensors:
- Stereo Camera: An off-the-shelf ZED stereo camera is used. It is set at 672 × 376 image resolution at 15 frames per second for each camera. It is protected by a waterproof housing for extreme weather. The images can be seriously blurred, hazy or fully blocked due to rain drops, dense fog or heavy snow, respectively.
- LiDAR: A 32-channel, 10 Hz Velodyne HDL-32e LiDAR is used to give 360° coverage. Since the LiDAR signal can be severely attenuated and reflected by intervening fog or snow, the data can be missing, noisy and incorrect.
- Radar: RADIATE adopts the Navtech CTS350-X radar, a scanning radar that provides 360° high-resolution range-azimuth images. It has a 100-metre maximum operating range with 0.175 m range resolution, 1.8° azimuth resolution and 1.8° elevation resolution. Currently, it does not provide Doppler information.
- GPS/IMU: An Advanced Navigation GPS/IMU is provided.
We use the TransWeather[https://github.com/jeya-maria-jose/TransWeather] model to preprocess input images to the camera pipeline; the figures below show how the results of this model look in adverse weather conditions.
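Below is a minimal sketch of how this preprocessing step can be wired in. The module, class and checkpoint names follow the public TransWeather repository's layout and should be treated as assumptions rather than a verified API.

```python
# Hedged sketch: restore a weather-degraded camera frame with TransWeather
# before it enters the rest of the camera pipeline.
import torch
from PIL import Image
from torchvision import transforms

# assumption: model module/class from the TransWeather repo
from transweather_model import Transweather

device = "cuda" if torch.cuda.is_available() else "cpu"
model = Transweather().to(device)
state = torch.load("ckpt/best", map_location=device)  # assumption: pretrained checkpoint path
model.load_state_dict(state, strict=False)
model.eval()

to_tensor = transforms.ToTensor()

def restore_frame(path: str) -> Image.Image:
    """Run one degraded camera frame through the restoration network."""
    img = to_tensor(Image.open(path).convert("RGB")).unsqueeze(0).to(device)
    with torch.no_grad():
        clean = model(img).clamp(0, 1)
    return transforms.ToPILImage()(clean.squeeze(0).cpu())

# hypothetical RADIATE frame path for illustration
restored = restore_frame("data/city_3_7/zed_left/000001.png")
restored.save("restored_000001.png")
```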
Approach 1: Since we were provided with only one camera, it is difficult for us to create a camera-only BEV as outlined in the BEVFusion paper. Hence, we began by investigating mathematical models that would help us build a BEV map of what the ego vehicle perceives in front of it. Starting with Detectron2, we identified bounding boxes in the image's 2D coordinate plane. From these, we estimated 3D bounding boxes, picking the top edge to scale and overlay on the RADAR images.
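The sketch below is not the full 3D-box estimation described above; it illustrates the same idea with a simpler flat-ground approximation that places each 2D Detectron2 detection onto the radar grid. The camera intrinsics, mounting height and ego pixel position are placeholders, not calibrated RADIATE values.

```python
# Hedged sketch of Approach 1: 2D boxes from a COCO-pretrained Detectron2
# detector, lifted to the ground plane with a flat-ground pinhole model and
# scaled onto the radar Cartesian image.
import numpy as np
import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

FX, FY, CX, CY = 337.0, 337.0, 336.0, 188.0   # assumed intrinsics for the 672x376 ZED stream
CAM_HEIGHT_M = 1.5                            # assumed camera height above the ground
RADAR_RES_M = 0.175                           # radar range resolution (m / pixel), from the dataset spec

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5
predictor = DefaultPredictor(cfg)

def camera_boxes_to_bev(image_bgr, radar_bev, ego_px=(576, 576)):
    """Overlay camera detections onto the radar Cartesian image (ego position assumed at ego_px)."""
    boxes = predictor(image_bgr)["instances"].pred_boxes.tensor.cpu().numpy()
    out = radar_bev.copy()
    for x1, y1, x2, y2 in boxes:
        v_bottom = max(y2, CY + 1.0)                 # bottom edge ~ contact point with the road
        depth = FY * CAM_HEIGHT_M / (v_bottom - CY)  # flat-ground depth estimate (metres)
        lateral = ((x1 + x2) / 2.0 - CX) * depth / FX
        # metres -> radar pixels; forward assumed to point up in the radar image
        px = int(ego_px[0] + lateral / RADAR_RES_M)
        py = int(ego_px[1] - depth / RADAR_RES_M)
        cv2.circle(out, (px, py), 4, (0, 255, 255), -1)
    return out
```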
Below is the flow for a single image passing through the Camera to BEV model.
Approach 2: Object detection was carried out on each frame using the YOLOv5 model. Object tracking was also implemented to follow objects across frames based on their centroids: centroids were calculated for detected objects, and unique identifiers were assigned to track their movement.
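A minimal sketch of this per-frame detection and tracking step is shown below. YOLOv5 is loaded through torch.hub; the greedy nearest-centroid matcher is an illustrative simplification of the tracker described above, with an assumed matching threshold.

```python
# Hedged sketch: YOLOv5 detections per frame + a simple centroid tracker.
import torch
import numpy as np

model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

class CentroidTracker:
    def __init__(self, max_dist=50.0):
        self.next_id = 0
        self.tracks = {}          # track id -> last centroid (x, y)
        self.max_dist = max_dist  # assumed pixel threshold for matching

    def update(self, centroids):
        assigned = {}
        for c in centroids:
            # match to the closest existing track within max_dist, else start a new one
            best_id, best_d = None, self.max_dist
            for tid, prev in self.tracks.items():
                d = np.hypot(c[0] - prev[0], c[1] - prev[1])
                if d < best_d and tid not in assigned.values():
                    best_id, best_d = tid, d
            if best_id is None:
                best_id = self.next_id
                self.next_id += 1
            self.tracks[best_id] = c
            assigned[tuple(c)] = best_id
        return assigned

tracker = CentroidTracker()

def detect_and_track(frame_rgb):
    """Return {centroid: track_id} for the objects detected in one RGB frame."""
    det = model(frame_rgb).xyxy[0].cpu().numpy()   # columns: x1, y1, x2, y2, conf, class
    centroids = [((x1 + x2) / 2.0, (y1 + y2) / 2.0) for x1, y1, x2, y2, *_ in det]
    return tracker.update(centroids)
```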
To provide a holistic view of the spatial distribution of objects, we visualized the detected objects from a bird's-eye-view perspective. This involved applying a perspective transformation to warp the original frame into a bird's-eye view, as sketched below.
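The following sketch shows one way to implement this warp with OpenCV: four source points on the road surface are mapped to a rectangle and the frame is warped, and detected centroids can be mapped into the same BEV frame. The trapezoid corner coordinates are placeholders that would be tuned per camera setup, not calibrated values from the dataset.

```python
# Hedged sketch of the bird's-eye-view perspective transform used in Approach 2.
import cv2
import numpy as np

def to_birds_eye(frame, out_size=(400, 600)):
    """Warp a camera frame into an approximate top-down view."""
    h, w = frame.shape[:2]
    # assumed trapezoid on the road surface (image coords) -> rectangle in BEV
    src = np.float32([[w * 0.45, h * 0.60], [w * 0.55, h * 0.60],
                      [w * 0.95, h * 0.95], [w * 0.05, h * 0.95]])
    dst = np.float32([[0, 0], [out_size[0], 0],
                      [out_size[0], out_size[1]], [0, out_size[1]]])
    M = cv2.getPerspectiveTransform(src, dst)
    bev = cv2.warpPerspective(frame, M, out_size)
    return bev, M

def project_centroid(M, centroid):
    """Map a detected object's image centroid into the BEV frame."""
    pt = np.float32([[centroid]])                  # shape (1, 1, 2)
    return cv2.perspectiveTransform(pt, M)[0, 0]
```

The projected centroids are where the simulated yellow boxes mentioned below would be drawn (e.g. with cv2.rectangle) on the transformed frame.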
Simulated objects were overlaid onto the transformed frames to enhance the visual representation
of detected objects. For the sake of simplicity, we used the same yellow box across all the different
kinds of objects detected.
We use early fusion to combine the lidar and radar data. The radar data is available as images in both Cartesian and polar frames, while the lidar data is available as a point cloud. The first step is to flatten the point cloud onto a 2D plane and store the result as images. These images are then concatenated with the radar Cartesian images.
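A minimal sketch of this flattening-and-stacking step is given below, assuming the lidar cloud is expressed in the ego frame and the radar Cartesian image is square with the ego vehicle at its centre (both assumptions about the data layout).

```python
# Hedged sketch of the early-fusion step: flatten the LiDAR point cloud onto a
# 2D grid at the radar's resolution and stack it with the radar Cartesian
# image as an extra channel.
import numpy as np
import cv2

RADAR_RES_M = 0.175   # metres per pixel, matching the radar range resolution

def lidar_to_bev_image(points_xyz, grid_px):
    """Flatten an (N, 3) LiDAR cloud into a single-channel top-down image."""
    bev = np.zeros((grid_px, grid_px), dtype=np.uint8)
    cx = cy = grid_px // 2                      # ego vehicle assumed at the image centre
    px = (cx + points_xyz[:, 0] / RADAR_RES_M).astype(int)
    py = (cy - points_xyz[:, 1] / RADAR_RES_M).astype(int)
    keep = (px >= 0) & (px < grid_px) & (py >= 0) & (py < grid_px)
    bev[py[keep], px[keep]] = 255               # mark occupied cells
    return bev

def fuse(radar_cart_path, lidar_points):
    """Concatenate the radar Cartesian image and the flattened lidar image channel-wise."""
    radar = cv2.imread(radar_cart_path, cv2.IMREAD_GRAYSCALE)
    lidar = lidar_to_bev_image(lidar_points, grid_px=radar.shape[0])
    return np.dstack([radar, lidar])            # (H, W, 2) input for the detector
```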
Our current downstream task after fusion is vehicle detection, for which we use the fused data. We use the radar annotations provided by the RADIATE dataset as the ground-truth locations of vehicles in the images. We then split the data into train and test sets that include diverse bad-weather conditions such as rain, fog and snow across different domains such as junctions, motorways and even rural areas. We fine-tune a Faster R-CNN model, pretrained on the COCO dataset, on our train set.
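Below is a minimal sketch of such a fine-tuning loop using torchvision's detection API. The dataloader, the two-class head (background + vehicle) and the optimizer settings are assumptions about our setup, not exact training code.

```python
# Hedged sketch: fine-tune a COCO-pretrained Faster R-CNN on the fused images.
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

device = "cuda" if torch.cuda.is_available() else "cpu"

# weights="DEFAULT" on recent torchvision; older versions use pretrained=True
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=2)  # background + vehicle
model.to(device)

optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9, weight_decay=5e-4)

def train_one_epoch(loader):
    """loader yields (images, targets) with targets in torchvision detection format:
    [{"boxes": Tensor[N, 4], "labels": Tensor[N]}, ...], built from the radar annotations."""
    model.train()
    for images, targets in loader:
        images = [img.to(device) for img in images]
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
        loss_dict = model(images, targets)     # classification + box-regression losses
        loss = sum(loss_dict.values())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```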
We conduct three experiments for vehicle detection using lidar and radar data: first using only radar data, then only lidar data, and finally both lidar and radar data. We wanted to measure the impact of each sensor type on object detection and decide which works best; the radar-only model performs best for us. Here are some links to watch the model perform in:
- Good Weather : [https://youtu.be/BHOxSMDHqgQ?feature=shared]
- Bad Weather : [https://youtu.be/74tuGAbgg0Q?feature=shared]
All models can be found here: [https://drive.google.com/drive/folders/1pjZ8GnHhS9cDGjkwIJMU6Mr3aJyzbqHt?usp=sharing]
We would like to thank:
- Professor Ragunathan Rajkumar for guiding us throughout the course 18744
- Heriot-Watt University for creating RADIATE
- Marcel Sheeny[https://github.com/marcelsheeny] for creating the radiate sdk
- bharath5673[https://github.com/bharath5673] for his cool work on Cam2BEV