It seems the input should be a mesh or a point cloud. Does that mean we first need to convert the images to a mesh or point cloud?