I'm Tianheng Cheng, a researcher at the ByteDance Seed Team, working on cutting-edge large multimodal models and world models. I completed my Ph.D. at the HUST Vision Lab of Huazhong University of Science and Technology.
My lifelong research goal is to enable machines and robots to comprehend world knowledge and interact with their environments as human beings do.
My publications are listed on Google Scholar 📚.
Currently, my research focuses on large multimodal models, foundational vision-language modeling, and image generation. Previously, I mainly worked on fundamental vision tasks such as object detection and instance segmentation, as well as visual perception for autonomous driving.