I am a Ph.D. candidate in the Department of Artificial Intelligence at Korea University. My research focuses on computer vision and video understanding, particularly temporal action detection and object detection. I've worked on a range of video-related tasks, and more recently I've been exploring vision-language modeling to integrate visual and language representations for video applications.
Here are my recent publications.
- Ho-Joong Kim, Yearang Lee, Jung-Ho Hong, and Seong-Whan Lee, "DiGIT: Multi-Dilated Gated Encoder and Central-Adjacent Region Integrated Decoder for Temporal Action Detection Transformer," CVPR, 2025. [paper] [code]
- Jung-Ho Hong, Ho-Joong Kim, Kyu-Sung Jeon, and Seong-Whan Lee, "Comprehensive Information Bottleneck for Unveiling Universal Attribution to Interpret Vision Transformers," CVPR (Highlight), 2025. [paper] [code]
- Ho-Joong Kim, Jung-Ho Hong, Heejo Kong, and Seong-Whan Lee, "TE-TAD: Towards Full End-to-End Temporal Action Detection via Time-Aligned Coordinate Expression," CVPR, 2024. [paper] [code]
- Yearang Lee, Ho-Joong Kim, and Seong-Whan Lee, "Text-Infused Attention and Foreground-Aware Modeling for Zero-Shot Temporal Action Detection," NeurIPS, 2024. [paper] [code]
- Heejo Kong, Suneung Kim, Ho-Joong Kim, and Seong-Whan Lee, "Unknown-Aware Graph Regularization for Robust Semi-supervised Learning from Uncurated Data," AAAI, 2024. [paper] [code]
You can find more of my publications on my Google Scholar profile.


