Query-Dependent Video Representation for Moment Retrieval and Highlight Detection (CVPR23)
WonJun Moon*1 SangEek Hyun*1 SangUk Park2 Dongchan Park2 Jae-Pil Heo1 Sungkyunkwan University1 Pyler2
Hello. I’m WonJun Moon, a postdoctoral researcher at KAIST, South Korea. I received my Ph.D. from Sungkyunkwan University under the supervision of Prof. Jae-Pil Heo. Currently, I am a member of the Computer Vision Lab advised by Prof. Seungryong Kim.
My research goal is to develop scalable multimodal video understanding systems deployable in real-world environments. My research focuses on video/image representation learning under multimodal ambiguity, temporal complexity, and limited supervision, with applications spanning retrieval, grounding, and segmentation. Most recently, I have focused on uncovering and enhancing the visual reasoning processes of Multimodal Large Language Models.
(* : equal contribution)
Video Object-Centric Learning (Compact visual representation / Efficiency)
Text-Video Retrieval & Grounding
Semantic Segmentation
Vision-Language Models & Multimodal Robustness (Few-Shot & OOD & Long-tailed Recognition)
| Venue | Authors | Title and links |
|---|---|---|
| ECCV 2026 | WonJun Moon, Jae-Pil Heo | Selective Synergistic Learning for Video Object-Centric Learning [Arxiv] [Code] [Project] |
| CVPR 2026 | WonJun Moon, Hyun Seok Seong, Jae-Pil Heo | Reconstruction-Guided Slot Curriculum: Addressing Object Over-Fragmentation in Video Object-Centric Learning [Arxiv] [Code] |
| CVPR 2026 | Yerim Jeon, Miso Lee, WonJun Moon, Jae-Pil Heo | Masking Matters: Unlocking the Spatial Reasoning Capabilities of LLMs for 3D Scene-Language Understanding [Arxiv] [Code] |
| CVPR 2026 | ByeongCheol Lee, Hyun Seok Seong, Sangeek Hyun, Gilhan Park, WonJun Moon, Jae-Pil Heo | Looking Beyond the Window: Global-Local Aligned CLIP for Training-free Open-Vocabulary Semantic Segmentation [Arxiv] [Code] |
| ICLR 2026 | Hyun Seok Seong*, WonJun Moon*, Jae-Pil Heo | From Vicious to Virtuous Cycles: Synergistic Representation Learning for Unsupervised Video Object-Centric Learning [Arxiv] [Paper] [Code] |
| NeurIPS 2025 | WonJun Moon*, MinSeok Jung*, Gilhan Park, Tae-Young Kim, Cheol-Ho Cho, Woojin Jun, Jae-Pil Heo | Mitigating Semantic Collapse in Partially Relevant Video Retrieval [Arxiv] [Paper] [Code] |
| ICCV 2025 | WonJun Moon*, Hyun Seok Seong*, Jae-Pil Heo | Selective Contrastive Learning for Weakly Supervised Affordance Grounding [Arxiv] [Code] |
| ICCV 2025 | WonJun Moon, Cheol-Ho Cho, Woojin Jun, Minho Shim, Taeoh Kim, Inwoong Lee, Dongyoon Wee, Jae-Pil Heo | Prototypes are Balanced Units for Efficient and Effective Partially Relevant Video Retrieval [Arxiv] |
| CVPR 2025 (oral) | SuBeen Lee, WonJun Moon, Hyun Seok Seong, Jae-Pil Heo | Temporal Alignment-Free Video Matching for Few-shot Action Recognition [Arxiv] [Paper] [Code] |
| AAAI 2025 | Cheol-Ho Cho, WonJun Moon, Woojin Jun, MinSeok Jung, Jae-Pil Heo | Ambiguity-Restrained Text-Video Representation Learning for Partially Relevant Video Retrieval [Arxiv] |
| AAAI 2025 | Woojin Jun, WonJun Moon, Cheol-Ho Cho, MinSeok Jung, Jae-Pil Heo | Bridging the Semantic Granularity Gap Between Text and Frame Representations for Partially Relevant Video Retrieval [Paper] |
| TPAMI 2024 | SuBeen Lee, WonJun Moon, Hyun Seok Seong, Jae-Pil Heo | Task-oriented channel attention for fine-grained few-shot classification [Arxiv] [Paper] |
| ECCV 2024 | Hyun Seok Seong, WonJun Moon, SuBeen Lee, Jae-Pil Heo | Progressive Proxy Anchor Propagation for Unsupervised Semantic Segmentation [Arxiv] [Code] |
| ECCV 2024 | Gilhan Park, WonJun Moon, SuBeen Lee, Tae-Young Kim, Jae-Pil Heo | Mitigating Background Shift in Class-Incremental Semantic Segmentation [Arxiv] [Code] |
| Pattern Recognition 2025 | WonJun Moon, Sangeek Hyun, SuBeen Lee, Jae-Pil Heo | Correlation-Guided Query-Dependency Calibration for Video Temporal Grounding [Arxiv] [Code] |
| AAAI 2024 | Seunggu Kang, WonJun Moon, Euiyeon Kim, Jae-Pil Heo | VLCounter: Text-aware Visual Representation for Zero-Shot Object Counting [Arxiv] [Paper] [Code] |
| CVPR 2023 | WonJun Moon*, Sangeek Hyun*, Sanguk Park, Dongchan Park, Jae-Pil Heo | Query-Dependent Video Representation for Moment Retrieval and Highlight Detection [Arxiv] [Paper] [Code] [Video] |
| CVPR 2023 | Hyun Seok Seong, WonJun Moon, SuBeen Lee, Jae-Pil Heo | Leveraging Hidden Positives for Unsupervised Semantic Segmentation [Arxiv] [Paper] [Code] |
| AAAI 2023 (oral) | WonJun Moon, Hyun Seok Seong, Jae-Pil Heo | Minority-Oriented Vicinity Expansion with Attentive Aggregation for Video Long-Tailed Recognition [Arxiv] [Paper] [Code] [Video] |
| ECCV 2022 | WonJun Moon, Ji-Hwan Kim, Jae-Pil Heo | Tailoring Self-Supervision for Supervised Learning [Arxiv] [Paper] [Code] [Video] |
| ECCV 2022 | WonJun Moon, Junho Park, Hyun Seok Seong, Cheol-Ho Cho, Jae-Pil Heo | Difficulty-Aware Simulator for Open Set Recognition [Arxiv] [Paper] [Code] [Video] |
| CVPR 2022 (oral) | SuBeen Lee, WonJun Moon, Jae-Pil Heo | Task Discrepancy Maximization for Fine-grained Few-Shot Classification [Arxiv] [Paper] [Code] |
| Venue | Title | Authors |
|---|---|---|
| KIISE 2022 | Learning from Data Imbalance with Class Grouping Loss | WonJun Moon, Jae-Pil Heo |
| KIISE 2020 | Mix-Contrastive Match | WonJun Moon, Jae-Pil Heo |
Ph.D., Dept. of Artificial Intelligence, Sungkyunkwan University
M.S., Dept. of Artificial Intelligence, Sungkyunkwan University
GPA: 4.38 / 4.5
B.S., Dept. of Software, Sungkyunkwan University
GPA: 4.42 / 4.5
Languages: Korean, English
WonJun Moon Hyun Seok Seong Jae-Pil Heo Sungkyunkwan University
Abstract Recently, it is shown that deploying a proper self-supervision is a prospective way to enhance the performance of supervised learning. Yet, the ben...
Abstract Open set recognition (OSR) assumes unknown instances appear out of the blue at the inference time. The main challenge of OSR is that the response o...