About Me
I am currently a senior researcher at Huawei Noah's Ark Lab in Hong Kong, working on LLMs, MLLMs, and generative models. I obtained my Ph.D. from the National University of Singapore in 2018, receiving the National Semiconductor Gold Medal. Prior to that, I received my bachelor's degree from Shanghai Jiao Tong University in 2014.
My group focuses on building generalizable AI systems from a data-centric perspective. Our mission is to understand the power and limitations of existing models, explore their corner cases, and propose efficient next-generation models and algorithms. Representative projects include:
- Omni-modal Large Language Model: EMOVA
- Alignment of Large Language Models: Mistake Analysis, CoSafe, ECSO
- Corner Case Understanding and Video Generation for Self-Driving: MagicDrive, GeoDiffusion, CODA
- Generalization of Deep Learning Models: OOD-Bench, Continual Self-Supervised Learning, MixedAE
News
- 2025.01: One paper accepted by NAACL 2025!
- 2025.01: Two papers accepted by ICLR 2025!
- 2024.10: Two papers accepted by WACV 2025!
- 2024.09: We have released EMOVA, the very first end-to-end omni-modal model with SoTA vision-language and speech capabilities, further supporting emotional dialogue. Stay tuned for more details!
- 2024.09: We hosted the ECCV Workshop "Multimodal Perception and Comprehension of Corner Cases in Autonomous Driving: Towards Next-Generation Solutions" (W-CODA) in Milan, Italy!
- 2024.09: Two papers accepted by NeurIPS 2024!
- 2024.09: One paper accepted by EMNLP 2024!
- 2024.08: One paper accepted by COLM 2024!
- 2024.07: Two papers accepted by ECCV 2024!
- 2024.06: The First Autonomous Driving Corner Case Understanding and Video Generation Challenge is now open with generous prizes! We welcome your participation! [see details]
- 2024.06: MagicDrive, as a core video generation feature of PanGu Large Model 5.0, was unveiled at the Huawei Developer Conference 2024 (HDC 2024)! [see details]
- 2024.02: Two papers accepted by CVPR 2024!
- 2024.01: Three papers accepted by ICLR 2024! See you in Vienna!
Selected Projects
Omni-modal Large Language Model (2024)
- Proposed EMOVA, an end-to-end omni-modal LLM that can see, hear, and speak. We use a continuous vision encoder and a semantic-acoustic disentangled speech tokenizer for seamless omni-modal alignment and diverse speech style controllability.
- Introduced an efficient text-centric omni-modal alignment approach that further improves vision-language and speech capabilities, even compared with the corresponding bi-modal aligned counterparts (i.e., image-text-only and speech-text-only alignment).
- For the first time, EMOVA achieves SoTA-comparable performance on both vision-language and speech benchmarks simultaneously, while supporting flexible spoken dialogue with vivid emotions; featured by Synced.
Alignment of Large Language Models (2023-2024)
- Proposed the LLM and MLLM self-alignment frameworks Mistake Analysis (ICLR) and ECSO (ECCV), enhancing LLMs' safety pass rates by over 20% while maintaining general performance.
- Established CoSafe (EMNLP), a benchmark for evaluating LLM safety in multi-turn dialogues, systematically assessing LLMs' safety performance across multiple dialogue rounds.
- Featured by QbitAI; supported the PanGu Large Model's compliance with the National Cyberspace Administration's AIGC Large Model Regulatory Filing.
Corner Case Understanding and Video Generation for Self-Driving (2021-2023)
- Established a controllable video generation framework for corner cases in autonomous driving by integrating physical laws, featuring works such as GeoDiffusion (ICLR), MagicDrive (ICLR), and DetDiffusion (CVPR), addressing the challenges of cross-view and cross-frame spatiotemporal consistency in video generation.
- Developed CODA (ECCV) and CODA-LM, autonomous driving corner case datasets covering over 5,000 rare scenes, on which the perception and understanding performance of existing models (including GPT-4V) drops significantly, effectively evaluating and pinpointing model weaknesses in autonomous driving corner cases.
- Featured by QbitAI and other public channels; implemented in Huawei vehicles and highlighted as a core feature of the PanGu Large Model 5.0 at the Huawei HDC 2024 [see details].
Generalization of Deep Learning Models (2019-2021)
- Proposed a multi-dimensional out-of-distribution (OOD) generalization benchmark, addressing OOD generalization challenges across three dimensions: training data, paradigms, and model architectures. Published works include OOD-Bench (CVPR Oral), NAS-OOD (ICCV), and DecAug (AAAI), among others.
- Explored improving model generalization through self-supervised learning (SSL), with methods such as MultiSiam (ICCV) and MixedAE (CVPR) for complex multi-instance scenarios, MoCE (ICLR Spotlight) for task-customized SSL, Continual SSL (ICLR), and SADE (NeurIPS) for SSL multi-expert integration.
- Widely cited by prominent researchers, including Kaiming He and Percy Liang; featured by Synced (OOD-Bench, SADE) and AI Era (DecAug). Algorithms applied in Huawei Music to decouple irrelevant user features, effectively reducing the "Matthew Effect" in recommendation systems.
Recent Publications
The full publication list can be found on Google Scholar.
Preprint:
• EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
Kai Chen*, Yunhao Gou*, Runhui Huang*, Zhili Liu*, Daxin Tan*, Jing Xu, Chunwei Wang, Yi Zhu, Yihan Zeng, Kuo Yang, Dingdong Wang, Kun Xiang, Haoyuan Li, Haoli Bai, Jianhua Han, Xiaohui Li, Weike Jin, Nian Xie, Yu Zhang, James T. Kwok, Hengshuang Zhao, Xiaodan Liang, Dit-Yan Yeung, Xiao Chen, Zhenguo Li, Wei Zhang, Qun Liu, Lanqing Hong†, Lu Hou†, Hang Xu†
• MagicDrive3D: Controllable 3D Generation for Any-View Rendering in Street Scenes
Ruiyuan Gao, Kai Chen, Zhihao Li, Lanqing Hong†, Zhenguo Li, Qiang Xu†
• Automated Evaluation of Large Vision-Language Models on Self-Driving Corner Cases
Kai Chen*, Yanze Li*, Wenhua Zhang*, Yanxin Liu, Pengxiang Li, Ruiyuan Gao, Lanqing Hong†, Meng Tian, Xinhai Zhao, Zhenguo Li, Dit-Yan Yeung, Huchuan Lu, Xu Jia†
• Mixture of Cluster-conditional LoRA Experts for Vision-language Instruction Tuning
Yunhao Gou, Zhili Liu, Kai Chen, Lanqing Hong, Hang Xu, Aoxue Li, Dit-Yan Yeung, James T. Kwok, Yu Zhang
2024
• CoSafe: Evaluating Large Language Model Safety in Multi-Turn Dialogue Coreference
Erxin Yu, Jing Li, Ming Liao, Siqi Wang, Zuchen Gao, Fei Mi, Lanqing Hong
Empirical Methods in Natural Language Processing (EMNLP), 2024
• CoCA: Regaining Safety-awareness of Multimodal Large Language Models with Constitutional Calibration
Jiahui Gao, Renjie Pi, Tianyang Han, Han Wu, Lanqing Hong, Lingpeng Kong, Xin Jiang, Zhenguo Li
Conference on Language Modeling (COLM), 2024
• Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation
Yunhao Gou, Kai Chen, Zhili Liu, Lanqing Hong, Hang Xu, Zhenguo Li, Dit-Yan Yeung, James T. Kwok, Yu Zhang
European Conference on Computer Vision (ECCV), 2024
• Implicit Concept Removal of Diffusion Models
Zhili Liu, Kai Chen, Yifan Zhang, Jianhua Han, Lanqing Hong, Hang Xu, Zhenguo Li, Dit-Yan Yeung, James T. Kwok
European Conference on Computer Vision (ECCV), 2024
• CVT-xRF: Contrastive In-Voxel Transformer for 3D Consistent Radiance Fields from Sparse Inputs
Yingji Zhong, Lanqing Hong, Zhenguo Li, Dan Xu
Conference on Computer Vision and Pattern Recognition (CVPR), 2024
• DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception
Yibo Wang*, Ruiyuan Gao*, Kai Chen*, Kaiqiang Zhou, Yingjie Cai, Lanqing Hong, Zhenguo Li, Lihui Jiang, Dit-Yan Yeung, Qiang Xu, Kai Zhang
Conference on Computer Vision and Pattern Recognition (CVPR), 2024
• Gaining Wisdom from Setbacks: Aligning Large Language Models via Mistake Analysis
Kai Chen*, Chunwei Wang*, Kuo Yang, Jianhua Han, Lanqing Hong†, Fei Mi†, Hang Xu, Zhengying Liu, Wenyong Huang, Zhenguo Li, Dit-Yan Yeung, Lifeng Shang, Xin Jiang, Qun Liu
International Conference on Learning Representations (ICLR), 2024
• MagicDrive: Street View Generation with Diverse 3D Geometry Control
Ruiyuan Gao*, Kai Chen*, Enze Xie, Lanqing Hong, Zhenguo Li, Dit-Yan Yeung, Qiang Xu
International Conference on Learning Representations (ICLR), 2024
• GeoDiffusion: Text-Prompted Geometric Control for Object Detection Data Generation
Kai Chen*, Enze Xie*, Zhe Chen, Yibo Wang, Lanqing Hong†, Zhenguo Li, Dit-Yan Yeung
International Conference on Learning Representations (ICLR), 2024
Professional Services
- Area Chair of IJCAI 2025
- Industrial Chair of 3DV 2025
- Senior Program Committee Member of IJCAI 2023 and 2024
- Organizer of ECCV Workshop W-CODA
- Reviewer of TPAMI, ICLR, NeurIPS, CVPR, ECCV, ICCV, etc.
Internship Opportunities
We are now recruiting self-motivated interns and full-time researchers. If you are interested, please send your CV directly to my email.
Current Interns
- Kai CHEN (Hong Kong University of Science and Technology)
- Ruiyuan GAO (The Chinese University of Hong Kong)
- Yunhao GOU (Hong Kong University of Science and Technology)
- Runhui HUANG (The University of Hong Kong)
- Kaican LI (Hong Kong University of Science and Technology)
- Zhili LIU (Hong Kong University of Science and Technology)
- Yingji ZHONG (Hong Kong University of Science and Technology)
Former Interns
- Haoyue BAI (University of Wisconsin-Madison)
- Shoukang HU (Sony AI)
- Haonan WANG (National University of Singapore)
- Shipeng YAN (ByteDance)
- Longhui YU (Peking University)
- Xinyun ZHANG (The Chinese University of Hong Kong)
- Kaichen ZHOU (University of Oxford)