Kangjie Zheng's Homepage

About Me

I am a Postdoctoral Researcher at the Wellcome Sanger Institute in Cambridge, UK, working with Dr. Mo Lotfollahi on scalable, generalizable foundation models for large-scale biological data. My work spans sequence modeling (proteins, genomes) and structural modeling (3D molecular structures), with the goal of advancing integrative understanding across biological modalities. During my Ph.D., I studied foundation models for molecular modeling. This research formed the core of my doctoral thesis, "Research on Molecular Modeling Based on Pre-trained Models", which won the ACM Beijing Doctoral Dissertation Award.

Academic Background

Postdoctoral Fellow, Wellcome Sanger Institute (Sep 2025 – Present)
Mentor: Dr. Mo Lotfollahi
Ph.D. in Computer Science, Peking University (Aug 2020 – Jun 2025)
Supervisor: Prof. Ming Zhang
B.Eng. in Computer Science, Harbin Institute of Technology (Aug 2016 – Jun 2020)
College: The Honors School of HIT (Top 10 graduates)

Industry Experience

Research Intern, Institute for AI Industry Research (AIR), Tsinghua University (Aug 2022 – Nov 2024)
Mentors: Prof. Wei-Ying Ma, Prof. Hao Zhou
Research Intern, Tencent AI Lab (Aug 2021 – Aug 2022)
Mentors: Dr. Longyue Wang, Dr. Zhaopeng Tu
Research Intern, Baidu Research (Aug 2019 – May 2020)
Mentor: Dr. Mingming Sun

Research Highlights

A multi-modal, multi-scale suite of foundation models targeting diverse large-scale datasets and practical applications, organized around four key aspects:

Selected Publications

* equal contribution · See a full list of papers on Google Scholar.

AI for Science

ESM All-Atom: Multi-scale Protein Language Model for Unified Molecular Modeling. ICML 2024.
Kangjie Zheng*, Siyu Long*, Tianyu Lu, Junwei Yang, Xinyu Dai, Ming Zhang, Zaiqing Nie, Wei-Ying Ma, Hao Zhou.

PDF · OpenReview · Code

SMI-Editor: Edit-based SMILES Language Model with Fragment-level Supervision. ICLR 2025.
Kangjie Zheng, Siyue Liang, Junwei Yang, Bin Feng, Zequn Liu, Wei Ju, Zhiping Xiao, Ming Zhang.

PDF · OpenReview · Code

Mol-AE: Auto-Encoder Based Molecular Representation Learning With 3D Cloze Test Objective. ICML 2024.
Junwei Yang*, Kangjie Zheng*, Siyu Long, Zaiqing Nie, Ming Zhang, Xinyu Dai, Wei-Ying Ma, Hao Zhou.

PDF · OpenReview · Code

Language Modeling

ExLM: Rethinking the Impact of [MASK] Tokens in Masked Language Models. ICML 2025.
Kangjie Zheng, Junwei Yang, Siyue Liang, Bin Feng, Zequn Liu, Wei Ju, Zhiping Xiao, Ming Zhang.

PDF · OpenReview · Code

Towards A Unified Training for Levenshtein Transformer. ICASSP 2023.
Kangjie Zheng, Longyue Wang, Zhihao Wang, Binqi Chen, Ming Zhang, Zhaopeng Tu.

Publisher · Code

A Decoding Algorithm Based on Directed Acyclic Transformers for Length-Control Summarization. EMNLP Findings 2024.
Chenyang Huang, Hao Zhou, Cameron Jen, Kangjie Zheng, Osmar Zaiane, Lili Mou.

PDF · Code

Gloss Matters: Unlocking the Potential of Non-Autoregressive Sign Language Translation. ACM MM 2025.
Zhihao Wang, Shiyu Liu, Zhiwei He, Kangjie Zheng, Liangying Shao, Junfeng Yao, Jinsong Su.

GNN & Data Mining

Learning Generalizable Contrastive Representations for Graph Zero-shot Learning. IEEE Trans. Multimedia (2025).
Siyu Yi, Zhengyang Mao, Kangjie Zheng, Zhiping Xiao, Ziyue Qiao, Chong Chen, Xian-Sheng Hua, Yongdao Zhou, Ming Zhang, Wei Ju.

PDF · Publisher

Zero-shot Node Classification with Graph Contrastive Embedding Network. Trans. on Machine Learning Research (2023).
Wei Ju, Yifang Qin, Siyu Yi, Zhengyang Mao, Kangjie Zheng, Luchen Liu, Xiao Luo, Ming Zhang.

PDF · OpenReview

Constrained Truth Discovery. IEEE Trans. on Knowledge and Data Engineering (2020).
Chen Ye, Hongzhi Wang, Kangjie Zheng, Youkang Kong, Rong Zhu, Jing Gao, Jianzhong Li.

Publisher

Multi-Source Data Repairing Powered by Integrity Constraints and Source Reliability. Information Sciences (2020).
Chen Ye, Hongzhi Wang, Kangjie Zheng, Jing Gao, Jianzhong Li.

Publisher