Kangjie Zheng (pronounced /kahng-dʒyeh dʒeng/)

Postdoctoral Fellow, Wellcome Sanger Institute

AI for Biology · Genomics · Biomolecules

I develop scalable, data-driven AI models that learn from large-scale biological data to decode the language of life and uncover novel fundamental principles of biological systems.

About Me

I am a Postdoctoral Fellow at the Wellcome Sanger Institute in Cambridge, UK, working with Dr. Mo Lotfollahi on scalable, generalizable foundation models for large-scale biological data. My work spans sequence modeling (proteins, genomes) and structural modeling (3D molecular structures), with the goal of advancing an integrative understanding across biological modalities.

Academic Background

  • Postdoctoral Fellow, Wellcome Sanger Institute (Sep 2025 – Present)
  • Ph.D. in Computer Science, Peking University (Aug 2020 – Jun 2025)
    Supervisor: Prof. Ming Zhang
  • B.Eng. in Computer Science, Harbin Institute of Technology (Aug 2016 – Jun 2020)
    College: The Honors School of HIT (Top 10 graduates)

Industry Experience

Research Highlights

Selected Publications

* equal contribution · See a full list of papers on Google Scholar.

AI for Science

  1. ESM All-Atom: Multi-scale Protein Language Model for Unified Molecular Modeling. ICML 2024.
    Kangjie Zheng*, Siyu Long*, Tianyu Lu, Junwei Yang, Xinyu Dai, Ming Zhang, Zaiqing Nie, Wei-Ying Ma, Hao Zhou.
  2. SMI-Editor: Edit-based SMILES Language Model with Fragment-level Supervision. ICLR 2025.
    Kangjie Zheng, Siyue Liang, Junwei Yang, Bin Feng, Zequn Liu, Wei Ju, Zhiping Xiao, Ming Zhang.
  3. Mol-AE: Auto-Encoder Based Molecular Representation Learning With 3D Cloze Test Objective. ICML 2024.
    Junwei Yang*, Kangjie Zheng*, Siyu Long, Zaiqing Nie, Ming Zhang, Xinyu Dai, Wei-Ying Ma, Hao Zhou.

Language Modeling

  1. ExLM: Rethinking the Impact of [MASK] Tokens in Masked Language Models. ICML 2025.
    Kangjie Zheng, Junwei Yang, Siyue Liang, Bin Feng, Zequn Liu, Wei Ju, Zhiping Xiao, Ming Zhang.
  2. Towards A Unified Training for Levenshtein Transformer. ICASSP 2023.
    Kangjie Zheng, Longyue Wang, Zhihao Wang, Binqi Chen, Ming Zhang, Zhaopeng Tu.
  3. A Decoding Algorithm Based on Directed Acyclic Transformers for Length-Control Summarization. EMNLP Findings 2024.
    Chenyang Huang, Hao Zhou, Cameron Jen, Kangjie Zheng, Osmar Zaiane, Lili Mou.
  4. Gloss Matters: Unlocking the Potential of Non-Autoregressive Sign Language Translation. ACM MM 2025.
    Zhihao Wang, Shiyu Liu, Zhiwei He, Kangjie Zheng, Liangying Shao, Junfeng Yao, Jinsong Su.

Others

  1. Constrained Truth Discovery. IEEE TKDE 2020.
    Chen Ye, Hongzhi Wang, Kangjie Zheng, Youkang Kong, Rong Zhu, Jing Gao, Jianzhong Li.
  2. Zero-shot Node Classification with Graph Contrastive Embedding Network. TMLR 2023.
    Wei Ju, Yifang Qin, Siyu Yi, Zhengyang Mao, Kangjie Zheng, Luchen Liu, Xiao Luo, Ming Zhang.
  3. Multi-Source Data Repairing Powered by Integrity Constraints and Source Reliability. Information Sciences 2020 (507:386–403).
    Chen Ye, Hongzhi Wang, Kangjie Zheng, Jing Gao, Jianzhong Li.
  4. A Predicate-Function-Argument Annotation of Natural Language for Open-Domain Information Expression. EMNLP 2020.
    Mingming Sun, Wenyue Hua, Zoey Liu, Kangjie Zheng, Xin Wang, Ping Li.
  5. CRISPRminer is a knowledge base for exploring CRISPR-Cas systems in microbe and phage interactions. Communications Biology 2018.
    Fan Zhang, Shijia Zhao, Chunyan Ren, Yuwei Zhu, Haibin Zhou, Yongkui Lai, Fengxia Zhou, Yuqiang Jia, Kangjie Zheng, Zhiwei Huang.

Talks & Presentations

Contact Me

Feel free to reach out for collaboration, discussion, or to learn more about me.