I develop scalable, data-driven AI models that learn from large-scale biological data to decode the language of life and uncover fundamental principles of biological systems.
I am a Postdoctoral Researcher at the Wellcome Sanger Institute in Cambridge, UK, working with Dr. Mo Lotfollahi on developing scalable, generalizable foundation models for large-scale biological data. My work spans sequence modeling (proteins, genomes) and structural modeling (3D molecular structures), with the goal of advancing integrative understanding across biological modalities.
Understands multi-scale molecular data (e.g. drug molecules and proteins), achieving state-of-the-art results on protein–molecule tasks.
Utilizes large-scale 3D molecular data to achieve high accuracy in molecular property prediction through robust structural understanding.
Uses edit-style pretraining to model fragment-level molecular semantics, enabling property prediction and retrosynthesis inference.
Enhances language models’ ability to capture long-range dependencies, yielding strong performance on NLU tasks.
[MASK] Tokens in Masked Language Models. ICML 2025.
Feel free to reach out for collaboration, discussion, or to learn more about me.