FAQs

If you previously had my WeChat, please add my new account by replacing the '-' in my WeChat ID with '_'.

My Curriculum Vitae.

Education & Work


[Photo: Zeyuan on the photo day of MSR AI]
@ Microsoft Research Redmond
MIT

Tsinghua University

Some Awards

[Photo: A family photo of my gold and silver medals]

In algorithm competitions, I was fortunate to win a few awards in my past life, including two IOI gold medals, a USACO world championship, an ACM-ICPC World Finals gold medal (2nd place), a Google Code Jam world runner-up title, and a USA MCM Outstanding Prize.

In research, I have been supported by a Microsoft Young Fellow Award, a Simons Student Award, and a Microsoft Azure Research Award.

For a full list, click here.


Personal Information

Research Interests

My current research focuses on the Physics of Language Models, a scientific framework for establishing what the next generation of AI needs, beyond benchmark chasing. The goal is to understand the universal laws governing how large models learn, reason, and generalize, and to uncover the fundamental limits shared by today's AI systems. Through controlled experiments and neuron-level probing, I study how data design, pre-/post-training, and model architectures shape these behaviors, seeking principled breakthroughs to guide the development of better and safer AGI. This line of work was featured in my ICML 2024 tutorial and has influenced modern LLM training practice; for example, Part 3.1 demonstrated the necessity of paraphrasing and rewriting pretraining data for efficient knowledge learning, which has since become standard practice across frontier AI labs.

Before that, I worked on the mathematics of deep learning, developing proofs of the learnability of neural networks to explain phenomena observed in practice. Our work on ensembles and knowledge distillation received the ICLR 2023 Best Paper Runner-Up award, and our COLT 2023 paper provided the first formal proof of why and how deep networks perform deep learning (e.g., achieving performance superior to layer-wise training). This theoretical line inspired our LoRA fine-tuning method, now widely adopted across the AI community, and continues to shape the Physics of Language Models.

Earlier in my career, I worked on optimization theory and theoretical computer science; this background gives me a deep understanding of optimization dynamics in practice and a clearer perspective on the fundamental learnability limits of modern AI systems.

Email