Zeyuan Allen-Zhu's Home Page

FAQs

If you previously had my WeChat, please add my new one by replacing the '-' symbol in my wechat ID with '_'.

My Curriculum Vitae.

Education & Work

Zeyuan on the photo day of MSR AI
@ Microsoft Research Redmond

FAIR at Meta (Facebook AI Research) (2022 – present)
- AI research scientist, in Seattle/Bellevue office
Mirosoft Research Redmond (2017 – 2022)
- principal researcher
PRINCETON and Institute for Advanced Study (2015 - 2017)
- postdoc (hosted by Elad Hazan and Avi Wigderson)
MIT, Csail (2010 – 2015)
- Sc.D. in computer science (advised by Jon Kelner and Silvio Micali)
- M.S. in computer science (advised by Silvio Micali)
Tsinghua, Department of Physics (2006 – 2010)
- B.S. in mathematics and physics (summa cum laude)
- Academic talent program (基科班)
- Chi-Sun Yeh prize for physics major
NFLS (2000 – 2006)
- high school diplomat with English major

Some Awards

A family photo of my gold and silver medals

In algorithm competitions, I was fortunate to win a few awards in my past life, including two IOI gold medals, a USACO world champion, an ACM/ICPC world-final gold medal (2nd place), a Google Codejam world runner-up, and a USA MCM Oustanding Prize.

In research, I used to be supported by a Microsoft Young Fellow Award, a Simons Student Award and a Microsoft Azure Research Award.

For a full list, click here.

Personal Information

Research Interests

My current research focuses on Physics of Language Models—a scientific framework to uncover the universal laws governing how large AI models learn, reason, and generalize. I design controlled experiments and probe neurons to reveal the mechanisms behind their strengths and failures, aiming to provide both theoretical insight and practical guidance—on data preparation, pre-/post-training, and architecture—for building better and safer AGI beyond today’s AI systems. This line of work was featured in my ICML2024 tutorial (see below).

Before that, I worked on the mathematics of deep learning, developing proofs on the learnability of neural networks to explain phenomena observed in practice. Our work on ensemble and knowledge distillation received the ICLR 2023 Best Paper Runner-Up, and our COLT 2023 paper provided the first formal proof of why and how deep networks perform deep learning (e.g., achieve superior performance over layer-wise training). This theoretical line inspired our LoRA fine-tuning method, now widely adopted across the AI community, and continues to shape the Physics of Language Models.

Earlier in my career, I also worked on optimization theory and theoretical computer science.

Incredibly honored and humbled by the overwhelming response to my tutorial, and thank you everyone who attended in person. Truly heartwarming to hear how much you enjoyed it. Many have been asking for a recording, and I prepared one with my own subtitles https://t.co/RjTm9ZHpId https://t.co/PFi2elHnsi pic.twitter.com/hBy1aPzIFU
— Zeyuan Allen-Zhu (@ZeyuanAllenZhu) July 25, 2024

Zeyuan Allen-Zhu, Sc.D.

Pages

FAQs

Education & Work

Some Awards

Personal Information

Research Interests

Email