I am Anhao Zhao, a joint Ph.D. student at the NLP Group of The Hong Kong Polytechnic University and EIT NLP at the Eastern Institute of Technology, Ningbo, where I am fortunate to be supervised by Dr. Xiaoyu Shen and Prof. Wenjie Li.

Research

I am a long-termist, focusing on LLM efficiency with the goal of exploring the Pareto frontier between accuracy and deployment cost. My work centers on two complementary directions:

  • Model architecture and KV-cache efficiency (vertical perspective): Reducing the per-token computational and memory cost. Recent work includes SkipGPT [ICML’25], which proposes per-token adaptive activation of parameter subsets (token-aware gating), as well as strategies for selective KV-cache dropping [EMNLP’25].

  • Token generation efficiency (horizontal perspective): Reducing the average number of generated tokens. One recent line of work surveys latent reasoning as a promising paradigm for achieving more efficient chain-of-thought (CoT) reasoning [arXiv’25]. Another line of work investigates early-answer generation strategies, where models can output responses before fully processing the input or completing reasoning steps [ACL’25].

I believe that democratizing large language models (LLMs), that is, making them more accessible, affordable, and widely usable, requires progress along both of these axes.

📬 I am open to collaborations and discussions. Please feel free to reach out if you are interested in my research or any related topics.

News

[2025.09] Started my Ph.D. studies at the NLP Group @ PolyU & EIT NLP, supervised by Dr. Xiaoyu Shen and Prof. Wenjie Li.
[2025.08] Got one paper accepted by EMNLP 2025🎉!
[2025.05] Released our new survey on Latent Chain-of-Thought Reasoning.
[2025.05] Got one paper accepted by ACL 2025🎉!
[2025.05] Got one paper accepted by ICML 2025🎉!
[2024.09] Got one paper accepted by EMNLP 2024🎉!

Publications

Most recent publications on Google Scholar.
* indicates equal contribution

Reasoning Beyond Language: A Comprehensive Survey on Latent Chain-of-Thought Reasoning
Xinghao Chen*, Anhao Zhao*, Heming Xia, Xuan Lu, Hanlin Wang, Yanjun Chen, Wei Zhang, Jian Wang, Wenjie Li, Xiaoyu Shen
arXiv 2025. [link] [code]

LLM as Effective Streaming Processor: Bridging Streaming-Batch Mismatches with Group Position Encoding
Junlong Tong, Jinlan Fu, Zixuan Lin, Yingqi Fan, Anhao Zhao, Hui Su, Xiaoyu Shen
Findings of ACL 2025. [link] [code]

SkipGPT: Each Token is One of a Kind
Anhao Zhao, Fanghua Ye, Yingqi Fan, Junlong Tong, Jing Xiong, Zhiwei Fei, Hui Su, Xiaoyu Shen
ICML 2025. [link] [code]

Unveiling In-Context Learning: A Coordinate System to Understand Its Working Mechanism
Anhao Zhao, Fanghua Ye, Jinlan Fu, Xiaoyu Shen
EMNLP 2024 Main. [link] [code]

A dynamic multi-modal deep reinforcement learning framework for 3D bin packing problem
Anhao Zhao, Tianrui Li, Andrew Lim
Knowledge-Based Systems 2024. [link] [code]