I am Anhao Zhao, a joint Ph.D. student at the NLP Group of The Hong Kong Polytechnic University and the EIT NLP Lab of the Eastern Institute of Technology, Ningbo, fortunate to be supervised by Dr. Xiaoyu Shen and Prof. Wenjie Li.
Research
I take a long-term view of my research, focusing on LLM efficiency with the goal of exploring the Pareto frontier between accuracy and deployment cost. My work centers on two complementary directions:
Model architecture and KV-cache efficiency (vertical perspective): reducing the per-token computational and memory cost. Recent work includes SkipGPT [ICML’25], which proposes per-token adaptive activation of parameter subsets (token-aware gating), as well as strategies for selective KV-cache dropping [EMNLP’25].
Token generation efficiency (horizontal perspective): reducing the average number of generated tokens. One recent line of work surveys latent reasoning as a promising paradigm for more efficient chain-of-thought (CoT) reasoning [arXiv’25]. Another investigates early-answer generation strategies, where models can output responses before fully processing the input or completing all reasoning steps [ACL’25, ICLR’26].
I believe that democratizing large language models (LLMs)—making them more accessible, affordable, and widely usable—requires progress along both of these axes.
📬 I am open to collaborations and discussions. Please feel free to reach out if you are interested in my research or any related topics.
News
[2026.02] Got one paper accepted by CVPR 2026 🎉!
[2026.01] Got one paper accepted by ICLR 2026 🎉!
[2025.11] Attended EMNLP 2025 in person for the first time — a truly exciting experience 🎉!
[2025.09] Started my Ph.D. study at the NLP Group @ PolyU & EIT NLP, supervised by Dr. Xiaoyu Shen and Prof. Wenjie Li.
[2025.08] Got one paper accepted by EMNLP 2025 🎉!
[2025.05] Released our new survey on Latent Chain-of-Thought Reasoning.
[2025.05] Got one paper accepted by ACL 2025 🎉!
[2025.05] Got one paper accepted by ICML 2025 🎉!
[2024.09] Got one paper accepted by EMNLP 2024 🎉!
Publications
A full, up-to-date list is available on Google Scholar.
* indicates equal contribution
Conference & Journal Papers
What Do Visual Tokens Really Encode? Uncovering Sparsity and Redundancy in Multimodal Large Language Models
Yingqi Fan, Junlong Tong, Anhao Zhao, Xiaoyu Shen†
CVPR 2026. [link] [code]
StreamingThinker: Large Language Models Can Think While Reading
Junlong Tong, Yingqi Fan, Anhao Zhao, Yunpu Ma, Xiaoyu Shen†
ICLR 2026. [link] [code]
VisiPruner: Decoding Discontinuous Cross-Modal Dynamics for Efficient Multimodal LLMs
Yingqi Fan, Anhao Zhao, Jinlan Fu, Junlong Tong, Hui Su, Yijie Pan, Wei Zhang, Xiaoyu Shen†
EMNLP 2025 Main. [link] [code]
LLM as Effective Streaming Processor: Bridging Streaming-Batch Mismatches with Group Position Encoding
Junlong Tong, Jinlan Fu, Zixuan Lin, Yingqi Fan, Anhao Zhao, Hui Su, Xiaoyu Shen†
Findings of ACL 2025. [link] [code]
SkipGPT: Each Token is One of a Kind
Anhao Zhao, Fanghua Ye†, Yingqi Fan, Junlong Tong, Jing Xiong, Zhiwei Fei, Hui Su, Xiaoyu Shen†
ICML 2025. [link] [code]
Unveiling In-Context Learning: A Coordinate System to Understand Its Working Mechanism
Anhao Zhao, Fanghua Ye, Jinlan Fu, Xiaoyu Shen†
EMNLP 2024 Main. [link] [code]
A dynamic multi-modal deep reinforcement learning framework for 3D bin packing problem
Anhao Zhao, Tianrui Li, Andrew Lim
Knowledge-Based Systems 2024. [link] [code]
arXiv Preprints
On-Policy Supervised Fine-Tuning for Efficient Reasoning
Anhao Zhao, Ziyang Chen, Junlong Tong, Yingqi Fan, Fanghua Ye, Shuhao Li, Yunpu Ma, Wenjie Li, Xiaoyu Shen†
arXiv 2026. [link] [code]
ViCA: Efficient Multimodal LLMs with Vision-Only Cross-Attention
Wenjie Liu, Hao Wu, Xin Qiu, Yingqi Fan, Yihan Zhang, Anhao Zhao, Yunpu Ma, Xiaoyu Shen†
arXiv 2026. [link] [code]
From LLMs to LRMs: Rethinking Pruning for Reasoning-Centric Models
Longwei Ding, Anhao Zhao, Fanghua Ye, Ziyang Chen, Xiaoyu Shen†
arXiv 2026. [link] [code]
Reasoning Beyond Language: A Comprehensive Survey on Latent Chain-of-Thought Reasoning
Xinghao Chen*, Anhao Zhao*, Heming Xia, Xuan Lu, Hanlin Wang, Yanjun Chen, Wei Zhang, Jian Wang†, Wenjie Li, Xiaoyu Shen†
arXiv 2025. [link] [code]
Service
Reviewer/Program Committee Member:
ICLR 2026, CVPR 2026, ECCV 2026, ICML 2026
Teaching Assistant:
COMP 5311: Internet Infrastructure and Protocols, Fall 2025, PolyU
COMP 5532: Digital Twins & Virtual Human, Spring 2026, PolyU
