About Me
I am a Ph.D. student in the Software and Societal Systems Department at Carnegie Mellon University’s School of Computer Science, where I am fortunate to be advised by Prof. Fei Fang. Before joining CMU, I received my B.Eng. in Computer Science from the ACM Honors Class at Shanghai Jiao Tong University. Here is my latest CV.
My research centers on LLM post-training, multi-agent reinforcement learning, explainable reinforcement learning (XRL), and explainable AI. I am particularly interested in making learning dramatically more efficient in sparse-reward settings, especially through exploration, credit assignment, and scalable post-training methods. My long-term goal is to develop learning systems that can discover robust behaviors from minimal feedback in complex environments. I previously interned at ByteDance Seed Post-training (2024) and Meta Superintelligence Labs (2025).
Contact Info
- Email: zczhang [at] cmu.edu, or zhichen3 [at] cs.cmu.edu
Publications
An asterisk (*) denotes equal contribution.
- Verbalized Action Masking for Exploration in RL Post-Training: A Case Study in Chess.
  Zhicheng Zhang, Ziyan Wang, Yali Du, and Fei Fang.
  arXiv preprint arXiv:2602.16833, 2026. [Link]
- Learning Instruction-Following Policies through Open-Ended Instruction Relabeling with Large Language Models.
  Zhicheng Zhang, Ziyan Wang, Yali Du, and Fei Fang.
  arXiv preprint arXiv:2506.20061, 2025. [Link]
- Aligning Agent Policies with Preferences: Human-Centered Interpretable Reinforcement Learning.
  Stephanie Milani*, Zhicheng Zhang*, Nicholay Topin, Lirong Xia, and Fei Fang.
  AIES, 2025. [Link]
- Incorporating Human Preferences into Interpretable Reinforcement Learning with Tree Policies.
  Stephanie Milani, Zhicheng Zhang, Nicholay Topin, Lirong Xia, and Fei Fang.
  NeurIPS WiML Workshop, 2025. [Link]
- Interpretable Multi-Agent Reinforcement Learning with Decision-Tree Policies.
  Stephanie Milani, Zhicheng Zhang, Nicholay Topin, Zheyuan Ryan Shi, Charles Kamhoua, Evangelos E. Papalexakis, and Fei Fang.
  In Explainable Agency in Artificial Intelligence, CRC Press, 2024. [Link]
- Predicting and Presenting Task Difficulty for Crowdsourcing Food Rescue Platforms.
  Zheyuan Ryan Shi, Jiayin Zhi, Siqi Zeng, Zhicheng Zhang, Ameesh Kapoor, Sean Hudson, Hong Shen, and Fei Fang.
  WWW Web4Good Track, 2024. [Link]
- Model-based Multi-agent Reinforcement Learning: Recent Progress and Prospects.
  Xihuai Wang, Zhicheng Zhang, and Weinan Zhang.
  arXiv preprint arXiv:2203.10603, 2022. [Link]
- Model-based Offline Policy Optimization with Distribution Correcting Regularization.
  Jian Shen*, Mingcheng Chen*, Zhicheng Zhang, Zhengyu Yang, Weinan Zhang, and Yong Yu.
  ECML-PKDD, 2021. [Link]