About Me

I am a Ph.D. student in the Software and Societal Systems Department at Carnegie Mellon University’s School of Computer Science, where I am fortunate to be advised by Prof. Fei Fang. Before joining CMU, I received my B.Eng. in Computer Science from the ACM Honors Class at Shanghai Jiao Tong University. Here is my latest CV.

My research centers on LLM post-training, multi-agent reinforcement learning, explainable reinforcement learning (XRL), and explainable AI. I am particularly interested in making learning dramatically more efficient in sparse-reward settings, especially through exploration, credit assignment, and scalable post-training methods. My long-term goal is to develop learning systems that can discover robust behaviors from minimal feedback in complex environments. I previously interned with the ByteDance Seed post-training team (2024) and at Meta Superintelligence Labs (2025).

Contact Info

  • Email: zczhang [at] cmu.edu, or zhichen3 [at] cs.cmu.edu

Publications

* denotes equal contribution.

  • Verbalized Action Masking for Exploration in RL Post-Training: A Case Study in Chess.
    Zhicheng Zhang, Ziyan Wang, Yali Du, Fei Fang.
    arXiv preprint arXiv:2602.16833, 2026. [Link]
  • Learning Instruction-Following Policies through Open-Ended Instruction Relabeling with Large Language Models.
    Zhicheng Zhang, Ziyan Wang, Yali Du, Fei Fang.
    arXiv preprint arXiv:2506.20061, 2025. [Link]
  • Aligning Agent Policies with Preferences: Human-Centered Interpretable Reinforcement Learning.
    Stephanie Milani*, Zhicheng Zhang*, Nicholay Topin, Lirong Xia, Fei Fang.
    AIES, 2025. [Link]
  • Making Teams and Influencing Agents: Efficiently Coordinating Decision Trees for Interpretable Multi-Agent Reinforcement Learning.
    Rex Chen, Stephanie Milani, Zhicheng Zhang, Norman Sadeh, Fei Fang.
    AIES, 2025. [Link] [code]
  • M3HF: Multi-agent Reinforcement Learning from Multi-phase Human Feedback of Mixed Quality.
    Ziyan Wang, Zhicheng Zhang, Fei Fang, Yali Du.
    ICML, 2025, Poster (24.3%). [Link] [code]
  • Incorporating Human Preferences into Interpretable Reinforcement Learning with Tree Policies.
    Stephanie Milani, Zhicheng Zhang, Nicholay Topin, Lirong Xia, Fei Fang.
    NeurIPS WiML Workshop, 2025. [Link]
  • Flaming-hot Initiation with Regular Execution Sampling for Large Language Models.
    Weizhe Chen, Zhicheng Zhang, Guanlin Liu, Renjie Zheng, Wenlei Shi, Chen Dun, Zheng Wu, Xing Jin, Lin Yan.
    NAACL Findings, 2025. [Link] [code] (integrated into veRL)
  • Interpretable Multi-Agent Reinforcement Learning with Decision-Tree Policies.
    Stephanie Milani, Zhicheng Zhang, Nicholay Topin, Zheyuan Ryan Shi, Charles Kamhoua, Evangelos E. Papalexakis, Fei Fang.
    In Explainable Agency in Artificial Intelligence, CRC Press, 2024. [Link]
  • Predicting and Presenting Task Difficulty for Crowdsourcing Food Rescue Platforms.
    Zheyuan Ryan Shi, Jiayin Zhi, Siqi Zeng, Zhicheng Zhang, Ameesh Kapoor, Sean Hudson, Hong Shen, Fei Fang.
    WWW Web4Good Track, 2024. [Link]
  • MESA: Multi-Agent Meta-Exploration through Exploiting State-Action Space Structure.
    Zhicheng Zhang*, Yancheng Liang*, Yi Wu, Fei Fang.
    AAMAS, 2024. [Link] [code]
  • MAVIPER: Learning Decision Tree Policies for Interpretable Multi-Agent Reinforcement Learning.
    Stephanie Milani*, Zhicheng Zhang*, Nicholay Topin, Zheyuan Ryan Shi, Charles Kamhoua, Evangelos E. Papalexakis, Fei Fang.
    ECML-PKDD, 2022. [Link] [code]
  • Model-based Multi-agent Reinforcement Learning: Recent Progress and Prospects.
    Xihuai Wang, Zhicheng Zhang, Weinan Zhang.
    arXiv preprint arXiv:2203.10603, 2022. [Link]
  • Model-based Offline Policy Optimization with Distribution Correcting Regularization.
    Jian Shen*, Mingcheng Chen*, Zhicheng Zhang, Zhengyu Yang, Weinan Zhang, Yong Yu.
    ECML-PKDD, 2021. [Link]
  • MapGo: Model-Assisted Policy Optimization for Goal-Oriented Tasks.
    Menghui Zhu*, Minghuan Liu*, Jian Shen, Zhicheng Zhang, Sheng Chen, Weinan Zhang, Deheng Ye, Yong Yu, Qiang Fu, Wei Yang.
    IJCAI, 2021. [Link] [code]