Zhicheng Zhang

I am a Ph.D. student in the School of Computer Science at Carnegie Mellon University, where I am fortunate to be advised by Prof. Fei Fang. Before that, I studied computer science in the ACM Honors Class at Shanghai Jiao Tong University, where I was fortunate to be advised by Prof. Weinan Zhang and Prof. Yong Yu.

I am interested in (multi-agent) reinforcement learning, efficient exploration, and the intersection of RL and LLMs, especially in settings where limited feedback makes useful behavior hard to discover. I have also spent time at Meta Superintelligence Labs and ByteDance Seed Post-training, working on LLM post-training and reinforcement learning for language agents. Here is my latest CV.

Email: zhichen3 [at] cs [dot] cmu [dot] edu.

Selected Publications

* means equal contribution.

VAM: Verbalized Action Masking for Controllable Exploration in RL Post-Training -- A Chess Case Study.

Zhicheng Zhang, Ziyan Wang, Yali Du, Fei Fang.

arXiv:2602.16833; Under Review.

Learning Instruction-Following Policies through Open-Ended Instruction Relabeling with Large Language Models.

Zhicheng Zhang, Ziyan Wang, Yali Du, Fei Fang.

arXiv:2506.20061; Under Review.

Aligning Agent Policies with Preferences: Human-Centered Interpretable Reinforcement Learning.

Stephanie Milani, Zhicheng Zhang, Nicholay Topin, Lirong Xia, Fei Fang.

The AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2025.

M3HF: Multi-agent Reinforcement Learning from Multi-phase Human Feedback of Mixed Quality.

Ziyan Wang, Zhicheng Zhang, Fei Fang, Yali Du.

The Forty-Second International Conference on Machine Learning (ICML), 2025.

Flaming-hot Initiation with Regular Execution Sampling for Large Language Models.

Weizhe Chen, Zhicheng Zhang, Guanlin Liu, Renjie Zheng, Wenlei Shi, Chen Dun, Zheng Wu, Xing Jin, Lin Yan.

Findings of the Association for Computational Linguistics: NAACL 2025.

MESA: Cooperative Meta-Exploration in Multi-Agent Learning through Exploiting State-Action Space Structure.

Zhicheng Zhang*, Yancheng Liang*, Yi Wu, Fei Fang.

The 23rd International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2024.

MAVIPER: Learning Decision Tree Policies for Interpretable Multi-Agent Reinforcement Learning.

Stephanie Milani*, Zhicheng Zhang*, Nicholay Topin, Zheyuan Ryan Shi, Charles Kamhoua, Evangelos E. Papalexakis, Fei Fang.

The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), 2022.

Model-based Multi-agent Reinforcement Learning: Recent Progress and Prospects.

Xihuai Wang, Zhicheng Zhang, Weinan Zhang.

arXiv preprint arXiv:2203.10603, 2022.

Activities

Talks

Jul 2025

RL China Seminar (Session 123)

Presented Learning Instruction-Following Policies through Open-Ended Instruction Relabeling with LLMs. Recording.

Mar 2023

CHIP Fellows' Symposium, Boston Children's Hospital

Presented Interpretable Multi-Agent Reinforcement Learning.

Service

May 2024 - Dec 2024

CMU REUSE Program Mentor

Mentored an undergraduate student with Prof. Fei Fang on interpretable reinforcement learning research using influence functions.

Sep 2023, Apr 2023

Predictive Intelligence for Pandemic Prevention (PIPP)

  • Moderated the panel on disease and misinformation co-evolution for the PILOT Synthesis Workshop (Sep 2023).
  • Co-organized the Modeling Intervention Acceptance for Disease Mitigation Workshop (Apr 2023).

Conference and Journal Reviewing

GameSec, ICML, NeurIPS, ICLR, IEEE Transactions on Artificial Intelligence

Teaching

Fall 2025, Spring 2024, Summer 2020, Spring 2020

Teaching Assistantships

Carnegie Mellon University

  • Demystifying AI: Concepts and Applications (17-709) (Fall 2025)
  • AI Methods for Social Good (17-737) (Spring 2024)

Shanghai Jiao Tong University

  • Practice of Computer Algorithms (MS125) (Summer 2020)
  • Data Structure (CS147) (Spring 2020)