Zhicheng Zhang

I am a Ph.D. student in the School of Computer Science at Carnegie Mellon University, where I am fortunate to be advised by Prof. Fei Fang. Before that, I studied computer science in the ACM Honors Class at Shanghai Jiao Tong University, where I was fortunate to be advised by Prof. Weinan Zhang and Prof. Yong Yu.

I am interested in (multi-agent) reinforcement learning, efficient exploration, and the intersection of RL and LLMs, especially in settings where limited feedback makes useful behavior hard to discover. I have also spent time at Meta Superintelligence Labs and ByteDance Seed Post-training, working on LLM post-training and reinforcement learning for language agents. Here is my latest CV.

Email: zhichen3 [at] cs [dot] cmu [dot] edu.

Selected Publications

* means equal contribution.

VAM: Verbalized Action Masking for Controllable Exploration in RL Post-Training -- A Chess Case Study.

Zhicheng Zhang, Ziyan Wang, Yali Du, Fei Fang.

arXiv:2602.16833; Under Review.

Paper

@misc{zhang2026vamverbalizedactionmasking,
  title={VAM: Verbalized Action Masking for Controllable Exploration in RL Post-Training -- A Chess Case Study},
  author={Zhicheng Zhang and Ziyan Wang and Yali Du and Fei Fang},
  year={2026},
  eprint={2602.16833},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2602.16833}
}

Learning Instruction-Following Policies through Open-Ended Instruction Relabeling with Large Language Models.

Zhicheng Zhang, Ziyan Wang, Yali Du, Fei Fang.

arXiv:2506.20061; Under Review.

Paper

@misc{zhang2025learninginstructionfollowingpoliciesopenended,
  title={Learning Instruction-Following Policies through Open-Ended Instruction Relabeling with Large Language Models},
  author={Zhicheng Zhang and Ziyan Wang and Yali Du and Fei Fang},
  year={2025},
  eprint={2506.20061},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2506.20061}
}

Aligning Agent Policies with Preferences: Human-Centered Interpretable Reinforcement Learning.

Stephanie Milani, Zhicheng Zhang, Nicholay Topin, Lirong Xia, Fei Fang.

The AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2025.

Paper

@article{milani2025aligning,
  author={Milani, Stephanie and Zhang, Zhicheng and Topin, Nicholay and Xia, Lirong and Fang, Fei},
  title={Aligning Agent Policies with Preferences: Human-Centered Interpretable Reinforcement Learning},
  journal={Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society},
  volume={8},
  number={2},
  pages={1711--1723},
  year={2025},
  month={oct},
  doi={10.1609/aies.v8i2.36668},
  url={https://doi.org/10.1609/aies.v8i2.36668},
  publisher={Association for the Advancement of Artificial Intelligence (AAAI)}
}

M3HF: Multi-agent Reinforcement Learning from Multi-phase Human Feedback of Mixed Quality.

Ziyan Wang, Zhicheng Zhang, Fei Fang, Yali Du.

The Forty-Second International Conference on Machine Learning (ICML), 2025.

Paper Code

@inproceedings{wang2025m3hf,
  title={M3HF: Multi-agent Reinforcement Learning from Multi-phase Human Feedback of Mixed Quality},
  author={Wang, Ziyan and Zhang, Zhicheng and Fang, Fei and Du, Yali},
  booktitle={Proceedings of the 42nd International Conference on Machine Learning},
  pages={65429--65448},
  year={2025},
  editor={Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  volume={267},
  series={Proceedings of Machine Learning Research},
  publisher={PMLR},
  url={https://proceedings.mlr.press/v267/wang25el.html}
}

Flaming-hot Initiation with Regular Execution Sampling for Large Language Models.

Weizhe Chen, Zhicheng Zhang, Guanlin Liu, Renjie Zheng, Wenlei Shi, Chen Dun, Zheng Wu, Xing Jin, Lin Yan.

Findings of the Association for Computational Linguistics: NAACL 2025.

Paper Code

@inproceedings{chen-etal-2025-flaming,
  title={Flaming-hot Initiation with Regular Execution Sampling for Large Language Models},
  author={Chen, Weizhe and Zhang, Zhicheng and Liu, Guanlin and Zheng, Renjie and Shi, Wenlei and Dun, Chen and Wu, Zheng and Jin, Xing and Yan, Lin},
  editor={Chiruzzo, Luis and Ritter, Alan and Wang, Lu},
  booktitle={Findings of the Association for Computational Linguistics: NAACL 2025},
  pages={7133--7142},
  address={Albuquerque, New Mexico},
  month={apr},
  year={2025},
  publisher={Association for Computational Linguistics},
  isbn={979-8-89176-195-7},
  url={https://aclanthology.org/2025.findings-naacl.396/},
  doi={10.18653/v1/2025.findings-naacl.396}
}

MESA: Cooperative Meta-Exploration in Multi-Agent Learning through Exploiting State-Action Space Structure.

Zhicheng Zhang*, Yancheng Liang*, Yi Wu, Fei Fang.

The 23rd International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2024).

Paper Code

@inproceedings{zhang2024mesa,
  author={Zhicheng Zhang and Yancheng Liang and Yi Wu and Fei Fang},
  title={MESA: Cooperative Meta-Exploration in Multi-Agent Learning through Exploiting State-Action Space Structure},
  booktitle={Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2024)},
  pages={2085--2093},
  year={2024},
  publisher={International Foundation for Autonomous Agents and Multiagent Systems},
  isbn={978-1-4007-0486-4},
  url={https://www.ifaamas.org/Proceedings/aamas2024/pdfs/p2085.pdf}
}

MAVIPER: Learning Decision Tree Policies for Interpretable Multi-Agent Reinforcement Learning.

Stephanie Milani*, Zhicheng Zhang*, Nicholay Topin, Zheyuan Ryan Shi, Charles Kamhoua, Evangelos E. Papalexakis, Fei Fang.

The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), 2022.

Paper Code

@inproceedings{milani2023maviper,
  author={Stephanie Milani and Zhicheng Zhang and Nicholay Topin and Zheyuan Ryan Shi and Charles Kamhoua and Evangelos E. Papalexakis and Fei Fang},
  title={MAVIPER: Learning Decision Tree Policies for Interpretable Multi-Agent Reinforcement Learning},
  booktitle={Machine Learning and Knowledge Discovery in Databases},
  series={Lecture Notes in Computer Science},
  volume={13716},
  pages={251--266},
  address={Cham},
  publisher={Springer Nature Switzerland},
  year={2023},
  doi={10.1007/978-3-031-26412-2_16},
  url={https://doi.org/10.1007/978-3-031-26412-2_16}
}

Model-based Multi-agent Reinforcement Learning: Recent Progress and Prospects.

Xihuai Wang, Zhicheng Zhang, Weinan Zhang.

arXiv preprint arXiv:2203.10603, 2022.

Paper

@misc{wang2022modelbasedmultiagentreinforcementlearning,
  title={Model-based Multi-agent Reinforcement Learning: Recent Progress and Prospects},
  author={Xihuai Wang and Zhicheng Zhang and Weinan Zhang},
  year={2022},
  eprint={2203.10603},
  archivePrefix={arXiv},
  primaryClass={cs.MA},
  url={https://arxiv.org/abs/2203.10603}
}

Making Teams and Influencing Agents: Efficiently Coordinating Decision Trees for Interpretable Multi-Agent Reinforcement Learning.

Rex Chen, Stephanie Milani, Zhicheng Zhang, Norman Sadeh, Fei Fang.

The AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2025.

Paper Code

@article{chen2025making,
  author={Chen, Rex and Milani, Stephanie and Zhang, Zhicheng and Sadeh, Norman and Fang, Fei},
  title={Making Teams and Influencing Agents: Efficiently Coordinating Decision Trees for Interpretable Multi-Agent Reinforcement Learning},
  journal={Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society},
  volume={8},
  number={1},
  pages={567--578},
  year={2025},
  doi={10.1609/aies.v8i1.36571},
  url={https://doi.org/10.1609/aies.v8i1.36571}
}

Interpretable Multi-Agent Reinforcement Learning with Decision-Tree Policies.

Stephanie Milani, Zhicheng Zhang, Nicholay Topin, Zheyuan Ryan Shi, Charles Kamhoua, Evangelos E. Papalexakis, Fei Fang.

In Explainable Agency in Artificial Intelligence, CRC Press, 2024.

Paper

@incollection{milani2024interpretable,
  author={Stephanie Milani and Zhicheng Zhang and Nicholay Topin and Zheyuan Ryan Shi and Charles Kamhoua and Evangelos E. Papalexakis and Fei Fang},
  title={Interpretable Multi-Agent Reinforcement Learning with Decision-Tree Policies},
  booktitle={Explainable Agency in Artificial Intelligence: Research and Practice},
  editor={Silvia Tulli and David W. Aha},
  pages={86--120},
  year={2024},
  publisher={CRC Press},
  doi={10.1201/9781003355281-5},
  url={https://www.taylorfrancis.com/books/9781003355281/chapters/10.1201/9781003355281-5}
}

Predicting and Presenting Task Difficulty for Crowdsourcing Food Rescue Platforms.

Zheyuan Ryan Shi, Jiayin Zhi, Siqi Zeng, Zhicheng Zhang, Ameesh Kapoor, Sean Hudson, Hong Shen, Fei Fang.

The ACM Web Conference 2024 (Web4Good Track).

Paper

@inproceedings{shi2024predicting,
  author={Shi, Zheyuan Ryan and Zhi, Jiayin and Zeng, Siqi and Zhang, Zhicheng and Kapoor, Ameesh and Hudson, Sean and Shen, Hong and Fang, Fei},
  title={Predicting and Presenting Task Difficulty for Crowdsourcing Food Rescue Platforms},
  booktitle={Proceedings of the ACM Web Conference 2024},
  series={WWW '24},
  pages={4686--4696},
  publisher={ACM},
  year={2024},
  month={may},
  doi={10.1145/3589334.3648155},
  url={https://doi.org/10.1145/3589334.3648155}
}

Model-Based Offline Policy Optimization with Distribution Correcting Regularization.

Jian Shen*, Mingcheng Chen*, Zhicheng Zhang, Zhengyu Yang, Weinan Zhang, Yong Yu.

The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), 2021.

Paper

@inproceedings{shen2021modelbased,
  author={Jian Shen and Mingcheng Chen and Zhicheng Zhang and Zhengyu Yang and Weinan Zhang and Yong Yu},
  title={Model-Based Offline Policy Optimization with Distribution Correcting Regularization},
  editor={Nuria Oliver and Fernando P{\'e}rez-Cruz and Stefan Kramer and Jesse Read and Jose A. Lozano},
  booktitle={Machine Learning and Knowledge Discovery in Databases. Research Track},
  series={Lecture Notes in Computer Science},
  volume={12975},
  pages={174--189},
  year={2021},
  publisher={Springer International Publishing},
  address={Cham},
  isbn={978-3-030-86486-6},
  doi={10.1007/978-3-030-86486-6_11},
  url={https://doi.org/10.1007/978-3-030-86486-6_11}
}

MapGo: Model-Assisted Policy Optimization for Goal-Oriented Tasks.

Menghui Zhu*, Minghuan Liu*, Jian Shen, Zhicheng Zhang, Sheng Chen, Weinan Zhang, Deheng Ye, Yong Yu, Qiang Fu, Wei Yang.

The 30th International Joint Conference on Artificial Intelligence (IJCAI 2021).

Paper Code

@inproceedings{ijcai2021p480,
  author={Zhu, Menghui and Liu, Minghuan and Shen, Jian and Zhang, Zhicheng and Chen, Sheng and Zhang, Weinan and Ye, Deheng and Yu, Yong and Fu, Qiang and Yang, Wei},
  title={MapGo: Model-Assisted Policy Optimization for Goal-Oriented Tasks},
  booktitle={Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, {IJCAI-21}},
  publisher={International Joint Conferences on Artificial Intelligence Organization},
  editor={Zhi-Hua Zhou},
  pages={3484--3491},
  year={2021},
  month={aug},
  note={Main Track},
  doi={10.24963/ijcai.2021/480},
  url={https://doi.org/10.24963/ijcai.2021/480}
}

Activities

Talks

Jul 2025

RL China Seminar (Session 123)

Presented Learning Instruction-Following Policies through Open-Ended Instruction Relabeling with LLMs. Recording.

Mar 2023

CHIP Fellows' Symposium, Boston Children's Hospital

Presented Interpretable Multi-Agent Reinforcement Learning.

Service

May 2024 - Dec 2024

CMU REUSE Program Mentor

Mentored an undergraduate student with Prof. Fei Fang on interpretable reinforcement learning research using influence functions.

Sep 2023, Apr 2023

Predictive Intelligence for Pandemic Prevention (PIPP)

Moderated the panel on disease and misinformation co-evolution for the PILOT Synthesis Workshop (Sep 2023).
Co-organized the Modeling Intervention Acceptance for Disease Mitigation Workshop (Apr 2023).

Conference and Journal Reviewing

GameSec, ICML, NeurIPS, ICLR, IEEE Transactions on Artificial Intelligence

Teaching

Fall 2025, Spring 2024, Summer 2020, Spring 2020

Teaching Assistantships

Carnegie Mellon University

Demystifying AI: Concepts and Applications (17-709) (Fall 2025)
AI Methods for Social Good (17-737) (Spring 2024)

Shanghai Jiao Tong University

Practice of Computer Algorithms (MS125) (Summer 2020)
Data Structure (CS147) (Spring 2020)