Hande Dong

LLM R&D Lead, CodeBuddy/WorkBuddy

Tencent, Shenzhen, China

Email: donghd66 AT gmail.com
Google Scholar • GitHub

I am the LLM R&D Lead of Tencent CodeBuddy/WorkBuddy, where I lead the model research and development team. My expertise centers on large language models, spanning pre-training, post-training, reinforcement learning, and LLM agents. Currently, I focus on leveraging the vast experience data generated by widely deployed agent applications to enhance model capabilities.

News

Jul 2026 Our paper Rethinking Entropy Interventions in RLVR received the ACL'26 Outstanding Paper Award! 🎉
May 2026 Technical report Echo: Learning from Experience Data via User-Driven Refinement released. Echo increases code acceptance rate from 25.7% to 35.7% in production.
Apr 2026 Two papers accepted to the ACL'26 main conference: ReCreate and Rethinking Entropy Interventions in RLVR.
Apr 2026 Three papers accepted by ACL'26 Findings.
Jan 2026 Paper Scheduling Your LLM RL with Reasoning Trees accepted by ICLR'26.

Work Experience

LLM R&D Lead

Tencent, 2026.04 - present

Leading the model R&D team of CodeBuddy / WorkBuddy. Large language models, code intelligence, AI agents.

Senior Researcher

Tencent, 2023.08 - 2026.04

Model research and development of CodeBuddy. Large language model, code intelligence, RAG, code agent.

Algorithm Engineer

International Digital Economy Academy (IDEA), 2022.07 - 2023.08

Code understanding and generation, pretrained language model, large language model.

Education

University of Science and Technology of China (USTC)

Master's degree, School of Information Science and Technology, 2019.09 - 2022.06

Advisor: Prof. Xiangnan He

University of Science and Technology of China (USTC)

Bachelor's degree, School of Physical Sciences, 2015.08 - 2019.06

Chung-Yao Chao Talent Program in Applied Physics

Selected Publications

Echo: Learning from Experience Data via User-Driven Refinement

Hande Dong, Xiaoyun Liang, Jiarui Yu, Jiayi Lin, Changqing Ai, Feng Liu, Wenjun Zhang, et al.

Technical Report • arXiv • Project Leader

Rethinking Entropy Interventions in RLVR: An Entropy Change Perspective 🏆 ACL'26 Outstanding Paper

Zhezheng Hao, Hong Wang, Haoyang Liu, Jian Luo, Jiarui Yu, Hande Dong, Qiang Lin, Can Wang, Jiawei Chen

ACL 2026 • arXiv • Code • Corresponding author

ReCreate: Reasoning and Creating Domain Agents Driven by Experience

Zhezheng Hao, Hong Wang, Jian Luo, Jianqing Zhang, Yuyan Zhou, Qiang Lin, Can Wang, Hande Dong, Jiawei Chen

ACL 2026 • arXiv • Code • Corresponding author

LEPO: Latent Reasoning Policy Optimization for Large Language Models

Yuyan Zhou, Jiarui Yu, Hande Dong, Zhezheng Hao, Hong Wang, Jianqing Zhang, Qiang Lin

ACL 2026 Findings • Corresponding author

Scheduling Your LLM Reinforcement Learning with Reasoning Trees

Hong Wang, Zhezheng Hao, Jian Luo, Chenxing Wei, Yao Shu, Lei Liu, Qiang Lin, Hande Dong, Jiawei Chen

ICLR 2026 • arXiv • Code • Corresponding author

AP2O: Correcting LLM-Generated Code Errors Type by Type Like Humans via Adaptive Progressive Preference Optimization

Jianqing Zhang, Wei Xia, Hande Dong, Qiang Lin, Jian Cao

AAAI 2026 • arXiv • Code

UniSVG: A Unified Dataset for Vector Graphic Understanding and Generation with Multimodal Large Language Models

Jinke Li, Jiarui Yu, Chenxing Wei, Hande Dong, Qiang Lin, Liangjing Yang, Zhicai Wang, Yanbin Hao

ACMMM 2025 • arXiv • Code • Corresponding author

EFIM: Efficient Serving of LLMs for Infilling Tasks with Improved KV Cache Reuse

Tianyu Guo, Hande Dong, Yichong Leng, Feng Liu, Cheater Lin, Nong Xiao, Xianwei Zhang

EURO-PAR 2025 • arXiv • Code • Corresponding author

Improving Code Search with Hard Negative Sampling Based on Fine-tuning

Hande Dong, Jiayi Lin, Yanlin Wang, Yichong Leng, Jiawei Chen, Yutao Xie

APSEC 2024 • arXiv • Code

Survey of Code Search based on Deep Learning

Yutao Xie, Jiayi Lin, Hande Dong, Lei Zhang, Zhonghai Wu

ACM TOSEM • arXiv

Bias and Debias in Recommender System: A Survey and Future Directions

Jiawei Chen, Hande Dong, Xiang Wang, Fuli Feng, Meng Wang, Xiangnan He

ACM TOIS • arXiv

AutoDebias: Learning to Debias for Recommendation

Jiawei Chen, Hande Dong, Yang Qiu, Xiangnan He, Xin Xin, Liang Chen, Guli Lin, Keping Yang

SIGIR 2021 • arXiv • Code • Slides • Co-first author

On the Equivalence of Decoupled Graph Convolution Network and Label Propagation

Hande Dong, Jiawei Chen, Fuli Feng, Xiangnan He, Shuxian Bi, Zhaolin Ding, Peng Cui

WWW 2021 • arXiv • Code • Slides

Selected Honors & Awards

2026.07 Outstanding Paper Award, ACL 2026 (top ~1%)
2025.12 GM Lightning Award, Tencent (top 2%)
2024 Outstanding Mentor for New Employees, Tencent
2024.12 Outstanding Contributor, Tencent (top 10%)
2024.06 Outstanding Contributor, Tencent (top 10%)
2022.06 Outstanding Graduates, USTC (top 10%)
2019.06 Character and Academic Outstanding Graduates, Anhui Province (top 2%)
2019.06 Outstanding Graduates, USTC (top 10%)
2018.05 Outstanding Student Leaders, USTC