LLM systems
Production training and inference systems for MoE and long-context LLM workloads, with emphasis on throughput, resilience, and scalability.
AI systems researcher focusing on large-scale LLM training systems, efficient deep learning, sparse computation, GPU kernels, and compiler/runtime techniques. He graduated from Shanghai Jiao Tong University, then worked at Microsoft Research Asia, and joined ByteDance Seed in 2023 to build production-scale LLM training systems.
Low-level kernels and compiler/runtime support for sparse weights, low-precision tensors, communication overlap, and tile-centric programming.
Frameworks and transformations that expose sparsity while preserving dense-kernel efficiency, including TeSA, PIT, and dynamic sparse execution.
Efficient architecture search, pruning, latency prediction, and hardware-friendly optimization for practical deployment constraints.
Publications
EuroSys 2026 · Chao Jin, Ziheng Jiang, Zhihao Bai, Zheng Zhong, Juncai Liu, Xiang Li, Ningxin Zheng, et al.
EuroSys 2026 · Chunyu Xue, Yangrui Chen, Jianyu Jiang, Ningxin Zheng, et al.
CoRR 2026 · Size Zheng, Xuegui Zheng, Hanshi Sun, Qi Hou, Wenlei Bao, Shiyu Li, Ningxin Zheng, et al.
CoRR 2026 · Zhichen Zeng, Chi-Chih Chang, Jiayi Wang, Zezhou Wang, Zheng Zhong, Ningxin Zheng, et al.
ICML 2025 · Hanshi Sun, Li-Wen Chang, Wenlei Bao, Size Zheng, Ningxin Zheng, Xin Liu, et al.
MLSys 2025 · Shulai Zhang, Ningxin Zheng, Haibin Lin, Ziheng Jiang, Wenlei Bao, Chengquan Jiang, et al.
MLSys 2025 · Size Zheng, Jin Fang, Xuegui Zheng, Qi Hou, Wenlei Bao, Ziheng Jiang, Ningxin Zheng, et al.
CoRR 2025 · Size Zheng, Wenlei Bao, Qi Hou, Xuegui Zheng, Jin Fang, Chenhui Huang, Ningxin Zheng, et al.
CoRR 2025 · Shulai Zhang, Ao Xu, Quan Chen, Han Zhao, Weihao Cui, Haibin Lin, Ningxin Zheng, et al.
OSDI 2024 · Lei Wang, Lingxiao Ma, Shijie Cao, Quanlu Zhang, Jilong Xue, Yining Shi, Ningxin Zheng, et al.
CoRR 2024 · Li-Wen Chang, Wenlei Bao, Qi Hou, Chengquan Jiang, Yinmin Zhong, Xuanrun Zhang, Ningxin Zheng, et al.
MLSys 2023 · Bin Lin, Ningxin Zheng, Lei Wang, Shijie Cao, Lingxiao Ma, Quanlu Zhang, et al.
OSDI 2023 · Weihao Cui, Zhenhua Han, Lingji Ouyang, Yichuan Wang, Ningxin Zheng, et al.
SOSP 2023 · Ningxin Zheng, Huiqiang Jiang, Quanlu Zhang, Zhenhua Han, Lingxiao Ma, et al.
CoRR 2023 · Ningxin Zheng, Huiqiang Jiang, Quanlu Zhang, Zhenhua Han, Yuqing Yang, Lingxiao Ma, et al.
IEEE Transactions on Computers 2022 · Wei Zhang, Quan Chen, Ningxin Zheng, Weihao Cui, Kaihua Fu, Minyi Guo.
ASPLOS 2022 · Wei Zhang, Quan Chen, Kaihua Fu, Ningxin Zheng, Zhiyi Huang, Jingwen Leng, Minyi Guo.
OSDI 2022 · Ningxin Zheng, Bin Lin, Quanlu Zhang, Lingxiao Ma, Yuqing Yang, Fan Yang, et al.
SC 2022 · Kaihua Fu, Jiuchen Shi, Quan Chen, Ningxin Zheng, Wei Zhang, Deze Zeng, Minyi Guo.
MobiSys 2021 · Li Lyna Zhang, Shihao Han, Jianyu Wei, Ningxin Zheng, Ting Cao, Yuqing Yang, Yunxin Liu. Also received all three highest Artifact Evaluation badges.
ICCD 2021 · Wei Zhang, Kaihua Fu, Ningxin Zheng, Quan Chen, Chao Li, Wenli Zheng, Minyi Guo.
SC 2021 · Weihao Cui, Han Zhao, Quan Chen, Ningxin Zheng, Jingwen Leng, Jieru Zhao, et al.
ICPP 2020 · Wei Zhang, Ningxin Zheng, Quan Chen, Yong Yang, Zhuo Song, Tao Ma, et al.
PACT 2019 · Ningxin Zheng, Quan Chen, Yong Yang, Jin Li, Wenli Zheng, Minyi Guo.
HPCC/SmartCity/DSS 2018 · Ningxin Zheng, Quan Chen, Chen Chen, Minyi Guo.
IEEE Transactions on Image Processing 2024 · Guanghao Yin, Zefan Qu, Xinyang Jiang, Shan Jiang, Zhenhua Han, Ningxin Zheng, et al.
IEEE Transactions on Multimedia 2023 · Jun Xiao, Xinyang Jiang, Ningxin Zheng, Huan Yang, Yifan Yang, Yuqing Yang, et al.
CVPR 2023 · Xinyu Liu, Houwen Peng, Ningxin Zheng, Yuqing Yang, Han Hu, Yixuan Yuan.
ICCV 2023 · Xudong Wang, Li Lyna Zhang, Jiahang Xu, Quanlu Zhang, Yujing Wang, Yuqing Yang, Ningxin Zheng, et al.
CoRR 2021 · Bo Li, Xinyang Jiang, Donglin Bai, Yuge Zhang, Ningxin Zheng, Xuanyi Dong, et al.
Open source
A GPU communication-overlap library for tensor and expert parallelism, built around CUDA/CUTLASS kernels and PyTorch integration.
Microsoft · An open-source AutoML toolkit covering neural architecture search, model compression, hyper-parameter tuning, and ML lifecycle automation.
ByteDance-Seed · A distributed compiler based on Triton for programming parallel AI systems and generating compute-communication overlapping kernels.
Contact
For publications, source code, and recent work, the links below are the most reliable public entry points.