Ningxin Zheng

AI systems researcher focusing on large-scale LLM training systems, efficient deep learning, sparse computation, GPU kernels, and compiler/runtime techniques. He graduated from Shanghai Jiao Tong University, then worked at Microsoft Research Asia, and joined ByteDance Seed in 2023 to build production-scale LLM training systems.

Research focus

01

LLM systems

Production training and inference systems for MoE and long-context LLM workloads, with emphasis on throughput, resilience, and scalability.

02

GPU kernels

Low-level kernels and compiler/runtime support for sparse weights, low-precision tensors, communication overlap, and tile-centric programming.

03

Sparse computation

Frameworks and transformations that expose sparsity while preserving dense-kernel efficiency, including TeSA, PIT, and dynamic sparse execution.

04

Model compression

Efficient architecture search, pruning, latency prediction, and hardware-friendly optimization for practical deployment constraints.

Publications

Selected papers

2026

MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production

EuroSys 2026 · Chao Jin, Ziheng Jiang, Zhihao Bai, Zheng Zhong, Juncai Liu, Xiang Li, Ningxin Zheng, et al.

2026

MegaScale-Omni: A Hyper-Scale, Workload-Resilient System for MultiModal LLM Training in Production

EuroSys 2026 · Chunyu Xue, Yangrui Chen, Jianyu Jiang, Ningxin Zheng, et al.

2026

DITRON: Distributed Multi-level Tiling Compiler for Parallel Tensor Programs

CoRR 2026 · Size Zheng, Xuegui Zheng, Hanshi Sun, Qi Hou, Wenlei Bao, Shiyu Li, Ningxin Zheng, et al.

2026

DisagMoE: Computation-Communication Overlapped MoE Training via Disaggregated AF-Pipe Parallelism

CoRR 2026 · Zhichen Zeng, Chi-Chih Chang, Jiayi Wang, Zezhou Wang, Zheng Zhong, Ningxin Zheng, et al.

2025

ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference

ICML 2025 · Hanshi Sun, Li-Wen Chang, Wenlei Bao, Size Zheng, Ningxin Zheng, Xin Liu, et al.

2025

COMET: Fine-grained Computation-communication Overlapping for Mixture-of-Experts (MLSys 2025 Honorable Mention Outstanding Paper)

MLSys 2025 · Shulai Zhang, Ningxin Zheng, Haibin Lin, Ziheng Jiang, Wenlei Bao, Chengquan Jiang, et al.

2025

TileLink: Generating Efficient Compute-Communication Overlapping Kernels using Tile-Centric Primitives

MLSys 2025 · Size Zheng, Jin Fang, Xuegui Zheng, Qi Hou, Wenlei Bao, Ziheng Jiang, Ningxin Zheng, et al.

2025

Triton-distributed: Programming Overlapping Kernels on Distributed AI Systems with the Triton Compiler

CoRR 2025 · Size Zheng, Wenlei Bao, Qi Hou, Xuegui Zheng, Jin Fang, Chenhui Huang, Ningxin Zheng, et al.

2025

Boosting Embodied AI Agents through Perception-Generation Disaggregation and Asynchronous Pipeline Execution

CoRR 2025 · Shulai Zhang, Ao Xu, Quan Chen, Han Zhao, Weihao Cui, Haibin Lin, Ningxin Zheng, et al.

2024

Ladder: Enabling Efficient Low-Precision Deep Learning Computing through Hardware-aware Tensor Transformation

OSDI 2024 · Lei Wang, Lingxiao Ma, Shijie Cao, Quanlu Zhang, Jilong Xue, Yining Shi, Ningxin Zheng, et al.

2024

FLUX: Fast Software-based Communication Overlap On GPUs Through Kernel Fusion

CoRR 2024 · Li-Wen Chang, Wenlei Bao, Qi Hou, Chengquan Jiang, Yinmin Zhong, Xuanrun Zhang, Ningxin Zheng, et al.

2023

Efficient GPU Kernels for N:M-Sparse Weights in Deep Learning

MLSys 2023 · Bin Lin, Ningxin Zheng, Lei Wang, Shijie Cao, Lingxiao Ma, Quanlu Zhang, et al.

2023

Optimizing Dynamic Neural Networks with Brainstorm

OSDI 2023 · Weihao Cui, Zhenhua Han, Lingji Ouyang, Yichuan Wang, Ningxin Zheng, et al.

2023

PIT: Optimization of Dynamic Sparse Deep Learning Models via Permutation Invariant Transformation

SOSP 2023 · Ningxin Zheng, Huiqiang Jiang, Quanlu Zhang, Zhenhua Han, Lingxiao Ma, et al.

2023

SparDA: Accelerating Dynamic Sparse Deep Neural Networks via Sparse-Dense Transformation

CoRR 2023 · Ningxin Zheng, Huiqiang Jiang, Quanlu Zhang, Zhenhua Han, Yuqing Yang, Lingxiao Ma, et al.

2022

Toward QoS-Awareness and Improved Utilization of Spatial Multitasking GPUs

IEEE Transactions on Computers 2022 · Wei Zhang, Quan Chen, Ningxin Zheng, Weihao Cui, Kaihua Fu, Minyi Guo.

2022

Astraea: towards QoS-aware and resource-efficient multi-stage GPU services

ASPLOS 2022 · Wei Zhang, Quan Chen, Kaihua Fu, Ningxin Zheng, Zhiyi Huang, Jingwen Leng, Minyi Guo.

2022

SparTA: Deep-Learning Model Sparsity via Tensor-with-Sparsity-Attribute

OSDI 2022 · Ningxin Zheng, Bin Lin, Quanlu Zhang, Lingxiao Ma, Yuqing Yang, Fan Yang, et al.

2022

QoS-Aware Irregular Collaborative Inference for Improving Throughput of DNN Services

SC 2022 · Kaihua Fu, Jiuchen Shi, Quan Chen, Ningxin Zheng, Wei Zhang, Deze Zeng, Minyi Guo.

2021

nn-Meter: towards accurate latency prediction of deep-learning model inference on diverse edge devices (MobiSys 2021 Best Paper Award)

MobiSys 2021 · Li Lyna Zhang, Shihao Han, Jianyu Wei, Ningxin Zheng, Ting Cao, Yuqing Yang, Yunxin Liu. Also received all three highest Artifact Evaluation badges.

2021

CHARM: Collaborative Host and Accelerator Resource Management for GPU Datacenters

ICCD 2021 · Wei Zhang, Kaihua Fu, Ningxin Zheng, Quan Chen, Chao Li, Wenli Zheng, Minyi Guo.

2021

Enable simultaneous DNN services based on deterministic operator overlap and precise latency prediction

SC 2021 · Weihao Cui, Han Zhao, Quan Chen, Ningxin Zheng, Jingwen Leng, Jieru Zhao, et al.

2020

URSA: Precise Capacity Planning and Fair Scheduling based on Low-level Statistics for Public Clouds

ICPP 2020 · Wei Zhang, Ningxin Zheng, Quan Chen, Yong Yang, Zhuo Song, Tao Ma, et al.

2019

POSTER: Precise Capacity Planning for Database Public Clouds

PACT 2019 · Ningxin Zheng, Quan Chen, Yong Yang, Jin Li, Wenli Zheng, Minyi Guo.

2018

CLIBE: Precise Cluster-Level I/O Bandwidth Enforcement in Distributed File System

HPCC/SmartCity/DSS 2018 · Ningxin Zheng, Quan Chen, Chen Chen, Minyi Guo.

2024

Online Streaming Video Super-Resolution With Convolutional Look-Up Table

IEEE Transactions on Image Processing 2024 · Guanghao Yin, Zefan Qu, Xinyang Jiang, Shan Jiang, Zhenhua Han, Ningxin Zheng, et al.

2023

Online Video Super-Resolution With Convolutional Kernel Bypass Grafts

IEEE Transactions on Multimedia 2023 · Jun Xiao, Xinyang Jiang, Ningxin Zheng, Huan Yang, Yifan Yang, Yuqing Yang, et al.

2023

EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention

CVPR 2023 · Xinyu Liu, Houwen Peng, Ningxin Zheng, Yuqing Yang, Han Hu, Yixuan Yuan.

2023

SpaceEvo: Hardware-Friendly Search Space Design for Efficient INT8 Inference

ICCV 2023 · Xudong Wang, Li Lyna Zhang, Jiahang Xu, Quanlu Zhang, Yujing Wang, Yuqing Yang, Ningxin Zheng, et al.

2021

Full-Cycle Energy Consumption Benchmark for Low-Carbon Computer Vision

CoRR 2021 · Bo Li, Xinyang Jiang, Donglin Bai, Yuge Zhang, Ningxin Zheng, Xuanyi Dong, et al.

Open source

Selected projects

Contact

Find me online

For publications, source code, and recent work, the links below are the most reliable public entry points.