Ningxin Zheng

AI systems researcher focusing on large-scale LLM training systems, efficient deep learning, sparse computation, GPU kernels, and compiler/runtime techniques. He graduated from Shanghai Jiao Tong University, then worked at Microsoft Research Asia, and joined ByteDance Seed in 2023 to build production-scale LLM training systems.

GitHub · Google Scholar · DBLP

Research focus

LLM systems

Production training and inference systems for MoE and long-context LLM workloads, with emphasis on throughput, resilience, and scalability.

GPU kernels

Low-level kernels and compiler/runtime support for sparse weights, low-precision tensors, communication overlap, and tile-centric programming.

Sparse computation

Frameworks and transformations that expose sparsity while preserving dense-kernel efficiency, including TeSA, PIT, and dynamic sparse execution.

Model compression

Efficient architecture search, pruning, latency prediction, and hardware-friendly optimization for practical deployment constraints.

Publications

Selected papers

2026

MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production

EuroSys 2026 · Chao Jin, Ziheng Jiang, Zhihao Bai, Zheng Zhong, Juncai Liu, Xiang Li, Ningxin Zheng, et al.

2026

MegaScale-Omni: A Hyper-Scale, Workload-Resilient System for MultiModal LLM Training in Production

EuroSys 2026 · Chunyu Xue, Yangrui Chen, Jianyu Jiang, Ningxin Zheng, et al.

2026

DITRON: Distributed Multi-level Tiling Compiler for Parallel Tensor Programs

CoRR 2026 · Size Zheng, Xuegui Zheng, Hanshi Sun, Qi Hou, Wenlei Bao, Shiyu Li, Ningxin Zheng, et al.

2026

DisagMoE: Computation-Communication Overlapped MoE Training via Disaggregated AF-Pipe Parallelism

CoRR 2026 · Zhichen Zeng, Chi-Chih Chang, Jiayi Wang, Zezhou Wang, Zheng Zhong, Ningxin Zheng, et al.

2025

ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference

ICML 2025 · Hanshi Sun, Li-Wen Chang, Wenlei Bao, Size Zheng, Ningxin Zheng, Xin Liu, et al.

2025

COMET: Fine-grained Computation-communication Overlapping for Mixture-of-Experts

MLSys 2025 · Shulai Zhang, Ningxin Zheng, Haibin Lin, Ziheng Jiang, Wenlei Bao, Chengquan Jiang, et al. Outstanding Paper Honorable Mention.

2025

TileLink: Generating Efficient Compute-Communication Overlapping Kernels using Tile-Centric Primitives

MLSys 2025 · Size Zheng, Jin Fang, Xuegui Zheng, Qi Hou, Wenlei Bao, Ziheng Jiang, Ningxin Zheng, et al.

2025

Triton-distributed: Programming Overlapping Kernels on Distributed AI Systems with the Triton Compiler

CoRR 2025 · Size Zheng, Wenlei Bao, Qi Hou, Xuegui Zheng, Jin Fang, Chenhui Huang, Ningxin Zheng, et al.

2025

Boosting Embodied AI Agents through Perception-Generation Disaggregation and Asynchronous Pipeline Execution

CoRR 2025 · Shulai Zhang, Ao Xu, Quan Chen, Han Zhao, Weihao Cui, Haibin Lin, Ningxin Zheng, et al.

2024

Ladder: Enabling Efficient Low-Precision Deep Learning Computing through Hardware-aware Tensor Transformation

OSDI 2024 · Lei Wang, Lingxiao Ma, Shijie Cao, Quanlu Zhang, Jilong Xue, Yining Shi, Ningxin Zheng, et al.

2024

FLUX: Fast Software-based Communication Overlap On GPUs Through Kernel Fusion

CoRR 2024 · Li-Wen Chang, Wenlei Bao, Qi Hou, Chengquan Jiang, Yinmin Zhong, Xuanrun Zhang, Ningxin Zheng, et al.

2023

Efficient GPU Kernels for N:M-Sparse Weights in Deep Learning

MLSys 2023 · Bin Lin, Ningxin Zheng, Lei Wang, Shijie Cao, Lingxiao Ma, Quanlu Zhang, et al.

2023

Optimizing Dynamic Neural Networks with Brainstorm

OSDI 2023 · Weihao Cui, Zhenhua Han, Lingji Ouyang, Yichuan Wang, Ningxin Zheng, et al.

2023

PIT: Optimization of Dynamic Sparse Deep Learning Models via Permutation Invariant Transformation

SOSP 2023 · Ningxin Zheng, Huiqiang Jiang, Quanlu Zhang, Zhenhua Han, Lingxiao Ma, et al.

2023

SparDA: Accelerating Dynamic Sparse Deep Neural Networks via Sparse-Dense Transformation

CoRR 2023 · Ningxin Zheng, Huiqiang Jiang, Quanlu Zhang, Zhenhua Han, Yuqing Yang, Lingxiao Ma, et al.

2022

Toward QoS-Awareness and Improved Utilization of Spatial Multitasking GPUs

IEEE Transactions on Computers 2022 · Wei Zhang, Quan Chen, Ningxin Zheng, Weihao Cui, Kaihua Fu, Minyi Guo.

2022

Astraea: towards QoS-aware and resource-efficient multi-stage GPU services

ASPLOS 2022 · Wei Zhang, Quan Chen, Kaihua Fu, Ningxin Zheng, Zhiyi Huang, Jingwen Leng, Minyi Guo.

2022

SparTA: Deep-Learning Model Sparsity via Tensor-with-Sparsity-Attribute

OSDI 2022 · Ningxin Zheng, Bin Lin, Quanlu Zhang, Lingxiao Ma, Yuqing Yang, Fan Yang, et al.

2022

QoS-Aware Irregular Collaborative Inference for Improving Throughput of DNN Services

SC 2022 · Kaihua Fu, Jiuchen Shi, Quan Chen, Ningxin Zheng, Wei Zhang, Deze Zeng, Minyi Guo.

2021

nn-Meter: towards accurate latency prediction of deep-learning model inference on diverse edge devices

MobiSys 2021 · Li Lyna Zhang, Shihao Han, Jianyu Wei, Ningxin Zheng, Ting Cao, Yuqing Yang, Yunxin Liu. Best Paper Award. Also received all three highest Artifact Evaluation badges.

2021

CHARM: Collaborative Host and Accelerator Resource Management for GPU Datacenters

ICCD 2021 · Wei Zhang, Kaihua Fu, Ningxin Zheng, Quan Chen, Chao Li, Wenli Zheng, Minyi Guo.

2021

Enable simultaneous DNN services based on deterministic operator overlap and precise latency prediction

SC 2021 · Weihao Cui, Han Zhao, Quan Chen, Ningxin Zheng, Jingwen Leng, Jieru Zhao, et al.

2020

URSA: Precise Capacity Planning and Fair Scheduling based on Low-level Statistics for Public Clouds

ICPP 2020 · Wei Zhang, Ningxin Zheng, Quan Chen, Yong Yang, Zhuo Song, Tao Ma, et al.

2019

POSTER: Precise Capacity Planning for Database Public Clouds

PACT 2019 · Ningxin Zheng, Quan Chen, Yong Yang, Jin Li, Wenli Zheng, Minyi Guo.

2018

CLIBE: Precise Cluster-Level I/O Bandwidth Enforcement in Distributed File System

HPCC/SmartCity/DSS 2018 · Ningxin Zheng, Quan Chen, Chen Chen, Minyi Guo.

2024

Online Streaming Video Super-Resolution With Convolutional Look-Up Table

IEEE Transactions on Image Processing 2024 · Guanghao Yin, Zefan Qu, Xinyang Jiang, Shan Jiang, Zhenhua Han, Ningxin Zheng, et al.

2023

Online Video Super-Resolution With Convolutional Kernel Bypass Grafts

IEEE Transactions on Multimedia 2023 · Jun Xiao, Xinyang Jiang, Ningxin Zheng, Huan Yang, Yifan Yang, Yuqing Yang, et al.

2023

EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention

CVPR 2023 · Xinyu Liu, Houwen Peng, Ningxin Zheng, Yuqing Yang, Han Hu, Yixuan Yuan.

2023

SpaceEvo: Hardware-Friendly Search Space Design for Efficient INT8 Inference

ICCV 2023 · Xudong Wang, Li Lyna Zhang, Jiahang Xu, Quanlu Zhang, Yujing Wang, Yuqing Yang, Ningxin Zheng, et al.

2021

Full-Cycle Energy Consumption Benchmark for Low-Carbon Computer Vision

CoRR 2021 · Bo Li, Xinyang Jiang, Donglin Bai, Yuge Zhang, Ningxin Zheng, Xuanyi Dong, et al.

Open source

Selected projects

ByteDance

FLUX

A GPU communication-overlap library for tensor and expert parallelism, built around CUDA/CUTLASS kernels and PyTorch integration.

C++ · CUDA · GPU · PyTorch

Microsoft

NNI

An open-source AutoML toolkit covering neural architecture search, model compression, hyper-parameter tuning, and ML lifecycle automation.

Python · NAS · Compression · AutoML

ByteDance-Seed

Triton-distributed

A distributed compiler based on Triton for programming parallel AI systems and generating compute-communication overlapping kernels.

Python · Triton · Compiler · Distributed

Contact

Find me online

For publications, source code, and recent work, the links below are the most reliable public entry points.

GitHub · Google Scholar · DBLP