Zheng, Ningxin (郑宁馨)
About me
I received a B.S. from Huazhong University of Science and Technology in 2017 and an M.S. from Shanghai Jiao Tong University in 2020, advised by Professors Minyi Guo and Quan Chen. I currently work on ByteDance's AML team, improving the efficiency and scalability of Large Language Model (LLM) training. My research interests span AI systems, with an emphasis on LLM training optimization, model deployment (inference), and sparsity; cloud computing, where I aim to improve resource utilization through job co-location and data center resource management; and model compression.
Research
System Publications
Lei Wang, Lingxiao Ma, Shijie Cao, Quanlu Zhang, Jilong Xue, Yining Shi, Ningxin Zheng, Ziming Miao, Fan Yang, Ting Cao, Yuqing Yang, Mao Yang, "Bitter: Enabling Efficient Low-Precision Deep Learning Computing through Hardware-aware Tensor Transformation", OSDI24
Ningxin Zheng, Huiqiang Jiang, Quanlu Zhang, Zhenhua Han, Lingxiao Ma, Yuqing Yang, Fan Yang, Chengruidong Zhang, Lili Qiu, Mao Yang, Lidong Zhou, "PIT: Optimization of Dynamic Sparse Deep Learning Models via Permutation Invariant Transformation", SOSP23
Weihao Cui, Zhenhua Han, Lingji Ouyang, Yichuan Wang, Ningxin Zheng, Lingxiao Ma, Yuqing Yang, Fan Yang, Jilong Xue, Lili Qiu, Lidong Zhou, Quan Chen, Haisheng Tan, Minyi Guo, "Optimizing Dynamic Neural Networks with Brainstorm", OSDI23
Lei Wang, Lingxiao Ma, Shijie Cao, Ningxin Zheng, Quanlu Zhang, Jilong Xue, Ziming Miao, Ting Cao, Yuqing Yang, "LADDER: Efficient Tensor Compilation on Customized Data Format", OSDI23 POSTER Session
Bin Lin, Ningxin Zheng, Shijie Cao, Lingxiao Ma, Quanlu Zhang, Yi Zhu, Ting Cao, Jilong Xue, Yuqing Yang, Fan Yang, "Efficient GPU Kernels for N:M-Sparse Weights in Deep Learning", MLSys23, Co-first Author [code]
Ningxin Zheng, Bin Lin, Quanlu Zhang, Lingxiao Ma, Yuqing Yang, Fan Yang, Yang Wang, Mao Yang, Lidong Zhou, "SparTA: Deep-Learning Model Sparsity via Tensor-with-Sparsity-Attribute", OSDI22 [pdf][code]
Wei Zhang, Quan Chen, Kaihua Fu, Ningxin Zheng, Zhiyi Huang, Jingwen Leng, Minyi Guo, "Astraea: towards QoS-aware and resource-efficient multi-stage GPU services", ASPLOS22, [pdf]
Kaihua Fu, Jiuchen Shi, Quan Chen, Ningxin Zheng, Wei Zhang, Deze Zeng, Minyi Guo, "QoS-Aware Irregular Collaborative Inference for Improving Throughput of DNN Services", SC22, [pdf]
Wei Zhang, Kaihua Fu, Ningxin Zheng, Quan Chen, Chao Li, Wenli Zheng, Minyi Guo, "CHARM: Collaborative Host and Accelerator Resource Management for GPU Datacenters", ICCD21, [pdf]
Weihao Cui, Han Zhao, Quan Chen, Ningxin Zheng, Jingwen Leng, Jieru Zhao, Zhuo Song, Tao Ma, Yong Yang, Chao Li, Minyi Guo, "Enable simultaneous DNN services based on deterministic operator overlap and precise latency prediction.", SC21, [pdf]
Li Lyna Zhang, Shihao Han, Jianyu Wei, Ningxin Zheng, Ting Cao, Yuqing Yang, Yunxin Liu, "nn-Meter: Towards Accurate Latency Prediction of Deep-Learning Model Inference on Diverse Edge Devices", MobiSys21, Best Paper Award & SigMobile Research Highlight [pdf][code]
Wei Zhang, Quan Chen, Ningxin Zheng, Weihao Cui, Kaihua Fu, Minyi Guo, "Towards QoS-awareness and Improved Utilization of Spatial Multitasking GPUs", TC21, [pdf]
Wei Zhang, Ningxin Zheng, Quan Chen, Yong Yang, Zhuo Song, Tao Ma, Jingwen Leng, Minyi Guo, "URSA: Precise Capacity Planning and Fair Scheduling based on Low-level Statistics for Public Clouds", ICPP20, Co-first Author [pdf]
Ningxin Zheng, Quan Chen, Yong Yang, Jin Li, Wenli Zheng, Minyi Guo, "POSTER: Precise Capacity Planning for Database Public Clouds", PACT19
Ningxin Zheng, Quan Chen, Chen Chen, Minyi Guo, "CLIBE: Precise Cluster-Level I/O Bandwidth Enforcement in Distributed File System", HPCC18
Algorithm Publications
Li Lyna Zhang, Xudong Wang, Jiahang Xu, Quanlu Zhang, Yujing Wang, Yuqing Yang, Ningxin Zheng, Ting Cao, Mao Yang, "SpaceEvo: Hardware-Friendly Search Space Design for Efficient INT8 Inference", ICCV23 [pdf]
Xinyu Liu, Houwen Peng, Ningxin Zheng, Yuqing Yang, Han Hu, Yixuan Yuan, "EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention", CVPR23 [pdf]
Jun Xiao, Xinyang Jiang, Ningxin Zheng, Huan Yang, Yifan Yang, Yuqing Yang, Dongsheng Li, Kin-Man Lam, "Online Video Super-Resolution with Convolutional Kernel Bypass Graft", IEEE Transactions on Multimedia, 2022 [pdf]
A full list of publications is available on Google Scholar.
Projects
NNI
NNI is a popular deep learning toolkit (over 10k GitHub stars) covering Neural Architecture Search (NAS), model compression, hyperparameter tuning, and feature engineering. As DNN models grow larger, they inevitably become sparse, and model compression is an essential step before deployment. As a core contributor, I designed and developed the automatic deployment pipeline for compressed models (the "Speedup" module in NNI). Speedup infers the sparsity of the whole model and automatically generates a correspondingly optimized, faster model, simplifying the deployment process.
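The core idea behind Speedup, that pruning one layer's outputs shrinks downstream layers too, can be illustrated with a minimal sketch. This is not NNI's actual API; `propagate_channel_pruning` and its inputs are hypothetical, and the real module performs a far more general analysis over DNN computation graphs.

```python
def propagate_channel_pruning(shapes, kept_out_channels):
    """Toy sparsity-propagation pass.

    shapes: [(in_ch, out_ch)] for each layer in a simple chain model.
    kept_out_channels: surviving output channels per layer after pruning.
    Returns the shrunken shape of every layer once sparsity is propagated:
    each layer's new input width equals the previous layer's surviving outputs.
    """
    new_shapes = []
    kept_in = shapes[0][0]  # the model's input width is untouched
    for (_, _), kept_out in zip(shapes, kept_out_channels):
        new_shapes.append((kept_in, kept_out))
        kept_in = kept_out  # downstream inputs shrink to match
    return new_shapes

# A 3-layer MLP where 4 of the first layer's 16 output channels are pruned:
shrunk = propagate_channel_pruning([(8, 16), (16, 32), (32, 10)], [12, 32, 10])
```

Here pruning layer 1 from 16 to 12 outputs automatically narrows layer 2's input from 16 to 12, which is exactly the kind of whole-model inference that lets a compressed model actually run faster rather than merely carrying zeroed weights.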
SparTA
SparTA is an extensible sparsity framework built on PyTorch that supports a wide range of sparsity scenarios. It is the open-source implementation of our OSDI paper (SparTA). It provides easy-to-use sparse modules that fit many scenarios, such as large-model training and sparse-model inference. Compared with other sparse libraries, SparTA achieves better performance and covers more application scenarios.
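The "tensor with sparsity attribute" idea can be sketched in plain Python: pair a matrix with a block mask and let the kernel skip blocks the attribute marks as zero. This is purely illustrative (not SparTA's real API, which generates specialized GPU kernels), but it shows the computation-skipping that the sparsity attribute enables.

```python
BS = 2  # block size of the sparsity attribute

def block_sparse_matvec(mat, block_mask, vec):
    """Multiply a matrix by a vector, skipping blocks marked zero.

    mat: dense n x n matrix (list of lists).
    block_mask: (n//BS) x (n//BS) grid; 0 means the block is all-zero.
    """
    n = len(mat)
    out = [0.0] * n
    for bi, row_blocks in enumerate(block_mask):
        for bj, nonzero in enumerate(row_blocks):
            if not nonzero:
                continue  # sparsity attribute: skip the zero block entirely
            for i in range(bi * BS, (bi + 1) * BS):
                for j in range(bj * BS, (bj + 1) * BS):
                    out[i] += mat[i][j] * vec[j]
    return out

# Block-diagonal example: the off-diagonal 2x2 blocks are zero and skipped.
mat = [[1, 2, 0, 0],
       [3, 4, 0, 0],
       [0, 0, 5, 6],
       [0, 0, 7, 8]]
mask = [[1, 0],
        [0, 1]]
result = block_sparse_matvec(mat, mask, [1, 1, 1, 1])
```

With half the blocks masked out, the kernel touches half the elements; SparTA's compiler exploits the same attribute to emit kernels that never load the pruned weights at all.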
Performance Optimization for a High-Frequency Trading System
The China Foreign Exchange Trade System (CFETS) receives a large number of transaction requests every second and therefore has extremely strict performance requirements. Constrained by the complex transaction logic, it is difficult to improve throughput through task parallelism. To improve performance, we profiled the system's bottlenecks with "perf", split the transaction logic into three stages, and ran them as a parallel pipeline. The end-to-end throughput improved by around 30%.
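The pipelining idea can be sketched as follows: one serial handler is split into stages connected by queues, so different transactions occupy different stages at the same time. The stage functions here are hypothetical stand-ins (the real system's stages and implementation language differ); the sketch only shows the structure.

```python
import queue
import threading

def stage(fn, q_in, q_out):
    """Run one pipeline stage: pull items, process, push downstream."""
    while True:
        item = q_in.get()
        if item is None:            # sentinel: shut down and pass it on
            if q_out is not None:
                q_out.put(None)
            break
        if q_out is not None:
            q_out.put(fn(item))

parse = lambda tx: tx.strip()       # stage 1: parse the raw request
match = lambda tx: tx.upper()       # stage 2: stand-in for order matching
q1, q2, q3 = queue.Queue(), queue.Queue(), queue.Queue()

threads = [
    threading.Thread(target=stage, args=(parse, q1, q2)),
    threading.Thread(target=stage, args=(match, q2, q3)),
]
for t in threads:
    t.start()
for tx in [" buy ", " sell "]:      # requests enter the pipeline in order
    q1.put(tx)
q1.put(None)                        # sentinel drains the whole pipeline
for t in threads:
    t.join()

results = []                        # stage 3: "settlement" collects outputs
while (r := q3.get()) is not None:
    results.append(r)
```

FIFO queues with one thread per stage preserve transaction order, which matters for exchange semantics; throughput improves because stage 1 can start on the next request while stage 2 is still matching the previous one.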
CLIBE
Patent
Precise Capacity Planning and Fair Scheduling based on Low-level Statistics for Public Clouds, Alibaba
SparTA: Deep-Learning Model Sparsity via Tensor-with-Sparsity-Attribute, Microsoft
Optimization of Dynamic Sparse Deep Learning Models via Permutation Invariant Transformation
Education
Master's degree, Computer Science and Technology, Shanghai Jiao Tong University, 09.2017~03.2020
Bachelor's degree, Computer Science and Technology, Huazhong University of Science and Technology, 09.2013~06.2017
Competitions and Awards
MobiSys Best Paper Award, 2021
SigMobile Research Highlight, 2021
Outstanding Graduate of Shanghai Jiao Tong University, 2020
DongShi DongFang Scholarship, Shanghai Jiao Tong University, 2019
Bronze Medal, Intel Parallel Performance Optimization Competition, 2017
First-Class Scholarship of Shanghai Jiao Tong University, 2017
Outstanding Student of Huazhong University of Science and Technology, 2015
Work experience
ByteDance AML
Microsoft Research
Alibaba Cloud