Zheng, Ningxin (郑宁馨)


ByteDance AML,
Caohejing, Xuhui District,
Shanghai, China
E-mail: NingxinZheng@sjtu.edu.cn

About me

I received a B.S. from Huazhong University of Science and Technology in 2017 and an M.S. from Shanghai Jiao Tong University in 2020, advised by Professors Minyi Guo and Quan Chen. I am currently on ByteDance's AML team, working on the efficiency and scalability of Large Language Model (LLM) training. My research interests span AI systems, with an emphasis on LLM training optimization, model deployment (inference), and sparsity; cloud computing, where I aim to improve resource utilization through job co-location and data-center resource management; and model compression.

Research

System Publications

  1. Lei Wang, Lingxiao Ma, Shijie Cao, Quanlu Zhang, Jilong Xue, Yining Shi, Ningxin Zheng, Ziming Miao, Fan Yang, Ting Cao, Yuqing Yang, Mao Yang, "Bitter: Enabling Efficient Low-Precision Deep Learning Computing through Hardware-aware Tensor Transformation", OSDI24

  2. Ningxin Zheng, Huiqiang Jiang, Quanlu Zhang, Zhenhua Han, Lingxiao Ma, Yuqing Yang, Fan Yang, Chengruidong Zhang, Lili Qiu, Mao Yang, Lidong Zhou, "PIT: Optimization of Dynamic Sparse Deep Learning Models via Permutation Invariant Transformation", SOSP23

  3. Weihao Cui, Zhenhua Han, Lingji Ouyang, Yichuan Wang, Ningxin Zheng, Lingxiao Ma, Yuqing Yang, Fan Yang, Jilong Xue, Lili Qiu, Lidong Zhou, Quan Chen, Haisheng Tan, Minyi Guo, "Optimizing Dynamic Neural Networks with Brainstorm", OSDI23

  4. Lei Wang, Lingxiao Ma, Shijie Cao, Ningxin Zheng, Quanlu Zhang, Jilong Xue, Ziming Miao, Ting Cao, Yuqing Yang, "LADDER: Efficient Tensor Compilation on Customized Data Format", OSDI23 Poster Session

  5. Bin Lin, Ningxin Zheng, Shijie Cao, Lingxiao Ma, Quanlu Zhang, Yi Zhu, Ting Cao, Jilong Xue, Yuqing Yang, Fan Yang, "Efficient GPU Kernels for N:M-Sparse Weights in Deep Learning", MLSys23, Co-first Author [code]

  6. Ningxin Zheng, Bin Lin, Quanlu Zhang, Lingxiao Ma, Yuqing Yang, Fan Yang, Yang Wang, Mao Yang, Lidong Zhou, "SparTA: Deep-Learning Model Sparsity via Tensor-with-Sparsity-Attribute", OSDI22 [pdf][code]

  7. Wei Zhang, Quan Chen, Kaihua Fu, Ningxin Zheng, Zhiyi Huang, Jingwen Leng, Minyi Guo, "Astraea: towards QoS-aware and resource-efficient multi-stage GPU services", ASPLOS22, [pdf]

  8. Kaihua Fu, Jiuchen Shi, Quan Chen, Ningxin Zheng, Wei Zhang, Deze Zeng, Minyi Guo, "QoS-Aware Irregular Collaborative Inference for Improving Throughput of DNN Services", SC22, [pdf]

  9. Wei Zhang, Kaihua Fu, Ningxin Zheng, Quan Chen, Chao Li, Wenli Zheng, Minyi Guo, "CHARM: Collaborative Host and Accelerator Resource Management for GPU Datacenters", ICCD21, [pdf]

  10. Weihao Cui, Han Zhao, Quan Chen, Ningxin Zheng, Jingwen Leng, Jieru Zhao, Zhuo Song, Tao Ma, Yong Yang, Chao Li, Minyi Guo, "Enable Simultaneous DNN Services Based on Deterministic Operator Overlap and Precise Latency Prediction", SC21, [pdf]

  11. Li Lyna Zhang, Shihao Han, Jianyu Wei, Ningxin Zheng, Ting Cao, Yuqing Yang, Yunxin Liu, "nn-Meter: Towards Accurate Latency Prediction of Deep-Learning Model Inference on Diverse Edge Devices", MobiSys21, Best Paper Award and SIGMOBILE Research Highlight [pdf][code]

  12. Wei Zhang, Quan Chen, Ningxin Zheng, Weihao Cui, Kaihua Fu, Minyi Guo, "Towards QoS-awareness and Improved Utilization of Spatial Multitasking GPUs", TC21, [pdf]

  13. Wei Zhang, Ningxin Zheng, Quan Chen, Yong Yang, Zhuo Song, Tao Ma, Jingwen Leng, Minyi Guo, "URSA: Precise Capacity Planning and Fair Scheduling based on Low-level Statistics for Public Clouds", ICPP20, Co-first Author [pdf]

  14. Ningxin Zheng, Quan Chen, Yong Yang, Jin Li, Wenli Zheng, Minyi Guo, "POSTER: Precise Capacity Planning for Database Public Clouds", PACT19

  15. Ningxin Zheng, Quan Chen, Chen Chen, Minyi Guo, "CLIBE: Precise Cluster-Level I/O Bandwidth Enforcement in Distributed File System", HPCC18

Algorithm Publications

  1. Li Lyna Zhang, Xudong Wang, Jiahang Xu, Quanlu Zhang, Yujing Wang, Yuqing Yang, Ningxin Zheng, Ting Cao, Mao Yang, "SpaceEvo: Hardware-Friendly Search Space Design for Efficient INT8 Inference", ICCV23 [pdf]

  2. Xinyu Liu, Houwen Peng, Ningxin Zheng, Yuqing Yang, Han Hu, Yixuan Yuan, "EfficientViT: Memory Efficient Vision Transformer with Cascaded Group Attention", CVPR23 [pdf]

  3. Jun Xiao, Xinyang Jiang, Ningxin Zheng, Huan Yang, Yifan Yang, Yuqing Yang, Dongsheng Li, Kin-Man Lam, "Online Video Super-Resolution with Convolutional Kernel Bypass Graft", IEEE Transactions on Multimedia, 2022 [pdf]

A full list of publications is available on Google Scholar.

Projects

  1. NNI

    • NNI is a very popular open-source deep learning toolkit (over 10k stars on GitHub) covering Neural Architecture Search (NAS), model compression, hyperparameter tuning, and feature engineering. As DNN models grow significantly, they inevitably become sparse, so model compression is an essential step before deployment. As a core contributor, I designed and developed the automatic deployment pipeline for compressed models (the "Speedup" module in NNI). Speedup infers the sparsity of the whole model and automatically generates a correspondingly optimized, faster model, which greatly simplifies deployment; see the sketch below.
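
A minimal sketch of that prune-then-speed-up flow, assuming the NNI 2.x compression API (the toy model and sparsity settings are illustrative):

```python
import torch
import torch.nn as nn
from nni.compression.pytorch.pruning import L1NormPruner
from nni.compression.pytorch import ModelSpeedup

# Toy model standing in for a real network.
model = nn.Sequential(nn.Linear(256, 512), nn.ReLU(), nn.Linear(512, 10))

# Prune 50% of the weights in every Linear layer by L1 norm,
# but keep the final classifier ('2') dense.
config_list = [
    {'sparsity': 0.5, 'op_types': ['Linear']},
    {'exclude': True, 'op_names': ['2']},
]
pruner = L1NormPruner(model, config_list)
_, masks = pruner.compress()

# Detach the pruner wrappers, then let Speedup propagate the masks
# through the graph and rebuild a genuinely smaller, faster model.
pruner._unwrap_model()
ModelSpeedup(model, dummy_input=torch.rand(8, 256), masks_file=masks).speedup_model()

print(model)  # the pruned Linear layers are now physically shrunk, not just masked
```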

  2. SparTA

    • SparTA is an extensible sparsity framework built on PyTorch that supports many kinds of sparsity scenarios. It is the open-source implementation of our OSDI22 paper (SparTA) and provides easy-to-use sparse modules that fit scenarios such as large-model training and sparse-model inference. Compared with other sparse libraries, SparTA achieves better performance and covers more application scenarios. The sketch below illustrates the core idea.
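
SparTA's real kernels are generated and specialized by the framework itself; purely to illustrate the underlying tensor-with-sparsity-attribute idea, here is a hypothetical PyTorch sketch in which a per-block sparsity annotation lets a matmul skip dead blocks:

```python
import torch

def block_sparse_matmul(x, weight, mask, block=32):
    # `mask` plays the role of a sparsity attribute attached to `weight`:
    # a per-block annotation that the executor exploits to skip dead compute.
    out = torch.zeros(x.shape[0], weight.shape[0])
    for i in range(0, weight.shape[0], block):
        for j in range(0, weight.shape[1], block):
            if mask[i // block, j // block]:  # attribute says: non-zero block
                out[:, i:i + block] += x[:, j:j + block] @ weight[i:i + block, j:j + block].T
    return out

# Roughly 50% block sparsity: half of the 32x32 weight blocks are exactly zero.
x, w = torch.randn(4, 128), torch.randn(128, 128)
mask = torch.rand(4, 4) > 0.5
w = w * mask.repeat_interleave(32, 0).repeat_interleave(32, 1)
assert torch.allclose(block_sparse_matmul(x, w, mask), x @ w.T, atol=1e-4)
```

In SparTA proper, such attributes are propagated through the whole model and compiled into specialized kernels rather than interpreted in a Python loop.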

  3. Performance Optimization for a High-Frequency Trading System

    • The China Foreign Exchange Trade System (CFETS) receives a large number of transaction requests every second and therefore has extremely strict performance requirements. Constrained by the complex transaction logic, it is difficult to improve system throughput through task parallelism. To improve performance, we analyzed the system's bottlenecks with perf, split the transaction logic into three parts, and executed them as a pipeline (sketched below). End-to-end throughput improved by around 30%.
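
The production system is proprietary; this toy Python sketch only shows the three-stage pipeline structure, with hypothetical stage names standing in for the real slices of the transaction logic:

```python
import queue
import threading

# Hypothetical stage functions; the real split came out of perf profiling.
def validate(req): return req   # stage 1: pre-trade checks
def match(req):    return req   # stage 2: order matching
def settle(req):   return req   # stage 3: post-trade bookkeeping

def run_stage(fn, inbox, outbox):
    while True:
        item = inbox.get()
        if item is None:        # poison pill: shut down and propagate
            if outbox is not None:
                outbox.put(None)
            return
        result = fn(item)
        if outbox is not None:
            outbox.put(result)

q1, q2, q3 = queue.Queue(), queue.Queue(), queue.Queue()
threads = [
    threading.Thread(target=run_stage, args=(validate, q1, q2)),
    threading.Thread(target=run_stage, args=(match, q2, q3)),
    threading.Thread(target=run_stage, args=(settle, q3, None)),
]
for t in threads:
    t.start()
for i in range(1000):           # while stage 3 settles request i,
    q1.put({'order_id': i})     # stage 1 can already validate request i+2
q1.put(None)
for t in threads:
    t.join()
```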

  4. CLIBE

    • CLIBE provides a precise cluster-level I/O bandwidth enforcement mechanism for distributed file systems. A big-data distributed file system (DFS) is widely used across scenarios and is usually shared by multiple tenants/jobs; such sharing can lead to uncontrollable I/O bandwidth interference, violating the Quality of Service (QoS) of high-priority jobs. CLIBE lets the user allocate a cluster-level I/O bandwidth quota to each job and ensures that the I/O bandwidth the target job consumes across the entire cluster stays below the allocated quota; the sketch below illustrates one way such enforcement can work.
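
CLIBE's actual enforcement lives inside the DFS; as a schematic illustration only (all names and numbers are hypothetical), a coordinator can split a job's cluster-wide quota across nodes and have each node enforce its share with a token bucket:

```python
import time

class TokenBucket:
    """Per-node rate limiter; `rate` is this node's share of the job's quota."""
    def __init__(self, rate_bytes_per_s):
        self.rate = rate_bytes_per_s
        self.tokens = rate_bytes_per_s
        self.last = time.monotonic()

    def admit(self, nbytes):
        now = time.monotonic()
        self.tokens = min(self.rate, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= nbytes:
            self.tokens -= nbytes
            return True
        return False  # over quota: defer or throttle this I/O

def split_quota(cluster_quota, demand_per_node):
    # Coordinator: divide a job's cluster-wide quota across nodes in
    # proportion to each node's recently observed demand.
    total = sum(demand_per_node.values()) or 1
    return {node: cluster_quota * d / total for node, d in demand_per_node.items()}

# Job A is capped at 400 MB/s across the whole cluster.
shares = split_quota(400e6, {'node1': 120e6, 'node2': 60e6, 'node3': 20e6})
buckets = {node: TokenBucket(rate) for node, rate in shares.items()}
print(buckets['node1'].admit(1 << 20))  # True: within node1's current share
```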

Patents

Precise Capacity Planning and Fair Scheduling based on Low-level Statistics for Public Clouds, Alibaba

SparTA: Deep-Learning Model Sparsity via Tensor-with-Sparsity-Attribute, Microsoft

Optimization of Dynamic Sparse Deep Learning Models via Permutation Invariant Transformation

Education

Master's degree, Computer Science and Technology, Shanghai Jiao Tong University, 09.2017~03.2020

Bachelor's degree, Computer Science and Technology, Huazhong University of Science and Technology, 09.2013~06.2017

Competitions and Awards

  1. MobiSys Best Paper Award, 2021

  2. SIGMOBILE Research Highlight, 2021

  3. Outstanding Graduate of Shanghai Jiao Tong University, 2020

  4. DongShi DongFang Scholarship of Shanghai Jiao Tong University, 2019

  5. Bronze Medal, Intel Parallel Performance Optimization Competition, 2017

  6. First-Class Scholarship of Shanghai Jiao Tong University, 2017

  7. Outstanding Student of Huazhong University of Science and Technology, 2015

Work experience

  1. ByteDance AML

    • 2023.09~Present

  2. Microsoft Research

    • Research Software Development Engineer, Shanghai System Group, 2020.03~2023.09

  3. Alibaba Cloud

    • Software Developer Intern, Linux Kernel Group, 2018.03~2019.07