[Development] MXNet 2.0 Update #18931
Description
Overview
As MXNet development approaches its 2.0 major milestone, we would like to update our community on roadmap status, and highlight new and upcoming features.
Motivation
The deep learning community has largely evolved independently of the data science and machine learning (ML) user base around NumPy. While most deep learning frameworks now implement NumPy-like math and array libraries, their APIs differ in definition, which creates confusion and a steeper learning curve for ML practitioners and data scientists entering deep learning. This not only splits the skillsets of the two communities, but also hinders knowledge sharing and code interoperability. MXNet 2.0 seeks to unify the deep learning and machine learning ecosystems.
What's new in version 2.0?
MXNet 2.0 is a major version upgrade of MXNet that provides a NumPy-like programming interface, integrated with the new, easy-to-use Gluon 2.0 interface. Under the hood, we provide a NumPy-compatible array and math library enhanced for deep learning, so NumPy users can easily adopt MXNet. Version 2.0 incorporates the accumulated lessons of MXNet 1.x and focuses on usability, extensibility, and developer experience.
What's coming next?
We plan to make a series of beta releases of MXNet 2.0 in lockstep with the migration schedules of downstream projects. The first release is tracked in #19139. Also, subscribe to [email protected] for additional announcements.
How do I get started?
As a developer of MXNet, you can check out our main 2.0 branch. MXNet 2.0 nightly builds are available for download.
How can I help?
There are many ways you can contribute:
- By submitting bug reports, you can help us identify issues and fix them.
- If there are issues you would like to help with, let us know in the issue comments and one of the committers will help provide suggestions and pointers.
- If you have a project that you would like to build on top of MXNet 2.0, post an RFC and let the MXNet developers know.
- Looking for ideas to get started with developing MXNet? Check out the good-first-issues labels for Python developers and C++ developers.
Highlights
Below are the highlights of new features that are available now in the MXNet 2.0 nightly build.
NumPy-compatible Array and Math Library
NumPy has long been established as the standard array and math library in Python, and the MXNet community recognizes significant benefits in bridging the existing NumPy machine learning community and the growing deep learning community. In #14253, the MXNet community reached consensus on moving towards a NumPy-compatible programming experience and committed to a major effort on providing a NumPy-compatible array library and operators.
To see what the new programming experience is like, check out the Dive into Deep Learning book, the most comprehensive interactive deep learning book with code+math+forum. The latest version has an MXNet implementation using the new MXNet np, the NumPy-compatible math and array interface.
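As a flavor of the compatibility goal, here is a minimal sketch written against stock NumPy; the intent of the np interface is that code like this also runs under MXNet 2.0 by swapping the import for `from mxnet import np`, which adds GPU support and auto-differentiation on top of the familiar API:

```python
# A sketch of the NumPy-compatible programming experience, written against
# stock NumPy. Under MXNet 2.0 the same lines are intended to work with
# `from mxnet import np` instead of the import below.
import numpy as np

x = np.arange(12).reshape(3, 4)   # standard NumPy creation and reshape ops
w = np.ones((4, 2))
y = x @ w                          # matrix multiply via the @ operator

print(y.shape)                     # (3, 2)
print(float(y.sum()))              # 132.0
```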
Gluon 2.0
Since its introduction in MXNet 1.x, the Gluon API has superseded the other MXNet model-development APIs such as the symbolic, module, and model APIs. Conceptually, Gluon was the first attempt in the deep learning community to unify the flexibility of imperative programming with the performance benefits of symbolic programming through just-in-time compilation.
In Gluon 2.0, we are extending support to MXNet np with a simplified interface and new functionalities:
- Simplified hybridization with deferred compute and tracing: deferred compute allows imperative execution to be used for graph construction, which lets us unify the historic divergence of NDArray and Symbol. Hybridization now works through a simplified hybrid forward interface: users only need to specify the computation through imperative programming. Hybridization also works through tracing.
- Data 2.0: the new design for data loading in Gluon allows hybridizing and deploying data processing pipelines in the same way as model hybridization. The new C++ data loader improves data loading efficiency on CIFAR-10 by 50%.
- Distributed 2.0: The new distributed-training design in Gluon 2.0 provides a unified distributed data parallel interface across native Parameter Server, BytePS, and Horovod, and is extensible for supporting custom distributed training libraries.
- Gluon Probability: parameterizable probability distributions and sampling functions to facilitate more areas of research such as Bayesian methods and AutoML.
- Gluon Metrics and Optimizers: refactored around the MXNet np interface, with legacy issues addressed.
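The core idea behind tracing-based hybridization can be illustrated with a toy sketch in pure Python (this is not MXNet's implementation; names like `Node` and `hybridize` here are illustrative only): the user's imperative forward function is run once on a symbolic value that records each operation, producing a graph that can then be replayed without re-executing the Python logic.

```python
# Toy sketch of tracing-based hybridization (pure Python, not MXNet
# internals): imperative calls on a symbolic value are recorded into a
# graph, which a "compiled" callable later replays on concrete inputs.
class Node:
    """Symbolic value: records ops applied to it instead of executing."""
    def __init__(self, graph):
        self.graph = graph

    def __mul__(self, c):
        self.graph.append(lambda v, c=c: v * c)   # record, don't compute
        return self

    def __add__(self, c):
        self.graph.append(lambda v, c=c: v + c)
        return self

def hybridize(forward):
    """Trace `forward` once with a symbolic input; return a replayable fn."""
    graph = []
    forward(Node(graph))            # imperative code runs, ops are recorded
    def compiled(x):
        for op in graph:            # replay the recorded graph
            x = op(x)
        return x
    return compiled

# The user only writes ordinary imperative code:
def forward(x):
    return x * 2 + 1

net = hybridize(forward)
print(net(3))    # 7
print(net(10))   # 21
```

Real deferred compute additionally handles shapes, multiple inputs/outputs, and control flow; the sketch only shows why imperative code suffices to build a graph.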
3rdparty Plugin Support
Extensibility is important for both academia and industry users who want to develop new, customized capabilities. In MXNet 2.0, we added the following support for plugging in 3rdparty functionality at runtime.
- C++ custom operators: enable operators to be implemented in separate libraries and loaded at runtime without recompiling MXNet or maintaining an MXNet fork.
- Custom subgraph property for 3rdparty acceleration libraries: enable dispatching subgraphs to 3rdparty acceleration libraries that are plugged in at runtime.
- Custom graph passes for 3rd party acceleration libraries: enable custom graph modification in C++ to enable fusing params, replacing operators, and other custom optimizations.
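The common pattern behind all three plugin mechanisms is a registry the framework consults at call time, so the core never needs to be compiled against the plugin. A toy sketch in Python (MXNet's actual mechanism is a C++ library loaded at runtime; `register_op` and `invoke` below are hypothetical names for illustration):

```python
# Toy sketch of the runtime-plugin idea (Python, not the C++ MXNet API):
# plugins populate a registry when loaded, and the core framework
# dispatches through it, so it can call operators it was never built with.
OP_REGISTRY = {}

def register_op(name):
    """Decorator a plugin library uses to expose a custom operator."""
    def deco(fn):
        OP_REGISTRY[name] = fn
        return fn
    return deco

def invoke(name, *args):
    """Core-framework dispatch: look the operator up at call time."""
    if name not in OP_REGISTRY:
        raise KeyError(f"operator {name!r} not loaded")
    return OP_REGISTRY[name](*args)

# --- what a 3rdparty plugin module would contain ---
@register_op("my_relu")
def my_relu(x):
    # a plugin-provided operator (trivially ReLU, to keep the sketch short)
    return x if x > 0 else 0.0

print(invoke("my_relu", 2.5))    # 2.5
print(invoke("my_relu", -1.0))   # 0.0
```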
Developer Experiences
In MXNet 2.0, we are making the MXNet development process more efficient.
- New CMake build system: improved CMake build system for compiling the most performant MXNet backend library based on the available environment, as well as cross-compilation support.
- Memory profiler: the goal is to provide visibility and insight into the memory consumption of the MXNet backend.
- Pythonic exception type in backend: updated error reporting in MXNet backend that allows directly defining exception types with Python exception classes to enable Pythonic error handling.
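The benefit of Pythonic backend exceptions can be sketched in a few lines (a toy illustration, not MXNet's actual mechanism; `raise_from_backend` is a hypothetical helper): if the backend reports which exception class an error corresponds to, the frontend can re-raise it natively and callers can use ordinary try/except instead of parsing a generic framework error.

```python
# Toy sketch of mapping backend error reports onto Python exception
# classes (illustrative only, not the MXNet backend mechanism).
_ERROR_TYPES = {
    "ValueError": ValueError,
    "IndexError": IndexError,
    "TypeError": TypeError,
}

def raise_from_backend(message):
    """Backend errors arrive as 'ExcType: details'; re-raise natively."""
    kind, _, details = message.partition(": ")
    exc_type = _ERROR_TYPES.get(kind, RuntimeError)  # unknown -> RuntimeError
    raise exc_type(details or message)

# Callers now handle backend errors with plain Python idioms:
try:
    raise_from_backend("IndexError: index 5 is out of bounds for axis 0")
except IndexError as e:
    print("caught natively:", e)
```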
Documentation for Developers
We are improving the documentation for MXNet and deep learning developers.
- CWiki for developers: reorganized and improved the development section in MXNet CWiki.
- Developer Guide: new developer guides on how to develop and improve deep learning applications with MXNet.
Ecosystem: GluonNLP NumPy
We are refactoring GluonNLP with the NumPy interface for the next generation of GluonNLP. The initial version is available on the dmlc/gluon-nlp master branch:
- NLP models with NumPy: we support a large number of state-of-the-art backbone networks in GluonNLP, including BERT, ALBERT, ELECTRA, MobileBERT, RoBERTa, XLMR, Transformer, and Transformer XL.
- New Data Processing CLI: consolidated data processing scripts into one CLI.
API Deprecation
As described in #17676, we are taking this major version upgrade as an opportunity to address legacy issues in MXNet 1.x. Most notably, we are deprecating the following APIs:
- Model, Module, Symbol: we are deprecating the legacy modeling and graph construction API in favor of automated graph tracing through deferred compute and Gluon.
- mx.rnn: we are deprecating the symbolic RNN API in favor of the Gluon RNN API.
- NDArray: we are deprecating NDArray and the old nd API in favor of the NumPy-compatible np and npx. The NDArray operators will be provided as an optional feature potentially in a separate repo. This will enable existing users who rely on MXNet 1.x for inference to have an easy upgrade path as old models will continue to work.
- Caffe converter and Torch plugin: both extensions see low usage nowadays. Instead, we are extending DLPack support to better interoperate with PyTorch and TensorFlow.
Related Projects
Below is a list of project trackers for MXNet 2.0.
- MXNet 2.0: the tracking project for MXNet 2.0.
- MXNet NumPy API: the goal is to provide full feature coverage for NumPy in MXNet with auto-differentiation and GPU support.
- MXNet Website 2.0: the revamped MXNet official website for better browsing experiences.
- np interface bug fixes: the goal is to address technical debts and performance issues in np and npx operators.
- CI & CD ops and developer experience improvement: reduce the development overhead by upgrading the CI/CD infrastructure and toolchain to improve the stability of CI/CD and developer efficiency.
- MXNet 2.0 JVM language binding redesign ([RFC] MXNet 2.0 JVM Language development #17783)
@apache/mxnet-committers feel free to comment or directly edit this post for updates in additional areas.