singa-3.0.0.rc1 Release Notes

Release Notes - SINGA - Version singa-3.0.0.rc1

SINGA is a distributed deep learning library.

This release includes the following changes:

Code quality has been improved by introducing linting checks in CI and an automatic code formatter. For linting, cpplint and pylint are used and configured to comply with the Google coding styles; see tool/linting/ for details. Similarly, the formatting tools clang-format and yapf, configured with the Google coding styles, are recommended for developers to clean up code before submitting changes; see tool/code-format/ for details. LGTM is enabled on GitHub for code quality checks, and a license check is also enabled.
New Tensor APIs are added for naming consistency and feature enhancement:
- size(), mem_size(), get_value(), to_proto(), l1(), l2(): added for the sake of naming consistency
- AsType(): convert data type between float and int
- ceil(): perform element-wise ceiling of the input
- concat(): concatenate two tensors
- index selector: e.g. tensor1[:,:,1:,1:]
- softmax(in, axis): perform softmax along a given axis of a multi-dimensional tensor
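To illustrate what softmax along an axis means, here is a minimal sketch in plain Python on a 2-D nested list. It does not call the SINGA Tensor API; the helper name `softmax_2d` is hypothetical and only demonstrates the semantics of the `axis` argument.

```python
import math

def softmax_2d(rows, axis):
    """Softmax of a 2-D nested list along `axis` (hypothetical helper, not SINGA code).

    axis=1 normalizes each row; axis=0 normalizes each column.
    """
    if axis not in (0, 1):
        raise ValueError("axis must be 0 or 1 for a 2-D input")
    # Work on row vectors; for axis 0, transpose first and transpose back at the end.
    vectors = rows if axis == 1 else [list(col) for col in zip(*rows)]
    result = []
    for vec in vectors:
        peak = max(vec)  # subtract the maximum for numerical stability
        exps = [math.exp(v - peak) for v in vec]
        total = sum(exps)
        result.append([e / total for e in exps])
    return result if axis == 1 else [list(col) for col in zip(*result)]

x = [[1.0, 2.0, 3.0],
     [1.0, 1.0, 1.0]]
by_row = softmax_2d(x, axis=1)  # every row sums to 1
by_col = softmax_2d(x, axis=0)  # every column sums to 1
```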
14 new operators are added into the autograd module: Gemm, GlobalAveragePool, ConstantOfShape, Dropout, ReduceSum, ReduceMean, Slice, Ceil, Split, Gather, Tile, NonZero, Cast, OneHot. Their unit tests are added as well.
14 new operators are added to the sonnx module for both backend and frontend: Gemm, GlobalAveragePool, ConstantOfShape, Dropout, ReduceSum, ReduceMean, Slice, Ceil, Split, Gather, Tile, NonZero, Cast, OneHot. Their tests are added as well.
Some ONNX models are imported into SINGA, including Bert-squad, Arcface, FER+ Emotion, MobileNet, ResNet18, Tiny Yolov2, Vgg16, and Mnist.
Some operators now support multidirectional broadcasting, including Add, Sub, Mul, Div, Pow, PRelu, and Gemm.
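Multidirectional broadcasting follows the NumPy-style rule used by ONNX: shapes are aligned from the trailing dimension, and two dimensions are compatible when they are equal or one of them is 1. The sketch below computes the resulting shape in plain Python; `broadcast_shape` is a hypothetical helper for illustration and is not part of SINGA.

```python
from itertools import zip_longest

def broadcast_shape(shape_a, shape_b):
    """Result shape of NumPy-style broadcasting (hypothetical helper, not SINGA code)."""
    out = []
    # Align from the trailing dimension; a missing leading dimension counts as 1.
    for a, b in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if a != b and a != 1 and b != 1:
            raise ValueError(f"shapes {shape_a} and {shape_b} are not broadcastable")
        # A dimension of size 1 stretches to match the other operand.
        out.append(b if a == 1 else a)
    return tuple(reversed(out))

shape_1 = broadcast_shape((2, 3, 4), (3, 1))  # second operand gains a leading dim
shape_2 = broadcast_shape((5, 1), (1, 6))     # both operands are stretched
```

Both operands may be stretched at once, which is what distinguishes multidirectional from unidirectional broadcasting.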
Distributed training with communication optimization. DistOpt implements multiple optimization techniques, including gradient sparsification, chunk transmission, and gradient compression.
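As a rough illustration of gradient sparsification, the sketch below keeps only the k largest-magnitude gradient entries for transmission and retains the rest locally as a residual. This is one common variant (top-k selection), assumed here for illustration; these notes do not state which variant DistOpt uses, and the function names are hypothetical.

```python
def sparsify_top_k(gradient, k):
    """Split a gradient into (index, value) pairs to send and a residual to keep.

    Hypothetical helper illustrating top-k sparsification; not DistOpt code.
    """
    if not 0 <= k <= len(gradient):
        raise ValueError("k must be between 0 and len(gradient)")
    # Indices of the k entries with the largest magnitude.
    by_magnitude = sorted(range(len(gradient)),
                          key=lambda i: abs(gradient[i]), reverse=True)
    keep = set(by_magnitude[:k])
    sparse = [(i, gradient[i]) for i in sorted(keep)]
    # Entries that are not sent stay in the residual for a later step.
    residual = [0.0 if i in keep else g for i, g in enumerate(gradient)]
    return sparse, residual

def densify(sparse, length):
    """Rebuild a dense vector from (index, value) pairs on the receiving side."""
    dense = [0.0] * length
    for i, value in sparse:
        dense[i] = value
    return dense

grad = [0.1, -2.0, 0.03, 1.5, -0.2]
sparse, residual = sparsify_top_k(grad, k=2)
```

Only the pairs in `sparse` would need to be communicated; the sent part plus the residual reproduces the original gradient.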
Computational graph construction at the CPP level. The operations submitted to the Device are buffered. After analyzing the dependency, the computational graph is created, which is further analyzed for speed and memory optimization. To enable this feature, use the Module API.
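The buffering and dependency analysis described above can be sketched in plain Python. This is a simplified illustration, not the C++ implementation: the class `GraphBuffer`, its methods, and the block names are hypothetical, and only read-after-write dependencies are tracked.

```python
from collections import deque

class GraphBuffer:
    """Buffers submitted operations and orders them by data dependency.

    Hypothetical sketch; not the SINGA scheduler.
    """

    def __init__(self):
        self.ops = []  # list of (name, input blocks, output blocks)

    def submit(self, name, inputs, outputs):
        # Record the operation instead of executing it immediately.
        self.ops.append((name, tuple(inputs), tuple(outputs)))

    def build_edges(self):
        """Op j depends on op i if j reads a block most recently written by i."""
        last_writer = {}
        edges = {i: set() for i in range(len(self.ops))}
        for j, (_, inputs, outputs) in enumerate(self.ops):
            for block in inputs:
                if block in last_writer:
                    edges[last_writer[block]].add(j)
            for block in outputs:
                last_writer[block] = j
        return edges

    def schedule(self):
        """Return operation names in a dependency-respecting (topological) order."""
        edges = self.build_edges()
        indegree = {i: 0 for i in edges}
        for targets in edges.values():
            for j in targets:
                indegree[j] += 1
        ready = deque(i for i in sorted(indegree) if indegree[i] == 0)
        order = []
        while ready:
            i = ready.popleft()
            order.append(self.ops[i][0])
            for j in sorted(edges[i]):
                indegree[j] -= 1
                if indegree[j] == 0:
                    ready.append(j)
        return order

graph = GraphBuffer()
graph.submit("matmul", ["x", "w"], ["h"])
graph.submit("add_bias", ["h", "b"], ["z"])
graph.submit("relu", ["z"], ["y"])
graph.submit("scale", ["x"], ["s"])  # independent of the matmul chain
order = graph.schedule()
```

Once the graph is explicit, independent operations such as `scale` can be reordered or overlapped, and the last reader of each block is known, which is the kind of information a speed and memory optimizer would need.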
New website based on Docusaurus. The documentation files are moved to a separate repo singa-doc. The static website files are stored at singa-site.
DNNL (Deep Neural Network Library), powered by Intel, is integrated into model/operations/[batchnorm|pooling|convolution]; the change is invisible to end users. The current version is DNNL v1.1, which replaces the previous integration of MKL-DNN v0.18. DNNL can boost the performance of deep learning operations when executing on CPU. The DNNL dependency is installed through conda.
Some Tensor APIs are marked as deprecated because they can be replaced by broadcasting, which provides better support for multi-dimensional operations. These APIs are add_column(), add_row(), div_column(), div_row(), mult_column(), and mult_row().
Conv and Pooling are enhanced to support fine-grained padding such as (2,3,2,3), the SAME_UPPER and SAME_LOWER pad modes, and shape checking.
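For reference, the SAME_UPPER and SAME_LOWER modes come from the ONNX `auto_pad` attribute: the output size is ceil(input / stride), and when the total padding is odd the extra element goes at the end (SAME_UPPER) or at the beginning (SAME_LOWER). The sketch below computes the padding for one spatial axis, assuming no dilation; `same_padding` is a hypothetical helper and not SINGA code.

```python
def same_padding(input_size, kernel_size, stride, mode):
    """Return (pad_begin, pad_end) for one axis under ONNX-style SAME padding.

    Hypothetical helper; assumes dilation of 1.
    """
    if mode not in ("SAME_UPPER", "SAME_LOWER"):
        raise ValueError("mode must be 'SAME_UPPER' or 'SAME_LOWER'")
    output_size = -(-input_size // stride)  # integer ceil(input_size / stride)
    total = max((output_size - 1) * stride + kernel_size - input_size, 0)
    small = total // 2
    large = total - small
    # SAME_UPPER puts the extra element at the end, SAME_LOWER at the beginning.
    return (small, large) if mode == "SAME_UPPER" else (large, small)

upper = same_padding(5, 2, 1, "SAME_UPPER")
lower = same_padding(5, 2, 1, "SAME_LOWER")
even = same_padding(7, 3, 2, "SAME_UPPER")
```

An asymmetric result such as one padded element on a single side is exactly the case that per-side padding is needed to express.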
Reconstruct sonnx:
- Support two types of weight value (Initializer and Constant Node);
- For some operators (BatchNorm, Reshape, Clip, Slice, Gather, Tile, OneHot), move some inputs to their attributes;
- Define and implement the type conversion map.