Autograd · Apache SINGA

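There are two typical ways to implement autograd: symbolic differentiation, as in Theano, and reverse differentiation, as in PyTorch. SINGA follows the PyTorch approach: it records the computation graph during the forward pass and then applies backward propagation automatically. For a detailed explanation of the autograd algorithm, see here. The sections below explain the relevant modules in SINGA and show how to use them.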
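To make the record-then-backpropagate idea concrete, here is a minimal sketch of reverse differentiation on scalars in plain Python. It is not SINGA's implementation; the `Var` class and the `backward` function are hypothetical names introduced only for this illustration.

```python
class Var:
    """A scalar that remembers how it was computed (hypothetical helper)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        # Each entry is (parent_var, local gradient of self w.r.t. parent).
        self.parents = parents

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))


def backward(output):
    """Accumulate d(output)/d(node) for every node, in reverse topological order."""
    order, seen = [], set()

    def visit(node):
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

    visit(output)
    output.grad = 1.0
    for node in reversed(order):
        for parent, local_grad in node.parents:
            parent.grad += local_grad * node.grad


x, w, b = Var(2.0), Var(3.0), Var(1.0)
y = x * w + b   # forward pass: the graph is recorded as a side effect
backward(y)     # backward pass: gradients flow from y back to the leaves
print(y.value, x.grad, w.grad, b.grad)
```

The forward expression never mentions gradients; recording the parents of each result is enough for the backward pass to apply the chain rule.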

Examples

Many examples are provided in the example folder. Here we walk through two of the most representative ones.

Using Operation only

The following code implements a multilayer perceptron (MLP) model using only Operation instances:

Import the packages

from singa.tensor import Tensor
from singa import autograd
from singa import opt

Create the weight matrices and bias vectors

The parameter tensors are created with both requires_grad and stores_grad set to True.

w0 = Tensor(shape=(2, 3), requires_grad=True, stores_grad=True)
w0.gaussian(0.0, 0.1)
b0 = Tensor(shape=(1, 3), requires_grad=True, stores_grad=True)
b0.set_value(0.0)

w1 = Tensor(shape=(3, 2), requires_grad=True, stores_grad=True)
w1.gaussian(0.0, 0.1)
b1 = Tensor(shape=(1, 2), requires_grad=True, stores_grad=True)
b1.set_value(0.0)

Training

inputs = Tensor(data=data)  # data matrix
target = Tensor(data=label) # label vector
autograd.training = True    # for training
sgd = opt.SGD(0.05)   # optimizer

for i in range(10):
    x = autograd.matmul(inputs, w0) # matrix multiplication
    x = autograd.add_bias(x, b0)    # add the bias vector
    x = autograd.relu(x)            # ReLU activation operation

    x = autograd.matmul(x, w1)
    x = autograd.add_bias(x, b1)

    loss = autograd.softmax_cross_entropy(x, target)

    for p, g in autograd.backward(loss):
        sgd.update(p, g)
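The loop over `autograd.backward(loss)` yields (parameter, gradient) pairs, and each pair is passed to the optimizer. The sketch below shows what one such update amounts to, assuming plain SGD without momentum or weight decay. It uses Python lists instead of SINGA tensors, and `sgd_update` is a hypothetical helper, not part of the `opt` module.

```python
def sgd_update(param, grad, lr=0.05):
    """Return the parameter after one plain SGD step: p <- p - lr * g."""
    return [p - lr * g for p, g in zip(param, grad)]


# Hypothetical (parameter, gradient) pairs, standing in for what a
# backward pass would yield.
pairs = [
    ([1.0, -2.0], [0.5, -1.0]),
    ([0.0], [2.0]),
]
updated = [sgd_update(p, g) for p, g in pairs]
print(updated)
```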

Using Operation and Layer

The following example implements a CNN model using the layers provided by the autograd module.

Create the layers

conv1 = autograd.Conv2d(1, 32, 3, padding=1, bias=False)
bn1 = autograd.BatchNorm2d(32)
pooling1 = autograd.MaxPool2d(3, 1, padding=1)
conv21 = autograd.Conv2d(32, 16, 3, padding=1)
conv22 = autograd.Conv2d(32, 16, 3, padding=1)
bn2 = autograd.BatchNorm2d(32)
linear = autograd.Linear(32 * 28 * 28, 10)
pooling2 = autograd.AvgPool2d(3, 1, padding=1)
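The input size of the linear layer, 32 * 28 * 28, follows from the layer settings. The sketch below checks that arithmetic under several assumptions: the input images are 28 x 28 (implied by the linear layer, not stated in the text), the positional arguments are (in_channels, out_channels, kernel_size) for convolutions and (kernel_size, stride) for pooling, and the convolution stride defaults to 1. `out_size` is a hypothetical helper.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


h = 28                                          # assumed input height and width
h = out_size(h, kernel=3, padding=1)            # conv1
h = out_size(h, kernel=3, stride=1, padding=1)  # pooling1
h = out_size(h, kernel=3, padding=1)            # conv21 and conv22 (parallel)
h = out_size(h, kernel=3, stride=1, padding=1)  # pooling2
channels = 16 + 16                              # outputs concatenated along axis 1
flat_features = channels * h * h
print(h, channels, flat_features)
```

With a 3 x 3 window, stride 1, and padding 1, every layer preserves the spatial size, so only the channel count changes before flattening.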

Define the forward function

The operations in the forward pass are recorded automatically for backward propagation.

def forward(x, t):
    # x is the input data (a batch of images)
    # t is the label vector (a batch of integers)
    y = conv1(x)           # Conv layer
    y = autograd.relu(y)   # ReLU operation
    y = bn1(y)             # BN layer
    y = pooling1(y)        # Pooling Layer

    # two parallel convolution layers
    y1 = conv21(y)
    y2 = conv22(y)
    y = autograd.cat((y1, y2), 1)  # cat operation
    y = autograd.relu(y)           # ReLU operation
    y = bn2(y)
    y = pooling2(y)

    y = autograd.flatten(y)        # flatten operation
    y = linear(y)                  # Linear layer
    loss = autograd.softmax_cross_entropy(y, t)  # operation
    return loss, y

Training

autograd.training = True
for epoch in range(epochs):
    for i in range(batch_number):
        inputs = tensor.Tensor(device=dev, data=x_train[
                               i * batch_sz:(1 + i) * batch_sz], stores_grad=False)
        targets = tensor.Tensor(device=dev, data=y_train[
                                i * batch_sz:(1 + i) * batch_sz], requires_grad=False, stores_grad=False)

        loss, y = forward(inputs, targets) # forward the net

        for p, gp in autograd.backward(loss):  # auto backward
            sgd.update(p, gp)
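The slice `i * batch_sz:(1 + i) * batch_sz` selects the i-th mini-batch. The following sketch shows the same indexing on a plain list. How `batch_number` is computed is not shown in the text, so the integer division here, which drops an incomplete last batch, is an assumption.

```python
x_train = list(range(10))                # stand-in for the training samples
batch_sz = 4
batch_number = len(x_train) // batch_sz  # assumption: incomplete last batch is dropped

batches = [x_train[i * batch_sz:(1 + i) * batch_sz] for i in range(batch_number)]
print(batches)
```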

Using the Model API

The following example implements an MLP model using the Model API.

Define a subclass of Model

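Define the model class as a subclass of Model. Only then do all the operations used during training form a computational graph that can be analyzed. The operations in the graph are scheduled in order and executed efficiently. The model class can also contain layers.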

from singa import layer, model

class MLP(model.Model):  # the model is a subclass of Model

    def __init__(self, data_size=10, perceptron_size=100, num_classes=10):
        super(MLP, self).__init__()

        # init the operators, layers and other objects
        self.relu = layer.ReLU()
        self.linear1 = layer.Linear(perceptron_size)
        self.linear2 = layer.Linear(num_classes)
        self.softmax_cross_entropy = layer.SoftMaxCrossEntropy()

    def forward(self, inputs):  # define the forward function
        y = self.linear1(inputs)
        y = self.relu(y)
        y = self.linear2(y)
        return y

    def train_one_batch(self, x, y):
        out = self.forward(x)
        loss = self.softmax_cross_entropy(out, y)
        self.optimizer(loss)
        return out, loss

    def set_optimizer(self, optimizer):  # attach an optimizer
        self.optimizer = optimizer
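Scheduling the operations of a computational graph means ordering them so that each one runs after the operations it depends on. As a rough illustration of that idea, the sketch below orders a hand-written dependency graph for the MLP above with the standard library's `graphlib` (Python 3.9 or later). The graph dictionary and its operation names are hypothetical, and this is not SINGA's scheduler.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each operation maps to the set of
# operations whose outputs it consumes.
graph = {
    "linear1": set(),
    "relu": {"linear1"},
    "linear2": {"relu"},
    "softmax_cross_entropy": {"linear2"},
}

# static_order() yields every operation after all of its dependencies.
schedule = list(TopologicalSorter(graph).static_order())
print(schedule)
```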

Training

# create a model instance
model = MLP()
# initialize optimizer and attach it to the model
sgd = opt.SGD(lr=0.005, momentum=0.9, weight_decay=1e-5)
model.set_optimizer(sgd)
# input and target placeholders for the model
tx = tensor.Tensor((batch_size, 1, IMG_SIZE, IMG_SIZE), dev, tensor.float32)
ty = tensor.Tensor((batch_size, num_classes), dev, tensor.int32)
# compile the model before training
model.compile([tx], is_train=True, use_graph=True, sequential=False)

# train the model iteratively
for b in range(num_train_batch):
    # generate the next mini-batch
    x, y = ...

    # Copy the data into input tensors
    tx.copy_from_numpy(x)
    ty.copy_from_numpy(y)

    # Training with one batch
    out, loss = model(tx, ty)

Save a model checkpoint

# define the path to save the checkpoint
checkpointpath = "checkpoint.zip"

# save a checkpoint
model.save_states(fpath=checkpointpath)

Load a model checkpoint

# define the path to load the checkpoint
checkpointpath = "checkpoint.zip"

# load a checkpoint
import os
if os.path.exists(checkpointpath):
    model.load_states(fpath=checkpointpath)

Python API

For more details on the Python API, see here.