There are two typical ways to implement autograd: symbolic differentiation, as in Theano, or reverse-mode differentiation, as in PyTorch. SINGA follows the PyTorch approach: it records the computation graph during forward propagation and then applies backward propagation automatically. The autograd algorithm is explained in detail here. We explain the relevant modules in SINGA and give examples to illustrate their usage.
There are three classes involved in autograd, namely singa.tensor.Tensor, singa.autograd.Operation, and singa.autograd.Layer. In the rest of this article, we use tensor, operation and layer to refer to an instance of the respective class.
Three attributes of Tensor are used by autograd:
- .creator is an Operation instance. It records the operation that generates the Tensor instance.
- .requires_grad is a boolean variable. It indicates that the autograd algorithm needs to compute the gradient of the tensor (i.e., the owner). For example, during backpropagation, the gradients of the tensors for the weight matrix of a linear layer and the feature maps of a convolution layer (not the bottom layer) should be computed.
- .stores_grad is a boolean variable. It indicates that the gradient of the owner tensor should be stored and output by the backward function. For example, the gradient of the feature maps is computed during backpropagation but is not included in the output of the backward function, so stores_grad is False for those tensors.
Programmers can change requires_grad and stores_grad of a Tensor instance. For example, if the latter is set to True, the corresponding gradient is included in the output of the backward function. Note that if stores_grad is True, then requires_grad must also be True, but not vice versa.
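The relationship between the two flags can be sketched in plain Python. The ToyTensor class below is a hypothetical stand-in, not SINGA's Tensor: it only mirrors the three attributes described above and enforces the rule that stores_grad implies requires_grad.

```python
class ToyTensor:
    """Hypothetical stand-in for a tensor; not SINGA's implementation."""

    def __init__(self, data, requires_grad=True, stores_grad=False, creator=None):
        if stores_grad and not requires_grad:
            # A gradient cannot be stored if it is never computed.
            raise ValueError("stores_grad=True requires requires_grad=True")
        self.data = data
        self.requires_grad = requires_grad
        self.stores_grad = stores_grad
        self.creator = creator  # operation that produced this tensor, if any


# A parameter: its gradient is computed and returned by backward.
weight = ToyTensor([0.1, 0.2], requires_grad=True, stores_grad=True)
# An intermediate feature map: gradient is computed but not returned.
feature = ToyTensor([1.0, 2.0], requires_grad=True, stores_grad=False)
# A label: no gradient is needed at all.
label = ToyTensor([0, 1], requires_grad=False, stores_grad=False)
```

The three instances correspond to the three valid flag combinations; the fourth combination (stores_grad without requires_grad) is rejected.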
An Operation takes one or more Tensor instances as input and outputs one or more Tensor instances. For example, ReLU can be implemented as a specific Operation subclass. When an Operation instance is called (after instantiation), the following two steps are executed:
- record the source operations, i.e., the creators of the input tensors.
- do the calculation by calling the member function .forward().
There are two member functions for the forward and backward passes, i.e., .forward() and .backward(). They take Tensor.data as inputs (the type is CTensor) and output CTensors. To add a specific operation, a subclass of Operation should implement its own .forward() and .backward(). The backward() function of the operation is called automatically by the backward() function of autograd during backward propagation to compute the gradients of the inputs (according to the requires_grad field).
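The call protocol described above can be sketched in plain Python. ToyTensor, ToyOperation and ToyReLU are hypothetical names used only for illustration; they operate on Python lists rather than CTensors and are not SINGA's classes.

```python
class ToyTensor:
    """Hypothetical tensor: a list of floats plus the operation that made it."""

    def __init__(self, data, creator=None, requires_grad=True):
        self.data = data
        self.creator = creator
        self.requires_grad = requires_grad


class ToyOperation:
    """Hypothetical base class; subclasses implement forward() and backward()."""

    def __call__(self, *inputs):
        # Step 1: record the source operations (the creators of the inputs).
        self.sources = [x.creator for x in inputs]
        # Step 2: compute on the raw data, then wrap the result in a tensor.
        out = self.forward(*[x.data for x in inputs])
        return ToyTensor(out, creator=self)

    def forward(self, *xs):
        raise NotImplementedError

    def backward(self, *dys):
        raise NotImplementedError


class ToyReLU(ToyOperation):
    def forward(self, x):
        self.mask = [v > 0 for v in x]  # cached for backward()
        return [v if m else 0.0 for v, m in zip(x, self.mask)]

    def backward(self, dy):
        # The gradient passes through only where the input was positive.
        return [g if m else 0.0 for g, m in zip(dy, self.mask)]


x = ToyTensor([-1.0, 2.0, 3.0])
relu = ToyReLU()
y = relu(x)                          # records sources, then runs forward()
dx = relu.backward([1.0, 1.0, 1.0])  # gradient with respect to x
```

Because the output tensor keeps a reference to its creator, and the creator keeps the creators of its inputs, a chain of such calls is enough to reconstruct the computation graph for the backward pass.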
For operations that require parameters, we package them into a new class, Layer. For example, the convolution operation is wrapped into a convolution layer. A Layer manages (stores) the parameters and calls the corresponding Operations to implement the transformation.
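The division of labour between a layer and its operations can be sketched as follows. ToyLinear and the two helper functions are hypothetical and work on nested Python lists; the constant initial weight of 0.5 is chosen only to make the result easy to check.

```python
def matmul(x, w):
    """Operation: multiply an (n x k) matrix by a (k x m) matrix."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def add_bias(x, b):
    """Operation: add the bias vector b to every row of x."""
    return [[v + bv for v, bv in zip(row, b)] for row in x]


class ToyLinear:
    """Hypothetical layer: owns the parameters and delegates to operations."""

    def __init__(self, in_features, out_features):
        # Parameters live in the layer, not in the operations.
        self.w = [[0.5] * out_features for _ in range(in_features)]
        self.b = [0.0] * out_features

    def __call__(self, x):
        return add_bias(matmul(x, self.w), self.b)


layer = ToyLinear(2, 3)
y = layer([[1.0, 2.0]])  # one sample with two features
```

The operations themselves stay stateless, so the same matmul and add_bias can be reused by any layer that supplies its own parameters.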
Multiple examples are provided in the example folder. We explain three representative examples here.
The following code implements an MLP model using only Operation instances (no Layer instances).
    from singa.tensor import Tensor
    from singa import autograd
    from singa import opt
Create weight matrix and bias vector
The parameter tensors are created with both requires_grad and stores_grad set to True.
    w0 = Tensor(shape=(2, 3), requires_grad=True, stores_grad=True)
    w0.gaussian(0.0, 0.1)
    b0 = Tensor(shape=(1, 3), requires_grad=True, stores_grad=True)
    b0.set_value(0.0)

    w1 = Tensor(shape=(3, 2), requires_grad=True, stores_grad=True)
    w1.gaussian(0.0, 0.1)
    b1 = Tensor(shape=(1, 2), requires_grad=True, stores_grad=True)
    b1.set_value(0.0)
    inputs = Tensor(data=data)    # data matrix
    target = Tensor(data=label)   # label vector
    autograd.training = True      # for training
    sgd = opt.SGD(0.05)           # optimizer

    for i in range(10):
        x = autograd.matmul(inputs, w0)  # matrix multiplication
        x = autograd.add_bias(x, b0)     # add the bias vector
        x = autograd.relu(x)             # ReLU activation operation

        x = autograd.matmul(x, w1)
        x = autograd.add_bias(x, b1)

        loss = autograd.softmax_cross_entropy(x, target)

        for p, g in autograd.backward(loss):
            sgd.update(p, g)
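The inner loop above iterates over (parameter, gradient) pairs produced by the backward pass. The following is a minimal sketch of how such a generator can work on scalars, assuming each node stores its inputs together with the local derivatives. Var, mul, add and backward are hypothetical names for illustration; this is not SINGA's autograd.backward.

```python
class Var:
    """Hypothetical scalar tensor for illustration."""

    def __init__(self, value, stores_grad=False, creator=None):
        self.value = value
        self.stores_grad = stores_grad
        self.creator = creator  # (inputs, local_grads), or None for a leaf


def mul(a, b):
    # d(a*b)/da = b and d(a*b)/db = a
    return Var(a.value * b.value, creator=((a, b), (b.value, a.value)))


def add(a, b):
    return Var(a.value + b.value, creator=((a, b), (1.0, 1.0)))


def backward(loss):
    """Yield (parameter, gradient) pairs for leaves with stores_grad=True."""
    order, seen = [], set()

    def visit(node):
        # Depth-first post-order: inputs are appended before their consumer.
        if id(node) in seen:
            return
        seen.add(id(node))
        if node.creator is not None:
            for parent in node.creator[0]:
                visit(parent)
        order.append(node)

    visit(loss)
    grads = {id(loss): 1.0}
    # Reversed post-order visits every node after all of its consumers.
    for node in reversed(order):
        g = grads.get(id(node), 0.0)
        if node.creator is None:
            if node.stores_grad:
                yield node, g
            continue
        for parent, local in zip(*node.creator):
            grads[id(parent)] = grads.get(id(parent), 0.0) + g * local


w = Var(2.0, stores_grad=True)
x = Var(3.0)                      # input data: gradient is not returned
b = Var(1.0, stores_grad=True)
loss = add(mul(w, x), b)          # loss = w * x + b
pairs = {p: g for p, g in backward(loss)}
```

Only w and b appear in pairs, which mirrors the role of stores_grad: gradients flow through x during the traversal but are not reported for it.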
Operation + Layer
The following example implements a CNN model using layers provided by the autograd module.
Create the layers
    conv1 = autograd.Conv2d(1, 32, 3, padding=1, bias=False)
    bn1 = autograd.BatchNorm2d(32)
    pooling1 = autograd.MaxPool2d(3, 1, padding=1)
    conv21 = autograd.Conv2d(32, 16, 3, padding=1)
    conv22 = autograd.Conv2d(32, 16, 3, padding=1)
    bn2 = autograd.BatchNorm2d(32)
    linear = autograd.Linear(32 * 28 * 28, 10)
    pooling2 = autograd.AvgPool2d(3, 1, padding=1)
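The input size of the linear layer, 32 * 28 * 28, can be checked with the standard output-size formula for convolution and pooling windows. The sketch below assumes 28 x 28 input images, a default convolution stride of 1, and that the positional arguments of the pooling layers are kernel size and stride; none of these are stated explicitly in the text.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


size = 28  # assumed input height and width
size = out_size(size, kernel=3, padding=1)            # conv1
size = out_size(size, kernel=3, stride=1, padding=1)  # pooling1
size = out_size(size, kernel=3, padding=1)            # conv21 / conv22
size = out_size(size, kernel=3, stride=1, padding=1)  # pooling2

# conv21 and conv22 each produce 16 channels, concatenated on the channel axis.
channels = 16 + 16
linear_in = channels * size * size
```

With a 3 x 3 window, padding 1 and stride 1, every layer preserves the spatial size, so only the channel count changes through the network.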
Define the forward function
The operations in the forward pass will be recorded automatically for backward propagation.
    def forward(x, t):
        # x is the input data (a batch of images)
        # t is the label vector (a batch of integers)
        y = conv1(x)            # Conv layer
        y = autograd.relu(y)    # ReLU operation
        y = bn1(y)              # BN layer
        y = pooling1(y)         # Pooling layer

        # two parallel convolution layers
        y1 = conv21(y)
        y2 = conv22(y)
        y = autograd.cat((y1, y2), 1)  # cat operation
        y = autograd.relu(y)           # ReLU operation
        y = bn2(y)
        y = pooling2(y)

        y = autograd.flatten(y)  # flatten operation
        y = linear(y)            # Linear layer
        loss = autograd.softmax_cross_entropy(y, t)  # operation
        return loss, y
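The last step of the forward function computes a softmax cross-entropy loss. As a reminder of what that operation does, here is a per-sample sketch in plain Python, together with its gradient with respect to the logits. It is an illustration only: SINGA's operation works on batches of tensors, and how it reduces over the batch is not covered here.

```python
import math


def softmax_cross_entropy(logits, label):
    """Loss for one sample: -log(softmax(logits)[label])."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    loss = -math.log(probs[label])
    # Gradient with respect to the logits: probs - one_hot(label).
    grad = [p - (1.0 if i == label else 0.0) for i, p in enumerate(probs)]
    return loss, grad


# Two equal logits give probability 0.5 for each class.
loss, grad = softmax_cross_entropy([0.0, 0.0], label=1)
```

The simple form of the gradient, predicted probabilities minus the one-hot label, is the reason the two steps are usually fused into a single operation.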
    autograd.training = True
    for epoch in range(epochs):
        for i in range(batch_number):
            inputs = tensor.Tensor(device=dev,
                                   data=x_train[i * batch_sz:(1 + i) * batch_sz],
                                   stores_grad=False)
            targets = tensor.Tensor(device=dev,
                                    data=y_train[i * batch_sz:(1 + i) * batch_sz],
                                    requires_grad=False,
                                    stores_grad=False)

            loss, y = forward(inputs, targets)  # forward the net

            for p, gp in autograd.backward(loss):  # auto backward
                sgd.update(p, gp)
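Each call to the optimizer's update in the loop above applies one gradient step to one parameter. A minimal sketch of plain stochastic gradient descent, assuming no momentum or weight decay, is shown below. ToySGD is a hypothetical class operating on Python lists, not SINGA's opt.SGD.

```python
class ToySGD:
    """Hypothetical plain SGD: p <- p - lr * g (no momentum, no weight decay)."""

    def __init__(self, lr):
        self.lr = lr

    def update(self, param, grad):
        # Update the parameter values in place.
        for i, g in enumerate(grad):
            param[i] -= self.lr * g


sgd = ToySGD(0.05)
w = [1.0, -2.0]
sgd.update(w, [2.0, -4.0])  # each entry moves against its gradient
```

Updating in place matters in a loop like the one above: the parameter objects held by the layers are the same objects the optimizer modifies, so the next forward pass sees the new values.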
Using the Module API
The following example implements an MLP model using the Module class provided by the module package.
Define the subclass of Module
Define the model class as a subclass of Module. In this way, all operations used during the training phase form a calculation graph, which is then analyzed so that the operations in the graph can be scheduled and executed efficiently. Layers can also be included in the module class.
    class MLP(module.Module):  # the model is a subclass of Module

        def __init__(self, optimizer):
            super(MLP, self).__init__()

            # init the operators, layers and other objects
            self.w0 = Tensor(shape=(2, 3), requires_grad=True, stores_grad=True)
            self.w0.gaussian(0.0, 0.1)
            self.b0 = Tensor(shape=(3,), requires_grad=True, stores_grad=True)
            self.b0.set_value(0.0)

            self.w1 = Tensor(shape=(3, 2), requires_grad=True, stores_grad=True)
            self.w1.gaussian(0.0, 0.1)
            self.b1 = Tensor(shape=(2,), requires_grad=True, stores_grad=True)
            self.b1.set_value(0.0)

            # init the optimizer
            self.optimizer = optimizer

        def forward(self, inputs):  # define the forward function
            x = autograd.matmul(inputs, self.w0)
            x = autograd.add_bias(x, self.b0)
            x = autograd.relu(x)
            x = autograd.matmul(x, self.w1)
            x = autograd.add_bias(x, self.b1)
            return x

        def loss(self, out, target):  # define the loss function
            # can use the loss operations provided by SINGA or a self-defined function
            return autograd.softmax_cross_entropy(out, target)

        def optim(self, loss):  # define the optim function
            # can use the optimizer provided by SINGA or a self-defined function
            return self.optimizer.backward_and_update(loss)
    # create a model instance
    model = MLP(sgd)
    # declare what device to train on
    model.on_device(dev)
    # declare execution mode and order
    model.graph(graph, sequential)

    for i in range(niters):
        out = model(inputs)
        loss = model.loss(out, target)
        model.optim(loss)

        if i % (niters / 10) == 0 and rank_in_global == 0:
            print("training loss = ", tensor.to_numpy(loss), flush=True)
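The training loop above relies on a simple calling convention: calling the model runs its forward function, and the loss and optim methods complete one step. The sketch below reproduces only that convention with a one-parameter model and a hand-derived gradient. ToyModule and ToyScale are hypothetical, and the sketch does not model the graph construction or scheduling that the real Module performs.

```python
class ToyModule:
    """Hypothetical base class: calling the model runs forward()."""

    def __call__(self, *args):
        return self.forward(*args)


class ToyScale(ToyModule):
    """One-parameter model y = w * x with a squared-error loss."""

    def __init__(self, lr):
        self.w = 0.0
        self.lr = lr

    def forward(self, x):
        self.x = x  # cached for the gradient in optim()
        return self.w * x

    def loss(self, out, target):
        self.err = out - target
        return self.err ** 2

    def optim(self, loss):
        # Hand-derived gradient d(loss)/dw = 2 * err * x. A real framework
        # would obtain it from autograd; the loss argument is unused here and
        # is kept only to mirror the call pattern of the example.
        self.w -= self.lr * 2.0 * self.err * self.x


model = ToyScale(lr=0.1)
losses = []
for _ in range(20):
    out = model(2.0)             # runs forward()
    loss = model.loss(out, 4.0)  # the target corresponds to w = 2
    model.optim(loss)
    losses.append(loss)
```

Keeping forward, loss and optim as separate methods lets the base class intercept the call to the model, which is the hook a graph-based implementation needs in order to record operations before executing them.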
Refer here for more details of the Python API.