
PyTorch Tutorial

Getting Started

Tensors are similar to numpy’s ndarrays, with the addition being that Tensors can also be used on a GPU to accelerate computing.

from __future__ import print_function
import torch

Construct an uninitialized matrix, or a randomly initialized matrix:

x = torch.Tensor(5, 3)  # uninitialized 5x3 matrix
x = torch.rand(5, 3)    # randomly initialized 5x3 matrix

Get its size

print(x.size())

Addition operation

y = torch.rand(5, 3)
x + y                        # plain operator syntax
torch.add(x, y)              # functional syntax
result = torch.Tensor(5, 3)
torch.add(x, y, out=result)  # writing the result into an existing tensor
y.add_(x)                    # in-place: adds x to y, mutating y

Note: Any operation that mutates a tensor in-place is post-fixed with an _. For example, x.copy_(y) and x.t_() will change x.
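For instance, a short sketch of this convention (the tensors here are just for illustration):

x = torch.rand(3, 5)
y = torch.rand(5, 3)
x.t_()      # transposes x in place; x is now 5x3
x.copy_(y)  # copies the contents of y into x in place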

More Tensor operations are described in the documentation.

Numpy Bridge

The torch Tensor and numpy array will share their underlying memory locations, and changing one will change the other.

Converting torch Tensor to numpy Array

a = torch.ones(5)
b = a.numpy()  # b shares memory with a
a.add_(1)

After a.add_(1), both a and b are changed, because they share their underlying memory.

Converting numpy Array to torch Tensor

import numpy as np
a = np.ones(5)
b = torch.from_numpy(a)
np.add(a, 1, out=a)
print(a)
print(b)

CUDA Tensors

Tensors can be moved onto the GPU using the .cuda method.

# let us run this cell only if CUDA is available
if torch.cuda.is_available():
    x = x.cuda()
    y = y.cuda()
    x + y

Autograd: automatic differentiation

The autograd package provides automatic differentiation for all operations on Tensors. It is a define-by-run framework, which means that your backprop is defined by how your code is run, and that every single iteration can be different.
Note that all the following code assumes from torch.autograd import Variable.

Variable

autograd.Variable is the central class of the package. It wraps a Tensor, and supports nearly all of the operations defined on it. Once you finish your computation you can call .backward() and have all the gradients computed automatically.

You can access the raw tensor through the .data attribute, while the gradient w.r.t. this variable is accumulated into .grad.

There’s one more class which is very important for the autograd implementation: a Function.

Variable and Function are interconnected and build up an acyclic graph, that encodes a complete history of computation. Each variable has a .creator attribute that references a Function that has created the Variable (except for Variables created by the user - their creator is None).

If you want to compute the derivatives, you can call .backward() on a Variable. If the Variable is a scalar (i.e. it holds one-element data), you don’t need to specify any arguments to backward(); however, if it has more elements, you need to specify a grad_output argument that is a tensor of matching shape.
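As a small sketch of these attributes before the fuller example below (assuming torch and Variable are imported as above; the variable names are illustrative):

v = Variable(torch.ones(3), requires_grad=True)
w = (v * 2).sum()
print(v.data)   # the raw tensor wrapped by the Variable
w.backward()    # w is a scalar, so backward() needs no arguments
print(v.grad)   # gradient of w w.r.t. v: a tensor of 2s

# for a non-scalar output, pass a grad_output tensor of matching shape
u = Variable(torch.ones(3), requires_grad=True)
(u * 2).backward(torch.ones(3))
print(u.grad)   # also a tensor of 2s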

import torch
from torch.autograd import Variable

# Create a variable
x = Variable(torch.ones(2, 2), requires_grad=True)
print(x)

# Addition operation
y = x + 2
print(y)
print(y.creator)
# result: <torch.autograd._functions.basic_ops.AddConstant object at 0x7f53f1df7888>

# More operations
z = y * y * 3
out = z.mean()
print(z, out)

Gradients

x = Variable(torch.ones(2, 2), requires_grad=True)
y = x + 2
z = y * y * 3
out = z.mean()

Now out.backward() is equivalent to doing out.backward(torch.Tensor([1.0]))

out.backward()
print(x.grad)

It will print the gradients d(out)/dx.
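To see why: out = (1/4) * Σᵢ 3(xᵢ + 2)², so d(out)/dxᵢ = 1.5 * (xᵢ + 2), which equals 4.5 at xᵢ = 1. x.grad is therefore a 2x2 tensor filled with 4.5.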

You can do many crazy things with autograd!

x = torch.randn(3)
x = Variable(x, requires_grad=True)

y = x * 2
while y.data.norm() < 1000:
y = y * 2

print(y)

gradients = torch.FloatTensor([0.1, 1.0, 0.0001])
y.backward(gradients)

print(x.grad)

Documentation of Variable and Function can be found here.

Neural Networks

Neural networks can be constructed using the torch.nn package.

Now that you have had a glimpse of autograd: nn depends on autograd to define models and differentiate them.

An nn.Module contains layers, and a method forward(input) that returns the output.

"neural network structure"
It is a simple feed-forward network. It takes the input, feeds it through several layers one after the other, and then finally gives the output.

A typical training procedure for a neural network is as follows:

  1. Define the neural network that has some learnable parameters (or weights)
  2. Iterate over a dataset of inputs
  3. Process input through the network
  4. Compute the loss (how far is the output from being correct)
  5. Propagate gradients back into the network’s parameters
  6. Update the weights of the network, typically using a simple update rule: weight = weight - learning_rate * gradient

Define the network

import torch
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # an affine operation: y = Wx + b
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))  # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)  # If the size is a square you can only specify a single number
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

net = Net()
print(net)

Result:


Net (
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear (400 -> 120)
  (fc2): Linear (120 -> 84)
  (fc3): Linear (84 -> 10)
)

You just have to define the forward function, and the backward function (where gradients are computed) is automatically defined for you using autograd. You can use any of the Tensor operations in the forward function.

The learnable parameters of a model are returned by net.parameters()

params = list(net.parameters())
print(len(params))
print(params[0].size()) # conv1's .weight

Out:

10
torch.Size([6, 1, 5, 5])

The input to the forward is an autograd.Variable, and so is the output.

input = Variable(torch.randn(1, 1, 32, 32))
out = net(input)

Zero the gradient buffers of all parameters and backprop with random gradients:

net.zero_grad()
out.backward(torch.randn(1, 10))

Before proceeding further, let’s recap all the classes you’ve seen so far.

Recap:

  • torch.Tensor - A multi-dimensional array.
  • autograd.Variable - Wraps a Tensor and records the history of operations applied to it. Has the same API as a Tensor, with some additions like backward(). Also holds the gradient w.r.t. the tensor.
  • nn.Module - Neural network module. Convenient way of encapsulating parameters, with helpers for moving them to GPU, exporting, loading, etc.
  • nn.Parameter - A kind of Variable, that is automatically registered as a parameter when assigned as an attribute to a Module.
  • autograd.Function - Implements forward and backward definitions of an autograd operation. Every Variable operation creates at least a single Function node that connects to the functions that created the Variable and encodes its history.

At this point, we covered:

  • Defining a neural network
  • Processing inputs and calling backward.

Still Left:

  • Computing the loss
  • Updating the weights of the network
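
As a rough preview of those two remaining steps, here is a minimal sketch that reuses the net and input defined above and makes up a dummy target; the mean-squared-error loss is written by hand and the update is a plain manual SGD step (the loss functions in torch.nn and the optimizers in torch.optim are outside the scope of this post):

target = Variable(torch.randn(1, 10))     # a made-up target, for illustration only
output = net(input)
loss = ((output - target) ** 2).mean()    # hand-written mean-squared-error loss

net.zero_grad()                           # clear the existing gradient buffers
loss.backward()                           # propagate gradients back into the parameters

learning_rate = 0.01
for f in net.parameters():                # weight = weight - learning_rate * gradient
    f.data.sub_(f.grad.data * learning_rate)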