Getting Started
Tensors are similar to numpy’s ndarrays, with the addition that Tensors can also be used on a GPU to accelerate computing.
from __future__ import print_function
import torch
Construct an uninitialized matrix, or a randomly initialized matrix
x = torch.Tensor(5, 3)   # uninitialized
x = torch.rand(5, 3)     # randomly initialized
Get its size
print(x.size())
Addition operation
x + y
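Addition is also available as a function and as an in-place method (a small sketch; the values are chosen just for illustration):

```python
import torch

x = torch.ones(5, 3)
y = torch.ones(5, 3)

print(x + y)            # operator syntax
print(torch.add(x, y))  # functional syntax
y.add_(x)               # in-place: mutates y itself
print(y)
```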
Note: Any operation that mutates a tensor in-place is post-fixed with an _. For example: x.copy_(y) and x.t_() will change x.
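A quick illustration of those two in-place calls (the shapes here are my own choice):

```python
import torch

x = torch.ones(2, 3)
y = torch.zeros(2, 3)

x.copy_(y)   # in-place copy: x now holds y's values
x.t_()       # in-place transpose: x is now 3 x 2
print(x.size())
```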
More Tensor operations are described in the documentation.
Numpy Bridge
The torch Tensor and numpy array will share their underlying memory locations, and changing one will change the other.
Converting torch Tensor to numpy Array
a = torch.ones(5)
b = a.numpy()
Calling a.add_(1) changes both a and b, because they share their underlying memory locations.
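A minimal sketch of that behaviour, following the a = torch.ones(5) example above:

```python
import torch

a = torch.ones(5)
b = a.numpy()   # b shares memory with a

a.add_(1)       # in-place add on the tensor...
print(a)        # ...is visible through the tensor
print(b)        # ...and through the numpy array
```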
Converting numpy Array to torch Tensor
import numpy as np
a = np.ones(5)
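A sketch of the reverse direction, using torch.from_numpy; the memory is again shared, so mutating the array in place updates the tensor:

```python
import numpy as np
import torch

a = np.ones(5)
b = torch.from_numpy(a)   # b shares memory with a

np.add(a, 1, out=a)       # in-place numpy add
print(a)
print(b)                  # the torch Tensor reflects the change
```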
CUDA Tensors
Tensors can be moved onto the GPU using the .cuda() function.
# let us run this cell only if CUDA is available
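A guarded sketch that falls back to the CPU when no GPU is present (the tensor shapes are illustrative):

```python
import torch

x = torch.rand(5, 3)
y = torch.rand(5, 3)

# let us run this cell only if CUDA is available
if torch.cuda.is_available():
    x = x.cuda()   # move the tensors onto the GPU
    y = y.cuda()

z = x + y          # runs on the GPU if the tensors were moved
print(z.size())
```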
Autograd: automatic differentiation
The autograd package provides automatic differentiation for all operations on Tensors. It is a define-by-run framework, which means that your backprop is defined by how your code is run, and that every single iteration can be different.
Note that all the following code assumes from torch.autograd import Variable.
Variable
autograd.Variable is the central class of the package. It wraps a Tensor and supports nearly all of the operations defined on it. Once you finish your computation, you can call .backward() and have all the gradients computed automatically.
You can access the raw tensor through the .data attribute, while the gradient w.r.t. this variable is accumulated into .grad.
There’s one more class which is very important for the autograd implementation: a Function.
Variable and Function are interconnected and build up an acyclic graph that encodes a complete history of computation. Each variable has a .creator attribute that references the Function that created the Variable (except for Variables created by the user, whose creator is None).
If you want to compute the derivatives, you can call .backward() on a Variable. If the Variable is a scalar (i.e. it holds a one-element tensor), you don’t need to specify any arguments to backward(); however, if it has more elements, you need to specify a grad_output argument that is a tensor of matching shape.
import torch
from torch.autograd import Variable
Gradients
x = Variable(torch.ones(2, 2), requires_grad=True)
Now out.backward() is equivalent to doing out.backward(torch.Tensor([1.0])).
out.backward()
It will print the gradients d(out)/dx.
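Putting the snippet together: for out.backward() to need no arguments, out must be a scalar built from x. The intermediate steps below (y = x + 2, z = y * y * 3, out = z.mean()) follow the classic version of this tutorial; treat them as an illustrative choice:

```python
import torch
from torch.autograd import Variable

x = Variable(torch.ones(2, 2), requires_grad=True)
y = x + 2
z = y * y * 3
out = z.mean()    # a one-element (scalar) result

out.backward()    # no argument needed for a scalar
print(x.grad)     # d(out)/dx: a 2x2 tensor filled with 4.5
```

Each entry is d(out)/dx_i = 3 * 2 * (x_i + 2) / 4 = 4.5 when x_i = 1.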
You can do many crazy things with autograd!
x = torch.randn(3)
x = Variable(x, requires_grad=True)
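Because the result here is not a scalar, backward needs an explicit grad_output tensor of matching shape. A sketch (the gradient values are the ones used in the classic version of this tutorial; any tensor of matching shape works):

```python
import torch
from torch.autograd import Variable

x = Variable(torch.randn(3), requires_grad=True)
y = x * 2
while y.data.norm() < 1000:   # keep doubling until the norm is large
    y = y * 2

gradients = torch.FloatTensor([0.1, 1.0, 0.0001])
y.backward(gradients)         # grad_output must match y's shape
print(x.grad)
```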
Full documentation of Variable and Function is available online.
Neural Networks
Neural networks can be constructed using the torch.nn package.
Now that you have had a glimpse of autograd: nn depends on autograd to define models and differentiate them.
An nn.Module contains layers, and a method forward(input) that returns the output.
A simple feed-forward network takes the input, feeds it through several layers one after the other, and then finally gives the output.
A typical training procedure for a neural network is as follows:
- Define the neural network that has some learnable parameters (or weights)
- Iterate over a dataset of inputs
- Process input through the network
- Compute the loss (how far is the output from being correct)
- Propagate gradients back into the network’s parameters
- Update the weights of the network, typically using a simple update rule: weight = weight - learning_rate * gradient
Define the network
import torch
import torch.nn as nn
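The network the rest of this section refers to (a 32x32 single-channel input and ten outputs, with ten parameter tensors) can be sketched like this, loosely following the classic LeNet-style tutorial model; the exact layer sizes are an assumption:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 5x5 convolution
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # affine layers: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # conv -> relu -> 2x2 max pool, twice
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(x.size(0), -1)   # flatten all but the batch dimension
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return self.fc3(x)

net = Net()
print(net)
```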
You just have to define the forward function, and the backward function (where gradients are computed) is automatically defined for you using autograd. You can use any of the Tensor operations in the forward function.
The learnable parameters of a model are returned by net.parameters()
params = list(net.parameters())
print(len(params))
Out:
10
The input to the forward is an autograd.Variable, and so is the output.
input = Variable(torch.randn(1, 1, 32, 32))
out = net(input)
Zero the gradient buffers of all parameters and backprops with random gradients:
net.zero_grad()
out.backward(torch.randn(1, 10))
Before proceeding further, let’s recap all the classes you’ve seen so far.
Recap:
- torch.Tensor: a multi-dimensional array.
- autograd.Variable: wraps a Tensor and records the history of operations applied to it. Has the same API as a Tensor, with some additions like backward(). Also holds the gradient w.r.t. the tensor.
- nn.Module: neural network module. Convenient way of encapsulating parameters, with helpers for moving them to GPU, exporting, loading, etc.
- nn.Parameter: a kind of Variable that is automatically registered as a parameter when assigned as an attribute to a Module.
- autograd.Function: implements forward and backward definitions of an autograd operation. Every Variable operation creates at least a single Function node that connects to the functions that created a Variable and encodes its history.
At this point, we covered:
- Defining a neural network
- Processing inputs and calling backward
Still Left:
- Computing the loss
- Updating the weights of the network