## Module torch.nn

### Convolution Layers

`class torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True)`

Parameters:

- `in_channels (int)` – Number of channels in the input image
- `out_channels (int)` – Number of channels produced by the convolution
- `kernel_size (int or tuple)` – Size of the convolving kernel
- `stride (int or tuple, optional)` – Stride of the convolution
- `padding (int or tuple, optional)` – Zero-padding added to both sides of the input
- `dilation (int or tuple, optional)` – Spacing between kernel elements
- `groups (int, optional)` – Number of blocked connections from input channels to output channels
- `bias (bool, optional)` – If `True`, adds a learnable bias to the output

Examples:

```
>>> # With square kernels and equal stride
>>> m = nn.Conv2d(16, 33, 3, stride=2)
>>> output = m(autograd.Variable(torch.randn(20, 16, 50, 100)))
```
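How these constructor parameters determine the spatial output size can be checked with the standard Conv2d output-size formula; the helper below is an illustrative plain-Python sketch, not part of `torch.nn`:

```python
import math

def conv2d_out_size(size_in, kernel_size, stride=1, padding=0, dilation=1):
    # Standard Conv2d output-size formula along one spatial dimension:
    # floor((size_in + 2*padding - dilation*(kernel_size - 1) - 1) / stride + 1)
    return math.floor(
        (size_in + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1
    )

print(conv2d_out_size(50, kernel_size=3, stride=2))              # 24
print(conv2d_out_size(32, kernel_size=3, stride=2, padding=1))   # 16
```

With the example above, a height of 50 with a 3×3 kernel at stride 2 gives 24 output rows, matching the 20×33×24×49 output a Conv2d(16, 33, 3, stride=2) produces.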

### Linear Layers

`class torch.nn.Linear(in_features, out_features, bias=True)`

Applies a linear transformation to the incoming data: $y = Ax + b$

Parameters:

- `in_features` – size of each input sample
- `out_features` – size of each output sample
- `bias` – If set to `False`, the layer will not learn an additive bias. Default: `True`

Examples:

```
>>> m = nn.Linear(20, 30)
>>> input = autograd.Variable(torch.randn(128, 20))
>>> output = m(input)
>>> print(output.size())
```

### Containers

`class torch.nn.Sequential(*args)`

A sequential container. Modules will be added to it in the order they are passed in the constructor. Alternatively, an ordered dict of modules can also be passed in.

Example:

```
# Example of using Sequential
model = nn.Sequential(
    nn.Conv2d(1, 20, 5),
    nn.ReLU(),
    nn.Conv2d(20, 64, 5),
    nn.ReLU()
)
```
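The ordering behaviour of a sequential container can be sketched in plain Python; `TinySequential` below is illustrative only, not torch's implementation:

```python
class TinySequential:
    # Pure-Python sketch of a sequential container: each module's output
    # feeds the next, in the order they were passed to the constructor.
    def __init__(self, *fns):
        self.fns = list(fns)

    def __call__(self, x):
        for fn in self.fns:
            x = fn(x)
        return x

model = TinySequential(lambda x: x * 2, lambda x: x + 1)
print(model(3))  # 7, because (3 * 2) + 1
```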

`add_module(name, module)`

Adds a child module to the current module.

The module can be accessed as an attribute using the given name.

### Parameters

`class torch.nn.Parameter`

A kind of Variable that is to be considered a module parameter.

Parameters are `Variable` subclasses that have a very special property when used with `Module`s: when they are assigned as Module attributes, they are automatically added to the list of the module's parameters, and will appear e.g. in the `parameters()` iterator. Assigning a plain Variable does not have such an effect. This is because one might want to cache some temporary state, like the last hidden state of the RNN, in the model. If there were no such class as `Parameter`, these temporaries would get registered too.

Another difference is that parameters can't be volatile and that they require gradient by default.

Parameters:

- **data** (Tensor) – parameter tensor.
- **requires_grad** (bool, optional) – if the parameter requires gradient. See Excluding subgraphs from backward for more details.
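The auto-registration behaviour described above can be sketched in plain Python; `MiniModule` and `FakeParameter` are illustrative stand-ins, not torch's actual classes:

```python
class FakeParameter:
    # Stand-in for nn.Parameter in this sketch; requires gradient by default.
    def __init__(self, data):
        self.data = data
        self.requires_grad = True

class MiniModule:
    # Sketch of why assigning a Parameter attribute auto-registers it,
    # while a plain attribute (e.g. cached RNN state) is left unregistered.
    def __init__(self):
        object.__setattr__(self, "_parameters", {})

    def __setattr__(self, name, value):
        if isinstance(value, FakeParameter):
            self._parameters[name] = value
        object.__setattr__(self, name, value)

    def parameters(self):
        return list(self._parameters.values())

m = MiniModule()
m.weight = FakeParameter([1.0, 2.0])
m.hidden_cache = [0.0, 0.0]   # temporary state: deliberately NOT registered
print(len(m.parameters()))    # 1
```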

### Loss functions

`class torch.nn.MSELoss(size_average=True)`

Creates a criterion that measures the mean squared error between n elements in the input *x* and target *y*:

$$loss(x, y) = \frac{1}{n} \sum_i |x_i - y_i|^2$$

*x* and *y* can be of arbitrary shapes with a total of *n* elements each.

The sum operation still operates over all the elements, and divides by *n*.

**The division by n can be avoided** if one sets the constructor argument `size_average` to `False`.

Examples:

```
output = net(input)
criterion = nn.MSELoss()
loss = criterion(output, target)
```
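The formula can be checked with a minimal pure-Python sketch; `mse_loss` below is illustrative, not the torch implementation:

```python
def mse_loss(x, y, size_average=True):
    # Sum of squared differences over all n elements, divided by n
    # unless size_average=False (mirroring the flag described above).
    total = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return total / len(x) if size_average else total

print(mse_loss([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))                      # 1.333...
print(mse_loss([1.0, 2.0, 3.0], [1.0, 2.0, 5.0], size_average=False))  # 4.0
```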

## Module torch.nn.functional

### Pooling Layers

`torch.nn.functional.max_pool2d(input, kernel_size, stride=None, padding=0, dilation=1, ceil_mode=False, return_indices=False)`
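The pooling operation can be sketched as a naive single-channel reference on nested lists; `max_pool2d_ref` is illustrative only (no padding, dilation, or ceil_mode), though the stride defaulting to `kernel_size` matches the torch behaviour:

```python
def max_pool2d_ref(inp, kernel_size, stride=None):
    # Slide a kernel_size x kernel_size window over the input and take
    # the max of each window; stride defaults to kernel_size.
    if stride is None:
        stride = kernel_size
    h, w = len(inp), len(inp[0])
    out = []
    for i in range(0, h - kernel_size + 1, stride):
        row = []
        for j in range(0, w - kernel_size + 1, stride):
            row.append(max(inp[i + di][j + dj]
                           for di in range(kernel_size)
                           for dj in range(kernel_size)))
        out.append(row)
    return out

print(max_pool2d_ref([[1, 2, 3, 4],
                      [5, 6, 7, 8],
                      [9, 10, 11, 12],
                      [13, 14, 15, 16]], 2))  # [[6, 8], [14, 16]]
```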

### Non-linear Activations

`class torch.nn.ReLU(inplace=False)`

Applies the rectified linear unit function element-wise: $\mathrm{ReLU}(x) = \max(0, x)$

Parameters:

- `inplace` – can optionally do the operation in-place

Example:

```
>>> m = nn.ReLU()
>>> input = autograd.Variable(torch.randn(2))
>>> print(m(input))
```
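The formula amounts to clamping negatives to zero, which can be sketched on a plain list (illustrative only; the `inplace` option is omitted):

```python
def relu(xs):
    # Element-wise ReLU(x) = max(0, x), mirroring the formula above.
    return [max(0.0, x) for x in xs]

print(relu([-1.0, 2.0, -0.5, 3.0]))  # [0.0, 2.0, 0.0, 3.0]
```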

## Module torch.optim

### Taking an optimization step

`optimizer.step()`

`optimizer.step(closure)`

Some optimization algorithms such as Conjugate Gradient and LBFGS need to reevaluate the function multiple times, so you have to pass in a closure that allows them to recompute your model. The closure should clear the gradients, **compute the loss, and return it**.
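The closure protocol can be sketched with a toy optimizer; `ClosureOptimizer` is illustrative only, not the torch API — the point is that the closure may be invoked several times per step, so it must re-run the forward pass and return the fresh loss each time:

```python
class ClosureOptimizer:
    # Toy sketch: calls the closure repeatedly within one step, the way
    # LBFGS-style algorithms re-evaluate the function during line search.
    def __init__(self, evaluations_per_step=3):
        self.evaluations_per_step = evaluations_per_step
        self.calls = 0

    def step(self, closure):
        loss = None
        for _ in range(self.evaluations_per_step):
            loss = closure()   # closure re-computes the loss every call
            self.calls += 1
        return loss

def closure():
    # In real code: optimizer.zero_grad(); loss = criterion(net(x), y);
    # loss.backward(); return loss
    return 0.25

opt = ClosureOptimizer()
print(opt.step(closure), opt.calls)  # 0.25 3
```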

## Note

`torch.nn` only supports mini-batches. The entire `torch.nn` package only supports inputs that are a mini-batch of samples, and not a single sample.

For example, `nn.Conv2d` will take in a 4D Tensor of `nSamples x nChannels x Height x Width`.

If you have a single sample, just use `input.unsqueeze(0)` to add a fake batch dimension.

To backpropagate the error, all we have to do is call `loss.backward()`. **You need to clear the existing gradients though, else gradients will be accumulated to existing gradients.**

To select GPUs, use `CUDA_VISIBLE_DEVICES=2,3 python file.py`.
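The accumulation behaviour can be sketched in plain Python (the `backward`/`zero_grad` helpers are illustrative stand-ins for the torch calls):

```python
grads = {"w": 0.0}

def backward(g):
    # Like loss.backward(): gradients ADD into the existing buffer
    # rather than replacing it.
    grads["w"] += g

def zero_grad():
    # Like optimizer.zero_grad(): reset the buffer between steps.
    grads["w"] = 0.0

backward(0.5)
backward(0.5)        # without zero_grad, the two calls accumulate
print(grads["w"])    # 1.0
zero_grad()
backward(0.5)
print(grads["w"])    # 0.5
```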