# Backpropagation

peterwashington · Nov 02 2021 · 6 min read · Neural Networks

Training a neural network with gradient descent involves *backpropagation*, an efficient method for calculating the gradient of the error function with respect to the weights of the network. This gradient is required for updating each weight in the network.

Let’s start by looking at backpropagation in action for a simple chain network of four nodes, A → B → C → D, connected by the weights ab, bc, and cd, trained with the below 3 inputs and corresponding outputs:

| Input (node A) | Output (node D) |
| --- | --- |
| 3 | 17 |
| 10 | 200 |
| 7 | -17 |

The objective of neural network training is to update the model weights so that node D has a value of 17 when node A is 3, has a value of 200 when node A is 10, and has a value of -17 when node A is 7. Using gradient descent, we need to calculate the derivative of the loss function with respect to each individual model weight.
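Once those derivatives are known, gradient descent nudges each weight in the direction opposite its gradient. A minimal sketch of one update step (the learning-rate value here is an illustrative assumption, not from the post):

```python
# One gradient-descent update: move the weight opposite its gradient.
# The learning rate of 0.01 is an illustrative default, not a recommendation.
def gradient_step(weight, dL_dw, learning_rate=0.01):
    return weight - learning_rate * dL_dw
```

Backpropagation is what supplies the `dL_dw` value for every weight in the network.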

Backpropagation relies on the *chain rule* from calculus. If you are unfamiliar with calculus, don’t worry – the chain rule simply says that if you have a variable z that depends on y (z = f(y)), and y in turn depends on x (y = g(x)), then:

$$\frac{dz}{dx} = \frac{dz}{dy} \cdot \frac{dy}{dx}$$
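We can sanity-check the chain rule numerically. The particular functions below (y = 3x + 1 and z = y²) are hypothetical choices for illustration, not part of the network above:

```python
# Chain rule check: z = y**2 where y = 3*x + 1,
# so dz/dx = (dz/dy) * (dy/dx) = (2*y) * 3.
def y_of_x(x):
    return 3 * x + 1

def z_of_y(y):
    return y ** 2

def dz_dx_chain(x):
    y = y_of_x(x)
    dz_dy = 2 * y   # derivative of y**2 with respect to y
    dy_dx = 3       # derivative of 3*x + 1 with respect to x
    return dz_dy * dy_dx

# Finite-difference approximation of dz/dx, for comparison.
def dz_dx_numeric(x, h=1e-6):
    return (z_of_y(y_of_x(x + h)) - z_of_y(y_of_x(x - h))) / (2 * h)
```

At x = 2, y = 7, so the chain rule gives 2 · 7 · 3 = 42, and the numerical estimate agrees.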

In backpropagation, we apply the chain rule repeatedly: we first calculate how to update the weights at the end of the network, then reuse those calculations for weights progressively closer to the beginning of the network.

We start from the end of the network, aiming to calculate how the loss function, L, changes with respect to changes in the last network weight, cd. Using the chain rule, we know that:

$$\frac{dL}{d(cd)} = \frac{dL}{dD} \cdot \frac{dD}{d(cd)}$$

Applying the chain rule twice, we can calculate how the loss function changes with respect to changes in the previous edge, bc, as:

$$\frac{dL}{d(bc)} = \frac{dL}{dD} \cdot \frac{dD}{dC} \cdot \frac{dC}{d(bc)}$$

To calculate the derivative with respect to the remaining edge, ab, we once again apply the chain rule:

$$\frac{dL}{d(ab)} = \frac{dL}{dD} \cdot \frac{dD}{dC} \cdot \frac{dC}{dB} \cdot \frac{dB}{d(ab)}$$
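As a concrete sketch of these chain-rule products, suppose each node is simply the previous node multiplied by the edge weight (B = ab·A, and so on) and the loss is squared error – both are assumptions for illustration, since the post’s figure doesn’t specify the node functions:

```python
# Backpropagation through the chain A -> B -> C -> D, assuming linear nodes
# (B = ab*A, C = bc*B, D = cd*C) and squared-error loss L = (D - target)**2.
# Both assumptions are illustrative, not taken from the post.
def gradients(A, target, ab, bc, cd):
    # Forward pass: compute and keep every node value
    B = ab * A
    C = bc * B
    D = cd * C

    # Backward pass: each line reuses the derivative computed just before it
    dL_dD = 2 * (D - target)   # dL/dD for squared error
    dL_dcd = dL_dD * C         # dD/d(cd) = C
    dL_dC = dL_dD * cd         # dD/dC = cd
    dL_dbc = dL_dC * B         # dC/d(bc) = B
    dL_dB = dL_dC * bc         # dC/dB = bc
    dL_dab = dL_dB * A         # dB/d(ab) = A
    return dL_dab, dL_dbc, dL_dcd
```

Note how `dL_dD` is computed once and then flows backward through every weight’s gradient.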

If we had a network with 26 nodes going from A to Z, then the derivative with respect to the first edge in the network, ab (the last edge we evaluate in backpropagation), would be:

$$\frac{dL}{d(ab)} = \frac{dL}{dZ} \cdot \frac{dZ}{dY} \cdot \ldots \cdot \frac{dC}{dB} \cdot \frac{dB}{d(ab)}$$

Because most of the calculation has already been done for the previous edge, computing the derivatives from the back of the network to the front is very computationally efficient – hence the name “backpropagation”.
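That reuse can be written as a single backward loop over a chain of any length, carrying one running derivative so that each edge costs only two extra multiplications (again assuming linear nodes and squared-error loss purely for illustration):

```python
# Backpropagation over a chain with weights w[0]..w[N-1], assuming linear
# nodes n[i+1] = w[i] * n[i] and squared-error loss (n[N] - target)**2.
# These node/loss choices are illustrative assumptions.
def backprop_chain(x, target, weights):
    # Forward pass: store every node value for reuse in the backward pass
    nodes = [x]
    for w in weights:
        nodes.append(w * nodes[-1])

    # Backward pass: one running derivative carries all upstream work
    grads = [0.0] * len(weights)
    upstream = 2 * (nodes[-1] - target)   # dL/d(last node)
    for i in reversed(range(len(weights))):
        grads[i] = upstream * nodes[i]    # dL/dw[i]
        upstream *= weights[i]            # becomes dL/d(node i), reused next step
    return grads
```

The forward pass is one multiplication per edge, and the backward pass is two, so the total cost grows linearly with the depth of the chain rather than quadratically.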