The goal of this tutorial is to provide a minimal, self-contained implementation of automatic differentiation. If you develop machine learning algorithms or applications, you have almost certainly used PyTorch, TensorFlow, or JAX; these are all examples of automatic differentiation (AD) software. AD software has been the backbone of AI algorithms since the advent of deep learning, because training deep neural networks requires computing gradients over the network via backpropagation. If you want to learn more about the fundamental principles behind the design of such software, this tutorial is for you. It is split into 3 sections:
Acknowledgement: I took inspiration from Mostafa Samir's tutorial on backpropagation and want to thank him for letting me re-use parts of his code.
Note: AD and differentiable programming. AD software is essentially a programming language for so-called differentiable programming. At a high level, a program is an algorithm that converts an input to an output depending on certain parameters $\theta$: $$\text{Input}\to \text{Algorithm}(\theta)\to\text{Output}\to \text{Performance/Loss}$$ For a neural network, for example, $\theta$ are the weights of that network. The key to differentiable programming is that we can compute the derivative of the loss with respect to $\theta$: $$ \frac{d}{d\theta}\text{Loss}=\frac{d}{d\theta}\text{Loss}\big(\text{Algorithm}(\theta)\big)$$ and therefore we can optimize the program via gradient descent: $$\theta_{t+1}=\theta_t-\lambda \frac{d}{d\theta}\text{Loss}$$ The key to applying this paradigm is thus being able to differentiate any program, i.e. to compute gradients. This is what we are going to learn in this tutorial.
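To make the gradient-descent update above concrete, here is a minimal sketch of optimizing a toy "program" where the gradient is still derived by hand. The names `algorithm`, `loss`, and `grad_loss` are purely illustrative and not part of the AD implementation we build later; the whole point of AD is to obtain something like `grad_loss` automatically for arbitrary programs.

```python
def algorithm(theta, x):
    """A trivial differentiable program: predict y = theta * x."""
    return theta * x

def loss(theta, x, y_target):
    """Squared-error performance measure of the program's output."""
    return (algorithm(theta, x) - y_target) ** 2

def grad_loss(theta, x, y_target):
    """Hand-derived d(Loss)/d(theta); this is what AD will give us for free."""
    return 2.0 * (algorithm(theta, x) - y_target) * x

# Gradient descent: theta_{t+1} = theta_t - lambda * d(Loss)/d(theta)
theta, lam = 0.0, 0.1
x, y_target = 2.0, 6.0   # we want theta * 2 ≈ 6, i.e. theta ≈ 3
for t in range(50):
    theta -= lam * grad_loss(theta, x, y_target)

print(theta)  # ≈ 3.0
```

For this toy example the derivative fits on one line, but for a deep neural network with millions of parameters, writing gradients by hand is infeasible; that is exactly the gap AD software fills.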
# Optional dependencies for visualizing computational graphs
#!pip install pydot-ng
#!pip install graphviz