*Automatic
differentiation* (or *autodiff*) is one of the
most unfairly neglected numerical algorithms for scientific
computing. Fortunately, widely popular machine learning
libraries recently started to adopt it (like TensorFlow, for
gradients computation).

Automatic differentiation refers to a set of computational procedures to calculate automatically exact (or better, at working precision) derivatives of any function (including any computer program), using at most a small constant factor more arithmetic operations than the original function. Hence, it differs from:

*Finite differences*: fast method, but introduces large numerical errors depending on the choice of the finite step. Although the dependence can be slightly relieved with some algorithm (e.g., the five-point stencil), errors are significantly larger than working precision and require careful iterative accuracy tests. Finite differences are unreliable in the vicinity of discontinuities.*Symbolic differentiation*: the chain rule is a mechanical procedure easy to apply programmatically. It relies on the fact that a function is written in terms of other elementary mathematical functions whose derivative is known. However, symbolic differentiation of arbitrary programs is less straightforward. Furthermore, it soon becomes inefficient. For instance, let's consider the iteration*n=4*of the logistic map defined by*l*and_{n+1}= 4l_{n}(1-l_{n})*l*(code scripted in Maxima):_{1}= x(%i1) f(x) := 64 * x * (1 - x) * (1 - 2*x)^2 * (1 - 8*x + 8*x^2)^2; 2 2 2 (%o1) f(x) := 64 x (1 - x) (1 - 2 x) (1 - 8 x + 8 x )

The symbolic derivative obtained through systematic application of the chain rule is:(%i2) diff(f(x), x); 2 2 (%o2) (- 256 (1 - 2 x) (1 - x) x (8 x - 8 x + 1) ) 2 2 2 2 2 2 - 64 (1 - 2 x) x (8 x - 8 x + 1) + 64 (1 - 2 x) (1 - x) (8 x - 8 x + 1) 2 2 + 128 (1 - 2 x) (1 - x) x (16 x - 8) (8 x - 8 x + 1)

Let's simplify the result:(%i3) ratsimp(%o2); 7 6 5 4 3 2 (%o3) (- 131072 x ) + 458752 x - 638976 x + 450560 x - 168960 x + 32256 x - 2688 x + 64

This shows that symbolic differentiation can produce large symbolic expressions that take correspondingly long to evaluate (problem known as*expression swell*).

In its simpler form (called *forward-mode* automatic
differentiation), the idea is simple. Let's consider the first
derivative of a scalar function *f: R → R*. Given a
real number *x*, we define its dual
x+x_{d}ε such that

- (x+x
_{d}ε) + (y+y_{d}ε) = (x+y) + (x_{d}+ y_{d})ε - (x+x
_{d}ε) (y+y_{d}ε) = xy + (xy_{d}+ yx_{d})ε

In other words, terms of order *ε ^{2}*
(or larger) are set to zero by definition. Then, the
Taylor-expansion of a function

- f(x+x
_{d}ε) = f(x) + f'(x)x_{d}ε

Setting arbitrarily *x _{d} = 1*, the
coefficients of the

As an example, let's take a polynomial *p(x) = p _{0} +
p_{1}x + p_{2}x^{2}*. The dual
version reads:

- p(x+x
_{d}ε) = p_{0}+ p_{1}(x+x_{d}ε) + p_{2}(x+x_{d}ε)^{2}

= p_{0}+ p_{1}x + p_{2}x^{2}+ (p_{1}+ 2p_{2}x)x_{d}ε

= p(x) + p'(x)x_{d}ε

Hence, given the dual version of the polynomial *p(x)* it
is easy to extract automatically its derivative *p'(x) =
p _{1} + 2p_{2}x* as the coefficient of
the

extract-eps(dual(f)(x + 1ε))

Every computer program eventually executes a sequence of elementary arithmetic operations (addition, multiplication, exponential, etc.). Defining how dual numbers operate on such basic operations and functions then allows to take derivatives at working precision of any computer program. Automatic derivatives are well defined also close to discontinuities, and at most only a small factor more operations are required compared to evaluating the original function.

For instance, if a language defines a *double* type, a
new *adouble* type can be defined to operate on dual
numbers overloading all fundamental operations (if the
language allows operator overloading, like C++). Conceptually
equivalent, but
practically different
procedures are also possible. Some library rely on
source-to-source transformation, i.e., it reads a given source
file, analyzes and transforms it automatically writing a new
source file (this can be done also with languages not allowing
operator overloading, like C). Yet another approach is to
write a programming language with autodiff primitives (being
acquainted with some Lisp dialect can ease this task).

It is easy to generalize the method to higher-order
derivatives, and to functions *f: R ^{n} →
R^{m}*. However, while the procedure described
above is efficient if

Implementing automatic differentiation is rather involved and faces subtle issues (for instance, perturbation confusion), so it is preferable to rely on existing libraries rather than implementing it from scratch. Of course, the algorithm complexity depends on the specific programming language.

Given a programming language with powerful enough abstraction, relatively simple implementations allow impressive results. This is the case for Lisp dialects. (See scmutils, implemented in Scheme and described in the notation appendix of Structure and Interpretation of Classical Mechanics; note that scmutils also implements symbolic differentiation, not to be confused.)

Most commonly used, but less powerful languages (e.g., C/C++)
require significantly more intricated design patterns.
Several
libraries are available. E.g., ADOL-C allows arbitrary
differentiation order using operator over-loading in C++, both
for forward and reverse modes. However, its usage requires to
write C/C++ code in a peculiar form (recording operation on a
so-called *tape* structure) that assumes a good
understanding of the underlying implementation (the
documentation is clear, but has a rather steep learning
curve). Furthermore, all operators and function must be
defined in terms of dual numbers. External functions need to
be registered first and, while this is a welcome feature of
the library, it is difficult to use. The alternative is to
re-implement a function from scratch, that can be easy (e.g.,
a power operator) or cumbersome (e.g., a Bessel
function). Other libraries like ADEPT are easier to use and
have a friendly documentation, but are limited to first
derivatives. More libraries for different programming
languages are listed
at autodiff.org.

Automatic differentiation solves all the main issues related to finite differences and symbolic differentiation. The usage problems outlined above are more related to autodiff not being widely used, rather than with intrinsic limitations of the algorithm. (And to the lack of abstraction in specific languages, but this is a fundamental problem not specific to autodiff.) Fortunately, advances in machine learning and AI are bringing deserved attention to it.

One powerful and user-friendly implementation is available (unsurprisingly) in Python. Autograd computes arbitrary order reverse-mode autodiff with a simple and intuitive interface. Registering external functions is fairly easy, and several Numpy and Scipy functions are already available anyway.