Automatic differentiation (or autodiff) is one of the most unfairly neglected numerical algorithms for scientific computing. Fortunately, widely popular machine learning libraries recently started to adopt it (like TensorFlow, for gradients computation).
Automatic differentiation refers to a set of computational procedures to calculate automatically exact (or better, at working precision) derivatives of any function (including any computer program), using at most a small constant factor more arithmetic operations than the original function. Hence, it differs from:
(%i1) f(x) := 64 * x * (1 - x) * (1 - 2*x)^2 * (1 - 8*x + 8*x^2)^2; 2 2 2 (%o1) f(x) := 64 x (1 - x) (1 - 2 x) (1 - 8 x + 8 x )The symbolic derivative obtained through systematic application of the chain rule is:
(%i2) diff(f(x), x); 2 2 (%o2) (- 256 (1 - 2 x) (1 - x) x (8 x - 8 x + 1) ) 2 2 2 2 2 2 - 64 (1 - 2 x) x (8 x - 8 x + 1) + 64 (1 - 2 x) (1 - x) (8 x - 8 x + 1) 2 2 + 128 (1 - 2 x) (1 - x) x (16 x - 8) (8 x - 8 x + 1)Let's simplify the result:
(%i3) ratsimp(%o2); 7 6 5 4 3 2 (%o3) (- 131072 x ) + 458752 x - 638976 x + 450560 x - 168960 x + 32256 x - 2688 x + 64This shows that symbolic differentiation can produce large symbolic expressions that take correspondingly long to evaluate (problem known as expression swell).
In its simpler form (called forward-mode automatic differentiation), the idea is simple. Let's consider the first derivative of a scalar function f: R → R. Given a real number x, we define its dual x+xdε such that
In other words, terms of order ε2 (or larger) are set to zero by definition. Then, the Taylor-expansion of a function f reads:
Setting arbitrarily xd = 1, the coefficients of the ε term provides the first derivative of the function, f'(x).
As an example, let's take a polynomial p(x) = p0 + p1x + p2x2. The dual version reads:
Hence, given the dual version of the polynomial p(x) it is easy to extract automatically its derivative p'(x) = p1 + 2p2x as the coefficient of the ε term when xd = 1. In procedural terms, given a function f the idea is to advance as:
extract-eps(dual(f)(x + 1ε))
Every computer program eventually executes a sequence of elementary arithmetic operations (addition, multiplication, exponential, etc.). Defining how dual numbers operate on such basic operations and functions then allows to take derivatives at working precision of any computer program. Automatic derivatives are well defined also close to discontinuities, and at most only a small factor more operations are required compared to evaluating the original function.
For instance, if a language defines a double type, a new adouble type can be defined to operate on dual numbers overloading all fundamental operations (if the language allows operator overloading, like C++). Conceptually equivalent, but practically different procedures are also possible. Some library rely on source-to-source transformation, i.e., it reads a given source file, analyzes and transforms it automatically writing a new source file (this can be done also with languages not allowing operator overloading, like C). Yet another approach is to write a programming language with autodiff primitives (being acquainted with some Lisp dialect can ease this task).
It is easy to generalize the method to higher-order derivatives, and to functions f: Rn → Rm. However, while the procedure described above is efficient if n ≥ m, in the case n < m a different approach called reverse-mode automatic differentiation is most suited.
Implementing automatic differentiation is rather involved and faces subtle issues (for instance, perturbation confusion), so it is preferable to rely on existing libraries rather than implementing it from scratch. Of course, the algorithm complexity depends on the specific programming language.
Given a programming language with powerful enough abstraction, relatively simple implementations allow impressive results. This is the case for Lisp dialects. (See scmutils, implemented in Scheme and described in the notation appendix of Structure and Interpretation of Classical Mechanics; note that scmutils also implements symbolic differentiation, not to be confused.)
Most commonly used, but less powerful languages (e.g., C/C++) require significantly more intricated design patterns. Several libraries are available. E.g., ADOL-C allows arbitrary differentiation order using operator over-loading in C++, both for forward and reverse modes. However, its usage requires to write C/C++ code in a peculiar form (recording operation on a so-called tape structure) that assumes a good understanding of the underlying implementation (the documentation is clear, but has a rather steep learning curve). Furthermore, all operators and function must be defined in terms of dual numbers. External functions need to be registered first and, while this is a welcome feature of the library, it is difficult to use. The alternative is to re-implement a function from scratch, that can be easy (e.g., a power operator) or cumbersome (e.g., a Bessel function). Other libraries like ADEPT are easier to use and have a friendly documentation, but are limited to first derivatives. More libraries for different programming languages are listed at autodiff.org.
Automatic differentiation solves all the main issues related to finite differences and symbolic differentiation. The usage problems outlined above are more related to autodiff not being widely used, rather than with intrinsic limitations of the algorithm. (And to the lack of abstraction in specific languages, but this is a fundamental problem not specific to autodiff.) Fortunately, advances in machine learning and AI are bringing deserved attention to it.
One powerful and user-friendly implementation is available (unsurprisingly) in Python. Autograd computes arbitrary order reverse-mode autodiff with a simple and intuitive interface. Registering external functions is fairly easy, and several Numpy and Scipy functions are already available anyway.