Integration by Substitution

Current UK exam textbooks pass over proofs and mathematical discussions in a hurry to show the ‘how to’ of exam questions.

Integration by substitution is a little more than just the chain rule in reverse, and it deserves a fuller treatment.

Consider the integral,


y=\displaystyle \int \textnormal{f}(x) \ \textnormal{d}x



Suppose that there exists a function g, of another variable u, such that x=\textnormal{g}(u), and let \textnormal{f}(x)=\textnormal{f}(\textnormal{g}(u))=\textnormal{F}(u), so that,

\dfrac{\textnormal{d}y}{\textnormal{d}x}=\textnormal{f}(x)=\textnormal{F}(u)

Now, by the chain rule,

\dfrac{\textnormal{d}y}{\textnormal{d}u}=\dfrac{\textnormal{d}y}{\textnormal{d}x}\times \dfrac{\textnormal{d}x}{\textnormal{d}u}=\textnormal{F}(u)\dfrac{\textnormal{d}x}{\textnormal{d}u}


Integrating both sides with respect to u,

y=\displaystyle \int \dfrac{\textnormal{d}y}{\textnormal{d}u} \ \textnormal{d}u=\displaystyle \int \textnormal{F}(u)\dfrac{\textnormal{d}x}{\textnormal{d}u} \ \textnormal{d}u

and therefore,

y=\displaystyle \int \textnormal{f}(x) \ \textnormal{d}x=\displaystyle \int \textnormal{F}(u)\dfrac{\textnormal{d}x}{\textnormal{d}u} \ \textnormal{d}u
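The result can be checked numerically on a definite integral. The sketch below is only an illustration under assumptions that are not in the text above: the integrand \sqrt{1-x^{2}} on 0 \leqslant x \leqslant 1, the substitution x=\sin u (so the limits become u=0 and u=\pi/2), and a simple midpoint rule in a helper named midpoint_rule.

```python
import math

def midpoint_rule(func, lower, upper, steps=100_000):
    """Approximate the integral of func over [lower, upper] by the midpoint rule."""
    width = (upper - lower) / steps
    return width * sum(func(lower + (i + 0.5) * width) for i in range(steps))

# Illustrative integrand f(x) = sqrt(1 - x^2); this choice is an assumption.
def f(x):
    return math.sqrt(1.0 - x * x)

# Substitution x = g(u) = sin(u), so dx/du = cos(u).
def g(u):
    return math.sin(u)

def dx_du(u):
    return math.cos(u)

# Transformed integrand F(u) * dx/du, where F(u) = f(g(u)).
def transformed(u):
    return f(g(u)) * dx_du(u)

integral_in_x = midpoint_rule(f, 0.0, 1.0)                    # limits x = 0 to 1
integral_in_u = midpoint_rule(transformed, 0.0, math.pi / 2)  # limits u = 0 to pi/2
exact = math.pi / 4  # area of a quarter of the unit circle

print(integral_in_x, integral_in_u, exact)
```

Both approximations should agree with each other and with the quarter-circle area \pi/4 to several decimal places, which is what the substitution formula predicts.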

Differentiation From First Principles

The gradient of a smooth curve, y=\textnormal{f}(x), at a point x is the gradient of the tangent to the curve at that point. Point P is on the curve and Q is a neighbouring point on the curve whose x value is displaced by a small quantity, \delta x.

The idea behind differentiation is that as \delta x becomes very small, the gradient of the chord PQ tends towards the gradient of the curve at P. In the limit, as \delta x tends to zero, the gradient of PQ becomes the gradient of the curve.

We write:

\textnormal{gradient f}(x)=\dfrac{\textnormal{d}y}{\textnormal{d}x}=\lim_{\delta x \rightarrow 0}\left(\dfrac{\delta y}{\delta x}\right)=\lim_{\delta x \rightarrow 0}\left(\dfrac{\textnormal{f}(x+\delta x)-\textnormal{f}(x)}{\delta x}\right)

A fair amount of analysis, usually met in higher education, is needed to make these ideas rigorous.
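The limiting process can be watched numerically. The following is a minimal sketch, assuming the illustrative function \textnormal{f}(x)=x^{3} at the point x=2 (neither is taken from the text) and a helper named difference_quotient that evaluates the gradient of PQ for a given \delta x.

```python
def difference_quotient(func, x, delta_x):
    """Gradient of the chord PQ, where Q is displaced from P by delta_x."""
    return (func(x + delta_x) - func(x)) / delta_x

# Illustrative curve f(x) = x^3; its gradient at x is 3x^2.
def cube(x):
    return x ** 3

x = 2.0
exact_gradient = 3 * x ** 2

# Shrink delta_x and record how the chord gradient behaves.
quotients = []
for delta_x in (1e-1, 1e-2, 1e-3, 1e-4):
    gradient_pq = difference_quotient(cube, x, delta_x)
    quotients.append(gradient_pq)
    print(delta_x, gradient_pq, abs(gradient_pq - exact_gradient))
```

The printed error should shrink roughly in proportion to \delta x, which is consistent with the gradient of PQ tending to the gradient of the curve.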

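We also write:

\textnormal{f}'(x)=\lim_{h \rightarrow 0}\left(\dfrac{\textnormal{f}(x+h)-\textnormal{f}(x)}{h}\right)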



Standard results can be proved for different functions.

If y=\textnormal{f}(x)=x^{n} then

\dfrac{\textnormal{d}y}{\textnormal{d}x}=nx^{n-1}

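If y=\textnormal{f}(x)=\sin x, then we need the small angle approximations, namely that if \delta x radians is very small (infinitesimal) then \sin \delta x\approx\delta x and \cos \delta x \approx 1, together with the compound angle formula \sin (x+\delta x)=\sin x\cos \delta x+\cos x\sin \delta x, from which it follows that,

\dfrac{\textnormal{d}y}{\textnormal{d}x}=\lim_{\delta x \rightarrow 0}\left(\dfrac{\sin x\cos \delta x+\cos x\sin \delta x-\sin x}{\delta x}\right)=\cos x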
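The small angle approximations, and the conclusion that the gradient of \sin x is \cos x, can both be checked numerically. This is a rough sketch only; the sample point x=1 and the helper name sin_difference_quotient are assumptions made for illustration.

```python
import math

# How good are sin(dx) ~ dx and cos(dx) ~ 1 as dx shrinks (dx in radians)?
approximation_errors = []
for delta_x in (0.1, 0.01, 0.001):
    sin_error = abs(math.sin(delta_x) - delta_x)
    cos_error = abs(math.cos(delta_x) - 1.0)
    approximation_errors.append((delta_x, sin_error, cos_error))
    print(delta_x, sin_error, cos_error)

def sin_difference_quotient(x, delta_x):
    """Chord gradient (sin(x + dx) - sin(x)) / dx for the sine curve."""
    return (math.sin(x + delta_x) - math.sin(x)) / delta_x

x = 1.0  # illustrative sample point
estimate = sin_difference_quotient(x, 1e-6)
print(estimate, math.cos(x))
```

Both approximation errors shrink much faster than \delta x itself, which is why they vanish after division by \delta x in the limit.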

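The differentiation process described above is linear and extends to more complicated functions. That is to say, if y=a\textnormal{f}(x)+b\textnormal{g}(x) where a,b \in \mathbb{R}, and \textnormal{f}'(x) and \textnormal{g}'(x) denote the derivatives of \textnormal{f} and \textnormal{g}, then

\dfrac{\textnormal{d}y}{\textnormal{d}x}=a\textnormal{f}'(x)+b\textnormal{g}'(x)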
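Linearity, meaning that the derivative of a\textnormal{f}(x)+b\textnormal{g}(x) is a times the derivative of \textnormal{f} plus b times the derivative of \textnormal{g}, can be illustrated with difference quotients. In the sketch below the choices \textnormal{f}(x)=x^{2}, \textnormal{g}(x)=\sin x, a=3, b=-2 and the sample point x=0.7 are all assumptions made for the example.

```python
import math

def forward_difference(func, x, delta_x=1e-6):
    """Chord gradient of func between x and x + delta_x."""
    return (func(x + delta_x) - func(x)) / delta_x

a, b = 3.0, -2.0  # illustrative real constants

def f(x):
    return x ** 2

def g(x):
    return math.sin(x)

def y(x):
    return a * f(x) + b * g(x)

x = 0.7
combined = forward_difference(y, x)  # differentiate the combination directly
separate = a * forward_difference(f, x) + b * forward_difference(g, x)
exact = a * 2 * x + b * math.cos(x)  # from the standard results for x^2 and sin x

print(combined, separate, exact)
```

The two numerical estimates should agree up to rounding, and both should sit close to the value given by the standard results.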

The Fundamental Theorem of Calculus

Integration is usually introduced as the reverse of differentiation, i.e. as solving a differential equation of the form \dfrac{\textnormal{d}y}{\textnormal{d}x}=\textnormal{g}(x). The link between integration and area is often passed over; it is the subject of the Fundamental Theorem of Calculus. [The following discussion assumes an increasing function. It can be adapted for a decreasing function or, piecewise, for a function which successively increases and decreases.]

Consider an area function, A(x), defined as the area under \textnormal{f}(x) between a and a general point, x. Suppose that a small increment, \delta x, is applied to x, giving a small element, \delta A, of area. Then,

\textnormal{f}(x)\delta x \leqslant \delta A \leqslant \textnormal{f}(x+\delta x)\delta x

dividing through by \delta x gives,

\textnormal{f}(x) \leqslant \dfrac{\delta A}{\delta x} \leqslant \textnormal{f}(x+\delta x),

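a limit sandwich where, as \delta x \rightarrow 0, \textnormal{f}(x+\delta x) \rightarrow \textnormal{f}(x) and so,

\dfrac{\textnormal{d}A}{\textnormal{d}x}=\lim_{\delta x \rightarrow 0}\left(\dfrac{\delta A}{\delta x}\right)=\textnormal{f}(x)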


The curve function, \textnormal{f}(x), is the derivative of the area function; hence the area function is an anti-derivative of the curve function and,

\displaystyle\int \textnormal{f}(x) \textnormal{d}x=A(x).
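The sandwich argument can be illustrated numerically. This is a minimal sketch under assumptions of its own: the increasing function \textnormal{f}(x)=x^{2}, the lower limit a=0, the point x=1, and a midpoint rule (the helper midpoint_rule) standing in for the area function.

```python
def midpoint_rule(func, lower, upper, steps=10_000):
    """Approximate the area under func between lower and upper."""
    width = (upper - lower) / steps
    return width * sum(func(lower + (i + 0.5) * width) for i in range(steps))

# Illustrative increasing curve on x >= 0.
def f(x):
    return x * x

a = 0.0  # fixed left-hand end of the area

def area(x):
    """Area function A(x): the area under f between a and x."""
    return midpoint_rule(f, a, x)

x = 1.0
results = []
for delta_x in (0.1, 0.01, 0.001):
    delta_area = area(x + delta_x) - area(x)  # small element of area
    ratio = delta_area / delta_x
    results.append((delta_x, f(x), ratio, f(x + delta_x)))
    print(delta_x, f(x), ratio, f(x + delta_x))
```

In each row the middle value \delta A/\delta x should lie between \textnormal{f}(x) and \textnormal{f}(x+\delta x), and all three should close in on \textnormal{f}(1)=1 as \delta x shrinks.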