This cheat sheet covers only the linear algebra used on the computing / Machine Learning / Deep Learning side.

I have read Paul Dawkins’ earlier notes and collected the gists I found useful. (The original content no longer seems to be available; for more information, please contact the author.)

inverse matrices and elementary matrices

  • If \(A\) is a square matrix and we can find another matrix of the same size, say \(B\), such that

    \[ AB = BA = I \]

    then we call \(A\) invertible and we say that \(B\) is an inverse of the matrix \(A\). If we can’t find such a matrix \(B\) we call A a singular matrix.

  • Suppose that \(A\) and \(B\) are invertible matrices of the same size. Then,

    (a) \(AB\) is invertible and \( {(AB)}^{−1} = {B}^{−1}{A}^{−1} \).

    (b) \({A}^{−1}\) is invertible and \( {({A}^{−1})}^{-1} =A\).

    (c) For \( n=0,1,2,… \), \({A}^n\) is invertible and \( {({A}^n)}^{-1} = {A}^{−n} ={(A^{−1})}^n \).

    (d) If \(c\) is any non-zero scalar then \(cA\) is invertible and \( {(cA)}^{−1} = {1 \over c} {A}^{−1} \).

    (e) \(A^T\) is invertible and \( {(A^T)}^{-1} = {(A^{−1})}^T \).
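
These identities are easy to sanity-check numerically. A minimal NumPy sketch (the matrices below are arbitrary invertible examples, not anything from the text):

```python
import numpy as np

A = np.array([[4.0, 7.0], [2.0, 6.0]])
B = np.array([[1.0, 2.0], [3.0, 5.0]])

# (AB)^{-1} = B^{-1} A^{-1}
assert np.allclose(np.linalg.inv(A @ B), np.linalg.inv(B) @ np.linalg.inv(A))

# (A^{-1})^{-1} = A
assert np.allclose(np.linalg.inv(np.linalg.inv(A)), A)

# (cA)^{-1} = (1/c) A^{-1}
c = 3.0
assert np.allclose(np.linalg.inv(c * A), (1 / c) * np.linalg.inv(A))

# (A^T)^{-1} = (A^{-1})^T
assert np.allclose(np.linalg.inv(A.T), np.linalg.inv(A).T)
```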

elementary matrix

  • A square matrix is called an elementary matrix if it can be obtained by applying a single elementary row operation to the identity matrix of the same size.

finding inverse matrices

  • If \(A\) is an \(n \times n\) matrix then the following statements are equivalent.

    (a) \(A\) is invertible.

    (b) The only solution to the system \(A \mathbf{x} = 0\) is the trivial solution.

    (c) \(A\) is row equivalent to \(I_n\).

    (d) \(A\) is expressible as a product of elementary matrices.

    (e) \(A \mathbf{x} = \mathbf{b} \) has exactly one solution for every \(n \times 1\) matrix \(\mathbf{b}\).

    (f) \(A \mathbf{x} = \mathbf{b} \) is consistent for every \(n \times 1\) matrix \(\mathbf{b}\).

    (g) \( det(A) \ne 0 \).
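
A small NumPy illustration of a few of these equivalences, using one arbitrarily chosen invertible matrix and one obviously singular one:

```python
import numpy as np

A = np.array([[2.0, 1.0], [5.0, 3.0]])   # invertible: det = 1
S = np.array([[1.0, 2.0], [2.0, 4.0]])   # singular: second row is twice the first

b = np.array([1.0, 4.0])

# (g) det(A) != 0, so (e) Ax = b has exactly one solution
print(np.linalg.det(A))            # 1.0 (up to floating point)
x = np.linalg.solve(A, b)          # the unique solution
assert np.allclose(A @ x, b)

# (b) the only solution of Ax = 0 is the trivial solution
assert np.allclose(np.linalg.solve(A, np.zeros(2)), np.zeros(2))

# the singular matrix fails the determinant test
print(np.linalg.det(S))            # 0.0
```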

consistent and inconsistent

A system of linear equations is called inconsistent if it has no solutions. A system which has a solution is called consistent.

Special Matrices

  • About triangular matrices

    (a) The product of lower triangular matrices will be a lower triangular matrix.

    (b) The product of upper triangular matrices will be an upper triangular matrix.

    (c) The inverse of an invertible lower triangular matrix will be a lower triangular matrix.

    (d) The inverse of an invertible upper triangular matrix will be an upper triangular matrix.

  • About symmetric matrices

    (a) For any matrix \(A\) both \(AA^T\) and \(A^T A\) are symmetric.

    (b) If \(A\) is an invertible symmetric matrix then \(A^{−1}\) is symmetric.

    (c) If \(A\) is invertible then \(AA^T\) and \(A^T A\) are both invertible.
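
A quick NumPy check of the symmetric-matrix facts above (the matrices are arbitrary examples):

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [3.0, 1.0, 4.0]])           # any matrix, not necessarily square

# (a) A A^T and A^T A are always symmetric
assert np.allclose(A @ A.T, (A @ A.T).T)
assert np.allclose(A.T @ A, (A.T @ A).T)

# (b) the inverse of an invertible symmetric matrix is symmetric
M = np.array([[2.0, 1.0], [1.0, 3.0]])    # symmetric, det = 5
assert np.allclose(np.linalg.inv(M), np.linalg.inv(M).T)
```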

Determinants

  • Let \(A\) be an \(n \times n\) matrix and \(c\) be a scalar then, \[det(cA)=c^n det(A)\]

  • If \(A\) and \(B\) are square matrices of the same size then \[ det(AB)=det(A)det(B) \]

  • Suppose that \(A\) is an invertible matrix then, \[ det(A^{−1})= {1 \over det(A)} \]
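
The three determinant rules above can be verified numerically; a minimal NumPy sketch with arbitrary \(2 \times 2\) examples:

```python
import numpy as np

A = np.array([[2.0, 1.0], [4.0, 3.0]])    # det = 2
B = np.array([[1.0, 5.0], [2.0, 1.0]])    # det = -9
n, c = 2, 3.0

assert np.isclose(np.linalg.det(c * A), c**n * np.linalg.det(A))
assert np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))
assert np.isclose(np.linalg.det(np.linalg.inv(A)), 1 / np.linalg.det(A))
```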

singular and non-singular

  • A square matrix A is invertible if and only if \( det (A) \ne 0 \) . A matrix that is invertible is often called non-singular and a matrix that is not invertible is often called singular.

  • If A is a square matrix then, \( det(A)=det(A^T) \).

  • Suppose that \(A\) is an \(n \times n\) triangular matrix then, \( det(A)= a_{11}\ a_{22}\ \cdots\ a_{nn} \).

adjoint

matrix of cofactors from A, Adjoint of A

If \(A\) is a square matrix then the minor of \(a_{ij}\), denoted by \(M_{ij}\), is the determinant of the submatrix that results from removing the \(i^{th}\) row and \(j^{th}\) column of \(A\). If \(A\) is a square matrix then the cofactor of \(a_{ij}\), denoted by \(C_{ij}\), is the number \((−1)^{i+j} M_{ij}\).

  • Let \(A\) be an \(n \times n\) matrix and \(C_{ij}\) be the cofactor of \(a_{ij}\). The matrix of cofactors from \(A\) is,

\[ \begin{bmatrix} C_{11} & C_{12} & \cdots & C_{1n}\\ C_{21} & C_{22} & \cdots & C_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ C_{n1} & C_{n2} & \cdots & C_{nn} \end{bmatrix} \]

The adjoint of A is the transpose of the matrix of cofactors and is denoted by adj(A).

  • If A is an invertible matrix then \[ A^{−1} = {1 \over det(A)}adj(A) \]

  • Let A be a square matrix.

    (a) If \(B\) is the matrix that results from multiplying a row or column of \(A\) by a scalar, \(c\), then \( det(B)= c\,det(A)\)

    (b) If \(B\) is the matrix that results from interchanging two rows or two columns of \(A\) then \(det(B)=−det(A)\)

    (c) If \(B\) is the matrix that results from adding a multiple of one row of \(A\) onto another row of \(A\) or adding a multiple of one column of \(A\) onto another column of \(A\) then \(det(B)=det(A)\)
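
The cofactor/adjoint construction above translates directly into code. A sketch (the `adjoint` helper below is my own illustration, not a NumPy routine):

```python
import numpy as np

def adjoint(A):
    """Matrix of cofactors, transposed (the classical adjugate adj(A))."""
    n = A.shape[0]
    C = np.zeros_like(A, dtype=float)
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)   # cofactor C_ij
    return C.T                                                  # adj(A) = C^T

A = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 4.0],
              [5.0, 6.0, 0.0]])

A_inv = adjoint(A) / np.linalg.det(A)       # A^{-1} = adj(A) / det(A)
assert np.allclose(A_inv, np.linalg.inv(A))
```

This is only for illustration: computing an inverse through cofactors is far more expensive than the LU-factorization route that library routines such as `np.linalg.inv` rely on.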

Cramer’s Rule

Suppose that \(A\) is an \(n \times n\) invertible matrix. Then the solution to the system \(Ax=b\) is given by,

\[ x_1 = { det(A_1) \over det(A) }, x_2 = { det(A_2) \over det(A) }, …, x_n = { det(A_n) \over det(A) } \]

where \(A_i\) is the matrix found by replacing the \(i^{th}\) column of \(A\) with \(\mathbf{b}\).
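
Cramer’s rule is straightforward to implement; a minimal sketch (the `cramer` helper and the system below are illustrative only):

```python
import numpy as np

def cramer(A, b):
    """Solve Ax = b by Cramer's rule (A must be square and invertible)."""
    det_A = np.linalg.det(A)
    x = np.empty(len(b))
    for i in range(len(b)):
        A_i = A.copy()
        A_i[:, i] = b                     # replace the i-th column of A with b
        x[i] = np.linalg.det(A_i) / det_A
    return x

A = np.array([[2.0, 1.0, 1.0],
              [1.0, 3.0, 2.0],
              [1.0, 0.0, 0.0]])
b = np.array([4.0, 5.0, 6.0])

assert np.allclose(cramer(A, b), np.linalg.solve(A, b))
```

For anything beyond small systems `np.linalg.solve` is the practical choice; Cramer’s rule is mainly of theoretical interest.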

Euclidean n-space

  • Two non-zero vectors, u and v, are orthogonal if and only if \({u} \centerdot {v} = 0 \)

  • Suppose that \(u\) and \(a \ne 0\) are both vectors in 2-space or 3-space then,

\[proj_a \mathbf{u} = { {\mathbf{u} \centerdot \mathbf{a} } \over {\parallel \mathbf{a} \parallel}^2 } \mathbf{a}\]

and the vector component of u orthogonal to a is given by,

\[\mathbf{u} - proj_a \mathbf{u} = \mathbf{u} - { {\mathbf{u} \centerdot \mathbf{a} } \over {\parallel \mathbf{a} \parallel}^2 } \mathbf{a}\]

  • If \(\mathbf{u}\) and \(\mathbf{v}\) are two vectors in 3-space then the cross product, denoted by \(\mathbf{u} \times \mathbf{v}\), can be defined in one of three ways.

    (a) \( \mathbf{u} \times \mathbf{v} = (u_2v_3 −u_3v_2,u_3v_1 −u_1v_3,u_1v_2 −u_2v_1) \)-Vector Notation.

    (b) \( \mathbf{u} \times \mathbf{v} = (\begin{vmatrix} u_2 & u_3 \\ v_2 & v_3\end{vmatrix}, -\begin{vmatrix} u_1 & u_3 \\ v_1 & v_3\end{vmatrix}, \begin{vmatrix} u_1 & u_2 \\ v_1 & v_2\end{vmatrix}) \) - Using 2 x 2 determinants

    (c) \( \mathbf{u} \times \mathbf{v} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k}\\ u_1 & u_2 & u_3\\ v_1 & v_2 & v_3\end{vmatrix} \) - Using 3 x 3 determinants

  • The cross product a × b is defined as a vector c that is perpendicular (orthogonal) to both a and b, with a direction given by the right-hand rule and a magnitude equal to the area of the parallelogram that the vectors span.

  • Suppose \(\mathbf{u}, \mathbf{v}\), and \(\mathbf{w}\) are vectors in 3-space and \(c\) is any scalar then

    (a) \( \mathbf{u×v=−(v×u)} \)

    (b) \( \mathbf{u×(v+w)=(u×v)+(u×w)} \)

    (c) \( \mathbf{(u+v)×w=(u×w)+(v×w)} \)

    (d) \( c\mathbf{(u×v)}=(c\mathbf{u})×\mathbf{v}=\mathbf{u}×(c\mathbf{v}) \)

    (e) \( \mathbf{u×0=0×u=0} \)

    (f) \( \mathbf{u×u=0} \)
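
A NumPy check of the projection formulas above, plus a couple of the cross-product properties (all vectors chosen arbitrarily):

```python
import numpy as np

u = np.array([2.0, 1.0, 3.0])
a = np.array([1.0, 0.0, 1.0])

proj = (u @ a) / (a @ a) * a          # proj_a u = (u . a / ||a||^2) a
perp = u - proj                       # vector component of u orthogonal to a

assert np.isclose(perp @ a, 0.0)      # the leftover component really is orthogonal to a
assert np.allclose(proj + perp, u)    # the two pieces add back up to u

# a couple of the cross-product properties, for good measure
v = np.array([0.0, 2.0, 5.0])
assert np.allclose(np.cross(u, v), -np.cross(v, u))   # u x v = -(v x u)
assert np.allclose(np.cross(u, u), np.zeros(3))       # u x u = 0
```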

lagrange identity

  • Suppose \( \mathbf{u, v} \), and \( \mathbf{w} \) are vectors in 3-space then,

    (a) \( \mathbf{u \centerdot (u×v)=0} \)

    (b) \( \mathbf{v \centerdot (u×v)=0} \)

    (c) \( {\parallel \mathbf{u \times v} \parallel}^2 = {\parallel \mathbf{u} \parallel}^2\ {\parallel \mathbf{v} \parallel}^2 −{\mathbf{(u \centerdot v)}}^2 \) - This is called Lagrange’s Identity

    (d) \( \mathbf{ u \times (v \times w)=(u \centerdot w)v−(u \centerdot v)w } \)

    (e) \( \mathbf{(u \times v) \times w=(u \centerdot w)v−(v \centerdot w)u} \)

  • Suppose that \(\mathbf{u}\) and \(\mathbf{v}\) are vectors in 3-space and let θ be the angle between them then,

\[ \parallel \mathbf{u \times v} \parallel = \parallel \mathbf{u} \parallel\parallel \mathbf{v} \parallel \sin\theta \]
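
Lagrange’s identity and the norm formula are also easy to confirm with `np.cross` (again with arbitrary vectors):

```python
import numpy as np

u = np.array([1.0, 2.0, 0.0])
v = np.array([0.0, 1.0, 3.0])

w = np.cross(u, v)

# u . (u x v) = 0 and v . (u x v) = 0
assert np.isclose(u @ w, 0.0) and np.isclose(v @ w, 0.0)

# Lagrange's identity: ||u x v||^2 = ||u||^2 ||v||^2 - (u . v)^2
lhs = np.linalg.norm(w) ** 2
rhs = np.linalg.norm(u) ** 2 * np.linalg.norm(v) ** 2 - (u @ v) ** 2
assert np.isclose(lhs, rhs)

# ||u x v|| = ||u|| ||v|| sin(theta)
theta = np.arccos((u @ v) / (np.linalg.norm(u) * np.linalg.norm(v)))
assert np.isclose(np.linalg.norm(w),
                  np.linalg.norm(u) * np.linalg.norm(v) * np.sin(theta))
```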

euclidean inner product

  • Suppose \( \mathbf{u}=(u_1,u_2,…,u_n) \) and \( \mathbf{v} = (v_1,v_2,…,v_n) \) are two vectors in \( \mathbb{R}^n \) then the Euclidean inner product denoted by \( \mathbf{ u \centerdot v } \) is defined to be \( \mathbf{u \centerdot v} = u_1v_1 +u_2v_2 +…+u_nv_n \)

euclidean norm

  • Suppose \( \mathbf{u} = (u_1,u_2,…,u_n) \) is a vector in \( \mathbb{R}^n \) then the Euclidean norm is, \[ \parallel \mathbf{u} \parallel = (\mathbf{u \centerdot u})^{1 \over 2} = \sqrt{ u_1^2 + u_2^2 + … + u_n^2 } \]

euclidean distance

  • Suppose \( \mathbf{u}=(u_1,u_2,…,u_n) \)and \( \mathbf{v} =(v_1,v_2,…,v_n) \) are two points in \(\mathbb{R}^n\) then the Euclidean distance between them is defined to be, \[ d(\mathbf{u}, \mathbf{v}) = \parallel \mathbf{u-v} \parallel = \sqrt{(u_1-v_1)^2+(u_2-v_2)^2+…+(u_n-v_n)^2} \]

  • If \(\mathbf{u}\) and \(\mathbf{v}\) are two vectors in \(\mathbb{R}^n\) then, \[ \mathbf{u \centerdot v} = {1 \over 4} {\parallel \mathbf{u+v} \parallel}^2 - {1 \over 4} {\parallel \mathbf{u-v} \parallel}^2 \]

  • A transformation is a function \( f: \mathbb{R}^n \rightarrow \mathbb{R}^m \). For example, define \( T: \mathbb{R}^2 \rightarrow \mathbb{R}^4\) as, \[ T(x_1, x_2) = (3x_1-4x_2, x_1+2x_2, 6x_1-x_2, 10x_2) = \begin{bmatrix} 3 & -4 \\ 1 & 2 \\ 6 & -1 \\ 0 & 10 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2\end{bmatrix} \Rightarrow \mathbf{w} = A\mathbf{x} \]
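
A quick NumPy check of the Euclidean inner product, norm, distance, and the identity defined in the last few sections (vectors chosen arbitrarily):

```python
import numpy as np

u = np.array([1.0, 2.0, 2.0])
v = np.array([3.0, 0.0, 4.0])

dot  = u @ v                          # Euclidean inner product u . v
norm = np.sqrt(u @ u)                 # Euclidean norm ||u||
dist = np.linalg.norm(u - v)          # Euclidean distance d(u, v)

assert np.isclose(norm, np.linalg.norm(u))
# u . v = 1/4 ||u+v||^2 - 1/4 ||u-v||^2
assert np.isclose(dot, 0.25 * np.linalg.norm(u + v) ** 2
                       - 0.25 * np.linalg.norm(u - v) ** 2)
```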

linear transformation

  • A function \( T: \mathbb{R}^n \rightarrow \mathbb{R}^m \) is called a linear transformation if for all \(\mathbf{u}\) and \(\mathbf{v}\) in \(\mathbb{R}^n\) and all scalars \(c\) we have, \[ T(\mathbf{u+v}) = T(\mathbf{u}) + T(\mathbf{v}) \] \[ T(c\mathbf{u}) = cT(\mathbf{u}) \]

induced transformation

  • If A is an \(m \times n\) matrix then its induced transformation, \( T_A: \mathbb{R}^n \rightarrow \mathbb{R}^m \) defined as, \[ T_A(\mathbf{x}) = A\mathbf{x} \] is a linear transformation.

matrix induced by T

  • Let \( T: \mathbb{R}^n \rightarrow \mathbb{R}^m \) be a linear transformation. Then there is an \(m \times n\) matrix \(A\) such that \(T = T_A\) (recall that \(T_A\) is the transformation induced by \(A\)). The matrix \(A\) is called the matrix induced by \(T\) and is sometimes denoted as \(A = [T]\).
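
As a sketch of how the induced matrix can be found in code: apply \(T\) to the standard basis vectors and take the results as columns (the `T` below re-implements the \( \mathbb{R}^2 \rightarrow \mathbb{R}^4 \) example from a few sections back):

```python
import numpy as np

def T(x):
    """The example transformation T: R^2 -> R^4 from above."""
    x1, x2 = x
    return np.array([3*x1 - 4*x2, x1 + 2*x2, 6*x1 - x2, 10*x2])

# the induced matrix [T]: its i-th column is T applied to the i-th standard basis vector
A = np.column_stack([T(e) for e in np.eye(2)])

x = np.array([2.0, -1.0])
assert np.allclose(T(x), A @ x)       # T(x) = A x for every x
print(A)
```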

example of linear transformations

Please refer to Paul Dawkins’ notes at http://tutorial.math.lamar.edu/.

vector spaces

linear independence

  • Suppose \( S = \{\mathbf{v_1},\mathbf{v_2},…,\mathbf{v_n}\} \) is a non-empty set of vectors and form the vector equation, \[c_1\mathbf{v_1} +c_2\mathbf{v_2} + … +c_n\mathbf{v_n} =0\] This equation has at least one solution, namely, \(c_1 =0, c_2 =0,…, c_n =0\). This solution is called the trivial solution.

    If the trivial solution is the only solution to this equation then the vectors in the set S are called linearly independent and the set is called a linearly independent set. If there is another solution then the vectors in the set S are called linearly dependent and the set is called a linearly dependent set.

  • A finite set of vectors that contains the zero vector will be linearly dependent.

  • Suppose that \( S = \{\mathbf{v_1},\mathbf{v_2},…,\mathbf{v_k}\} \) is a set of vectors in \( \mathbb{R}^n\). If \(k > n\) then the set of vectors is linearly dependent.
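
In code, linear independence is usually tested through the rank: a set of vectors is linearly independent exactly when the matrix having them as columns has rank equal to the number of vectors. A NumPy sketch with arbitrary vectors:

```python
import numpy as np

v1 = np.array([1.0, 2.0, 3.0])
v2 = np.array([0.0, 1.0, 1.0])
v3 = v1 + 2 * v2                            # deliberately dependent on v1 and v2

independent = np.column_stack([v1, v2])
dependent   = np.column_stack([v1, v2, v3])

# rank == number of vectors  <=>  only the trivial solution to c1 v1 + ... + ck vk = 0
print(np.linalg.matrix_rank(independent))   # 2 -> linearly independent
print(np.linalg.matrix_rank(dependent))     # 2 < 3 -> linearly dependent
```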

basis

  • Suppose \( S = \{\mathbf{v_1},\mathbf{v_2},…,\mathbf{v_n}\} \) is a set of vectors from the vector space V. Then S is called a basis (plural is bases) for V if both of the following conditions hold.

    (a) \(span(S)=V\), i.e. S spans the vector space V.

    (b) S is a linearly independent set of vectors.

  • Suppose that the set \( S = \{\mathbf{v_1},\mathbf{v_2},…,\mathbf{v_n}\} \) is a basis for the vector space V then every vector u from V can be expressed as a linear combination of the vectors from S in exactly one way.

dimension

  • Suppose that V is a non-zero vector space and that S is a set of vectors from V that forms a basis for V. If S contains a finite number of vectors, say \( S = \{\mathbf{v_1},\mathbf{v_2},…,\mathbf{v_n}\} \), then we call V a finite dimensional vector space and we say that the dimension of V, denoted by dim(V), is n (i.e. the number of basis elements in S). If V is not a finite dimensional vector space (so S does not have a finite number of vectors) then we call it an infinite dimensional vector space.

    By definition the dimension of the zero vector space (i.e. the vector space consisting solely of the zero vector) is zero.

  • Suppose that V is a finite dimensional vector space with dim (V ) = n and that S is any finite set of vectors from V.

    (a) If S spans V but is not a basis for V then it can be reduced to a basis for V by removing certain vectors from S.

    (b) If S is linearly independent but is not a basis for V then it can be enlarged to a basis for V by adding in certain vectors from V.

change of basis

  • Suppose that \( S = \{\mathbf{v_1},\mathbf{v_2},…,\mathbf{v_n}\} \) is a basis for a vector space V and that u is any vector from V. Since u is a vector in V it can be expressed as a linear combination of the vectors from S as follows, \[ \mathbf{u} = c_1\mathbf{v_1} +c_2\mathbf{v_2} + … +c_n\mathbf{v_n} \] The scalars \( c_1,c_2,…,c_n \) are called the coordinates of u relative to the basis S. The coordinate vector of u relative to S is denoted by \( (\mathbf{u})_S \) and defined to be the following vector in \( \mathbb{R}^n \), \[ (\mathbf{u})_S =(c_1,c_2,…,c_n) \]

  • Suppose that V is an n-dimensional vector space and further suppose that \( B =\{\mathbf{v_1},\mathbf{v_2},…,\mathbf{v_n}\} \) and \( C =\{\mathbf{w_1},\mathbf{w_2},…,\mathbf{w_n}\} \) are two bases for V. The transition matrix from C to B is defined to be,

\[ P = \big[\, [\mathbf{w_1}]_B \,\big|\, [\mathbf{w_2}]_B \,\big|\, … \,\big|\, [\mathbf{w_n}]_B \,\big] \]

where the \(i^{th}\) column of P is the coordinate matrix of \(\mathbf{w_i}\) relative to B.

The coordinate matrix of a vector u in V, relative to B, is then related to the coordinate matrix of u relative to C by the following equation. \[ {[\mathbf{u}]}_B =P{[\mathbf{u}]}_C \]

  • Suppose that V is a finite dimensional vector space and that P is the transition matrix from C to B then,

    (a) P is invertible and,

    (b) \(P^{−1}\) is the transition matrix from B to C.
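
A NumPy sketch of the transition matrix for two arbitrarily chosen bases of \(\mathbb{R}^2\), with the basis vectors stored as columns in standard coordinates:

```python
import numpy as np

# columns are the basis vectors, expressed in the standard basis
B = np.array([[1.0, 1.0],
              [0.0, 1.0]])          # basis B = {v1, v2}
C = np.array([[2.0, 1.0],
              [1.0, 1.0]])          # basis C = {w1, w2}

# the coordinates of w_i relative to B solve B [w_i]_B = w_i, so P = B^{-1} C
P = np.linalg.solve(B, C)           # transition matrix from C to B

u_C = np.array([3.0, -2.0])         # a vector given by its coordinates relative to C
u   = C @ u_C                       # the same vector in the standard basis
u_B = P @ u_C                       # its coordinates relative to B

assert np.allclose(B @ u_B, u)                      # consistency check
assert np.allclose(np.linalg.inv(P) @ u_B, u_C)     # P^{-1} is the transition matrix from B to C
```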

fundamental subspaces

  • The row vectors (we called them row matrices at the time) are the vectors in \(\mathbb{R}^m\) formed out of the rows of A. The column vectors (again we called them column matrices at the time) are the vectors in \(\mathbb{R}^n\) that are formed out of the columns of A.

  • Suppose that A is an n × m matrix.

    (a) The subspace of \(\mathbb{R}^m\) that is spanned by the row vectors of A is called the row space of A.

    (b) The subspace of \(\mathbb{R}^n\) that is spanned by the column vectors of A is called the column space of A.

  • The dimension of the null space of A is called the nullity of A and is denoted by nullity(A).

  • Suppose that A is a matrix and U is a matrix in row-echelon form that has been obtained by performing row operations on A. Then the row space of A and the row space of U are the same space.

  • Suppose that A and B are two row equivalent matrices (so we got from one to the other by row operations) then a set of column vectors from A will be a basis for the column space of A if and only if the corresponding columns from B will form a basis for the column space of B.
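
A sketch of how these facts are used to find bases in practice, here with SymPy’s exact row reduction (the matrix is an arbitrary example):

```python
from sympy import Matrix

A = Matrix([[1, 2, 0, 1],
            [2, 4, 1, 4],
            [3, 6, 1, 5]])          # third row = first row + second row

rref_form, pivot_cols = A.rref()

# the non-zero rows of the reduced row-echelon form are a basis for the row space of A
row_basis = [rref_form.row(i) for i in range(len(pivot_cols))]

# the columns of A (not of the rref!) in the pivot positions are a basis for the column space
col_basis = [A.col(j) for j in pivot_cols]

print(pivot_cols)                    # (0, 2): columns 1 and 3 of A span the column space
print(A.rank(), len(A.nullspace()))  # rank 2, nullity 2
```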

rank

  • Suppose that A is a matrix then the row space of A and the column space of A will have the same dimension. We call this common dimension the rank of A and denote it by rank(A).

  • Let A be an n × n matrix. The following statements are equivalent.

    (a) A is invertible.

    (b) The only solution to the system \(A\mathbf{x} = 0 \) is the trivial solution.

    (c) A is row equivalent to \(I_n\) .

    (d) A is expressible as a product of elementary matrices.

    (e) \(A\mathbf{x} = \mathbf{b} \) has exactly one solution for every n ×1 matrix b.

    (f) \(A\mathbf{x} = \mathbf{b} \) is consistent for every n ×1 matrix b.

    (g) \( det(A) \neq 0\)

    (h) The null space of A is {0}, i.e. just the zero vector.

    (i) nullity(A)=0.

    (j) rank(A)=n.

    (k) The column vectors of A form a basis for \(\mathbb{R}^n\).

    (l) The row vectors of A form a basis for \(\mathbb{R}^n\).
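
A short check tying a few of these statements together for an arbitrary invertible \(3 \times 3\) matrix; `scipy.linalg.null_space` is used to compute the null space numerically:

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1.0, 0.0, 2.0],
              [2.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])     # det(A) = 5, so A is invertible

n = A.shape[0]
rank = np.linalg.matrix_rank(A)     # (j) rank(A) = n
nullity = null_space(A).shape[1]    # (i) nullity(A) = 0, i.e. the null space is {0}

print(rank, nullity)                # 3 0
assert rank == n and nullity == 0
assert not np.isclose(np.linalg.det(A), 0.0)   # (g) det(A) != 0
```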

inner product

  • Suppose u, v, and w are all vectors in a vector space V and c is any scalar. An inner product on the vector space V is a function that associates with each pair of vectors in V, say u and v, a real number, denoted by \( <\mathbf{u,v}> \), that satisfies the following axioms.

    (a) \( <\mathbf{u,v}> = <\mathbf{v,u}> \)

    (b) \( <\mathbf{u+v,w}> = <\mathbf{u,w}> + <\mathbf{v,w}> \)

    (c) \( < {c}\mathbf{u,v}> = {c}<\mathbf{u,v}> \)

    (d) \( <\mathbf{u,u}> \geq 0 \) and \( <\mathbf{u,u}> =0 \) if and only if u=0

    A vector space along with an inner product is called an inner product space.

  • Suppose that u and v are two vectors in an inner product space. They are said to be orthogonal if \( <\mathbf{u,v}> =0 \).

orthogonal complements

  • Suppose that W is a subspace of an inner product space V. We say that a vector u from V is orthogonal to W if it is orthogonal to every vector in W. The set of all vectors that are orthogonal to W is called the orthogonal complement of W and is denoted by \( W^{\bot} \).

    We say that W and \( W^{\bot} \) are orthogonal complements.

  • Suppose W is a subspace of an inner product space V. Then,

    (a) \( W^{\bot} \) is a subspace of V.

    (b) Only the zero vector, 0, is common to both W and \( W^{\bot} \) .

    (c) \( {(W^{\bot})}^{\bot} = W \) . Or in other words, the orthogonal complement of \( W^{\bot} \) is W.

orthogonal basis

  • Suppose that S is a set of vectors in an inner product space.

    (a) If each pair of distinct vectors from S is orthogonal then we call S an orthogonal set.

    (b) If S is an orthogonal set and each of the vectors in S also has a norm of 1 then we call S an orthonormal set.

  • Suppose that \( S =\{ \mathbf{v_1}, \mathbf{v_2},… ,\mathbf{v_n}\} \) is an orthogonal basis for an inner product space and that u is any vector from the inner product space then,

\[ \mathbf{u} = { < \mathbf{u,v_1} > \over {\parallel \mathbf{v_1} \parallel}^2 } \mathbf{v_1} + { < \mathbf{u,v_2} > \over {\parallel \mathbf{v_2} \parallel}^2 } \mathbf{v_2} + … + { < \mathbf{u,v_n} > \over {\parallel \mathbf{v_n} \parallel}^2 } \mathbf{v_n}\]

If in addition S is in fact an orthonormal basis then,

\[ \mathbf{u} = <\mathbf{u,v_1}>\mathbf{v_1} + <\mathbf{u,v_2}>\mathbf{v_2} + … + <\mathbf{u,v_n}>\mathbf{v_n} \]
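
A quick NumPy check of the expansion above, using the standard dot product as the inner product and an arbitrary orthogonal basis of \(\mathbb{R}^2\):

```python
import numpy as np

v1 = np.array([1.0, 1.0])
v2 = np.array([1.0, -1.0])     # v1 . v2 = 0, so {v1, v2} is an orthogonal basis of R^2
u  = np.array([3.0, 5.0])

recovered = (u @ v1) / (v1 @ v1) * v1 + (u @ v2) / (v2 @ v2) * v2
assert np.allclose(recovered, u)
```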

Gram-Schmidt Process

Suppose that V is a finite dimensional inner product space and that \( \{\mathbf{v_1},\mathbf{v_2},…,\mathbf{v_n}\} \) is a basis for V then an orthogonal basis for V, \( \{\mathbf{u_1},\mathbf{u_2},…,\mathbf{u_n}\} \), can be found using the following process.

\[ \mathbf{u_1} = \mathbf{v_1} \]

\[ \mathbf{u_2} = \mathbf{v_2} - {<\mathbf{v_2, u_1}> \over {\parallel \mathbf{u_1} \parallel}^2}\mathbf{u_1} \]

\[ \mathbf{u_3} = \mathbf{v_3} - {<\mathbf{v_3, u_1}> \over {\parallel \mathbf{u_1} \parallel}^2}\mathbf{u_1} - {<\mathbf{v_3, u_2}> \over {\parallel \mathbf{u_2} \parallel}^2}\mathbf{u_2} \]

\[ \vdots \]

\[ \mathbf{u_n} = \mathbf{v_n} - {<\mathbf{v_n, u_1}> \over {\parallel \mathbf{u_1} \parallel}^2}\mathbf{u_1} - {<\mathbf{v_n, u_2}> \over {\parallel \mathbf{u_2} \parallel}^2}\mathbf{u_2} - … - {<\mathbf{v_n, u_{n-1}}> \over {\parallel \mathbf{u_{n-1}} \parallel}^2}\mathbf{u_{n-1}} \]

To convert the basis to an orthonormal basis simply divide all the new basis vectors by their norm. Also, due to the construction process we have

\[ span(\mathbf{u_1},\mathbf{u_2},…,\mathbf{u_k}) = span(\mathbf{v_1},\mathbf{v_2},…,\mathbf{v_k}) \] for \( k =1,2,…,n \)

and \( \mathbf{u}_k \) will be orthogonal to \( span( \mathbf{v_1}, …,\mathbf{v_{k−1}}) \) for \( k=2,3,…,n \).
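
A minimal NumPy implementation of the process, using the standard dot product as the inner product (for a general inner product space you would swap in your own \( <\cdot,\cdot> \)):

```python
import numpy as np

def gram_schmidt(vectors):
    """Turn a basis (list of 1-D arrays) into an orthogonal basis."""
    ortho = []
    for v in vectors:
        u = v.astype(float)
        for w in ortho:
            u -= (v @ w) / (w @ w) * w   # subtract the projection of v onto each earlier u_i
        ortho.append(u)
    return ortho

basis = [np.array([1.0, 1.0, 0.0]),
         np.array([1.0, 0.0, 1.0]),
         np.array([0.0, 1.0, 1.0])]

U = gram_schmidt(basis)
for i in range(len(U)):
    for j in range(i):
        assert np.isclose(U[i] @ U[j], 0.0)   # pairwise orthogonal

# divide each new basis vector by its norm to get an orthonormal basis
orthonormal = [u / np.linalg.norm(u) for u in U]
```

In floating point the “modified” Gram-Schmidt variant (projecting each \(\mathbf{u}_i\) out of the running remainder instead of out of the original \(\mathbf{v}_k\)) is numerically more stable; the version above mirrors the formulas as written.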