# A Tutorial on Linear Algebra and Geometry (Part 2)

*Author: Paolo Caressa*

*Points, vectors and their algebra*

So far we have dealt with points identified with pairs (or triples, in space) of real numbers. As far as Machine Learning is concerned, we are interested in finite sets of points, which represent particular objects to classify or phenomena to correlate as points in a Cartesian space. However, to find some regular behaviour or pattern in those points, we are often interested in lines, planes and also more complex geometric figures, such as conics or quadrics.

We have already seen, by an example, that some geometric figures may also be represented by single equations or systems of equations. Let us restrict ourselves to lines and planes, to understand in general how to represent them, since those representations are often used in Machine Learning.

A line is, intuitively speaking, a set of collinear points: one of Euclid’s axioms claims that given two distinct points there is exactly one line passing through them, so that two of its points suffice to identify a line uniquely.

To understand how to do that in the Cartesian plane, let us consider two points $P \neq Q$, whose coordinates are $P=(x_1,y_1)$ and $Q=(x_2,y_2)$. Let us define the *vector* from $P$ to $Q$ as the pair of real numbers

$$\overrightarrow{PQ} = (x_2 - x_1,\; y_2 - y_1)$$

Thus a vector is by definition the difference between two points, where by “difference” we mean the coordinate-wise difference.

From the IT point of view, we may imagine a point as an array (or list) with two elements (which are numbers). Usually an array A = [x,y] is indexed starting from 0, thus A[0] = x and A[1] = y. Then, if A and B are arrays representing two points, the vector from A to B may be represented by the array [B[0] - A[0], B[1] - A[1]].

Why do we call “vector” a difference of points? Because it is an object which has direction, sense and magnitude, just like the vectors used in Physics.

Its direction is just the line passing through $P$ and $Q$; its sense is from $P$ toward $Q$ (the opposite vector would be $\overrightarrow{QP}$, which runs from $Q$ toward $P$: what are its coordinates?)

The magnitude is the distance between the points $P$ and $Q$. Therefore a vector determines a line, and a sense on it: of course there are infinitely many vectors sharing the same direction (the same line).
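The array description above can be sketched in a few lines of Python. The helper names `vector_between` and `magnitude`, and the sample points, are assumptions made for illustration only.

```python
import math

def vector_between(p, q):
    """Vector from point p to point q: the coordinate-wise difference q - p."""
    return [qi - pi for pi, qi in zip(p, q)]

def magnitude(v):
    """Length of a vector, i.e. the distance between the two points defining it."""
    return math.sqrt(sum(c * c for c in v))

P = [1, 2]
Q = [4, 6]

PQ = vector_between(P, Q)   # runs from P toward Q
QP = vector_between(Q, P)   # the opposite vector, from Q toward P

print(PQ, QP, magnitude(PQ))
```

The same functions work unchanged for triples (points in space), since they never assume a specific number of coordinates.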

For example, the line passing through $P$ and $Q$ is determined by the vector $\overrightarrow{PQ}$, but also by the vector $2\overrightarrow{PQ}$, which has the same direction but different magnitude.

In general, for each number $a \neq 0$, the vector $a\overrightarrow{PQ}$ induces this same line. We say that two vectors are *parallel* if one is a multiple of the other by a non-zero scale factor.

Indeed, another way to determine a line is to pick one of its points $P$ and to provide a vector $\vec{v}$, so as to be able to express all the line’s points as translates of $P$ by vectors parallel to $\vec{v}$:

$$X = P + a\vec{v}$$

This is called the *parametric equation* of the line since it expresses the generic point on the line via a parameter which varies through all real numbers.

Notice that we used the operation of the *sum between a point and a vector*, defined as follows:

$$(x, y) + (v_1, v_2) = (x + v_1,\; y + v_2)$$
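As a minimal sketch of the parametric description of a line, the following Python code generates some points of a line from one of its points and a vector. The names `translate`, `scale` and `line_point`, and the sample values, are hypothetical and chosen only for illustration.

```python
def translate(point, vector):
    """Sum between a point and a vector, coordinate by coordinate."""
    return [c + v for c, v in zip(point, vector)]

def scale(a, vector):
    """Multiply a vector by the number a."""
    return [a * v for v in vector]

def line_point(p, v, a):
    """Point of the line through p with direction v, for the parameter value a."""
    return translate(p, scale(a, v))

P = [1, 2]
v = [3, 4]

# A few points of the line, one for each chosen value of the parameter
samples = [line_point(P, v, a) for a in (-1, 0, 1, 2)]
print(samples)
```

The parameter value 0 gives back the starting point, and the value 1 gives the point reached by applying the vector once.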

Although vectors are useful to describe lines (but also planes and, in higher dimension, “hyperplanes”), they are interesting in themselves because of their algebraic properties, such as:

- Vectors may be added to get new vectors according to the rule $(v_1, v_2) + (w_1, w_2) = (v_1 + w_1,\; v_2 + w_2)$

By a *vector space* we mean a set of elements equipped with two operations, sum between elements and multiplication by a number, which satisfy the properties shared by vectors in the Cartesian plane, defined as differences of points.

Actually, in Machine Learning one always uses finite-dimensional vector spaces whose vectors are expressed in coordinates, so vectors are identified with n-tuples of numbers (their coordinates). This is a major source of confusion for beginners, since both points and vectors in the Cartesian plane are represented as pairs of numbers.

However, even if they are represented in the same way, points and vectors are conceptually different objects. A point identifies a single position in space, while a vector identifies a displacement along a certain direction, with given sense and magnitude. Points and vectors may be put in one-to-one correspondence as follows: to a point $P$ we associate the vector $\overrightarrow{OP}$, which starts from the origin $O$ and ends in $P$; to a vector $\vec{v}$, instead, we associate the point $O + \vec{v}$.

This distinction between points and vectors is often overlooked, but it is nevertheless important, also because it helps in understanding applications of the theory: for example, in the following picture some words of a document corpus are represented, mapped by means of a Machine Learning algorithm to points in the plane:

These are points representing words. What about vectors in this case?

Let us take the vector which displaces “pizza” to “Italy” and let us apply it to the point “sushi”: we get the point “Japan” (up to a certain approximation). We infer that this vector represents an answer to a question: “given a national dish, which is the corresponding nation?”.

In this case, it is clear that points and vectors represent distinct concepts.
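The word example can be mimicked with a toy sketch. The coordinates below are invented purely for illustration (they do not come from any trained model or from the picture), and the helper names are hypothetical.

```python
import math

# Hypothetical 2D coordinates, made up for illustration only
words = {
    "pizza": (1.0, 1.0),
    "Italy": (4.0, 2.0),
    "sushi": (1.2, 3.0),
    "Japan": (4.1, 4.1),
    "France": (4.0, 0.0),
}

def vector_between(p, q):
    """Vector from point p to point q."""
    return tuple(qi - pi for pi, qi in zip(p, q))

def translate(point, vector):
    """Sum between a point and a vector."""
    return tuple(c + v for c, v in zip(point, vector))

def nearest_word(point, exclude=()):
    """Word whose point is closest (Euclidean distance) to the given point."""
    candidates = [w for w in words if w not in exclude]
    return min(candidates, key=lambda w: math.dist(point, words[w]))

# The vector "from dish to nation", then applied to another dish
dish_to_nation = vector_between(words["pizza"], words["Italy"])
target = translate(words["sushi"], dish_to_nation)
answer = nearest_word(target, exclude=("sushi",))
print(answer)
```

The translated point does not coincide exactly with any word, which is why the nearest point is taken: this mirrors the "up to a certain approximation" caveat.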

By applying the “arccosine” function, it is possible to compute this angle starting from the scalar product between two vectors and from their lengths: the interpretation of this quantity as an angle is explained by the trigonometric meaning of the cosine function.

Recall that the cosine is a number greater than or equal to -1 and less than or equal to 1, such that:

- If the two vectors have the same direction and the same sense, then the cosine is equal to 1.
- If the two vectors have the same direction but opposite sense, then the cosine is equal to -1.
- If the cosine is equal to zero, the two vectors are said to be *orthogonal*.

Therefore, while the distance measures the nearness of two points (the nearer they are, the closer their distance is to zero), the cosine measures the similarity between the directions and senses of two vectors: the more the two vectors are aligned along the same line, the closer the absolute value of their cosine is to one.

This cosine similarity is often employed in Machine Learning to classify objects in vector spaces.
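A minimal sketch of cosine similarity follows. It assumes the usual definition, namely the scalar product of the two vectors divided by the product of their lengths; the function names and the sample vectors are illustrative assumptions.

```python
import math

def dot(v, w):
    """Scalar (dot) product of two vectors with the same number of components."""
    return sum(vi * wi for vi, wi in zip(v, w))

def norm(v):
    """Length of a vector."""
    return math.sqrt(dot(v, v))

def cosine(v, w):
    """Cosine of the angle between two non-zero vectors."""
    return dot(v, w) / (norm(v) * norm(w))

def angle_degrees(v, w):
    """Angle between two non-zero vectors, via the arccosine."""
    c = max(-1.0, min(1.0, cosine(v, w)))  # guard against rounding slightly outside [-1, 1]
    return math.degrees(math.acos(c))

same = cosine([1, 2], [2, 4])          # same direction, same sense
opposite = cosine([1, 2], [-1, -2])    # same direction, opposite sense
orthogonal = cosine([1, 0], [0, 3])    # orthogonal vectors

print(same, opposite, orthogonal, angle_degrees([1, 0], [0, 3]))
```

Because of floating-point rounding, the first two values are only approximately 1 and -1, which is also why the arccosine argument is clamped.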

*The N-dimensional Cartesian space*

So far, in this crash course on Cartesian geometry and linear algebra, we have restricted ourselves to the cases of dimension N=2 and N=3, so as to be able to draw pictures and to develop the concepts in a somewhat familiar environment: but in real-life applications, the dimension of vector spaces may be quite high.

In those cases, we have to give up geometric intuition, although *all formulas and concepts developed so far generalize trivially to the case of higher dimensions*.

Indeed, such formulas depend on sums and subtractions of point coordinates and vector components: whether there are 2 or 20,000 of those coordinates makes no real difference. Better still, all the theory is easily and efficiently implemented on a computer, which may deal with points and vectors in spaces of high dimension without problems, since they are represented as arrays of numbers.

For example, let us consider the concept of a hyperplane: in the plane, this coincides with the concept of a line, and in space with the concept of a plane.

Let us fix a dimension *N* and let us consider the Cartesian space **R**^{N}.

A line in this space is, as in dimension 2, determined by a pair of points, or by a point and a vector: the parametric equation is just the same as in the dimension 2 case.

If N > 2 we may also consider parametric equations of the following form, where $P$ is a point and $\vec{v}$, $\vec{w}$ are vectors:

$$X = P + a\vec{v} + b\vec{w}$$

In this case, we have two parameters which vary independently, so that, intuitively, the set of points *X* which satisfy this equation, when *a* and *b* vary over all real numbers, corresponds to the set of pairs (a,b), thus to the plane. In particular, it is a two-dimensional object.

Actually this is not always true: for example, if $\vec{w} = \vec{v}$ then the parametric equation $X = P + a\vec{v} + b\vec{w}$ becomes

$$X = P + (a + b)\vec{v}$$

which actually describes a line.

Therefore, if we write a parametric equation with many parameters, the dimension of the set of points described by this equation depends on the relations between the vectors which appear in the parametric equation itself. In the previous case, if the two vectors $\vec{v}$ and $\vec{w}$ multiplying the parameters are parallel, then the equation represents a line and not a plane.
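One hedged way to check this numerically is to count how many of the vectors are really independent. The sketch below uses NumPy's `numpy.linalg.matrix_rank` as a black box for that count (the rank is not introduced in this tutorial, so this is an assumption about tooling, not part of the text); the function name `parametric_dimension` and the sample vectors are hypothetical.

```python
import numpy as np

def parametric_dimension(*vectors):
    """Dimension of the set described by P + a*v + b*w + ...,
    computed as the number of independent vectors among the given ones."""
    return int(np.linalg.matrix_rank(np.array(vectors, dtype=float)))

v = [1, 0, 2]
w_parallel = [2, 0, 4]      # a multiple of v
w_independent = [0, 1, 1]   # not a multiple of v

print(parametric_dimension(v, w_parallel))     # a line
print(parametric_dimension(v, w_independent))  # a plane
```

Note that the point from which the displacement starts plays no role here: only the vectors decide the dimension.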

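If in general, in the *N*-dimensional space, we write a parametric equation with *N* - 1 parameters, we get:

$$X = P + a_1\vec{v}_1 + a_2\vec{v}_2 + \cdots + a_{N-1}\vec{v}_{N-1}$$

When the vectors $\vec{v}_1, \dots, \vec{v}_{N-1}$ are linearly independent, the set of points described by this equation is a hyperplane.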

*Matrices and their algebra*

A major feature of linear algebra is the efficiency and universality of its numerical methods. Actually, it suffices to implement a single algorithm (or one of its variants), namely Gaussian elimination, to be able to do practically everything in an effective way (solving equations, checking linear independence, etc.). These algorithms are available in every standard numerical computation library, such as Python’s numpy.linalg.
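As a small illustration of such a library call, the following sketch solves a system of two linear equations with `numpy.linalg.solve`. To my understanding this routine relies on an LU factorization with pivoting, which is a variant of Gaussian elimination, but that is a remark about the library rather than something stated in this tutorial. The system itself is a made-up example.

```python
import numpy as np

# The system
#   2x +  y = 3
#    x + 3y = 5
# written as A @ solution = b
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

solution = np.linalg.solve(A, b)
print(solution)

# Substituting the solution back should reproduce b (up to rounding)
print(np.allclose(A @ solution, b))
```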

To close this tutorial (already far too long) it is worth introducing the key notion which is involved in each of those algorithms, and which is also crucial in the conceptual developments of linear algebra: the concept of a *matrix*.

A matrix is just a table of numbers, and it may be considered as a two-dimensional array. More formally, an *n*×*m* matrix is a table of numbers which are individually addressed by means of two indexes, *i* and *j*, where the first one addresses the row and the second one addresses the column. At the crossing of row *i* and column *j* there is the number $a_{ij}$ addressed by those indexes (notice that in mathematics indexes usually run from 1 on, not from 0 on as in computer science).

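The tabular representation of a matrix is as follows:

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1m} \\ a_{21} & a_{22} & \cdots & a_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nm} \end{pmatrix}$$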

A matrix such that *n* = *m* is said to be a *square matrix*.

In practice, a matrix is just a vector of length *nm* whose elements are displayed by rows and not as a single sequence. However, this notation change is fundamental to use those objects.

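In particular, matrices enrich vector algebra with a new operation, a multiplication. In the first place, let us notice that we may add matrices and multiply them by a single number to get again matrices of the same type: if $a_{ij}$ and $b_{ij}$ are the entries of the *n*×*m* matrices *A* and *B*, and *c* is a number, then the entries of *A* + *B* and *cA* are

$$(A + B)_{ij} = a_{ij} + b_{ij}, \qquad (cA)_{ij} = c\,a_{ij}$$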

Therefore, *n*×*m* matrices do form a vector space of dimension *nm*.
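A short NumPy sketch of these two operations follows, together with the "matrix as a vector of length *nm*" point of view. The matrices are arbitrary illustrative values.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])        # a 2x3 matrix
B = np.array([[10, 20, 30],
              [40, 50, 60]])     # another 2x3 matrix

S = A + B        # entry-wise sum, again a 2x3 matrix
T = 3 * A        # every entry multiplied by the number 3

# Reading the entries row by row gives a vector with n*m = 6 components
flat = A.reshape(-1)

print(S)
print(T)
print(flat)
```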

Recall that a vector may be multiplied by a number, to get a new vector and that two vectors may be multiplied to get a number (via the dot product). But we do not know how to multiply, say, a vector by itself to get a new vector.

If we write vectors as matrices, we can actually multiply them: indeed, we may distinguish two kinds of vectors when written in matrix form, row vectors and column vectors. A row vector is a sequence of numbers written from left to right; we have already met them, for example (1,2,0).

A column vector is a vector written from top to bottom, as in

$$\begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix}$$

At first sight, this is just a matter of notation, but actually, if we interpret a vector as a particular kind of matrix, a row vector is a 1×*N* matrix, while a column vector is an *N*×1 matrix.

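Now, given an *n*×*m* matrix *A* and an *m*×*r* matrix *B* (the number of columns of *A* must equal the number of rows of *B*), we may multiply *A* by *B* to get a new *n*×*r* matrix. The entry in the matrix *AB* addressed by indexes *i* and *j* is defined as:

$$(AB)_{ij} = \sum_{k=1}^{m} a_{ik}\,b_{kj} = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{im}b_{mj}$$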

Notice that this is the dot product of the row vector given by the *i-*th row in *A* by the column vector given by the *j*-th column in *B*.

Example: let us multiply a 2×3 matrix times a 3×2 matrix:
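Since each entry of the product is the dot product of a row by a column, the definition can be sketched directly in Python. The function name `matmul` and the two matrices (a 2×3 one and a 3×2 one) are hypothetical, chosen only for illustration; the result is compared with NumPy's `@` operator.

```python
import numpy as np

def matmul(A, B):
    """Product of an n x m matrix A by an m x r matrix B (lists of rows).
    Entry (i, j) is the dot product of row i of A and column j of B."""
    n, m = len(A), len(A[0])
    if len(B) != m:
        raise ValueError("columns of A must match rows of B")
    r = len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(r)]
            for i in range(n)]

A = [[1, 2, 3],
     [4, 5, 6]]      # 2x3
B = [[1, 0],
     [0, 1],
     [2, 2]]         # 3x2

C = matmul(A, B)     # 2x2
print(C)

# The same product computed by NumPy
print((np.array(A) @ np.array(B)).tolist())
```

Note that the indexes in the code run from 0, while the formula in the text uses indexes starting from 1.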

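Now we come back to vectors: we may multiply a row vector times a column vector to get a 1×1 matrix (which is just a number) and this is the dot product. But we can also multiply an *N*×1 column vector times a 1×*N* row vector as matrices and get an *N*×*N* matrix, as in:

$$\begin{pmatrix} v_1 \\ \vdots \\ v_N \end{pmatrix} \begin{pmatrix} w_1 & \cdots & w_N \end{pmatrix} = \begin{pmatrix} v_1 w_1 & \cdots & v_1 w_N \\ \vdots & \ddots & \vdots \\ v_N w_1 & \cdots & v_N w_N \end{pmatrix}$$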

However, in this way we multiply two vectors belonging to *N*-dimensional spaces and we get a vector in a vector space with a different dimension, either 1 or *N*×*N*.
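Both products of vectors can be tried out in NumPy by storing the vectors explicitly as 1×*N* and *N*×1 matrices. The values below are arbitrary illustrative choices.

```python
import numpy as np

row = np.array([[1, 2, 0]])       # a 1x3 matrix (row vector)
col = np.array([[3], [1], [4]])   # a 3x1 matrix (column vector)

inner = row @ col   # 1x1 matrix: the dot product
outer = col @ row   # 3x3 matrix

print(inner.shape, inner)
print(outer.shape)
print(outer)
```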

The *identity matrix* is the square matrix whose entries are all zero except for the ones on the diagonal, which are 1 (diagonal elements in a matrix are the ones with the row index equal to the column index: $a_{11}, a_{22}, \dots$). For example, the 3×3 identity matrix is

$$I = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

As the name suggests, on multiplying a matrix *A* by the identity matrix (of the appropriate size) we still get *A*. Moreover, the matrix product is associative, and it is distributive with respect to the matrix sum.
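These properties can be checked numerically on a few sample matrices. The sketch below uses `numpy.eye` for the identity matrix; the matrices are arbitrary illustrative values, and a check on examples is of course not a proof.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])          # 2x3
B = np.array([[1.0, 0.0],
              [2.0, 1.0],
              [0.0, 3.0]])               # 3x2
C = np.array([[2.0, 1.0],
              [1.0, 1.0],
              [0.0, 1.0]])               # 3x2
D = np.array([[1.0, 2.0],
              [3.0, 4.0]])               # 2x2

right_identity = np.array_equal(A @ np.eye(3), A)   # A times the 3x3 identity
left_identity = np.array_equal(np.eye(2) @ A, A)    # 2x2 identity times A

associative = np.allclose((A @ B) @ D, A @ (B @ D))
distributive = np.allclose(A @ (B + C), A @ B + A @ C)

print(right_identity, left_identity, associative, distributive)
```

Since *A* is not square, the identity matrix on its right is 3×3 while the one on its left is 2×2.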

However, matrix algebra displays a particular and interesting feature: *the matrix product is not commutative in general*. Thus *AB* is generally different from *BA* (indeed *BA* may well be meaningless: if *A* is an *n*×*m* matrix and *B* is an *m*×*r* matrix, then *BA* is defined only when *r* = *n*).

For example:
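A minimal NumPy sketch with two hypothetical 2×2 matrices shows that the two products differ, and that a product with incompatible shapes is rejected altogether:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

AB = A @ B   # swaps the columns of A
BA = B @ A   # swaps the rows of A

print(AB)
print(BA)

# A product whose shapes do not match is meaningless: NumPy raises an error
M = np.ones((2, 3))
try:
    M @ M            # 2x3 times 2x3 is not defined
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)
```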

Another typical operation is the multiplication of an *n*×*m* matrix times a column vector with *m* components: the result is a column vector with *n* components.
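This last operation can be sketched as follows, with an arbitrary 2×3 matrix and a column vector with 3 components (both made up for illustration):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])          # n x m with n = 2, m = 3
x = np.array([[1], [0], [2]])      # column vector with m = 3 components

y = A @ x                          # column vector with n = 2 components

print(y.shape)
print(y)
```

Each component of the result is the dot product of the corresponding row of the matrix with the column vector.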