First-Order Least-Squares Fitting - An Overview

One particular type of data that I find I increasingly deal with is 3D point cloud data. Point cloud data is fairly simple to grasp, it’s merely a “cloud” of points which each have their own individual coordinates in a Cartesian (think X, Y, Z) system. If that still doesn’t clear it up, watch this excellent video which shows off a cool example of point cloud data to model a shipping gallery. The sources of such point cloud datasets vary, including laser scanners, range cameras, or even your standard close-range photogrammetric techniques. In this post, I’m going to examine some typical techniques for fitting first-order geometric shapes (lines, planes) to 3D point cloud data. It may get hairy with all the math involved, but I’ll try to keep the equations down where possible.

Typically, you won’t find too many Geomatics Engineers in the field who want to fit a plane to their data immediately. However, when discussing the calibration of different instruments (laser scanners, range cameras), and running through various performance assessments, being able to model how well a point cloud fits a plane or a line is an important tool that can provide a lot of information about the kinds of errors or modelling issues your equipment might be encountering. Even outside of Geomatics, finding the line of best fit (in 2D or 3D) or compressing the information of a planar point cloud into a single surface normal are useful techniques in analysing and expressing data. That said, let’s dive right in.

Least-Squares What?

I’ll admit: I got the original idea for this article after reading this NIST Journal Article. However, I did not feel that this method is particularly well documented or explained outside of academic papers, particularly in that it is more often discussed with regards to Singular Value Decompositions, as opposed to the much simpler, more direct approach I’ll show later. That just leaves the question, “What is Least-Squares anyways, and why would I want to use it to perform linear fitting?”

If you’re not already familiar with typical least-squares methods, don’t worry. The basic gist is that we want to fit some surface (in the case of a plane) or a line to our data while minimizing the sum of the squares of our errors. If you’ve ever done a “line of best fit” before, this is a pretty solid way of calculating one, though we’ll be doing it in three dimensions, instead of just two.

Calculating a 3D Line of Best Fit

Although mathematicians may not like to admit it, there are often many ways to express the same equation. Take for instance, the typical equation for a line:

$$y = f(x) = a x + b$$

We can also express this in matrix format, as follows:

$$ \left[ \begin{array}{ccc} x \\ y \end{array} \right] = \left[ \begin{array}{ccc} x_0 \\ y_0 \end{array} \right] + t \left[ \begin{array}{ccc} d_x \\ d_y \end{array} \right] $$

And, naturally, the corresponding equations for 3D lines:

$$z = f(x,y) = a x + b y + c$$
$$ \left[ \begin{array}{ccc} x \\ y \\ z \end{array} \right] = \left[ \begin{array}{ccc} x_0 \\ y_0 \\ z_0 \end{array} \right] + t \left[ \begin{array}{ccc} d_x \\ d_y \\ d_z \end{array} \right] $$

The second form probably looks odd if you haven’t done much in the way of linear algebra, but we basically are defining our line based on an origin point \((x_0, y_0, z_0)\), and a direction \((d_x, d_y, d_z)\). In both cases (2D and 3D), \(t\) is just the distance along the direction of the line we need to move. Since our directional cosine \(d\) is a unit vector, we need to scale it by the distance in order to complete our equation. For the purposes of this method, forget all about slopes and intercepts and embrace the second form of expressing a line.

Now obviously, if I tell you we have point cloud data, we’re not going to be solving for \((x_0, y_0, z_0)\). We can easily find the origin point of our line, but for simplicity’s sake let’s just define our origin as the mean X, Y, and Z values. In effect, this will mean that we’re translating our points to the origin of our coordinate system, which helps simplify some of the later math. Finally, in order to set up our least squares equations, let’s name the vector \((x, y, z)\) to be \(X\), and let’s name our origin point to be \(X_0\). We’ll derive the following distance equation as our model for relating the three vectors I listed above:

$$X = X_0 + t d$$
$$t = d (X - X_0)$$

or, turned around:

$$(X - X_0) d = t$$

In plain english, our distance is our directional cosine times the difference between our origin point and our observed point. Since we can simplify this equation further, why don’t we? We’ll label the matrix denoted by \((X - X_0)\) as A, but it will still be the difference between our observed points and our origin points. This is simply to keep my notation consistent with typical least squares literature, but trust me, it will be easier to recognise what I’m doing if I can simplify this a little.

Since we’re still defining our nomenclature, I’ll take the time to briefly touch on our matrix dimensions. First and foremost, it should be noted that least-squares is used in the solution of an over-constrained problem. This typically means that if we’re solving for 3 parameters (such as \(d_x, d_y, and d_z\)), we’ll typically want four or more observations in order to solve for those three parameters (NOTE: A unique solution can be found if we solve a system of 3 unknowns with 3 observations, but if we do so we won’t gain any information telling us how good our least-squares solution was). Thus, we’ll have the following extra definitions to consider as well:

  • \(n\): the number of observations. Should be equal to or higher than u.
  • \(u\): the number of unknowns. If u > n, then our solution is unsolvable.
  • \(A\): is an n x 3 matrix. Each row is the difference between our origin point and the corresponding observed point.
  • \(d\): is a u x 1 vector. This is what we’ll be solving for, and in this case u = 3.
  • \(t\): is an n x 1 vector.

With that out of the way, we now have our general parametric equation. If you’re familiar with least-squares, this equation will look incredibly familiar, and you’ll recall the typical solution as follows:

$$(A^T A) d = A^T b$$
$$d = (A^T A)^{-1} A^T b$$

Which works well, however, it does contain some issues. First and foremost, this solution relies on the proper multiplication and inversion of \(A^T A\). Unfortunately, this is an expensive computation, especially with a lot of points. More than that, inverting the matrix \(A^T A\) may not be numerically stable depending on how well conditioned our matrix is. I won’t go into explicit detail on that last point, but let’s face it, we can do better than computing \(A^T A\) every time we want to calculate a line of best fit.

The solution propsed in the NIST Journal Article I linked above is very straightforward, and goes like this:

  1. Compute the 3 x 3 covariance matrix \(C_x\) for \(A^T\)
  2. Perform an eigenvalue decomposition on the aforementioned covariance matrix
  3. Each of the resulting eigenvectors is a directional cosine vector. The eigenvector corresponding to the largest eigenvalue is the direction vector that is parallel to the primary direction of the points, while the eigenvector corresonding to the smallest eigenvalue is the direction vector that is perpendicular to the primary direction of the points.

We’ll also notice that since we’re operating on the covariance matrix rather than on \(A\), we no longer have to perform any matrix inversion, and likewise that \(C_v\) will at most be a 3 x 3 matrix, which means that our eigen-decomposition will be in closed form. This means we’ll gain the numerical stability in favour of not inverting \(A^T A\), and we’ll likewise gain a speed boost from not multiplying two (potentially large) matrices together. I’ve put together a small implementation in Python 3.4 (using Numpy), shown below:

#!/usr/bin/env python3
## 3D Least Squares Line Fitting
import numpy as np

obs_points = np.array([ [1.1   , 0.96 , 1.0026],
                        [1.5   , 1.53 , 1.44  ],
                        [2.03  , 1.99 , 2.0401],
                        [2.31  , 2.43 , 2.309 ],
                        [4.501 , 4.53 , 4.500 ],
                        [4.6   , 4.6  , 4.60  ],
                        [4.9   , 4.87 , 4.804 ],
                        [5.5   , 5.53 , 5.44  ] ])

C_x = np.cov(obs_points.T) # Note that here numpy does row-order to find covariance.
                           # If we don't do it this way, we'll get an 8 x 8 matrix
                           # instead of a 3 x 3 matrix.

# Note we're using eigh below, not eig. This is because C_x is a symmetrical (also
# called Hermitian) matrix, so eigh (eigen decomposition for hermitian matrix) is
# more appropriate
eig_vals, eig_vecs = np.linalg.eigh(C_x)

variance = np.max(eig_vals)
max_eig_val_index = np.argmax(eig_vals)
direction_vector = eig_vecs[:, max_eig_val_index].copy()

From this method, we gain two important pieces of information: the direction vector we were searching for (which combined with our origin point provides the full line equation), and the variance of our direction vector. The variance is nothing more than the sum of the squares of our residuals. I won’t go directly into the math behind proving that, but I would highly recommend the original NIST Article, which is far more rigorous than this humble blog is willing to go on this topic.

Calculating a 3D Plane of Best Fit

Notice that above I mentioned that the eigenvector corresonding to the smallest eigenvalue is the direction vector that is perpendicular to the primary direction of the points. Well, the easiest way to define a plane is through the surface normal. Since the surface normal is by definition perpendicular to the plane itself (and thus all the points contained within), we can thus use the exact same technique(!!) as in fitting a 3D line and merely choose the smallest eigenvalue/eigenvector pair instead. Thus, the code from above changes as follows:

#!/usr/bin/env python3
## 3D Least Squares Plane Fitting
import numpy as np

obs_points = np.array([ [1.1   , 0.96 , 0.0],
                        [1.5   , 1.53 , 0.0],
                        [2.03  , 1.99 , 0.0],
                        [2.31  , 2.43 , 0.0],
                        [4.501 , 4.53 , 0.0],
                        [4.6   , 4.6  , 0.0],
                        [4.9   , 4.87 , 0.0],
                        [5.5   , 5.53 , 0.0] ])

C_x = np.cov(obs_points.T)
eig_vals, eig_vecs = np.linalg.eigh(C_x)

variance = np.min(eig_vals)
min_eig_val_index = np.argmin(eig_vals)
direction_vector = eig_vecs[:, min_eig_val_index].copy()

Notice that above all of the points have the same Z value, and are thus in the Z-plane of our coordinate system. This should give us a direction vector for our surface normal of \((0,0,1)\), and indeed it does. If you want to take this exercise further, generate a random n x 3 matrix (let n be greater than 50000) as obs_points. Set obs_points[:,2] = 0, then test to see which method (the original or eigen-decomposition method) is faster. In both cases you should get a result where the direction vector for our surface normal is \((0,0,1)\).


The above method is a simple, elegant way of finding the directional cosine of best fit for a 3D line, or for a plane. Although I didn’t go into much detail expanding the article, the same principles apply for fitting 2D lines, 4D hyperplanes, etc. using the least-squares fitting methodology. In particular, you’ll notice that our matrix \(A\) was an n x 3 matrix which solved our line and plane in 3 dimensions. For 2 or 4 dimensions, you’ll want to create an n x 2 or n x 4 matrix respectively, and follow the same methodology through. I leave further implementations of these types as an exercise to the reader.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.