N-Dimensional Generalization of Linearized Gaussian Fitting

Papers by [Anthony-Granick] and [Wan-Wang-Wei-Li-Zhang] naturally focus on the two-dimensional case of Gaussian fitting that arises in image analysis. Neither paper, however, includes axis cross-coupling terms in its formulation; both focus on unrotated ellipsoids.

The goal of this section is to demonstrate a solution with the cross-coupling term in two dimensions, and then to generalize it to an arbitrary number of dimensions. The mathematics involved is fairly basic, but the final result is both useful and interesting for practical applications.

Adding a Cross-Coupling Term

Neither paper goes beyond expressing a Gaussian as

(65)S(x, y) = \alpha e^{-\frac{1}{2} \left( \frac{(x - x_0)^2}{\sigma_x^2} + \frac{(y - y_0)^2}{\sigma_y^2} \right)}

The complete equation, with a rotated error ellipse, requires an additional cross-coupling term \rho:

(66)S(x, y) = \alpha e^{-\frac{1}{2 (1 - \rho^2)} \left( \frac{(x - x_0)^2}{\sigma_x^2} - 2 \rho \frac{(x - x_0)(y - y_0)}{\sigma_x\sigma_y} + \frac{(y - y_0)^2}{\sigma_y^2} \right)}

This particular formulation is derived by inverting a 2x2 covariance matrix with cross-coupling term \rho:

(67)\Sigma_2 = \begin{bmatrix}
        \sigma_x^2             & \rho \sigma_x \sigma_y \\
        \rho \sigma_x \sigma_y & \sigma_y^2
    \end{bmatrix}

The determinant of this matrix is given by

(68)det(\Sigma_2) = (1 - \rho^2) \sigma_x^2 \sigma_y^2

The inverse covariance matrix is therefore

(69)\Sigma_2^{-1} = \frac{1}{(1 - \rho^2) \sigma_x^2 \sigma_y^2} \begin{bmatrix}
        \sigma_y^2              & -\rho \sigma_x \sigma_y \\
        -\rho \sigma_x \sigma_y & \sigma_x^2
    \end{bmatrix} = \frac{1}{(1 - \rho^2)} \begin{bmatrix}
        \frac{1}{\sigma_x^2}            & -\frac{\rho}{\sigma_x \sigma_y} \\
        -\frac{\rho}{\sigma_x \sigma_y} & \frac{1}{\sigma_y^2}
    \end{bmatrix}

It should be clear how this matrix translates to (66). If not, see the generalization in Expanding to Arbitrary Dimensions, which works through the details more thoroughly.

If we linearize (66) by taking the logarithm of both sides and aggregate the coefficients, we get

(70)\text{log}(S_i) = A x_i^2 + B x_i y_i + C y_i^2 + D x_i + E y_i + F

Here S_i is shorthand for S(x_i, y_i), and the parameters have been combined as follows:

(71)A = & -\frac{1}{2 (1 - \rho^2) \sigma_x^2} \\
B = & \frac{\rho}{(1 - \rho^2) \sigma_x \sigma_y} \\
C = & -\frac{1}{2 (1 - \rho^2) \sigma_y^2} \\
D = & \frac{x_0}{(1 - \rho^2) \sigma_x^2} - \frac{\rho y_0}{(1 - \rho^2) \sigma_x \sigma_y} \\
E = & \frac{y_0}{(1 - \rho^2) \sigma_y^2} - \frac{\rho x_0}{(1 - \rho^2) \sigma_x \sigma_y} \\
F = & \text{log}(\alpha) - \frac{x_0^2}{2 (1 - \rho^2) \sigma_x^2} + \frac{\rho x_0 y_0}{(1 - \rho^2) \sigma_x \sigma_y} - \frac{y_0^2}{2 (1 - \rho^2) \sigma_y^2}

This equation is in a form suitable for a direct least-squares estimation of the parameter values, as long as we have at least six data samples (x_i, y_i, S_i) available to ensure a fully determined problem. Following the papers that this algorithm extends, we will also want to add weights to the solution. The exact nature of the weights is out of scope for this article; suffice it to say that we have an individual weight w_i for each sample. Our goal is to find the coefficient vector that minimizes the error of the following system:

(72)\begin{bmatrix}
        w_1 \cdot x_1^2 & w_1 \cdot x_1 \cdot y_1 & w_1 \cdot y_1^2 & w_1 \cdot x_1 & w_1 \cdot y_1 & w_1 \\
        w_2 \cdot x_2^2 & w_2 \cdot x_2 \cdot y_2 & w_2 \cdot y_2^2 & w_2 \cdot x_2 & w_2 \cdot y_2 & w_2 \\
        \vdots          & \vdots                  & \vdots          & \vdots        & \vdots        & \vdots \\
        w_n \cdot x_n^2 & w_n \cdot x_n \cdot y_n & w_n \cdot y_n^2 & w_n \cdot x_n & w_n \cdot y_n & w_n
    \end{bmatrix}
\begin{bmatrix} A \\ B \\ C \\ D \\ E \\ F \end{bmatrix} =
\begin{bmatrix}
        w_1 \cdot \text{log}(S_1) \\
        w_2 \cdot \text{log}(S_2) \\
        \vdots \\
        w_n \cdot \text{log}(S_n)
    \end{bmatrix}

Any of the common linear algebra solvers should be able to solve this least squares problem directly.
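
As a minimal sketch (not taken from either paper), the weighted system in (72) can be assembled and solved with NumPy's least-squares routine. The names x, y, S, and w below are assumptions: one-dimensional arrays of equal length holding the sample coordinates, positive sample values, and weights.

    import numpy as np

    def fit_gaussian_2d_coeffs(x, y, S, w):
        """Solve the weighted system (72) for the coefficients A..F.

        x, y, S, w are 1-D arrays of equal length, with S > 0.
        """
        # Each column of the design matrix matches one term of (70),
        # scaled row by row by the sample weight.
        M = w[:, None] * np.column_stack([x**2, x*y, y**2, x, y, np.ones_like(x)])
        b = w * np.log(S)
        coeffs, *_ = np.linalg.lstsq(M, b, rcond=None)
        return coeffs  # A, B, C, D, E, F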

Rather than attempting to express the six parameters x_0, y_0, \sigma_x, \sigma_y, \rho and \alpha directly in terms of A, B, C, D, E, F, let us instead solve for x_0, y_0, and the elements of \Sigma_2^{-1} as given in (69). Not only is this simpler, but the generalization to multiple dimensions will be more apparent.

(73)\Sigma_2^{-1} = -\begin{bmatrix}
        2A &  B \\
         B & 2C
    \end{bmatrix}

Notice that the coefficients of x_0 and y_0 in the equations for D and E have the form

(74)\begin{bmatrix} D \\ E \end{bmatrix} = & \begin{bmatrix}
        a & b \\
        b & c
    \end{bmatrix} \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} \\
a = & \frac{1}{(1 - \rho^2) \sigma_x^2} \\
b = & -\frac{\rho}{(1 - \rho^2) \sigma_x \sigma_y} \\
c = & \frac{1}{(1 - \rho^2) \sigma_y^2}

Comparing with (69), this matrix is exactly \Sigma_2^{-1}. Inverting it (by working out the terms divided by the determinant) gives us

(75)\begin{bmatrix} x_0 \\ y_0 \end{bmatrix} = \Sigma_2 \begin{bmatrix} D \\ E \end{bmatrix}

Finally, we can solve for the amplitude \alpha by rewriting the equation for F as

(76)F = \text{log}(\alpha) - \frac{1}{2}\left(
        \begin{bmatrix} x_0 & y_0 \end{bmatrix}
        \Sigma_2^{-1}
        \begin{bmatrix} x_0 \\ y_0 \end{bmatrix}
    \right)

The amplitude is given by

(77)\alpha = e^{F + \frac{1}{2}\left(
        \begin{bmatrix} x_0 & y_0 \end{bmatrix}
        \Sigma_2^{-1}
        \begin{bmatrix} x_0 \\ y_0 \end{bmatrix}
    \right)}

We can extract \sigma_x, \sigma_y, and \rho from \Sigma_2, but as the next section shows, this is not practically necessary.
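
For reference, here is a minimal sketch of the recovery steps (73), (75), and (77), assuming a coefficient vector ordered A through F as produced by the fitting snippet above; the extraction of \sigma_x, \sigma_y, and \rho at the end is optional, as just noted.

    import numpy as np

    def recover_gaussian_2d(coeffs):
        """Recover x0, y0, alpha, and the covariance matrix from A..F."""
        A, B, C, D, E, F = coeffs
        sigma_inv = -np.array([[2*A, B],
                               [B, 2*C]])               # equation (73)
        sigma = np.linalg.inv(sigma_inv)
        mu = sigma @ np.array([D, E])                    # equation (75)
        alpha = np.exp(F + 0.5 * mu @ sigma_inv @ mu)    # equation (77)
        # Optional: individual parameters from the covariance matrix.
        sigma_x, sigma_y = np.sqrt(np.diag(sigma))
        rho = sigma[0, 1] / (sigma_x * sigma_y)
        return mu, sigma, alpha, (sigma_x, sigma_y, rho)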

Expanding to Arbitrary Dimensions

A multivariate Gaussian is characterized by its amplitude \alpha, covariance matrix \Sigma, and location \vec{\mu}:

(78)S(\vec{x}) = \alpha e^{-\frac{1}{2} \left( \vec{x} - \vec{\mu} \right)^T \Sigma^{-1} \left( \vec{x} - \vec{\mu} \right)}

Since \Sigma is a symmetric, positive-definite matrix, we only need to specify its upper triangle. A least-squares fit of an N-dimensional Gaussian will therefore always have \frac{N (N + 1)}{2} parameters from the covariance matrix, N from the location, and one from the amplitude, for a total of \frac{(N + 1) (N + 2)}{2} parameters. This is consistent with the six-parameter fit for a 2-dimensional Gaussian shown in the previous section.
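For example, in three dimensions this works out to \frac{3 \cdot 4}{2} = 6 covariance parameters, three location parameters, and one amplitude, for a total of \frac{4 \cdot 5}{2} = 10.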

We can group our coefficients into two parameter vectors and a scalar: \vec{P} is the vector of coefficients of the covariance terms, \vec{Q} is the vector of coefficients for the location terms, and R determines the amplitude. For the two-dimensional case shown in (71), we have

(79)\vec{P_2} = & \begin{bmatrix} A \\ B \\ C \end{bmatrix} \\
\vec{Q_2} = & \begin{bmatrix} D \\ E \end{bmatrix} \\
R_2 = & F

Since \vec{x} is a multidimensional quantity, let us denote the ith component with a left subscript: {}_i x. The matrix equation that generalizes (72) for N dimensions then becomes

(80)\begin{bmatrix}
        w_1 \cdot {}_1 x_1^2 & w_1 \cdot {}_1 x_1 \cdot {}_2 x_1 & \hdots &
            w_1 \cdot {}_2 x_1^2 &
            w_1 \cdot {}_2 x_1 \cdot {}_3 x_1 & \hdots &
            w_1 \cdot {}_N x_1^2 & w_1 \cdot {}_1 x_1 & \hdots &
            w_1 \\
        w_2 \cdot {}_1 x_2^2 & w_2 \cdot {}_1 x_2 \cdot {}_2 x_2 & \hdots &
            w_2 \cdot {}_2 x_2^2 &
            w_2 \cdot {}_2 x_2 \cdot {}_3 x_2 & \hdots &
            w_2 \cdot {}_N x_2^2 & w_2 \cdot {}_1 x_2 & \hdots &
            w_2 \\
        \vdots               & \vdots                            &        &
            \vdots               &
            \vdots                            &        &
            \vdots               & \vdots             &        &
            \vdots\\
        w_n \cdot {}_1 x_n^2 & w_n \cdot {}_1 x_n \cdot {}_2 x_n & \hdots &
            w_n \cdot {}_2 x_n^2 &
            w_n \cdot {}_2 x_n \cdot {}_3 x_n & \hdots &
            w_n \cdot {}_N x_n^2 & w_n \cdot {}_1 x_n & \hdots &
            w_n
    \end{bmatrix}
\begin{bmatrix} \vec{P} \\ \vec{Q} \\ R \end{bmatrix} =
\begin{bmatrix}
        w_1 \cdot \text{log}(S_1) \\
        w_2 \cdot \text{log}(S_2) \\
        \vdots \\
        w_n \cdot \text{log}(S_n)
    \end{bmatrix}
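
The sketch below shows how the design matrix in (80) might be assembled with NumPy; it assumes the samples are given as an n-by-N array X (one row per sample) and the weights as a length-n array w, both of which are illustrative names rather than anything prescribed above.

    import numpy as np

    def design_matrix_nd(X, w):
        """Build the weighted design matrix of (80).

        X is an (n, N) array of sample positions, w an (n,) array of weights.
        Column order: all x_i * x_j products with i <= j (upper triangle,
        row by row), then the N linear terms, then the constant column.
        """
        n, N = X.shape
        iu, ju = np.triu_indices(N)
        quad = X[:, iu] * X[:, ju]          # quadratic and cross terms
        ones = np.ones((n, 1))
        return w[:, None] * np.hstack([quad, X, ones])

The right-hand side is simply the weighted logarithms of the samples, exactly as in the two-dimensional case, and a standard least-squares solver then returns the stacked vector (\vec{P}, \vec{Q}, R).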

The solution is then a generalized form of each of (73), (75), and (77). First we unpack \vec{P} into the inverse of \Sigma:

(81)\Sigma_N^{-1} = -\begin{bmatrix}
        2 P_1  & P_2       & P_3      & \hdots & P_N \\
        P_2    & 2 P_{N+1} & P_{N+2}  & \hdots & P_{2N-1} \\
        P_3    & P_{N+2}   & 2 P_{2N} & \hdots & P_{3N-3} \\
        \vdots & \vdots    & \vdots   & \ddots & \vdots \\
        P_N    & P_{2N-1}  & P_{3N-3} & \hdots & 2 P_{\frac{N(N+1)}{2}}
    \end{bmatrix}
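
A sketch of this unpacking step, using the same upper-triangle ordering assumed in the design-matrix snippet above:

    import numpy as np

    def inverse_covariance_from_P(P, N):
        """Reassemble Sigma_N^{-1} from the flattened coefficient vector P."""
        iu, ju = np.triu_indices(N)
        M = np.zeros((N, N))
        M[iu, ju] = P    # fill the upper triangle, diagonal included
        M = M + M.T      # symmetrize; the diagonal doubles, as (81) requires
        return -M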

We can then find the location from the \vec{Q} portion:

(82)\vec{\mu} = \Sigma_N \vec{Q}

And finally the amplitude:

(83)\alpha = e^{R + \frac{1}{2}\left( \vec{\mu}^T \Sigma_N^{-1} \vec{\mu} \right) }
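
Finally, a sketch that ties (81), (82), and (83) together, under the same assumptions and helper function as the snippets above:

    import numpy as np

    def recover_gaussian_nd(coeffs, N):
        """Recover mu, Sigma, and alpha from the stacked solution (P, Q, R)."""
        k = N * (N + 1) // 2
        P, Q, R = coeffs[:k], coeffs[k:k + N], coeffs[-1]
        sigma_inv = inverse_covariance_from_P(P, N)      # equation (81)
        sigma = np.linalg.inv(sigma_inv)
        mu = sigma @ Q                                    # equation (82)
        alpha = np.exp(R + 0.5 * mu @ sigma_inv @ mu)     # equation (83)
        return mu, sigma, alpha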