Jean Jacquelin

Sample applications to various functions:
Generalization of Sinusoidal Regression
Damped Sinusoidal Regression
Sum of Two Exponentials and Sum of Two Powers

[ First Edition : 14 January 2009 - Updated : 3 January 2014 ]

Translator’s Note

I came across this paper while searching for efficient, preferably non-iterative, routines for fitting exponentials. The techniques presented in this paper have served my purpose well, and this translation (and to some extent the entire scikit) were the result. I hope that my endeavors do justice to Jean Jacquelin’s work, and prove as useful to someone else as they did to me.

The paper translated here is a compilation of related original papers by the author, gathered into a single multi-chapter unit. Some of the original material is in French and some in English. I have attempted to translate the French as faithfully as I could. I have also attempted to conform the English portions to what I consider to be modern American usage.

There are certain to be some imperfections and probably outright errors in the translation. The translation is not meant to be a monolith. It is presented on GitHub so that readers so inclined can easily submit corrections and improvements.

A small number of technical corrections were made to the content of this paper throughout the translation. In all cases, the errata are clearly marked with footnotes with detailed descriptions of the fix in question. Errata are to be submitted to Jean Jacquelin before publication of this translation.

I have re-generated the plots and tables in the paper as faithfully as I could, rather than copying them out of the original. This has provided a measure of confidence in the results. There may be some slight disagreement in the randomly generated datasets, but the discrepancies are both expected and miniscule.

– Joseph Fox-Rabinovitz
23rd September 2018

Regressions and Integral Equations

Jean Jacquelin


The primary aim of this publication is to draw attention to a rarely used method for solving certain types of non-linear regression problems.

The key to the method presented in this paper is the principle of linearization through differential and integral equations, which enables the conversion of a complex non-linear problem into simple linear regression.

The calculus shown shown here has a fundamental difference from traditional solutions to similar problems in that it is non-recursive, and therefore does not require the usual iterative approach.

In order to demonstrate the theory through concrete application, detailed numerical examples have been worked out. Regressions of power, exponential, and logarithmic functions are presented, along with the Gaussian and Weibull distributions commonly found in statistical applications.

Regressions and Integral Equations

Jean Jacquelin

The first revision of the paper Regressions and Integral Equations was dated 01/14/2009.
The current version was updated on 04/27/2009.

1. Introduction

The study presented here falls into the general framework of regression problems. For example, given the coordinates of a sequence of n points: (x_1, y_1), (x_2, y_2), ..., (x_k, y_k), ..., (x_n, y_n), we wish to find the function y = F(a, b, c, ...; x) which lies as close as possible to the data by optimizing the parameters a, b, c, ...

The commonly known solution to linear regression merits only a brief discussion, which is to be found in Appendix 1. Some problems can be solved through linear regression even though they appear non-linear at first glance. The Gaussian cumulative distribution function is a specific example which is discussed in Appendix 2.

Excepting such simple cases, we are confronted with the daunting problem of non-linear regression. There is extensive literature on the subject. Even the most cursort review would derail us from the purpose of this paper. It is also unnecessary because our goal is to reduce non-linear problems to linear regression through non-iterative and non-recursive procedures (otherwise, how would our proposed method be innovative with respect to existing methods?).

In the next section, we will proceed to the heart of the matter: rendering non-linear problems to linear form by means of suitable differential and/or integral equations. The preliminary discussion will show that in the context of such problems, integral equations tend to be more numerically stable than differential equations, with very few exceptions.

The principle of using integral equations will be explained and demonstrated in practice using the Gaussian probability distribution function as a concrete example. Other examples of regression using integral equations will be described in a detailed manner in the two following papers:

2. Principle of Linearization Through Differential and/or Integral Equations

We begin with a summary of numerical methods for approximating derivatives and integrals. Given n points (x_k, y_k) located near the curve of a function y(x), and given another function g(x), we can calculate approximations for the following derivatives and integrals with g_k = g(x_k):

D_k = \frac{g_{k+1} y_{k+1} - g_{k-1} y_{k-1}}{x_{k+1} - x_{k-1}} \simeq
    \left( \frac{d}{dx} g(x) y(x) \right)_{\left( x = x_k \right)}

DD_k = \frac{D_{k+1} - D_{k-1}}{x_{k+1} - x_{k-1}} \simeq
    \left( \frac{d^2}{dx^2} g(x)y(x) \right)_{\left( x = x_k \right)}

And so on, for subsequent derivatives, as necessary.

S_k \simeq \int_{x_1}^x g(u)y(u)du \quad
    S_1 = 0 \hfill \text{and for $k = 2 \rightarrow n$:} & \\
    S_k = S_{k-1} +
        \frac{1}{2} (g_ky_k + g_{k-1}y_{k-1})(x_k - x_{k-1}) &

SS_k \simeq \int_{x_1}^x \left( \int_{x_1}^v g(u)y(u)du \right)dv \quad
    SS_1 = 0 \hfill \text{and for $k = 2 \rightarrow n$:} & \\
    SS_k = SS_{k-1} + \frac{1}{2} (S_k + S_{k-1})(x_k - x_{k-1}) &

And do on, for subsequent integrals, as necessary.

It goes without saying that the points must first be ranked in order of ascending x_k.

It is possible to use more sophisticated approximations for numerical differentiation and integration. Nothing prevents us from selecting a lower limit (or lower limits) of integration other than x_1, and using different limits for the successive integrations. However, that would complicate the formulas and explanations unnecessarily. For the sake of simplicity, let us agree to use these formulas, at least for this stage of the presentation.

Returning to our initial formulation of the problem, we wish to optimize the parameters a, b, c, ... of a function y(a, b, c, ...; x) so that its curve approaches the n points (x_k, y_k) as closely as possible. Evidently, the exact expressions of the derivatives and anti-derivatives of the function depend on the pameters a, b, c, .... However the approximate values calculated using the formulas shown above, i.e. the numerical values of D_k, DD_k, ..., S_k, SS_k, ..., are computed solely from the data points x_k, y_k, without requiring prior knowledge of a, b, c, .... This observation is the crux of the method that is to be shown.

Let us suppose that the function y(a, b, c, ...; x) is the solution to a differential and/or integral equation of the form:

y(x) = A \; \Phi(x) + B\int G(x)y(x)dx + C\int\int H(x)y(x)dx^2 + ...
     + \alpha\frac{d}{dx}g(x)y(x) + \beta\frac{d^2}{dx^2}h(x)y(x) + ...

with \Phi(x), G(x), H(x), ..., g(x), h(x), ... predetermined functions independent of a, b, c, ..., and A, B, C, ..., \alpha, \beta, ... dependent on a, b, c, .... The approximate values are then respectively (with \Phi_k = \Phi(x_k); G_k = G(x_k); H_k = H(x_k); ...)[errata-reei-1]:

D_k = \frac{g_{k+1}y_{k+1} - g_{k-1}y_{k-1}}{x_{k+1} - x_{k-1}}

DD_k = \frac{\Delta_{k+1} - \Delta_{k-1}}{x_{k+1} - x_{k-1}}
\quad \text{with} \quad
\Delta_k = \frac{h_{k+1}y_{k+1} - h_{k-1}y_{k-1}}{x_{k+1} - x_{k-1}}

S_1 = 0 \quad ; \quad S_k = S_{k-1} +
\frac{1}{2}\left(G_ky_k + G_{k-1}y_{k-1}\right)\left(x_k - x_{k-1}\right)

    SS_1 = 0 \quad ; \quad SS_k = SS_{k-1} +
    \frac{1}{2}\left(\Xi_k + \Xi_{k-1}\right)\left(x_k - x_{k-1}\right) \\
    \text{with: } \Xi_k = 0 \quad ; \quad \Xi_k = \Xi_{k-1} +
    \frac{1}{2}\left(H_ky_k + H_{k-1}y_{k-1}\right)\left(x_k - x_{k-1}\right)

If we replace the exact derivatives and/or anti-derivatives with their numerical approximations, the equation will no longer hold true. We can therefore work with the sum of the squares of the residuals:

\sum_{k=1}^n \varepsilon_k^2 =
    \sum_{k=1}^n \left( -y_k + A \; \Phi_k + B \; S_k + C \; SS_k + ...
    + \alpha \; D_k + \beta \; DD_k + ... \right)^2

The relationship is linear with respect to A, B, C, ..., \alpha, \beta, .... We have therefore returned to classical linear regression, which allows us to calculate the optimal values of A_0, B_0, C_0, ..., \alpha_0, \beta_0, .... Finally, since A, B, C, ..., \alpha, \beta, ... are known functions of a, b, c, ..., we must solve the system of equations A(a, b, c, ...) = A_0 \; ; \; B(a, b, c, ...) = B_0 \; ; \; ... \; ; \; \alpha(a, b, c, ...) = \alpha_0 \; ; \; \beta(a, b, c, ...) = \beta_0 to obtain the optimal values of the parameters a, b, c, ....

There are some additional considerations that must be taken into account when choosing the differental and/or integral equation. Other than the requirement for linearity in its coefficients (but not necessarily in the functions, since we have the choice of G(x), H(x), ..., g(x), h(x), ...), the equation should preferably have as many coefficients A_0, B_0, ..., \alpha_0, \beta_0, ... as there are initial parameters a, b, c, ... to optimize. If there are fewer coefficients, an additional regression (or regressions) will be necessary to calculate the coefficients that do not figure explicitly in the equation.

Moreover, to avoid an overburdened explanation, we have been considering a simplified form of differential and/or integral equation. In fact, the equation could have any number of different functions \Phi(x), several different derivatives (corresponding to various choices of g(x)), several different integrals (corresponding to various choice of G(x)), and so on for subsequent multiple derivatives and integrals.

Clearly, there is a multitude of choices for the differential and/or integral equation that we bring to bear on the problem. However, practical considerations limit our choices. One of the main stumbling blocks is the difficulty inherent in numerical approximation of derivatives. In fact, in cases where the points have an irregular distribution, and are too sparse, and if, to make matters worse, the values of y_k are not sufficiently precise, the computed derivatives will fluctuate so much and be so dispersed as to render the regression ineffective. On the other hand, numerical approximations of integrals retain their stability even in these difficult cases (this does not mean that the inevitable deviations are insignificant, but that they remain damped, which is essential for the robustness of the overall process). Except in special cases, the preference is therefore to seek an integral equation rather than a differential one.

The generality of the presentation that has just been made may give the impression that the method is complicated and difficult to implement in practice. The fact of the matter is quite the opposite, as we will see once we shift focus from an the abstract discussion of many possibilities to solving a single concrete example.

One of the most spectacular examples is that of sinusoidal regression (which we only mention in passing, without going into depth here, but which will be treated in detail in the attached paper Regression of Sinusoids). It concerns the optimization of the parameters a, b, c and \omega in the equation:

y(x) = a + b \; \text{sin}(\omega \; x) + c \; \text{cos}(\omega \; x)

This function is the solution to the differential equation:

y(x) = A + B \frac{d^2y}{dx^2} \quad \text{with} \quad
    A = a \quad \text{and} \quad B = -\frac{1}{\omega^2}

This is a linear equation with respect to A and B, which are themselves (very simple) functions of a and \omega. Moreover, the parameters b and c no longer figure in the differential equation directly. This case is therefore a typical application of the method, and among the easiest to implement, except that it contains a second derivative, which makes it virtually useless. Fortunately, there is no a-priori reason not to use an integral equation whose solution is a sinusoid instead. The integral method is hardly any more complicated and gives largely satisfactory results (which are studied in detail in the attached paper: Regression of Sinusoids).

For our first demonstration of the method, let us look for a simpler example. In the following section, the method of regression through an integral equation will be applied to the Gaussian probability density function.

3. Example: Illustration of the Gaussian Probability Density Function

We consider a probability density function of two parameters, \sigma and \mu, defined by

(1)f(x) = \frac{1}{\sigma \sqrt{2 \pi}} \; \text{exp} \left(
    -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2

The general notation y(x) of the previous sections is replaced with f(x) here due to the specificity of this case.

The integration (2) leads to the integral equation (3), of which f(x) is the solution[errata-reei-2]:

(2)\int_{x_1}^x \left( t - \mu \right) f(t)dt =
    -\sigma^2 \left( f(x) - f(x_1) \right)

    f(x) - f(x_1) = A \int_{x_1}^x f(t)dt +
        B \int_{x_1}^x t \; f(t)dt \\
    \text{with:} \quad A = \frac{\mu}{\sigma^2} \quad
    \text{and} \quad B = -\frac{1}{\sigma^2}

This is a linear integral equation, consisting of two simple integrals, which places it among the applications mentioned in the previous section. We compute the respective approximations, the first denoted S with G(x) = 1, and the second denoted T with G(x) = x:

    S_1 = 0 \\
    S_k = S_{k-1} +
        \frac{1}{2}\left(f_k + f_{k-1}\right)
        \left(x_k - x_{k-1}\right) \quad k = 2 \rightarrow n

    T_1 = 0 \\
    T_k = T_{k-1} +
        \frac{1}{2}\left(x_k f_k + x_{k-1} f_{k-1}\right)
        \left(x_k - x_{k-1}\right) \quad k = 2 \rightarrow n

When we replace f(x_k) with f_k, f(x_1) with f_1 and the integrals with S_k and T_k, respectively, equation (3) no longer holds true. We seek to minimize the sum of the squares of the residuals:

(6)\sum_{k=1}^n \varepsilon_k^2 =
    \sum_{k=1}^n \left(
        -\left( f_k - f_1 \right) + A \; S_k + B \; T_k

Notice that had we chosen a lower limit of integration different from x_1, it would have changed not only the value of f_1, but also the numerical values of S_k and T_k, in a way that would cancel out the differences without changing the final result.

The relationship (6) is none other than the than the equation of a linear regression, which we know how to optimize for the parameters A_1 and B_1[errata-reei-3]:

(7)\begin{bmatrix}A_1 \\ B_1\end{bmatrix} =
    \sum \left(S_k\right)^2 & \sum S_k T_k \\
    \sum S_k T_k            & \sum \left(T_k\right)^2
    \sum \left(f_k - f_1\right) S_k \\
    \sum \left(f_k - f_1\right) T_k

By convention, \sum \equiv \sum_{k=1}^n. We then deduce \sigma_1 and \mu_1 according to (3)[errata-reei-4]:

(8)\sigma_1 = \sqrt{-\frac{1}{B_1}} \quad ; \quad \mu_1 = -\frac{A_1}{B_1}

Here is a summary of the numerical computation:

Data: (x_1, f_1), (x_2, f_2), ..., (x_k, f_k), ..., (x_n, f_n)

Result: \sigma_1 and \mu_1 are the approximations of \sigma and \mu.

To illustrate the calculation (Fig. 1), numerical data (Table 1) was genererated in the following manner: x_k values were chosen at random from the domain under consideration. From the “exact” values \sigma_e and \mu_e (defining the “exact” function f(x), whose representative curve is plotted as a dashed line in Fig. 1), we computed the exact values of f(x_k) given by equation (1)[errata-reei-7]. We then added perturbations whose amplitude was drawn randomly from a range - to +10% of f(x_k), which, after rounding, gave us the numerical values of f_k in Table 1.

The outrageous error model is motivated by the need for legibility in the figure, so that the so called “experimental” points, represented by crosses, lie far enough from the dashed curve to be clearly distinguishable. In the same vein, only a handful of points was chosen so that the deviations between the “exact” dashed curves and the calculated solid curves are are highlighted for both the intermediate and the final calculation. The fact that the points are not uniformly distributed across the domain is also a significant complication.


Fig. 1 A sample regression of the Gaussian probability density function.

Table 1 Numerical values corresponding to the example in Fig. 1.
k x_k f_k S_k T_k  
1 -0.992 0.238 0 0  
2 -0.935 0.262 0.01425 -0.0137104 \sigma_e = 0.4
3 -0.836 0.38 0.046029 -0.0415616 \mu_e = -0.3
4 -0.404 1.04 0.352965 -0.201022  
5 -0.326 0.922 0.429522 -0.229147 \sigma_1 = 0.383915
6 -0.042 0.755 0.667656 -0.276331 \mu_1 = -0.289356
7 0.068 0.589 0.741576 -0.275872  
8 0.302 0.34 0.850269 -0.259172  
9 0.439 0.193 0.88678 -0.246335  
10 0.58 0.083 0.906238 -0.236968  

In Fig. 1, the shape of the curves of the “exact” integrals and the points \left(x_k, S_k\right) and \left(x_k, T_k\right) make the primary reason for the deviations in this method of calculation clearly apparent: numerical integration, while preferable to derivation, is not by any means perfect, and causes the deviations in \left(\sigma_1, \mu_1\right).

To form an objective opinion about the qualities and defects of the method exposed here, it would be necessary to perform a systematic experimental study of a very large number of cases and examples. In the current state of progress of this study, this remains yet to be done.

It is certain that the deviations, caused by the defects inherent in numerical integration, would be considerably reduced if the points were sufficiently numerous and their abscissae were partitioned at sufficiently regular intervals.

4. Discussion

It would be unreasonable to imagine that the method presented here could replace what is currently in use in commercial software, which has the benefits of long term study, experimentation and improvements. We must ask then, what is the motivation behind this work.

Of course, recursive methods generally require a first approximation within the same order of magnitude as the target value. This is not generally a handicap since users are not completely in the dark. One might be tempted to consider this method of integral equations to satisfy the need for initial approximation. However, the need is quite marginal, and should not be seen as a serious motivation.

A simple method, easy to program, like the one shown here, might certainly seduce some potential users in particular situations where we seek full mastery over the calculations that are performed. Users of commercial software are usually satisfied with the results they provide, but may occasionally deplore not knowing precisely what the sophisticated software is doing. Nevertheless, it would be poor motivation indeed for this study to attempt to provide a less powerful tool than what already exists, just to aleviate some of the frustration caused by using tools whose exact mechanisms are unknown.

In fact, we must see in this paper not a specific utilitarian motivation in the case of the Gaussian distribution, but the intention to understand and draw attention to a more general idea: the numerous possibilities offered by integral equations to transform a problem of non-linear regression into a linear regression, computed through a non-iterative process.

It is out of the question to compete against what already exists, and what is more imporant, works well. On the other hand, it would be a pity to forget a method that might potentially help resolve future problems: the method that has been the subject of this paper, whose essence is presented in Section 2.

Appendix 1: Review of Linear Regression

When the function that we seek to optimize, y = F(a, b, c, ...; x), can be written in the form y = a \; f(x) + b \; g(x) + c \; h(x) + ..., according to some number of parameters a, b, c, ... and with known functions f(x), g(x), h(x), ..., the algorithm is linear with respect to the optimization parameters.

Even more generally, if the function y = F(a, b, c, ...; x) can be transformed into F(x, y) = A \; f(x, y) + B \; g(x, y) + C \; h(x, y) with known functions F(x, y), f(x, y), g(x, y), h(x, y), ..., A(a, b, c, ...), B(a, b, c, ...), C(a, b, c, ...), ... the algorithm is again linear with respect to the coefficients A, B and C, even if it is not linear with respect to a, b, c, .... But it always reverts to a linear regression. The “least squares” method effectively consists of finding the minimum of:

    \varepsilon^2_{\left(A, B, C, ...\right)} =
        \sum_{k=1}^n \left(
            F_k - \left( A \; f_k + B \; g_k + C \; h_k + ... \right)
        \right)^2 \\
    F_k \equiv F(x_k, y_k) \; ; \; f_k \equiv f(x_k, y_k) \; ; \;
        g_k \equiv g(x_k, y_k) \; ; \; h_k \equiv h(x_k, y_k)

The partial derivatives with respect to A, B, C, ... determine a system of equations for which solutions, A_0, B_0, C_0, ... are optimal:

        \frac{\partial \left(\varepsilon^2 \right)}{\partial A}
    \right)_{A_0, B_0, C_0, ...} = -\sum_{k=1}^n \left(
        F_k - \left( A_0 \; f_k + B_0 \; g_k + C_0 \; h_k, ... \right)
    \right) f_k = 0 \\

        \frac{\partial \left( \varepsilon^2 \right)}{\partial B}
    \right)_{A_0, B_0, C_0, ...} = -\sum_{k=1}^n \left(
        F_k - \left( A_0 \; f_k + B_0 \; g_k + C_0 \; h_k, ... \right)
    \right) g_k = 0 \\

        \frac{\partial \left( \varepsilon^2 \right)}{\partial C}
    \right)_{A_0, B_0, C_0, ...} = -\sum_{k=1}^n \left(
        F_k - \left( A_0 \; f_k + B_0 \; g_k + C_0 \; h_k, ... \right)
    \right) h_k = 0 \\


The solution to this system, conventionally written with \sum \equiv \sum_{k=1}^n leads to:

    A_0 \\ B_0 \\ C_0 \\ ...
\end{bmatrix} =
    \sum f_k^2   & \sum f_k g_k & \sum f_k h_k & ... \\
    \sum f_k g_k & \sum g_k^2   & \sum g_k h_k & ... \\
    \sum f_k h_k & \sum g_k h_k & \sum h_k^2   & ... \\
    ...          & ...          & ...          & ...
    \sum F_k f_k \\ \sum F_k g_k \\ \sum F_k h_k \\ ...

Then we obtain the optimized values of a, b, c, ... corresponding to the following system, where the unknowns are a_0, b_0, c_0, ...:

    A(a_0, b_0, c_0, ...) = A_0 \\
    B(a_0, b_0, c_0, ...) = B_0 \\
    C(a_0, b_0, c_0, ...) = C_0 \\

which is a system that is non-linear in the same measure that the functions A(a, b, c, ...), B(a, b, c, ...), C(a, b, c, ...), ... are non-linear. But this does not prevent the regression that was performed from being linear, so even this case has its rightful place in this section.

Of course, this can be further extended by considering more variables, for example x, y, z, t, ..., instead of just x, y, thus working in 3D, or 4D, …, instead of 2D. Everything mentioned here figures in literature in a more detailed and more structured manner, with presentations adapted to the exposition of general theory. The purpose here was only to present a brief review, with a specific notation to be used consistently throughout the remainder of the work.

Appendix 2: Linear Regression of the Gaussian Cumulative Distribution Function

We consider the cumulative Gaussian distribution function of two parameters, \sigma and \mu, defined by[errata-reei-8]:

(9)F(x) = \frac{1}{\sqrt{2 \pi} \; \sigma}
    \int_{-\infty}^x \text{exp} \left(
        -\frac{1}{2} \left( \frac{t - \mu}{\sigma}\right)^2
    \right) dt

An example is shown in Fig. 2 (the dashed curve).


Fig. 2 Sample regression of the Gaussian cumulative distribution function.

Table 2 Numerical values corresponding to the example in Fig. 2[errata-reei-9].
k x_k F_k \text{argErf}(2 F_k - 1)  
1 -0.914 0.001 -2.18512  
2 -0.556 0.017 -1.49912 \sigma_e = 0.4
3 -0.49 0.021 -1.43792 \mu_e = 0.3
4 -0.195 0.097 -0.918416  
5 0.019 0.258 -0.459283 \sigma_1 = 0.374462
6 0.045 0.258 -0.459283 \mu_1 = 0.266843
7 0.587 0.704 0.378967  
8 0.764 0.911 0.952429  
9 0.81 0.911 0.952429  
10 0.884 0.979 1.43792  

The data are the points that we call “experimental”: (x_1, F_1), (x_2, F_2), ..., (x_k, F_k), ..., (x_n, F_n), which, in Fig. 2[errata-reei-10] have a particular dispersion relative to their respective nominal positions \left(x_k, F(x_k)\right) on the dashed curve representing F(x).

We can write F(x) equivalently in terms of the Erf funcion, pronounced “error function” and defined by:

(10)\text{erf}(z) = \frac{2}{\sqrt{\pi}}
    \int_0^z \text{exp} \left( -\tau^2 \right) d\tau

The change of variable t = \mu + \sqrt{2} \; \sigma \; \tau in (9) gives the relationship:

(11)F(x) = \frac{1}{\sqrt{\pi}}
    \int_{-\infty}^{\frac{x - \mu}{\sqrt{2} \sigma}}
        \text{exp} \left( -\tau^2 \right) d\tau =
    \frac{1}{2} + \frac{1}{2} \text{erf} \left(
        \frac{x - \mu}{\sqrt{2} \sigma}

The inverse function to Erf is designated Erf(-1), or Erfinv or argErf. We will use the last notation.

Thus, the inverse relationship to (11) can be written:

(12)\frac{x - \mu}{\sqrt{2} \sigma} = \text{argErf}(2F(x) - 1)

Which in turn leads to a linear relationship in A and B, defined by:

(13)y(x) = \text{argErf}(2F(x) - 1) = A \; x + B \quad
    A = \frac{1}{\sqrt{2} \sigma} \\
    B = -\frac{\mu}{\sqrt{2} \sigma}

This reduces to the most elementary form of linear regression, relative to the points (x_k, y_k) with y_k previously calculated from:

(14)y_k = \text{argErf}(2F_k - 1)

The optimal values of A_1 and B_1 are the solutions to the following system:

(15)\begin{bmatrix} A_1 \\ B_1 \end{bmatrix} =
    \sum \left(x_k\right)^2 & \sum x_k \\
    \sum x_k                & n
    \sum y_k x_k \\
    \sum y_k

with the convention that \sum \equiv \sum_{k=1}^n. We then deduce \sigma_1 and \mu_1 from (13):

(16)\sigma_1 = \frac{1}{\sqrt{2} A_1} \quad ; \quad \mu_1 = -\frac{B_1}{A_1}

For the example shown, the numerical values of \sigma_1 and \mu_1 that we obtain are shown in Table 2[errata-reei-9]. The curve of the corresponding function is marked with a solid line in Fig. 2 [errata-reei-11]. It neighbors the “theoretical” curve, which is dashed.

In fact, the example was intentionally chosen to have a very small number of widely dispersed points, so as to make the two curves clearly distinct from each other. This is depreciative rather than representative of the quality of fit that can usually be obtained.

Here is a summary of the very simple numerical computation:

Data: (x_1, F_1), (x_2, F_2), ..., (x_k, F_k), ..., (x_n, F_n)

  • Compute y_k: y_k = argErf(2F_k - 1)

  • Compute \sum x_k, \sum \left(x_k\right)^2, \sum y_k, \sum y_k x_k

  • Compute A_1 and B_1:

    \begin{bmatrix} A_1 \\ B_1 \end{bmatrix} =
    \sum \left(x_k\right)^2 & \sum x_k \\
    \sum x_k                & n
    \sum y_k x_k \\
    \sum y_k

  • Compute \sigma_1 and \mu_1: \sigma_1 = \frac{1}{\sqrt{2} A_1} \quad ; \quad \mu_1 = -\frac{B_1}{A_1}

Result: \sigma_1 and \mu_1 are the approximations of \sigma and \mu.

If a software package implementing argErf is not readily available, a sample listing for the functions Erf and argErf is shown in the next section.

Note: The material in Appendices 1 and 2 is well known. All the same, it was useful to highlight the fundamental differences between the regression problems reviewed in Appendix 1 and those in Section 2 of the main paper. It is equally useful to provide examples that showcase the noteworthy differences between the application of regression to the Gaussian distribution, on one hand to the cumulative function (Appendix 2) and on the other to the more difficult case of the density function (Section 3 of the main text).

Listings for the functions Erf and argErf

The approximate values of Erf(x) are obtained with a minimum of eight significant digits of precision. We use the following series expansion:

\text{Erf}(x) \simeq \frac{2x}{\sqrt{\pi}}
    \sum_{k=0}^{30} \frac{(-1)^k x^{2k}}{k!(2k + 1)}
    \left| x \right| < 2.7 \\
    \left| \text{Erf}(x) \right| < 0.999866

completed by the asymptotic expansion:

\text{Erf}(x) \simeq \pm 1 - \frac{e^{-x^2}}{x \sqrt{\pi}}
    \sum_{k=0}^5 \frac{(-1)^k (2k + 1)!!}{x^{2k}}
    + \enspace \text{if} \enspace x > 2.7 \enspace ;
        \enspace - \enspace \text{if} \enspace x < 2.7 \\
    (2k + 1)!! = 1 * 3 * ... * (2k + 1) \\
    0.999865 < \left| \text{Erf}(x) \right| < 1

The inverse function \text{argErf}(y) is calculated using Newton-Raphson’s method. The result is obtained to at least eight significant digits of precision if \left| y \right| < 0.999999999998 \rightarrow \left| \text{argErf}(y) \right| < 5. Outside that domain, the result is insignificant.

The following listing is written in Pascal, and uses only the most elementary contructs and syntax. It should not be difficult to translate into any other language of choice.

Function Erf(x:extended):extended;
     if ((x>-2.7)and(x<2.7)) then
          for k:=0 to 30 do
     end else
          for k:= 0 to 5 do
          if x>0 then y:=1-y else y:=-1-y;
Function argErf(y:extended):extended;
     for k:=1 to 30 do

We can test the computation by comparing against the values in the table below (as well as the negatives of the same values):

x = \text{argErf}(y) \text{Erf}(x) = y
0.001 0.001128378791
0.1 0.112462916
1 0.8427007929
2 0.995322265
2.699 0.9998648953
2.701 0.998664351
4 0.9999999846
5 0.9999999999984

Non-Linear Regression of the Types: Power, Exponential, Logarithmic, Weibull

Jean Jacquelin

The first revision of this paper was dated 01/18/2009.
The current version was updated on 04/23/2009.


The parameters of power, exponential, logarithmic and Weibull functions are optimized by a non-iterative regression method based on an appropriate integral equation.

Non-Linear Regression of the Types: Power, Exponential, Logarithmic, Weibull

Jean Jacquelin

1. Introduction

The following two examples of regression will be treated simultaneously:

(17)y = a + b \; X^c \quad (X > 0)

(18)y = a + b \; \text{exp}(c \; x)

Indeed, if we take formula (17), with data points (X_1, y_1), ..., (X_k, y_k), ..., (X_n, y_n), we can pre-calculate

(19)x_k = \text{ln}(X_k)

which turns into (18) with the data points (x_1, y_1), ..., (x_k, y_k), ..., (x_n, y_n).

Various other equations can be transformed into these two formulae:

  • The equation y = a + b' \; \text{exp}(c \; (x - \mu)) is identical to (18) with the substitution b = b' \; \text{exp}(-c \; \mu).

  • The equation y = \alpha + \beta \; \text{ln}(x - \gamma) turns into (18) when we interchange the x and y values. This motivates the regression of logarithmic functions of three parameters.


    Add to scikit

  • And so on. In particular, the case of the Weibull function of three parameters will be treated in Section 3.

The method to be used was described in Section 2 of the article Regressions and Integral Equations.

2. Regression of Functions of the Form y(x) = a + b \; \text{exp}(c \; x)

Integating the function y(x) results in:

(20)\int_{x1}^x y(u)du = a \; (x - x_1) +
    \frac{b}{c} \; \text{exp}(c x) - \frac{b}{c} \; \text{exp}(c x_1)

Replacing \text{exp}(c \; x) with (18):

(21)\int_{x1}^x y(u)du = a \; (x - x_1) +
    \frac{1}{c} \; (y - a) - \frac{b}{c} \; \text{exp}(c \; x_1)

From which we have the integral equation to be used:

(22)y - (a + b \; \text{exp}(c \; x_1)) =
    -a \; c \; (x - x_1) + c \int_{x_1}^x y(u)du

The numerical approximations of the integral for x = x_k are calculated with:

    S_1 = 0 \quad \text{and for} \enspace
        k = 2 \rightarrow n \enspace \text{:} \\
    S_k = S_{k-1} + \frac{1}{2}(y_k + y_{k-1})(x_k - x_{k-1})

When we replace the exact unknowns in (22) with their respective approxmations, the equation no longer holds true:

(24)y_k - y_1 \simeq -a \; c \; (x_k - x_1) + c \; S_k

Minimizing the sum of the squared residuals:

(25)\sum_{k=1}^n \varepsilon_k^2 = \sum_{k=1}^n
    \left( A \; (x_k - x_1) + B \; S_k - (y_k - y_1) \right)^2


(26)A = -a \; c \quad ; \quad B = c

results in a linear regression with respect to the coefficients A and B, whose optimal values A_1 and B_1 can be obtained in the usual manner (with the convention that \sum \equiv \sum_{k=1}^n):

(27)\begin{bmatrix}A_1 \\ B_1\end{bmatrix} =
    \sum \left((x_k - x_1)^2\right) & \sum (x_k - x_1) \; S_k \\
    \sum (x_k - x_1) \; S_k         & \sum \left(S_k^2\right)
    \sum (y_k - y_1)(x_k - x_1) \\
    \sum (y_k - y_1) \; S_k

We the get the optimal values a_1 and c_1 according to (26):

(28)a_1 = -\frac{A_1}{B_1} \quad ; \quad c_1 = B_1

Only two parameters are resolvable from the form of integral equation that we chose. The third parameter appears in the numerical calculations, but not directly in the equation, so another regression is necessary to obtain it. In fact, considering (18), the result would be further improved by computing the regression on both a and b:

(29)\sum_{k=1}^n \varepsilon_k^2 =
    \sum_{k=1}^n \left( a + b \; \text{exp}(c_1 \; x_k) - y_k \right)^2


(30)c_2 = c_1 \quad ; \quad \theta_k = \text{exp}(c_2 \; x_k)

the resulting regression is:

(31)\begin{bmatrix}a_2 \\ b_2\end{bmatrix} =
    n             & \sum \theta_k \\
    \sum \theta_k & \sum \theta_k^2
    \sum y_k \\
    \sum y_k \theta_k

Here is a summary of the computation:

Data in the case y = a + b \; X^c

(X_1, y_1), (X_2, y_2), ..., (X_k, y_k), ..., (X_n, y_n)

  • Compute x_k = \text{ln}(X_k)

Data in the case y = a + b \; \text{exp}(c \; x)

(x_1, y_1), (x_2, y_2), ..., (x_k, y_k), ..., (x_n, y_n)

  • Rank points in order of ascending x_k
  • Compute S_k according to (23).
  • Compute \sum (x_k - x_1)^2, \sum (x_k - x_1) \; S_k, \sum S_k^2, \\ \sum (y_k - y_1)(x_k - x_1), \sum (y_k - y_1) \; S_k
  • Compute B_1 from system (27)
  • With c_1 = c_2 = B_1, compute \theta_k (from relationships in (28) and (30))
  • Compute \sum \theta_k, \sum \left(\theta_k\right)^2, \sum y_k, \sum y_k \theta_k
  • Compute a_2 and b_2 from system (31).

Result: a_2, b_2 and c_2 are the approximations of a, b and c.

To illustrate this calculation (Fig. 3), numerical data were generated in the following manner: x_k were drawn at random from the domain under consideration. Initially, “exact” values a_e, b_e and c_e defined the function y(x), which we call the “true” form of equation (18), and whose curve is traced by a dotted line in the figure. The exact values of y(x_k) were then assigned perturbations of an amplitude drawn at random from a range between - and +10% of y(x_k), which, after rounding, produced the numrical values of y_k in Table 3, represented by crosses in the figure.

Finally, the result (a_2, b_2, c_2) was substituted into equation (18), to obtain the fitted curve, plotted as a solid line.


Fig. 3 A sample regression for the function y = a + b \; \text{exp}(c \; x)

Table 3 Numerical values corresponding to the example in Fig. 3.
k x_k y_k S_k  
1 -0.99 0.418 0  
2 -0.945 0.412 0.018675 a_e = 0.3
3 -0.874 0.452 0.049347 b_e = 0.6
4 -0.859 0.48 0.056337 c_e = 1.7
5 -0.64 0.453 0.1585  
6 -0.573 0.501 0.19046 a_2 = 0.313648
7 -0.433 0.619 0.268859 b_2 = 0.574447
8 -0.042 0.9 0.565824 c_2 = 1.716029
9 -0.007 0.911 0.597517  
10 0.054 0.966 0.654765  
11 0.088 0.966 0.687609  
12 0.222 1.123 0.827572  
13 0.401 1.414 1.05463  
14 0.465 1.683 1.15374  
15 0.633 2.101 1.47159  
16 0.637 1.94 1.47968  
17 0.735 2.473 1.69591  
18 0.762 2.276 1.76002  
19 0.791 2.352 1.82713  
20 0.981 3.544 2.38725  

To form an objective opinion of the qualities and defects of the method presented here, it would be necessary to perform a systematic study of a very large number of cases and examples. It is certain even now that the deviations caused by errors in numerical integration would be considerably reduced as the number of points increases and their spacing becomes more uniform.

3. Regression of the Three-Parameter Weibull Cumulative Distribution Function

The Weibull cumulative distribution function of three parameters (\alpha, \beta, and \mu) is defined by:

(32)F(t) = 1 - \text{exp} \left(
    -\left( \frac{t - \mu}{\beta} \right)^\alpha

For data given as (t_1, F_1), ..., (t_k, F_k), ..., (t_n, F_n), we seek to optimize \mu, \beta, and \alpha in such a way that relationship (32) is approximately and optimally satisfied across the n data points.

The inverse function of (32) is:

(33)t = \mu + \beta \left( -\text{ln}(1 - F) \right)^{\frac{1}{\alpha}}


(34)x = \text{ln} \left( -\text{ln}(1 - F) \right)


(35)y = t \quad ; \quad a = \mu \quad ; \quad b = \beta \quad ;
    \quad c = \frac{1}{\alpha}

We can see immediately that we have transformed the problem into (18), shown above: y = a + b \; \text{exp}(c \; x).

The algorithm is deduced immediately as we transpose the notation:


(t_1, F_1), ..., (t_k, F_k), ..., (t_n, F_n)


  • Rank points in order of ascending F_k

  • Compute x_k = \text{ln} \left( -\text{ln}(1 - F_k) \right)

  • Compute S_k:

    S_1 = 0 \quad \text{and for} \quad k = 2 \rightarrow n \; : \\
    S_k = S_{k-1} + \frac{1}{2}(t_k + t_{k-1})(x_k - x_{k-1})

  • Compute B

    \begin{bmatrix} A \\ B \end{bmatrix} =
    \sum \left((x_k - x_1)^2\right) & \sum \left(x_k - x_1\right) S_k \\
    \sum \left(x_k - x_1\right) S_k & \sum \left( S_k^2 \right)
    \sum (t_k - t_1)(x_k - x_1) \\
    \sum (t_k - t_1) S_k

  • Obtain \alpha_c = \frac{1}{B}

  • Compute \theta_k = \text{exp}(B \; x_k)

  • Compute \beta_c and \mu_c: \begin{bmatrix}\mu_c \\ \beta_c\end{bmatrix} = \begin{bmatrix}n & \sum \theta_k \\ \sum \theta_k & \sum \theta_k^2 \end{bmatrix}^{-1} \begin{bmatrix}\sum t_k \\ \sum t_k \theta_k\end{bmatrix}

Result: \alpha_c, \beta_c and \mu_c are the approximations of \alpha, \beta and \mu

In graphical representations, it is customary to display \text{ln}(t_k) as the abscissa and \text{ln}(-\text{ln}(1 - F_k)) as the ordinate. This is a legacy of the graphical linearization method used for the case where \mu = 0. In order to respect this tradition, we will interchange the axes and display \text{ln}(t_k) as the abscissa and x_k as the ordinate.

Weibull’s law applies generally to the failure of materials and objects. Since the variable t is time, the t_k and F_k are effectively sorted in ascending order.

A sample regression is shown in Fig. 4. To simulate an experiment, numerical data (t_k, F_k) were generated from “exact” values \alpha_e, \beta_e, \mu_e, defining the “true” F(t) according to (32), whose curve is shown as a dashed line in the figure. A value of t_k calculated from (33) corresponds to each F_k. These values of t are subjected to random deviations, simulating experimental uncertainty, which gives us the values of t_k in Table 4. Finally, the result (\alpha_c, \beta_c, and \mu_c) substituted into (32) gives the “fitted” function F_c(t) (whose curve is plotted as a solid line):

(36)F_c(t) = 1 - \text{exp} \left(
    - \left( \frac{t - \mu_c}{\beta_c} \right)^{\alpha_c}


Fig. 4 A sample regression of a Weibull cumulative distribution function of three parameters.

Table 4 Numerical values corresponding to the example in Fig. 4.
k t_k F_k \text{ln}(-\text{ln}(1 - F_k)) S_k  
1 1.202 0.033 -3.39452 0  
2 1.397 0.082 -2.45856 1.21627 \alpha_e = 2.4
3 1.537 0.131 -1.96317 1.94301 \beta_e = 1.6
4 1.57 0.181 -1.61108 2.48998 \mu_e = 0.8
5 1.768 0.23 -1.34184 2.93935  
6 1.856 0.279 -1.11744 3.34596 \alpha_c = 2.44301
7 1.87 0.328 -0.922568 3.70901 \beta_c = 1.55262
8 1.889 0.377 -0.748219 4.0367 \mu_c = 0.82099
9 1.918 0.426 -0.58856 4.3406  
10 2.098 0.475 -0.439502 4.63991  
11 2.212 0.524 -0.297951 4.94496  
12 2.349 0.573 -0.161377 5.25641  
13 2.453 0.622 -0.027514 5.57782  
14 2.557 0.671 0.105888 5.91199  
15 2.596 0.72 0.241349 6.26101  
16 2.602 0.769 0.382086 6.62678  
17 2.678 0.818 0.532831 7.02475  
18 2.706 0.867 0.701813 7.47965  
19 3.089 0.916 0.907023 8.07424  
20 3.441 0.965 1.20968 9.06241  

The representation in the traditional coordinate system shows that the presence of a non-zero \mu_c parameter prevents the usual graphical linearization, which is naturally to be expected. Carrying out the regression through an integral equation allows us to linearize and obtain a curve which may be substituted for the traditional method, appreciably improving the fit.

4. Discussion

The preceding examples (Section 2 and Section 3), show how a non-linear regression can be reduced to a linear one given the appropriate integral equation. In this manner, the usual iterative procedures can be relpaced by a simple linear calculation.

Regression of Sinusoids

Jean Jacquelin

The first revision of the paper Regressions of Sinusoids was dated 01/09/2009.
The current version was published on 02/15/2009.

1. Introduction

Under the seemingly innocuous title “Regression of Sinusoids”, ought to appear the subtitle “An Optimization Nightmare” to add a touch of realism. Indeed, we must have already been concerned with the problem to fully understand the relevance of the word. But what is it, really?

As with any number of similar problems, the data is comprised of n experimental points (x_1, y_1), (x_2, y_2), ..., (x_k, y_k), ..., (x_n, y_n). We seek to adjust the parameters of a function y = f(x) in such a way that its curve passes “as closely as possible” to the data points. In this particular case, we will deal with the following sinusoidal function of the four parameters a, b, c, and \omega:

(37)f(x) = a + b \; \text{sin}(\omega \; x) + c \; \text{cos}(\omega \; x)

This function is equivalent to:

    f(x) = a + \rho \; \text{sin}(\omega \; x + \varphi) \\
    \rho = \sqrt{b^2 + c^2} \quad ; \quad
    b = \rho \; \text{cos}(\varphi) \quad ; \quad
    c = \rho \; \text{sin}(\varphi)

The expression “as closely as possible” implies an optimization criterion. Specifically, we consider the sum of the squared residuals:

(39)\varepsilon^2_{a, b, c, \omega} =
    \sum_{k=1}^n \left( y_k - f(x_k) \right )^2 =
    \sum_{k=1}^n \left( y_k - \left(
        a + b \; \text{sin}(\omega \; x_k) + c \; \text{cos}(\omega \; x_k)
    \right) \right)^2

Since that is the sum that we wish to minimize, we have the generic name “least squares method”.

When \omega is known a-priori, the solution is trivial. Equation (37) effectively becomes linear in the optimization parameters (a, b and c). This well defined case merits only the briefest mention, which will be made in the next section.

In all other cases, the regression (or optimization) is non-linear, due to the fact that the sum of squares has a non-linear dependency on \omega.

An almost equally favorable situation arises when we have a “good enough” approximation for \omega, suitable to initialize an arbitrary non-linear optimization method. Literature is full of descriptions of such methods and many are implemented in software. To discuss these methods further would exceed the scope of the framework of this paper.

But the dreaded nightmare is not far. It is crucial that the initial estimate for \omega is “good enough”… And this is where sinusoidal functions differ from more accomodating non-linear functions: The more periods that the x_k span, the more randomly they are distributed, or the more noise is present in the y_k values, the more the condition “good enough” must be replaced with “very good”, tending to “with great precision”. In other words, we must know the value of \omega we are looking for in advance!

The original method proposed in Section 3 provides a starting point for addressing this challenge. Admittedly, it would be wrong to pretend that the method is particularly robust: Section 4 will discuss some of its deficiencies. Nevertheless, thanks to the first result, we will see in Section 5 that an original regression method (a saw’s tooth), allows for a much more accurate approximation of \omega through an improved linearization. Finally, Section 6 presents a summary of preformance in the course of systematic experiments. A synopsis of the entire method, which does not involve any iterative calculations, is presented in the Appendices.

Before getting into the heart of the matter, a warning must be given regarding some of the figures presented here (Fig. 5, Fig. 6, Fig. 7, Fig. 11, Fig. 14). They serve merely as illustrations of the procedures being described. To create them, we were obligated to fix on a particuar numeroical data set, which is not necessarily representative of the multitudes of possible cases. Given these figures alone, it would be absurd to form any opinion of the method in question, favorable or otherwise. This is especially true as the example has been selected with data that exaggerate all the defects that are explained in the text, to allow for easy identification and unambiguous discussion of the important features.

We see in Table 5 that our “experimental” points are sparse, non-uniformly distributed, and widely dispersed. One might suspect that this is not real experimental data, but rather a simulation obtained in the following way: An “exact” function is given, along with the coefficients a_e, b_e, c_e, and \omega_e, indicated in Table 5. The curve is plotted in Fig. 5 with a dashed line. The x_k were selected at random from the domain, and rounded to the values shown in Table 5. The corresponding exact y values are then computed from equation (37). The y_k were then randomly perturbed with a distribution whose root mean square \sigma_e was approximately 10% of the amplitude \rho_e of the sinusoid. This represents a much wider dispersion compared to what is usually seen in practice.


Fig. 5 “Exact” sinusoid along with the numerical data of our example.

Table 5 Numerical values corresponding to the data in Fig. 5.
k x_k y_k  
1 -1.983 0.936  
2 -1.948 0.810 \omega_e = 2
3 -1.837 0.716 a_e = -0.4
4 -1.827 0.906 b_e = 1.3
5 -1.663 0.247 c_e = -0.6
6 -0.815 -1.513 \rho_e = 1.431782
7 -0.778 -1.901 \sigma_e = 0.1491
8 -0.754 -1.565  
9 -0.518 -1.896  
10 0.322 0.051  
11 0.418 0.021  
12 0.781 1.069  
13 0.931 0.862  
14 1.510 0.183  
15 1.607 0.311  

The root mean square is defined by:

(40)\sigma = \sqrt{\frac{1}{n} \sum_{k=1}^n \left( f(x_k) - y_k \right)^2}

The values of the function f(x) are to be computed using the parameters a, b, c and \omega from the example under consideration.

The values of y_k, indicated in Table 5 and plotted in Fig. 5, are the data for the numerical examples in the following sections. It should be noted that the exact values a_e, b_e, c_e and \omega_e, shown in Table 5 are not used further on (except \omega_e in the “trivial” case in Section 2). They should be forgotten for the remainder of the algorithm, which uses only (x_k, y_k) as data. While the exact sinusoid will be shown on the various figures as a reminder, it is not an indication of the fact that the function (37) with the parameters (a_e, b_e, c_e, \omega_e) is ever used directly.

The tables serve another role, which will be appreciated by readers interested in implementing these techniques in a computer program. If so desired, all of the sample calculations shown here can be reproduced exactly with the data and results reported below. This provides a means to validate and correct a regression program implementing the algorithms described here.

2. Case Where \omega is Known A-Priori

When the value \omega = \omega_e is fixed, the optimization only needs to be performed for the parameters a, b and c of (37). Taking the partial derivatives of (39) with respect to the optimization parameters result in a system of three equations:

    \left( \frac{\partial \varepsilon^2}{\partial a} \right)_
        {(a_0, b_0, c_0)} = -2 \sum_{k=1}^n \left( y_k -
        \left( a_0 + b_0 \; \text{sin}(\omega_e \; x_k) +
        c_0 \; \text{cos}(\omega_e \; x_k)\right) \right) = 0 \\
    \left( \frac{\partial \varepsilon^2}{\partial b} \right)_
        {(a_0, b_0, c_0)} = -2 \sum_{k=1}^n \left( y_k -
        \left( a_0 + b_0 \; \text{sin}(\omega_e \; x_k) +
        c_0 \; \text{cos}(\omega_e \; x_k)\right) \right) \;
        \text{sin}(\omega_e \; x_k) = 0 \\
    \left( \frac{\partial \varepsilon^2}{\partial c} \right)_
        {(a_0, b_0, c_0)} = -2 \sum_{k=1}^n \left( y_k -
        \left( a_0 + b_0 \; \text{sin}(\omega_e \; x_k) +
        c_0 \; \text{cos}(\omega_e \; x_k)\right) \right) \;
        \text{cos}(\omega_e \; x_k) = 0 \\

The solution is given in the following system (42), with the convention that \sum \equiv \sum_{k=1}^n:

(42)\begin{bmatrix} a_0 \\ b_0 \\ c_0 \end{bmatrix} =
    n & \sum \text{sin}(\omega_e x_k) & \sum \text{cos}(\omega_e x_k) \\
    \sum \text{sin}(\omega_e x_k) & \sum \text{sin}^2(\omega_e x_k) &
        \sum \text{sin}(\omega_e x_k) \text{cos}(\omega_e x_k) \\
    \sum \text{cos}(\omega_e x_k) & \sum \text{sin}(\omega_e x_k)
        \text{cos}(\omega_e x_k) & \sum \text{cos}^2(\omega_e x_k)
    \sum y_k \\
    \sum y_k \; \text{sin}(\omega_e x_k) \\
    \sum y_k \; \text{cos}(\omega_e x_k)

The result is presented in figure Fig. 6. We note that the root mean square \sigma_0 is practically identical to \sigma_e in Table 5. This indicates that the regression did not increase the residuals of the fitted sinusoid relative to the “exact” one. Obviously, this is too good to always be true. So, after this brief review, we must tackle the crux of the problem: to compute a sufficiently accurate approximation for \omega, since it will not be know a-priori as it was in the preceding example.


Fig. 6 Case where \omega is know exactly.

Table 6 Fitting parameters of the curve shown in Fig. 6.
\omega_e = 2
a_0 = -0.397904
b_0 = 1.283059
c_0 = -0.573569
\rho_0 = 1.405426
\sigma_0 = 0.147456

3. Linearization Through an Integral Equation

The goal being to reduce the problem to a linear regression, it is sometimes tempting to use a linear differential equation whose solution is our function of interest as the intermediary. An example is the follwing equation, for which the sinusiod (37) is a solution[errata-reei-12]:

(43)f(x) = a - \frac{1}{\omega^2} \frac{d^2 f(x)}{dx^2}

Taking the appropriate partial derivatives of (44) leads to a system of linear equations of two unknowns a and \beta = \frac{1}{\omega^2}.

(44)\varepsilon^2_{(a, b, c, \omega)} =
    \sum_{k=1}^n \left( y_k - f(x_k) \right)^2 =
    \sum_{k=1}^n \left( y_k - a - \beta y''(x_k) \right)^2

Unfortunately, this is not a viable solution in practice (unless we have at our disposal a very large number of especially well distributed samples). The stumbling block here is the calculation of y''(x_k) from the n data points (x_k, y_k): as illustrated in Fig. 7, the fluctuations are generally too unstable for meaningful approximation.

On the other hand, the computation of numerical integrals is clearly less problematic than that of derivatives. It is therefore no surprise that we will seek an integral equation having for its solution our function of interest. For example, in the present case, we have equation (45), for which the sinusoid (37) is a solution:

(45)f(x) = -\omega^2 \int_{x_i}^x \int_{x_j}^v f(u) du \; dv + P(x)

P(x) is a second-degree polynomial whose coefficiens depend on a, b, c, \omega and the limits of integration x_i and x_j.

We could, of course, go into a great amount of detail regarding P(x), but that would derail the presentation without much immediate benefit. We can simply use the fact that a complete analysis shows that the limits of integration have no bearing on the regression that follows, at least with regard to the quantity of interest, the optimized value of \omega (noted as \omega_1). Therefore, to simplify, we will set x_i = x_j = x_1, which results in a function of the following form:

(46)f(x) = A \; SS(x) + B \; x^2 + C \; x + D

(47)\text{with:} \;
    SS(x) = \int_{x_1}^x \int_{x_1}^v f(u) du \; dv \quad ;
        \quad A = -\omega^2 \quad ; \quad B = \frac{1}{2}a \; \omega^2 \\
    C = -a \; \omega^2 \; x_1 + b \; \omega \; \text{cos}(\omega \; x_1) -
        c \; \omega \; \text{cos}(\omega \; x_1) \\
    D = a + \frac{1}{2} \; a \; \omega^2 \; x_1^2 +
        (b + c \; \omega \; x_1) \; \text{sin}(\omega \; x_1) +
        (c - b \; \omega \; x_1) \; \text{cos}(\omega \; x_1)

The coefficients A, B, C, D are unknown. However, they can be approximated through linear regression based on the precomputed values of SS_k. To accomplish this, we perform two numeric integrations:

    S(x_1) = S_1 = 0 \\
    S(x_k) \simeq S_k = S_{k-1} + \frac{1}{2}(y_k + y_{k-1}) (x_k - x_{k-1})
        \quad k = 2 \; \text{to} \; n

    SS(x_1) = SS_1 = 0 \\
    SS(x_k) \simeq SS_k = SS_{k-1} + \frac{1}{2}(S_k + S_{k-1})
        (x_k - x_{k-1}) \quad k = 2 \; \text{to} \; n

It goes without saying that the points must first be ranked in order of ascending x_k. It follows that the sum of squared residuals to minimize is the following[errata-reei-13]:

(50)\varepsilon^2_{a, b, c, \omega} = \sum_{k=1}^n \left( y_k -
    \left( A \; SS_k + B \; x_k^2 + C \; x_k  + D \right) \right)^2

There is no need to reiterate the partial derivatives with respect to A, B, C and D, used to obtain the optimized A_1, B_1, C_1 and D_1, followed by a_1, b_1, c_1 and \omega_1 (according to (47)):

(51)\begin{bmatrix} A_1 \\ B_1 \\ C_1 \\ D_1 \end{bmatrix} =
    \sum \left( SS_k \right)^2 & \sum x_k^2 \; SS_k &
        \sum x_k \; SS_k & \sum SS_k \\
    \sum x_k^2 \; SS_k & \sum x_k^4 & \sum x_k^3 & \sum x_k^2 \\
    \sum x_k \; SS_k   & \sum x_k^3 & \sum x_k^2 & \sum x_k \\
    \sum SS_k          & \sum x_k^2 & \sum x_k   & n
    \sum y_k \; SS_k \\ \sum y_k x_k^2 \\ \sum y_k x_k \\ \sum y_k

    \omega_1 = \sqrt{-A_1} \quad ; \quad a_1 = \frac{2 B_1}{\omega_1^2} \\
    b_1 = \left( B_1 \; x_1^2 + C_1 \; x_1 + D_1 - a_1 \right)
        \text{sin}(\omega \; x_1) +
        \frac{1}{\omega_1} \left( C_1 + 2 B_1 x_1 \right)
            \text{cos}(\omega \; x_1) \\
    c_1 = \left( B_1 \; x_1^2 + C_1 \; x_1 + D_1 - a_1 \right)
        \text{cos}(\omega \; x_1) -
        \frac{1}{\omega_1} \left( C_1 + 2 B_1 x_1 \right)
            \text{sin}(\omega \; x_1)

The results for our sample dataset are displayed further along in Fig. 14. The numerical values of \omega_1, a_1, b_1 and c_1 are shown in Table 14. To compare the graphical and numerical representations with resepect to the data points (x_k, y_k), refer to the curve and table column labeled (1).

Incidentally, it is interesting to compare the results of numerical integrations (48) and (49) with respect to differentiation (Fig. 7). Observing the significant oscillations of the latter (a sample point is noted f'_k on the plot), and how they would be exacerbated by further differentiation (not shown due to the excessive fluctuation amplitude), it becomes patently obvious that a feasible solution must be based on integral equations rather than differential.


Fig. 7 Numerical integrations and comparison with differentiations.[errata-reei-14].

Table 7 Cumulative sums corresponding to Fig. 7.
k S_k SS_k
1 0 0
2 0.030555 0.000534713
3 0.115248 0.00862678
4 0.123358 0.00981981
5 0.217904 0.0378033
6 -0.31888 -0.00501053
7 -0.382039 -0.0179775
8 -0.423631 -0.0276456
9 -0.832029 -0.175813
10 -1.60693 -1.20018
11 -1.60347 -1.35428
12 -1.40564 -1.90043
13 -1.26081 -2.10041
14 -0.958286 -2.74284
15 -0.934327 -2.83463

While noting that with more numerous and uniformly distributed x_k, coupled with less scattered y_k would result in more acceptable fluctuations in the derivatives, we must also remember that a good solution must apply not only to the trivial cases, but the difficult ones as well.

4. A Brief Analysis of Performance

The most important parameter to optimize is \omega. This comes with the understanding that once the optimization has succeeded, we can always fall back to the conventional regression method in Section 2 to obtain a, b and c. We will therefore concentrate on an investigation restricted to results with respect to \omega. The three most influential factors are:

  • The number of points, n_p, in each period of the sinusoid.
  • The distribution of samples in the domain:
    • Either uniform: x_{k+1} - x_k = constant
    • Or random: x_k is drawn at random from the available domain.
  • The scatter of the ordinates y_k, characterized by (\sigma_1 / \rho_e), the ratio of the root mean squared error (40) to the amplitude of the sinusoid.

4.1 Uniform Distribution of Abscissae With Non-Dispersed Ordinates

We find that the ratio \omega_1 / \omega_e, which is expected to be equal to 1, is affected by a deviation inversely proportional to n_p (Fig. 8). It is conceivably possible to construct an empirical function to correct the deviation. But such a function would not be very interesting, as the correction would not be satisfactory for the randomly distributed points we well investigate further on. A more general approach, described in Section 5, seems more appropriate.


Fig. 8 Effect of the number of points per period, with a uniform distribution.

Table 8 Numerical results corresponding to Fig. 8.[errata-reei-15]
n_p \frac{\omega_1}{\omega_e}
5 1.273
6 1.156
7 1.103
8 1.073
9 1.055
10 1.043
11 1.034
12 1.028
13 1.023
14 1.02
15 1.017
16 1.015
17 1.013
18 1.012
19 1.01
20 1.009


Place table next to figure if possible: .figure {float: left;}?

4.2 Random Distribution of Abscissae With Non-Dispersed Ordinates

Since x_k are randomly distributed, the dispersion in the calculated values of \omega_1 increases as n_p becomes small. For a fixed value of n_p, the cumulative distribution (computed from 10000 simulations), is represented in Fig. 9. It can be seen that the expected median \omega_{1m} / \omega_e, which is expected to be equal to 1, is affected even more than under the conditions described in Section 4.1.


Fig. 9 Cumulative distributions of \omega_1 (random distribution of x_k, non-dispersed y_k).

Table 9 Median values of distributions shown in Fig. 9.
n_p \frac{\omega_{1m}}{\omega_e}
8 1.134
10 1.098
12 1.073
15 1.051
20 1.033
50 1.006


Place table next to figure if possible: .figure {float: left;}?

4.3 Random Distribution of Abscissae With Dispersed Ordinates

As expected, the dispersion of \omega_1 is greater than in the previous case. Fig. 10 shows the the case where (\sigma_1/\rho_e) = 10\%, compared to (\sigma_1/\rho_e) = 0 as represented in Fig. 9. Nevertheless, the median values are barely affected (compare Table 10 to the previous data in Table 9).


Fig. 10 Cumulative distributions of \omega_1 with randomly distributed ordinates such that (\sigma_1/\rho_e = 0.1)

Table 10 Median values of distributions shown in Fig. 10.
n_p \frac{\omega_{1m}}{\omega_e}
8 1.144
10 1.104
12 1.080
15 1.057
20 1.036
50 1.007


Place table next to figure if possible: .figure {float: left;}?

5. Further Optimizations Based on Estimates of a and \rho

We now shift our interest to a function y = f(x), expressed in the form (38). The inverse function is an arcsine, but can also be expressed as an arctangent:

    \Phi(x) = \text{arctan} \left( \frac{f(x) - a}
        {\sqrt{\rho^2 - \left( f(x) - a \right)^2}} \right) \\
    \omega \; x + \varphi = \pm \Phi(x) + \pi K_{(x)}

\text{arctan} denotes the principal values (between -\frac{\pi}{2} and \frac{\pi}{2}) of the multivalued function. The sign of \Phi and the integer K_{(x)} depend on the half-period of the sinusoid in which the point (x, y) is found, making their dependence on x discontinuous. Additionally, we can show that the sign is + when K_{(x)} is even and - when odd.

If we consider \Phi(x) by itself, it is a sawtooth function, as shown in Fig. 11 by the dotted curve. The points (x_k, \Phi_k), with \Phi_k = \Phi(x_k), are represented by crosses. Rewritten in this manner, the problem becomes one of regression of a sawtooth function, which is largely as “nightmarish” as regression of the sinusoid. Since we have no information other than the points (x_k, \Phi_k), determining K_{(x)} difficult and mostly empirical, and therefore not guaranteed to always be feasible in the general case. The situation is different in our case because we have already found the orders of magnitude of the parameters: a_1, b_1, c_1 and \omega_1, and consequently \rho_1 and \varphi_1 through (38). In the current instance, which will be represented by the subscript 2, we posit that:

(54)a_2 = a_1 \quad ; \quad \rho_2 = \rho_1 = \sqrt{b_1^2 + c_1^2} \quad ; \quad
    \rho_1 \; \text{cos}(\varphi_1) = b_1 \quad ; \quad
    \rho_1 \; \text{sin}(\varphi_1) = c_1 \\
\text{If} \; b_1 > 0 \quad \rightarrow \quad
    \varphi_1 = \text{arctan} \left( \frac{c_1}{b_1} \right) \quad ; \quad
    \text{if} \; b_1 < 0 \quad \rightarrow \quad
    \varphi_1 = \text{arctan} \left( \frac{c_1}{b_1}\right) + \pi

Now it is possible to calculate K_1, K_2, ..., K_k, ..., K_n in number of different ways. One possibility is as follows, where \text{round} rounds the real argument to the nearest integer:

(55)K_k = \text{round} \left( \frac{\omega_1 \; x_k + \varphi_1}{\pi} \right)

Another way to write (53), applied to x = x_k, better demonstrates the nearly linear relationship \theta_k \approx \omega_2 \; x_k + \varphi_2 between x_k and \theta_k, defined by:

    \theta_k = (-1)^{K_k} \; \text{arctan} \left( \frac{y_k - a_2}
        {\sqrt{\rho_2^2 - (y_k - a_2)^2}^2} \right) + \pi K_k \\
    \text{if} \; \rho_2^2 \leq (y_k - a_2)^2 \quad \rightarrow \quad
        \text{arctan} = +\frac{\pi}{2} \;
        \text{if} \; y_k > a_2 \; \text{or} \; = -\frac{\pi}{2} \;
        \text{if} \; y_k < a_2

We see in Fig. 11, that the series of points (x_k, \Phi_k) represented by crosses, is transformed into a series of points (x_k, \theta_k) represented by squares, some of which coincide.


Fig. 11 Transformation of a sawtooth function in preparation for linear regression[errata-reei-16]

Table 11 Numerical data for the points shown in Fig. 11.
k \Phi_k K_k \theta_k
1 1.16394 -1 -4.30553
2 0.975721 -1 -4.11731
3 0.864493 -1 -4.00609
4 1.11266 -1 -4.25425
5 0.438724 -1 -3.58032
6 -0.990033 -1 -2.15156
7 -1.5708 0 -1.5708
8 -1.06193 0 -1.06193
9 -1.5708 0 -1.5708
10 0.288353 0 0.288353
11 0.266008 0 0.266008
12 1.5708 1 1.5708
13 1.04586 1 2.09573
14 0.388646 1 2.75295
15 0.490008 1 2.65159

It is clear that the points (x_k, \theta_k) tend towards a straight line, which makes it possible to perform a linear regression to obtain \omega_2 from the slope and \varphi_2 from the x-intercept. Once we have computed \theta_k from (56), the result can be obtained in the usual manner:

(57)\begin{bmatrix} \omega_2 \\ \varphi_2 \end{bmatrix} =
\begin{bmatrix} \sum (x_k)^2 & \sum x_k \\ \sum x_k & n \end{bmatrix}^{-1}
\begin{bmatrix} \sum \theta_k x_k \\ \sum \theta_k \end{bmatrix}

Completed by:

(58)b_2 = \rho_2 \; \text{cos}(\varphi_2) \quad ; \quad
    c_2 = \rho_2 \; \text{sin}(\varphi_2)

The numerical results are summarized in Table 14 (in the table column labeled (2)), and the curve is shown in Fig. 14.

The performance measured in terms of the optimization of \omega is shown in Fig. 12 and Fig. 13. Note the improvements in the optimized median value, \omega_m, shown in Table 12 and Table 13, which was the objective of this phase of the algorithm.


Fig. 12 Cumulative distributions of \omega_2 (random distribution of x_k, non-dispersed y_k).


Fix the broken figure

Table 12 Median values of distributions shown in Fig. 12.
n_p \frac{\omega_{2m}}{\omega_e}
8 1.175
10 1.144
12 1.117
15 1.069
20 0.986


Fix the broken table


Fig. 13 Cumulative distributions of \omega_2 with randomly distributed ordinates such that (\sigma_2/\rho_e = 0.1)


Fix the broken figure

Table 13 Median values of distributions shown in Fig. 13.
n_p \frac{\omega_{2m}}{\omega_e}
8 1.157
10 1.130
12 1.109
15 1.067
20 0.988
50 0.825


Fix the broken table

6. Final Steps and Results

As soon as we have a good approximation (\omega_2) of \omega, the classical method shown in Section 2 can be applied:

(59)\begin{bmatrix} a_3 \\ b_3 \\ c_3 \end{bmatrix} =
    n & \sum \text{sin}(\omega_2 \; x_k) &
        \sum \text{cos}(\omega_2 \; x_k) \\
    \sum \text{sin}(\omega_2 \; x_k) & \sum \text{sin}^2(\omega_2 \; x_k) &
        \sum \text{sin}(\omega_2 \; x_k) \; \text{cos}(\omega_2 \; x_k) \\
    \sum \text{cos}(\omega_2 \; x_k) &
        \sum \text{sin}(\omega_2 \; x_k) \; \text{cos}(\omega_2 \; x_k) &
        \sum \text{cos}^2(\omega_2 \; x_k)
     \sum y_k \\
     \sum y_k \; \text{sin}(\omega_2 \; x_k) \\
     \sum y_k \; \text{cos}(\omega_2 \; x_k)

This allows us to perform a final optimization of the dimensional parameters of the sinusoid, which was not possible in the previous step because a_2 and \rho_2 were fixed. The final result is presented in Fig. 14. The numerical results are summarized in Table 14 (in the table column labeled (3)). The intermediate results (sinusoids (1) and (2), along with their respective parameters) are reported in the same figure and table.


Fig. 14 Regression results for a sample sinusoid.

Table 14 Parameters of the results shown in Fig. 14
  (1) (2) (3)
\omega 2.32536 2.02074 2.02074
a -0.345959 -0.345959 -0.405617
b 1.34913 1.35253 1.2752
c 0.358335 -0.345283 -0.577491
\rho 1.39591 1.39591 1.39987
\varphi 0.25961 -0.249948 -0.425231

This figure might give the appearance that the prodedure is apparently converging the solid curves with the dotted one with successive approximations. However, attempts to use this approach iteratively would have no effect: repeating the calculations would not modify the final result because it is based on the initial numerical integrations, whose inherent noise will not decrease with further iterations.

Sampling a large number of simulations of different conditions would be required to form an objective opinion of the properties of this method. We can refer to Table 12 and Table 13 for a summary of the results for the the optimization of \omega, which is the essential part of the algorithm. Naturally, the final results are unchanged from what is shown in Fig. 12 and Table 13 since \omega_3 = \omega_2.

For each simulated value of (\omega_3, a_3, b_3, c_3), the root mean square is computed from

(60)\sigma_3 = \sqrt{\frac{1}{n} \sum_{k=1}^n \left( y_k - \left(
    a_3 + b_3 \; \text{sin}(\omega_3 \; x_k) +
    c_3 \; \text{cos}(\omega_3 \; x_k)
\right) \right)^2}

The distribution of (\sigma_3/\rho_e) for each fixed value of n_p is plotted from 10000 simulations, each with a different set of (x_k, y_k), since the ordinates are perturbed with a characteristic root mean square of \sigma_e about the nominal (defined in Section1). For the plot in figure sin-11, the dispersed ordinates are generated so that the ratio \sigma_e / \rho_e is always a constant, in this case 0.1. The abscissae, on the other hand, are generated differently:

  • Fig. 15: The abscissae of successive points are distributed uniformly.
  • Fig. 16: The abscissae are distributed randomly (with n_p points per full period)

It might seem surprising that the root mean squared error \sigma_3 is smaller than the initial \sigma_e. The explanation is that the “optimized” sinusoid will often pass closer to the data points than the “exact” one, especially with more dispersed points. The differences stem from the optimization of \omega_2, which can differ significantly from \omega_e, as we saw in Fig. 12 and Fig. 13. Consequently, for dispersed ordinates, the error does not increase as significantly as we might have feared in moving away from uniformly distributed abscissae.


Fig. 15 Cumulative distribution functions for the root mean squared error, with uniformly distributed absissae.


Fig. 16 Cumulative distribution functions for the root mean squared error, with non-uniformly distributed absissae.

The biggest difference between the uniform and non-uniform cases is the failure rate of the process. This must be mentioned, since it is the common lot of all methods when the points are irregularly distributed. Inevitably (not frequently, but nevertheless with a non-zero probabililty), we run into data sets in which all the points are tightly clustered. The sinusoid for such data is not defined, and therefore can be neither characterized nor approximated. This can be expressed differently in different portions of the computation, for example through indeterminacy (division by a number too close to zero), non-invertibility of one of the matrices, square root of a negative number, etc.

The calculation of \omega_1 in (52) is where the majority of failures occur in this method, fortunately quite rarely. For the hundreds of thousands of simulations performed with uniformly distributed abscissae, no failure was ever encountered. This is not surprising since the points could never be clumped in that situation. The case with randomly selected abscissae, on the other hand, showed a very low failure rate, dependent on both the dispersion and the quantity of data points under consideration, as shown in Fig. 17 (which required nearly five hundred thousand simulations for a coherent view to emerge).


Fig. 17 Failure rates (orders of magnitude).

7. Discussion

Appendix 1: Summary of Sinusoidal Regression Algorithm

Appendix 2: Detailed Procedure for Sinusoidal Regression

Application to the Logistic Distribution (Three Parameters)

Application to the Logistic Distribution (Four Parameters)

Mixed Linear and Sinusoidal Regression

Generalized Sinusoidal Regression

Damped Sinusoidal Regression

Double Exponential Regression & Double Power Regression

Multivariate Regression

Rather than a single variable, x, the objective function depends on multiple variables, x, t, ....

Review of the Linear Case

In the simplest case, the function y(x, t, ...) is linear with respect to the optimization parameters \lambda_1, \lambda_2, ..., \lambda_j, ..., \lambda_m:

y(x, t, ...) = \lambda_1 f_1(x, t, ...) + ... +
               \lambda_j f_j(x, t, ...) + ... +
               \lambda_m f_m(x, t, ...)

The given functions f_j(x, t, ...) do not themselves depend on the optimization parameters.

Given the n data points (x_1, t_1, ...; y_1), (x_2, t_2, ...; y_2), ..., (x_k, t_k, ...; y_k), ..., (x_n, t_n, ...; y_n)[errata-reei-17], the fitting parameters can be computed using the least-squares method:

With: f_{j,k} = f_j(x_k, t_k, ...)

    \lambda_1 \\ ... \\ \lambda_j \\ ... \\ \lambda_m
\end{bmatrix} = \begin{bmatrix}
    \sum_{k=1}^n \left(f_{1,k}\right)^2     & ... &
        \sum_{k=1}^n f_{1,k} f_{j,k}        & ... &
        \sum_{k=1}^n f_{1,k} f_{m,k} \\
    ... & ... & ... & ... & ... \\
    \sum_{k=1}^n f_{1,k} f_{j,k}            & ... &
        \sum_{k=1}^n \left(f_{j,k}\right)^2 & ... &
        \sum_{k=1}^n f_{j,k} f_{m,k} \\
    ... & ... & ... & ... & ... \\
    \sum_{k=1}^n f_{1,k} f_{m,k}            & ... &
        \sum_{k=1}^n f_{j,k} f_{m,k}        & ... &
        \sum_{k=1}^n \left(f_{m,k}\right)^2
\end{bmatrix}^{-1} \begin{bmatrix}
    \sum_{k=1}^n y_k f_{1,k} \\ ... \\
        \sum_{k=1}^n y_k f_{j,k} \\ ... \\ \sum_{k=1}^n y_k f_{m,k}

Non-Linear Case

If one or more of the functions depend on the fitting parameter(s), the corresponding y(x, t, ...) will no longer be linear with respect to the optimization parameters:

y(x, t, ...) = b_1 \varphi_1(p_1, p_2, ...; x, t, ...) +
               b_2 \varphi_2(p_1, p_2, ...; x, t, ...) + ... +
               \lambda_1 f_1(x, t, ...) + ... +
               \lambda_j f_j(x, t, ...) + ... +
               \lambda_m f_m(x, t, ...)

The functions given as \varphi_1, \varphi_2, ... depend on the fitting parameters p_1, p_2, .... The complete set of parameters to optimize is therefore p_1, p_2, ..., b_1, b_2, ..., \lambda_1, ..., \lambda_j, ..., lambda_m. The least squares method can not be applied directly in this case. A multitude of different methods exist, based generally on forming an initial estimate of the non-linear parameters (p_1, p_2, ...), followed by recursive approximations to progressively optimize the initial guess.

The method studied here follows from a very different principle. We perform one or more integrations of y(x, t, ...), for example \int y(x, y, ...) dx, or \int y(x, t, ...) dt, or \int \int y(x, t, ...) dx dt, or other multiple integrals.

We can can also introduce new functions g(x, t, ...), to simplify the integrals: \int g(x, t, ...) y(x, t, ...) dx, or \int g(x, t, ...) y(x, t, ...) dt, or \int \int g(x, t, ...) y(x, t, ...) dx dt, or other multiple integrations.

We can see that the possibilities are endless. It is sometimes possible to form linear combinations of such equations that cancel out the non linear terms depending on the parameters p_1, p_2, ....

For Example, consider the following function:

y(x, t) = b \text{exp}(p x t) + \lambda(x - t)^2 \quad

    b_1 = b \quad ; \quad b_2 = b_3 = ... = 0 \\
    p_1 = p \quad ; \quad p_2 = p_3 = ... = 0 \\
    \varphi(p_1, p_2, ...; x, t, ...) = \text{exp}(p x t) \\
    \lambda_1 = \lambda \quad ; \quad \lambda_2 = \lambda_3 = ... = 0 \\
    f_1(x, t, ...) = (x - t)^2
\end{cases} \\

    \text{with :} \\
    g(x, t) = t
\end{cases} \begin{cases}
    \int g(x, t) y(x, y) dx = \int t b \text{exp}(p x t) dx +
        \int t \lambda(x - t)^2 dx \\
    = \frac{b}{p} \text{exp}(p x t) + \frac{\lambda}{3} t (x-t)^3 + ...

We can see that a linear combination between this equation and the definition of y(x, t) can be used to cancel out the term \text{exp}(p x t):

y(x, t) = p \int t y(x, t) dx - \frac{1}{3} \lambda p t (x - t)^3 +
                                            \lambda (x - t)^2 + ... \\
y(x, t) = p S(x, t) + \lambda_1 f_1(x, t) + \lambda_2 f_2(x, t) + ...

    S(x, t) = \int t y(x, t) dx \\
    f_1(x, t) = t(x - t)^3 \\
    f_2(x, t) = (x - t)^2

S(x, t) will be approximated numerically from the data.

This relationship is linear in p, \lambda_1, \lambda_2. Least squares regression gives us the desired approximation of p. This reduces the equation for y(x, t) to a linear relationship with regards to b and lambda.

Nevertheless, the equations above are not quite correct, as they were deliberately left incomplete, so as to simplify the following explanation. The integrals can not be left in their indefinite form, especially when it comes to numerical integration. It is necessary to define a lower limit if integration. This requirement does not pose any fundamental difficulty for a function y(x) of one variable. It does, however, become a sensitive problem for functions y(x, t, ...) of multiple variables. The potential problems with numerical integration must be examined carefully in the presence of multiple variables (x, t, ...), even when the integration is only carried out over one of them.