REGRESSIONS et EQUATIONS INTEGRALES¶
Jean Jacquelin
[ First Edition : 14 January 2009 - Updated : 3 January 2014 ]
Translator’s Note¶
I came across this paper while searching for efficient, preferably non-iterative, routines for fitting exponentials. The techniques presented in this paper have served my purpose well, and this translation (and to some extent the entire scikit) were the result. I hope that my endeavors do justice to Jean Jacquelin’s work, and prove as useful to someone else as they did to me.
The paper translated here is a compilation of related original papers by the author, gathered into a single multi-chapter unit. Some of the original material is in French and some in English. I have attempted to translate the French as faithfully as I could. I have also attempted to conform the English portions to what I consider to be modern American usage.
There are certain to be some imperfections and probably outright errors in the translation. The translation is not meant to be a monolith. It is presented on GitLab so that readers so inclined can easily submit corrections and improvements.
A small number of technical corrections were made to the content of this paper throughout the translation. In all cases, the errata are clearly marked with footnotes with detailed descriptions of the fix in question. Errata are to be submitted to Jean Jacquelin before publication of this translation.
I have re-generated the plots and tables in the paper as faithfully as I could, rather than copying them out of the original. This has provided a measure of confidence in the results. There may be some slight disagreement in the randomly generated datasets, but the discrepancies are both expected and minuscule.
- – Joseph Fox-Rabinovitz
- 23rd September 2018
Regressions and Integral Equations¶
Jean Jacquelin
Abstract¶
The primary aim of this publication is to draw attention to a rarely used method for solving certain types of non-linear regression problems.
The key to the method presented in this paper is the principle of linearization through differential and integral equations, which enables the conversion of a complex non-linear problem into simple linear regression.
The calculus shown here has a fundamental difference from traditional solutions to similar problems in that it is non-recursive, and therefore does not require the usual iterative approach.
In order to demonstrate the theory through concrete application, detailed numerical examples have been worked out. Regressions of power, exponential, and logarithmic functions are presented, along with the Gaussian and Weibull distributions commonly found in statistical applications.
Regressions and Integral Equations
Jean Jacquelin
1. Introduction¶
The study presented here falls into the general framework of regression problems. For example, given the coordinates of a sequence of n points (x_1, y_1), (x_2, y_2), …, (x_n, y_n), we wish to find a function y = F(x), depending on several parameters, which lies as close as possible to the data once those parameters are optimized.
The commonly known solution to linear regression merits only a brief discussion, which is to be found in Appendix 1. Some problems can be solved through linear regression even though they appear non-linear at first glance. The Gaussian cumulative distribution function is a specific example which is discussed in Appendix 2.
Excepting such simple cases, we are confronted with the daunting problem of non-linear regression. There is extensive literature on the subject. Even the most cursory review would derail us from the purpose of this paper. It is also unnecessary because our goal is to reduce non-linear problems to linear regression through non-iterative and non-recursive procedures (otherwise, how would our proposed method be innovative with respect to existing methods?).
In the next section, we will proceed to the heart of the matter: rendering non-linear problems to linear form by means of suitable differential and/or integral equations. The preliminary discussion will show that in the context of such problems, integral equations tend to be more numerically stable than differential equations, with very few exceptions.
The principle of using integral equations will be explained and demonstrated in practice using the Gaussian probability distribution function as a concrete example. Other examples of regression using integral equations will be described in detail in the two papers that follow.
2. Principle of Linearization Through Differential and/or Integral Equations¶
We begin with a summary of numerical methods for approximating derivatives and integrals. Given points located near the curve of a function , and given another function , we can calculate approximations for the following derivatives and integrals with :
And so on, for subsequent derivatives, as necessary.
And so on, for subsequent integrals, as necessary.
It goes without saying that the points must first be ranked in order of ascending .
It is possible to use more sophisticated approximations for numerical differentiation and integration. Nothing prevents us from selecting a lower limit (or lower limits) of integration other than , and using different limits for the successive integrations. However, that would complicate the formulas and explanations unnecessarily. For the sake of simplicity, let us agree to use these formulas, at least for this stage of the presentation.
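In practice, the cumulative integrals used throughout this paper reduce to a running trapezoidal sum. Here is a minimal NumPy sketch of that helper (an illustration, not part of the original paper; the function name is mine and the convention S_1 = 0 follows the formulas above):

import numpy as np

def cum_trapz(x, y):
    # S_1 = 0 ; S_k = S_{k-1} + (1/2)*(y_k + y_{k-1})*(x_k - x_{k-1})
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    S = np.zeros_like(y)
    S[1:] = np.cumsum(0.5 * (y[1:] + y[:-1]) * np.diff(x))
    return S

SciPy users can obtain the same result with scipy.integrate.cumulative_trapezoid(y, x, initial=0). Applying the same helper to its own output gives the double integrals needed later for the sinusoidal case.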
Returning to our initial formulation of the problem, we wish to optimize the parameters of a function so that its curve approaches the points as closely as possible. Evidently, the exact expressions of the derivatives and anti-derivatives of the function depend on the parameters . However, the approximate values calculated using the formulas shown above, i.e. the numerical values of , are computed solely from the data points , without requiring prior knowledge of . This observation is the crux of the method that is to be shown.
Let us suppose that the function is the solution to a differential and/or integral equation of the form:
with predetermined functions independent of , and dependent on . The approximate values are then respectively (with )[errata-reei-1]:
If we replace the exact derivatives and/or anti-derivatives with their numerical approximations, the equation will no longer hold true. We can therefore work with the sum of the squares of the residuals:
The relationship is linear with respect to . We have therefore returned to classical linear regression, which allows us to calculate the optimal values of . Finally, since are known functions of , we must solve the system of equations to obtain the optimal values of the parameters .
There are some additional considerations that must be taken into account when choosing the differential and/or integral equation. Other than the requirement for linearity in its coefficients (but not necessarily in the functions, since we have the choice of ), the equation should preferably have as many coefficients as there are initial parameters to optimize. If there are fewer coefficients, an additional regression (or regressions) will be necessary to calculate the coefficients that do not figure explicitly in the equation.
Moreover, to avoid an overburdened explanation, we have been considering a simplified form of differential and/or integral equation. In fact, the equation could have any number of different functions , several different derivatives (corresponding to various choices of ), several different integrals (corresponding to various choices of ), and so on for subsequent multiple derivatives and integrals.
Clearly, there is a multitude of choices for the differential and/or integral equation that we bring to bear on the problem. However, practical considerations limit our choices. One of the main stumbling blocks is the difficulty inherent in the numerical approximation of derivatives. In fact, in cases where the points are irregularly distributed or too sparse, and if, to make matters worse, the values of are not sufficiently precise, the computed derivatives will fluctuate so much and be so dispersed as to render the regression ineffective. On the other hand, numerical approximations of integrals retain their stability even in these difficult cases (this does not mean that the inevitable deviations are insignificant, but that they remain damped, which is essential for the robustness of the overall process). Except in special cases, the preference is therefore to seek an integral equation rather than a differential one.
The generality of the presentation that has just been made may give the impression that the method is complicated and difficult to implement in practice. The fact of the matter is quite the opposite, as we will see once we shift focus from the abstract discussion of many possibilities to solving a single concrete example.
One of the most spectacular examples is that of sinusoidal regression (which we only mention in passing, without going into depth here, but which will be treated in detail in the attached paper Regression of Sinusoids). It concerns the optimization of the parameters and in the equation:
This function is the solution to the differential equation:
This is a linear equation with respect to and , which are themselves (very simple) functions of and . Moreover, the parameters and no longer figure in the differential equation directly. This case is therefore a typical application of the method, and among the easiest to implement, except that it contains a second derivative, which makes it virtually useless. Fortunately, there is no a-priori reason not to use an integral equation whose solution is a sinusoid instead. The integral method is hardly any more complicated and gives largely satisfactory results (which are studied in detail in the attached paper: Regression of Sinusoids).
For our first demonstration of the method, let us look for a simpler example. In the following section, the method of regression through an integral equation will be applied to the Gaussian probability density function.
3. Example: Illustration of the Gaussian Probability Density Function¶
We consider a probability density function of two parameters, and , defined by
(1)¶
The general notation of the previous sections is replaced with here due to the specificity of this case.
The integration (2) leads to the integral equation (3), of which is the solution[errata-reei-2]:
This is a linear integral equation, consisting of two simple integrals, which places it among the applications mentioned in the previous section. We compute the respective approximations, the first denoted with , and the second denoted with :
When we replace with , with and the integrals with and , respectively, equation (3) no longer holds true. We seek to minimize the sum of the squares of the residuals:
(6)¶
Notice that had we chosen a lower limit of integration different from , it would have changed not only the value of , but also the numerical values of and , in a way that would cancel out the differences without changing the final result.
The relationship (6) is none other than the equation of a linear regression, which we know how to optimize for the parameters and [errata-reei-3]:
(7)¶
By convention, . We then deduce and according to (3)[errata-reei-4]:
(8)¶
Here is a summary of the numerical computation:
Data:
Compute :
Compute :
Compute
Compute and [errata-reei-5]:
Compute and [errata-reei-6]:
Result: and are the approximations of and .
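As an illustration of this summary, here is a NumPy transcription of the whole procedure (a sketch only: the function name is mine, and the identification A = μ/σ^2, B = -1/σ^2 is reconstructed by integrating f'(t) = -((t - μ)/σ^2)·f(t) between x_1 and x, consistent with equations (2)-(8)):

import numpy as np

def gauss_pdf_fit(x, y):
    # Non-iterative estimate of (sigma, mu) for a Gaussian density from points (x_k, y_k).
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Cumulative trapezoidal integrals S_k of y and T_k of t*y, both starting at 0.
    S = np.zeros_like(y)
    T = np.zeros_like(y)
    S[1:] = np.cumsum(0.5 * (y[1:] + y[:-1]) * np.diff(x))
    T[1:] = np.cumsum(0.5 * (x[1:] * y[1:] + x[:-1] * y[:-1]) * np.diff(x))
    # Linear regression  y_k - y_1 = A*S_k + B*T_k  with A = mu/sigma^2 and B = -1/sigma^2.
    M = np.array([[S @ S, S @ T],
                  [S @ T, T @ T]])
    v = np.array([S @ (y - y[0]), T @ (y - y[0])])
    A, B = np.linalg.solve(M, v)
    return np.sqrt(-1.0 / B), -A / B   # (sigma_1, mu_1)

Applied to the data of Table 1, this should reproduce approximately σ_1 ≈ 0.3839 and μ_1 ≈ -0.2894, the values reported in the table.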
To illustrate the calculation (Fig. 1), numerical data (Table 1) were generated in the following manner: values were chosen at random from the domain under consideration. From the “exact” values and (defining the “exact” function , whose representative curve is plotted as a dashed line in Fig. 1), we computed the exact values of given by equation (1)[errata-reei-7]. We then added perturbations whose amplitude was drawn randomly from a range of -10% to +10% of , which, after rounding, gave us the numerical values of in Table 1.
The outrageous error model is motivated by the need for legibility in the figure, so that the so-called “experimental” points, represented by crosses, lie far enough from the dashed curve to be clearly distinguishable. In the same vein, only a handful of points was chosen so that the deviations between the “exact” dashed curves and the calculated solid curves are highlighted for both the intermediate and the final calculation. The fact that the points are not uniformly distributed across the domain is also a significant complication.
k | x_k | y_k | S_k | T_k |
1 | -0.992 | 0.238 | 0 | 0 |
2 | -0.935 | 0.262 | 0.01425 | -0.0137104 | σ_e = 0.4
3 | -0.836 | 0.38 | 0.046029 | -0.0415616 | μ_e = -0.3
4 | -0.404 | 1.04 | 0.352965 | -0.201022 | |
5 | -0.326 | 0.922 | 0.429522 | -0.229147 | σ_1 = 0.383915
6 | -0.042 | 0.755 | 0.667656 | -0.276331 | μ_1 = -0.289356
7 | 0.068 | 0.589 | 0.741576 | -0.275872 | |
8 | 0.302 | 0.34 | 0.850269 | -0.259172 | |
9 | 0.439 | 0.193 | 0.88678 | -0.246335 | |
10 | 0.58 | 0.083 | 0.906238 | -0.236968 |
In Fig. 1, the shape of the curves of the “exact” integrals and the points and make the primary reason for the deviations in this method of calculation clearly apparent: numerical integration, while preferable to differentiation, is not by any means perfect, and causes the deviations in .
To form an objective opinion about the qualities and defects of the method exposed here, it would be necessary to perform a systematic experimental study of a very large number of cases and examples. In the current state of progress of this study, this remains yet to be done.
It is certain that the deviations, caused by the defects inherent in numerical integration, would be considerably reduced if the points were sufficiently numerous and their abscissae were partitioned at sufficiently regular intervals.
4. Discussion¶
It would be unreasonable to imagine that the method presented here could replace what is currently in use in commercial software, which has the benefits of long-term study, experimentation, and improvements. We must ask, then: what is the motivation behind this work?
Of course, recursive methods generally require a first approximation within the same order of magnitude as the target value. This is not generally a handicap since users are not completely in the dark. One might be tempted to view the present method of integral equations as a way to satisfy that need for an initial approximation. However, the need is quite marginal, and it should not be seen as a serious motivation.
A simple method, easy to program, like the one shown here, may well appeal to potential users in particular situations where they seek full mastery over the calculations that are performed. Users of commercial software are usually satisfied with the results it provides, but may occasionally deplore not knowing precisely what the sophisticated software is doing. Nevertheless, it would be poor motivation indeed for this study to attempt to provide a less powerful tool than what already exists, just to alleviate some of the frustration caused by using tools whose exact mechanisms are unknown.
In fact, we must see in this paper not a specific utilitarian motivation in the case of the Gaussian distribution, but the intention to understand and draw attention to a more general idea: the numerous possibilities offered by integral equations to transform a problem of non-linear regression into a linear regression, computed through a non-iterative process.
It is out of the question to compete against what already exists, and what is more important, works well. On the other hand, it would be a pity to forget a method that might potentially help resolve future problems: the method that has been the subject of this paper, whose essence is presented in Section 2.
Appendix 1: Review of Linear Regression¶
When the function that we seek to optimize, , can be written in the form , according to some number of parameters and with known functions , the algorithm is linear with respect to the optimization parameters.
Even more generally, if the function can be transformed into with known functions , the algorithm is again linear with respect to the coefficients , and , even if it is not linear with respect to . But it always reverts to a linear regression. The “least squares” method effectively consists of finding the minimum of:
The partial derivatives with respect to each of the parameters determine a system of equations whose solutions are the optimal values:
The solution to this system, conventionally written with leads to:
Then we obtain the optimized values of corresponding to the following system, where the unknowns are :
which is a system that is non-linear in the same measure that the functions are non-linear. But this does not prevent the regression that was performed from being linear, so even this case has its rightful place in this section.
Of course, this can be further extended by considering more variables, for example , instead of just , thus working in 3D, or 4D, …, instead of 2D. Everything mentioned here figures in literature in a more detailed and more structured manner, with presentations adapted to the exposition of general theory. The purpose here was only to present a brief review, with a specific notation to be used consistently throughout the remainder of the work.
Appendix 2: Linear Regression of the Gaussian Cumulative Distribution Function¶
We consider the cumulative Gaussian distribution function of two parameters, and , defined by[errata-reei-8]:
(9)¶
An example is shown in Fig. 2 (the dashed curve).
k | x_k | F_k | argErf(2F_k - 1) |
1 | -0.914 | 0.001 | -2.18512 |
2 | -0.556 | 0.017 | -1.49912 | σ_e = 0.4
3 | -0.49 | 0.021 | -1.43792 | μ_e = 0.3
4 | -0.195 | 0.097 | -0.918416 | |
5 | 0.019 | 0.258 | -0.459283 | σ_1 = 0.374462
6 | 0.045 | 0.258 | -0.459283 | μ_1 = 0.266843
7 | 0.587 | 0.704 | 0.378967 | |
8 | 0.764 | 0.911 | 0.952429 | |
9 | 0.81 | 0.911 | 0.952429 | |
10 | 0.884 | 0.979 | 1.43792 |
The data are the points that we call “experimental”: , which, in Fig. 2[errata-reei-10] have a particular dispersion relative to their respective nominal positions on the dashed curve representing .
We can write equivalently in terms of the Erf function, known as the “error function” and defined by:
(10)¶
The change of variable in (9) gives the relationship:
(11)¶
The inverse function to Erf is designated Erf^(-1), Erfinv, or argErf. We will use the last notation.
Thus, the inverse relationship to (11) can be written:
(12)¶
This in turn leads to a linear relationship in and , defined by:
(13)¶
This reduces to the most elementary form of linear regression, relative to the points with previously calculated from:
(14)¶
The optimal values of and are the solutions to the following system:
(15)¶
with the convention that . We then deduce and from (13):
(16)¶
For the example shown, the numerical values of and that we obtain are shown in Table 2[errata-reei-9]. The curve of the corresponding function is marked with a solid line in Fig. 2 [errata-reei-11]. It neighbors the “theoretical” curve, which is dashed.
In fact, the example was intentionally chosen to have a very small number of widely dispersed points, so as to make the two curves clearly distinct from each other. This understates, rather than represents, the quality of fit that can usually be obtained.
Here is a summary of the very simple numerical computation:
Data:
Compute :
Compute , , ,
Compute and :
Compute and :
Result: and are the approximations of and .
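The same computation, written as a short NumPy sketch (the function name is mine; scipy.special.erfinv plays the role of argErf, and the identification σ = 1/(b·√2), μ = -a/b follows from the linearization (13)):

import numpy as np
from scipy.special import erfinv   # equivalent to argErf

def gauss_cdf_fit(x, F):
    # Non-iterative estimate of (sigma, mu) for a Gaussian CDF from points (x_k, F_k).
    x = np.asarray(x, dtype=float)
    y = erfinv(2.0 * np.asarray(F, dtype=float) - 1.0)   # transformed ordinates, cf. (14)
    b, a = np.polyfit(x, y, 1)          # linear regression  y = a + b*x
    sigma = 1.0 / (b * np.sqrt(2.0))
    mu = -a / b
    return sigma, mu

With the data of Table 2, this should give approximately σ_1 ≈ 0.3745 and μ_1 ≈ 0.2668.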
If a software package implementing argErf is not readily available, a sample listing for the functions Erf and argErf is shown in the next section.
Note: The material in Appendices 1 and 2 is well known. All the same, it was useful to highlight the fundamental differences between the regression problems reviewed in Appendix 1 and those in Section 2 of the main paper. It is equally useful to provide examples that showcase the noteworthy differences between the application of regression to the Gaussian distribution, on one hand to the cumulative function (Appendix 2) and on the other to the more difficult case of the density function (Section 3 of the main text).
Listings for the functions Erf and argErf¶
The approximate values of Erf(x) are obtained with a minimum of eight significant digits of precision. We use the following series expansion:
completed by the asymptotic expansion:
The inverse function is calculated using Newton-Raphson’s method. The result is obtained to at least eight significant digits of precision if . Outside that domain, the result is not meaningful.
The following listing is written in Pascal, and uses only the most elementary constructs and syntax. It should not be difficult to translate into any other language of choice.
Function Erf(x:extended):extended;
var
y,p:extended;
k:integer;
begin
y:=0;
p:=1;
if ((x>-2.7)and(x<2.7)) then
begin
for k:=0 to 30 do
begin
y:=y+p/(2*k+1);
p:=-p*x*x/(k+1);
end;
y:=y*2*x/sqrt(pi);
end else
begin
for k:= 0 to 5 do
begin
y:=y+p;
p:=-p*(2*k+1)/(2*x*x);
end;
y:=y*exp(-x*x)/(x*sqrt(pi));
if x>0 then y:=1-y else y:=-1-y;
end;
Erf:=y;
end;
Function argErf(y:extended):extended;
var
x:extended;
k:integer;
begin
x:=0;
for k:=1 to 30 do
begin
x:=x+exp(x*x)*sqrt(pi)*(y-Erf(x))/2;
end;
argErf:=x;
end;
We can test the computation by comparing against the values in the table below (as well as the negatives of the same values):
x | Erf(x)
0.001 | 0.001128378791
0.1 | 0.112462916 |
1 | 0.8427007929 |
2 | 0.995322265 |
2.699 | 0.9998648953 |
2.701 | 0.9998664351
4 | 0.9999999846 |
5 | 0.9999999999984 |
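If a modern scientific library is available, the listing can also be checked against it. For instance, with SciPy (an illustrative check, not part of the original paper):

import numpy as np
from scipy.special import erf

x = np.array([0.001, 0.1, 1, 2, 2.699, 2.701, 4, 5])
ref = np.array([0.001128378791, 0.112462916, 0.8427007929, 0.995322265,
                0.9998648953, 0.9998664351, 0.9999999846, 0.9999999999984])
# The largest discrepancy should be well below 1e-8.
print(np.max(np.abs(erf(x) - ref)))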
Non-Linear Regression of the Types: Power, Exponential, Logarithmic, Weibull¶
Jean Jacquelin
Abstract¶
The parameters of power, exponential, logarithmic and Weibull functions are optimized by a non-iterative regression method based on an appropriate integral equation.
Non-Linear Regression of the Types: Power, Exponential, Logarithmic, Weibull
Jean Jacquelin
1. Introduction¶
The following two examples of regression will be treated simultaneously:
Indeed, if we take formula (17), with data points , we can pre-calculate
(19)¶
which turns into (18) with the data points .
Various other equations can be transformed into these two formulae:
The equation is identical to (18) with the substitution .
The equation turns into (18) when we interchange the and values. This motivates the regression of logarithmic functions of three parameters.
Todo
Add to scikit
And so on. In particular, the case of the Weibull function of three parameters will be treated in Section 3.
The method to be used was described in Section 2 of the article Regressions and Integral Equations.
2. Regression of Functions of the Form ¶
Integrating the function results in:
(20)¶
Replacing with (18):
(21)¶
From which we have the integral equation to be used:
(22)¶
The numerical approximations of the integral for are calculated with:
(23)¶
When we replace the exact unknowns in (22) with their respective approximations, the equation no longer holds true:
(24)¶
Minimizing the sum of the squared residuals:
(25)¶
with
(26)¶
results in a linear regression with respect to the coefficients and , whose optimal values and can be obtained in the usual manner (with the convention that ):
(27)¶
We then get the optimal values and according to (26):
(28)¶
Only two parameters are resolvable from the form of integral equation that we chose. The third parameter appears in the numerical calculations, but not directly in the equation, so another regression is necessary to obtain it. In fact, considering (18), the result would be further improved by computing the regression on both and :
(29)¶
Setting:
(30)¶
the resulting regression is:
(31)¶
Here is a summary of the computation:
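As an illustration of this summary, the procedure can be transcribed into a short NumPy sketch. Note the assumptions: the equation (18) is read here as y = a + b·exp(c·x), the function and variable names are mine, and the identification of the second regression coefficient with c comes from integrating that form between x_1 and x, consistent with equations (20)-(26):

import numpy as np

def exp_fit(x, y):
    # Non-iterative fit of  y = a + b*exp(c*x), with x in ascending order.
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # S_k: cumulative trapezoidal integral of y, with S_1 = 0  (equation (23)).
    S = np.zeros_like(y)
    S[1:] = np.cumsum(0.5 * (y[1:] + y[:-1]) * np.diff(x))
    # First regression:  y_k - y_1 = A*(x_k - x_1) + B*S_k,  where B = c.
    dx, dy = x - x[0], y - y[0]
    M = np.array([[dx @ dx, dx @ S],
                  [dx @ S,  S @ S]])
    v = np.array([dx @ dy, S @ dy])
    A, B = np.linalg.solve(M, v)
    c = B
    # Second regression (equations (29)-(31)):  y = a + b*exp(c*x)  with c now fixed.
    E = np.exp(c * x)
    a, b = np.linalg.lstsq(np.column_stack([np.ones_like(x), E]), y, rcond=None)[0]
    return a, b, c

Applied to the data of Table 3, the fitted values should come out close to those reported in the table.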
To illustrate this calculation (Fig. 3), numerical data were generated in the following manner: were drawn at random from the domain under consideration. Initially, “exact” values , and defined the function , which we call the “true” form of equation (18), and whose curve is traced by a dotted line in the figure. The exact values of were then assigned perturbations of an amplitude drawn at random from a range between -10% and +10% of , which, after rounding, produced the numerical values of in Table 3, represented by crosses in the figure.
Finally, the result was substituted into equation (18), to obtain the fitted curve, plotted as a solid line.
k | x_k | y_k | S_k |
1 | -0.99 | 0.418 | 0 |
2 | -0.945 | 0.412 | 0.018675 | a_e = 0.3
3 | -0.874 | 0.452 | 0.049347 | b_e = 0.6
4 | -0.859 | 0.48 | 0.056337 | c_e = 1.7
5 | -0.64 | 0.453 | 0.1585 | |
6 | -0.573 | 0.501 | 0.19046 | a_1 = 0.313648
7 | -0.433 | 0.619 | 0.268859 | b_1 = 0.574447
8 | -0.042 | 0.9 | 0.565824 | c_1 = 1.716029
9 | -0.007 | 0.911 | 0.597517 | |
10 | 0.054 | 0.966 | 0.654765 | |
11 | 0.088 | 0.966 | 0.687609 | |
12 | 0.222 | 1.123 | 0.827572 | |
13 | 0.401 | 1.414 | 1.05463 | |
14 | 0.465 | 1.683 | 1.15374 | |
15 | 0.633 | 2.101 | 1.47159 | |
16 | 0.637 | 1.94 | 1.47968 | |
17 | 0.735 | 2.473 | 1.69591 | |
18 | 0.762 | 2.276 | 1.76002 | |
19 | 0.791 | 2.352 | 1.82713 | |
20 | 0.981 | 3.544 | 2.38725 |
To form an objective opinion of the qualities and defects of the method presented here, it would be necessary to perform a systematic study of a very large number of cases and examples. It is certain even now that the deviations caused by errors in numerical integration would be considerably reduced as the number of points increases and their spacing becomes more uniform.
3. Regression of the Three-Parameter Weibull Cumulative Distribution Function¶
The Weibull cumulative distribution function of three parameters (, , and ) is defined by:
(32)¶
For data given as , we seek to optimize , , and in such a way that relationship (32) is approximately and optimally satisfied across the data points.
The inverse function of (32) is:
(33)¶
Setting:
(34)¶
and:
(35)¶
We can see immediately that we have transformed the problem into (18), shown above: .
The algorithm is deduced immediately as we transpose the notation:
Data
Procedure
Rank points in order of ascending
Compute
Compute :
Compute
Obtain
Compute
Compute and :
Result: , and are the approximations of , and
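Transcribed into code, the transposition amounts to a change of variables followed by the exp_fit sketch given in Section 2 above (again an illustration only: the standard three-parameter form F(t) = 1 - exp(-((t - μ)/β)^α) is assumed for equation (32), and the names are mine):

import numpy as np

def weibull_cdf_fit(t, F):
    # Non-iterative fit of the three-parameter Weibull CDF, via the regression of Section 2.
    t = np.asarray(t, dtype=float)
    F = np.asarray(F, dtype=float)
    x = np.log(-np.log(1.0 - F))   # transformed abscissa, cf. equations (33)-(35)
    order = np.argsort(x)          # rank points by ascending transformed abscissa
    x, t = x[order], t[order]
    # t = mu + beta*exp(x/alpha), i.e. the form of Section 2 with a = mu, b = beta, c = 1/alpha.
    a, b, c = exp_fit(x, t)        # exp_fit: the sketch shown in Section 2
    return 1.0 / c, b, a           # (alpha, beta, mu)

With the data of Table 4, the results should be close to the fitted values listed there.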
In graphical representations, it is customary to display as the abscissa and as the ordinate. This is a legacy of the graphical linearization method used for the case where . In order to respect this tradition, we will interchange the axes and display as the abscissa and as the ordinate.
Weibull’s law applies generally to the failure of materials and objects. Since the variable is time, the and are effectively sorted in ascending order.
A sample regression is shown in Fig. 4. To simulate an experiment, numerical data were generated from “exact” values , , , defining the “true” according to (32), whose curve is shown as a dashed line in the figure. A value of calculated from (33) corresponds to each . These values of are subjected to random deviations, simulating experimental uncertainty, which gives us the values of in Table 4. Finally, the result (, , and ) substituted into (32) gives the “fitted” function (whose curve is plotted as a solid line):
(36)¶
k | t_k | F_k | ln(-ln(1 - F_k)) | S_k |
1 | 1.202 | 0.033 | -3.39452 | 0 |
2 | 1.397 | 0.082 | -2.45856 | 1.21627 | α_e = 2.4
3 | 1.537 | 0.131 | -1.96317 | 1.94301 | β_e = 1.6
4 | 1.57 | 0.181 | -1.61108 | 2.48998 | μ_e = 0.8
5 | 1.768 | 0.23 | -1.34184 | 2.93935 | |
6 | 1.856 | 0.279 | -1.11744 | 3.34596 | α_1 = 2.44301
7 | 1.87 | 0.328 | -0.922568 | 3.70901 | β_1 = 1.55262
8 | 1.889 | 0.377 | -0.748219 | 4.0367 | μ_1 = 0.82099
9 | 1.918 | 0.426 | -0.58856 | 4.3406 | |
10 | 2.098 | 0.475 | -0.439502 | 4.63991 | |
11 | 2.212 | 0.524 | -0.297951 | 4.94496 | |
12 | 2.349 | 0.573 | -0.161377 | 5.25641 | |
13 | 2.453 | 0.622 | -0.027514 | 5.57782 | |
14 | 2.557 | 0.671 | 0.105888 | 5.91199 | |
15 | 2.596 | 0.72 | 0.241349 | 6.26101 | |
16 | 2.602 | 0.769 | 0.382086 | 6.62678 | |
17 | 2.678 | 0.818 | 0.532831 | 7.02475 | |
18 | 2.706 | 0.867 | 0.701813 | 7.47965 | |
19 | 3.089 | 0.916 | 0.907023 | 8.07424 | |
20 | 3.441 | 0.965 | 1.20968 | 9.06241 |
The representation in the traditional coordinate system shows that the presence of a non-zero parameter prevents the usual graphical linearization, which is naturally to be expected. Carrying out the regression through an integral equation allows us to linearize and obtain a curve which may be substituted for the traditional method, appreciably improving the fit.
Regression of Sinusoids¶
Jean Jacquelin
1. Introduction¶
Under the seemingly innocuous title “Regression of Sinusoids” ought to appear the subtitle “An Optimization Nightmare”, to add a touch of realism. Indeed, one must have already grappled with the problem to fully understand the relevance of the word. But what is it, really?
As with any number of similar problems, the data consist of experimental points . We seek to adjust the parameters of a function in such a way that its curve passes “as closely as possible” to the data points. In this particular case, we will deal with the following sinusoidal function of the four parameters , , , and :
(37)¶
This function is equivalent to:
(38)¶
The expression “as closely as possible” implies an optimization criterion. Specifically, we consider the sum of the squared residuals:
(39)¶
Since that is the sum that we wish to minimize, we have the generic name “least squares method”.
When is known a-priori, the solution is trivial. Equation (37) effectively becomes linear in the optimization parameters (, and ). This well defined case merits only the briefest mention, which will be made in the next section.
In all other cases, the regression (or optimization) is non-linear, due to the fact that the sum of squares has a non-linear dependency on .
An almost equally favorable situation arises when we have a “good enough” approximation for , suitable to initialize an arbitrary non-linear optimization method. The literature is full of descriptions of such methods, and many are implemented in software. Discussing them further would exceed the scope of this paper.
But the dreaded nightmare is not far. It is crucial that the initial estimate for is “good enough”… And this is where sinusoidal functions differ from more accommodating non-linear functions: the more periods the span, the more randomly they are distributed, or the more noise is present in the values, the more the condition “good enough” must be replaced with “very good”, tending to “with great precision”. In other words, we must know the value of we are looking for in advance!
The original method proposed in Section 3 provides a starting point for addressing this challenge. Admittedly, it would be wrong to pretend that the method is particularly robust: Section 4 will discuss some of its deficiencies. Nevertheless, thanks to this first result, we will see in Section 5 that an original regression method (based on a sawtooth function) allows for a much more accurate approximation of through an improved linearization. Finally, Section 6 presents a summary of performance in the course of systematic experiments. A synopsis of the entire method, which does not involve any iterative calculations, is presented in the Appendices.
Before getting into the heart of the matter, a warning must be given regarding some of the figures presented here (Fig. 5, Fig. 6, Fig. 7, Fig. 11, Fig. 14). They serve merely as illustrations of the procedures being described. To create them, we were obligated to fix on a particular numerical data set, which is not necessarily representative of the multitudes of possible cases. Given these figures alone, it would be absurd to form any opinion of the method in question, favorable or otherwise. This is especially true as the example has been selected with data that exaggerate all the defects that are explained in the text, to allow for easy identification and unambiguous discussion of the important features.
We see in Table 5 that our “experimental” points are sparse, non-uniformly distributed, and widely dispersed. One might suspect that this is not real experimental data, but rather a simulation obtained in the following way: An “exact” function is given, along with the coefficients , , , and , indicated in Table 5. The curve is plotted in Fig. 5 with a dashed line. The were selected at random from the domain, and rounded to the values shown in Table 5. The corresponding exact values are then computed from equation (37). The were then randomly perturbed with a distribution whose root mean square was approximately 10% of the amplitude of the sinusoid. This represents a much wider dispersion compared to what is usually seen in practice.
k | x_k | y_k |
1 | -1.983 | 0.936 |
2 | -1.948 | 0.810 | ω_e = 2
3 | -1.837 | 0.716 | a_e = -0.4
4 | -1.827 | 0.906 | b_e = 1.3
5 | -1.663 | 0.247 | c_e = -0.6
6 | -0.815 | -1.513 | ρ_e = 1.431782
7 | -0.778 | -1.901 | σ_e = 0.1491
8 | -0.754 | -1.565 | |
9 | -0.518 | -1.896 | |
10 | 0.322 | 0.051 | |
11 | 0.418 | 0.021 | |
12 | 0.781 | 1.069 | |
13 | 0.931 | 0.862 | |
14 | 1.510 | 0.183 | |
15 | 1.607 | 0.311 |
The root mean square is defined by:
(40)¶
The values of the function are to be computed using the parameters , , and from the example under consideration.
The values of , indicated in Table 5 and plotted in Fig. 5, are the data for the numerical examples in the following sections. It should be noted that the exact values , , and , shown in Table 5 are not used further on (except in the “trivial” case in Section 2). They should be forgotten for the remainder of the algorithm, which uses only as data. While the exact sinusoid will be shown on the various figures as a reminder, it is not an indication of the fact that the function (37) with the parameters () is ever used directly.
The tables serve another role, which will be appreciated by readers interested in implementing these techniques in a computer program. If so desired, all of the sample calculations shown here can be reproduced exactly with the data and results reported below. This provides a means to validate and correct a regression program implementing the algorithms described here.
2. Case Where is Known A-Priori¶
When the value is fixed, the optimization only needs to be performed for the parameters , and of (37). Taking the partial derivatives of (39) with respect to the optimization parameters results in a system of three equations:
(41)¶
The solution is given in the following system (42), with the convention that :
(42)¶
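For reference, this linear case is a one-line least-squares problem in code. A minimal NumPy sketch (assuming the form y = a + b·sin(ωx) + c·cos(ωx) for equation (37); the names are mine):

import numpy as np

def sin_fit_known_omega(x, y, omega):
    # Ordinary least squares for  y = a + b*sin(omega*x) + c*cos(omega*x)  with omega known.
    x = np.asarray(x, dtype=float)
    M = np.column_stack([np.ones_like(x), np.sin(omega * x), np.cos(omega * x)])
    a, b, c = np.linalg.lstsq(M, np.asarray(y, dtype=float), rcond=None)[0]
    return a, b, c

With the data of Table 5 and ω_e = 2, the result should land close to the values reported in Table 6.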
The result is presented in figure Fig. 6. We note that the root mean square is practically identical to in Table 5. This indicates that the regression did not increase the residuals of the fitted sinusoid relative to the “exact” one. Obviously, this is too good to always be true. So, after this brief review, we must tackle the crux of the problem: computing a sufficiently accurate approximation for , since it will not be known a priori as it was in the preceding example.
ω_e = 2 (fixed)
a = -0.397904
b = 1.283059
c = -0.573569
ρ = 1.405426
σ = 0.147456
3. Linearization Through an Integral Equation¶
The goal being to reduce the problem to a linear regression, it is sometimes tempting to use a linear differential equation whose solution is our function of interest as the intermediary. An example is the following equation, for which the sinusoid (37) is a solution[errata-reei-12]:
(43)¶
Taking the appropriate partial derivatives of (44) leads to a system of linear equations of two unknowns and .
(44)¶
Unfortunately, this is not a viable solution in practice (unless we have at our disposal a very large number of especially well distributed samples). The stumbling block here is the calculation of from the data points : as illustrated in Fig. 7, the fluctuations are generally too unstable for meaningful approximation.
On the other hand, the computation of numerical integrals is clearly less problematic than that of derivatives. It is therefore no surprise that we will seek an integral equation having for its solution our function of interest. For example, in the present case, we have equation (45), for which the sinusoid (37) is a solution:
(45)¶
is a second-degree polynomial whose coefficients depend on , , , and the limits of integration and .
We could, of course, go into a great amount of detail regarding , but that would derail the presentation without much immediate benefit. We can simply use the fact that a complete analysis shows that the limits of integration have no bearing on the regression that follows, at least with regard to the quantity of interest, the optimized value of (noted as ). Therefore, to simplify, we will set , which results in a function of the following form:
(46)¶
(47)¶
The coefficients , , , are unknown. However, they can be approximated through linear regression based on the precomputed values of . To accomplish this, we perform two numeric integrations:
(48)¶
(49)¶
It goes without saying that the points must first be ranked in order of ascending . It follows that the sum of squared residuals to minimize is the following[errata-reei-13]:
(50)¶
There is no need to reiterate the partial derivatives with respect to , , and , used to obtain the optimized , , and , followed by , , and (according to (47)):
(51)¶
(52)¶
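In code, the first estimate of ω reduces to one double numerical integration and one four-parameter linear regression. A sketch (the regressors [double integral, x^2, x, 1] and the identification of the first coefficient with -ω^2 follow from the form (46)-(47); the names are mine):

import numpy as np

def sin_omega_estimate(x, y):
    # First (integral-equation) estimate of omega, per equations (46)-(52).
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    S = np.zeros_like(y)                                        # first integration (48)
    S[1:] = np.cumsum(0.5 * (y[1:] + y[:-1]) * np.diff(x))
    SS = np.zeros_like(y)                                       # second integration (49)
    SS[1:] = np.cumsum(0.5 * (S[1:] + S[:-1]) * np.diff(x))
    M = np.column_stack([SS, x**2, x, np.ones_like(x)])
    A, _, _, _ = np.linalg.lstsq(M, y, rcond=None)[0]
    return np.sqrt(-A)                                          # omega_1

Applied to the data of Table 5 (whose integrals are those of Table 7), this should give ω_1 ≈ 2.325, the first-column value of Table 14.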
The results for our sample dataset are displayed further along in Fig. 14. The numerical values of , , and are shown in Table 14. To compare the graphical and numerical representations with respect to the data points , refer to the curve and table column labeled .
Incidentally, it is interesting to compare the results of numerical integrations (48) and (49) with respect to differentiation (Fig. 7). Observing the significant oscillations of the latter (a sample point is noted on the plot), and how they would be exacerbated by further differentiation (not shown due to the excessive fluctuation amplitude), it becomes patently obvious that a feasible solution must be based on integral equations rather than differential.
k | first integration (48) | second integration (49)
1 | 0 | 0
2 | 0.030555 | 0.000534713 |
3 | 0.115248 | 0.00862678 |
4 | 0.123358 | 0.00981981 |
5 | 0.217904 | 0.0378033 |
6 | -0.31888 | -0.00501053 |
7 | -0.382039 | -0.0179775 |
8 | -0.423631 | -0.0276456 |
9 | -0.832029 | -0.175813 |
10 | -1.60693 | -1.20018 |
11 | -1.60347 | -1.35428 |
12 | -1.40564 | -1.90043 |
13 | -1.26081 | -2.10041 |
14 | -0.958286 | -2.74284 |
15 | -0.934327 | -2.83463 |
While noting that with more numerous and uniformly distributed , coupled with less scattered would result in more acceptable fluctuations in the derivatives, we must also remember that a good solution must apply not only to the trivial cases, but the difficult ones as well.
4. A Brief Analysis of Performance¶
The most important parameter to optimize is . This comes with the understanding that once the optimization has succeeded, we can always fall back to the conventional regression method in Section 2 to obtain , and . We will therefore concentrate on an investigation restricted to results with respect to . The three most influential factors are:
- The number of points, , in each period of the sinusoid.
- The distribution of samples in the domain:
- Either uniform:
- Or random: is drawn at random from the available domain.
- The scatter of the ordinates , characterized by , the ratio of the root mean squared error (40) to the amplitude of the sinusoid.
4.1 Uniform Distribution of Abscissae With Non-Dispersed Ordinates¶
We find that the ratio , which is expected to be equal to 1, is affected by a deviation inversely proportional to (Fig. 8). It is conceivably possible to construct an empirical function to correct the deviation. But such a function would not be very interesting, as the correction would not be satisfactory for the randomly distributed points we will investigate further on. A more general approach, described in Section 5, seems more appropriate.
Points per period | ω_1/ω_e
5 | 1.273
6 | 1.156 |
7 | 1.103 |
8 | 1.073 |
9 | 1.055 |
10 | 1.043 |
11 | 1.034 |
12 | 1.028 |
13 | 1.023 |
14 | 1.02 |
15 | 1.017 |
16 | 1.015 |
17 | 1.013 |
18 | 1.012 |
19 | 1.01 |
20 | 1.009 |
Todo
Place table next to figure if possible: .figure {float: left;}
?
4.2 Random Distribution of Abscissae With Non-Dispersed Ordinates¶
Since the abscissae are randomly distributed, the dispersion in the calculated values of increases as becomes small. For a fixed value of , the cumulative distribution (computed from 10000 simulations) is represented in Fig. 9. It can be seen that the median, which is expected to be equal to 1, deviates even more than under the conditions described in Section 4.1.
Points per period | median of ω_1/ω_e
8 | 1.132
10 | 1.096 |
12 | 1.075 |
15 | 1.052 |
20 | 1.032 |
50 | 1.006 |
Todo
Place table next to figure if possible: .figure {float: left;}
?
4.3 Random Distribution of Abscissae With Dispersed Ordinates¶
As expected, the dispersion of is greater than in the previous case. Fig. 10 shows the case where , compared to as represented in Fig. 9. Nevertheless, the median values are barely affected (compare Table 10 to the previous data in Table 9).
Points per period | median of ω_1/ω_e
8 | 1.142
10 | 1.105 |
12 | 1.081 |
15 | 1.057 |
20 | 1.036 |
50 | 1.007 |
Todo
Place table next to figure if possible: .figure {float: left;}
?
5. Further Optimizations Based on Estimates of and ¶
We now shift our interest to a function , expressed in the form (38). The inverse function is an arcsine, but can also be expressed as an arctangent:
(53)¶
denotes the principal values (between and ) of the multivalued function. The sign of and the integer depend on the half-period of the sinusoid in which the point is found, making their dependence on discontinuous. Additionally, we can show that the sign is + when is even and - when odd.
If we consider by itself, it is a sawtooth function, as shown in Fig. 11 by the dotted curve. The points , with , are represented by crosses. Rewritten in this manner, the problem becomes one of regression of a sawtooth function, which is every bit as “nightmarish” as regression of the sinusoid. Since we have no information other than the points , determining these integers is difficult and mostly empirical, and therefore not guaranteed to always be feasible in the general case. The situation is different in our case because we have already found the orders of magnitude of the parameters: , , and , and consequently and through (38). In the current instance, which will be represented by the subscript 2, we posit that:
(54)¶
Now it is possible to calculate in a number of different ways. One possibility is as follows, where rounds the real argument to the nearest integer:
(55)¶
Another way to write (53), applied to , better demonstrates the nearly linear relationship between and , defined by:
(56)¶
We see in Fig. 11, that the series of points represented by crosses, is transformed into a series of points represented by squares, some of which coincide.
k | principal value (53) | integer (55) | unwrapped value (56)
1 | 1.16394 | -1 | -4.30553
2 | 0.975721 | -1 | -4.11731 |
3 | 0.864493 | -1 | -4.00609 |
4 | 1.11266 | -1 | -4.25425 |
5 | 0.438724 | -1 | -3.58032 |
6 | -0.990033 | -1 | -2.15156 |
7 | -1.5708 | 0 | -1.5708 |
8 | -1.06193 | 0 | -1.06193 |
9 | -1.5708 | 0 | -1.5708 |
10 | 0.288353 | 0 | 0.288353 |
11 | 0.266008 | 0 | 0.266008 |
12 | 1.5708 | 1 | 1.5708 |
13 | 1.04586 | 1 | 2.09573 |
14 | 0.388646 | 1 | 2.75295 |
15 | 0.490008 | 1 | 2.65159 |
It is clear that the points tend towards a straight line, which makes it possible to perform a linear regression to obtain from the slope and from the x-intercept. Once we have computed from (56), the result can be obtained in the usual manner:
(57)¶
Completed by:
(58)¶
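The whole refinement fits in a few lines of NumPy. This is a sketch under my reading of the formulas: the principal value (53) is taken as arcsin((y - a_1)/ρ_1), the integer of (55) as round((ω_1·x + φ_1)/π) with φ_1 = arctan(c_1/b_1), and (56) as Φ = (-1)^q·θ + πq; these choices reproduce the integer and unwrapped-value columns of Table 11. The names are mine.

import numpy as np

def sin_refine_omega(x, y, omega1, a1, b1, c1):
    # Section 5 refinement: sawtooth linearization, then linear regression (57)-(58).
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rho1 = np.hypot(b1, c1)
    phi1 = np.arctan2(c1, b1)
    theta = np.arcsin(np.clip((y - a1) / rho1, -1.0, 1.0))   # principal value (53)
    q = np.rint((omega1 * x + phi1) / np.pi).astype(int)     # half-period index (55)
    Phi = (-1.0) ** q * theta + np.pi * q                    # unwrapped value (56)
    omega2, phi2 = np.polyfit(x, Phi, 1)                     # linear regression (57)
    b2, c2 = rho1 * np.cos(phi2), rho1 * np.sin(phi2)        # completion (58)
    return omega2, a1, b2, c2

Starting from the first-pass parameters of Table 14, this should give ω_2 ≈ 2.021 and the second-column values reported there.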
The numerical results are summarized in Table 14 (in the table column labeled ), and the curve is shown in Fig. 14.
The performance measured in terms of the optimization of is shown in Fig. 12 and Fig. 13. Note the improvements in the optimized median value, , shown in Table 12 and Table 13, which was the objective of this phase of the algorithm.
Todo
Fix the broken figure
8 | 1.173 |
10 | 1.142 |
12 | 1.117 |
15 | 1.070 |
20 | 0.984 |
Todo
Fix the broken table
Todo
Fix the broken figure
8 | 1.160 |
10 | 1.129 |
12 | 1.110 |
15 | 1.066 |
20 | 0.990 |
50 | 0.825 |
Todo
Fix the broken table
6. Final Steps and Results¶
As soon as we have a good approximation () of , the classical method shown in Section 2 can be applied:
(59)¶
This allows us to perform a final optimization of the dimensional parameters of the sinusoid, which was not possible in the previous step because and were fixed. The final result is presented in Fig. 14. The numerical results are summarized in Table 14 (in the table column labeled ). The intermediate results (sinusoids and , along with their respective parameters) are reported in the same figure and table.
Parameter | Section 3 fit | Section 5 fit | Final fit (Section 6)
ω | 2.32536 | 2.02074 | 2.02074
a | -0.345959 | -0.345959 | -0.405617
b | 1.34913 | 1.35253 | 1.2752
c | 0.358335 | -0.345283 | -0.577491
ρ | 1.39591 | 1.39591 | 1.39987
φ | 0.25961 | -0.249948 | -0.425231
This figure might give the impression that the procedure makes the solid curves converge towards the dotted one through successive approximations. However, attempts to use this approach iteratively would have no effect: repeating the calculations would not modify the final result, because it is based on the initial numerical integrations, whose inherent noise will not decrease with further iterations.
Sampling a large number of simulations of different conditions would be required to form an objective opinion of the properties of this method. We can refer to Table 12 and Table 13 for a summary of the results for the optimization of , which is the essential part of the algorithm. Naturally, the final results are unchanged from what is shown in Fig. 12 and Table 13 since .
For each simulated value of , the root mean square is computed from:
(60)¶
The distribution of for each fixed value of is plotted from 10000 simulations, each with a different set of , since the ordinates are perturbed with a characteristic root mean square of about the nominal (defined in Section 1). For the plot in figure sin-11, the dispersed ordinates are generated so that the ratio is always a constant, in this case 0.1. The abscissae, on the other hand, are generated differently:
It might seem surprising that the root mean squared error is smaller than the initial . The explanation is that the “optimized” sinusoid will often pass closer to the data points than the “exact” one, especially with more dispersed points. The differences stem from the optimization of , which can differ significantly from , as we saw in Fig. 12 and Fig. 13. Consequently, for dispersed ordinates, the error does not increase as significantly as we might have feared in moving away from uniformly distributed abscissae.
The biggest difference between the uniform and non-uniform cases is the failure rate of the process. This must be mentioned, since it is the common lot of all methods when the points are irregularly distributed. Inevitably (not frequently, but nevertheless with a non-zero probability), we run into data sets in which all the points are tightly clustered. The sinusoid for such data is not defined, and therefore can be neither characterized nor approximated. This can manifest in different portions of the computation, for example through indeterminacy (division by a number too close to zero), non-invertibility of one of the matrices, the square root of a negative number, etc.
The calculation of in (52) is where the majority of failures occur in this method, fortunately quite rarely. For the hundreds of thousands of simulations performed with uniformly distributed abscissae, no failure was ever encountered. This is not surprising since the points could never be clumped in that situation. The case with randomly selected abscissae, on the other hand, showed a very low failure rate, dependent on both the dispersion and the quantity of data points under consideration, as shown in Fig. 17 (which required nearly five hundred thousand simulations for a coherent view to emerge).
7. Discussion¶
Appendix 1: Summary of Sinusoidal Regression Algorithm¶
Appendix 2: Detailed Procedure for Sinusoidal Regression¶
Application to the Logistic Distribution (Three Parameters)¶
Application to the Logistic Distribution (Four Parameters)¶
Mixed Linear and Sinusoidal Regression¶
Generalized Sinusoidal Regression¶
Damped Sinusoidal Regression¶
Double Exponential Regression & Double Power Regression¶
Multivariate Regression¶
Rather than a single variable, , the objective function depends on multiple variables, .
Review of the Linear Case¶
In the simplest case, the function is linear with respect to the optimization parameters :
The given functions do not themselves depend on the optimization parameters.
Given the data points [errata-reei-17], the fitting parameters can be computed using the least-squares method:
With:
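In practice, this multivariate linear case is ordinary least squares on the matrix of basis-function values. A generic NumPy sketch (the names and the example basis are mine, for illustration only):

import numpy as np

def multilinear_fit(X, z, basis):
    # Least-squares coefficients a_j for  F(x, y, ...) = sum_j a_j * phi_j(x, y, ...).
    # X: (n_points, n_variables) array of abscissae; z: observed values;
    # basis: list of known functions phi_j, each taking the variable columns.
    X = np.asarray(X, dtype=float)
    M = np.column_stack([phi(*X.T) for phi in basis])
    coeffs, *_ = np.linalg.lstsq(M, np.asarray(z, dtype=float), rcond=None)
    return coeffs

# Example basis for F(x, y) = a1 + a2*x + a3*y + a4*x*y:
basis = [lambda x, y: np.ones_like(x),
         lambda x, y: x,
         lambda x, y: y,
         lambda x, y: x * y]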
Non-Linear Case¶
If one or more of the functions depend on the fitting parameter(s), the corresponding will no longer be linear with respect to the optimization parameters:
The functions given as depend on the fitting parameters . The complete set of parameters to optimize is therefore . The least squares method can not be applied directly in this case. A multitude of different methods exist, based generally on forming an initial estimate of the non-linear parameters (), followed by recursive approximations to progressively optimize the initial guess.
The method studied here follows from a very different principle. We perform one or more integrations of , for example , or , or , or other multiple integrals.
We can also introduce new functions , to simplify the integrals: , or , or , or other multiple integrations.
We can see that the possibilities are endless. It is sometimes possible to form linear combinations of such equations that cancel out the non-linear terms depending on the parameters .
For example, consider the following function:
We can see that a linear combination between this equation and the definition of can be used to cancel out the term :
will be approximated numerically from the data.
This relationship is linear in . Least squares regression gives us the desired approximation of . This reduces the equation for to a linear relationship with regards to and .
Nevertheless, the equations above are not quite correct, as they were deliberately left incomplete, so as to simplify the following explanation. The integrals cannot be left in their indefinite form, especially when it comes to numerical integration. It is necessary to define a lower limit of integration. This requirement does not pose any fundamental difficulty for a function of one variable. It does, however, become a sensitive problem for functions of multiple variables. The potential problems with numerical integration must be examined carefully in the presence of multiple variables (), even when the integration is only carried out over one of them.