Testing SKG

skg.tests is the scikit-guess test package. Each module in the main skg package has a corresponding test module here.

Each function is tested for speed and quality using a modified version of scipy.optimize.curve_fit as the gold standard. curve_fit was chosen because it is widely used and easily accessible. If this package benefits users of curve_fit, it will likely improve the experience with other packages, such as lmfit, as well.

Each test computes a quality metric that is stored in a global dictionary keyed by test ID. The dictionary is set up by the tests.conftest.quality_metric fixture. Quality metrics are aggregated and published in an HTML file upon completion.
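The pattern described above can be sketched as follows. This is a minimal illustration, not the actual contents of conftest: the names QUALITY_METRICS, make_recorder and format_report, the recorder-style interface and the HTML layout are all assumptions made for the example.

```python
import html

import pytest

# Global registry mapping test ID -> quality metric (assumed layout).
QUALITY_METRICS = {}


def make_recorder(test_id, registry=QUALITY_METRICS):
    """Return a callable that stores a metric under ``test_id`` (hypothetical helper)."""
    def record(value):
        registry[test_id] = value
    return record


@pytest.fixture
def quality_metric(request):
    """Give each test a recorder keyed by its pytest node ID."""
    return make_recorder(request.node.nodeid)


def format_report(registry=QUALITY_METRICS):
    """Render the collected metrics as a minimal HTML table (assumed format)."""
    rows = "\n".join(
        f"<tr><td>{html.escape(test_id)}</td><td>{value:.3g}</td></tr>"
        for test_id, value in sorted(registry.items())
    )
    return f"<table>\n{rows}\n</table>"
```

In this sketch a test would request the quality_metric fixture and call it with the computed value, and a session-level hook could write the output of format_report to a file once all tests have finished.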

The following tests are run for every fitting function in the scikit:

Speed: A simple benchmark of the fit against the modified curve_fit.
Convergence: Checks that curve_fit converges faster using the fit as an initial guess than it would with default parameters.
Total Speed: Benchmarks the fit + curve_fit with the fit as an initial guess against just curve_fit with the default guess. Passing this test is optional.
RMS: The RMS of the data about the function fitted with the fitting routine must be within an acceptable threshold of the RMS of the data about the result of curve_fit.
Accuracy: The parameters computed by the fitting function must be within acceptable thresholds of the parameters computed by curve_fit.
Pathological Cases: Each function has pathological cases that it cannot handle properly. Tests for these cases should be added on a case-by-case basis to verify the contractual behavior. For example, an exponential fit cannot be made to collinear data unless the data are horizontal. All distributions run into issues when they are no longer over-determined.
CurveFit: An implicit test of curve_fit is done with each dataset, to ensure that the RMS of the data about the fit is lower than the RMS of the data about the model with initial fitting parameters (only for noisy datasets).
Input Parameters: Each input parameter should be tested individually to check for contractual behavior. In particular, Weibull and power fits need to be careful about negative numbers in the input.
Paper: If the source material on which the algorithm is based provides a sample, a “Paper Test” may be added to verify the results against what should be an independent implementation.
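The Convergence, RMS, Accuracy and CurveFit checks listed above can be illustrated with a short sketch. It does not use the skg API or the modified curve_fit: guess is a hypothetical stand-in for a non-iterative fit (a straight-line fit to log(y)), plain scipy.optimize.curve_fit serves as the reference, and the thresholds 1.5 and 0.05 are arbitrary assumptions rather than the values used by the test suite.

```python
import numpy as np
from scipy.optimize import curve_fit


def model(x, a, b):
    """Exponential model a * exp(b * x), chosen only for illustration."""
    return a * np.exp(b * x)


def guess(x, y):
    """Hypothetical non-iterative fit: a straight-line fit to log(y)."""
    b, log_a = np.polyfit(x, np.log(y), 1)
    return np.array([np.exp(log_a), b])


def rms(x, y, params):
    """RMS of the data about the model evaluated with ``params``."""
    return np.sqrt(np.mean((y - model(x, *params)) ** 2))


def counted(func):
    """Wrap ``func`` so the number of calls made by curve_fit can be read back."""
    def wrapper(*args):
        wrapper.calls += 1
        return func(*args)
    wrapper.calls = 0
    return wrapper


# Noisy synthetic data; multiplicative noise keeps y positive for the log.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0, 50)
y = model(x, 2.0, 1.5) * (1.0 + 0.01 * rng.standard_normal(x.size))

p_guess = guess(x, y)
p_default = np.ones(2)  # the value curve_fit uses when p0 is omitted

# Reference fit seeded with the guess, counting model evaluations.
f_seeded = counted(model)
p_ref, _ = curve_fit(f_seeded, x, y, p0=p_guess)

# Same fit from the default starting point, for the convergence comparison.
f_default = counted(model)
curve_fit(f_default, x, y, p0=p_default)

rms_guess, rms_ref = rms(x, y, p_guess), rms(x, y, p_ref)

rms_ok = rms_guess <= 1.5 * rms_ref                    # RMS (assumed threshold)
accuracy_ok = np.allclose(p_guess, p_ref, rtol=0.05)   # Accuracy (assumed threshold)
curve_fit_ok = rms_ref < rms(x, y, p_default)          # CurveFit sanity check
print(f"evaluations: seeded={f_seeded.calls}, default={f_default.calls}")
```

Comparing f_seeded.calls with f_default.calls gives a rough proxy for the Convergence test. A real test would more likely compare iteration counts or wall-clock time reported by the modified curve_fit.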

Todo

Add a partial domain test. E.g., Gaussian without the peak portion, etc. Not always valid, e.g. for exponential, which has no partial domain.

Todo

Add test to show that it works with sorted=True and x-data reversed.

Additional Notes

Test environments can be created under conda with the following commands:

conda create --name skg-testing-py3.6 --no-default-packages python=3.6 nomkl numpy scipy pytest sphinx sphinx_rtd_theme
conda create --name skg-testing-py3.7 --no-default-packages python=3.7 nomkl numpy scipy pytest sphinx sphinx_rtd_theme
...