Testing SKG
skg.tests is the scikit-guess test package. Each module in the main skg package has a corresponding test module here.
Each function is tested for speed and quality, using a modified version of scipy.optimize.curve_fit as the gold standard. curve_fit was chosen because it is a very common and accessible function. If this package benefits users of curve_fit, it will likely improve the experience for users of other packages, such as lmfit, as well.
Each test computes a quality metric that is stored in a global dictionary keyed by test ID. The dictionary is set up by the tests.conftest.quality_metric fixture. Quality metrics are aggregated and published in an HTML file upon completion.
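The bookkeeping described above can be sketched as follows, assuming a plain module-level dictionary. The names QUALITY_METRICS, record_metric and render_html are hypothetical and only illustrate the idea; the real tests.conftest.quality_metric fixture may be structured differently. In a pytest conftest.py, the test ID would typically come from request.node.nodeid, and the HTML could be written from a pytest_sessionfinish hook.

```python
import html

# Hypothetical global registry: test ID -> {metric name: value}.
QUALITY_METRICS: dict = {}


def record_metric(test_id: str, name: str, value: float) -> None:
    """Store one quality metric under the given test ID."""
    QUALITY_METRICS.setdefault(test_id, {})[name] = value


def render_html(metrics: dict) -> str:
    """Aggregate all recorded metrics into one HTML table (one row per test)."""
    # Union of all metric names, used as table columns.
    names = sorted({name for per_test in metrics.values() for name in per_test})
    header = "".join(f"<th>{html.escape(n)}</th>" for n in names)
    rows = []
    for test_id in sorted(metrics):
        # Empty cell when a test did not record a given metric.
        cells = "".join(
            f"<td>{metrics[test_id][n]:.3g}</td>" if n in metrics[test_id] else "<td></td>"
            for n in names
        )
        rows.append(f"<tr><td>{html.escape(test_id)}</td>{cells}</tr>")
    return "<table>\n" f"<tr><th>Test ID</th>{header}</tr>\n" + "\n".join(rows) + "\n</table>"


# Hypothetical conftest.py wiring (sketch only):
#
#     @pytest.fixture
#     def quality_metric(request):
#         return lambda name, value: record_metric(request.node.nodeid, name, value)
#
#     def pytest_sessionfinish(session, exitstatus):
#         with open("quality.html", "w") as f:
#             f.write(render_html(QUALITY_METRICS))
```

Keying by the pytest node ID keeps metrics from parametrized tests separate, since each parameter set gets its own ID.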
The following tests are run for every fitting function in the scikit:
- Speed
    - A simple benchmark of the fit against the modified curve_fit.
- Convergence
    - Checks that curve_fit converges faster using the fit as an initial guess than it would with default parameters.
- Total Speed
    - Benchmarks the fit plus curve_fit (with the fit as an initial guess) against curve_fit alone with the default guess. Passing this test is optional.
- RMS
    - The RMS of the data about the function fitted with the fitting routine must be within an acceptable threshold of the RMS of the data about the result of curve_fit.
- Accuracy
    - The parameters computed by the fitting function must be within acceptable thresholds of the parameters computed by curve_fit.
- Pathological Cases
    - Each function has pathological cases that it cannot handle properly. Tests for these cases should be added on a case-by-case basis to ensure contractual behavior. For example, an exponential fit cannot be made to collinear data unless the data is horizontal. All distributions run into issues once the problem is no longer over-determined.
- CurveFit
    - An implicit test of curve_fit is done with each dataset to ensure that the RMS of the data about the fit is lower than the RMS of the data about the model evaluated at the initial fitting parameters (noisy datasets only).
- Input Parameters
    - Each input parameter should be tested individually to check for contractual behavior. In particular, Weibull and power fits must handle negative numbers in the input carefully.
- Paper
    - If the source material on which the algorithm is based provides a sample, a "Paper Test" may be added to verify the results against what should be an independent implementation.
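The Speed, Total Speed and Convergence checks above can be sketched with the standard library alone. This is a minimal sketch, not the package's actual harness: best_time, speedup and CountingModel are hypothetical helpers. Convergence is measured here by counting model evaluations, which is one reasonable proxy for "converges faster".

```python
import timeit
from typing import Callable


class CountingModel:
    """Wrap a model function and count how many times it is evaluated."""

    def __init__(self, func: Callable[..., object]):
        self.func = func
        self.calls = 0

    def __call__(self, *args, **kwargs):
        # Every evaluation (including finite-difference Jacobian calls
        # made by an optimizer) increments the counter.
        self.calls += 1
        return self.func(*args, **kwargs)


def best_time(func: Callable[[], object], number: int = 100, repeat: int = 5) -> float:
    """Best average time per call, in seconds, over several repeats."""
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number


def speedup(candidate: Callable[[], object], reference: Callable[[], object], **kwargs) -> float:
    """How many times faster candidate is than reference (> 1 means faster)."""
    return best_time(reference, **kwargs) / best_time(candidate, **kwargs)
```

For the Convergence test, one could wrap the model in two separate CountingModel instances, pass one to scipy.optimize.curve_fit with p0 set to the guessed parameters and the other with the default p0, and require fewer calls in the first case. For Total Speed, the candidate callable would run the guess followed by curve_fit, and the reference would run curve_fit alone.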
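The RMS, Accuracy and CurveFit checks reduce to comparing residual RMS values and parameter vectors. The sketch below assumes scalar model functions of the form model(x, *params) and uses pure Python. The helper names and the default tolerances (rel_threshold, rel_tol, abs_tol) are hypothetical placeholders, not the thresholds the test suite actually uses.

```python
import math
from typing import Callable, Sequence


def rms(model: Callable[..., float], x: Sequence[float], y: Sequence[float],
        params: Sequence[float]) -> float:
    """Root-mean-square of the residuals y - model(x, *params)."""
    residuals = [yi - model(xi, *params) for xi, yi in zip(x, y)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


def rms_ok(model, x, y, guess_params, reference_params,
           rel_threshold: float = 0.05, abs_threshold: float = 0.0) -> bool:
    """RMS test: the guess may be at most rel_threshold worse than the reference fit."""
    reference = rms(model, x, y, reference_params)
    return rms(model, x, y, guess_params) <= reference * (1.0 + rel_threshold) + abs_threshold


def params_ok(guess_params, reference_params,
              rel_tol: float = 0.05, abs_tol: float = 1e-8) -> bool:
    """Accuracy test: each parameter must be close to its reference value."""
    return len(guess_params) == len(reference_params) and all(
        math.isclose(g, r, rel_tol=rel_tol, abs_tol=abs_tol)
        for g, r in zip(guess_params, reference_params)
    )


def curve_fit_improves(model, x, y, initial_params, fitted_params) -> bool:
    """Implicit CurveFit test: the fit must beat the initial parameters."""
    return rms(model, x, y, fitted_params) < rms(model, x, y, initial_params)
```

In a real test, reference_params would come from curve_fit and guess_params from the skg fitting function under test. The abs_threshold term keeps the RMS check usable on noise-free data, where the reference RMS can be essentially zero.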
Todo
Add a partial-domain test, e.g., a Gaussian without the peak portion. This is not always meaningful: an exponential, for example, has no distinct partial domain.
Todo
Add a test to show that fitting works with sorted=True and reversed x-data.
Additional Notes
Test environments can be created under conda with the following commands:
conda create --name skg-testing-py3.6 --no-default-packages python=3.6 nomkl numpy scipy pytest sphinx sphinx_rtd_theme
conda create --name skg-testing-py3.7 --no-default-packages python=3.7 nomkl numpy scipy pytest sphinx sphinx_rtd_theme
...