skg.util

Shared utility functions used by the fitting routines.

Functions

bias_mean(x, y) Computes the bias of a dataset as the mean of the y values.
counts(p, size) Compute the number of elements in each cluster in a dataset of size elements.
ends(p, size) Compute the exclusive end indices of each cluster for a dataset of size elements.
means(x, p) Compute the means of each cluster in x.
medians(x, p) Compute the median of each cluster in x.
preprocess(x[, copy, float, axis]) Ensure that x is a properly formatted numpy array.
preprocess_npair(x, y[, axis, xcopy, ycopy]) Ensure that x and y are floating point arrays of compatible size.
preprocess_pair(x, y[, sorted, xcopy, ycopy]) Ensure that x and y are floating point arrays of the same size, ranked in increasing order by x.
roots(x, y[, sorted, bias, return_indices, …]) Interpolate the roots of a 1-D dataset.
stds(x, p) Compute the standard deviations of each cluster in x.
sums(x, p) Compute the sums of each cluster in x.
vars(x, p) Compute the variances of each cluster in x.

skg.util.bias_mean(x, y)

Computes the bias of a dataset as the mean of the y values.

This function is intended to be passed as the bias argument of roots.

Parameters:
  • x (array-like) – The x-values, passed in to fulfill the correct interface and otherwise ignored.
  • y (array-like) – The y-values of the data, treated as a 1D raveled array.
Returns:

The mean of y.
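Any callable with the same (x, y) signature can serve as a bias for roots. The sketch below reproduces the documented behavior of bias_mean with plain NumPy. It also adds a hypothetical median-based alternative. Both bias_mean_sketch and bias_median are illustrative names and are not part of skg.util.

```python
import numpy as np


def bias_mean_sketch(x, y):
    """Mean of y; x is accepted only to match the bias interface and is ignored."""
    return np.mean(np.ravel(y))


def bias_median(x, y):
    """Hypothetical alternative bias: the median of y, less sensitive to outliers."""
    return np.median(np.ravel(y))


y = [0.0, 1.0, 2.0, 9.0]
print(bias_mean_sketch(None, y))  # 3.0
print(bias_median(None, y))       # 1.5
```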

skg.util.counts(p, size)

Compute the number of elements in each cluster in a dataset of size elements.

Parameters:
  • p (array-like) – The start points of each cluster. Clusters continue until the start of the next cluster.
  • size (int) – The size of the dataset (not of p). Needed because p does not include the end index of the last cluster.
Returns:

c – The number of elements in each cluster. c.size == p.size.

Return type:

numpy.ndarray
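Each cluster runs from its start index to the next start, or to size for the last cluster. The counts are therefore the differences between consecutive boundaries. Here is a minimal NumPy sketch of that relationship. It is not the library's own code, and counts_sketch is a hypothetical name.

```python
import numpy as np


def counts_sketch(p, size):
    """Number of elements in each cluster, given start indices p."""
    p = np.asarray(p)
    # Append the implicit end of the last cluster, then difference neighbors.
    return np.diff(np.append(p, size))


print(counts_sketch([0, 3, 4], 10).tolist())  # [3, 1, 6]
```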

skg.util.ends(p, size)

Compute the exclusive end indices of each cluster for a dataset of size elements.

Parameters:
  • p (array-like) – The start points of each cluster. Clusters continue until the start of the next cluster.
  • size (int) – The size of the dataset (not of p). Needed because p does not include the end index of the last cluster.
Returns:

q – The exclusive indices of the end of each cluster. q.size == p.size.

Return type:

numpy.ndarray
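Each cluster ends where the next one starts, and the last one ends at size. The exclusive end indices are therefore p shifted left by one, with size appended. The sketch below illustrates this, and ends_sketch is a hypothetical name. By the same definitions, subtracting p from the end indices gives the cluster sizes.

```python
import numpy as np


def ends_sketch(p, size):
    """Exclusive end index of each cluster, given start indices p."""
    p = np.asarray(p)
    # Drop the first start and append the end of the dataset.
    return np.append(p[1:], size)


q = ends_sketch([0, 3, 4], 10)
print(q.tolist())                            # [3, 4, 10]
print((q - np.asarray([0, 3, 4])).tolist())  # [3, 1, 6], the cluster sizes
```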

skg.util.means(x, p)

Compute the means of each cluster in x.

Parameters:
  • x (array-like) – The array to compute means over.
  • p (array-like) – The start points of each cluster. Clusters continue until the start of the next cluster. The indices should start with zero and increase monotonically.
Returns:

m – The mean of each cluster identified by p.

Return type:

numpy.ndarray
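Per-cluster means can be computed without a Python loop. Label every element with its cluster number and let numpy.bincount accumulate the weighted sums. This sketch makes two assumptions that the text above does not state. First, x is treated as a flattened 1-D array. Second, p[0] == 0, so the labels cover every element. means_sketch is a hypothetical name.

```python
import numpy as np


def means_sketch(x, p):
    """Mean of each cluster of x whose start indices are p (assumes p[0] == 0)."""
    x = np.ravel(np.asarray(x, dtype=np.float64))
    p = np.asarray(p)
    sizes = np.diff(np.append(p, x.size))          # elements per cluster
    labels = np.repeat(np.arange(p.size), sizes)   # cluster id of each element
    totals = np.bincount(labels, weights=x, minlength=p.size)
    return totals / sizes


x = [1.0, 2.0, 3.0, 10.0, 4.0, 6.0]
print(means_sketch(x, [0, 3, 4]).tolist())  # [2.0, 10.0, 5.0]
```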

skg.util.medians(x, p)

Compute the median of each cluster in x.

Parameters:
  • x (array-like) – The array to compute medians over.
  • p (array-like) – The start points of each cluster. Clusters continue until the start of the next cluster. The indices should start with zero and increase monotonically.
Returns:

m – The median of each cluster identified by p.

Return type:

numpy.ndarray
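Medians have no cumulative formulation. A simple approach splits x at the cluster boundaries and takes the median of each piece. The sketch assumes x is flattened to 1-D and that p[0] == 0. That first index is dropped before splitting so that numpy.split does not produce an empty leading piece. medians_sketch is a hypothetical name.

```python
import numpy as np


def medians_sketch(x, p):
    """Median of each cluster of x whose start indices are p (assumes p[0] == 0)."""
    x = np.ravel(np.asarray(x, dtype=np.float64))
    # np.split cuts before each given index; p[1:] are the interior boundaries.
    pieces = np.split(x, np.asarray(p)[1:])
    return np.array([np.median(piece) for piece in pieces])


x = [5.0, 1.0, 3.0, 10.0, 4.0, 6.0]
print(medians_sketch(x, [0, 3, 4]).tolist())  # [3.0, 10.0, 5.0]
```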

skg.util.preprocess(x, copy=False, float=False, axis=None)

Ensure that x is a properly formatted numpy array.

Proper formatting means that the array has at least one dimension. Optionally, it may also be copied, reshaped, or coerced to a floating point type.

Parameters:
  • x (array-like) – The array to process. If not already a numpy array, it will be converted to one.
  • copy (bool, optional) – If True, a copy is made regardless of whether x is already a numpy array or not. The default is False.
  • float (bool, optional) – If True and x is not already an inexact array (numpy.float16, numpy.float32, numpy.float64, numpy.float96, numpy.float128, etc.), coerce it to numpy.float_. Defaults to False.
  • axis (int, optional) – If specified, this axis is moved to the end of the shape. By default, x is returned with its original dimensions.
Returns:

x – Processed version of the input.

Return type:

numpy.ndarray
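The documented options map closely onto standard NumPy calls. The following sketch shows the described behavior and is not the library source. It coerces to numpy.float64, because numpy.float_ (an alias for it) was removed in NumPy 2.0. preprocess_sketch is a hypothetical name. Its float parameter mirrors the documented signature, even though that name shadows the builtin.

```python
import numpy as np


def preprocess_sketch(x, copy=False, float=False, axis=None):
    """Return x as an ndarray with at least one dimension."""
    x = np.asarray(x)  # no copy if x is already an ndarray
    if float and not np.issubdtype(x.dtype, np.inexact):
        x = x.astype(np.float64)  # astype returns a new array by default
    elif copy:
        x = x.copy()
    x = np.atleast_1d(x)  # promote scalars to shape (1,)
    if axis is not None:
        x = np.moveaxis(x, axis, -1)  # move the requested axis to the end
    return x


a = np.arange(6).reshape(2, 3)
b = preprocess_sketch(a, float=True, axis=0)
print(b.shape, b.dtype)            # (3, 2) float64
print(preprocess_sketch(5).shape)  # (1,)
```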

skg.util.preprocess_npair(x, y, axis=-1, xcopy=False, ycopy=False)

Ensure that x and y are floating point arrays of compatible size.

x is an array containing vectors along dimension axis. y contains scalar elements. The shape of y must match that of x exactly except for axis.

Parameters:
  • x (array-like) – The vector x-values of the data points. The array will be converted to floating point, and raveled along all dimensions but axis, which will be the last dimension.
  • y (array-like) – The y-values of the data points corresponding to x. Must have one fewer dimension than x, and its shape must match all elements of x’s shape except axis. Will be converted to floating point and raveled.
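  • axis (int, optional) – The dimension of x along which the vectors lie. It will be moved to the end of the shape. Defaults to -1, the last dimension.
  • xcopy (bool, optional) – Ensure that x gets copied even if it is already an array. The default is to leave arrays untouched as much as possible.
  • ycopy (bool, optional) – Ensure that y gets copied even if it is already an array. The default is to leave arrays untouched as much as possible.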
Returns:

x, y – Processed versions of the inputs.

Return type:

numpy.ndarray

See also

preprocess_pair
For cases when x and y both contain scalars, and are the exact same size.
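The shape bookkeeping can be sketched with numpy.moveaxis and reshape. This illustrates the contract described above and is not the library code. preprocess_npair_sketch is a hypothetical name. For simplicity, it always coerces to numpy.float64. It raises ValueError on a shape mismatch, which is an assumption about the error behavior.

```python
import numpy as np


def preprocess_npair_sketch(x, y, axis=-1, xcopy=False, ycopy=False):
    """Return x as a 2-D (points, vector) array and y as a matching 1-D array."""
    x = np.moveaxis(np.asarray(x, dtype=np.float64), axis, -1)
    y = np.asarray(y, dtype=np.float64)
    if y.shape != x.shape[:-1]:
        raise ValueError(f"y has shape {y.shape}, expected {x.shape[:-1]}")
    # Ravel every dimension except the vector dimension; the C-order point
    # ordering matches the C-order ravel of y.
    x = x.reshape(-1, x.shape[-1])
    y = y.ravel()
    if xcopy:
        x = x.copy()
    if ycopy:
        y = y.copy()
    return x, y


x = np.zeros((4, 3, 5))  # vectors of length 3 along axis 1
y = np.zeros((4, 5))
xs, ys = preprocess_npair_sketch(x, y, axis=1)
print(xs.shape, ys.shape)  # (20, 3) (20,)
```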

skg.util.preprocess_pair(x, y, sorted=True, xcopy=False, ycopy=False)

Ensure that x and y are floating point arrays of the same size, ranked in increasing order by x.

Parameters:
  • x (array-like) – The x-values of the data points. The array will be converted to floating point, raveled and sorted, only as necessary.
  • y (array-like) – The y-values of the data points corresponding to x. Must be the same size as x. Will be converted to floating point and raveled only as necessary. Will be sorted if x gets sorted.
  • sorted (bool, optional) – Set to True if x is already monotonically increasing or decreasing. If False, x will be sorted into increasing order, and y will be sorted along with it. Defaults to True.
  • xcopy (bool, optional) – Ensure that x gets copied even if it is already an array. The default is to leave arrays untouched as much as possible.
  • ycopy (bool, optional) – Ensure that y gets copied even if it is already an array. The default is to leave arrays untouched as much as possible.
Returns:

x, y – Processed versions of the inputs.

Return type:

numpy.ndarray

See also

preprocess_npair
Similar function but for x containing vectors and y scalars.
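When sorted is False, both arrays must be reordered by the same permutation. Below is a sketch of that step using numpy.argsort. It assumes the inputs are coerced to numpy.float64 and that a size mismatch raises ValueError. preprocess_pair_sketch is a hypothetical name. A stable sort keeps points with equal x in their original relative order.

```python
import numpy as np


def preprocess_pair_sketch(x, y, sorted=True, xcopy=False, ycopy=False):
    """Return raveled float arrays x and y, ordered by x if sorted is False."""
    x = np.ravel(np.asarray(x, dtype=np.float64))
    y = np.ravel(np.asarray(y, dtype=np.float64))
    if x.size != y.size:
        raise ValueError(f"x has {x.size} elements but y has {y.size}")
    if not sorted:
        order = np.argsort(x, kind="stable")
        # Fancy indexing returns new arrays, so no extra copy is needed here.
        return x[order], y[order]
    if xcopy:
        x = x.copy()
    if ycopy:
        y = y.copy()
    return x, y


xs, ys = preprocess_pair_sketch([3, 1, 2], [30, 10, 20], sorted=False)
print(xs.tolist(), ys.tolist())  # [1.0, 2.0, 3.0] [10.0, 20.0, 30.0]
```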

skg.util.roots(x, y, sorted=True, bias=<function bias_mean>, return_indices=False, return_bias=False)

Interpolate the roots of a 1-D dataset.

Roots are found by linear interpolation between adjacent data points, relative to an arbitrary bias that may be computed from the data.

Parameters:
  • x (array-like) – The x-values of the data. Must be monotonically increasing or decreasing. Treated as a 1D raveled array.
  • y (array-like) – The y-values of the data. Treated as a 1D raveled array. Must be the same length as x.
  • sorted (bool, optional) – Set to True if x is already monotonically increasing or decreasing. If False, x will be sorted into increasing order, and y will be sorted along with it. Defaults to True.
  • bias (scalar or array-like or callable, optional) – Either a fixed y-value that the data is offset by, or a callable that generates the value from x and y. The roots are computed for the y-values with the bias subtracted off. Defaults to the mean of the data.
  • return_indices (bool, optional) – If True, return the indices of the interpolation points.
  • return_bias (bool, optional) – If True, also return the bias that was subtracted. This is especially useful if bias is a callable.
Returns:

  • roots (numpy.ndarray) – The x-values of the interpolated intersections between the data and the bias.
  • indices (numpy.ndarray) – If return_indices is set, this will be the indices of the interpolation points in x, as if returned by numpy.searchsorted. This will be the index of the point on the right of the interval containing the root.
  • bias (numpy.ndarray or scalar) – If return_bias is set, this will be the actual scalar or array that is subtracted from the y-values to compute the roots.
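A short NumPy sketch can illustrate the interpolation step. It makes three simplifying assumptions. First, x is already monotonic, which is the sorted=True case. Second, it always returns the roots together with the right-hand indices, rather than making the indices optional as return_indices does. Third, points exactly equal to the bias count as lying on the non-negative side. roots_sketch is a hypothetical name and not the library implementation.

```python
import numpy as np


def roots_sketch(x, y, bias=None):
    """Linearly interpolated x-values where y crosses bias; x must be monotonic."""
    x = np.ravel(np.asarray(x, dtype=np.float64))
    y = np.ravel(np.asarray(y, dtype=np.float64))
    if bias is None:
        bias = np.mean(y)        # same default behavior as bias_mean
    elif callable(bias):
        bias = bias(x, y)        # callable biases receive (x, y)
    b = np.asarray(bias, dtype=np.float64)
    d = y - (b.ravel() if b.ndim else b)
    # A root lies in [i - 1, i] wherever d changes sign; zero counts as non-negative.
    neg = d < 0
    idx = np.flatnonzero(neg[1:] != neg[:-1]) + 1
    x0, x1 = x[idx - 1], x[idx]
    d0, d1 = d[idx - 1], d[idx]
    # One of d0, d1 is negative and the other is not, so d1 - d0 is never zero.
    r = x0 - d0 * (x1 - x0) / (d1 - d0)
    return r, idx


r, idx = roots_sketch([0, 1, 2, 3, 4], [0, 2, 4, 2, 0], bias=1.0)
print(r.tolist(), idx.tolist())  # [0.5, 3.5] [1, 4]
```

The interpolation formula does not depend on the direction of x, so decreasing x works without sorting.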

skg.util.stds(x, p)

Compute the standard deviations of each cluster in x.

Parameters:
  • x (array-like) – The array to compute standard deviations over.
  • p (array-like) – The start points of each cluster. Clusters continue until the start of the next cluster. The indices should start with zero and increase monotonically.
Returns:

s – The standard deviation of each cluster identified by p.

Return type:

numpy.ndarray
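The documentation does not state which degrees-of-freedom correction is applied. The straightforward per-cluster sketch below therefore exposes ddof as an explicit, assumed parameter. It defaults to 0, the population standard deviation, as in numpy.std. stds_sketch is a hypothetical name. With ddof=1, single-element clusters produce nan.

```python
import numpy as np


def stds_sketch(x, p, ddof=0):
    """Standard deviation of each cluster of x whose start indices are p."""
    x = np.ravel(np.asarray(x, dtype=np.float64))
    # Split at the interior boundaries p[1:] (assumes p[0] == 0).
    pieces = np.split(x, np.asarray(p)[1:])
    return np.array([np.std(piece, ddof=ddof) for piece in pieces])


x = [1.0, 2.0, 3.0, 10.0, 4.0, 6.0]
print(stds_sketch(x, [0, 3, 4]))  # approximately [0.8165, 0.0, 1.0]
```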

skg.util.sums(x, p)

Compute the sums of each cluster in x.

Parameters:
  • x (array-like) – The array to compute sums over.
  • p (array-like) – The start points of each cluster. Clusters continue until the start of the next cluster. The indices should start with zero and increase monotonically.
Returns:

s – The sum of each cluster identified by p.

Return type:

numpy.ndarray
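Sums over contiguous segments are exactly what numpy.add.reduceat computes, so a sketch needs only a single call. It assumes x is flattened to 1-D, and sums_sketch is a hypothetical name. When p[i] >= p[i+1], reduceat returns x[p[i]] instead of an empty sum. For this approach, that makes strictly increasing start indices important.

```python
import numpy as np


def sums_sketch(x, p):
    """Sum of each cluster of x whose start indices are p."""
    # reduceat sums x[p[i]:p[i + 1]], and x[p[-1]:] for the last cluster.
    return np.add.reduceat(np.ravel(x), np.asarray(p))


print(sums_sketch([1, 2, 3, 10, 4, 6], [0, 3, 4]).tolist())  # [6, 10, 10]
```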

skg.util.vars(x, p)

Compute the variances of each cluster in x.

Parameters:
  • x (array-like) – The array to compute variances over.
  • p (array-like) – The start points of each cluster. Clusters continue until the start of the next cluster. The indices should start with zero and increase monotonically.
Returns:

v – The variance of each cluster identified by p.

Return type:

numpy.ndarray
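This is a vectorized two-pass sketch. First, compute each cluster mean with numpy.add.reduceat. Next, broadcast each mean back to its elements with numpy.repeat. Finally, sum the squared deviations per cluster. Subtracting the mean first avoids the cancellation error of the one-pass E[x^2] - E[x]^2 formula. The ddof parameter is an assumption, defaulting to 0 as numpy.var does, and vars_sketch is a hypothetical name.

```python
import numpy as np


def vars_sketch(x, p, ddof=0):
    """Variance of each cluster of x whose start indices are p (assumes p[0] == 0)."""
    x = np.ravel(np.asarray(x, dtype=np.float64))
    p = np.asarray(p)
    sizes = np.diff(np.append(p, x.size))
    means = np.add.reduceat(x, p) / sizes
    # Deviation of every element from the mean of its own cluster.
    dev = x - np.repeat(means, sizes)
    return np.add.reduceat(dev * dev, p) / (sizes - ddof)


x = [1.0, 2.0, 3.0, 10.0, 4.0, 6.0]
print(vars_sketch(x, [0, 3, 4]).tolist())  # approximately [0.667, 0.0, 1.0]
```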