# Documentation for bootstrap-stat¶

## Literary References¶

[ET93]: Bradley Efron and Robert J. Tibshirani, “An Introduction to the

Bootstrap”. Chapman & Hall, 1993.

[Koh95]: Ron Kohavi, “A Study of Cross-Validation and Bootstrap for

Accuracy Estimation and Model Selection”. International Joint Conference on Artificial Intelligence, 1995.

[ET97]: Bradley Efron and Robert Tibshirani. “Improvements on

Cross-Validation: The .632+ Bootstrap Method”. Journal of the American Statistical Association, Vol. 92, No. 438. June 1997, pp. 548–560.

[Arl10]: Sylvain Arlot, “A Survey of Cross-Validation Procedures for

Model Selection”. Statistics Surveys, Vol. 4, 2010.

## Module Functions¶

class bootstrap_stat.EmpiricalDistribution(data)

Empirical Distribution

The Empirical Distribution puts probability 1/n on each of n observations.

Parameters

data (array_like or pandas DataFrame) – The data.

sample(size=None, return_indices=False, reset_index=True)

Sample from the empirical distribution

Parameters
• size (int or tuple of ints, optional) – Output shape. If None (default), samples the same number of points as the original dataset.

• return_indices (boolean, optional) – If True, return the indices of the data points sampled. Defaults to False.

• reset_index (boolean, optional) – If True (default), reset the index. Applies only to data frames. This is usually what we would want to do, except for debugging perhaps.

Returns

• samples (ndarray or pandas DataFrame) – IID samples from the empirical distribution.

• ind (ndarray) – Indices of samples chosen. Only returned if return_indices is True.

calculate_parameter(t)

Calculate a parameter of the distribution.

Parameters

t (function) – Function to be applied to dataset. If using an n-Sample Distribution, t should take as input a tuple of data sets of the appropriate size.

Returns

tF – Parameter of distribution.

Return type

float

class bootstrap_stat.MultiSampleEmpiricalDistribution(datasets)

Multi-Sample Empirical Distribution

Parameters

datasets (tuple of arrays or pandas DataFrames.) – Observed data sets.

Notes

Suppose we observe

\begin{align}\begin{aligned}x_i \sim F, i=1,...,m\\y_j \sim G, j=1,...,n\end{aligned}\end{align}

Then $$P = (\hat{F}, \hat{G})$$ is the probabilistic mechanism consisting of the two empirical distributions $$F$$ and $$G$$. Sampling from $$P$$ amounts to sampling $$m$$ points IID from $$\hat{F}$$, and $$n$$ points IID from $$\hat{G}$$. This is called the Two Sample Empirical Distribution, and comes up a lot in A/B testing. The generalization to more than two samples is obvious.

Because of some of the implementation details of the bootstrap methods, we use tuples for everything. So when initializing, be sure to wrap the different datasets in a tuple. Samples will themselves be tuples, etc. See the function documentation below for details and examples.

This is a relatively thin wrapper around the regular EmpiricalDistribution: we just create a distinct EmpiricalDistribution for each dataset, and use that for sampling.

Examples

>>> data = [1, 2, 3]
>>> dist = EmpiricalDistribution(data)
>>> dist.sample()
[1, 2, 1]
>>> data_a = [1, 2, 3]
>>> data_b = [4, 5, 6]
>>> data = (data_a, data_b)  # Note tuple
>>> dist = MultiSampleEmpiricalDistribution(data)
>>> a, b = dist.sample()  # Can de-tuple directly
>>> a
[2, 2, 3]
>>> b
[4, 6, 4]
>>> ab = dist.sample()  # Or indirectly, which is often more useful
>>> ab
[array([2, 2, 3]), array([4, 6, 4])]

sample(size=None)

Sample from the empirical distribution

Parameters

size (tuple of ints, optional) – Number of samples to be drawn from each EmpiricalDistribution. If None (default), samples the same numbers of points as the original datasets.

Returns

samples – IID samples from the empirical distributions.

Return type

tuple of ndarray or pandas DataFrame

calculate_parameter(t)

Calculate a parameter of the distribution.

Parameters

t (function) – Function to be applied to dataset. Should take as input a tuple of data sets of the appropriate size.

Returns

tF – Parameter of distribution.

Return type

float

Examples

Suppose we are in the Two-Sample case and have two empirical distributions, $$\hat{F}$$ and $$\hat{G}$$, and we want to calculate the difference in means of these distributions. We might do something like:

>>> data_a = [1, 2, 3]
>>> data_b = [4, 5, 6]
>>> data = (data_a, data_b)  # Note tuple
>>> dist = MultiSampleEmpiricalDistribution(data)
>>> def parameter(ab):
...     a, b = ab  # Note de-tupling
...     return np.mean(b) - np.mean(a)
>>> dist.calculate_parameter(parameter)
3.0

bootstrap_stat.jackknife_standard_error(x, stat, return_samples=False, jv=None, num_threads=1)

Jackknife estimate of standard error.

Parameters
• x (array_like or pandas DataFrame or tuple) – The data. If tuple. interpretation is as a MultiSample distribution, like in A/B testing.

• stat (function) – The statistic.

• return_samples (boolean, optional) – If True, return the jackknife values. Defaults to False.

• jv (array_like, optional) – Jackknife values. Can be passed if they have already been calculated, which will speed this up considerably.

• num_threads (int, optional) – Number of threads to use for multicore processing. Defaults to 1, meaning all calculations will be done in a single thread. Set to -1 to use all available cores.

Returns

• se (float) – The standard error.

• jv (ndarray) – Jackknife values. Only returned if return_samples is True.

Notes

The jackknife estimate of standard error is only applicable when stat is a plug-in statistic, that is, having the form $$t(\hat{F})$$, where $$\hat{F}$$ is the empirical distribution. Moreover, it is only applicable when t is a smooth function. Notable exceptions include the median. The jackknife cannot be used to estimate the standard error of non-smooth estimators. See [EF93, S10.6]

bootstrap_stat.standard_error(dist, stat, robustness=None, B=200, size=None, jackknife_after_bootstrap=False, return_samples=False, theta_star=None, num_threads=1)

Standard error

Parameters
• dist (EmpiricalDistribution) – The empirical distribution.

• stat (function) – The statistic for which we wish to calculate the standard error.

• robustness (float or None, optional) – Controls whether to use a robust estimate of standard error. If specified, should be a float in (0.5, 1.0), with lower values corresponding to greater bias but increased robustness. If None (default), uses the non-robust estimate of standard error.

• B (int, optional) – Number of bootstrap samples. Defaults to 200.

• size (int or tuple of ints, optional) – Size to pass for generating samples from the distribution. Defaults to None, indicating the samples will be the same size as the original dataset.

• jackknife_after_bootstrap (boolean, optional) – If True, will estimate the variability of our estimate of standard error. See [ET93, S19.4] for details.

• return_samples (boolean, optional) – If True, return the bootstrapped statistic values. Defaults to False.

• theta_star (array_like, optional) – Bootstrapped statistic values. Can be passed if they have already been calculated, which will speed this up considerably.

• num_threads (int, optional) – Number of threads to use for multicore processing. Defaults to 1, meaning all calculations will be done in a single thread. Set to -1 to use all available cores.

Returns

• se (float) – The standard error.

• se_jack (float) – Jackknife-after-bootstrap estimate of the standard error of se (i.e. an estimate of the variability of se itself). Only returned if jackknife_after_bootstrap is True.

• theta_star (ndarray) – Array of bootstrapped statistic values. Only returned if return_samples is True.

bootstrap_stat.infinitesimal_jackknife(x, stat, eps=0.001, influence_components=None, return_influence_components=False, num_threads=1)

Infinitesimal Jackknife

Parameters
• x (array_like or pandas DataFrame or tuple) – The data. If tuple. interpretation is as a MultiSample distribution, like in A/B testing.

• stat (function) – The statistic. Should take x and a resampling vector as input. See Notes.

• eps (float, optional) – Epsilon for limit calculation. Defaults to 1e-3.

• influence_components (array_like, optional) – Influence components. See Notes.

• return_influence_components (boolean, optional) – Specifies whether to return the influence components. Defaults to False.

• num_threads (int, optional) – Number of threads to use for multicore processing. Defaults to 1, meaning all calculations will be done in a single thread. Set to -1 to use all available cores.

Returns

• se (float) – The standard error.

• influence_components (array_like) – The influence components. Only returned if return_influence_components is True.

Notes

The infinitesimal jackknife requires the statistic to be expressed in “resampling form”. See [ET93, S21] for details. The ith influence component is a type of derivative of the statistic with respect to the ith observation. This is computed by a finite difference method: we simply evaluate the statistic putting a little extra weight (eps) on the ith observation, minus the statistic evaluated on the original dataset, divided by eps.

In some cases, there is an analytical formula for the influence components. In these cases, it would be better for the caller to compute the influence components elsewhere and simply pass them to this function.

bootstrap_stat.t_interval(dist, stat, theta_hat, stabilize_variance=False, se_hat=None, fast_std_err=None, alpha=0.05, Binner=25, Bouter=1000, Bvar=100, size=None, empirical_distribution=<class 'bootstrap_stat.EmpiricalDistribution'>, return_samples=False, theta_star=None, se_star=None, z_star=None, num_threads=1)

Bootstrap-t Intervals

Parameters
• dist (EmpiricalDistribution) – The empirical distribution.

• stat (function) – The statistic for which we wish to calculate a confidence interval.

• theta_hat (float) – The observed statistic.

• stabilize_variance (boolean, optional) – If True, use the variance stabilization technique. Defaults to False.

• se_hat (float or None, optional) – The standard error of the observed data. If None (default), will be calculated using the nonrobust bootstrap estimate of standard error, using the default number of iterations. The user may wish to use a nondefault number of bootstrap iterations to calculate this, or a robust variant. If so, the user should calculate this externally and pass it to this function.

• fast_std_err (function or None, optional) – To speed this up, the user may specify a fast function for computing the standard error of a bootstrap sample. If not specified, we will use the nonrobust bootstrap estimate of standard error, using the default number of iterations. This can also be used to specify a nondefault bootstrap methodology, such as a robust version. See Examples for some examples.

• alpha (float, optional) – Number controlling the size of the interval. That is, this function will return a 100(1-2*alpha)% confidence interval. Defaults to 0.05, corresponding to a 90% confidence interval.

• Binner (int, optional) – Number of bootstrap samples for calculating standard error. Defaults to 25.

• Bouter (int, optional) – Number of bootstrap samples for calculating percentiles. Defaults to 1000.

• Bvar (int, optional) – Number of bootstrap samples used to estimate the relationship between the statistic and the standard error for use with variance stabilization. Defaults to 100.

• size (int or tuple of ints, optional) – Size to pass for generating samples from the distribution. Defaults to None, indicating the samples will be the same size as the original dataset.

• empirical_distribution (class, optional) – Class to be used to generate an empirical distribution. Defaults to the regular EmpiricalDistribution, but any class that implements a sample method will work. For example, the MultiSampleEmpiricalDistribution can be used. This can be used to accommodate more exotic applications of the Bootstrap.

• return_samples (boolean, optional) – If True, return the bootstrapped statistic values. Defaults to False.

• theta_star (array_like, optional) – Bootstrapped statistic values. Can be passed if they have already been calculated, which will speed this up considerably.

• se_star (array_like, optional) – Bootstrapped statistic standard errors. Can be passed if they have already been calculated, which will speed this up considerably.

• z_star (array_like, optional) – Bootstrapped pivot values. Can be passed if they have already been calculated, which will speed this up considerably.

• num_threads (int, optional) – Number of threads to use for multicore processing. Defaults to 1, meaning all calculations will be done in a single thread. Set to -1 to use all available cores.

Returns

• ci_low, ci_high (float) – Lower and upper bounds on a 100(1-2*alpha)% confidence interval on theta.

• theta_star (ndarray) – Array of bootstrapped statistic values. Only returned if return_samples is True.

• se_star (ndarray) – Array of bootstrapped statistic standard errors. Only returned if return_samples is True.

• z_star (ndarray) – Array of bootstrapped pivot values. Only returned if return_samples is True and stabilize_variance is False.

Examples

>>> x = np.random.randn(100)
>>> dist = EmpiricalDistribution(x)
>>> def statistic(x): return np.mean(x)
>>> theta_hat = statistic(x)
>>> ci_low, ci_high = t_interval(dist, statistic, theta_hat)

>>> se_hat = standard_error(dist, statistic, robustness=0.95, B=2000)
>>> ci_low, ci_high = t_interval(dist, statistic, theta_hat, se_hat=se_hat)

>>> def fast_std_err(x): return np.sqrt(np.var(x, ddof=1) / len(x))
>>> ci_low, ci_high = t_interval(dist, statistic, theta_hat,
...                              fast_std_err=fast_std_err)

>>> def fast_std_err(x):
...    dist = EmpiricalDistribution(x)
...    return standard_error(dist, statistic, robustness=0.95, B=2000)
>>> ci_low, ci_high = t_interval(dist, statistic, theta_hat,
...                              fast_std_err=fast_std_err)

bootstrap_stat.percentile_interval(dist, stat, alpha=0.05, B=1000, size=None, return_samples=False, theta_star=None, num_threads=1)

Percentile Intervals

Parameters
• dist (EmpiricalDistribution) – The empirical distribution.

• stat (function) – The statistic.

• alpha (float, optional) – Number controlling the size of the interval. That is, this function will return a 100(1 - 2 * alpha)% confidence interval. Defaults to 0.05.

• B (int, optional) – Number of bootstrap samples. Defaults to 1000.

• size (int or tuple of ints, optional) – Size to pass for generating samples from the distribution. Defaults to None, indicating the samples will be the same size as the original dataset.

• return_samples (boolean, optional) – If True, return the bootstrapped statistic values. Defaults to False.

• theta_star (array_like, optional) – Bootstrapped statistic values. Can be passed if they have already been calculated, which will speed this up considerably.

• num_threads (int, optional) – Number of threads to use for multicore processing. Defaults to 1, meaning all calculations will be done in a single thread. Set to -1 to use all available cores.

Returns

• ci_low, ci_high (float) – Lower and upper bounds on a 100(1-2*alpha)% confidence interval on theta.

• theta_star (ndarray) – Array of bootstrapped statistic values. Only returned if return_samples is True.

bootstrap_stat.bcanon_interval(dist, stat, x, alpha=0.05, B=1000, size=None, return_samples=False, theta_star=None, theta_hat=None, jv=None, num_threads=1)

BCa Confidence Intervals

Parameters
• dist (EmpiricalDistribution) – The empirical distribution.

• stat (function) – The statistic.

• x (array_like or pandas DataFrame or tuple) – The data, used to evaluate the observed statistic and compute jackknife values.

• alpha (float, optional) – Number controlling the size of the interval. That is, this function will return a 100(1-2*alpha)% confidence interval. Defaults to 0.05.

• B (int, optional) – Number of bootstrap samples. Defaults to 1000.

• size (int or tuple of ints, optional) – Size to pass for generating samples from the distribution. Defaults to None, indicating the samples will be the same size as the original dataset.

• return_samples (boolean, optional) – If True, return the bootstrapped statistic values. Defaults to False.

• theta_star (array_like, optional) – Bootstrapped statistic values. Can be passed if they have already been calculated, which will speed this up considerably.

• theta_hat (float, optional) – Observed statistic. Can be passed if it has already been calculated, which will speed this up slightly.

• jv (array_like, optional) – Jackknife values. Can be passed if they have already been calculated, which will speed this up considerably.

• num_threads (int, optional) – Number of threads to use for multicore processing. Defaults to 1, meaning all calculations will be done in a single thread. Set to -1 to use all available cores.

Returns

• ci_low, ci_high (float) – Lower and upper bounds on a 100(1-2*alpha)% confidence interval on theta.

• theta_star (ndarray) – Array of bootstrapped statistic values. Only returned if return_samples is True.

• jv (ndarray) – Jackknife values. Only returned if return_samples is True.

bootstrap_stat.abcnon_interval(x, stat, alpha=0.05, eps=0.001, influence_components=None, second_derivatives=None, return_influence_components=False, num_threads=1)

ABC Confidence Intervals

Parameters
• x (array_like or pandas DataFrame) – The data, used to evaluate the observed statistic and compute influence components.

• stat (function) – The statistic. Should take x and a resampling vector as input. See Notes.

• alpha (float or array of floats, optional) – Number controlling the size of the interval. That is, this function will return a 100(1 - 2 * alpha)% confidence interval. Defaults to 0.05. Alternatively, the user can pass an array of floats in (0, 1). In that case, the return value will be a tuple of confidence points. See Notes.

• influence_components (array_like, optional) – Influence components. See Notes.

• second_derivatives (array_like, optional) – Vector of second derivatives. See Notes.

• return_influence_components (boolean, optional) – Specifies whether to return the influence components and second derivatives. Defaults to False.

• num_threads (int, optional) – Number of threads to use for multicore processing. Defaults to 1, meaning all calculations will be done in a single thread. Set to -1 to use all available cores.

Returns

• ci_low, ci_high (float) – Lower and upper bounds on a 100(1 - 2 * alpha)% confidence interval on theta.

• influence_components (array_like) – Vector of influence components. Only returned if return_influence_components is True.

• second_derivatives (array_like) – Vector of second_derivatives. Only returned if return_influence_components is True.

Notes

Approximate Bootstrap Confidence (ABC) intervals require the statistic to be expressed in resampling form. See [ET93, S14.4] for details. It only applies to statistics which are smooth functions of the data. A notable example where ABC does not apply is the sample median.

ABC works in terms of the derivatives and second derivatives of the statistic with respect to the data. In some cases, analytical forms are possible. In that case, the caller may wish to calculate them externally to this function and pass them in. The default behavior is to calculate them using a finite difference method.

When we want to compute multiple confidence points, we can reuse many of the calculations. For example, if we want to compute a 90% confidence interval and a 99% confidence interval, we would specify alpha = [0.005, 0.05, 0.95, 0.995] and read off the appropriate return values. This would be much faster than multiple calls to this function. (The recurring cost is 1 call to stat for each additional point specified.) This in turn facilitates using these intervals for achieved significance levels, effectively by inverting the interval having endpoint 0. See the ASL functions for how this might be done.

bootstrap_stat.calibrate_interval(dist, stat, x, theta_hat, alpha=0.05, B=1000, return_confidence_points=False, num_threads=1)

Calibrated confidence interval

Parameters
• dist (EmpiricalDistribution) – The empirical distribution.

• stat (function) – The statistic.

• x (array_like or pandas DataFrame or tuple) – The data.

• theta_hat (float) – Observed statistic.

• alpha (float, optional) – Number controlling the size of the interval. That is, this function will return a 100(1-2*alpha)% confidence interval. Defaults to 0.05.

• B (int, optional) – Number of bootstrap samples. Defaults to 1000.

• return_confidence_points (boolean, optional) – If True, returns the estimated confidence points have the desired coverage. Defaults to False.

• num_threads (int, optional) – Number of threads to use for multicore processing. Defaults to 1, meaning all calculations will be done in a single thread. Set to -1 to use all available cores.

Returns

• ci_low, ci_high (float) – Lower and upper bounds on a 100(1-2*alpha)% confidence interval on theta.

• lmbda_low, lmbda_high (float) – The estimated confidence points having the desired coverage. For example, if the interval has the nominal coverage, then lmbda_low would be alpha and lmbda_high would be 1 - alpha.

Notes

While we can in principle calibrate any type of confidence interval, in most instances that results in a “double bootstrap”. For this reason, only ABC intervals are currently supported. Moreover, from my limited experience calibration seems pretty finicky. I would consider this function to be illustrative but experimental. See [ET93, S18] for details.

We compute the observed coverage for a range of points around the nominal alpha and 1-alpha. Then we fit a smoother (loess) to these data and invert to find the confidence point, lmbda, having the desired coverage.

bootstrap_stat.jackknife_values(x, stat, sample=None, num_threads=1)

Compute jackknife values.

Parameters
• x (array_like or pandas DataFrame or tuple of arrays/DataFrames.) – The data.

• stat (function) – The statistic.

• sample (int, optional) – When Jackknifing a multi-sample distribution, like for an A/B test, we generate one set of jackknife values for each sample. The caller should specify which sample for which jackknife values should be generated, calling this function once for each sample.

• num_threads (int, optional) – Number of threads to use for multicore processing. Defaults to 1, meaning all calculations will be done in a single thread. Set to -1 to use all available cores.

Returns

jv – The jackknife values.

Return type

ndarray

Notes

The jackknife values consist of the statistic applied to a collection of datasets derived from the original by holding out each observation in turn. For example, let x1 be the dataset corresponding to x, but with the first datapoint removed. The first jackknife value is simply stat(x1).

bootstrap_stat.bias(dist, stat, t, B=200, return_samples=False, theta_star=None, num_threads=1)

Estimate of bias

Parameters
• dist (EmpiricalDistribution) – The empirical distribution.

• stat (function) – The statistic for which we wish to calculate the bias.

• t (function) – Function to be applied to empirical distribution function.

• B (int, optional) – Number of bootstrap samples. Defaults to 200.

• return_samples (boolean, optional) – If True, return the bootstrapped statistic values. Defaults to False.

• theta_star (array_like, optional) – Bootstrapped statistic values. Can be passed if they have already been calculated, which will speed this up considerably.

• num_threads (int, optional) – Number of threads to use for multicore processing. Defaults to 1, meaning all calculations will be done in a single thread. Set to -1 to use all available cores.

Returns

• bias (float) – Estimate of bias.

• theta_star (ndarray) – Array of bootstrapped statistic values. Only returned if return_samples is True.

bootstrap_stat.better_bootstrap_bias(x, stat, B=400, return_samples=False, num_threads=1)

Better bootstrap bias.

Parameters
• x (array_like or pandas DataFrame) – The data.

• stat (function) – The statistic. Should take x and a resampling vector as input. See Notes.

• B (int, optional) – Number of bootstrap samples. Defaults to 400.

• return_samples (boolean, optional) – If True, return the bootstrapped statistic values. Defaults to False.

• num_threads (int, optional) – Number of threads to use for multicore processing. Defaults to 1, meaning all calculations will be done in a single thread. Set to -1 to use all available cores.

Returns

• bias (float) – Estimate of bias.

• theta_star (ndarray) – Array of bootstrapped statistic values. Only returned if return_samples is True.

Notes

The “better” bootstrap bias estimate is only applicable when stat is a plug-in statistic for the parameter being estimated, that is, having the form $$t(\hat{F})$$, where $$\hat{F}$$ is the empirical distribution. Notable situations where this assumption does not hold include robust statistics like using the alpha-trimmed mean, since that is not the plug-in statistic for the mean. In cases like that, just use the “worse” bias function. The advantage of the “better” bootstrap bias estimate is faster convergence. Whereas using B = 400 is typically adequate here, it can take thousands of bootstrap samples to give an accurate estimate in the “worse” bias function.

bootstrap_stat.jackknife_bias(x, stat, return_samples=False, jv=None, num_threads=1)

Jackknife estimate of bias.

Parameters
• x (array_like or pandas DataFrame) – The data.

• stat (function) – The statistic.

• return_samples (boolean, optional) – If True, return the jackknife values. Defaults to False.

• jv (array_like, optional) – Jackknife values. Can be passed if they have already been calculated, which will speed this up considerably.

• num_threads (int, optional) – Number of threads to use for multicore processing. Defaults to 1, meaning all calculations will be done in a single thread. Set to -1 to use all available cores.

Returns

• bias (float) – Estimate of bias.

• jv (ndarray) – Jackknife values. Only returned if return_samples is True.

Notes

The jackknife estimate of bias is only applicable when stat is a plug-in statistic, that is, having the form $$t(\hat{F})$$, where $$\hat{F}$$ is the empirical distribution. Moreover, it is only applicable when t is a smooth function. Notable exceptions include the median. The jackknife cannot be used to estimate the bias of non-smooth estimators. See [EF93, S10.5] for details.

bootstrap_stat.bias_corrected(x, stat, method='better_bootstrap_bias', dist=None, t=None, B=None, return_samples=False, theta_star=None, jv=None, num_threads=1)

Bias-corrected estimator.

Parameters
• x (array_like or pandas DataFrame) – The data.

• stat (function) – The statistic. For use with “bias” and “jackknife” methods, should take a dataset as the input. For use with the “better_bootstrap_bias” method, it should take the dataset and a resampling vector as input. See the documentation in the better_bootstrap_bias function for details.

• method (["better_bootstrap_bias", "bias", "jackknife"]) – The method by which we correct for bias. Defaults to “better_bootstrap_bias”.

• dist (EmpiricalDistribution) – The empirical distribution. Required when method == “bias”.

• t (function) – Function to be applied to empirical distribution function. Required when method == “bias”.

• B (int, optional) – Number of bootstrap samples. Required when method == “bias” or “better_bootstrap_bias”. Defaults to 400 when method == “better_bootstrap_bias” or 4000 when method == “bias”.

• return_samples (boolean, optional) – If True, return the bootstrapped statistic or jackknife values. Defaults to False.

• theta_star (array_like, optional) – Bootstrapped statistic values. Can be passed if they have already been calculated, which will speed this up considerably. Only used when method == “bias”.

• jv (array_like, optional) – Jackknife values. Can be passed if they have already been calculated, which will speed this up considerably. Only used when method == “jackknife”.

• num_threads (int, optional) – Number of threads to use for multicore processing. Defaults to 1, meaning all calculations will be done in a single thread. Set to -1 to use all available cores.

Returns

• theta_bar (float) – The bias-corrected estimator.

• theta_star (ndarray) – Array of bootstrapped statistic values. Only returned if return_samples is True and method is either “bias” or “better_bootstrap_bias”.

• jv (ndarray) – Jackknife values. Only returned if return_samples is True and method == “jackknife”.

Notes

Per [ET93, S10.6], bias-corrected estimators tend to have much higher variance than the non-corrected version. This should be assessed, for example, using the bootstrap to directly estimate the standard error of the corrected and uncorrected estimators.

bootstrap_stat.bootstrap_asl(dist, stat, x, B=1000, size=None, return_samples=False, theta_star=None, theta_hat=None, two_sided=False, num_threads=1)

Achieved Significance Level, general bootstrap method

Parameters
• dist (EmpiricalDistribution) – The empirical distribution.

• stat (function) – The test statistic.

• x (array_like or pandas DataFrame or tuple) – The data. It isn’t used for anything, so the caller can pass None if the data are not available, but it is here for consistency across asl routines.

• B (int, optional) – Number of bootstrap samples. Defaults to 1000.

• size (int or tuple of ints, optional) – Size to pass for generating samples from the alternative distribution. Defaults to None.

• return_samples (boolean, optional) – If True, return the bootstrapped statistic or jackknife values. Defaults to False.

• theta_star (array_like, optional) – Bootstrapped statistic values. Can be passed if they have already been calculated, which will speed this up considerably.

• theta_hat (float, optional) – Observed statistic. Can be passed if it has already been calculated, which will speed this up slightly.

• two_sided (boolean, optional) – If True, computes a two-sided significance value. If False (default), only a one-sided value is returned. Support for two-sided tests is experimental. Use with caution!

• num_threads (int, optional) – Number of threads to use for multicore processing. Defaults to 1, meaning all calculations will be done in a single thread. Set to -1 to use all available cores.

Returns

• asl (float) – Achieved significance level, the probability of an outcome at least as extreme as that actually observed under the null hypothesis; aka the p-value.

• theta_star (ndarray) – Array of bootstrapped statistic values. Only returned if return_samples is True.

bootstrap_stat.percentile_asl(dist, stat, x, theta_0=0, B=1000, size=None, return_samples=False, theta_star=None, theta_hat=None, two_sided=False, num_threads=1)

Achieved Significance Level, percentile method

Parameters
• dist (EmpiricalDistribution) – The empirical distribution.

• stat (function) – The test statistic.

• x (array_like or pandas DataFrame or tuple) – The data, used to calculate the observed value of the statistic if theta_hat is not passed.

• theta_0 (float, optional) – The mean of the test statistic under the null hypothesis. Defaults to 0.

• B (int, optional) – Number of bootstrap samples.

• size (int or tuple of ints, optional) – Size to pass for generating samples from the alternative distribution. Defaults to None.

• return_samples (boolean, optional) – If True, return the bootstrapped statistic or jackknife values. Defaults to False.

• theta_star (array_like, optional) – Bootstrapped statistic values. Can be passed if they have already been calculated, which will speed this up considerably.

• theta_hat (float, optional) – Observed statistic. Can be passed if it has already been calculated, which will speed this up slightly.

• two_sided (boolean, optional) – If True, computes a two-sided significance value. If False (default), only a one-sided value is returned. Support for two-sided tests is experimental. Use with caution!

• num_threads (int, optional) – Number of threads to use for multicore processing. Defaults to 1, meaning all calculations will be done in a single thread. Set to -1 to use all available cores.

Returns

• asl (float) – Achieved significance level, the probability of an outcome at least as extreme as that actually observed under the null hypothesis; aka the p-value.

• theta_star (ndarray) – Array of bootstrapped statistic values. Only returned if return_samples is True.

Notes

Under the null hypothesis, the value of the statistic is theta_0. Suppose theta_hat > theta_0. Let theta_lo, theta_hi be the endpoints of a 100(1-alpha)% confidence interval on theta. Suppose alpha is such that theta_lo = theta_0. Then alpha is the achieved significance level.

For the percentile interval, this is simply the fraction of bootstrap samples that are “on the other side” of theta_0 from theta_hat.

bootstrap_stat.bcanon_asl(dist, stat, x, theta_0=0, B=1000, size=None, return_samples=False, theta_star=None, theta_hat=None, jv=None, two_sided=False, num_threads=1)

Achieved Significance Level, bcanon method

Parameters
• dist (EmpiricalDistribution) – The empirical distribution.

• stat (function) – The test statistic.

• x (array_like or pandas DataFrame or tuple) – The data, used to evaluate the observed statistic and compute jackknife values.

• theta_0 (float, optional) – The mean of the test statistic under the null hypothesis. Defaults to 0.

• B (int, optional) – Number of bootstrap samples.

• size (int or tuple of ints, optional) – Size to pass for generating samples from the alternative distribution. Defaults to None.

• return_samples (boolean, optional) – If True, return the bootstrapped statistic or jackknife values. Defaults to False.

• theta_star (array_like, optional) – Bootstrapped statistic values. Can be passed if they have already been calculated, which will speed this up considerably.

• theta_hat (float, optional) – Observed statistic. Can be passed if it has already been calculated, which will speed this up slightly.

• jv (array_like, optional) – Jackknife values. Can be passed if they have already been calculated, which will speed this up considerably.

• two_sided (boolean, optional) – If True, computes a two-sided significance value. If False (default), only a one-sided value is returned. Support for two-sided tests is experimental. Use with caution!

• num_threads (int, optional) – Number of threads to use for multicore processing. Defaults to 1, meaning all calculations will be done in a single thread. Set to -1 to use all available cores.

Returns

• asl (float) – Achieved significance level, the probability of an outcome at least as extreme as that actually observed under the null hypothesis; aka the p-value.

• theta_star (ndarray) – Array of bootstrapped statistic values. Only returned if return_samples is True.

• jv (ndarray) – Jackknife values. Only returned if return_samples is True.

bootstrap_stat.bootstrap_power(alt_dist, null_dist, stat, asl=<function bootstrap_asl>, alpha=0.05, size=None, P=100, **kwargs)

Bootstrap Power

Parameters
• alt_dist (EmpiricalDistribution) – Distribution under the alternative hypothesis.

• null_dist (class) – Class corresponding to the null distribution. See Notes.

• stat (function) – Function that computes the test statistic.

• asl (function, optional) – Function that computes an achieved significance level. Defaults to bootstrap_asl.

• alpha (float, optional) – Desired Type-I error rate. Defaults to 0.05.

• size (int or tuple of ints, optional) – Size to pass for generating samples from the alternative distribution. Defaults to None.

• P (int, optional) – Number of Monte Carlo simulations to run for the purposes of calculating power. Defaults to 100.

• kwargs (optional) – Other keyword arguments to pass to asl, such as the number of bootstrap samples to use.

Returns

pwr – The fraction of Monte Carlo simulations in which the null hypothesis was rejected.

Return type

float

Notes

Perhaps the most confusing aspect of this function is that there are two distribution passed as input, and they are of a different form. The alt_dist should be passed as an instance of an EmpiricalDistribution or a subclass thereof. We use this parameter to generate samples from that distribution. We then need to generate an EmpiricalDistribution from that sample, for which we need the class corresponding to the null distribution, not an instance thereof. I recognize this is confusing!

bootstrap_stat.prediction_error_optimism(dist, data, train, predict, error, B=200, apparent_error=None, num_threads=1)

Prediction Error, Optimism Method

Parameters
• dist (EmpiricalDistribution) – Empirical distribution.

• data (array_like or pandas DataFrame) – The data.

• train (function) – Function which takes as input a dataset sampled from the empirical distribution and returns a fitted model.

• predict (function) – Function which takes as input a fitted model and a dataset, and returns the predicted labels for that dataset

• error (function) – Function which takes as input a fitted model and a dataset, and returns the mean prediction error on that dataset.

• B (int, optional) – Number of bootstrap samples. Defaults to 200.

• apparent_error (float, optional) – The prediction error of the model on the dataset used to train the model, also known as the training error. If omitted, will be calculated. Can be passed to this function to save time, for example if the model had already been fit elsewhere.

• num_threads (int, optional) – Number of threads to use for multicore processing. Defaults to 1, meaning all calculations will be done in a single thread. Set to -1 to use all available cores.

Returns

pe – Prediction error.

Return type

float

Notes

The bootstrap estimate of prediction error can be used for model selection. It is similar to cross validation. It adds a bias correction term to the apparent error (the accuracy of the predictor applied to the same dataset used to train the predictor). This bias correction term is called the optimism. See [ET93, S17] for details.

bootstrap_stat.prediction_error_632(dist, data, train, predict, error, B=200, apparent_error=None, use_632_plus=False, gamma=None, no_inf_err_rate=None, num_threads=1)

.632 Bootstrap

Parameters
• dist (EmpiricalDistribution) – Empirical distribution.

• data (array_like or pandas DataFrame) – The data.

• train (function) – Function which takes as input a dataset sampled from the empirical distribution and returns a fitted model.

• predict (function) – Function which takes as input a fitted model and a dataset, and returns the predicted labels for that dataset

• error (function) – Function which takes as input a fitted model and a dataset, and returns the prediction error for each observation in that dataset.

• B (int, optional) – Number of bootstrap samples. Defaults to 200.

• apparent_error (float, optional) – The prediction error of the model on the dataset used to train the model, also known as the training error. If omitted, will be calculated. Can be passed to this function to save time, for example if the model had already been fit elsewhere.

• use_632_plus (boolean, optional) – If True, uses the .632+ bootstrap. See Notes.

• num_threads (int, optional) – Number of threads to use for multicore processing. Defaults to 1, meaning all calculations will be done in a single thread. Set to -1 to use all available cores.

Returns

pe – Prediction error.

Return type

float

Notes

The .632 bootstrap estimate of prediction error is the weighted average of the apparent error and another term called eps0. The latter term is kind of like cross validation: we generate a bootstrap sample, train a model on it, and then make predictions using that model on the original dataset. But we only care about the predictions on observations that are not part of the bootstrap sample. We then average those prediction errors across all bootstrap samples. See [ET93, S17.7] for details.

The method is so-called because the estimated prediction error is .368 times the apparent error plus .632 times eps0. [ET93] reports that this method performed better than leave-one-out cross validation in their simulations, having lower variance, but they themselves admit they had not thoroughly evaluated it. [Koh95] reported that the .632 bootstrap performed quite poorly when overfitting is present. This led Efron and Tibshirani to propose the .632+ bootstrap in [ET97]. Finally, [Arl10] surveys various approaches to model selection, recommending 10-fold cross validation as the preferred method of model selection.

Because the .632 bootstrap has apparently not withstood the test of time, I have made no attempt to implement it efficiently, instead preferring the easy-to-follow approach below. This function should really only serve pedagogical purposes: it is not recommended for serious applications!

The no information error rate looks at the prediction at point j, and computes the error for every label y_i. It averages the errors over all i and j. This is because, if we assume the features offer no insight into the labels, any observation is as good as any other at predicting any particular label.

bootstrap_stat.prediction_interval(dist, x, mean=None, std=None, B=1000, alpha=0.05, t_star=None, return_t_star=False, num_threads=-1)

Prediction interval

Parameters
• dist (EmpiricalDistribution) – The empirical distribution.

• x (array_like or pandas DataFrame) – The data.

• mean (function, optional) – A function returning the mean of a bootstrap sample. Defaults to np.mean, but this only works for arrays, not DataFrames. To emphasize, this function must return a float!

• std (function, optional) – A function returning the standard deviation of a bootstrap sample. Defaults to np.std using ddof=1. As with mean, specify something different for DataFrames!

• B (int, optional) – Number of bootstrap samples. Defaults to 1000.

• alpha (float, optional) – Number controlling the size of the interval. That is, this function will return a 100(1-2 * alpha)% prediction interval. Defaults to 0.05.

• t_star (array_like or None) – Array of studentized values, used to calculate the interval. Can be passed to this function to speed it up, for example when calculating multiple intervals.

• return_t_star (boolean, optional) – If True, return the studentized values. (Sometimes it is helpful to plot these.)

• num_threads (int, optional) – Number of threads to use for multicore processing. Defaults to 1, meaning all calculations will be done in a single thread. Set to -1 to use all available cores.

Returns

• pred_low, pred_high (float) – A 100(1 - 2 * alpha)% prediction interval on a point sampled from F.

• t_star (array) – Array of studentized values. Returned only if return_t_star is True.

Notes

Suppose we observe $$X_1, X_2, \ldots, X_n$$ sampled IID from a distribution $$F$$. We wish to calculate a range of plausible values for a new point drawn from the same distribution. This function returns such a prediction interval.

bootstrap_stat.multithreaded_bootstrap_samples(dist, stat, B, size=None, jackknife=False, num_threads=-1)

Generate bootstrap samples in parallel.

Parameters
• dist (EmpiricalDistribution) – The empirical distribution.

• stat (function or dict_like) – The statistic or statistics for which we wish to calculate the standard error. If you want to compute different statistics on the same bootstrap samples, specify as a dictionary having functions as values.

• B (int) – Number of bootstrap samples.

• size (int or tuple of ints, optional) – Size to pass for generating samples. Defaults to None.

• num_threads (int, optional) – Number of threads to use. Defaults to the number of available CPUs.

Returns

theta_star – Array of bootstrapped statistic values. When multiple statistics are being calculated on the same bootstrap samples, the return value will be a dictionary having keys the same as stat and values an ndarray for each statistic.

Return type

ndarray or dictionary

Notes

Jackknifing is not currently supported.

bootstrap_stat.bootstrap_samples(dist, stat, B, size=None, jackknife=False, num_threads=1)

Generate bootstrap samples.

Parameters
• dist (EmpiricalDistribution) – The empirical distribution.

• stat (function or dict_like) – The statistic or statistics for which we wish to calculate the standard error. If you want to compute different statistics on the same bootstrap samples, specify as a dictionary having functions as values.

• B (int) – Number of bootstrap samples.

• size (int or tuple of ints, optional) – Size to pass for generating samples. Defaults to None.

• jackknife (boolean, optional) – If True, returns an array of jackknife bootstrap statistics (see the jackknife_array return value). Defaults to False.

• num_threads (int, optional) – Number of threads to use for multicore processing. Defaults to 1, meaning all calculations will be done in a single thread. Set to -1 to use all available cores.

Returns

• theta_star (ndarray or dictionary) – Array of bootstrapped statistic values. When multiple statistics are being calculated on the same bootstrap samples, the return value will be a dictionary having keys the same as stat and values an ndarray for each statistic.

• jackknife_array (ndarray) – Array of n arrays or dicts, where n is the number of elements of the original dataset upon which the empirical distribution is based. Each element of this array is in turn either an array or dict according to stat. If stat is just a single function, it will be an array, otherwise a dict. The ith element of the inner array will be those bootstrap statistics corresponding to samples not including the ith value from the original dataset. (For jackknifing, only samples not including the ith datapoint can be used for inferences involving the ith point.) See [ET93, S19.4] for details. Only returned if jackknife is True.

bootstrap_stat.loess(z0, z, y, alpha, sided='both')

Locally estimated scatterplot smoothing

Parameters
• z0 (float) – Test point

• z (array_like) – Endogenous variables.

• y (array_like) – Exogenous variables.

• alpha (float) – Smoothing parameter, governing how many points to include in the local fit. Should be between 0 and 1, with higher values corresponding to more smoothing.

• sided (["both", "trailing", "leading"], optional) – Dictates what side(s) of z0 can be used to create the smooth. With the default behavior (“both”), points both before and after z0 can be used. If “trailing” is specified, only points less than or equal to z0 may be used to perform the smooth. If “leading” is specified, only points greater than or equal to z0 may be used. This is intended to support time series methods where we only want to do trailing averages.

Returns

y_smoothed – The smoothed estimate of the exogenous variable evaluated at z0.

Return type

float