Documentation for bootstrap_stat¶
Literary References¶
 [ET93]: Bradley Efron and Robert J. Tibshirani, “An Introduction to the Bootstrap”. Chapman & Hall, 1993.
 [Koh95]: Ron Kohavi, “A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection”. International Joint Conference on Artificial Intelligence, 1995.
 [ET97]: Bradley Efron and Robert Tibshirani, “Improvements on Cross-Validation: The .632+ Bootstrap Method”. Journal of the American Statistical Association, Vol. 92, No. 438, June 1997, pp. 548–560.
 [Arl10]: Sylvain Arlot, “A Survey of Cross-Validation Procedures for Model Selection”. Statistics Surveys, Vol. 4, 2010.
Module Functions¶

class bootstrap_stat.EmpiricalDistribution(data)¶
Empirical Distribution
The Empirical Distribution puts probability 1/n on each of n observations.
 Parameters
data (array_like or pandas DataFrame) – The data.

sample(size=None, return_indices=False, reset_index=True)¶
Sample from the empirical distribution
 Parameters
size (int or tuple of ints, optional) – Output shape. If None (default), samples the same number of points as the original dataset.
return_indices (boolean, optional) – If True, return the indices of the data points sampled. Defaults to False.
reset_index (boolean, optional) – If True (default), reset the index. Applies only to data frames. This is usually the desired behavior, except perhaps when debugging.
 Returns
samples (ndarray or pandas DataFrame) – IID samples from the empirical distribution.
ind (ndarray) – Indices of samples chosen. Only returned if return_indices is True.

calculate_parameter(t)¶
Calculate a parameter of the distribution.
 Parameters
t (function) – Function to be applied to the dataset. If using a multi-sample distribution, t should take as input a tuple of data sets of the appropriate size.
 Returns
tF – Parameter of distribution.
 Return type
float

class bootstrap_stat.MultiSampleEmpiricalDistribution(datasets)¶
Multi-Sample Empirical Distribution
 Parameters
datasets (tuple of arrays or pandas DataFrames) – Observed data sets.
Notes
Suppose we observe
\[ \begin{align}\begin{aligned}x_i \sim F, \quad i=1,\ldots,m\\y_j \sim G, \quad j=1,\ldots,n\end{aligned}\end{align} \]Then \(P = (\hat{F}, \hat{G})\) is the probabilistic mechanism consisting of the two empirical distributions \(\hat{F}\) and \(\hat{G}\). Sampling from \(P\) amounts to sampling \(m\) points IID from \(\hat{F}\) and \(n\) points IID from \(\hat{G}\). This is called the Two-Sample Empirical Distribution, and comes up frequently in A/B testing. The generalization to more than two samples is straightforward.
Because of some of the implementation details of the bootstrap methods, we use tuples for everything. So when initializing, be sure to wrap the different datasets in a tuple. Samples will themselves be tuples, etc. See the function documentation below for details and examples.
This is a relatively thin wrapper around the regular EmpiricalDistribution: we just create a distinct EmpiricalDistribution for each dataset, and use that for sampling.
Examples
>>> data = [1, 2, 3]
>>> dist = EmpiricalDistribution(data)
>>> dist.sample()
[1, 2, 1]
>>> data_a = [1, 2, 3]
>>> data_b = [4, 5, 6]
>>> data = (data_a, data_b)  # Note tuple
>>> dist = MultiSampleEmpiricalDistribution(data)
>>> a, b = dist.sample()  # Can detuple directly
>>> a
[2, 2, 3]
>>> b
[4, 6, 4]
>>> ab = dist.sample()  # Or indirectly, which is often more useful
>>> ab
[array([2, 2, 3]), array([4, 6, 4])]

sample(size=None)¶
Sample from the empirical distribution
 Parameters
size (tuple of ints, optional) – Number of samples to be drawn from each EmpiricalDistribution. If None (default), samples the same numbers of points as the original datasets.
 Returns
samples – IID samples from the empirical distributions.
 Return type
tuple of ndarray or pandas DataFrame

calculate_parameter(t)¶
Calculate a parameter of the distribution.
 Parameters
t (function) – Function to be applied to dataset. Should take as input a tuple of data sets of the appropriate size.
 Returns
tF – Parameter of distribution.
 Return type
float
Examples
Suppose we are in the Two-Sample case and have two empirical distributions, \(\hat{F}\) and \(\hat{G}\), and we want to calculate the difference in means of these distributions. We might do something like:
>>> data_a = [1, 2, 3]
>>> data_b = [4, 5, 6]
>>> data = (data_a, data_b)  # Note tuple
>>> dist = MultiSampleEmpiricalDistribution(data)
>>> def parameter(ab):
...     a, b = ab  # Note detupling
...     return np.mean(b) - np.mean(a)
>>> dist.calculate_parameter(parameter)
3.0

bootstrap_stat.jackknife_standard_error(x, stat, return_samples=False, jv=None, num_threads=1)¶
Jackknife estimate of standard error.
 Parameters
x (array_like or pandas DataFrame or tuple) – The data. If a tuple, it is interpreted as a multi-sample distribution, as in A/B testing.
stat (function) – The statistic.
return_samples (boolean, optional) – If True, return the jackknife values. Defaults to False.
jv (array_like, optional) – Jackknife values. Can be passed if they have already been calculated, which will speed this up considerably.
num_threads (int, optional) – Number of threads to use for multicore processing. Defaults to 1, meaning all calculations will be done in a single thread. Set to -1 to use all available cores.
 Returns
se (float) – The standard error.
jv (ndarray) – Jackknife values. Only returned if return_samples is True.
Notes
The jackknife estimate of standard error is only applicable when stat is a plug-in statistic, that is, one having the form \(t(\hat{F})\), where \(\hat{F}\) is the empirical distribution. Moreover, it is only applicable when t is a smooth function; a notable exception is the median. The jackknife cannot be used to estimate the standard error of non-smooth estimators. See [ET93, S10.6] for details.
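The computation itself is simple. As a from-scratch sketch (not the library's implementation), the jackknife standard error is the scaled spread of the leave-one-out statistic values:

```python
import math

def jackknife_se(x, stat):
    """Jackknife standard error of stat (a plug-in statistic) on list x."""
    n = len(x)
    # Jackknife values: the statistic on each leave-one-out dataset.
    jv = [stat(x[:i] + x[i + 1:]) for i in range(n)]
    jbar = sum(jv) / n
    # se_jack = sqrt((n - 1)/n * sum_i (jv_i - jbar)^2)
    return math.sqrt((n - 1) / n * sum((v - jbar) ** 2 for v in jv))

mean = lambda d: sum(d) / len(d)
se = jackknife_se([1, 2, 3, 4], mean)
# For the sample mean this reproduces the textbook s/sqrt(n): ~0.6455
```

For the mean the jackknife is exact; for other smooth statistics it is an approximation.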

bootstrap_stat.standard_error(dist, stat, robustness=None, B=200, size=None, jackknife_after_bootstrap=False, return_samples=False, theta_star=None, num_threads=1)¶
Standard error
 Parameters
dist (EmpiricalDistribution) – The empirical distribution.
stat (function) – The statistic for which we wish to calculate the standard error.
robustness (float or None, optional) – Controls whether to use a robust estimate of standard error. If specified, should be a float in (0.5, 1.0), with lower values corresponding to greater bias but increased robustness. If None (default), uses the non-robust estimate of standard error.
B (int, optional) – Number of bootstrap samples. Defaults to 200.
size (int or tuple of ints, optional) – Size to pass for generating samples from the distribution. Defaults to None, indicating the samples will be the same size as the original dataset.
jackknife_after_bootstrap (boolean, optional) – If True, will estimate the variability of our estimate of standard error. See [ET93, S19.4] for details.
return_samples (boolean, optional) – If True, return the bootstrapped statistic values. Defaults to False.
theta_star (array_like, optional) – Bootstrapped statistic values. Can be passed if they have already been calculated, which will speed this up considerably.
num_threads (int, optional) – Number of threads to use for multicore processing. Defaults to 1, meaning all calculations will be done in a single thread. Set to -1 to use all available cores.
 Returns
se (float) – The standard error.
se_jack (float) – Jackknife-after-bootstrap estimate of the standard error of se (i.e. an estimate of the variability of se itself). Only returned if jackknife_after_bootstrap is True.
theta_star (ndarray) – Array of bootstrapped statistic values. Only returned if return_samples is True.
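For intuition, here is a minimal pure-Python sketch of the non-robust computation (not the library's implementation): draw B bootstrap samples, apply the statistic to each, and take the sample standard deviation of the results.

```python
import random
import statistics

def bootstrap_se(data, stat, B=200, seed=0):
    rng = random.Random(seed)
    # B bootstrap samples, each drawn with replacement from the data.
    theta_star = [
        stat([rng.choice(data) for _ in range(len(data))]) for _ in range(B)
    ]
    # The bootstrap SE estimate is the (ddof=1) standard deviation.
    return statistics.stdev(theta_star)

data = [14, 10, 12, 15, 13, 11, 16, 12]
se = bootstrap_se(data, lambda d: sum(d) / len(d))
# Should come out close to the textbook s/sqrt(n) for the mean (~0.7 here)
```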

bootstrap_stat.infinitesimal_jackknife(x, stat, eps=0.001, influence_components=None, return_influence_components=False, num_threads=1)¶
Infinitesimal Jackknife
 Parameters
x (array_like or pandas DataFrame or tuple) – The data. If a tuple, it is interpreted as a multi-sample distribution, as in A/B testing.
stat (function) – The statistic. Should take x and a resampling vector as input. See Notes.
eps (float, optional) – Epsilon for the limit calculation. Defaults to 1e-3.
influence_components (array_like, optional) – Influence components. See Notes.
return_influence_components (boolean, optional) – Specifies whether to return the influence components. Defaults to False.
num_threads (int, optional) – Number of threads to use for multicore processing. Defaults to 1, meaning all calculations will be done in a single thread. Set to -1 to use all available cores.
 Returns
se (float) – The standard error.
influence_components (array_like) – The influence components. Only returned if return_influence_components is True.
Notes
The infinitesimal jackknife requires the statistic to be expressed in “resampling form”. See [ET93, S21] for details. The ith influence component is a type of derivative of the statistic with respect to the ith observation. It is computed by a finite difference method: we evaluate the statistic with a little extra weight (eps) on the ith observation, subtract the statistic evaluated on the original dataset, and divide by eps.
In some cases, there is an analytical formula for the influence components. In these cases, it would be better for the caller to compute the influence components elsewhere and simply pass them to this function.
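As an illustration of the resampling form (a hypothetical sketch, not the library's code), the statistic takes the data together with a weight vector, and each influence component is a finite difference in that weight vector:

```python
import math

def weighted_mean(x, p):
    # A statistic in "resampling form": a function of the data x and weights p.
    return sum(pi * xi for pi, xi in zip(p, x))

def infinitesimal_jackknife_se(x, stat, eps=1e-3):
    n = len(x)
    p0 = [1.0 / n] * n
    t0 = stat(x, p0)
    u = []  # influence components
    for i in range(n):
        # Put a little extra weight eps on observation i...
        p = [(1 - eps) * pj for pj in p0]
        p[i] += eps
        # ...and take the finite-difference derivative.
        u.append((stat(x, p) - t0) / eps)
    # Standard error estimate: sqrt(sum_i u_i^2) / n
    return math.sqrt(sum(ui * ui for ui in u)) / n

se = infinitesimal_jackknife_se([1, 2, 3, 4], weighted_mean)
# For the mean this recovers the plug-in standard error sqrt(var/n) ~ 0.559
```

Because the weighted mean is linear in the weights, the finite difference here is exact.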

bootstrap_stat.t_interval(dist, stat, theta_hat, stabilize_variance=False, se_hat=None, fast_std_err=None, alpha=0.05, Binner=25, Bouter=1000, Bvar=100, size=None, empirical_distribution=<class 'bootstrap_stat.EmpiricalDistribution'>, return_samples=False, theta_star=None, se_star=None, z_star=None, num_threads=1)¶
Bootstrap-t Intervals
 Parameters
dist (EmpiricalDistribution) – The empirical distribution.
stat (function) – The statistic for which we wish to calculate a confidence interval.
theta_hat (float) – The observed statistic.
stabilize_variance (boolean, optional) – If True, use the variance stabilization technique. Defaults to False.
se_hat (float or None, optional) – The standard error of the observed data. If None (default), will be calculated using the non-robust bootstrap estimate of standard error, using the default number of iterations. The user may wish to use a non-default number of bootstrap iterations to calculate this, or a robust variant. If so, the user should calculate it externally and pass it to this function.
fast_std_err (function or None, optional) – To speed this up, the user may specify a fast function for computing the standard error of a bootstrap sample. If not specified, we will use the non-robust bootstrap estimate of standard error, using the default number of iterations. This can also be used to specify a non-default bootstrap methodology, such as a robust version. See Examples.
alpha (float, optional) – Number controlling the size of the interval. That is, this function will return a 100(1 - 2*alpha)% confidence interval. Defaults to 0.05, corresponding to a 90% confidence interval.
Binner (int, optional) – Number of bootstrap samples for calculating standard error. Defaults to 25.
Bouter (int, optional) – Number of bootstrap samples for calculating percentiles. Defaults to 1000.
Bvar (int, optional) – Number of bootstrap samples used to estimate the relationship between the statistic and the standard error for use with variance stabilization. Defaults to 100.
size (int or tuple of ints, optional) – Size to pass for generating samples from the distribution. Defaults to None, indicating the samples will be the same size as the original dataset.
empirical_distribution (class, optional) – Class to be used to generate an empirical distribution. Defaults to the regular EmpiricalDistribution, but any class that implements a sample method will work. For example, the MultiSampleEmpiricalDistribution can be used. This can be used to accommodate more exotic applications of the Bootstrap.
return_samples (boolean, optional) – If True, return the bootstrapped statistic values. Defaults to False.
theta_star (array_like, optional) – Bootstrapped statistic values. Can be passed if they have already been calculated, which will speed this up considerably.
se_star (array_like, optional) – Bootstrapped statistic standard errors. Can be passed if they have already been calculated, which will speed this up considerably.
z_star (array_like, optional) – Bootstrapped pivot values. Can be passed if they have already been calculated, which will speed this up considerably.
num_threads (int, optional) – Number of threads to use for multicore processing. Defaults to 1, meaning all calculations will be done in a single thread. Set to -1 to use all available cores.
 Returns
ci_low, ci_high (float) – Lower and upper bounds on a 100(1 - 2*alpha)% confidence interval on theta.
theta_star (ndarray) – Array of bootstrapped statistic values. Only returned if return_samples is True.
se_star (ndarray) – Array of bootstrapped statistic standard errors. Only returned if return_samples is True.
z_star (ndarray) – Array of bootstrapped pivot values. Only returned if return_samples is True and stabilize_variance is False.
Examples
>>> x = np.random.randn(100)
>>> dist = EmpiricalDistribution(x)
>>> def statistic(x): return np.mean(x)
>>> theta_hat = statistic(x)
>>> ci_low, ci_high = t_interval(dist, statistic, theta_hat)

>>> se_hat = standard_error(dist, statistic, robustness=0.95, B=2000)
>>> ci_low, ci_high = t_interval(dist, statistic, theta_hat, se_hat=se_hat)

>>> def fast_std_err(x): return np.sqrt(np.var(x, ddof=1) / len(x))
>>> ci_low, ci_high = t_interval(dist, statistic, theta_hat,
...     fast_std_err=fast_std_err)

>>> def fast_std_err(x):
...     dist = EmpiricalDistribution(x)
...     return standard_error(dist, statistic, robustness=0.95, B=2000)
>>> ci_low, ci_high = t_interval(dist, statistic, theta_hat,
...     fast_std_err=fast_std_err)

bootstrap_stat.percentile_interval(dist, stat, alpha=0.05, B=1000, size=None, return_samples=False, theta_star=None, num_threads=1)¶
Percentile Intervals
 Parameters
dist (EmpiricalDistribution) – The empirical distribution.
stat (function) – The statistic.
alpha (float, optional) – Number controlling the size of the interval. That is, this function will return a 100(1 - 2*alpha)% confidence interval. Defaults to 0.05.
B (int, optional) – Number of bootstrap samples. Defaults to 1000.
size (int or tuple of ints, optional) – Size to pass for generating samples from the distribution. Defaults to None, indicating the samples will be the same size as the original dataset.
return_samples (boolean, optional) – If True, return the bootstrapped statistic values. Defaults to False.
theta_star (array_like, optional) – Bootstrapped statistic values. Can be passed if they have already been calculated, which will speed this up considerably.
num_threads (int, optional) – Number of threads to use for multicore processing. Defaults to 1, meaning all calculations will be done in a single thread. Set to -1 to use all available cores.
 Returns
ci_low, ci_high (float) – Lower and upper bounds on a 100(1 - 2*alpha)% confidence interval on theta.
theta_star (ndarray) – Array of bootstrapped statistic values. Only returned if return_samples is True.
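The underlying computation is just reading empirical percentiles off the sorted bootstrap values. A minimal sketch of that final step (assuming the bootstrapped values theta_star are already in hand; implementations differ in their exact indexing and interpolation conventions):

```python
def percentile_ci(theta_star, alpha=0.05):
    # Endpoints are the empirical alpha and (1 - alpha) percentiles of
    # the bootstrapped statistic values.
    ts = sorted(theta_star)
    B = len(ts)
    return ts[int(alpha * (B - 1))], ts[int((1 - alpha) * (B - 1))]

theta_star = list(range(1, 1001))  # stand-in for 1000 bootstrapped values
ci_low, ci_high = percentile_ci(theta_star)
# → (50, 950): a 90% interval from alpha = 0.05
```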

bootstrap_stat.bcanon_interval(dist, stat, x, alpha=0.05, B=1000, size=None, return_samples=False, theta_star=None, theta_hat=None, jv=None, num_threads=1)¶
BCa Confidence Intervals
 Parameters
dist (EmpiricalDistribution) – The empirical distribution.
stat (function) – The statistic.
x (array_like or pandas DataFrame or tuple) – The data, used to evaluate the observed statistic and compute jackknife values.
alpha (float, optional) – Number controlling the size of the interval. That is, this function will return a 100(1 - 2*alpha)% confidence interval. Defaults to 0.05.
B (int, optional) – Number of bootstrap samples. Defaults to 1000.
size (int or tuple of ints, optional) – Size to pass for generating samples from the distribution. Defaults to None, indicating the samples will be the same size as the original dataset.
return_samples (boolean, optional) – If True, return the bootstrapped statistic values. Defaults to False.
theta_star (array_like, optional) – Bootstrapped statistic values. Can be passed if they have already been calculated, which will speed this up considerably.
theta_hat (float, optional) – Observed statistic. Can be passed if it has already been calculated, which will speed this up slightly.
jv (array_like, optional) – Jackknife values. Can be passed if they have already been calculated, which will speed this up considerably.
num_threads (int, optional) – Number of threads to use for multicore processing. Defaults to 1, meaning all calculations will be done in a single thread. Set to -1 to use all available cores.
 Returns
ci_low, ci_high (float) – Lower and upper bounds on a 100(1 - 2*alpha)% confidence interval on theta.
theta_star (ndarray) – Array of bootstrapped statistic values. Only returned if return_samples is True.
jv (ndarray) – Jackknife values. Only returned if return_samples is True.
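For intuition, BCa shifts the percentile endpoints using a bias-correction term \(z_0\) and an acceleration \(a\) estimated from the jackknife values. A from-scratch sketch of those corrections, following [ET93, Ch. 14] (assuming theta_star, theta_hat, and jv are already computed; not the library's code):

```python
from statistics import NormalDist

def bca_ci(theta_star, theta_hat, jv, alpha=0.05):
    nd = NormalDist()
    ts = sorted(theta_star)
    B = len(ts)
    # Bias correction: where theta_hat falls in the bootstrap distribution.
    z0 = nd.inv_cdf(sum(t < theta_hat for t in ts) / B)
    # Acceleration, from the skewness of the jackknife values.
    jbar = sum(jv) / len(jv)
    a = sum((jbar - j) ** 3 for j in jv) / (
        6.0 * sum((jbar - j) ** 2 for j in jv) ** 1.5
    )
    def endpoint(z_alpha):
        # Corrected percentile level, then read off the sorted values.
        adj = nd.cdf(z0 + (z0 + z_alpha) / (1 - a * (z0 + z_alpha)))
        return ts[min(B - 1, int(adj * (B - 1)))]
    return endpoint(nd.inv_cdf(alpha)), endpoint(nd.inv_cdf(1 - alpha))

# With a symmetric bootstrap distribution and symmetric jackknife values,
# z0 and a vanish and BCa reduces to the plain percentile interval.
ci = bca_ci(list(range(1, 1001)), 500.5, [1.0, 2.0, 3.0, 4.0])
```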

bootstrap_stat.abcnon_interval(x, stat, alpha=0.05, eps=0.001, influence_components=None, second_derivatives=None, return_influence_components=False, num_threads=1)¶
ABC Confidence Intervals
 Parameters
x (array_like or pandas DataFrame) – The data, used to evaluate the observed statistic and compute influence components.
stat (function) – The statistic. Should take x and a resampling vector as input. See Notes.
alpha (float or array of floats, optional) – Number controlling the size of the interval. That is, this function will return a 100(1 - 2*alpha)% confidence interval. Defaults to 0.05. Alternatively, the user can pass an array of floats in (0, 1). In that case, the return value will be a tuple of confidence points. See Notes.
influence_components (array_like, optional) – Influence components. See Notes.
second_derivatives (array_like, optional) – Vector of second derivatives. See Notes.
return_influence_components (boolean, optional) – Specifies whether to return the influence components and second derivatives. Defaults to False.
num_threads (int, optional) – Number of threads to use for multicore processing. Defaults to 1, meaning all calculations will be done in a single thread. Set to -1 to use all available cores.
 Returns
ci_low, ci_high (float) – Lower and upper bounds on a 100(1 - 2*alpha)% confidence interval on theta.
influence_components (array_like) – Vector of influence components. Only returned if return_influence_components is True.
second_derivatives (array_like) – Vector of second_derivatives. Only returned if return_influence_components is True.
Notes
Approximate Bootstrap Confidence (ABC) intervals require the statistic to be expressed in resampling form. See [ET93, S14.4] for details. It only applies to statistics which are smooth functions of the data. A notable example where ABC does not apply is the sample median.
ABC works in terms of the derivatives and second derivatives of the statistic with respect to the data. In some cases, analytical forms are possible. In that case, the caller may wish to calculate them externally to this function and pass them in. The default behavior is to calculate them using a finite difference method.
When we want to compute multiple confidence points, we can reuse many of the calculations. For example, to compute both a 90% and a 99% confidence interval, we would specify alpha = [0.005, 0.05, 0.95, 0.995] and read off the appropriate return values. This is much faster than multiple calls to this function: the marginal cost is one call to stat for each additional point specified. This in turn facilitates using these intervals for achieved significance levels, effectively by inverting the interval having endpoint 0. See the ASL functions for how this might be done.

bootstrap_stat.calibrate_interval(dist, stat, x, theta_hat, alpha=0.05, B=1000, return_confidence_points=False, num_threads=1)¶
Calibrated confidence interval
 Parameters
dist (EmpiricalDistribution) – The empirical distribution.
stat (function) – The statistic.
x (array_like or pandas DataFrame or tuple) – The data.
theta_hat (float) – Observed statistic.
alpha (float, optional) – Number controlling the size of the interval. That is, this function will return a 100(1 - 2*alpha)% confidence interval. Defaults to 0.05.
B (int, optional) – Number of bootstrap samples. Defaults to 1000.
return_confidence_points (boolean, optional) – If True, returns the estimated confidence points having the desired coverage. Defaults to False.
num_threads (int, optional) – Number of threads to use for multicore processing. Defaults to 1, meaning all calculations will be done in a single thread. Set to -1 to use all available cores.
 Returns
ci_low, ci_high (float) – Lower and upper bounds on a 100(1 - 2*alpha)% confidence interval on theta.
lmbda_low, lmbda_high (float) – The estimated confidence points having the desired coverage. For example, if the interval has the nominal coverage, then lmbda_low would be alpha and lmbda_high would be 1 - alpha. Only returned if return_confidence_points is True.
Notes
While we can in principle calibrate any type of confidence interval, in most instances that results in a “double bootstrap”. For this reason, only ABC intervals are currently supported. Moreover, from my limited experience calibration seems pretty finicky. I would consider this function to be illustrative but experimental. See [ET93, S18] for details.
We compute the observed coverage for a range of points around the nominal alpha and 1 - alpha. Then we fit a smoother (loess) to these data and invert to find the confidence point, lmbda, having the desired coverage.

bootstrap_stat.jackknife_values(x, stat, sample=None, num_threads=1)¶
Compute jackknife values.
 Parameters
x (array_like or pandas DataFrame or tuple of arrays/DataFrames) – The data.
stat (function) – The statistic.
sample (int, optional) – When jackknifing a multi-sample distribution, as in an A/B test, we generate one set of jackknife values for each sample. The caller should specify the sample for which jackknife values should be generated, calling this function once per sample.
num_threads (int, optional) – Number of threads to use for multicore processing. Defaults to 1, meaning all calculations will be done in a single thread. Set to -1 to use all available cores.
 Returns
jv – The jackknife values.
 Return type
ndarray
Notes
The jackknife values consist of the statistic applied to a collection of datasets derived from the original by holding out each observation in turn. For example, let x1 be the dataset corresponding to x, but with the first datapoint removed. The first jackknife value is simply stat(x1).
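Concretely, for a list-like dataset the jackknife values can be sketched as follows (a toy version; the library also handles DataFrames and multi-sample tuples):

```python
def jackknife_values(x, stat):
    # One value per observation: the statistic applied to the dataset
    # with that observation held out.
    return [stat(x[:i] + x[i + 1:]) for i in range(len(x))]

mean = lambda d: sum(d) / len(d)
jv = jackknife_values([1, 2, 3, 4], mean)
# → [3.0, 8/3, 7/3, 2.0]: e.g. the first value is mean([2, 3, 4])
```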

bootstrap_stat.bias(dist, stat, t, B=200, return_samples=False, theta_star=None, num_threads=1)¶
Estimate of bias
 Parameters
dist (EmpiricalDistribution) – The empirical distribution.
stat (function) – The statistic for which we wish to calculate the bias.
t (function) – Function to be applied to empirical distribution function.
B (int, optional) – Number of bootstrap samples. Defaults to 200.
return_samples (boolean, optional) – If True, return the bootstrapped statistic values. Defaults to False.
theta_star (array_like, optional) – Bootstrapped statistic values. Can be passed if they have already been calculated, which will speed this up considerably.
num_threads (int, optional) – Number of threads to use for multicore processing. Defaults to 1, meaning all calculations will be done in a single thread. Set to -1 to use all available cores.
 Returns
bias (float) – Estimate of bias.
theta_star (ndarray) – Array of bootstrapped statistic values. Only returned if return_samples is True.
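The underlying estimate is simply the average of the bootstrapped values minus the plug-in parameter \(t(\hat{F})\). A self-contained sketch (not the library's implementation), using the plug-in variance, a statistic with known negative bias:

```python
import random

def bootstrap_bias(data, stat, t, B=200, seed=0):
    rng = random.Random(seed)
    n = len(data)
    theta_star = [stat([rng.choice(data) for _ in range(n)]) for _ in range(B)]
    # bias_hat = E*[stat(bootstrap sample)] - t(F_hat)
    return sum(theta_star) / B - t(data)

def plugin_var(d):
    # Plug-in (ddof=0) variance: a downward-biased estimator of the variance.
    m = sum(d) / len(d)
    return sum((xi - m) ** 2 for xi in d) / len(d)

b = bootstrap_bias([2, 4, 4, 4, 5, 5, 7, 9], plugin_var, plugin_var)
# Expect a negative estimate, roughly -var/n = -0.5 for this dataset
```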

bootstrap_stat.better_bootstrap_bias(x, stat, B=400, return_samples=False, num_threads=1)¶
Better bootstrap bias.
 Parameters
x (array_like or pandas DataFrame) – The data.
stat (function) – The statistic. Should take x and a resampling vector as input. See Notes.
B (int, optional) – Number of bootstrap samples. Defaults to 400.
return_samples (boolean, optional) – If True, return the bootstrapped statistic values. Defaults to False.
num_threads (int, optional) – Number of threads to use for multicore processing. Defaults to 1, meaning all calculations will be done in a single thread. Set to -1 to use all available cores.
 Returns
bias (float) – Estimate of bias.
theta_star (ndarray) – Array of bootstrapped statistic values. Only returned if return_samples is True.
Notes
The “better” bootstrap bias estimate is only applicable when stat is the plug-in statistic for the parameter being estimated, that is, one having the form \(t(\hat{F})\), where \(\hat{F}\) is the empirical distribution. Notable situations where this assumption does not hold include robust statistics such as the alpha-trimmed mean, which is not the plug-in statistic for the mean. In cases like that, just use the “worse” bias function. The advantage of the “better” bootstrap bias estimate is faster convergence: whereas B = 400 is typically adequate here, it can take thousands of bootstrap samples for the “worse” bias function to give an accurate estimate.
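In resampling-vector form, each bootstrap sample is summarized by its vector of resampling proportions \(P^*\), and the “better” estimate evaluates the statistic at the average resampling vector rather than at the original uniform weights. A sketch under those assumptions (not the library's code):

```python
import random

def better_bootstrap_bias(x, stat, B=400, seed=0):
    """stat takes the data and a resampling (proportion) vector."""
    rng = random.Random(seed)
    n = len(x)
    p_bar = [0.0] * n
    theta_sum = 0.0
    for _ in range(B):
        counts = [0] * n
        for _ in range(n):
            counts[rng.randrange(n)] += 1
        p = [c / n for c in counts]  # resampling vector P*
        theta_sum += stat(x, p)
        p_bar = [s + pi / B for s, pi in zip(p_bar, p)]
    # "Better" bias: average bootstrapped value minus stat at P-bar.
    return theta_sum / B - stat(x, p_bar)

def plugin_var(x, p):
    m = sum(pi * xi for pi, xi in zip(p, x))
    return sum(pi * (xi - m) ** 2 for pi, xi in zip(p, x))

b = better_bootstrap_bias([2, 4, 4, 4, 5, 5, 7, 9], plugin_var)
# Converges quickly toward roughly -var/n = -0.5 for this dataset
```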

bootstrap_stat.jackknife_bias(x, stat, return_samples=False, jv=None, num_threads=1)¶
Jackknife estimate of bias.
 Parameters
x (array_like or pandas DataFrame) – The data.
stat (function) – The statistic.
return_samples (boolean, optional) – If True, return the jackknife values. Defaults to False.
jv (array_like, optional) – Jackknife values. Can be passed if they have already been calculated, which will speed this up considerably.
num_threads (int, optional) – Number of threads to use for multicore processing. Defaults to 1, meaning all calculations will be done in a single thread. Set to -1 to use all available cores.
 Returns
bias (float) – Estimate of bias.
jv (ndarray) – Jackknife values. Only returned if return_samples is True.
Notes
The jackknife estimate of bias is only applicable when stat is a plug-in statistic, that is, one having the form \(t(\hat{F})\), where \(\hat{F}\) is the empirical distribution. Moreover, it is only applicable when t is a smooth function; a notable exception is the median. The jackknife cannot be used to estimate the bias of non-smooth estimators. See [ET93, S10.5] for details.

bootstrap_stat.bias_corrected(x, stat, method='better_bootstrap_bias', dist=None, t=None, B=None, return_samples=False, theta_star=None, jv=None, num_threads=1)¶
Bias-corrected estimator.
 Parameters
x (array_like or pandas DataFrame) – The data.
stat (function) – The statistic. For use with “bias” and “jackknife” methods, should take a dataset as the input. For use with the “better_bootstrap_bias” method, it should take the dataset and a resampling vector as input. See the documentation in the better_bootstrap_bias function for details.
method (["better_bootstrap_bias", "bias", "jackknife"]) – The method by which we correct for bias. Defaults to “better_bootstrap_bias”.
dist (EmpiricalDistribution) – The empirical distribution. Required when method == “bias”.
t (function) – Function to be applied to empirical distribution function. Required when method == “bias”.
B (int, optional) – Number of bootstrap samples. Required when method == “bias” or “better_bootstrap_bias”. Defaults to 400 when method == “better_bootstrap_bias” or 4000 when method == “bias”.
return_samples (boolean, optional) – If True, return the bootstrapped statistic or jackknife values. Defaults to False.
theta_star (array_like, optional) – Bootstrapped statistic values. Can be passed if they have already been calculated, which will speed this up considerably. Only used when method == “bias”.
jv (array_like, optional) – Jackknife values. Can be passed if they have already been calculated, which will speed this up considerably. Only used when method == “jackknife”.
num_threads (int, optional) – Number of threads to use for multicore processing. Defaults to 1, meaning all calculations will be done in a single thread. Set to -1 to use all available cores.
 Returns
theta_bar (float) – The biascorrected estimator.
theta_star (ndarray) – Array of bootstrapped statistic values. Only returned if return_samples is True and method is either “bias” or “better_bootstrap_bias”.
jv (ndarray) – Jackknife values. Only returned if return_samples is True and method == “jackknife”.
Notes
Per [ET93, S10.6], bias-corrected estimators tend to have much higher variance than their uncorrected counterparts. This should be assessed, for example, by using the bootstrap to directly estimate the standard errors of the corrected and uncorrected estimators.

bootstrap_stat.bootstrap_asl(dist, stat, x, B=1000, size=None, return_samples=False, theta_star=None, theta_hat=None, two_sided=False, num_threads=1)¶
Achieved Significance Level, general bootstrap method
 Parameters
dist (EmpiricalDistribution) – The empirical distribution.
stat (function) – The test statistic.
x (array_like or pandas DataFrame or tuple) – The data. It isn’t used for anything, so the caller can pass None if the data are not available; it is here for consistency across the ASL routines.
B (int, optional) – Number of bootstrap samples. Defaults to 1000.
size (int or tuple of ints, optional) – Size to pass for generating samples from the alternative distribution. Defaults to None.
return_samples (boolean, optional) – If True, return the bootstrapped statistic or jackknife values. Defaults to False.
theta_star (array_like, optional) – Bootstrapped statistic values. Can be passed if they have already been calculated, which will speed this up considerably.
theta_hat (float, optional) – Observed statistic. Can be passed if it has already been calculated, which will speed this up slightly.
two_sided (boolean, optional) – If True, computes a twosided significance value. If False (default), only a onesided value is returned. Support for twosided tests is experimental. Use with caution!
num_threads (int, optional) – Number of threads to use for multicore processing. Defaults to 1, meaning all calculations will be done in a single thread. Set to -1 to use all available cores.
 Returns
asl (float) – Achieved significance level, the probability of an outcome at least as extreme as that actually observed under the null hypothesis; a.k.a. the p-value.
theta_star (ndarray) – Array of bootstrapped statistic values. Only returned if return_samples is True.
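The final step is just a tail count. A minimal sketch of that one-sided computation (assuming theta_star has already been sampled under the null hypothesis):

```python
def achieved_significance_level(theta_star, theta_hat):
    # One-sided ASL: fraction of under-the-null bootstrap values at least
    # as extreme as the observed statistic.
    return sum(t >= theta_hat for t in theta_star) / len(theta_star)

theta_star = [0.1 * k for k in range(100)]  # stand-in for null draws
asl = achieved_significance_level(theta_star, 9.5)
# → 0.05: 5 of the 100 values are >= 9.5
```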

bootstrap_stat.percentile_asl(dist, stat, x, theta_0=0, B=1000, size=None, return_samples=False, theta_star=None, theta_hat=None, two_sided=False, num_threads=1)¶
Achieved Significance Level, percentile method
 Parameters
dist (EmpiricalDistribution) – The empirical distribution.
stat (function) – The test statistic.
x (array_like or pandas DataFrame or tuple) – The data, used to calculate the observed value of the statistic if theta_hat is not passed.
theta_0 (float, optional) – The mean of the test statistic under the null hypothesis. Defaults to 0.
B (int, optional) – Number of bootstrap samples. Defaults to 1000.
size (int or tuple of ints, optional) – Size to pass for generating samples from the alternative distribution. Defaults to None.
return_samples (boolean, optional) – If True, return the bootstrapped statistic or jackknife values. Defaults to False.
theta_star (array_like, optional) – Bootstrapped statistic values. Can be passed if they have already been calculated, which will speed this up considerably.
theta_hat (float, optional) – Observed statistic. Can be passed if it has already been calculated, which will speed this up slightly.
two_sided (boolean, optional) – If True, computes a twosided significance value. If False (default), only a onesided value is returned. Support for twosided tests is experimental. Use with caution!
num_threads (int, optional) – Number of threads to use for multicore processing. Defaults to 1, meaning all calculations will be done in a single thread. Set to -1 to use all available cores.
 Returns
asl (float) – Achieved significance level, the probability of an outcome at least as extreme as that actually observed under the null hypothesis; a.k.a. the p-value.
theta_star (ndarray) – Array of bootstrapped statistic values. Only returned if return_samples is True.
Notes
Under the null hypothesis, the value of the statistic is theta_0. Suppose theta_hat > theta_0. Let theta_lo, theta_hi be the endpoints of a 100(1 - alpha)% confidence interval on theta. Suppose alpha is such that theta_lo = theta_0. Then alpha is the achieved significance level.
For the percentile interval, this is simply the fraction of bootstrap samples that are “on the other side” of theta_0 from theta_hat.
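As a rough sketch (plain NumPy, not the library's internals), that counting rule reduces to a one-liner once the bootstrap replications are in hand; the function name here is a stand-in for illustration only:

```python
import numpy as np

def percentile_asl_sketch(theta_star, theta_hat, theta_0=0.0):
    """One-sided percentile ASL: the fraction of bootstrap replications
    on the other side of theta_0 from the observed statistic."""
    theta_star = np.asarray(theta_star)
    if theta_hat >= theta_0:
        return np.mean(theta_star <= theta_0)
    return np.mean(theta_star >= theta_0)

# Observed statistic 2.0; two of ten replications fall at or below the
# null value 0, so the one-sided ASL is 0.2.
theta_star = np.array([-0.5, 0.1, 1.2, 1.8, 2.0, 2.1, 2.5, 3.0, -0.2, 0.9])
asl = percentile_asl_sketch(theta_star, theta_hat=2.0, theta_0=0.0)
```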

bootstrap_stat.
bcanon_asl
(dist, stat, x, theta_0=0, B=1000, size=None, return_samples=False, theta_star=None, theta_hat=None, jv=None, two_sided=False, num_threads=1)¶ Achieved Significance Level, bcanon method
 Parameters
dist (EmpiricalDistribution) – The empirical distribution.
stat (function) – The test statistic.
x (array_like or pandas DataFrame or tuple) – The data, used to evaluate the observed statistic and compute jackknife values.
theta_0 (float, optional) – The mean of the test statistic under the null hypothesis. Defaults to 0.
B (int, optional) – Number of bootstrap samples.
size (int or tuple of ints, optional) – Size to pass for generating samples from the alternative distribution. Defaults to None.
return_samples (boolean, optional) – If True, return the bootstrapped statistic or jackknife values. Defaults to False.
theta_star (array_like, optional) – Bootstrapped statistic values. Can be passed if they have already been calculated, which will speed this up considerably.
theta_hat (float, optional) – Observed statistic. Can be passed if it has already been calculated, which will speed this up slightly.
jv (array_like, optional) – Jackknife values. Can be passed if they have already been calculated, which will speed this up considerably.
two_sided (boolean, optional) – If True, computes a two-sided significance value. If False (default), only a one-sided value is returned. Support for two-sided tests is experimental. Use with caution!
num_threads (int, optional) – Number of threads to use for multicore processing. Defaults to 1, meaning all calculations will be done in a single thread. Set to -1 to use all available cores.
 Returns
asl (float) – Achieved significance level, the probability of an outcome at least as extreme as that actually observed under the null hypothesis; aka the p-value.
theta_star (ndarray) – Array of bootstrapped statistic values. Only returned if return_samples is True.
jv (ndarray) – Jackknife values. Only returned if return_samples is True.

bootstrap_stat.
bootstrap_power
(alt_dist, null_dist, stat, asl=<function bootstrap_asl>, alpha=0.05, size=None, P=100, **kwargs)¶ Bootstrap Power
 Parameters
alt_dist (EmpiricalDistribution) – Distribution under the alternative hypothesis.
null_dist (class) – Class corresponding to the null distribution. See Notes.
stat (function) – Function that computes the test statistic.
asl (function, optional) – Function that computes an achieved significance level. Defaults to bootstrap_asl.
alpha (float, optional) – Desired Type I error rate. Defaults to 0.05.
size (int or tuple of ints, optional) – Size to pass for generating samples from the alternative distribution. Defaults to None.
P (int, optional) – Number of Monte Carlo simulations to run for the purposes of calculating power. Defaults to 100.
kwargs (optional) – Other keyword arguments to pass to asl, such as the number of bootstrap samples to use.
 Returns
pwr – The fraction of Monte Carlo simulations in which the null hypothesis was rejected.
 Return type
float
Notes
Perhaps the most confusing aspect of this function is that two distributions are passed as input, and they take different forms. The alt_dist should be passed as an instance of EmpiricalDistribution or a subclass thereof; we use it to generate samples from the alternative distribution. We then need to build an EmpiricalDistribution from each such sample, for which we need the class corresponding to the null distribution, not an instance thereof. I recognize this is confusing!
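Setting the instance-vs-class wrinkle aside, the power calculation itself is a plain Monte Carlo loop. The sketch below uses toy stand-ins (a normal alternative and a z-test p-value, not the library's API) to show the structure: draw P datasets from the alternative, compute an ASL for each against the null, and report the rejection fraction.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

def monte_carlo_power(sample_alt, asl_fn, alpha=0.05, P=200):
    """Draw P datasets from the alternative, compute an achieved
    significance level for each, and count rejections at level alpha."""
    rejections = sum(asl_fn(sample_alt()) < alpha for _ in range(P))
    return rejections / P

# Toy stand-ins: the alternative is N(1, 1), and the "ASL" is a
# one-sided z-test p-value for mean > 0 with known sd = 1.
n = 30
sample_alt = lambda: rng.normal(1.0, 1.0, size=n)
asl_fn = lambda x: 0.5 * math.erfc(x.mean() * math.sqrt(n) / math.sqrt(2))

pwr = monte_carlo_power(sample_alt, asl_fn)
```

With a true mean a full standard deviation above the null, the test rejects almost every simulated dataset, so the estimated power is close to 1.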

bootstrap_stat.
prediction_error_optimism
(dist, data, train, predict, error, B=200, apparent_error=None, num_threads=1)¶ Prediction Error, Optimism Method
 Parameters
dist (EmpiricalDistribution) – Empirical distribution.
data (array_like or pandas DataFrame) – The data.
train (function) – Function which takes as input a dataset sampled from the empirical distribution and returns a fitted model.
predict (function) – Function which takes as input a fitted model and a dataset, and returns the predicted labels for that dataset.
error (function) – Function which takes as input a fitted model and a dataset, and returns the mean prediction error on that dataset.
B (int, optional) – Number of bootstrap samples. Defaults to 200.
apparent_error (float, optional) – The prediction error of the model on the dataset used to train the model, also known as the training error. If omitted, will be calculated. Can be passed to this function to save time, for example if the model had already been fit elsewhere.
num_threads (int, optional) – Number of threads to use for multicore processing. Defaults to 1, meaning all calculations will be done in a single thread. Set to -1 to use all available cores.
 Returns
pe – Prediction error.
 Return type
float
Notes
The bootstrap estimate of prediction error can be used for model selection. It is similar to cross validation. It adds a bias correction term to the apparent error (the accuracy of the predictor applied to the same dataset used to train the predictor). This bias correction term is called the optimism. See [ET93, S17] for details.
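A minimal sketch of the optimism correction, using a deliberately trivial model (predict the training mean, squared-error loss) as a stand-in for the train/predict/error arguments; none of these names come from the library:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the train/predict/error arguments.
train = lambda d: d.mean()
predict = lambda model, d: np.full(len(d), model)
error = lambda model, d: np.mean((d - predict(model, d)) ** 2)

data = rng.normal(size=50)
apparent_error = error(train(data), data)

B = 200
optimism = np.empty(B)
for b in range(B):
    boot = rng.choice(data, size=len(data), replace=True)
    model = train(boot)
    # Optimism: how much better the model looks on the data it was
    # trained on (the bootstrap sample) than on the original data.
    optimism[b] = error(model, data) - error(model, boot)

# Bias-corrected prediction error: apparent error plus mean optimism.
pe = apparent_error + optimism.mean()
```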

bootstrap_stat.
prediction_error_632
(dist, data, train, predict, error, B=200, apparent_error=None, use_632_plus=False, gamma=None, no_inf_err_rate=None, num_threads=1)¶ .632 Bootstrap
 Parameters
dist (EmpiricalDistribution) – Empirical distribution.
data (array_like or pandas DataFrame) – The data.
train (function) – Function which takes as input a dataset sampled from the empirical distribution and returns a fitted model.
predict (function) – Function which takes as input a fitted model and a dataset, and returns the predicted labels for that dataset.
error (function) – Function which takes as input a fitted model and a dataset, and returns the prediction error for each observation in that dataset.
B (int, optional) – Number of bootstrap samples. Defaults to 200.
apparent_error (float, optional) – The prediction error of the model on the dataset used to train the model, also known as the training error. If omitted, will be calculated. Can be passed to this function to save time, for example if the model had already been fit elsewhere.
use_632_plus (boolean, optional) – If True, uses the .632+ bootstrap. See Notes.
num_threads (int, optional) – Number of threads to use for multicore processing. Defaults to 1, meaning all calculations will be done in a single thread. Set to -1 to use all available cores.
 Returns
pe – Prediction error.
 Return type
float
Notes
The .632 bootstrap estimate of prediction error is the weighted average of the apparent error and another term called eps0. The latter term is kind of like cross validation: we generate a bootstrap sample, train a model on it, and then make predictions using that model on the original dataset. But we only care about the predictions on observations that are not part of the bootstrap sample. We then average those prediction errors across all bootstrap samples. See [ET93, S17.7] for details.
The method is so called because the estimated prediction error is .368 times the apparent error plus .632 times eps0. [ET93] reports that this method performed better than leave-one-out cross validation in their simulations, having lower variance, but they themselves admit they had not thoroughly evaluated it. [Koh95] reported that the .632 bootstrap performed quite poorly when overfitting is present. This led Efron and Tibshirani to propose the .632+ bootstrap in [ET97]. Finally, [Arl10] surveys various approaches to model selection, recommending 10-fold cross validation as the preferred method of model selection.
Because the .632 bootstrap has apparently not withstood the test of time, I have made no attempt to implement it efficiently, instead preferring a straightforward, easy-to-follow approach. This function should really only serve pedagogical purposes: it is not recommended for serious applications!
The no information error rate looks at the prediction at point j, and computes the error for every label y_i. It averages the errors over all i and j. This is because, if we assume the features offer no insight into the labels, any observation is as good as any other at predicting any particular label.
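For squared-error loss, that all-pairs average is one vectorized expression; the arrays below are made-up toy values, not anything from the library:

```python
import numpy as np

# No-information error rate gamma for squared error: average the loss
# of every prediction pred_j against every label y_i (all n^2 pairs).
y = np.array([1.0, 2.0, 4.0])
pred = np.array([1.5, 2.5, 3.0])  # model predictions at each point

gamma = np.mean((y[:, None] - pred[None, :]) ** 2)
```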

bootstrap_stat.
prediction_interval
(dist, x, mean=None, std=None, B=1000, alpha=0.05, t_star=None, return_t_star=False, num_threads=1)¶ Prediction interval
 Parameters
dist (EmpiricalDistribution) – The empirical distribution.
x (array_like or pandas DataFrame) – The data.
mean (function, optional) – A function returning the mean of a bootstrap sample. Defaults to np.mean, but this only works for arrays, not DataFrames. To emphasize, this function must return a float!
std (function, optional) – A function returning the standard deviation of a bootstrap sample. Defaults to np.std using ddof=1. As with mean, specify something different for DataFrames!
B (int, optional) – Number of bootstrap samples. Defaults to 1000.
alpha (float, optional) – Number controlling the size of the interval. That is, this function will return a 100(1 - 2 * alpha)% prediction interval. Defaults to 0.05.
t_star (array_like or None) – Array of studentized values, used to calculate the interval. Can be passed to this function to speed it up, for example when calculating multiple intervals.
return_t_star (boolean, optional) – If True, return the studentized values. (Sometimes it is helpful to plot these.)
num_threads (int, optional) – Number of threads to use for multicore processing. Defaults to 1, meaning all calculations will be done in a single thread. Set to -1 to use all available cores.
 Returns
pred_low, pred_high (float) – A 100(1 - 2 * alpha)% prediction interval on a point sampled from F.
t_star (array) – Array of studentized values. Returned only if return_t_star is True.
Notes
Suppose we observe \(X_1, X_2, \ldots, X_n\) sampled IID from a distribution \(F\). We wish to calculate a range of plausible values for a new point drawn from the same distribution. This function returns such a prediction interval.
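One way such an interval can be built from studentized bootstrap values is sketched below; this is a plausible studentization for illustration, and the library's exact scheme may differ:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(10.0, 2.0, size=40)
n, B, alpha = len(x), 1000, 0.05

# For each bootstrap sample, draw a stand-in "new" observation and
# studentize its deviation from the bootstrap mean.
t_star = np.empty(B)
for b in range(B):
    boot = rng.choice(x, size=n, replace=True)
    x0 = rng.choice(x)
    t_star[b] = (x0 - boot.mean()) / (boot.std(ddof=1) * np.sqrt(1 + 1 / n))

# Translate the t_star quantiles back to the scale of the data.
scale = x.std(ddof=1) * np.sqrt(1 + 1 / n)
lo, hi = np.quantile(t_star, [alpha, 1 - alpha])
pred_low, pred_high = x.mean() + lo * scale, x.mean() + hi * scale
```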

bootstrap_stat.
multithreaded_bootstrap_samples
(dist, stat, B, size=None, jackknife=False, num_threads=1)¶ Generate bootstrap samples in parallel.
 Parameters
dist (EmpiricalDistribution) – The empirical distribution.
stat (function or dict_like) – The statistic or statistics for which we wish to calculate the standard error. If you want to compute different statistics on the same bootstrap samples, specify as a dictionary having functions as values.
B (int) – Number of bootstrap samples.
size (int or tuple of ints, optional) – Size to pass for generating samples. Defaults to None.
num_threads (int, optional) – Number of threads to use. Defaults to 1. Set to -1 to use all available cores.
 Returns
theta_star – Array of bootstrapped statistic values. When multiple statistics are being calculated on the same bootstrap samples, the return value will be a dictionary having keys the same as stat and values an ndarray for each statistic.
 Return type
ndarray or dictionary
Notes
Jackknifing is not currently supported.

bootstrap_stat.
bootstrap_samples
(dist, stat, B, size=None, jackknife=False, num_threads=1)¶ Generate bootstrap samples.
 Parameters
dist (EmpiricalDistribution) – The empirical distribution.
stat (function or dict_like) – The statistic or statistics for which we wish to calculate the standard error. If you want to compute different statistics on the same bootstrap samples, specify as a dictionary having functions as values.
B (int) – Number of bootstrap samples.
size (int or tuple of ints, optional) – Size to pass for generating samples. Defaults to None.
jackknife (boolean, optional) – If True, returns an array of jackknife bootstrap statistics (see the jackknife_array return value). Defaults to False.
num_threads (int, optional) – Number of threads to use for multicore processing. Defaults to 1, meaning all calculations will be done in a single thread. Set to -1 to use all available cores.
 Returns
theta_star (ndarray or dictionary) – Array of bootstrapped statistic values. When multiple statistics are being calculated on the same bootstrap samples, the return value will be a dictionary having keys the same as stat and values an ndarray for each statistic.
jackknife_array (ndarray) – Array of n arrays or dicts, where n is the number of elements of the original dataset upon which the empirical distribution is based. Each element of this array is in turn either an array or dict according to stat. If stat is just a single function, it will be an array, otherwise a dict. The ith element of the inner array will be those bootstrap statistics corresponding to samples not including the ith value from the original dataset. (For jackknifing, only samples not including the ith datapoint can be used for inferences involving the ith point.) See [ET93, S19.4] for details. Only returned if jackknife is True.
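The dict-valued behavior for stat can be sketched in plain NumPy as follows; the loop below just mirrors the idea of evaluating every statistic on the same resamples (names are stand-ins, not the library's internals):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=100)
B = 500

# Multiple statistics as a dict of functions, evaluated on the SAME
# B bootstrap resamples, so the replications are comparable.
stats = {"mean": np.mean, "median": np.median}
theta_star = {name: np.empty(B) for name in stats}
for b in range(B):
    boot = rng.choice(data, size=len(data), replace=True)
    for name, fn in stats.items():
        theta_star[name][b] = fn(boot)
```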

bootstrap_stat.
loess
(z0, z, y, alpha, sided='both')¶ Locally estimated scatterplot smoothing
 Parameters
z0 (float) – Test point
z (array_like) – Endogenous variables.
y (array_like) – Exogenous variables.
alpha (float) – Smoothing parameter, governing how many points to include in the local fit. Should be between 0 and 1, with higher values corresponding to more smoothing.
sided (["both", "trailing", "leading"], optional) – Dictates what side(s) of z0 can be used to create the smooth. With the default behavior (“both”), points both before and after z0 can be used. If “trailing” is specified, only points less than or equal to z0 may be used to perform the smooth. If “leading” is specified, only points greater than or equal to z0 may be used. This is intended to support time series methods where we only want to do trailing averages.
 Returns
y_smoothed – The smoothed estimate of the exogenous variable evaluated at z0.
 Return type
float
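A minimal one-point LOESS sketch (two-sided case only), showing the role of alpha as the fraction of points kept in the local fit; the helper name and kernel details are illustrative, not the library's implementation:

```python
import numpy as np

def loess_sketch(z0, z, y, alpha=0.5):
    """Keep the alpha-fraction of points nearest z0, weight them with
    the tricube kernel, and evaluate a weighted linear fit at z0."""
    z, y = np.asarray(z, float), np.asarray(y, float)
    k = max(2, int(np.ceil(alpha * len(z))))
    d = np.abs(z - z0)
    idx = np.argsort(d)[:k]                      # k nearest neighbours
    w = (1 - (d[idx] / d[idx].max()) ** 3) ** 3  # tricube weights
    slope, intercept = np.polyfit(z[idx], y[idx], 1, w=w)
    return slope * z0 + intercept

# On exactly linear data the weighted linear fit is exact.
z = np.linspace(0, 1, 20)
y = 3 * z + 1
y_smoothed = loess_sketch(0.5, z, y, alpha=0.5)
```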