Bias ==== The bias of an estimator is the systematic gap between its expected value and the parameter it is trying to estimate. If the bias is small relative to the standard error, it can usually be ignored. If it is large, it distorts every downstream conclusion: confidence intervals are off-center, *p*-values are miscalibrated, and reported point estimates are nudged in a fixed direction. Knowing how big the bias is — even approximately — is essential before reporting it. For a few canonical estimators, bias has a closed-form expression. The plug-in sample variance has bias :math:`-\sigma^2 / n`. The sample correlation has a bias of order :math:`1/n` whose leading term is known. Beyond a small set of textbook cases, no formula is available. The bootstrap fills the gap by estimating bias directly from the data — but at first glance this looks impossible. Bias is a statement about how the estimator behaves under the *unknown* population distribution :math:`F`. The data we have is one sample from :math:`F`. How can we possibly say what the average behavior of :math:`\hat{\theta}` is across the samples we *didn't* draw? The bootstrap principle ----------------------- Bias is defined as .. math:: \mathrm{bias}_F(\hat{\theta}) = \mathrm{E}_F[\hat{\theta}] - \theta(F), where :math:`\hat{\theta} = s(X_1, \ldots, X_n)` is the estimator, :math:`\theta(F)` is the parameter of interest as a functional of the true distribution :math:`F`, and the expectation is over hypothetical repeated samples of size :math:`n` drawn from :math:`F`. We do not know :math:`F`, so neither term is directly computable. The bootstrap principle is the substitution :math:`F \to \hat{F}`, where :math:`\hat{F}` is the empirical distribution of the observed sample. Both terms now become tractable: .. math:: \mathrm{bias}_{\hat{F}}(\hat{\theta}) = \mathrm{E}_{\hat{F}}[\hat{\theta}^*] - \theta(\hat{F}). The expectation :math:`\mathrm{E}_{\hat{F}}[\hat{\theta}^*]` is the average of the estimator over samples of size :math:`n` drawn *with replacement* from the observed data — a quantity we can approximate to arbitrary precision by Monte Carlo: draw :math:`B` bootstrap samples, compute :math:`\hat{\theta}^*_b` on each, and average. The plug-in value :math:`\theta(\hat{F})` is whatever the parameter functional :math:`\theta(\cdot)` evaluates to when fed the empirical distribution. For most estimators the two coincide: :math:`\theta(\hat{F}) = \hat{\theta}`. (For some, like a trimmed mean used to estimate the population mean, they don't, which is why ``bias`` accepts both ``stat`` and ``t`` arguments — see [ET93]_ S10.2.) What does this estimate? It is the bias of :math:`\hat{\theta}` *if the empirical distribution were the truth*. That is well-defined and fully computable, but it is not the quantity we ultimately want, which is :math:`\mathrm{bias}_F(\hat{\theta})`. The two will generally differ — which raises the question of why the substitution is useful at all. The answer rests on two observations. First, :math:`\hat{F}` converges to :math:`F` as :math:`n \to \infty`, so for parameters that depend continuously on the distribution, bias under :math:`\hat{F}` converges to bias under :math:`F`. The bootstrap bias estimate is *consistent*. Second, and more importantly for finite :math:`n`: the bias of a smooth estimator typically depends on :math:`F` through a few moments — variance, third cumulant — that :math:`\hat{F}` already captures well even for modest sample sizes. Concretely, if the leading bias term is :math:`c(F)/n` for some smooth functional :math:`c`, then the bootstrap reports :math:`c(\hat{F})/n`, which differs from the truth only at order :math:`1/n^{3/2}`. The bootstrap bias estimate thus inherits the estimator's own smoothness in :math:`F`. A sanity check: the sample variance ----------------------------------- The plug-in sample variance, .. math:: \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2, is the textbook example of a biased estimator. Its expectation is :math:`(n - 1)\sigma^2 / n`, so :math:`\mathrm{bias}_F(\hat{\sigma}^2) = -\sigma^2 / n`. On the mouse-control survival data (:math:`n = 9`, :math:`\hat{\sigma}^2 = 1603.7`), the analytical bias estimate is .. math:: -\hat{\sigma}^2 / n = -178.2. The bootstrap should recover this without invoking the formula: .. code-block:: python import numpy as np import bootstrap_stat as bp from bootstrap_stat.datasets import mouse_data data = np.array(mouse_data("control")) def variance_p(x, p): m = np.dot(p, x) return np.dot(p, (x - m) ** 2) bias_hat = bp.better_bootstrap_bias(data, variance_p, B=10000) # -177.0 (analytical: -178.2) The agreement to three significant figures is not a numerical accident: for the plug-in variance, one can show by direct algebra that :math:`\mathrm{E}_{\hat{F}}[\hat{\sigma}^{2*}] = (n-1) \hat{\sigma}^2 / n`, so the bootstrap bias estimate is exactly :math:`-\hat{\sigma}^2 / n` in expectation, with Monte Carlo error shrinking like :math:`1/\sqrt{B}`. The bootstrap recovers the classical result by the same mechanism that produces it analytically — except that it does so for any estimator, smooth or not, with no algebra required. The example ----------- For a problem with no closed-form bias, we use the patch data from Table 10.1 of [ET93]_. Eight subjects each wore three medical patches designed to deliver a hormone: a placebo, an "old" patch from an established manufacturing plant, and a "new" patch from a newly opened plant. The FDA criterion for declaring the two plants bioequivalent is that the average effect of switching from the old plant to the new is at most 20% of the average effect of switching from placebo to the old plant. With .. math:: y_i = \text{newpatch}_i - \text{oldpatch}_i, \qquad z_i = \text{oldpatch}_i - \text{placebo}_i, bioequivalence requires :math:`|\theta| \le 0.20`, where :math:`\theta = \mathrm{E}[Y] / \mathrm{E}[Z]`. The natural plug-in estimator is .. math:: \hat{\theta} = \bar{y} / \bar{z}. This is a ratio of means: even though :math:`\bar{y}` and :math:`\bar{z}` are individually unbiased estimators of :math:`\mathrm{E}[Y]` and :math:`\mathrm{E}[Z]`, the ratio is biased at order :math:`1/n` because the denominator's variability contributes asymmetrically. With :math:`n = 8` we are well into the regime where this matters. Setting up ---------- .. code-block:: python import numpy as np import bootstrap_stat as bp from bootstrap_stat.datasets import patch_data data = patch_data() # n = 8 def ratio(d): return d["y"].mean() / d["z"].mean() def ratio_p(d, p): return np.dot(p, d["y"].values) / np.dot(p, d["z"].values) dist = bp.EmpiricalDistribution(data) theta_hat = ratio(data) # -0.0713 The point estimate is :math:`\hat{\theta} = -0.071`, well within the bioequivalence threshold of 0.20. The remaining question is whether that point estimate is biased enough to matter. Bootstrap bias -------------- :func:`~bootstrap_stat.bias.bias` implements the direct estimate .. math:: \widehat{\mathrm{bias}} = \frac{1}{B} \sum_{b=1}^{B} \hat{\theta}^*_b - \theta(\hat{F}). It takes both ``stat`` (the estimator, applied to a sample) and ``t`` (the parameter functional, applied to a distribution). For a plug-in statistic the two are the same function: .. code-block:: python bias_hat = bp.bias(dist, ratio, ratio, B=2000) # 0.00946 ~0.5 s The estimated bias is :math:`+0.009`, about 13% of the magnitude of :math:`\hat{\theta} = -0.071`. The sign is informative: the ratio estimate is biased toward zero, so the true bioequivalence ratio is likely *more* negative than :math:`\hat{\theta}` suggests. Bias is a noisy quantity in the bootstrap. The mean of :math:`\hat{\theta}^*` has Monte Carlo standard error roughly :math:`\widehat{\mathrm{se}}(\hat{\theta}) / \sqrt{B}`, and we are subtracting a non-noisy quantity from it. To resolve a bias of :math:`+0.009` against an SE of :math:`0.106`, we need :math:`B = 2000` to reduce the Monte Carlo SE to :math:`\sim 0.002`. This is far more bootstrap samples than are needed for a standard error. Better bootstrap bias --------------------- :func:`~bootstrap_stat.bias.better_bootstrap_bias` is a variance-reduction technique for plug-in statistics that converges much faster than the direct estimator. It exploits the identity .. math:: \mathrm{E}_{\hat{F}}[\hat{\theta}^*] - \theta(\hat{F}) = \mathrm{E}_{\hat{F}}\left[ s(\mathbf{x}, \mathbf{p}^*) - s(\mathbf{x}, \bar{\mathbf{p}}^*) \right], where :math:`\mathbf{p}^*_b` is the resampling-frequency vector for the :math:`b`\ th bootstrap sample and :math:`\bar{\mathbf{p}}^* = B^{-1} \sum_b \mathbf{p}^*_b`. Both terms are evaluated at the *same* realized resampling vectors, removing the dominant source of Monte Carlo variance. The catch is the resampling-form requirement: the statistic must take ``(x, p)`` rather than just ``x``. .. code-block:: python bias_hat = bp.better_bootstrap_bias(data, ratio_p, B=400) # 0.00953 ~0.03 s With :math:`B = 400` the better estimator is already as accurate as the direct estimator at :math:`B = 2000`, an order-of-magnitude speedup at equal precision. Jackknife bias -------------- :func:`~bootstrap_stat.bias.jackknife_bias` estimates the bias from the :math:`n` leave-one-out values: .. math:: \widehat{\mathrm{bias}}_{\mathrm{jack}} = (n - 1) \big( \overline{\hat{\theta}_{(\cdot)}} - \hat{\theta} \big), where :math:`\hat{\theta}_{(i)}` is the statistic computed with observation :math:`i` removed and :math:`\overline{\hat{\theta}_{(\cdot)}}` is their mean. .. code-block:: python bias_hat = bp.jackknife_bias(data, ratio) # 0.00800 < 0.01 s The jackknife is fast and deterministic — :math:`n` evaluations total — and reports bias :math:`+0.008` on the patch data, agreeing with the bootstrap to within Monte Carlo noise. As with the standard error, the catch is smoothness: the jackknife is consistent only for plug-in statistics whose parameter functional is smooth at :math:`\hat{F}`. It is *not* applicable to the median or other quantile-based statistics. See [ET93]_ (S10.5). Bias correction --------------- Once the bias is estimated, the natural move is to subtract it: .. math:: \tilde{\theta} = \hat{\theta} - \widehat{\mathrm{bias}}. :func:`~bootstrap_stat.bias.bias_corrected` computes this directly, defaulting to ``better_bootstrap_bias`` for the underlying estimate: .. code-block:: python theta_tilde = bp.bias_corrected(data, ratio_p, B=400) # -0.0808 ~0.03 s The corrected estimate :math:`\tilde{\theta} = -0.081` is slightly more negative than the naive :math:`\hat{\theta} = -0.071` — consistent with the bias being toward zero — and remains comfortably within the 0.20 bioequivalence threshold. **Bias correction has a cost.** The corrected estimator :math:`\tilde{\theta}` is the difference of two random quantities, :math:`\hat{\theta}` and :math:`\widehat{\mathrm{bias}}`, and its variance is therefore the sum of their variances minus twice their covariance. In practice this means :math:`\widehat{\mathrm{se}}(\tilde{\theta})` is typically larger than :math:`\widehat{\mathrm{se}}(\hat{\theta})` — sometimes substantially. For statistics where the bias is small relative to the SE, the increase in variance can outweigh the reduction in bias, producing a corrected estimator with worse mean-squared error than the original. The standard rule of thumb from [ET93]_ (S10.6) is: correct only if :math:`|\widehat{\mathrm{bias}}| / \widehat{\mathrm{se}} > 0.25`. For the patch ratio, that ratio is :math:`0.009 / 0.106 \approx 0.09`, well below the threshold; the right choice is to report :math:`\hat{\theta}` and leave it uncorrected, while noting that the bias is small and toward zero. Comparison ---------- All bias estimates are for the ratio :math:`\hat{\theta} = \bar{y} / \bar{z}` on the patch data (:math:`n = 8`, :math:`\hat{\theta} = -0.0713`): .. list-table:: :header-rows: 1 :widths: 32 12 12 16 * - Method - Bias - Corrected :math:`\tilde{\theta}` - Wall time * - Bootstrap (:math:`B = 2000`) - +0.0095 - −0.081 - 0.5 s * - Better bootstrap (:math:`B = 400`) - +0.0095 - −0.081 - 0.03 s * - Better bootstrap (:math:`B = 2000`) - +0.0083 - −0.080 - 0.13 s * - Jackknife - +0.0080 - −0.079 - < 0.01 s For most applications, :func:`~bootstrap_stat.bias.better_bootstrap_bias` is the right default for plug-in statistics: it converges roughly an order of magnitude faster than the direct estimator and is similarly easy to use. :func:`~bootstrap_stat.bias.jackknife_bias` is competitive at :math:`O(n)` evaluations for smooth statistics on small samples, but should not be applied to non-smooth functionals. :func:`~bootstrap_stat.bias.bias` is the general-purpose fallback when the statistic is not a plug-in (e.g., a robust estimator used for a non-robust parameter), at the cost of needing :math:`B \approx 2000` for stable answers. Whichever method is used, the decision to *correct* should follow the bias-to-SE rule: correction is worth its variance penalty only when the bias is a substantial fraction of the standard error. See [ET93]_ (S10) for a full treatment.