Prediction Intervals
====================

A confidence interval and a prediction interval answer related but
fundamentally different questions. A confidence interval covers a
*parameter* — a fixed but unknown number, like the population mean
:math:`\mu`. A prediction interval covers a future *observation* —
a random quantity drawn from the same distribution as the data, like
next year's measurement. The targets of the two intervals are
qualitatively different, and that difference shows up in the width:
a prediction interval is always wider than a confidence interval at
the same nominal level, and the gap grows with sample size rather
than shrinking.

The reason is decomposable. Predicting a new observation
:math:`X_{n+1}` introduces two independent sources of uncertainty.
The first is estimation uncertainty about the population
:math:`F` itself — the same uncertainty a confidence interval
captures. The second is the *inherent variability* of
:math:`X_{n+1}`: even if we knew :math:`F` exactly, a new draw
would still vary. As :math:`n \to \infty` the first source vanishes,
but the second does not. The confidence interval for :math:`\mu`
shrinks to a point in the limit; the prediction interval converges
to a fixed-width band determined by :math:`F`'s spread. The two
intervals are not the same thing, and a prediction interval is
never a "more conservative confidence interval".

We illustrate this on the rainfall data from Table 4.2 of [ET93]_:
:math:`n = 106` years of yearly rainfall measurements, in inches,
recorded in Nevada City, California from 1873 through 1978. The
sample is large enough that the confidence interval for the
population mean is narrow, and the contrast with the prediction
interval is dramatic.

Setting up
----------

The data are univariate — one rainfall measurement per year — and
we treat the years as exchangeable. (Real rainfall has temporal
structure that this assumption ignores; the example is for
illustration of the bootstrap, not for serious climate prediction.)

.. code-block:: python

    import numpy as np
    import bootstrap_stat as bp
    from bootstrap_stat.datasets import rainfall_data

    df = rainfall_data()
    x = df["rainfall"].values        # n = 106
    x_bar = x.mean()                  # 52.66 inches
    s     = x.std(ddof=1)             # 15.43 inches

    dist = bp.EmpiricalDistribution(x)

The mean annual rainfall over the observed century is
:math:`\bar{x} = 52.7` inches with sample standard deviation
:math:`s = 15.4`. Two natural questions follow:

- What is the population mean :math:`\mu = \mathrm{E}[X]`? — call
  for a confidence interval.
- What might next year's rainfall :math:`X_{n+1}` be? — call for a
  prediction interval.

The two are computed from the same data but answer different
questions, and we will see they differ in width by an order of
magnitude.

Why prediction intervals are wider
----------------------------------

Under the textbook normal model :math:`X \sim \mathcal{N}(\mu,
\sigma^2)`, both intervals have closed-form expressions. The 90%
confidence interval for :math:`\mu` is

.. math::

   \bar{x} \pm t_{n-1,\,1-\alpha} \cdot \frac{s}{\sqrt{n}},

with width :math:`\propto 1 / \sqrt{n}`. The 90% prediction interval
for :math:`X_{n+1}` is

.. math::

   \bar{x} \pm t_{n-1,\,1-\alpha} \cdot s \sqrt{1 + \tfrac{1}{n}},

with width :math:`\propto \sqrt{1 + 1/n}`. The ratio of widths is

.. math::

   \frac{\text{PI width}}{\text{CI width}} = \sqrt{n + 1}
   \;\approx\; \sqrt{n}.

For :math:`n = 106` that ratio is 10.3 — and the actual numbers
bear it out:

.. code-block:: python

    from scipy import stats

    n = len(x)
    t_crit = stats.t.ppf(1 - 0.05, df=n - 1)

    ci_lo = x_bar - t_crit * s / np.sqrt(n)
    ci_hi = x_bar + t_crit * s / np.sqrt(n)
    # CI for mu:        (50.17, 55.15)   width 4.97 inches

    pi_lo = x_bar - t_crit * s * np.sqrt(1 + 1/n)
    pi_hi = x_bar + t_crit * s * np.sqrt(1 + 1/n)
    # PI for X_{n+1}:   (26.94, 78.38)   width 51.44 inches

The confidence interval for the population mean is a narrow
five-inch band. The prediction interval for next year is a fifty-inch
band — ten times wider, just as :math:`\sqrt{n + 1}` predicts.

The decomposition makes this concrete. Under the normal model,
:math:`\mathrm{Var}(\bar{x} - X_{n+1}) = \sigma^2 / n + \sigma^2 =
\sigma^2 (1 + 1/n)`, exactly the two terms in the formula. The
:math:`1/n` term is the estimation variance — uncertainty about
:math:`\mu`. The :math:`1` is the inherent variance of a single
draw — uncertainty about :math:`X_{n+1}` itself. Doubling the
sample size halves the first; it leaves the second untouched. As
:math:`n` grows the prediction interval converges to
:math:`\mu \pm z_{1-\alpha} \sigma`, a band whose width is fixed by
:math:`F`'s spread regardless of how much data we collect.

This asymptotic behavior is the cleanest way to see that the two
intervals answer different questions. A confidence interval gets
better with data; a prediction interval gets better only up to the
floor set by the data-generating distribution.

Classical prediction interval
-----------------------------

For reference and comparison, the formula above gives:

.. code-block:: python

    pi_lo, pi_hi = (26.94, 78.38)   # 90% classical PI; < 0.001 s

The classical PI is exact under bivariate normality of the data and
remains a reasonable approximation when the distribution is not too
skewed. For rainfall data the empirical distribution has skewness
:math:`\approx 0.38` and a Shapiro-Wilk test does not reject
normality (:math:`p = 0.16`), so the classical interval is a
defensible baseline here. For data that are clearly non-normal —
heavy-tailed, bounded, or with mass at zero — the classical formula
will mis-state the asymmetry of the interval.

Bootstrap prediction interval
-----------------------------

:func:`~bootstrap_stat.prediction.prediction_interval` constructs a
distribution-free PI using a studentized bootstrap. For each of
:math:`B` replicates it draws a bootstrap sample
:math:`x^* = (X_1^*, \ldots, X_n^*)` and one extra observation
:math:`z^*` from the empirical distribution, then forms the
studentized "prediction error"

.. math::

   t^* = \frac{\bar{x}^* - z^*}{s^*}.

The :math:`\alpha` and :math:`1 - \alpha` quantiles of :math:`t^*`
are inverted to give

.. math::

   \big( \bar{x} - t^*_{1-\alpha} \cdot s,\;
         \bar{x} - t^*_{\alpha}   \cdot s \big).

The construction mirrors the classical *t*-distribution argument
exactly, but reads the quantiles off the empirical bootstrap
distribution rather than the *t*\ :sub:`n-1` table. No normality
assumption is required; the only assumption is that the studentized
prediction error is approximately pivotal — that its distribution
is roughly the same under :math:`\hat{F}` as under :math:`F`.

.. code-block:: python

    pi_lo, pi_hi = bp.prediction_interval(dist, x, B=2000, alpha=0.05)
    # (30.29, 79.22)   width 48.93 inches   ~0.1 s

The bootstrap interval is shifted upward relative to the classical
one — its lower bound is 30.3 against the classical 26.9, while
the upper bounds are nearly identical. This shift reflects the
empirical distribution's right skewness: rainfall is bounded below
by zero and has a longer right tail, so the lower extreme of a
plausible new draw is less extreme than a symmetric normal model
would predict. The bootstrap captures that asymmetry; the classical
formula imposes symmetry around :math:`\bar{x}` whether the data
support it or not.

For data closer to normal the bootstrap and classical intervals
agree to within Monte Carlo noise. The bootstrap doesn't invent
differences when the assumption holds — it just doesn't impose the
assumption when it doesn't. That is the practical case for using it
even when normality looks plausible: nothing is lost, and asymmetry
is captured automatically when present.

The bootstrap PI is computationally cheap. With :math:`B = 2000`
replicates the wall time is well under a second on a single thread,
because each replicate is a single mean and standard deviation. For
larger samples or more demanding statistics multithreading is
available via the ``num_threads`` argument.

A note on prediction error
--------------------------

The same module also exposes
:func:`~bootstrap_stat.prediction.prediction_error_optimism` and
:func:`~bootstrap_stat.prediction.prediction_error_632`, which
solve a related but distinct problem. Where
:func:`~bootstrap_stat.prediction.prediction_interval` quantifies a
*range* for a new observation drawn from a univariate distribution,
the prediction-error functions estimate a *scalar* error rate of a
fitted model on new observations — the quantity used for model
selection. Conceptually:

- **Prediction interval**: how variable is :math:`X_{n+1}`? (range)
- **Prediction error**: how accurate is :math:`\hat{f}(X_{n+1})`?
  (scalar)

The two share the bootstrap idea — resample to estimate behavior on
data the model has not seen — but the quantities returned and the
typical use cases are different. The .632 bootstrap was historically
popular for model selection but has not held up as well as
cross-validation [Koh95]_ [Arl10]_; the implementation in this
library is best treated as illustrative. For prediction *intervals*,
:func:`~bootstrap_stat.prediction.prediction_interval` remains the
straightforward and effective tool.

Comparison
----------

All intervals below are nominal 90% (:math:`\alpha = 0.05`) on the
rainfall data (:math:`n = 106`, :math:`\bar{x} = 52.66`,
:math:`s = 15.43`):

.. list-table::
   :header-rows: 1
   :widths: 32 14 14 12 16

   * - Method
     - Lower
     - Upper
     - Width
     - Wall time
   * - **CI for** :math:`\mu` (classical)
     - 50.17
     - 55.15
     - 4.97
     - < 0.001 s
   * - **CI for** :math:`\mu` (BCa, B=2000)
     - 50.22
     - 55.05
     - 4.83
     - 0.2 s
   * - **PI for** :math:`X_{n+1}` (classical)
     - 26.94
     - 78.38
     - 51.44
     - < 0.001 s
   * - **PI for** :math:`X_{n+1}` (bootstrap, B=2000)
     - 30.29
     - 79.22
     - 48.93
     - 0.1 s

The two confidence intervals (classical and BCa) cover the
population mean within a five-inch band — narrow, because the
sample mean is precisely estimated from :math:`n = 106`
observations. The two prediction intervals cover next year's
rainfall within a band ten times wider, because next year's
rainfall is itself variable. Closing that gap is not a question of
gathering more data; the inherent variability of yearly rainfall is
the band's irreducible floor.

For applied work,
:func:`~bootstrap_stat.prediction.prediction_interval` is the
right tool for prediction intervals when distributional assumptions
are doubtful or the sample is small. It costs slightly more than
the classical formula, makes no parametric assumption, and
correctly captures asymmetry in the data-generating distribution
when it is present. See [ET93]_ (S6.5) for the underlying theory.