API Reference#
gamdist: Generalized Additive Models fit via ADMM.
The package fits the GLM/GAM zoo – binary, continuous, or count outcomes paired with continuous, categorical, or spline-transformed features, with arbitrary convex regularization on each term – by splitting the joint convex problem into per-feature primal steps and a per-outcome proximal step coordinated by ADMM dual variables. The modular decomposition follows Chu, Keshavarz, & Boyd’s A Distributed Algorithm for Fitting Generalized Additive Models.
Public API:
GAM – single-response generalized additive model.
MultiTaskGAM – joint fit of K tasks with optional cross-task coupling regularizers.
SplineError – raised when spline knot selection or basis evaluation fails.
fit_adaptive_lasso() – two-stage adaptive-lasso wrapper around GAM.fit().
GAM#
Single-response GAM and the ADMM orchestration loop.
This module contains GAM – the user-facing model – together
with the family / link tables and the ADMM driver that ties per-feature
primal steps to the per-outcome proximal step.
The public entry points are:
GAM – configure with a family and link, register features with GAM.add_feature(), fit with GAM.fit(), then call GAM.predict(), GAM.deviance(), GAM.aic(), GAM.summary(), etc.
fit_adaptive_lasso() – two-stage adaptive-lasso wrapper that refits with reweighted L1 penalties.
The algorithm follows Chu, Keshavarz, & Boyd’s distributed-fitting
paper (citation key GAMADMM in GAM’s docstring); the
convexity-only design rule is captured in CLAUDE.md.
- class gamdist.gamdist.GAM(family: Literal['normal', 'binomial', 'poisson', 'gamma', 'exponential', 'inverse_gaussian', 'quasi_binomial', 'quasi_poisson', 'quantile', 'huber'] | None = None, link: Literal['identity', 'logistic', 'probit', 'complementary_log_log', 'log', 'reciprocal', 'reciprocal_squared'] | None = None, dispersion: float | None = None, estimate_overdispersion: bool = False, name: str | None = None, load_from_file: str | None = None, tau: float | None = None, delta: float | None = None)[source]#
Generalized Additive Model fit via ADMM.
Configure with a family and link, register features with
add_feature(), fit with fit(), then call predict(), deviance(), aic(), summary(), etc.
- Parameters:
family (str, optional) –
Family of the model. One of:
"normal"– continuous responses."binomial"– binary responses (or counts withcovariate_class_sizesat fit time)."poisson"– counts."gamma"– positive continuous responses (in progress)."exponential"– gamma with dispersion fixed at 1."inverse_gaussian"– positive continuous responses with cubic mean–variance relationship (in progress)."quasi_binomial"– binomial score; dispersion estimated from the data via Pearson \(\chi^2 / (n - p)\) sodispersion()reflects over- or under-dispersion relative to the exact-binomial baseline of 1."quasi_poisson"– Poisson score; same Pearson dispersion estimator, useful when count data hasvariance > mean."quantile"– pinball loss; requirestau."huber"– robust M-estimator; requiresdelta.
Multinomial (ordinal / nominal) and proportional-hazards families are not yet supported. Required unless loading an existing model from file (see
load_from_file). Point estimates for "quasi_binomial" / "quasi_poisson" coincide with their full-likelihood cousins; only the dispersion estimator (and any inferential quantity that depends on it) differs.
link (str, optional) –
Link function. Supported links and their canonical families:
Link                 Canonical for family
identity             normal
logistic             binomial
log                  poisson
reciprocal           gamma
reciprocal_squared   inverse_gaussian

probit and complementary_log_log are also accepted (non-canonical for binomial). If not specified, the canonical link is used. Per the convexity-only design rule (CLAUDE.md), only (family, link) pairs with a dedicated convex proximal-operator implementation are accepted.
dispersion (float, optional) – Dispersion parameter. Some families (binomial, poisson) have a fixed dispersion independent of the data. For other families the dispersion is typically estimated from the data; passing a known value here reduces uncertainty in the model.
estimate_overdispersion (bool, optional) – Whether to estimate over-dispersion for binomial and poisson families. Only meaningful when covariate classes are present and have at least modest size. See [GLM], S4.5. Defaults to False.
name (str, optional) – Name for the model. Used in plots and in deriving filenames when save_flag=True is passed to fit().
load_from_file (str, optional) – Pickle path produced by a previous save_flag=True fit. When set, every other parameter is ignored.
tau (float, optional) – Quantile level in \((0, 1)\) for family="quantile" (pinball loss). tau=0.5 recovers the conditional median. Required when family="quantile" and ignored otherwise.
delta (float, optional) – Knee parameter for family="huber". Residuals with \(|y - \mu| \le \delta\) are penalized as \(0.5 r^2\) (least squares); larger residuals are penalized linearly, capping their per-observation influence. Must be positive and is in the units of y. Required when family="huber" and ignored otherwise.
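The quadratic-then-linear behavior that delta controls can be sketched directly; this is an illustration of the Huber penalty described above, not gamdist's internal code:

```python
# Huber penalty: quadratic for |r| <= delta, linear beyond, so large
# residuals have bounded per-observation influence. (Illustration only.)
def huber_penalty(r: float, delta: float) -> float:
    a = abs(r)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)  # continuous and C^1 at |r| = delta
```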
References
[glmnet]glmnet (R package). The standard package for GAMs in R. https://cran.r-project.org/web/packages/glmnet/index.html
[pygam]pygam (Python package). Implements GAMs in Python without using ADMM. dswah/pyGAM
[GAM]Hastie, T. and Tibshirani, R. Generalized Additive Models. The book by the folks who invented GAMs.
[ESL]Hastie, T., Tibshirani, R., and Friedman, J. The Elements of Statistical Learning. Covers a lot more than just GAMs.
[GAMr]Wood, S. N. Generalized Additive Models: An Introduction with R. Covers more implementation details than [GAM].
[ADMM]Boyd, S., Parikh, N., Chu, E., Peleato, B., and Eckstein, J. Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers.
[GAMADMM]Chu, E., Keshavarz, A., and Boyd, S. A Distributed Algorithm for Fitting Generalized Additive Models. Forms the basis of this package’s approach.
- add_feature(name: str, type: Literal['categorical', 'linear', 'spline'], transform: Callable[[ndarray[tuple[Any, ...], dtype[Any]]], ndarray[tuple[Any, ...], dtype[Any]]] | None = None, rel_dof: float | None = None, regularization: dict[str, Any] | None = None, constraints: dict[str, Any] | None = None) None[source]#
Add a feature to the model.
An implicit constant feature is always included, representing the overall average response.
- Parameters:
name (str) – Name for the feature. Used internally to keep track of features and also used when saving files and in plots.
type (str) –
Type of feature. One of:
"categorical"– for categorical variables."linear"– for variables with a linear contribution to the response."spline"– for variables with a potentially nonlinear contribution to the response.
transform (callable, optional) – Optional transform applied to the feature data. Any callable may be used; it is applied to data provided during fitting and prediction. Common choices include numpy.log(), numpy.log1p(), or numpy.sqrt(). The user may wish to start with a base feature "age" and use derived features "age_linear" and "age_quadratic" to permit quadratic models for that feature, with potentially different regularization applied to each.
rel_dof (float, optional) –
Relative degrees of freedom. Applicable only to spline features. The degrees of freedom associated with a spline represent how “wiggly” it is allowed to be: a spline with two degrees of freedom is just a line. (Since features are constrained to have zero mean response over the data, linear features only have one degree of freedom.) The relative degrees of freedom set the baseline smoothing parameter \(\lambda\) associated with the feature. When fit() is called, the user may specify an overall smoothing parameter applied to all features to alter the amount of regularization in the entire model; the actual degrees of freedom will vary based on the amount of smoothing. By default, splines have 4 relative degrees of freedom. Regularization of any feature effectively reduces the degrees of freedom; that adjustment is not yet wired in.
regularization (dict, optional) – Dictionary specifying the regularization applied to this feature. Different types of features support different types of regularization. Splines always include a \(C^2\) smoothness penalty controlled via rel_dof; regularization={"group_lasso": {"coef": lam}} additionally shrinks the entire spline contribution and can zero it out. group_lasso_inf is the \(L_\infty\)-norm variant (\(\lambda \|f_j\|_\infty\)); it produces a clipping rather than a uniform contraction and is also available on linear and categorical features. huber is a bounded-influence ridge analogue: regularization={"huber": {"coef": lam, "delta": d}} adds \(\lambda h_\delta(\mathrm{coef})\) per parameter, with \(h_\delta\) quadratic for small magnitudes and linear beyond \(\delta\). Available on linear and categorical features. Other features have more diverse options; see each feature class’s docstring.
constraints (dict, optional) – Optional convex shape constraints on the feature’s coefficients. sign ("nonnegative" / "nonpositive"), lower, and upper (floats) bound coefficients. monotonic ("increasing" / "decreasing"), convex, and concave impose ordering / second-difference constraints (categorical features additionally require an order list of category labels; splines order along the knots). Linear features support only sign / lower / upper. See each feature class’s docstring for details.
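The "shrink the whole contribution and possibly zero it out" behavior of the group lasso can be sketched as a block soft-threshold on a feature's coefficient vector; this illustrates the math above, not gamdist's solver:

```python
import math

# Group-lasso shrinkage: the whole coefficient block is contracted toward
# zero by a common factor, and zeroed entirely when its norm falls below
# the threshold lam. (Illustration only, not gamdist internals.)
def group_soft_threshold(coefs, lam):
    norm = math.sqrt(sum(c * c for c in coefs))
    if norm <= lam:
        return [0.0] * len(coefs)   # entire feature contribution zeroed out
    scale = 1.0 - lam / norm        # uniform contraction of every coefficient
    return [scale * c for c in coefs]
```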
- fit(X: DataFrame, y: ndarray[tuple[Any, ...], dtype[Any]], covariate_class_sizes: ndarray[tuple[Any, ...], dtype[Any]] | None = None, weights: ndarray[tuple[Any, ...], dtype[Any]] | None = None, optimizer: str = 'admm', smoothing: float = 1.0, save_flag: bool = False, verbose: bool = False, plot_convergence: bool = False, max_its: int = 100, n_jobs: int = 1) None[source]#
Fit the model to data.
- Parameters:
X (pandas.DataFrame) – Dataframe of features. The column names must correspond to the names of features added to the model. X may have extra columns corresponding to features not included in the model; these are silently ignored. Where applicable, the data should be “pre-transformation”: any transformations specified in add_feature() are applied here.
y (array-like) – Response. Depending on the model family, the response may need to be in a particular form (for example, for a binomial family, y should contain 0s and 1s); this is not checked.
covariate_class_sizes (array-like, optional) – If observations are grouped into covariate classes, the size of those classes should be listed here.
weights (array-like, optional) – Per-observation weights. Effectively specifies the dispersion of each observation.
optimizer (str) – Optimization method. Currently only "admm" is supported, so this argument has no effect.
smoothing (float) – Multiplicative scale applied to every feature’s regularization coefficient. Lets the user set the overall smoothing by cross-validation while keeping the relative regularization across features fixed. Defaults to 1.0, which leaves the regularization as specified in add_feature().
save_flag (bool) – Whether to save intermediate results after each iteration. Useful for complicated models with massive data sets that take a while to fit; if the system crashes during the fit the analyst can resume from the last checkpoint. Defaults to False.
verbose (bool) – Print mildly useful information during the fit. Defaults to False.
plot_convergence (bool) – Plot the convergence graph at the end of the fit. Defaults to False.
max_its (int) – Maximum number of ADMM iterations. Defaults to 100.
n_jobs (int) – Number of threads to use for the per-feature primal step within each ADMM iteration. Defaults to 1 (serial); pass -1 to use os.cpu_count(). NumPy / SciPy / cvxpy release the GIL during their numeric kernels, so threading produces real speedup. Expect a 2-4x ceiling on models with several non-trivial features (splines, categoricals via cvxpy); pure linear-only models are usually faster serial because the per-feature work is too cheap to amortize thread-dispatch overhead.
Notes
Many binomial data sets include multiple observations with identical features. For example, a data set with features gender and country and a binary survival response might be presented in the compact form

gender   country   patients   survivors
M        USA       50         48
F        USA       70         65
M        CAN       40         38
F        CAN       45         43
This is still a binomial family, just more compact. The compact format is not yet supported; in this context it is important to check for over-dispersion (see [GLM]). The current implementation assumes no over-dispersion and that the number of observations sharing a feature pattern is small.
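Since the compact format is not yet supported, such a table has to be expanded to one row per patient before calling fit(). A sketch of that expansion (the row layout mirrors the example table; this is not a gamdist helper):

```python
# Expand (group, count, successes) rows into per-observation records.
rows = [("M", "USA", 50, 48), ("F", "USA", 70, 65),
        ("M", "CAN", 40, 38), ("F", "CAN", 45, 43)]
X, y = [], []
for gender, country, patients, survivors in rows:
    for i in range(patients):
        X.append({"gender": gender, "country": country})
        y.append(1 if i < survivors else 0)  # survivors first, then deaths
```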
- predict(X: DataFrame) ndarray[tuple[Any, ...], dtype[float64]][source]#
Apply the fitted model to features.
- Parameters:
X (pandas.DataFrame) – Data for which to predict the response. The column names must correspond to the names of the features used to fit the model.
X may have extra columns corresponding to features not in the model; these are silently ignored. Where applicable, the data should be “pre-transformation”: transformations specified at model definition time are applied here.
- Returns:
mu – Predicted mean response for each row of X.
- Return type:
ndarray
- robust_covariance(cov_type: str = 'HC0') ndarray[tuple[Any, ...], dtype[float64]][source]#
Sandwich (Huber-White) covariance estimator.
Returns the heteroscedasticity-consistent covariance for the parameter vector defined by _design_matrix:

V = (X' W X)^{-1} (X' diag(s_i^2) X) (X' W X)^{-1}
where W_ii = (dmu/deta)_i^2 / V(mu_i) is the IRLS weight and s_i = (y_i - mu_i) (dmu/deta)_i / V(mu_i) is the per-observation score on the eta scale. mu is the fitted mean from this GAM. The dispersion drops out, so the estimator is identical for known and estimated dispersion.
For canonical-link GLMs without regularization, the result matches statsmodels.GLM(...).fit(cov_type="HC0").cov_params() on the same parametrization to numerical precision. Penalized fits return the naive sandwich at the penalized point estimate; for proper penalized inference a bread that includes the penalty Hessian is needed (future work).
- Parameters:
cov_type ("HC0" or "HC1") – "HC0" is the standard White estimator. "HC1" multiplies by n / (n - p) for a small-sample correction (matches Stata / statsmodels HC1).
- Returns:
V – Covariance for the parameter vector (intercept, linear..., categorical contrasts...), in the order produced by _design_matrix().
- Return type:
(p, p) ndarray
- Raises:
AttributeError – If the model has not been fit.
NotImplementedError – For spline features, covariate classes, observation weights, the quantile or huber families, or unsupported
cov_type values.
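The sandwich formula above collapses to scalars for a one-parameter linear model (identity link, so dmu/deta = 1 and V(mu) = 1, giving W = I and s_i = y_i - mu_i). A sketch of that special case, purely to illustrate the bread-meat-bread structure, not gamdist's implementation:

```python
# HC0 sandwich for a single slope beta with mean mu_i = beta * x_i:
# bread = (X' W X)^{-1} and meat = X' diag(s_i^2) X, both scalars here.
def hc0_variance(x, y, beta):
    bread = 1.0 / sum(xi * xi for xi in x)
    meat = sum(((yi - beta * xi) * xi) ** 2 for xi, yi in zip(x, y))
    return bread * meat * bread
```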
- confidence_intervals(X: DataFrame, prediction: bool = False, width: float = 0.95) ndarray[tuple[Any, ...], dtype[float64]][source]#
Confidence intervals on predictions.
Note
Not yet implemented; raises NotImplementedError.
Two notions of confidence interval are useful here. The first is a confidence interval on \(\mu\), the mean response; this captures the uncertainty in the fitted model. The second is a confidence interval on observations of this model. For a Gaussian family the model might be a perfect fit with billions of observations – so \(\mu\) is known precisely and the mean-response interval is tiny – but observations are still spread around the mean, so the prediction interval is wider. For a binomial family the estimated mean lies in \((0, 1)\) and admits a confidence interval, but the observed response is always 0 or 1 so a “prediction interval” is only meaningful in a pedantic sense.
When making multiple predictions a “global” set of intervals (all predictions inside their intervals with the specified joint probability) is sometimes desirable. This function does not compute global intervals; each interval is computed in vacuo.
- Parameters:
X (pandas.DataFrame) – Data for which to predict the response. The column names must correspond to the names of the features used to fit the model.
X may have extra columns corresponding to features not in the model; these are silently ignored. Where applicable, the data should be “pre-transformation”.
prediction (bool) – If True, return a confidence interval on the predicted response; if False, on the mean response. Defaults to False.
width (float) – Desired confidence width in \((0, 1)\). Defaults to 0.95.
- Returns:
bounds – Lower and upper bounds on the confidence interval associated with each prediction.
- Return type:
ndarray of shape (n, 2)
- plot(name: str, true_fn: Callable[[ndarray[tuple[Any, ...], dtype[float64]]], ndarray[tuple[Any, ...], dtype[float64]]] | None = None) None[source]#
Plot the model component for a particular feature.
- Parameters:
name (str) – Name of the feature; must be one of the features in the model.
true_fn (callable, optional) – Function representing the “true” relationship between the feature and the response, overlaid on the plot for comparison.
- residuals(kind: Literal['response', 'pearson', 'deviance', 'anscombe'] = 'deviance') ndarray[tuple[Any, ...], dtype[float64]][source]#
Residuals for the fitted model.
- Parameters:
kind ("response", "pearson", "deviance", or "anscombe") –
"response" returns y - mu (raw error).
"pearson" returns the standardized residual (y - E[y]) / sqrt(Var(y)) using the family’s variance function and the estimated dispersion. For binomial with covariate classes, y is the count and E[y] = m * mu.
"deviance" returns sign(y - E[y]) * sqrt(d_i), where d_i >= 0 is the per-observation deviance contribution; squaring and summing gives the total deviance (Wood 2017 Sec 3.1.7).
"anscombe" returns the family-specific transformation (A(y) - A(mu)) / (V(mu)^(1/6) * sqrt(phi)) with A(t) = integral V(s)^(-1/3) ds, chosen so the residual is approximately standard-normal under correct specification (McCullagh & Nelder 1989 Sec 2.4.1; Wood 2017 Sec 3.1.7).
- Returns:
residuals
- Return type:
array of shape (n_obs,)
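The Pearson and deviance formulas above can be made concrete for the Poisson family, where V(mu) = mu and phi = 1; this sketch illustrates the definitions, not gamdist's internals:

```python
import math

def pearson_residual(y, mu):
    # (y - E[y]) / sqrt(Var(y)) with Var(y) = mu for Poisson.
    return (y - mu) / math.sqrt(mu)

def deviance_residual(y, mu):
    # Per-observation Poisson deviance d_i = 2*(y*log(y/mu) - (y - mu)),
    # with the y = 0 case handled by the limit y*log(y/mu) -> 0.
    d = 2.0 * ((y * math.log(y / mu) if y > 0 else 0.0) - (y - mu))
    return math.copysign(math.sqrt(max(d, 0.0)), y - mu)
```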
- plot_residuals(kind: Literal['response', 'pearson', 'deviance', 'anscombe'] = 'deviance') Any[source]#
Plot residuals vs. fitted values and a normal QQ plot.
Produces a 1x2 matplotlib figure: residual-vs-fitted on the left, Q-Q plot against the standard normal on the right.
- Parameters:
kind ("response", "pearson", "deviance", or "anscombe") – Forwarded to residuals().
- Returns:
fig
- Return type:
matplotlib.figure.Figure
- plot_residuals_vs_predictor(predictor: ArrayLike, *, kind: Literal['response', 'pearson', 'deviance', 'anscombe'] = 'deviance', name: str | None = None) Any[source]#
Plot residuals against a predictor.
Useful for diagnosing fit adequacy: a clear pattern (curvature, funnel shape, etc.) when
predictor is a feature already in the model suggests an unmodeled non-linearity or heteroscedasticity. The same plot for a predictor not in the model is a quick check on whether to add it.
- Parameters:
predictor (array-like of shape (n_obs,)) – Predictor values for each training observation. Must match the length of the training data used to fit this model. Categorical (string / object) predictors are plotted as a categorical x-axis.
kind ("response", "pearson", "deviance", or "anscombe") – Forwarded to residuals().
name (str, optional) – Label for the x-axis and title.
- Returns:
fig
- Return type:
matplotlib.figure.Figure
- deviance(X: DataFrame | None = None, y: ndarray[tuple[Any, ...], dtype[Any]] | None = None, covariate_class_sizes: ndarray[tuple[Any, ...], dtype[Any]] | None = None, w: ndarray[tuple[Any, ...], dtype[Any]] | None = None) float[source]#
Deviance of the fitted model.
With no arguments, returns the training deviance
\[D = 2 \phi \bigl(\ell(y; y) - \ell(\mu; y)\bigr),\]where \(\phi\) is the dispersion (present only to cancel the denominator of the log-likelihood), \(\ell(y; y)\) is the log-likelihood of a model that fits the data perfectly, and \(\ell(\mu; y)\) is the log-likelihood of the fitted model on its training data. This is the quantity ADMM minimizes during fitting.
With X and y supplied, returns the deviance on the provided data set. Combined with cross-validation this is the usual way to pick the smoothing parameter by minimizing deviance on a hold-out set.
- Parameters:
X (pandas.DataFrame, optional) – Dataframe of features. The column names must correspond to the names of features added to the model (see
predict()). Only used in the hold-out branch.
y (array-like, optional) – Response on the hold-out set. Only used in the hold-out branch.
covariate_class_sizes (array-like, optional) – Array of covariate class sizes for the hold-out data.
w (array-like, optional) – Per-observation weights for the hold-out branch.
- Returns:
D – The deviance of the model.
- Return type:
float
- dispersion(formula: str = 'deviance') float[source]#
Dispersion of the fitted model.
Returns the dispersion associated with the model. Some families (binomial, poisson) have a fixed dispersion; for others the dispersion is estimated from the data.
Different estimators may be appropriate for different families. The current implementation is the deviance-based estimator of [GAMr], Eqn. 3.10 on p. 110. That section notes the estimator works poorly for overdispersed Poisson data with a small mean response and offers alternatives that have not yet been implemented. This is not currently a concern because overdispersion is not supported – and without overdispersion the Poisson dispersion is exactly 1.
- Parameters:
formula (str) –
Formula for the dispersion. Options:
"deviance"(default)"pearson""fletcher"
- aic() float[source]#
Akaike Information Criterion.
Useful for choosing smoothing parameters. The AIC computed here is off by a constant factor, which simplifies the computation without affecting model-selection rankings.
Different authors throw in multiplicative or additive factors willy-nilly since they do not affect model selection.
- aicc() float[source]#
AIC corrected for finite sample size.
Applies the Hurvich-Tsai small-sample correction to aic():

AICc = AIC + 2 * p * (p + 1) / (n - p - 1)

where p is the effective number of parameters (dof(), plus one if the dispersion is being estimated) and n is the number of observations. This is Eqn 6.32 on p. 304 of [GAMr]. As n grows the correction term vanishes and AICc reduces to AIC. AICc is preferable to AIC when n is small relative to p.
Returns +inf for an overparameterized fit (n <= p + 1), where the correction term is undefined and the model has no useful AICc.
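The correction, including the overparameterized guard, is a one-liner; a sketch of the formula above (illustrative, not gamdist's internals):

```python
import math

def aicc(aic: float, n: int, p: float) -> float:
    if n <= p + 1:
        return math.inf  # overparameterized: correction term undefined
    return aic + 2.0 * p * (p + 1.0) / (n - p - 1.0)
```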
- bic() float[source]#
Bayesian Information Criterion.
Computed as
BIC = deviance / dispersion + log(n) * p
where p is the effective parameter count (dof(), plus one when dispersion is estimated, matching aic() and aicc()) and n is the number of observations. As with aic(), the BIC is off by the same constant factor; only differences are meaningful for model selection.
- null_deviance() float[source]#
Deviance of the intercept-only model on the training data.
The null model predicts the same mean response for every observation:
mean(y) (or sum(y) / sum(covariate_class_sizes) when binomial covariate classes were supplied at fit time).
- r_squared() float[source]#
Deviance-based pseudo-R^2.
Returns
1 - deviance() / null_deviance(). For a normal family with identity link this is the conventional R^2; for other families it is the deviance pseudo-R^2 of McCullagh and Nelder. Returns nan when the null deviance is zero (the response is constant), where pseudo-R^2 is undefined.
- ubre(gamma: float = 1.0) float[source]#
Un-Biased Risk Estimator.
Returns the Un-Biased Risk Estimator as discussed in Sections 6.2.1 and 6.2.5 of [GAMr]. Useful for choosing the smoothing parameter when the dispersion is known.
As discussed in Section 6.2.5 of [GAMr], sometimes it is helpful to force smoother fits by exaggerating the effective degrees of freedom; a value of
gamma > 1 may be desirable in that case.
- gcv(gamma: float = 1.0) float[source]#
Generalized Cross Validation score.
Useful for choosing the smoothing parameter when the dispersion is unknown.
As discussed in Section 6.2.5 of [GAMr], sometimes it is helpful to force smoother fits by exaggerating the effective degrees of freedom; a value of
gamma > 1 may be desirable in that case.
- summary() None[source]#
Print summary statistics for the fitted model.
Prints statistics for the overall model and for each individual feature (see the
__str__ method on each feature class for the per-feature output).
The overall model section reports:
phi – Estimated dispersion parameter. Omitted when dispersion was supplied at construction time or is fixed by the family (e.g. Poisson).
edof – Estimated degrees of freedom.
Deviance – Twice the dispersion times the difference between the log-likelihood of the saturated model and that of the fitted model.
AIC – Akaike Information Criterion.
AICc – AIC corrected for finite data sets.
BIC – Bayesian Information Criterion.
R^2 – Deviance-based pseudo-\(R^2\).
UBRE – Unbiased Risk Estimator (when dispersion is known).
GCV – Generalized Cross Validation (when dispersion is estimated).
See the corresponding methods (
aic(), aicc(), bic(), r_squared(), ubre(), gcv()) for definitions.
- gamdist.gamdist.fit_adaptive_lasso(gam: GAM, X: DataFrame, y: ndarray[tuple[Any, ...], dtype[Any]], *, gamma: float = 1.0, eps: float = 1e-06, **fit_kwargs: Any) GAM[source]#
Fit gam via the two-stage adaptive lasso (Zou, 2006).
The first stage fits gam to (X, y) using whatever L1 regularization the user configured on each feature – producing a pilot estimate of every coefficient. The second stage rewrites each L1-regularized feature’s per-coefficient L1 weight to base / (|pilot| + eps) ** gamma and refits. Coefficients with large pilot magnitudes are penalized less; tiny pilots (likely noise) are penalized more, recovering the “oracle” sparsity pattern with less bias on the truly active coefficients than plain L1.
Both stages are convex (they are ordinary weighted-L1 fits at fixed weights), so this is just a thin wrapper over two GAM.fit calls.
- Parameters:
gam (GAM) – Fully configured but not-yet-fit GAM. At least one feature must carry a regularization={"l1": {"coef": ...}} configuration; other features and other regularizers (l2, group_lasso, …) ride along untouched. The same gam object is mutated in place and returned.
X (data) – Forwarded to both gam.fit calls.
y (data) – Forwarded to both gam.fit calls.
gamma (float) – Adaptive-lasso exponent (typically 1.0). Larger gamma makes the second-stage weights swing harder around the pilot magnitudes.
eps (float) – Stabilizer added to the pilot magnitude before exponentiation, so a pilot of exactly zero produces a finite (though large) second-stage weight rather than a divide-by-zero.
**fit_kwargs – Forwarded to both gam.fit calls. Both stages see the same smoothing, max_its, weights, etc.
- Returns:
gam – The same model, now fit with adaptive L1 weights.
- Return type:
GAM
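The second-stage reweighting rule quoted above is simple enough to state directly; a sketch of the formula (illustrative, not the wrapper's actual code):

```python
# Adaptive-lasso second-stage weight: the base L1 coefficient rescaled by
# the pilot magnitude, so strong pilots are penalized less.
def adaptive_weight(base: float, pilot: float,
                    gamma: float = 1.0, eps: float = 1e-6) -> float:
    return base / (abs(pilot) + eps) ** gamma
```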
Feature ABC#
Feature ABC.
Defines _Feature, the abstract base class through which every
concrete feature type plugs into the ADMM loop. The interface is the
modular seam called out in CLAUDE.md: a feature exposes
initialize / optimize / compute_dual_tol / num_params /
dof / predict / _save / _load, and the GAM
orchestrator never needs to know whether the feature is linear,
categorical, spline, or something user-defined.
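The seam can be sketched as a minimal ABC. The method names come from the list above; the simplified signatures and the ZeroFeature stub are assumptions for illustration, not gamdist's exact interface:

```python
from abc import ABC, abstractmethod

class Feature(ABC):
    """Interface the orchestrator sees; it never knows the concrete type."""

    @abstractmethod
    def optimize(self, fpumz, rho):
        """Per-feature primal step of the ADMM loop."""

    @abstractmethod
    def predict(self, x):
        """This feature's contribution to the linear predictor."""

# A do-nothing concrete feature: the orchestrator code works unchanged.
class ZeroFeature(Feature):
    def optimize(self, fpumz, rho):
        return [0.0] * len(fpumz)

    def predict(self, x):
        return [0.0] * len(x)
```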
Linear Feature#
Linear feature: a single slope with closed-form regularized solves.
_LinearFeature contributes m * (x - mean(x)) to the linear
predictor. The per-feature primal step has a closed-form solution at
every supported regularization combination – ridge folds into the
denominator, L1 / group lasso / L_inf group lasso into a 1-D
soft-threshold, Huber into a clipped quadratic – so this branch never
calls cvxpy. Sign / lower / upper convex constraints clip the
unconstrained optimum onto the feasible interval.
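The closed-form branch described above can be sketched in two lines of math: L1 becomes a 1-D soft-threshold on the unconstrained slope, and sign / lower / upper constraints reduce to clipping onto an interval (illustration of the math, not gamdist's code):

```python
def soft_threshold(z: float, t: float) -> float:
    # 1-D prox of t*|m|: shrink toward zero, exactly zero inside [-t, t].
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def constrain(m: float, lower: float, upper: float) -> float:
    # sign / lower / upper constraints clip the optimum onto [lower, upper].
    return min(max(m, lower), upper)
```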
Categorical Feature#
Categorical feature: one parameter per level with a zero-sum constraint.
_CategoricalFeature contributes a per-level offset to the
linear predictor. The per-feature primal step is solved as a small
QP / SOCP via cvxpy; available regularizers are L1, L2, group lasso,
the L_inf group lasso variant, network lasso, network ridge, and
Huber. Convex shape constraints (sign / lower / upper / monotonic /
convex / concave on a user-supplied category ordering) compose with
those regularizers inside the same cvxpy program.
Spline Feature#
Cubic regression spline feature with a curvature smoothness penalty.
_SplineFeature represents a smooth univariate term in the
linear predictor as a cubic regression spline on adaptively-chosen
knots. The fit is regularized by an integrated squared second
derivative penalty; the smoothing parameter is determined to hit a
target relative-DOF budget (rel_dof), not by cross-validation.
Two solver paths share the same objective:
Closed-form Cholesky path – used when no shape constraints and no group-lasso terms are active.
cvxpy path – used when group lasso, L_inf group lasso, or shape constraints (sign / lower / upper / monotonic / convex / concave on the values evaluated at the knots) are present.
Helper functions _sample(), _evaluate_spline_basis(), and
_omega_curvature() build the basis matrix and curvature penalty
matrix used in both paths.
Proximal Operators#
Proximal operators for each supported (family, link) pair.
The ADMM dual step in gamdist.GAM._optimize() calls the proximal
operator of the per-observation negative log-likelihood
for the model’s family and link. Per the convexity-only design rule (CLAUDE.md), every dispatch entry corresponds to a convex subproblem and a dedicated solver:
normal + identity, gamma + reciprocal, quantile + identity, and huber + identity have closed forms.
binomial + logit, poisson + log, and inverse-gaussian + reciprocal-squared use damped Newton iterations on the scalar dual update with a backtracking safeguard.
All operators broadcast over their v, y array inputs. Optional
covariate-class sizes (ccs), observation weights (w), and the
shape parameter for huber / quantile losses are passed through
keyword arguments.
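One of the closed forms listed above is easy to derive: for normal + identity the prox of the squared-error loss 0.5*(z - y)^2 with ADMM penalty (rho/2)*(z - v)^2 is a weighted average of y and v. A sketch of that operator (illustration of the math, not gamdist's dispatch table):

```python
# Setting the gradient (z - y) + rho*(z - v) to zero gives:
def prox_normal_identity(v: float, y: float, rho: float) -> float:
    return (y + rho * v) / (1.0 + rho)
```

As rho grows the result is pulled toward the ADMM iterate v; as rho shrinks it is pulled toward the data y.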
Multi-Task GAM#
Multi-task GAM: K supervised-learning tasks fit jointly under a shared feature set with optional convex coupling regularization.
Per the design discussion on issue #39 (and CLAUDE.md’s seam principle):
Per-task primal step is unchanged: each task’s task-local features run their existing optimize(fpumz_k, rho) independently. A multi-task feature wraps K of those into a single optimize_multi(fpumz_list, rho) call, which is the only place the cross-task coupling penalty enters.
Per-task proximal step is unchanged: K independent prox calls, each with its own (family_k, link_k, y_k) – so tasks can use different family/link pairs.
MultiTaskGAM is a separate class whose state (f_bar, z_bar, u, residual histories, offset, …) is a length-K list. Single-task GAM is unchanged.
The first concrete coupling penalty is group-lasso-across-tasks on
linear features (see
_MultiTaskLinearFeature).
Persistence, MultiTaskGAM.summary(), plotting, model-selection
statistics, and per-task tasks= routing are out of scope for this
slice.
- class gamdist.multi_task_gamdist.MultiTaskGAM(families: list[Literal['normal', 'binomial', 'poisson', 'gamma', 'exponential', 'inverse_gaussian', 'quasi_binomial', 'quasi_poisson', 'quantile', 'huber']], links: list[Literal['identity', 'logistic', 'probit', 'complementary_log_log', 'log', 'reciprocal', 'reciprocal_squared'] | None] | None = None, dispersions: list[float | None] | None = None, name: str | None = None)[source]#
Generalized Additive Model fit jointly across K tasks.
Each task k has its own (family_k, link_k) pair, its own response y_k (lengths may differ across tasks), and its own feature data; coefficients are tied together by the coupling regularizer attached to each multi-task feature.
- Parameters:
families (list[str]) – Length-K list of family names. Each entry must be one of the single-task families supported by
GAM. Tasks may use different families.
links (list[str | None] or None) – Length-K list of link names; defaults to the canonical link of each family. None entries fall back to the canonical link.
dispersions (list[float | None] or None) – Optional per-task fixed dispersion. None entries follow the single-task GAM defaults (e.g. binomial / poisson are fixed at 1).
name (str or None) – Optional model name. Persistence is not supported in this slice; the name is currently informational only.
Notes
Several features of single-task
GAM are intentionally not yet wired up for the multi-task case in this slice and raise NotImplementedError if invoked: save_flag, load_from_file, summary, plot, aic / aicc / gcv / ubre, confidence_intervals, robust covariance, quantile / huber families, covariate classes, and the estimate_overdispersion path.
- add_feature(name: str, type: Literal['linear'], transform: Callable[[ndarray[tuple[Any, ...], dtype[Any]]], ndarray[tuple[Any, ...], dtype[Any]]] | None = None, regularization: dict[str, Any] | None = None) None[source]#
Add a feature shared across all K tasks.
The first slice supports only
type='linear'. Each task k contributes m_k * (x_k - mean(x_k)) to its own linear predictor; the K slopes m_1, ..., m_K may be coupled by a convex coupling regularizer (see _MultiTaskLinearFeature for the supported penalties).
- fit(Xs: list[DataFrame], ys: list[ndarray[tuple[Any, ...], dtype[Any]]], weights: list[ndarray[tuple[Any, ...], dtype[Any]] | None] | None = None, smoothing: float = 1.0, verbose: bool = False, max_its: int = 100) None[source]#
Fit the multi-task GAM via K-coupled ADMM.
- Parameters:
Xs (list[pandas.DataFrame]) – Length-K list of per-task design matrices. Column names must include every feature added via
add_feature. Per-task row counts may differ.
ys (list[ndarray]) – Length-K list of per-task response arrays. len(ys[k]) must equal len(Xs[k]).
weights (list[ndarray | None] or None) – Optional per-task observation weights. None (or a None entry) means unit weights for that task.
smoothing (float) – Multiplicative scale applied to every feature’s regularization coefficient (matches single-task GAM).
verbose (bool) – Print per-iteration progress.
max_its (int) – Maximum ADMM iterations (per the joint stopping rule).
- predict(X: DataFrame | list[DataFrame]) list[ndarray[tuple[Any, ...], dtype[float64]]][source]#
Predict per-task mean responses.
- Parameters:
X (pandas.DataFrame or list[pandas.DataFrame]) – If a single DataFrame is passed, the same predictor matrix is used for every task and the K predictions all have the same length. If a list of K DataFrames is passed, each task’s prediction uses its own DataFrame.
- Returns:
mus – Length-K list of predicted mean responses;
mus[k] has shape (len(X[k]),).
- Return type:
list of arrays
- deviance(X: list[DataFrame] | None = None, y: list[ndarray[tuple[Any, ...], dtype[Any]]] | None = None, weights: list[ndarray[tuple[Any, ...], dtype[Any]] | None] | None = None) list[float][source]#
Per-task deviance.
With no arguments, returns the training deviance of each task evaluated against the current ADMM iterate. Otherwise, the deviance is computed against the supplied per-task data.
Multi-Task Features#
Feature classes that span multiple tasks in a MultiTaskGAM.
A _MultiTaskFeature owns one set of coefficients shared across
the K tasks and, optionally, a convex coupling regularizer that
ties those K coefficient vectors together. Coupling lives entirely
inside the feature – the multi-task orchestrator never sees a penalty
coefficient – so the seam principle from CLAUDE.md is preserved.
The first concrete subclass is _MultiTaskLinearFeature: a
linear contribution \(m_k (x_k - \bar{x}_k)\) for each task
\(k\). The only coupling penalty implemented in this slice is
group-lasso across tasks,
the K-task generalization of the existing single-task linear
group-lasso. With \(\lambda = 0\) the K subproblems decouple
and the feature behaves like K independent
_LinearFeature instances stitched
together.
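The cross-task group lasso can be sketched as a block soft-threshold over the K per-task slopes of one feature: either every task keeps the feature or every task drops it together, and with lambda = 0 the slopes pass through untouched, matching the decoupling claim above (illustration only, not gamdist code):

```python
import math

# Group-lasso-across-tasks shrinkage of the K slopes m_1, ..., m_K of a
# single shared feature. lam = 0 reduces to K independent linear features.
def couple_slopes(slopes, lam):
    norm = math.sqrt(sum(m * m for m in slopes))
    if norm <= lam:
        return [0.0] * len(slopes)  # feature removed from every task at once
    scale = 1.0 - lam / norm
    return [scale * m for m in slopes]
```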