Usage Guide
This guide shows how to work with groupedpaneldatamodels
after installation.
Quick start
First import the library and prepare the data:
import numpy as np
import groupedpaneldatamodels as gpdm
# --- simulate a small toy panel (N entities, T periods, K regressors) ---
N, T, K = 100, 10, 2
rng = np.random.default_rng(seed=1)
# regressors
X = rng.normal(size=(N, T, K))
# true coefficients for the two regressors
beta_true = np.array([1.0, -0.5]).reshape(1, 1, K)
# disturbances
u = rng.normal(scale=0.2, size=(N, T, 1))
# generate dependent variable
Y = (X * beta_true).sum(axis=-1, keepdims=True) + u
# --- estimate the grouped fixed‑effects model ---
model = gpdm.GroupedFixedEffects(dependent=Y, exog=X, G=3)
model.fit(max_iter=100, tol=1e-4)
model.summary()
All estimators expect 3D NumPy arrays: dependent with shape (N, T, 1) and exog with shape (N, T, K).
Pandas DataFrame support is on the roadmap; for now, convert your DataFrame to the required 3D arrays with df.to_numpy().reshape(N, T, K).
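As a minimal sketch of that conversion, assume a balanced long‑format DataFrame df sorted by entity and time, with hypothetical column names entity, time, y, x1 and x2 (the column names are illustrative, not part of the library):
# hypothetical long-format panel: one row per (entity, time) pair
df = df.sort_values(["entity", "time"])   # ensure rows are grouped by entity, then by time
N = df["entity"].nunique()                # number of entities
T = df["time"].nunique()                  # number of periods
Y = df["y"].to_numpy().reshape(N, T, 1)            # dependent, shape (N, T, 1)
X = df[["x1", "x2"]].to_numpy().reshape(N, T, 2)   # exog, shape (N, T, K) with K = 2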
Grouped Fixed Effects
Two grouped‑fixed‑effects estimators are available:
Bonhomme & Manresa (2015) – clustering approach (default)
Su, Shi & Phillips (2016) – C‑Lasso penalisation
Both are exposed through one class:
model_gfe = gpdm.GroupedFixedEffects(
dependent=y,
exog=x,
G=3, # maximum number of groups
use_bootstrap=False # analytical s.e. by default
)
model_gfe.fit(max_iter=100, tol=1e-4)
model_gfe.summary()
Common constructor arguments
Argument | Type | Default | Purpose
---|---|---|---
dependent | np.ndarray | — | 3‑D response array
exog | np.ndarray | — | 3‑D regressor array
G | int | — | Maximum number of latent groups to consider
use_bootstrap | bool | False | If True, compute bootstrap s.e.'s after fitting
Further constructor options let you choose between the Bonhomme & Manresa and Su, Shi & Phillips estimators, make the slope coefficients group‑specific, and include individual fixed effects \(\alpha_i\); see the groupedpaneldatamodels.GroupedFixedEffects documentation for their names and defaults.
Model‑specific fit options
Both estimators accept the iteration controls shown in the examples above (max_iter, tol). Each variant (Bonhomme & Manresa; Su, Shi & Phillips) also has its own estimator‑specific fit options; their names, types, and defaults are listed in the groupedpaneldatamodels.GroupedFixedEffects documentation.
Inspecting results
beta = model_gfe.params # coefficients
se = model_gfe.params_standard_errors # bootstrap or analytical
ci = model_gfe.get_confidence_intervals() # 95 % by default
Key result attributes and methods:
params – group‑specific coefficient arrays: shape (G, K) for GFE or (G, K, R) for GIFE
params['g'] – dictionary mapping each group label to the list of entity indices assigned to that group
params_standard_errors – estimated sampling variability for every coefficient (bootstrap or analytical)
IC – information‑criteria dict with keys "AIC", "BIC", "HQIC", and residual variance "sigma^2"
get_confidence_intervals(level=0.90) – return lower/upper bounds at any confidence level
to_dict(store_bootstrap_iterations=True) – serialise the fitted model for saving or post‑processing
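For example, continuing with the fitted model_gfe from above (a quick illustration; all names are taken from the list above):
groups = model_gfe.params['g']                            # entity indices assigned to each group
bic = model_gfe.IC["BIC"]                                 # Bayesian information criterion
sigma2 = model_gfe.IC["sigma^2"]                          # residual variance
ci_90 = model_gfe.get_confidence_intervals(level=0.90)    # 90 % confidence bounds
snapshot = model_gfe.to_dict(store_bootstrap_iterations=True)  # serialisable representation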
You can also call summary(confidence_level=0.99, standard_errors='analytical')
to customise the printed table.
To see more details about this model, view the groupedpaneldatamodels.GroupedFixedEffects
documentation.
Grouped Interactive Fixed Effects
Interactive estimators extend the fixed‑effects ideas with unobserved common factors. Two variants are implemented:
Ando & Bai (2016) – clustering (default)
Su & Ju (2018) – C‑Lasso penalisation
Typical workflow:
model_gife = gpdm.GroupedInteractiveFixedEffects(
dependent=y,
exog=x,
G=3,
use_bootstrap=False
)
model_gife.fit(max_iter=100, tol=1e-4)
model_gife.summary()
Constructor arguments
The constructor takes the same arguments as the fixed‑effects estimator, plus a small number of additions specific to the interactive model; their names, types, and defaults are listed in the groupedpaneldatamodels.GroupedInteractiveFixedEffects documentation.
Fit‑time options match those in the thesis:
n_boot, max_iter, tol, gife_iterations
kappa and gamma (SCAD penalty, Ando & Bai)
only_bfgs (Su & Ju)
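For illustration, a hypothetical fit call using some of these names (the values are placeholders rather than recommended settings; the variant‑specific penalty options kappa, gamma and only_bfgs are omitted here):
model_gife.fit(
    max_iter=100,        # maximum number of estimation iterations
    tol=1e-4,            # convergence tolerance
    gife_iterations=10,  # placeholder value, not a recommendation
)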
Results are accessed the same way as for the fixed‑effects estimator.
To see more details about this model, view the groupedpaneldatamodels.GroupedInteractiveFixedEffects
documentation.
Information‑criterion model search
Use grid_search_by_ic
to choose G (or any other hyper‑parameter)
automatically:
best_model, best_ic = gpdm.grid_search_by_ic(
gpdm.GroupedFixedEffects,
param_ranges={"G": [2, 3, 4, 5, 6]},
init_params={"dependent": y, "exog": x},
fit_params={"max_iter": 100},
ic_criterion="BIC"
)
The helper tests every candidate, computes the requested criterion (BIC, AIC or HQIC), and returns the best‑scoring fitted model. In general, BIC is the preferred criterion: in simulations it has proven the most reliable at selecting the true number of groups.
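Assuming the call above, the returned objects can be used like any other fitted model (best_ic is taken here to be the criterion value of the selected specification):
best_model.summary()             # summary table of the selected model
print("best BIC:", best_ic)      # criterion value used for the selection
print(best_model.IC)             # full information-criteria dict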
To see more details about this function, view the groupedpaneldatamodels.grid_search_by_ic()
documentation.
Advanced configuration
Bootstrap control – pass use_bootstrap=True to the constructor and n_boot=500 (plus boot_n_jobs=-1 for parallelisation) in fit for high‑precision inference.
Reproducibility – provide random_state=42 to seed all stochastic components, including bootstrap resampling.
Silent mode – set hide_progressbar=True to suppress progress bars and disable_analytical_se=True when only bootstrap SEs are needed.
Prediction & diagnostics – after fitting, you can inspect residuals via model.resid.
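As a sketch combining these options, assuming random_state and hide_progressbar are constructor arguments (where each option belongs is spelled out in the class documentation):
model = gpdm.GroupedFixedEffects(
    dependent=Y,
    exog=X,
    G=3,
    use_bootstrap=True,      # bootstrap standard errors after fitting
    random_state=42,         # assumed constructor argument; seeds all stochastic components
    hide_progressbar=True,   # assumed constructor argument; suppresses progress bars
)
model.fit(max_iter=100, tol=1e-4, n_boot=500, boot_n_jobs=-1)
residuals = model.resid      # residuals of the fitted model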
Further reading
You can find more details in the API documentation. Additionally, you can read the thesis this package is based on, which is available as a PDF.