Hosmer-Lemeshow Test

Overview

Definition

The Hosmer-Lemeshow test is a statistical goodness-of-fit test for logistic regression models. It assesses calibration: whether the observed event rates match the expected (predicted) event rates within subgroups of observations formed by ranking on predicted probability.


1. Procedure

  1. Predict Probabilities: Calculate predicted probabilities for all observations.
  2. Group Data: Sort observations by predicted probability and divide them into g groups (typically deciles, g=10).
  3. Compare: In each group, compute the expected number of events (the sum of the predicted probabilities in that group) and compare it with the observed number of events.
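  4. Chi-Square Statistic: Compute

     $$H = \sum_{j=1}^{g} \frac{(O_j - E_j)^2}{N_j \pi_j (1 - \pi_j)}$$

     where $O_j$ is the number of observed events, $E_j = N_j \pi_j$ is the number of expected events, $N_j$ is the number of observations, and $\pi_j$ is the average predicted probability in group $j$.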
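
As a worked illustration of step 4, the sketch below evaluates the statistic from a per-group summary. The four groups and their counts are made-up numbers chosen only to show the arithmetic, and the helper name `hl_statistic` is hypothetical.

```python
# Illustrative (made-up) summary for g = 4 groups: (N_j, O_j, pi_j)
groups = [
    (50, 4, 0.10),
    (50, 14, 0.30),
    (50, 27, 0.50),
    (50, 41, 0.80),
]


def hl_statistic(groups):
    """Sum (O_j - E_j)^2 / (N_j * pi_j * (1 - pi_j)) over all groups."""
    total = 0.0
    for n_j, o_j, pi_j in groups:
        e_j = n_j * pi_j  # expected events: E_j = N_j * pi_j
        total += (o_j - e_j) ** 2 / (n_j * pi_j * (1.0 - pi_j))
    return total


h = hl_statistic(groups)
print(f"H = {h:.4f}")
```

Each group contributes its squared observed-minus-expected gap, scaled by the binomial variance of that group, so groups with predicted probabilities near 0 or 1 weigh small gaps more heavily.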

2. Hypothesis

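Null hypothesis (H0): The model fits the data, i.e. observed and expected event counts agree within each group.
Alternative hypothesis (H1): The model does not fit the data, i.e. observed and expected event counts differ.

Interpretation: Under H0, H approximately follows a chi-square distribution with g - 2 degrees of freedom. A small p-value (for example, below 0.05) suggests poor fit. A large p-value means the test found no evidence of poor fit, which is not proof that the model is well calibrated.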

Limitation

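The test is sensitive to the grouping method and to sample size. The result can change with the number of groups and with how tied predicted probabilities are assigned to groups. With small samples the test has little power to detect miscalibration, while with very large samples it can flag deviations that are too small to matter in practice. It is therefore often recommended to use it alongside calibration plots.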
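
A minimal sketch of the companion calibration check, assuming scikit-learn and NumPy are installed. It uses `sklearn.calibration.calibration_curve` on synthetic data (the simulated probabilities and outcomes are assumptions for illustration, not results from any real model) and prints the points that a calibration plot would show against the diagonal.

```python
import numpy as np
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(0)
y_prob = rng.uniform(0.05, 0.95, size=2000)  # synthetic predicted probabilities
y_true = rng.binomial(1, y_prob)             # outcomes drawn so predictions are well calibrated

# strategy="quantile" yields equal-sized bins, mirroring the decile grouping of the test
frac_pos, mean_pred = calibration_curve(
    y_true, y_prob, n_bins=10, strategy="quantile"
)

for observed_rate, predicted_rate in zip(frac_pos, mean_pred):
    print(f"predicted {predicted_rate:.2f}   observed {observed_rate:.2f}")
```

For a well-calibrated model the observed rate in each bin stays close to the mean predicted probability. Unlike the single p-value, the per-bin view shows where and in which direction any miscalibration occurs.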


3. Python Implementation

Note: scikit-learn does not provide a Hosmer-Lemeshow test. A custom implementation (for example, built on NumPy, pandas, and SciPy) or a dedicated statistical package is needed.

# Conceptual implementation
# Group data by deciles of predicted probability
# Calculate Chi-square between observed and expected counts
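
The conceptual steps above can be sketched as follows. This is one possible implementation under stated assumptions, not a reference one: the function name `hosmer_lemeshow_test` is hypothetical, groups are formed with `pandas.qcut` (tied bin edges are merged, so fewer than g groups may result), and the p-value uses the conventional chi-square reference distribution with (number of groups - 2) degrees of freedom. The data are simulated.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2


def hosmer_lemeshow_test(y_true, y_prob, g=10):
    """Return (H, degrees_of_freedom, p_value) for binary outcomes and predicted probabilities."""
    df = pd.DataFrame({"y": np.asarray(y_true), "p": np.asarray(y_prob)})
    # Group by quantiles of predicted probability; duplicates="drop" merges tied bin edges
    df["group"] = pd.qcut(df["p"], q=g, duplicates="drop")
    grouped = df.groupby("group", observed=True)

    n = grouped["y"].count()        # N_j: observations per group
    observed = grouped["y"].sum()   # O_j: observed events
    expected = grouped["p"].sum()   # E_j = N_j * pi_j: expected events
    pi = expected / n               # pi_j: mean predicted probability

    h = float(((observed - expected) ** 2 / (n * pi * (1 - pi))).sum())
    dof = len(n) - 2                # needs at least 3 groups to be meaningful
    p_value = float(chi2.sf(h, dof))
    return h, dof, p_value


# Simulated example: outcomes drawn from the predicted probabilities themselves
rng = np.random.default_rng(42)
p = rng.uniform(0.05, 0.95, size=5000)
y = rng.binomial(1, p)
h, dof, p_value = hosmer_lemeshow_test(y, p)
print(f"H = {h:.3f}, df = {dof}, p = {p_value:.3f}")

# Deliberately miscalibrated outcomes: true event rate is half the predicted one
y_bad = rng.binomial(1, p * 0.5)
h_bad, dof_bad, p_bad = hosmer_lemeshow_test(y_bad, p)
print(f"H = {h_bad:.3f}, df = {dof_bad}, p = {p_bad:.3g}")
```

In the first call the outcomes are generated from the predicted probabilities, so the test has no real miscalibration to detect. In the second call the event rate is halved relative to the predictions, which should produce a very large H and a p-value near zero.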