Multinomial Logistic Regression (MNLogit)

Definition

Core Statement

Multinomial Logistic Regression (MNLogit) extends Binary Logistic Regression to outcomes with more than two unordered categories (e.g., Transportation Mode: Car, Bus, Bike). It models the log-odds of each category relative to a chosen reference (base) category. The probability of every category follows from these log-odds.

Purpose

Model nominal (unordered) multi-class outcomes.
Understand which factors predict category membership.
Calculate predicted probabilities for each class.

When to Use

Use MNLogit When...

Outcome has 3+ unordered categories.
Predictors can be continuous, categorical, or mixed.
Categories are mutually exclusive and exhaustive.

Do NOT Use When...

Outcome is ordinal (use Ordinal Logistic Regression).
Outcome is binary (use Binary Logistic Regression).

Theoretical Background

The Model

For $J$ categories, MNLogit estimates $J - 1$ sets of coefficients (one category is the reference).

$$\ln\left(\frac{P(Y = j)}{P(Y = \text{ref})}\right) = \beta_{j0} + \beta_{j1} X_1 + \dots + \beta_{jp} X_p, \qquad j \neq \text{ref}$$
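Inverting these log-odds gives the class probabilities. The reference category's linear predictor is fixed at 0, and a softmax is applied over all $J$ categories. The minimal NumPy sketch below illustrates this. The helper name `mnlogit_probs` and all coefficient values are hypothetical and chosen only for illustration.

```python
import numpy as np

def mnlogit_probs(x, betas):
    """Class probabilities for one observation under a multinomial logit.

    x     : 1-D predictor vector *including* a leading 1 for the intercept.
    betas : (J-1, p) array, one row of coefficients per non-reference category.
    Returns an array of length J; index 0 is the reference category.
    """
    eta = betas @ x                      # log-odds of each category vs. the reference
    eta = np.concatenate(([0.0], eta))   # the reference category has linear predictor 0
    eta -= eta.max()                     # shift for numerical stability (ratios unchanged)
    expo = np.exp(eta)
    return expo / expo.sum()

# Hypothetical coefficients: rows = categories 1 and 2, columns = [intercept, X1]
betas = np.array([[0.5, 0.2],
                  [-1.0, 0.4]])
x = np.array([1.0, 3.0])                 # intercept term and X1 = 3
p = mnlogit_probs(x, betas)
print(p.round(3))

# log(P(Y=j) / P(Y=ref)) recovers each category's linear predictor
print(np.log(p[1:] / p[0]))              # ≈ [1.1, 0.2]
```

Taking log ratios against index 0 undoes the softmax. This shows that the model equation above and the probability form are two views of the same fit.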

Relative Risk Ratio (RRR)

$$\text{RRR} = e^{\beta_{jk}}$$

"A 1-unit increase in $X_k$, holding the other predictors fixed, multiplies the odds of being in category $j$ versus the reference, $P(Y = j)/P(Y = \text{ref})$, by $\text{RRR}$."

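This interpretation can be checked numerically. The sketch below uses hypothetical coefficients and a one-predictor model; the function name `probs` is illustrative. Raising $X$ by one unit multiplies $P(Y = j)/P(Y = \text{ref})$ by exactly $e^{\beta}$ for that category, whatever the starting value of $X$.

```python
import numpy as np

def probs(x1, b0, b1):
    """Probabilities for a multinomial logit with one predictor.

    b0, b1 : intercepts and slopes for the J-1 non-reference categories.
    Index 0 of the result is the reference category.
    """
    eta = np.concatenate(([0.0], b0 + b1 * x1))
    e = np.exp(eta - eta.max())
    return e / e.sum()

# Hypothetical coefficients for two non-reference categories
b0 = np.array([0.3, -0.7])
b1 = np.array([0.25, -0.4])

p_before = probs(2.0, b0, b1)
p_after = probs(3.0, b0, b1)   # X increased by one unit

# How much P(Y=j)/P(Y=ref) changed for each non-reference category
odds_change = (p_after[1:] / p_after[0]) / (p_before[1:] / p_before[0])
print(odds_change)             # matches the line below
print(np.exp(b1))
```

The individual probabilities shift in ways that depend on all categories. The ratio to the reference, however, moves by a constant factor. That is why RRRs are reported rather than changes in probability.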
Independence of Irrelevant Alternatives (IIA)

Critical Assumption

MNLogit assumes that the odds of choosing one category over another, $P(Y = j)/P(Y = k)$, do not depend on which other categories are available (e.g., preference for Car vs Bus is unaffected by adding a new "Train" option).
If violated, use Nested Logit or Mixed Logit.
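Under MNLogit, IIA is built into the formula. The ratio $P(j)/P(k) = e^{\eta_j - \eta_k}$ involves only categories $j$ and $k$. A small sketch with hypothetical linear-predictor values for Car, Bus, and a new Train option shows what this implies. Adding Train lowers every existing probability, but the Car/Bus ratio stays the same.

```python
import numpy as np

def softmax(eta):
    """Convert a dict of linear predictors into choice probabilities."""
    names = list(eta)
    v = np.array([eta[n] for n in names])
    e = np.exp(v - v.max())
    return dict(zip(names, e / e.sum()))

# Hypothetical linear predictors (Car is the reference, so its value is 0)
before = softmax({"car": 0.0, "bus": -0.8})
after = softmax({"car": 0.0, "bus": -0.8, "train": -0.2})

print(before)
print(after)
# Car/Bus ratio is exp(0 - (-0.8)) in both choice sets
print(before["car"] / before["bus"], after["car"] / after["bus"])
```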

Assumptions

Nominal Outcome: Categories are unordered.
Independence of Observations.
IIA: Independence of Irrelevant Alternatives.
Linearity: each log-odds (category vs. reference) is a linear function of the predictors.
Sufficient Sample Size: Each category needs enough observations.

Limitations

Pitfalls

IIA Violation: If adding a new category changes the relative odds of existing categories, the model is misspecified.
Complexity: Interpretation requires $J - 1$ sets of coefficients.
Sparse Categories: Rare categories can cause estimation problems.

Python Implementation

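import numpy as np
import pandas as pd
import statsmodels.api as sm

# Assumes df is a pandas DataFrame with columns 'income', 'distance', 'transport_mode'
X = sm.add_constant(df[['income', 'distance']])

# statsmodels uses the lowest-coded category as the reference.
# Encode the outcome so 'car' is code 0, matching the R example below.
# (Any value not listed in categories gets code -1, so list them all.)
mode = pd.Categorical(df['transport_mode'], categories=['car', 'bus', 'bike'])
y = pd.Series(mode.codes, index=df.index, name='transport_mode')  # car=0, bus=1, bike=2

# Fit MNLogit
model = sm.MNLogit(y, X).fit()
print(model.summary())

# Relative Risk Ratios: one column per non-reference category (bus, bike)
print("\n--- Relative Risk Ratios ---")
print(np.exp(model.params))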

R Implementation

library(nnet)

# Fit MNLogit (multinom function)
# Relevel to set reference category
df$transport_mode <- relevel(factor(df$transport_mode), ref = "car")

model <- multinom(transport_mode ~ income + distance, data = df)
summary(model)

# Relative Risk Ratios
exp(coef(model))

# Wald z-tests and two-sided p-values (summary() of multinom does not report p-values)
z <- summary(model)$coefficients / summary(model)$standard.errors
p <- (1 - pnorm(abs(z), 0, 1)) * 2
print(p)

Worked Numerical Example

Transportation Choice Model

Outcome: How do students commute? (Walk, Bike, Car)
Predictors: Distance_to_School (miles), Income ($1000s)
Base Category: Walk

Results:

Outcome	Predictor	Coefficient	RRR	Interpretation
Walk (base)	-	0	1.0	Reference category
Bike vs Walk	Distance	0.15	1.16	Each extra mile → odds of biking vs. walking rise 16%
Bike vs Walk	Income	-0.02	0.98	Each $1K of income → odds of biking vs. walking fall 2%
Car vs Walk	Distance	0.40	1.49	Each extra mile → odds of driving vs. walking rise 49%
Car vs Walk	Income	0.08	1.08	Each $1K of income → odds of driving vs. walking rise 8%
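The RRR column is $e^{\text{coefficient}}$. The percent change in odds per unit is $(\text{RRR} - 1) \times 100$. The short sketch below reproduces the table's RRRs from its coefficients. The numbers are taken directly from the table above.

```python
import numpy as np

# Coefficients copied from the worked example table (base category: Walk)
coefs = {
    ("Bike vs Walk", "Distance"): 0.15,
    ("Bike vs Walk", "Income"): -0.02,
    ("Car vs Walk", "Distance"): 0.40,
    ("Car vs Walk", "Income"): 0.08,
}

rrr = {}
for key, b in coefs.items():
    rrr[key] = float(np.exp(b))
    pct = (rrr[key] - 1) * 100   # % change in odds vs. Walk per 1-unit increase
    contrast, predictor = key
    print(f"{contrast:<13} {predictor:<9} RRR = {rrr[key]:.2f} ({pct:+.0f}% odds)")
```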

Predictions for a student 5 miles away with income $40K (these also depend on the intercepts, which the table does not show):

P(Walk) = 0.15
P(Bike) = 0.25
P(Car) = 0.60
Most likely choice: Car
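Because every probability is expressed relative to Walk, the log ratio of each prediction to P(Walk) gives that contrast's full linear predictor at this student's values. Subtracting the slope terms from the table then yields the intercepts these numbers imply. The intercepts are derived here by arithmetic only and are not reported in the note. A minimal sketch:

```python
import numpy as np

probs = {"Walk": 0.15, "Bike": 0.25, "Car": 0.60}
slopes = {"Bike": {"distance": 0.15, "income": -0.02},
          "Car": {"distance": 0.40, "income": 0.08}}
x = {"distance": 5, "income": 40}   # 5 miles; income in $1000s

implied_intercepts = {}
for cat in ("Bike", "Car"):
    log_odds = np.log(probs[cat] / probs["Walk"])      # log-odds vs. base (Walk)
    slope_part = sum(slopes[cat][k] * x[k] for k in x)  # contribution of the predictors
    implied_intercepts[cat] = log_odds - slope_part     # intercept consistent with these numbers
    print(f"{cat} vs Walk: log-odds = {log_odds:.3f}, implied intercept = {implied_intercepts[cat]:.3f}")

# The predicted class is the category with the highest probability
print("Most likely:", max(probs, key=probs.get))
```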

Interpretation Guide

Output	Interpretation	Edge Case Notes
RRR (Car vs Walk) = 2.5	Odds of Car vs. Walk are 2.5× higher per unit increase.	Relative to the base only. For Car vs. Bike, divide by the Bike-vs-Walk RRR for the same predictor.
RRR = 1.0	No change in the odds of that category vs. the base.	Coefficient = 0 (since e^0 = 1); in practice, check whether the CI includes 1.
RRR = 0.5	Odds of the outcome vs. the base are halved (50% lower) per unit increase.	RRR < 1 means a negative association relative to the base.
All RRRs > 1 for one predictor	Predictor raises the odds of every non-base category relative to the base, so P(base) falls.	Common for variables like "service quality" in choice models.
Base category has P = 0.90	Other categories are rare, so their estimates may be unstable.	Consider combining rare categories (an ordered logit applies only if the categories are truly ordered).
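An RRR against the base does not by itself compare two non-base categories. Because both contrasts share the base, however, the comparison follows from them. The Car-vs-Bike coefficient is the difference of the Car-vs-Walk and Bike-vs-Walk coefficients, and its RRR is the ratio of their RRRs. The sketch below applies this to the worked example's coefficients. It re-expresses the same fit with Bike as the base rather than estimating a new model.

```python
import numpy as np

# Worked-example coefficients (base category: Walk)
beta = {
    "Bike": {"Distance": 0.15, "Income": -0.02},
    "Car": {"Distance": 0.40, "Income": 0.08},
}

# Car vs Bike = (Car vs Walk) - (Bike vs Walk), predictor by predictor
car_vs_bike = {k: beta["Car"][k] - beta["Bike"][k] for k in beta["Car"]}
for k, b in car_vs_bike.items():
    rrr = np.exp(b)
    # Identical to the ratio of the two RRRs against Walk
    assert np.isclose(rrr, np.exp(beta["Car"][k]) / np.exp(beta["Bike"][k]))
    print(f"Car vs Bike, {k}: coef = {b:.2f}, RRR = {rrr:.2f}")
```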

Common Pitfall Example

IIA Violation: The Red Bus / Blue Bus Problem

Classic Example:

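Initial choices: Car (70%), Blue Bus (30%)
MNLogit assumes: P(Car)/P(Blue Bus) = constant regardless of other options

Now add a "Red Bus" option, identical to the blue bus except for color:

IIA predicts: Car (≈54%), Blue Bus (≈23%), Red Bus (≈23%), keeping Car : Blue Bus at 70 : 30
Reality: Car (≈70%), Blue Bus (≈15%), Red Bus (≈15%), because bus riders simply split between the two buses
Why? The two buses are near-perfect substitutes, so the red bus draws riders almost entirely from the blue bus instead of proportionally from all options, which IIA rules out.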
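The MNLogit shares for this example follow directly from the softmax form. In the sketch below, the linear predictors are set so that Car vs. Blue Bus is 70 : 30. This is a hypothetical calibration with Car as the reference. The Red Bus receives exactly the Blue Bus's value because it differs only in color.

```python
import numpy as np

def shares(eta):
    """Multinomial-logit choice shares from a list of linear predictors."""
    e = np.exp(np.asarray(eta) - max(eta))
    return e / e.sum()

# Calibrate exp(eta_car) : exp(eta_bus) = 70 : 30, with Car as reference (eta = 0)
eta_bus = np.log(30 / 70)

two_options = shares([0.0, eta_bus])             # Car, Blue Bus
three_options = shares([0.0, eta_bus, eta_bus])  # Car, Blue Bus, Red Bus

print(two_options.round(3))     # ≈ [0.7, 0.3]
print(three_options.round(3))   # Car ≈ 0.538 (7/13), each bus ≈ 0.231 (3/13)
```

Car loses share to an option that is really just another bus. This is the substitution pattern that nested logit is designed to avoid.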

Real Scenario:

Brand choice: Coke, Pepsi, Store Brand
Add "Diet Coke" → it should draw more share from Coke than from Pepsi
But MNLogit forces it to draw from all brands proportionally (every existing share shrinks by the same percentage)

Test IIA:

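Hausman-McFadden test: fit the model with all alternatives, then refit after dropping one alternative (and the observations that chose it)
If the coefficients shared by both fits change significantly → IIA violated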
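Here is a minimal NumPy sketch of the Hausman-McFadden statistic. It assumes you already have the coefficient vectors and covariance matrices from both fits, restricted to the coefficients the two models share. The function name and the toy inputs are hypothetical. Under IIA, the statistic is approximately chi-square with as many degrees of freedom as compared coefficients. In finite samples, $V_r - V_f$ need not be positive definite, so the statistic can even come out negative. The pseudo-inverse is used here as a pragmatic choice.

```python
import numpy as np

def hausman_mcfadden(b_restricted, b_full, V_restricted, V_full):
    """Hausman-McFadden IIA statistic (hypothetical helper).

    b_restricted, V_restricted : shared coefficients/covariance from the fit without one alternative
    b_full, V_full             : the same coefficients/covariance from the fit on all alternatives
    Returns (statistic, degrees of freedom).
    """
    d = np.asarray(b_restricted) - np.asarray(b_full)
    V_diff = np.asarray(V_restricted) - np.asarray(V_full)
    stat = float(d @ np.linalg.pinv(V_diff) @ d)   # pinv in case V_diff is singular
    return stat, d.size

# Toy inputs (hypothetical numbers, not from a real fit)
b_r = np.array([0.42, 0.09])
b_f = np.array([0.40, 0.08])
V_r = np.diag([0.0030, 0.0004])
V_f = np.diag([0.0020, 0.0003])

stat, dof = hausman_mcfadden(b_r, b_f, V_r, V_f)
print(f"H = {stat:.3f} on {dof} df")
```

With these toy numbers, H = 1.4 on 2 df. This is well below the 5% chi-square critical value of about 5.99, so these inputs would give no evidence against IIA.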

Solutions if IIA violated:

Nested Logit (group similar alternatives)
Mixed Logit (random coefficients)
Multinomial Probit (no IIA assumption)

When IIA is OK:

Alternatives are truly distinct (Car, Bike, Walk)
No obvious substitution patterns

Interpretation Guide

Comparison	Coef	RRR	Interpretation
Bus vs Car (Income)	-0.5	0.61	Each 1-unit increase in income lowers the odds of Bus vs. Car by 39%.
Bike vs Car (Distance)	-1.2	0.30	Each 1-unit increase in distance lowers the odds of Bike vs. Car by 70%.

Related Concepts

Binary Logistic Regression
Ordinal Logistic Regression - For ordered outcomes.
Generalized Linear Models (GLM)
