
Simple Difference-in-Differences

The difference-in-differences (DID) estimator is a popular econometric method for estimating the causal effect of an intervention.

In the simplest setting, a single group of $N$ individuals is observed once before and once after an intervention, and the outcome follows the regression model:

$$
y_{it} = \alpha + \beta D_t + \epsilon_{it}, \quad i = 1, \ldots, N, \quad t = 0, 1
$$

where:

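  • $y_{it}$ is the outcome variable for individual $i$ at time $t$,
  • $D_t$ is the treatment variable, which equals 1 in the postintervention period ($t = 1$) and 0 in the preintervention period ($t = 0$),
  • $\alpha$ is the intercept, $\beta$ is the treatment effect, and $\epsilon_{it}$ is the error term.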

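$\beta$ can be estimated by the average within-individual change:

$$
\hat{\beta} = \frac{1}{N} \sum_{i=1}^N (y_{i1} - y_{i0})
$$

To see why, write the model out for each period and take the difference, which removes the intercept:

$$
\begin{aligned}
y_{i1} &= \alpha + \beta + \epsilon_{i1} \\
y_{i0} &= \alpha + \epsilon_{i0} \\
y_{i1} - y_{i0} &= \beta + \epsilon_{i1} - \epsilon_{i0}
\end{aligned}
$$

If the errors have mean zero, each difference has expectation $\beta$, so $\hat{\beta}$ is unbiased.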
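The estimator is simple enough to check by simulation. Below is a minimal sketch using only the Python standard library; the parameter values, the normal error distribution, and the helper name `before_after_estimate` are illustrative assumptions, not part of the model above.

```python
import random
import statistics


def before_after_estimate(y0, y1):
    """Average within-individual change: (1/N) * sum_i (y_i1 - y_i0)."""
    return statistics.fmean(b - a for a, b in zip(y0, y1))


# Illustrative (assumed) parameters: intercept, true effect, error s.d.
alpha, beta, sigma = 10.0, 2.0, 1.0
n = 10_000
rng = random.Random(0)

# Simulate y_i0 = alpha + eps_i0 and y_i1 = alpha + beta + eps_i1 (assumed normal errors)
y0 = [alpha + rng.gauss(0.0, sigma) for _ in range(n)]
y1 = [alpha + beta + rng.gauss(0.0, sigma) for _ in range(n)]

print(round(before_after_estimate(y0, y1), 2))  # should be close to beta = 2.0
```

Note that `alpha` never has to be estimated: it cancels in every difference.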

The above regression can be modified to include an untreated control group:

$$
y_{it}^j = \alpha + \alpha_1 D_t + \alpha^1 D^j + \beta D_t^j + \epsilon_{it}^j, \quad i = 1, \ldots, N, \quad t = 0, 1, \quad j = 0, 1
$$

where:

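  • $y_{it}^j$ is the outcome variable for individual $i$ in group $j$ at time $t$,
  • $D_t$ is the postintervention indicator defined above, so $\alpha_1$ captures the change over time common to both groups,
  • $D^j = 1$ if individual $i$ is in the treatment group ($j = 1$) and 0 if individual $i$ is in the control group ($j = 0$), so $\alpha^1$ captures the time-invariant difference between the groups,
  • $D_t^j = D_t D^j$, which equals 1 if $t = 1$ and $j = 1$ and 0 otherwise, so $\beta$ is the treatment effect.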
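To make the indicator coding concrete, here is a minimal sketch that evaluates the systematic (error-free) part of the model for each of the four $(t, j)$ cells. The parameter names `alpha_time` (for $\alpha_1$) and `alpha_group` (for $\alpha^1$), the function name `expected_outcome`, and the numeric values are illustrative assumptions.

```python
def expected_outcome(alpha, alpha_time, alpha_group, beta, t, j):
    """Systematic part of y_it^j = alpha + alpha_1*D_t + alpha^1*D^j + beta*D_t^j."""
    d_t = 1 if t == 1 else 0  # D_t: postintervention indicator
    d_j = 1 if j == 1 else 0  # D^j: treatment-group indicator
    d_tj = d_t * d_j          # D_t^j: 1 only for the treatment group after the intervention
    return alpha + alpha_time * d_t + alpha_group * d_j + beta * d_tj


# Illustrative (assumed) parameter values
params = dict(alpha=10.0, alpha_time=2.0, alpha_group=3.0, beta=5.0)

for j in (1, 0):
    for t in (0, 1):
        group = "treatment" if j == 1 else "control"
        print(f"{group:9s} t={t}: E[y] = {expected_outcome(t=t, j=j, **params)}")
```

With these values the four cells are 13 and 20 for the treatment group (before, after) and 10 and 12 for the control group, matching the cell-by-cell expressions $\alpha + \alpha^1$, $\alpha + \alpha_1 + \alpha^1 + \beta$, $\alpha$, and $\alpha + \alpha_1$.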

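This model underlies the difference-in-differences estimator: $\beta$ is recovered as the difference between the treatment group's before-after change and the control group's before-after change. To derive it, consider each group in turn.

For the treatment group, the postintervention period is $t = 1$ and $j = 1$, so $D_t^j = 1$ and $D_t = 1$:

$$
y_{i1}^1 = \alpha + \alpha_1 + \alpha^1 + \beta + \epsilon_{i1}^1
$$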

The preintervention period is $t = 0$ and $j = 1$, so $D_t^j = 0$ and $D_t = 0$:

$$
y_{i0}^1 = \alpha + \alpha^1 + \epsilon_{i0}^1
$$

The treatment group's change from the preintervention to the postintervention period is:

$$
y_{i1}^1 - y_{i0}^1 = \alpha_1 + \beta + \epsilon_{i1}^1 - \epsilon_{i0}^1
$$

For the control group, the postintervention period is $t = 1$ and $j = 0$, so $D_t^j = 0$ and $D_t = 1$:

$$
y_{i1}^0 = \alpha + \alpha_1 + \epsilon_{i1}^0
$$

The preintervention period is $t = 0$ and $j = 0$, so $D_t^j = 0$ and $D_t = 0$:

$$
y_{i0}^0 = \alpha + \epsilon_{i0}^0
$$

The control group's change from the preintervention to the postintervention period is:

$$
y_{i1}^0 - y_{i0}^0 = \alpha_1 + \epsilon_{i1}^0 - \epsilon_{i0}^0
$$

Finally, subtracting the control group's change from the treatment group's change removes the common time effect $\alpha_1$:

$$
\begin{aligned}
(y_{i1}^1 - y_{i0}^1) - (y_{i1}^0 - y_{i0}^0) &= (\alpha_1 + \beta + \epsilon_{i1}^1 - \epsilon_{i0}^1) - (\alpha_1 + \epsilon_{i1}^0 - \epsilon_{i0}^0) \\
&= \beta + \epsilon_{i1}^1 - \epsilon_{i0}^1 - \epsilon_{i1}^0 + \epsilon_{i0}^0
\end{aligned}
$$
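As a worked check, take the illustrative values $\alpha = 10$, $\alpha_1 = 2$, $\alpha^1 = 3$, $\beta = 5$ (chosen only for this example) and set every error term to zero:

$$
\begin{aligned}
y_{i1}^1 - y_{i0}^1 &= (10 + 2 + 3 + 5) - (10 + 3) = 7 = \alpha_1 + \beta \\
y_{i1}^0 - y_{i0}^0 &= (10 + 2) - 10 = 2 = \alpha_1 \\
(y_{i1}^1 - y_{i0}^1) - (y_{i1}^0 - y_{i0}^0) &= 7 - 2 = 5 = \beta
\end{aligned}
$$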

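Now assume that the error terms are independent and identically distributed (i.i.d.) with mean 0 and variance $\sigma^2$. Then the composite error has expectation zero (only the zero-mean part of the assumption is needed for this step):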

$$
E[\epsilon_{i1}^1 - \epsilon_{i0}^1 - \epsilon_{i1}^0 + \epsilon_{i0}^0] = 0
$$
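A quick Monte Carlo check of this step, under the i.i.d. assumption plus an additional assumed normal distribution for the errors. Because the four errors are independent, the composite error should also have variance $4\sigma^2$, which the simulation shows as well. The helper name `composite_error` and the parameter values are illustrative.

```python
import random
import statistics

# Assumed setting: i.i.d. normal errors with mean 0 and s.d. sigma (illustrative choice)
sigma = 1.0
n_draws = 100_000
rng = random.Random(42)


def composite_error(rng, sigma):
    """One draw of eps_i1^1 - eps_i0^1 - eps_i1^0 + eps_i0^0."""
    e11, e01, e10, e00 = (rng.gauss(0.0, sigma) for _ in range(4))
    return e11 - e01 - e10 + e00


draws = [composite_error(rng, sigma) for _ in range(n_draws)]
print(f"mean     ~ {statistics.fmean(draws):.3f}")     # should be near 0
print(f"variance ~ {statistics.variance(draws):.3f}")  # should be near 4 * sigma**2
```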

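Therefore, averaging the double difference over individuals gives an unbiased estimator of $\beta$, the difference-in-differences estimator:

$$
\hat{\beta} = \frac{1}{N} \sum_{i=1}^N \left[ (y_{i1}^1 - y_{i0}^1) - (y_{i1}^0 - y_{i0}^0) \right]
$$

Equivalently, $\hat{\beta}$ is the change in the treatment-group mean minus the change in the control-group mean.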
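A minimal end-to-end sketch, assuming $N$ individuals in each group (matching the indexing above), normal errors, and illustrative parameter values. It computes $\hat{\beta}$ with the formula above and, separately, fits the two-group regression by ordinary least squares on the regressors $1, D_t, D^j, D_t^j$. The helper names (`did_estimate`, `solve`, `ols_did`, `draw`) are hypothetical. In this 2 × 2 setting the OLS coefficient on the interaction equals the double difference of group means, so the two numbers should agree up to floating-point rounding.

```python
import random
import statistics


def did_estimate(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """(1/N) * sum_i [(y_i1^1 - y_i0^1) - (y_i1^0 - y_i0^0)] over paired index i."""
    return statistics.fmean(
        (t1 - t0) - (c1 - c0)
        for t0, t1, c0, c1 in zip(treat_pre, treat_post, ctrl_pre, ctrl_post)
    )


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_did(rows):
    """OLS of y on [1, D_t, D^j, D_t * D^j] via the normal equations; rows are (t, j, y)."""
    xs = [[1.0, t, j, t * j] for t, j, _ in rows]
    ys = [y for _, _, y in rows]
    k = len(xs[0])
    xtx = [[sum(x[p] * x[q] for x in xs) for q in range(k)] for p in range(k)]
    xty = [sum(x[p] * y for x, y in zip(xs, ys)) for p in range(k)]
    return solve(xtx, xty)  # estimates of [alpha, alpha_1, alpha^1, beta]


# Illustrative (assumed) parameters, N individuals per group, normal errors
alpha, alpha_time, alpha_group, beta, sigma = 10.0, 2.0, 3.0, 5.0, 1.0
n = 2_000
rng = random.Random(7)


def draw(t, j):
    """One simulated outcome from the two-group model for period t, group j."""
    return alpha + alpha_time * t + alpha_group * j + beta * t * j + rng.gauss(0.0, sigma)


treat_pre = [draw(0, 1) for _ in range(n)]
treat_post = [draw(1, 1) for _ in range(n)]
ctrl_pre = [draw(0, 0) for _ in range(n)]
ctrl_post = [draw(1, 0) for _ in range(n)]

beta_did = did_estimate(treat_pre, treat_post, ctrl_pre, ctrl_post)
rows = ([(0, 1, y) for y in treat_pre] + [(1, 1, y) for y in treat_post]
        + [(0, 0, y) for y in ctrl_pre] + [(1, 0, y) for y in ctrl_post])
beta_ols = ols_did(rows)[3]
print(f"double difference: {beta_did:.4f}  OLS interaction: {beta_ols:.4f}")
```

The hand-written solver only keeps the sketch dependency-free; in practice a regression library would normally be used to fit the model and to report standard errors.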