Simple Difference-in-Differences

The difference-in-differences (DID) estimator is a popular method in econometrics to estimate causal effects.

In the simplest case, a single group is observed before and after an intervention, and the regression model is:

y_{it} = \alpha + \beta D_t + \epsilon_{it}, \quad i = 1, \ldots, N, \quad t = 0, 1

where:

$y_{it}$ is the outcome variable for individual $i$ at time $t$,
$D_t$ is the treatment variable, which equals 1 (postintervention) if $t = 1$ and 0 (preintervention) if $t = 0$,
$\epsilon_{it}$ is the error term.

$\beta$ can be estimated by the average before-after change in the outcome:

\hat{\beta} = \frac{\sum_{i=1}^N (y_{i1} - y_{i0})}{N}

This follows from evaluating the model in each period and taking the difference:

\begin{aligned} y_{i1} &= \alpha + \beta + \epsilon_{i1} \\ y_{i0} &= \alpha + \epsilon_{i0} \\ y_{i1} - y_{i0} &= \beta + \epsilon_{i1} - \epsilon_{i0} \end{aligned}
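As a rough illustration of the before-after estimator, the sketch below simulates data from the single-group model and compares the average of the individual differences with the OLS slope on $D_t$. The parameter values, sample size, and normally distributed errors are assumptions made for this example only; they do not come from the text.

```python
import numpy as np

# Hypothetical values chosen for illustration only.
rng = np.random.default_rng(0)
N = 1000
alpha, beta = 2.0, 0.5

# Simulate y_{i0} = alpha + eps_{i0} and y_{i1} = alpha + beta + eps_{i1}.
y0 = alpha + rng.normal(0.0, 1.0, N)
y1 = alpha + beta + rng.normal(0.0, 1.0, N)

# Before-after estimator: the mean of the individual differences.
beta_hat = np.mean(y1 - y0)

# The same number is the OLS slope from regressing y_{it} on a constant and D_t.
y = np.concatenate([y0, y1])
D = np.concatenate([np.zeros(N), np.ones(N)])
X = np.column_stack([np.ones(2 * N), D])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_hat, coef[1])
```

With a single binary regressor, the OLS slope is the difference between the two period means, which is why the two numbers agree up to floating-point error.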

The above regression can be modified to include an untreated control group:

y_{it}^j = \alpha + \alpha_1 D_t + \alpha^1 D^j + \beta D_{t}^j + \epsilon_{it}^j, \quad i = 1, \ldots, N, \quad t = 0, 1

where:

$y_{it}^j$ is the outcome variable for individual $i$ in group $j$ at time $t$,
$D^j = 1$ if individual $i$ is in the treatment group ($j = 1$) and 0 if individual $i$ is in the control group ($j = 0$),
$D_t^j = 1$ if $t = 1$ and $j = 1$ and 0 otherwise, that is, $D_t^j = D_t \cdot D^j$.
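To make the dummy variables concrete, the following sketch builds the regressor matrix for the four possible group and period combinations. The helper name `did_design_matrix` and the coefficient values are hypothetical and serve only as an illustration.

```python
import numpy as np

def did_design_matrix(group, period):
    """Columns: constant, D_t (period), D^j (group), D_t^j (interaction)."""
    group = np.asarray(group, dtype=float)
    period = np.asarray(period, dtype=float)
    return np.column_stack([np.ones_like(group), period, group, group * period])

# One row per (group, period) cell:
# treated/post, treated/pre, control/post, control/pre.
group = [1, 1, 0, 0]
period = [1, 0, 1, 0]
X = did_design_matrix(group, period)

# Hypothetical coefficients (alpha, alpha_1, alpha^1, beta).
coef = np.array([1.0, 0.5, 0.25, 2.0])

# Expected outcome in each cell when the error term is zero.
cell_means = X @ coef
did = (cell_means[0] - cell_means[1]) - (cell_means[2] - cell_means[3])
print(X)
print(cell_means, did)
```

Only the treated, postintervention row has a 1 in the last column, so $\beta$ enters only that cell's expected outcome.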

This regression is known as the difference-in-differences estimator because $\beta$ measures the difference between the before-after change in the treatment group and the before-after change in the control group. To see this, evaluate the model for each group and period.

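For the treatment group: The postintervention period is $t = 1$ and $j = 1$, so $D_{t}^j = 1$ and $D_t = 1$.

y_{i1}^1 = \alpha + \alpha_1 + \alpha^1 + \beta + \epsilon_{i1}^1

The preintervention period is $t = 0$ and $j = 1$, so $D_{t}^j = 0$ and $D_t = 0$.

y_{i0}^1 = \alpha + \alpha^1 + \epsilon_{i0}^1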

The difference between the postintervention period and the preintervention period is:

y_{i1}^1 - y_{i0}^1 = \alpha_1 + \beta + \epsilon_{i1}^1 - \epsilon_{i0}^1

For the control group: The postintervention period is $t = 1$ and $j = 0$, so $D_{t}^j = 0$ and $D_t = 1$.

y_{i1}^0 = \alpha + \alpha_1 + \epsilon_{i1}^0

The preintervention period is $t = 0$ and $j = 0$, so $D_{t}^j = 0$ and $D_t = 0$.

y_{i0}^0 = \alpha + \epsilon_{i0}^0

The difference between the postintervention period and the preintervention period is:

y_{i1}^0 - y_{i0}^0 = \alpha_1 + \epsilon_{i1}^0 - \epsilon_{i0}^0

Finally, we take the difference between the treatment group's difference and the control group's difference:

\begin{aligned} (y_{i1}^1 - y_{i0}^1) - (y_{i1}^0 - y_{i0}^0) &= (\alpha_1 + \beta + \epsilon_{i1}^1 - \epsilon_{i0}^1) - (\alpha_1 + \epsilon_{i1}^0 - \epsilon_{i0}^0) \\ &= \beta + \epsilon_{i1}^1 - \epsilon_{i0}^1 - \epsilon_{i1}^0 + \epsilon_{i0}^0 \end{aligned}

Then, assume that the error terms are independent and identically distributed (i.i.d.) with mean 0 and variance $\sigma^2$, so that the combined error term has expectation zero:

E[\epsilon_{i1}^1 - \epsilon_{i0}^1 - \epsilon_{i1}^0 + \epsilon_{i0}^0] = 0

Therefore, averaging over the $N$ individuals in each group, the difference-in-differences estimator is:

\hat{\beta} = \frac{\sum_{i=1}^N \left[ (y_{i1}^1 - y_{i0}^1) - (y_{i1}^0 - y_{i0}^0) \right]}{N}
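The estimator can be checked numerically. The sketch below simulates a treatment group and a control group of equal size $N$, computes $\hat{\beta}$ from the differenced outcomes, and compares it with the coefficient on the interaction dummy from an OLS fit of the full regression. The parameter values, the normally distributed errors, and the names `alpha_g` (standing in for $\alpha^1$) and `simulate` are assumptions made for this illustration.

```python
import numpy as np

# Hypothetical parameter values for illustration only.
rng = np.random.default_rng(42)
N = 2000
alpha, alpha_1, alpha_g, beta = 1.0, 0.5, 0.25, 2.0

def simulate(group):
    """Return (pre, post) outcomes for N individuals in the given group (0 or 1)."""
    y_pre = alpha + alpha_g * group + rng.normal(0.0, 1.0, N)
    y_post = alpha + alpha_1 + alpha_g * group + beta * group + rng.normal(0.0, 1.0, N)
    return y_pre, y_post

y0_t, y1_t = simulate(1)  # treatment group
y0_c, y1_c = simulate(0)  # control group

# Difference-in-differences estimator from the differenced outcomes.
beta_hat = np.mean((y1_t - y0_t) - (y1_c - y0_c))

# Equivalent OLS fit: regress y on constant, D_t, D^j and D_t^j.
y = np.concatenate([y1_t, y0_t, y1_c, y0_c])
period = np.concatenate([np.ones(N), np.zeros(N), np.ones(N), np.zeros(N)])
group = np.concatenate([np.ones(N), np.ones(N), np.zeros(N), np.zeros(N)])
X = np.column_stack([np.ones(4 * N), period, group, period * group])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_hat, coef[3])
```

Because the regression has one parameter per group and period cell, its fitted values are the four cell means, so the interaction coefficient matches the estimator computed from differences up to floating-point error.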