---
title: "Estimating Interactions with Post-double Selection"
date: "`r Sys.Date()`"
link-citations: yes
bibliography: inters.bib
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Estimating Interactions with Post-double Selection}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r loadpkg, echo = FALSE, include = FALSE}
knitr::opts_chunk$set(fig.width = 8, fig.height = 5, fig.align = "center")
require(ggplot2)
library(lmtest)
library(fixest)
library(inters) 
```


In this vignette, we demonstrate how to use the `inters` package to conduct post-double selection for interactions with linear models. We use the remittances data of @EscMesWri2018 to illustrate the method, as shown in @BlaOls21. The goal of this study was to evaluate how remittances affect political protest differently in democracies and non-democracies. 

To begin, we load the data and run two alternative models. The first is a simple single-interaction model that includes the treatment (remittances, `remit`), the moderator (a binary variable for autocracy, `dict`), an interaction between these two, and a series of control variables. We use the `feols` function from the `fixest` package to handle country and period fixed effects along with clustering at the country level. 

```{r single}
data(remit)
single <- feols(Protest ~ remit*dict + l1gdp + l1pop + l1nbr5 + l12gr + l1migr
               + elec3 | period + cowcode, data = remit)
coeftable(single, cluster = ~ caseid)[c("remit", "remit:dict"),]
```

Next, we compare this single-interaction model to a model that fully interacts the moderator with the entire set of controls, including the fixed effects. @BlaOls21 call this the *fully moderated model*.

```{r fully}
fully <- feols(Protest ~ dict * (remit + l1gdp + l1pop + l1nbr5 + l12gr + l1migr +
                                   elec3 + factor(period) + factor(cowcode)),
               data = remit)
coeftable(fully, cluster ~ caseid)[c("remit", "dict:remit"),]
```

Finally, we compare both of these approaches to that of the post-double-selection estimator of @BelCheHan14, which uses the lasso to select variables that are important to the outcome, treatment, or the treatment-moderator interaction, then runs a standard least squares regression on those variables selected by the various lasso steps. The `post_ds_interactions` function implements this procedure and takes character strings with the names of various variables. Furthermore, it can also handle clustered data, which importantly changes the calculation of the penalty parameter in the lasso steps.  

```{r post-double}
controls <- c("l1gdp", "l1pop", "l1nbr5", "l12gr", "l1migr", "elec3")
post_ds_out <- post_ds_interaction(data = remit, treat = "remit", moderator = "dict",
                                outcome = "Protest", control_vars = controls,
                                panel_vars = c("cowcode", "period"),
                                cluster = "caseid")
lmtest::coeftest(post_ds_out, vcov = post_ds_out$clustervcv)[c("remit", "remit_dict"),]
```

With these results in hand, we can compare the different methods to see that the fully moderated and post-double-selection approaches both provide similar point estimates, with the post-double-selection estimator having slightly less uncertainty. The single-interaction model, on the other hand, leads to a dramatically different conclusion. 

```{r calcs, echo = FALSE}
cis <- rbind(coefci(single, parm = c("remit", "remit:dict"),
                    vcov = single$clustervcv),
             coefci(fully, parm = c("remit", "dict:remit"),
                    vcov = fully$clustervcv),
             coefci(post_ds_out, parm = c("remit", "remit_dict"),
                    vcov = post_ds_out$clustervcv))
colnames(cis) <- c("lower", "upper")
coefs <- c(coef(single)[c("remit", "remit:dict")],
           coef(fully)[c("remit", "dict:remit")],
           coef(post_ds_out)[c("remit", "remit_dict")])
coefficients_df <- data.frame(
  qoi  = factor(rep(c("Marginal Effect, Democracy",
                 "Interaction"), times = 3)),
  method =  factor(rep(c("Single Interaction", "Fully Moderated",
                  "Post-double Selection"), each = 2)),
  est = coefs,
  cis = cis
)
```


```{r coefplot, echo = FALSE}
ggplot(coefficients_df, aes(x = qoi, y = est,
                            group = method, colour = method,
                            shape = method)) +
  geom_hline(yintercept = 0, linetype = 2, size = 1, colour = "indianred") +
  geom_point(size = 3, position = position_dodge(width = 0.5)) +
  geom_errorbar(size = 1, aes(ymin = cis.lower, ymax = cis.upper),
                position = position_dodge(width = 0.5), width = 0) +
  scale_colour_grey() +
  xlab("Quantity of Interest") +
  labs(color = "Method", shape = "Method") +
  ylab("Estimate") +
  theme_minimal()
```


## References