Blog Post Five: The Air War

October 11, 2022

Model Update: Part One

This week, I will be updating my model to take “the air war,” campaign advertising, into account. I will also move from predicting vote share at the state-level to predicting vote share at the district-level, using the insights I drew from my state-level models of previous weeks.

Many elements of this model are the same as they were in last week’s model. We again only consider those races in which there is both a Democrat and a Republican running, and third party candidates acquire less than 10% of the vote, since other cases obscure the two-party dynamic we are trying to pick up on. To predict the vote share in each district, we use the vote share from the previous election to set a baseline for how liberal or conservative the particular district is. We include the average of generic ballot poll results within two months of the election, to see what national sentiment is like; and include flags for whether the candidate is an incumbent (and what his party is), to account for the incumbency advantage.

We also include the statewide unemployment rate to recognize that many voters are influenced by the state of the economy - though in contrast to last week’s model, the Q8 unemployment rate is a better predictor than the Q7-Q8 percent change in unemployment rate this time around. Recall that Wright, in his paper, finds that both measures of the unemployment rate are predictive of voter outcomes, but that the flat Q8 rate is slightly more predictive. Our return to this conclusion is interesting, since over the course of the last few weeks we’ve seen that depending on which other predictors we use, the measurement of unemployment that is more predictive changes. Perhaps this lack of robustness indicates that the state of the local economy, while relevant, is a lower-order priority in the mind of voters. Or, perhaps this is purely due to random chance, since we’re looking at many different (but related) models and both the absolute Q8 rate and the Q7-Q8 percent change are intended to measure the same underlying effect.

Another departure from last week’s model is that we no longer allow the presence of a Democrat president interact with the coefficients that predict the Democrat candidate’s vote share in the district. We make this choice simply due to limitations presented by the data - we only have data on the air war from the last few elections, in which there is not sufficient variation in the party of the sitting president to justify stratifying our predictors like that. If we had more data, we would certainly bring this element of our model back, since we recall Wright’s conclusions that voters do reward or punish the Democrat congressional candidate differently depending on whether a Democrat president is in office.

Finally, we augment our model as compared to last week by including the effect of campaign advertisements. We have data on each specific ad run in the last few elections, which includes information such as the tone/purpose of the advertisement and the estimated amount spent on the advertisement. This means that we have a few decisions to make regarding how to use this data. Huber and Arceneaux, in an observational study of a “natural” experiment, find that advertisements do not actually educate voters on the issues at hand, and that both political and personal messages have important persuasive effects on voters. This suggests that we should not stratify the advertisements based on their content, and should instead only consider the total spending (which is a proxy for the reach of the ads). On the other hand, Gerber et al., in an actual experiment, find that the persuasive effects of campaign advertisements decay very quickly, instead of leaving a permanent imprint on voters’ beliefs. This suggests that we should only consider those advertisements that aired very close to Election Day, and we settle on a one month window (instead of the one week cutoff that Gerber et al. suggest) so that we have sufficient data for our analysis.

Below is output that summarizes our model’s fit:

## 
## Call:
## lm(formula = DemPct ~ DemPctPrev + DemPolls + UnempQ4 + DemIncumbent + 
##     RepIncumbent + DemSpending + RepSpending, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -16.1753  -3.0401   0.0761   2.7567  27.9571 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      -4.206e+01  1.118e+01  -3.762 0.000196 ***
## DemPctPrev        2.906e-01  3.386e-02   8.581 2.69e-16 ***
## DemPolls          1.384e+02  2.002e+01   6.914 2.10e-11 ***
## UnempQ4           8.832e-01  1.981e-01   4.459 1.10e-05 ***
## DemIncumbentTRUE  4.771e+00  9.307e-01   5.126 4.80e-07 ***
## RepIncumbentTRUE -2.819e+00  7.088e-01  -3.977 8.41e-05 ***
## DemSpending       1.550e-06  2.693e-07   5.758 1.81e-08 ***
## RepSpending      -8.437e-07  3.030e-07  -2.785 0.005635 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.366 on 367 degrees of freedom
##   (13159 observations deleted due to missingness)
## Multiple R-squared:  0.5286, Adjusted R-squared:  0.5196 
## F-statistic: 58.79 on 7 and 367 DF,  p-value: < 2.2e-16

We find that all our coefficients are significant, which is great (and what we would expect based on the theoretical story of our model). The signs of the coefficient also all make sense. The vote share achieved by the Democrat candidate is positively related to the previous election’s vote share, how well Democrats are doing in the generic ballot, the unemployment rate (since the Democrats “own” this issue), if the Democrat candidate is the incumbent, and the amount spent on Democrat ads. The vote share achieved by the Democrat candidate is negatively related to the presence of a Republican incumbent, and the amount spent on Republican ads.

Our R^2 went down from where it was last week, dropping to around 53%, but this is not cause for alarm. In last week’s model, we predicted state-wide vote shares, using a larger number of elections. In this week’s model, we are predicting district-wide vote shares, using a smaller number of elections. And since there are many more congressional districts than states, the variance of our dependent variable has increased quite a bit, so we would expect our R^2 to correspondingly decrease. The standard error of our residuals actually decreased slightly as compared to last week, dropping to 5.4, and this is confirmed by our bootstrapped estimate of the root mean squared error dropping to 5.5. So, our predictions actually got slightly more precise as compared to last week, which is heartening.

Interestingly, the coefficients with the smallest magnitudes are far-and-away the coefficients for campaign spending. This does make sense in the context of Gerber et al., since they show that including the cumulative effect of many weeks’ advertisements (which we are essentially doing by calculating the total spending in the final month of the campaign) lessens the measured effect of the air war. Perhaps in future weeks, I should consider advertisement spending more carefully, using only those ads aired in the last week of the campaign (and inferring this spending based on previous spending when this data is not available).

Here is the usual plot of predicted vote share versus actual vote share, with the points decently close to the 45 degree line. Side-by-side is a histogram of the residuals, which again looks approximately normal and validates our statistical assumptions about the model.

Model Update: Part Two

Our updated model certainly looks useful, but a predictive model is only useful to the extent that we know the values of the predictors “ahead of time.” And here, we run into a problem. It’s great that we have a model that uses the total spending on advertisements in the month leading up to Election Day to predict vote share, but we won’t know the total spending on advertisements until after Election Day! And at that point, the election will have already happened, so we won’t need to use our model to predict it anymore.

So, in order to use our model to predict the 2022 midterms, we need a way to predict how much spending there will be on campaign advertisements this October. We could simply use the 2018 values (since they are the most recent values we have), but we will instead opt to train a simple model that predicts advertisement spending in one year from advertisement spending in the previous year.

## 
## Call:
## lm(formula = Spending ~ SpendingPrev + Incumbent, data = df_ads)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -3764689  -674991  -279154   449235  6784029 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    6.908e+05  9.610e+04   7.188 2.73e-12 ***
## SpendingPrev   5.625e-01  4.010e-02  14.028  < 2e-16 ***
## IncumbentTRUE -2.723e+05  1.198e+05  -2.274   0.0234 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1279000 on 453 degrees of freedom
##   (26612 observations deleted due to missingness)
## Multiple R-squared:  0.3075, Adjusted R-squared:  0.3044 
## F-statistic: 100.6 on 2 and 453 DF,  p-value: < 2.2e-16

Above is a printout of the fit of our simple model. We use data on both Democrat spending and Republican spending, and predict the spending in one election from the spending in the previous election, adding a flag for whether the candidate is an incumbent or not. We see exactly what we might expect - spending is positively correlated with past spending, but incumbents tend to spend less because they have an advantage. Both variables are significant, and the R^2 is just over 30%, which is not super high but is an improvement over just using the previous election’s spending as our prediction for the next election’s spending.

We unfortunately do not have data on 2020 campaign spending, so we will have to use the 2018 spending in this model to predict the 2022 spending. This is not ideal, and adds an additional source of error to our final prediction, but there is not much we can do about it.

2022 Prediction

With all the pieces in place, we can finally make our district-level vote share prediction for 2022. At the very bottom of this post is a table showing our predictions in each district for which we have advertising data, complete with lower and upper bounds forming a 95% prediction interval.

To produce a nationwide vote share prediction, we can make use of the fact that districts are of approximately equal size. This means that in aggregating the district vote shares to estimate a nationwide vote share prediction, we should use weights that are roughly equal among the districts. So, we will take the average of predictions for all of our districts.

If we further assume that errors between districts are uncorrelated, we can also take the average of the lower and upper bounds, to form a 95% prediction interval for our prediction. As per FiveThirtyEight, this might not be the most reasonable assumption to make, so our prediction interval is likely underestimating the true error in our prediction. Nonetheless, it is a useful baseline for us to consider.

Our nationwide prediction, then, is 46.6% of the two-party vote share going to Democrats, with a lower bound of 36% and an upper bound of 57%. It’s important to remember that we formed this prediction based only on those districts for which we have data on advertising, which tend to be the more competitive districts. This is a somewhat representative sample of all voting districts, since it is not biased towards more liberal or more conservative districts, but it is not a perfectly representative sample, since some districts are very noncompetitive. So, in the future, we might think about forming one model (such as this model) for the competitive districts, and a second model for the noncompetitive districts, and combining those models’ predictions to form our final nationwide prediction.

##     district_id LowerBound Predicted UpperBound
## 1          AZ01       41.9      52.6       63.2
## 2          AZ02       42.7      53.4       64.0
## 3          AZ08       30.8      41.4       52.0
## 4          AZ09       44.1      54.8       65.5
## 5          AR02       32.3      43.0       53.6
## 6          CA04       33.4      44.0       54.6
## 7          CA10       45.1      55.8       66.5
## 8          CA16       44.2      54.8       65.5
## 9          CA21       35.7      46.3       57.0
## 10         CA22       34.0      44.6       55.2
## 11         CA24       44.0      54.7       65.3
## 12         CA25       40.3      51.1       61.8
## 13         CA36       44.5      55.1       65.8
## 14         CA39       36.7      47.4       58.0
## 15         CA45       43.7      54.3       64.9
## 16         CA48       43.6      54.7       65.8
## 17         CA50       34.5      45.1       55.7
## 18         CO03       33.1      43.7       54.3
## 19         CO06       46.5      57.2       67.9
## 20         FL06       30.8      41.4       52.1
## 21         FL07       42.5      53.2       63.9
## 22         FL15       32.5      43.1       53.7
## 23         FL16       32.3      42.9       53.5
## 24         FL18       31.6      42.3       52.9
## 25         FL19       30.0      40.7       51.3
## 26         FL26       36.5      47.2       57.8
## 27         FL27       33.9      44.5       55.2
## 28         GA06       43.3      53.9       64.6
## 29         GA12       30.8      41.4       52.1
## 30         HI01       47.6      58.3       69.1
## 31         IL06       45.3      56.0       66.6
## 32         IL12       32.2      42.8       53.4
## 33         IL13       34.7      45.3       55.9
## 34         IL14       44.7      55.3       66.0
## 35         IN02       29.6      40.2       50.9
## 36         IN03       27.5      38.1       48.8
## 37         IN05       31.9      42.6       53.2
## 38         IN09       28.8      39.5       50.2
## 39         IA01       33.5      44.1       54.8
## 40         IA03       41.6      52.3       63.0
## 41         IA04       30.0      40.6       51.2
## 42         KS02       31.9      42.5       53.2
## 43         KS03       42.2      52.9       63.7
## 44         KS04       28.6      39.3       49.9
## 45         KY03       44.6      55.3       66.0
## 46         KY06       31.9      42.5       53.1
## 47         ME01       44.1      54.8       65.5
## 48         ME02       42.7      53.4       64.1
## 49         MI02       31.5      42.1       52.7
## 50         MI06       32.4      43.0       53.6
## 51         MI07       32.8      43.4       54.0
## 52         MI08       45.4      56.0       66.7
## 53         MI11       44.3      55.0       65.6
## 54         MN01       32.8      43.5       54.2
## 55         MN02       41.7      52.4       63.1
## 56         MN03       42.1      52.9       63.6
## 57         MN08       28.8      39.5       50.2
## 58         MO02       32.5      43.2       53.8
## 59         MT00       31.4      42.0       52.7
## 60         NE02       31.6      42.3       52.9
## 61         NV03       44.9      55.6       66.2
## 62         NV04       46.1      56.7       67.4
## 63         NH01       40.5      51.3       62.0
## 64         NJ03       44.2      54.8       65.5
## 65         NJ07       42.1      52.7       63.4
## 66         NJ11       45.5      56.2       66.9
## 67         NM01       44.7      55.4       66.0
## 68         NM02       36.4      47.0       57.6
## 69         NY19       45.2      55.8       66.5
## 70         NY21       32.1      42.7       53.3
## 71         NY22       36.1      46.7       57.3
## 72         NY23       32.3      42.8       53.4
## 73         NY24       33.7      44.3       54.9
## 74         NY25       44.6      55.3       65.9
## 75         NY27       31.6      42.2       52.8
## 76         NC02       45.5      56.2       66.9
## 77         NC05       28.3      38.9       49.5
## 78         NC07       30.7      41.3       52.0
## 79         NC09       34.7      45.4       56.0
## 80         NC13       29.6      40.2       50.8
## 81         ND00       26.7      37.3       48.0
## 82         OH01       33.5      44.1       54.7
## 83         OH04       28.4      39.0       49.6
## 84         OH10       31.8      42.4       53.0
## 85         OH12       32.6      43.2       53.8
## 86         OH14       31.3      41.9       52.5
## 87         OH15       30.1      40.7       51.3
## 88         OH16       30.2      40.8       51.4
## 89         OK01       28.4      39.1       49.7
## 90         OK05       32.7      43.4       54.0
## 91         OR02       30.3      40.9       51.5
## 92         PA01       34.3      44.9       55.5
## 93         PA08       42.5      53.1       63.7
## 94         PA09       30.1      40.7       51.3
## 95         PA10       34.4      45.0       55.6
## 96         PA11       31.0      41.6       52.2
## 97         PA14       30.4      41.0       51.6
## 98         PA16       32.6      43.2       53.8
## 99         PA17       43.0      53.6       64.2
## 100        SC01       33.8      44.4       55.0
## 101        TN02       28.2      38.8       49.4
## 102        TN03       28.1      38.7       49.4
## 103        TX02       32.8      43.4       54.0
## 104        TX07       44.8      55.5       66.2
## 105        TX17       32.1      42.7       53.3
## 106        TX21       33.5      44.1       54.7
## 107        TX23       35.2      45.8       56.4
## 108        TX25       32.3      42.9       53.5
## 109        TX31       34.2      44.8       55.4
## 110        TX32       46.6      57.3       68.0
## 111        UT04       33.1      43.8       54.4
## 112        VA02       42.7      53.4       64.1
## 113        VA05       33.1      43.8       54.4
## 114        VA06       29.1      39.7       50.3
## 115        VA07       42.1      52.7       63.4
## 116        VA10       44.9      55.6       66.3
## 117        WA03       33.4      44.0       54.6
## 118        WA05       31.0      41.6       52.2
## 119        WA08       44.5      55.2       65.8
## 120        WV02       29.9      40.5       51.2
## 121        WV03       28.5      39.2       49.8
## 122        WI01       30.5      41.1       51.7
## 123        WI06       30.9      41.6       52.2
## 124        WI08       29.0      39.6       50.2