An updated analysis of national data remains consistent with the hypothesis that Covid-19 infection rates are higher in larger households.
In a post in May 2020, I presented the results of a regression analysis, using data from 14 Western European countries, that tended to support the following hypothesis:
Rates of Covid-19 infection will be higher, other things being equal, in larger households, that is, households with more occupants.
With Western Europe now well into a second wave of Covid-19 infection, it is timely to assess whether an updated analysis continues to support the hypothesis.
A reminder of some key features of the analysis:
- Rates of death from Covid-19 are used as a proxy for rates of infection, because actual rates of infection are difficult to measure: published statistics on confirmed infections depend heavily on testing arrangements, which have differed over time and between countries.
- Data used are at national level, with no allowance for variations within countries.
- Estimation of the regression is by weighted least squares, with weighting by population.
The regression model is:
DP = C + (B x PH) + E
where: DP is cumulative death rate from Covid-19 per million population; C is the regression constant; B is the slope coefficient; PH is average population per household; and E is the error term.
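As a minimal sketch of the estimation step, assuming only the Python standard library and hypothetical per-country inputs `ph` (average population per household), `dp` (cumulative deaths per million) and `pop` (population, used as the weight), weighted least squares for this one-regressor model reduces to population-weighted means and weighted sums of squares. The function name `wls_fit` is illustrative only; it is not the calculation used in the spreadsheet, which is not reproduced here.

```python
from typing import Sequence, Tuple


def wls_fit(
    ph: Sequence[float], dp: Sequence[float], pop: Sequence[float]
) -> Tuple[float, float]:
    """Fit DP = C + B*PH by weighted least squares, weighting each country by population.

    Returns (C, B). Inputs are hypothetical per-country lists of equal length.
    """
    if not (len(ph) == len(dp) == len(pop)):
        raise ValueError("ph, dp and pop must have the same length")
    w_sum = sum(pop)
    # Population-weighted means of PH and DP
    x_bar = sum(w * x for w, x in zip(pop, ph)) / w_sum
    y_bar = sum(w * y for w, y in zip(pop, dp)) / w_sum
    # Weighted sum of squares of PH and weighted cross-products, about the weighted means
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(pop, ph))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(pop, ph, dp))
    if sxx == 0:
        raise ValueError("PH must vary across countries")
    b = sxy / sxx          # slope coefficient B
    c = y_bar - b * x_bar  # regression constant C
    return c, b
```

Because C and B depend only on ratios of weighted sums, multiplying every population by the same factor (for example, expressing populations in millions) leaves the estimates unchanged; only the relative sizes of the countries matter.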
The estimated regression line based on data to 3 December 2020 was:
DP = -2,004 + (1,209 x PH)
The precise values of the estimated coefficients are not important. Nor is it surprising that the slope coefficient is higher than estimated in May: this is to be expected, since cumulative death rates have increased while average population per household has remained stable. The important point is that, as in May, the estimated slope coefficient is positive, consistent with the hypothesis. It is also large enough for the null hypothesis that its true value is zero or less to be rejected at the 5% significance level (1).
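A minimal sketch of how 95% confidence limits for B can be obtained from a population-weighted fit, assuming the standard weighted least squares formula and hypothetical per-country lists `ph`, `dp` and `pop`: the standard error of the slope is the square root of the weighted residual variance (with n - 2 degrees of freedom) divided by the weighted sum of squared deviations of PH. The critical value `t_crit` is passed in rather than computed, because the Python standard library has no Student's t quantile function; the default of about 2.179 is the two-sided 95% value for 12 degrees of freedom and rests on the assumption that the same 14 countries are used as in May. The function name `wls_slope_ci` is illustrative only.

```python
import math
from typing import Sequence, Tuple


def wls_slope_ci(
    ph: Sequence[float],
    dp: Sequence[float],
    pop: Sequence[float],
    t_crit: float = 2.179,  # assumed: Student's t, 97.5th percentile, 12 degrees of freedom
) -> Tuple[float, float, float]:
    """Return (B, lower, upper) for the population-weighted fit of DP = C + B*PH."""
    n = len(ph)
    if not (n == len(dp) == len(pop)):
        raise ValueError("ph, dp and pop must have the same length")
    if n < 3:
        raise ValueError("need at least three countries to estimate the error variance")
    w_sum = sum(pop)
    # Population-weighted means
    x_bar = sum(w * x for w, x in zip(pop, ph)) / w_sum
    y_bar = sum(w * y for w, y in zip(pop, dp)) / w_sum
    # Weighted sums of squares and cross-products about those means
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(pop, ph))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(pop, ph, dp))
    if sxx == 0:
        raise ValueError("PH must vary across countries")
    b = sxy / sxx
    c = y_bar - b * x_bar
    # Weighted residual variance; n - 2 degrees of freedom because C and B are estimated
    ssr = sum(w * (y - c - b * x) ** 2 for w, x, y in zip(pop, ph, dp))
    s2 = ssr / (n - 2)
    # Standard error of B; rescaling all weights by a constant leaves it unchanged
    se_b = math.sqrt(s2 / sxx)
    return b, b - t_crit * se_b, b + t_crit * se_b
```

If the returned lower limit is positive, both limits are positive, and a one-sided test of the null hypothesis that B is zero or less rejects at the 2.5% level, and therefore also at the 5% level.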
A spreadsheet containing the underlying data and full regression output may be downloaded here:
(1) This can be inferred from the fact that the 95% confidence limits of the estimated slope coefficient are both positive.