Here’s a published study from the CDC using serologic test results to estimate how many people had been infected by the coronavirus. Tests were conducted on convenience samples at outpatient doctor visits during one- to two-week intervals for each of ten locations between late March and early May.
Four of the ten samples were statewide: Louisiana, Missouri, Utah, and Connecticut. How do the CDC estimates compare with my lagging-indicator method of estimating prevalence? I took each state's death total as of one week after the end of the CDC's data collection period, divided it by the estimated mortality rate of 0.006, then divided that number by the state's population. Here are the comparisons of covid prevalence:
- Louisiana: CDC estimate = 5.8% (range 3.9-8.2); my estimate = 3.9%
- Missouri: CDC estimate = 2.7% (range 1.7-3.9); my estimate = 0.9%
- Connecticut: CDC estimate = 4.9% (range 3.6-6.5); my estimate = 14.0%
- Utah: CDC estimate = 2.2% (range 1.2-3.4); my estimate = 0.4%
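The lagging-indicator arithmetic above can be sketched in a few lines. The death count and population below are illustrative placeholders chosen to land near the Louisiana figure, not the exact numbers used in my calculation:

```python
def estimate_prevalence(deaths: int, population: int, ifr: float = 0.006) -> float:
    """Back out implied infections from cumulative deaths, then
    express them as a share of the state's population."""
    implied_infections = deaths / ifr
    return implied_infections / population

# Illustrative inputs: ~1,088 deaths in a state of ~4.65 million people
print(f"{estimate_prevalence(1088, 4_650_000):.1%}")  # prints "3.9%"
```

The whole method rides on the 0.006 mortality-rate assumption: halve that rate and every prevalence estimate doubles.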
Not very close. On the other hand, both the CDC and I estimate that the prevalence in the population is far higher than the diagnosed case counts suggest, but still a long way from herd immunity. So we’re in the same fairly large ballpark.
The CDC report includes results for a sample collected in New York City between March 23 and April 1. The estimated infection prevalence was 6.9% (range 5.0-8.9). In a separate study, New York conducted a random sample in NYC from April 19-28 — 4 weeks after the CDC convenience sample — and estimated an infection prevalence of 22.7%. That’s more than three times the CDC estimate. Could the 4-week difference in data collection timing account for such a wide difference? Yes.
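A back-of-the-envelope check supports that answer. Assuming simple exponential growth between the two surveys (an assumption for illustration, not something either survey measured), going from 6.9% to 22.7% in 4 weeks implies a doubling time of roughly 2.3 weeks, which is not implausibly fast for the New York epidemic in that period:

```python
import math

# Prevalence from the two NYC surveys, collected about 4 weeks apart
p_early, p_late = 0.069, 0.227
weeks_apart = 4

doublings = math.log2(p_late / p_early)   # how many times prevalence doubled
doubling_time = weeks_apart / doublings   # implied weeks per doubling

print(f"{doublings:.2f} doublings, doubling time {doubling_time:.1f} weeks")
# prints "1.72 doublings, doubling time 2.3 weeks"
```

Antibodies also take a week or two to become detectable after infection, so both surveys lag actual infections; the growth-rate comparison is unaffected as long as the lag is similar for both.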
My algorithm, derived from death rates, was based largely on the New York serology survey, so my prevalence estimate is pretty close to the New York numbers. Why then do my numbers deviate so far for the other states? Were there wide differences in convenience sampling methods by state? That’s happened before in other local serology studies. There’s a lot of variability in the accuracy of covid antibody tests: was the serology test used in the CDC study prone to false positives, thus overestimating prevalence? Probably not: per the CDC, their ELISA serology test has a specificity of 99%, which is pretty darned good.
My estimate for Connecticut is quite a bit higher than the CDC’s, while my estimates for Missouri and Utah are lower: why? Mortality is highly age-dependent: did Connecticut have a higher age-adjusted mortality rate than Missouri or Utah? Utah’s median age is 10 years younger than Connecticut’s, which could account for a wide discrepancy between those two states. But the median age for Missouri is only 2 years younger than Connecticut’s: that would close some but not all of the gap. Connecticut was hit harder and earlier than either Missouri or Utah. Did lack of clinical experience in treating the disease effectively, coupled with undercapacity for handling a huge surge of patients, result in a higher death rate in Connecticut? Probably.
Again, what’s needed is a random sample of immunity across the national population, repeated often, with results reported a week or so after data collection. This CDC report of local convenience samples just got released, reporting results that are three months old. Not very helpful in knowing the current state of the epidemic nationwide, or in evaluating current trends and interventions.