Data-Driven Dilemmas posed by COVID-19

This commentary was submitted on April 21, 2020 as a proposed Op-Ed for the NYT, by Prof. Alyssa A. Goodman, Harvard University. It represents the personal views of the author, not an official position of Harvard University.

I teach “Prediction.” At Harvard. But I cannot predict the outcome of the current pandemic. I am equipped, as a scientist, to understand, evaluate, and potentially act upon the infection and death statistics we all now read every day. But as a person, I can also act out of fear. The constant dialogue in my mind between my rational self and my emotional self helps me appreciate the dilemma facing our leaders now, as they try, quite literally, to save the world.

I am trained as a physicist and astronomer. I specialize in data science and data visualization, and I teach some epidemiology in my Prediction class. While this background does not qualify me to expertly advise leaders on COVID-19 strategy, it does put me at one or two degrees of separation from many experts quoted in the press every day. And it’s very clear to me from this privileged vantage point that even true experts’ predictions do not agree. Traditional mathematical models of epidemics use the now-infamous “R_0” reproduction number, lethality rates, understanding of infection mechanisms, analysis of co-morbidities, and other medical measures to estimate outcomes. Bold data-science approaches eschew understanding of infectious disease and base predictions purely on “training data” that amounts to information about what has actually happened in countries farther along in their epidemic curves than others.

Both groups (epidemiologists using infectious disease expertise to model a pandemic’s course, and data scientists making predictions using algorithms trained only on real-world actions and outcomes) suffer at this point from a severe lack of reliable data to input to their forecasts. In the understand-to-predict approach to disease spread, uncertainty is reduced as more is known about mechanisms of infection and recovery, about true numbers of people susceptible and immune to the disease, and about the properties of the virus and of the people upon whom it has a range of effects. In the least medically oriented of the data-science approaches, what’s needed is a wide variety of circumstances (e.g. ranges of policies on social distancing, travel restrictions, population density, population demographics), measured over long enough time spans, to let algorithms base forecasts on what happened elsewhere in the past. We simply do not have enough data at this point for either of these approaches to work with high precision, but either is good enough to forecast extremes.

Physicists are taught to always consider limiting cases. At one limit, if no country had done anything to slow the epidemic’s spread when it began, humanity would have acquired “herd immunity” quite quickly, in a matter of months. The disease would have caused tens of millions of deaths worldwide, potentially as much as 1% of the population. At the other limit, we could prevent all deaths in the short run with total lockdowns, but the social fabric of humanity would tear. While either is unrealistic now, it’s important to look at both rational (data-driven) and irrational (emotional) responses to these extreme options.

Humans are well known, in the words of behavioral economists (who admit irrationality into their predictions), to “discount the future” relative to the present. So, faced with a choice of some people dying now (herd immunity) versus no one dying now (full lockdown), we’ll always, emotionally, choose the lockdown.

But, if we could look rationally—without our “irrational” human biases—at statistical likelihoods in the longer-term future, the most life-saving choice might be different. Studies show that unemployment raises mortality rates, ten years later, by 85% for men and 50% for women. More than 20 million Americans have filed for unemployment since the start of the current pandemic. The odds of dying—from anything—are 40% higher if you don’t have insurance. Before the pandemic, about 30 million Americans were already uninsured. Domestic abuse kills 30,000 women a year in the US: the current lockdowns could easily double that. Suicides in the US, per year, number about 40,000, without the stress of a pandemic. Depriving students of good education for significant periods of time will ultimately reduce their earning power, and thus their access to good health care, further increasing mortality rates—without even considering the collateral health detriment of depriving roughly 50% of American children of a free school lunch for months on end.

As of April 20, a total of 40,000 people had died in the US from COVID-19. On average, about 8,000 Americans die each day. So, without COVID-19, roughly 400,000 Americans would have died in the 50 days since March 1, 2020. Thus, COVID-19 has raised the US death rate this spring by about 10%, which is surely terrible. But if one estimates the long-term deaths caused by the sum of unemployment, domestic abuse, depression, and lack of education associated with a very long stay-at-home policy, the numbers grow more staggering by the day.
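
For readers who want to check the arithmetic in the paragraph above, here is a minimal sketch in Python. It uses only the approximate figures quoted in the text; the variable names are my own, chosen for illustration, and nothing here is a forecast.

```python
from datetime import date

# Approximate figures quoted in the text above.
covid_deaths = 40_000            # US COVID-19 deaths as of April 20, 2020
baseline_deaths_per_day = 8_000  # typical US deaths per day, all causes

# Number of days from March 1 to April 20, 2020.
days = (date(2020, 4, 20) - date(2020, 3, 1)).days

# Deaths expected over that period without COVID-19.
baseline_deaths = baseline_deaths_per_day * days

# COVID-19 deaths as a fraction of the expected baseline.
increase = covid_deaths / baseline_deaths

print(f"Days since March 1: {days}")
print(f"Expected deaths without COVID-19: {baseline_deaths:,}")
print(f"Increase in deaths due to COVID-19: {increase:.0%}")
```

With these inputs the script reports 50 days, 400,000 expected deaths, and an increase of 10%. Changing any of the three inputs shows how sensitive the headline percentage is to the rough numbers behind it.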

But humans, even data scientists, are not cold, calculating, rational creatures, and our choices are not so simple as the limits offered here. As many recent articles and studies have explained, we should not just haphazardly “end the shutdowns” now, for both rational and irrational reasons. On the rational side, we need to give our medical researchers and hospitals the time they need to prepare, so that they can offer patients the best treatment once we decide to return to some sense of normal everyday life. On the irrational side, we have accidentally created a world of fearful germaphobes, and we need to reset expectations.

I attended a colleague’s talk on medical anthropology at Harvard 20 years ago, where I learned that to be well-adjusted psychologically, we need an “appropriate level of denial” when it comes to the dangers of everyday life. Before the pandemic, I traveled far and wide, and I would visit my 85-year-old parents without worrying about giving them any contagion I was surely carrying. Now, though, even as I type the calculations I offer here, I wonder how I will visit my parents in the coming weeks and months and NOT spend all of my time worrying about infecting them, asymptomatic as I may be. We need to learn, again, to balance risk and reward.