The complexity of predicting COVID-19 fatality rates

Mathematical models have been around for a very long time and serve many purposes. Most importantly, they have been used to predict the future. Impressive as that sounds, building an accurate model is rarely easy, because predicted outcomes often differ widely from reality. Many factors are usually involved, and they change over time and depend on many secondary variables.

Today I want to briefly explain how challenging it can be to create a model that predicts pandemic outcomes. This is not meant to discourage anyone; no single model is wrong. Any measure that reduces the fatality rate by even 0.1% is worth trying. The people who die are not just statistics. Each death leaves behind a story that changes lives and sends grief through hundreds of thousands of families.

Getting started

I hope this brief helps you understand the relationship between the modeled numbers and the actual picture we are seeing.

To start off, let us explore some of the factors that determine the fatality rate of COVID-19. As in many other predictive exercises, identifying good predictors early can speed up success. However, picking the right predictors requires a deep understanding of the problem and sound judgment. Ideally, one would run a model with all possible predictors and then remove those that are not significant. When there are many predictors, we need a strategy for selecting the best ones. The complexity of modeling COVID-19 lies in the unavailability of descriptive predictors.

Before we look at the predictors, what is the fatality rate of COVID-19? A simple question, isn't it? To get the fatality rate, divide the number of people who have died from the disease by the number of people infected with the disease. Let us explore some variables that may go into predicting fatality rates for COVID-19. It is fair to say the fatality rate is a function of:

  • The daily growth in the number of people infected
  • How many people could eventually become infected
  • The mode of spread and how quickly we can prevent the spread
  • Number of people the virus can kill
  • Availability of a cure or preventive medicine
  • Availability of funds to procure PPE and conduct mass testing
  • How quickly the authorities can identify, track, and trace possible contacts of an infected person.
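
The fatality-rate definition given before this list amounts to a single division. Here is a minimal Python sketch of it; the function name `case_fatality_rate` and the counts are hypothetical, chosen only for illustration, and are not real COVID-19 figures.

```python
def case_fatality_rate(deaths: int, infected: int) -> float:
    """Return deaths divided by infections, as a fraction between 0 and 1."""
    if infected <= 0:
        raise ValueError("infected must be a positive number")
    if not 0 <= deaths <= infected:
        raise ValueError("deaths must be between 0 and infected")
    return deaths / infected


# Hypothetical counts, for illustration only (not real data).
rate = case_fatality_rate(deaths=20, infected=1000)
print(f"Fatality rate: {rate:.1%}")  # Fatality rate: 2.0%
```

The arithmetic is trivial; the difficulty lies entirely in obtaining trustworthy values for the two inputs.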

This may sound straightforward to many; however, each variable mentioned above could be, or could depend on, several secondary variables. As we usually say, lack of data is data.

The inaccuracy of the dependent variable

The fatality rate is our dependent variable, and it is hard to pin down. We do not have a single confirmed fatality rate for COVID-19. Countries have their own calibrations. The fatality rate varies with age, demography, race, location, etc. As noted above, the number of infected people is a factor in calculating the fatality rate. Unfortunately, it is not known how many people have been infected. The best way to determine this number would be to test everyone, for example all 50M people in a country such as Kenya. This is infeasible given the challenges experienced in testing.
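
To see how much the unknown number of infections matters, here is a small Python sketch that recomputes the fatality rate under different assumed detection fractions (the share of true infections that testing actually confirms). The function name, the detection fractions, and all counts are hypothetical assumptions for illustration, not estimates for any country.

```python
def fatality_rate_range(deaths, confirmed_cases, detection_fractions):
    """Map each assumed detection fraction to the implied fatality rate.

    A detection fraction of 0.5 means we assume testing found only half
    of all true infections, so the true count is confirmed_cases / 0.5.
    """
    rates = {}
    for fraction in detection_fractions:
        if not 0 < fraction <= 1:
            raise ValueError("detection fraction must be in (0, 1]")
        estimated_infections = confirmed_cases / fraction
        rates[fraction] = deaths / estimated_infections
    return rates


# Hypothetical inputs, for illustration only (not real data).
rates = fatality_rate_range(deaths=50, confirmed_cases=1000,
                            detection_fractions=[1.0, 0.5, 0.1])
for fraction, rate in rates.items():
    print(f"detected {fraction:.0%} of infections -> fatality rate {rate:.2%}")
```

With the same number of deaths, assuming that only one in ten infections was detected makes the implied fatality rate ten times lower than assuming every infection was found.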

To get an accurate tally, it might be useful to emulate one of the cruise ships that were quarantined after a COVID-19 outbreak. When COVID-19 was detected among passengers on the cruise ship Diamond Princess, the vessel offered a rare opportunity to understand features of the new coronavirus that are hard to investigate in the wider population. Nearly everyone on board was tested. The close confines help the virus to spread, but closed environments are also an ideal place to study how the new coronavirus behaves. Some of the first studies from the ship, where some 700 people were infected, have revealed how easily the virus spreads, provided estimates of the disease's severity, and allowed researchers to investigate the share of infections with no symptoms. The results of this unusual setup suggest that there are many people walking around with COVID-19 who don't know it and, consequently, that the death rates are lower than other data has suggested. Unfortunately, the world isn't a confined ship.

Lack of accuracy in calculating the fatality rate breaks the yolk. Have you ever boiled an egg with a cracked yolk? Well, if you are hungry, it doesn't matter.

Messiness of data

There have been rumors and blame games over under-reporting or misrepresentation of the impact of COVID-19 in China. This is a typical example of how messy the data situation is globally. When data is extracted from disparate databases, the inevitable result is inconsistency, and nobody trusts the numbers. The lack of a centralized global process, weak data management, and inadequate data strategies for combating COVID-19 have all contributed to inaccurate data. Countries and regions collect data in different ways. There's no single spreadsheet everyone is filling out that can easily allow us to compare cases and deaths around the world (FiveThirtyEight).

The “nature” of things

Forecasting the course of such a pandemic from data is complicated by many inconsistencies. Some of them are structural; in some countries, for instance, testing is stratified. Kenya is testing people in isolation camps and targeting areas with a high potential for community infections, while other countries are testing everyone.

The virus itself discriminates. Africa, whose population is mostly young, has recorded high rates of infection among people between the ages of 10 and 60, whereas in Western countries COVID-19 has mostly killed the elderly.

Many tracked and untracked factors affect fatality rates during a pandemic, including:

  • Hospital capacity - the ability to prevent death once someone is gravely ill
  • Infection rate - this depends on the willingness of the population to wash hands, maintain social distance, and report suspected cases
  • Rate of contact - how many people an infected person interacts with in a given time period
  • Rate of transmission per contact
  • Symptomaticity ratio
  • How long the virus can survive on a surface
  • How far it can be flung through the air
  • Duration of infectiousness
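
Three of the factors listed above (rate of contact, rate of transmission per contact, and duration of infectiousness) are commonly combined in textbook epidemiology into a basic reproduction number, the expected number of people one infected person goes on to infect in a fully susceptible population. The Python sketch below shows that standard product; it is not a model fitted in this article, and the function name and input values are hypothetical.

```python
def basic_reproduction_number(contacts_per_day: float,
                              transmission_prob_per_contact: float,
                              infectious_days: float) -> float:
    """Expected secondary infections caused by one infected person.

    Textbook simplification: contacts per day, times the probability of
    transmission per contact, times the number of infectious days.
    """
    if contacts_per_day < 0 or infectious_days < 0:
        raise ValueError("contacts and duration must be non-negative")
    if not 0 <= transmission_prob_per_contact <= 1:
        raise ValueError("transmission probability must be in [0, 1]")
    return contacts_per_day * transmission_prob_per_contact * infectious_days


# Hypothetical values, for illustration only (not estimates for COVID-19).
r0 = basic_reproduction_number(contacts_per_day=10,
                               transmission_prob_per_contact=0.05,
                               infectious_days=5)
print(f"Basic reproduction number: {r0:.2f}")
```

Because the three inputs multiply, halving any one of them (for example, halving daily contacts through social distancing) halves the result, which is why uncertainty in any single factor carries straight through to the prediction.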

Our team in Nairobi is working hard to create a centralized database on the COVID-19 pandemic, and we hope to make it public in due course.