Research Questions

How do global demographics influence the spread and death rate of COVID-19?
How does the expected death rate compare to the actual death rate in individual countries?

Background

In December 2019, an outbreak of pneumonia of initially unknown cause was detected in Wuhan (Hubei, China). The cause was soon identified as a novel coronavirus, later named SARS-CoV-2, and the disease it causes was named COVID-19 (Dong & Hongru, 2020). From Wuhan, the virus spread around the world, causing an international health emergency. It was not the first coronavirus to cause a major outbreak. In 2002 and 2003, a coronavirus (CoV) causing atypical pneumonia first came into the spotlight; the disease was named severe acute respiratory syndrome (SARS). First detected in Guangdong Province, this virus spread to Hong Kong and from there, through international travel, to 26 countries, infecting more than 8,000 people with a fatality rate of about 10%. Another well-known coronavirus, Middle East respiratory syndrome coronavirus (MERS-CoV), has an even higher fatality rate of almost 35% (Sun, et al., 2020). The mortality rate of COVID-19 differs drastically between age groups and varies across places and over time. Average mortality rates for different age groups are listed in Table 1:

Table 1: Average mortality rate by age group (Roser, Ritchie, Ortiz-Ospina & Hasell, 2020)
Age group (years)    Mortality rate [%]
0-14                 0
15-24                0.02
25-54                0.37
55-64                1.75
65+                  8

Goal

In our research project, we investigate how the demographics of individual countries influence the spread and death rate of COVID-19. We also want to find out how the expected and actual death rates of individual countries are related. To answer these questions, we use the age structure and gender distribution of individual countries together with data on COVID-19 cases. The main part of our work is a global map that shows demographics and calculated indices in the background as a choropleth map, overlaid with circles indicating confirmed cases, recoveries and deaths caused by COVID-19. Below the map, additional graphics support it: age pyramids of the twenty countries most affected by the virus, as well as line and bar charts showing how confirmed cases, recoveries and deaths developed in those countries, along with other metrics.

Methods

Mortality Index

To find out how the expected and actual mortality rates are related, we calculated an index comparing them. To obtain the expected number of deaths, we assume that a country's confirmed COVID-19 cases are spread evenly across its population, so that each age group receives a share of the cases proportional to its share of the population. We then apply the mortality rates from Table 1 to estimate how many people in each age group would be expected to die given this number of infections. The “actual” deaths per age group are obtained by distributing the reported deaths in the same way (again assuming an even spread across the population). The difference between these two values, normalized by the population, gives the number of “unexpected” or “unexplained” deaths, i.e. deaths that are not accounted for by the confirmed cases. A high index can therefore indicate underreporting of cases, or poor handling of the situation leading to more deaths than the averages would predict.
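The index can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code behind the map: the helper name `mortality_index`, the example country and the per-inhabitant scaling are hypothetical, age shares are assumed to be population fractions for the five Table 1 groups, and both cases and deaths are assumed to be distributed in proportion to those shares.

```python
# Average mortality rates in percent per age group (Table 1).
MORTALITY_RATE_PCT = {
    "0-14": 0.0,
    "15-24": 0.02,
    "25-54": 0.37,
    "55-64": 1.75,
    "65+": 8.0,
}


def mortality_index(confirmed, deaths, population, age_shares):
    """Sketch of the 'unexplained deaths' index (hypothetical helper).

    age_shares maps each Table 1 age group to its fraction of the
    population (fractions should sum to 1). Cases and deaths are both
    assumed to be spread over the age groups in proportion to these shares.
    Returns the per-group index and the total, both expressed as
    unexplained deaths per inhabitant.
    """
    per_group = {}
    for group, share in age_shares.items():
        # Deaths we would expect from the confirmed cases in this group.
        expected = confirmed * share * MORTALITY_RATE_PCT[group] / 100
        # Reported deaths attributed to this group by population share.
        actual = deaths * share
        per_group[group] = (actual - expected) / population
    return per_group, sum(per_group.values())


# Hypothetical country: 1,000 confirmed cases, 100 deaths, 1 million people.
shares = {"0-14": 0.2, "15-24": 0.1, "25-54": 0.4, "55-64": 0.1, "65+": 0.2}
per_group, total = mortality_index(1_000, 100, 1_000_000, shares)
print(f"unexplained deaths per million: {total * 1_000_000:.2f}")
# prints: unexplained deaths per million: 80.75
```

In this made-up example the age structure predicts about 19 deaths from 1,000 cases, so 100 reported deaths leave roughly 81 unexplained deaths per million inhabitants. Returning the per-group values as well makes it possible to inspect the index separately for each age group.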

Cases per Million

To compare COVID-19 cases between countries, we need to normalize them by population. To do so, we divided each country's number of confirmed cases by its population, which gives the number of cases per inhabitant. Because this number is very small, we multiplied it by one million to obtain the number of confirmed cases per one million inhabitants.
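The normalization is a single division followed by a multiplication; the Python sketch below shows it. The function name `cases_per_million` is hypothetical, and the example figures (80,000 cases and a population of about 1.4 billion, roughly China's) are rounded assumptions used only for illustration.

```python
def cases_per_million(confirmed, population):
    """Confirmed cases per one million inhabitants (hypothetical helper)."""
    if population <= 0:
        raise ValueError("population must be positive")
    cases_per_inhabitant = confirmed / population  # a very small number
    return cases_per_inhabitant * 1_000_000


# About 80,000 cases in a population of roughly 1.4 billion
# (an assumed round figure for China's population).
print(round(cases_per_million(80_000, 1_400_000_000), 1))  # 57.1
```

The example shows why a country with many absolute cases can still end up in a low category once its large population is taken into account.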

Age Pyramids

Age pyramids are used to visualize the age and gender distribution of a country's population. Several typical shapes are visualized in Figure X. According to Kuls and Kemper (2000), there are five basic shapes, plus one special case:

Isosceles triangle: Long periods of high birth and death rates

Pyramid with wide base and curved sides: Child mortality rate begins to fall, birth rate remains high

Beehive-Shape: Countries with low birth and death rates

Bell-Shape: After a long period of relatively low birth and death rates, fertility is increasing again

Urn-Shape: Birth rate is steadily decreasing

Special case:

Asymmetric: Due to labor migration, males and females are unequally distributed
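For the asymmetric special case, one simple way to quantify the imbalance is the sex ratio (males per 100 females) in each age group of a pyramid. The Python sketch below flags age groups whose ratio departs strongly from 100. The function names, the example counts and the tolerance of 20 are hypothetical and are not part of the classification by Kuls and Kemper (2000).

```python
def sex_ratios(males, females):
    """Males per 100 females for each age group (hypothetical helper).

    Both arguments map age-group labels to population counts; groups with
    no females are skipped to avoid division by zero.
    """
    return {
        group: 100 * males[group] / females[group]
        for group in males
        if females.get(group, 0) > 0
    }


def asymmetric_groups(males, females, tolerance=20):
    """Age groups whose sex ratio differs from 100 by more than `tolerance`.

    The default tolerance (outside 80-120 males per 100 females) is an
    arbitrary illustrative threshold, not a value from Kuls and Kemper (2000).
    """
    return [
        group
        for group, ratio in sex_ratios(males, females).items()
        if abs(ratio - 100) > tolerance
    ]


# Hypothetical counts for a country with male labor immigration
# concentrated in the 25-34 age range.
males = {"20-24": 50_000, "25-29": 90_000, "30-34": 85_000, "35-39": 48_000}
females = {"20-24": 48_000, "25-29": 45_000, "30-34": 50_000, "35-39": 47_000}
print(asymmetric_groups(males, females))  # ['25-29', '30-34']
```

A pyramid in which several working-age groups are flagged would correspond to the asymmetric case; for real data, the threshold would need to be chosen with care.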

DATA VISUALIZATION

FINDINGS

Spread

Our data set of COVID-19 cases starts on 22 January 2020. As the map shows, from this day on the virus infected more and more people in China, especially in the city of Wuhan. At this point the spread was very local, and only Thailand, South Korea and Taiwan had some confirmed cases. From then on, cases rose exponentially in China, while in the surrounding countries they rose only slowly.
On 31 January, the first two cases were reported in Italy, which later became the first major center of the outbreak in Europe. Until the end of February, confirmed cases increased only slowly, but because growth was exponential, each round of transmission infected more people than the one before. Interestingly, during this period Italy and the countries surrounding China show a similar increase in confirmed cases. We can also see that the numbers of deaths and recoveries lag about two to three weeks behind the confirmed cases because of the course of the infection. In early March, China still had by far the most cases, followed by Italy, Iran and South Korea; all other countries had reported no or only a few cases by then. It is interesting that during the same period Iran reported far more recoveries than Italy, where people only started to recover at the end of March. From that point on, infections in Italy rose sharply, and COVID-19 spread quickly across Europe and eventually spread widely in the United States. By the time the large wave of infections began in Europe and the US, China was reporting almost no new confirmed cases, and most of its patients were already recovering. The countries surrounding China, meanwhile, only ever showed small increases in confirmed cases. Comparing the spread of the virus in Europe and the US with that in most Asian countries reveals major discrepancies: while the spread in Europe and the US was fast and deadly, the spread in the countries surrounding China was almost negligible. One possible explanation is that Europe and the US have good health systems, where more tests are available and cases are reported to the public more reliably.

Million Cases

Until 13 March, the number of confirmed cases per million inhabitants stays between 0 and 319. Because Iceland has a very small population and infections appeared there early, it has the highest number of cases per million in March, followed by Italy. Many Western countries reach a very high number of cases per million because they combine a high number of infections with a comparatively small population. Iran is the first country outside Europe to follow this trend, whereas China, interestingly, still has a very small number. The number then rises in North America and later in South America, while it remains very small across Asia and Africa, partly because of the large number of people living there. At the end of the observation period, Europe and North America have very high numbers of cases per million, while elsewhere in the world the number mostly stays below 319 cases per million inhabitants; even China, with over 80,000 cases, remains in the lowest category.

Mortality Index

The choropleth map of the mortality index, together with the plot of the 20 countries with the highest mortality index, shows that the index is especially high in Spain, Belgium, Italy, the United Kingdom and France. The index seems to be generally high in European countries and in North America, while Iceland, Russia, Australia and most countries in Africa and Asia have a low mortality index. The choropleth map of the median age shows that Spain, Portugal, Italy, Germany, Greece and Japan have the highest median age, at over 44 years. In general, the median age is high in Europe and Canada, with values between 40 and 44 years. In Russia, China, Australia, the United States and Iceland, the median age is between 35 and 40 years. The lowest median ages, 30 years or lower, are found in Africa, and life expectancy appears to follow a similar distribution.
Notably, at least some of the countries with a high mortality index also have a comparatively high median age and life expectancy. This is the case for Belgium, Spain and Italy. All three have an urn-shaped age pyramid, indicating that birth rates are decreasing while mortality remains low until old age. This results in a large number of people over 65, for whom the virus is fatal far more often than for younger people. This could be a leading factor behind the mortality index, since the index stays very low in the younger part of the population and increases with age.
Conversely, many countries, e.g. in Africa, have a low mortality index, a low life expectancy and a low median age. Because COVID-19 is especially deadly for elderly people, we think that age structure could influence death rates, which might therefore be higher in countries with an older population. Still, it is important to keep in mind that the ratio between deaths and confirmed cases depends on the number of tests conducted. For example, the mortality index in Iceland is very low, while the number of cases per million is comparatively high due to extensive testing (Hermann, 2020). Although Iceland's median age of 37 is in the middle range and its life expectancy of 83 years suggests a generally higher vulnerability to the virus, there are hardly any unexplained deaths, since almost every death is accounted for by the number of identified cases. Iceland thus supports the hypothesis that the index can help identify countries that underreport cases, based on their death statistics (the higher the index, the more cases are likely underreported). Underreporting does not necessarily mean that data were actively manipulated; more often it reflects limited infrastructure and resources.


Discussion

Returning to our research question of whether the demographics of individual countries influence the spread of COVID-19 and the deaths it causes, we can say that the age distribution could influence the number of deaths. This is based on the fact that the mortality rate varies between age groups and increases with age. It is therefore to be expected that countries with a high life expectancy and more elderly people have a higher death rate than countries with a low median age. To make clear statements about whether the demographics of individual countries influence the spread and death rate of COVID-19, we would need specific information about the infected persons. Our analysis relies on demographic data for whole countries and on the assumption that infections are spread evenly across the population, so that each age group is affected in proportion to its size. The data are further biased because the number of tests conducted directly influences the number of confirmed cases. In countries such as Iceland, where many tests are conducted, the ratio of deaths to confirmed cases is therefore low.
Regarding the research question about the index, we conclude that countries such as Italy or Spain, which struggled heavily during the pandemic, have many more unexplained deaths (a higher index) than countries such as Germany or Austria. This could suggest that Italy and similar countries (1) underreported cases, leading to a higher than average ratio of deaths to confirmed cases, or (2) handled the pandemic less effectively, leading to more deaths than in countries that were not as overwhelmed. A third interesting case is a country such as Sweden, which pursued a so-called herd immunity approach. Here, too, we observe a higher index than in countries that took more precautions. Iceland, which tested a large part of its population, displays a very low index, which supports the link between underreported cases and a high index from the opposite direction. Looking at the index for the different age groups, it becomes clear that many of these unaccounted-for deaths occur in the older half of the population.

REFERENCES AND DATA SOURCES

Description of Data used for Visualization

COVID-19 Data: To display the confirmed cases of COVID-19 as well as the numbers of recoveries and deaths, we used data from the Johns Hopkins School of Public Health covering the period from 22 January to 6 May 2020 (Johns Hopkins School of Public Health, 2020).
Median Age: To create a choropleth map of the median age of individual countries, we used data from Our World in Data for the year 2020. These are estimates based on historical data up to 2015, published by the UN Population Division (UN Population Division, 2017).
Population Distribution by Age: To calculate an index relating COVID-19 infections to the age structure of individual countries, we used data from The World Factbook, which provides the same five age groups as Table 1 (The World Factbook, 2020).
Population Distribution by Age and by Gender: To create the population pyramids, we used a data set containing population numbers split into age groups and by gender. We took the data from PopulationPyramid.net, which provides separate data for individual countries (Population Pyramid, 2019).


References

Dong, E., & Hongru, D. (2020). An interactive web-based dashboard to track COVID-19 in real time. Baltimore: Department of Civil and Systems Engineering, Johns Hopkins University.
Sun, J., He, W.-T., Wang, L., Lai, A., Ji, X., Zhai, X., . . . Su, S. (2020). COVID-19: Epidemiology, Evolution, and Cross-Disciplinary Perspectives. Trends in Molecular Medicine, 26(5), 483-495.
Kuls, W., & Kemper, F.-J. (2000). Bevölkerungsgeographie. Eine Einführung. Stuttgart.
Hermann, R. (2020, March 23). Island setzt auf grossflächiges Testen [Iceland relies on large-scale testing]. From Neue Zürcher Zeitung: https://www.nzz.ch/international/island-setzt-auf-grossflaechiges-testen-ld.1547623

Data Sources

Johns Hopkins School of Public Health. (2020, May 7). Novel Coronavirus (COVID-19) Cases Data. From OCHA Services: https://data.humdata.org/dataset/novel-coronavirus-2019-ncov-cases
Population Pyramid. (2019). Population Pyramids of the World from 1950-2100. From PopulationPyramid.net: https://www.populationpyramid.net/africa/2019/
The World Factbook. (2020). The World Factbook: Age Structure. From Central Intelligence Agency: https://www.cia.gov/library/publications/the-world-factbook/fields/341.html
UN Population Division. (2017). Age Structure. From Our World in Data: https://ourworldindata.org/age-structure
Roser, M., Ritchie, H., Ortiz-Ospina, E., & Hasell, J. (2020, May 19). Mortality Risk of COVID-19. From Our World in Data: https://ourworldindata.org/mortality-risk-covid

Images and Graphics

Mail Icons made by Freepik and Vectors Market from www.flaticon.com.
All other graphics and images from www.pixabay.com (no attribution required under Pixabay License).
Contact images provided by authors.
Website Layout inspired by www.w3schools.com.