Institute for Health Metrics and Evaluation

About the data

The Global Burden of Disease (GBD) study, conducted by the Institute for Health Metrics and Evaluation (IHME), evaluates mortality and disability caused by 371 diseases and injuries, along with 88 risk factors, across the globe. IHME collects and analyzes the data, offering insights into trends over time, the effectiveness of interventions, and disparities in the impact of malaria among different populations. For malaria, IHME provides estimates of prevalence, incidence, mortality, intervention coverage, and overall disease burden in 204 countries and territories, grouped into 21 GBD regions, with subnational estimates for selected countries. This comprehensive dataset is a valuable resource for policymakers, researchers, and public health professionals developing strategies to reduce the global malaria burden and monitoring progress toward elimination goals.

Accessing the data

The GBD Results Tool lets you download estimates from the study. Related materials, such as survey instruments, protocols, and summary information, are available on the IHME data pages of the Global Health Data Exchange (GHDx). You can search for datasets directly or use the GBD Results Tool for more detailed queries. IHME also offers API access for working with large datasets programmatically.

On the GHDx homepage, search terms such as "IHME malaria deaths estimates", "malaria prevalence", "malaria incidence", or "bednet usage" can be used to find malaria datasets compiled by IHME.

Alternatively, navigate to the Data Tools tab on the homepage. You will need an account to access and download data. If you do not have one, click sign in at the top of the Results tab to create one. If you already have an account, click sign in and enter your credentials, then select GBD Results Tool under IHME Data Visualization Tools. In the Results Tool interface, choose the disease and metrics, and apply filters to narrow the search by location (country or region), year, age group, and sex.

IHME GBD Results Visualization Tool Page.

Data can also be visualized as graphs or tables before downloading. Files on the GHDx can be downloaded by right-clicking their titles, and datasets are available as .csv and .xls files.

What does the data look like?

The insecticide-treated net (ITN) data can be downloaded as an .xls file from the IHME ITN data page on the GHDx. This dataset provides estimates of ITN ownership, ownership among populations at risk, ITN use in children under five (overall and in at-risk populations), and the scale-up of long-lasting insecticidal net (LLIN) distribution in 44 African countries from 1999 to 2008.

IHME estimates of insecticide treated bednets distributed.

Datasets downloaded as .csv files from the GBD Results Tool share a standard format: each row gives one measure (deaths, prevalence, or incidence) as a number, percent, or rate for a given location, sex, age group, cause, and year, with the estimate in the val column and its uncertainty bounds in the upper and lower columns.

IHME estimates of malaria related deaths, prevalence and incidence.

Key points to consider

IHME provides detailed health metrics, including estimates of malaria incidence, mortality, and risk factors, which can inform transmission models. When integrating IHME estimates into malaria transmission models, consider the following aspects of how the data are produced and used to ensure accurate and effective modelling.

IHME uses statistical models and expert review to produce comprehensive Global Burden of Disease (GBD) estimates of malaria burden by country, age, and sex.
The input data behind the estimates are collected at different frequencies depending on the source (e.g., monthly surveillance data or annual surveys), and the published estimates are updated periodically.
These data might not always have the same resolution as the model. For instance, while a model may require fine-grained data at the village level or daily intervals, IHME may provide data aggregated at the national level or annually.
IHME estimates often come with uncertainty intervals due to limitations in the data sources. This uncertainty should be accounted for in model simulations, especially for stochastic models or sensitivity analyses.
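The resolution and uncertainty points above often meet in a single check: summing model output up to the GBD time step and asking whether it falls within the reported interval. The sketch below is a minimal base-R illustration with made-up numbers; the daily series stands in for hypothetical model output, and the annual estimates and bounds are hypothetical, not IHME data.

```r
# Hypothetical daily model output (new infections per day), 2000-2001.
# Made-up values for illustration; in practice these come from your model.
days <- seq(as.Date("2000-01-01"), as.Date("2001-12-31"), by = "day")
daily_incidence <- rep(c(30, 25), times = c(366, 365))  # 2000 is a leap year

# Sum daily output to annual totals to match the annual GBD resolution
annual_model <- tapply(daily_incidence, format(days, "%Y"), sum)

# Hypothetical annual estimates with 95% uncertainty bounds (not IHME data)
gbd <- data.frame(
  year  = c("2000", "2001"),
  val   = c(10500, 9800),
  lower = c(8000, 9500),
  upper = c(13000, 11000)
)

# Line up the model totals with the estimates and flag years in which the
# model falls inside the reported uncertainty interval
gbd$model     <- as.numeric(annual_model[gbd$year])
gbd$within_ui <- gbd$model >= gbd$lower & gbd$model <= gbd$upper
print(gbd)
```

Comparing against the interval rather than only the central `val` avoids penalising a model for differences that are within the uncertainty of the estimate itself.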

How to use this data?

These datasets can be explored to understand ITN ownership and LLIN distribution (1999 to 2008) and to assess the burden of malaria (2000 to 2021) in the Elimination 8 (E8) countries.

Before analysis, inspect the data to understand its structure and identify the indicators relevant to your analysis. For example, list the countries included in the ITN dataset:

library(readxl)
library(tidyverse)

# Load insecticide treated nets data
ihme_nets <- read_excel(path = "data/IHME_INSECTICIDE_TREATED_BEDNETS_SUB_SAHARAN_AFRICA_1999_2008.xls")

# Pull the names of countries
ihme_nets |>  
  pull(Country) |>  
  unique()

 [1] "Angola"                   "Benin"                   
 [3] "Botswana"                 "Burkina Faso"            
 [5] "Burundi"                  "Cameroon"                
 [7] "Central African Republic" "Chad"                    
 [9] "Comoros"                  "Congo"                   
[11] "Cote d'Ivoire"            "Dem. Rep. of Congo"      
[13] "Djibouti"                 "Equatorial Guinea"       
[15] "Eritrea"                  "Ethiopia"                
[17] "Gabon"                    "Ghana"                   
[19] "Guinea"                   "Guinea-Bissau"           
[21] "Kenya"                    "Liberia"                 
[23] "Madagascar"               "Malawi"                  
[25] "Mali"                     "Mauritania"              
[27] "Mozambique"               "Namibia"                 
[29] "Niger"                    "Nigeria"                 
[31] "Rwanda"                   "SaoTome & Principe"      
[33] "Senegal"                  "Sierra Leone"            
[35] "Somalia"                  "South Africa"            
[37] "Sudan"                    "Swaziland"               
[39] "Tanzania"                 "The Gambia"              
[41] "Togo"                     "Uganda"                  
[43] "Zambia"                   "Zimbabwe"

The E8 countries are Angola, Botswana, Eswatini, Mozambique, Namibia, South Africa, Zambia, and Zimbabwe. The ITN dataset still uses Eswatini's former name, Swaziland, so the code below renames it before filtering the data for these countries and displaying the first rows in a table:

library(gt)

# define the E8 countries
e8_countries <- c("Angola", "Botswana","Eswatini", "Mozambique", "Namibia", "South Africa", "Zambia",  "Zimbabwe")

ihme_nets |>  
  mutate(Country = ifelse(Country == "Swaziland", "Eswatini", Country)) |>  
  filter(Country %in% e8_countries) |>  
  head() |>  
  gt()

Country	ISO	Year	% ITN Ownership	% ITN Ownership Lower Bound	% ITN Ownership Upper Bound	% ITN Ownership Pop. At Risk	% ITN Ownership Pop. At Risk Lower Bound	% ITN Ownership Pop. At Risk Upper Bound	% ITN Use U5	% ITN Use U5 Lower Bound	%ITN Use U5 Upper Bound	% ITN Use U5 Pop. At Risk	% ITN Use U5 Pop. At Risk Lower Bound	%ITN Use U5 Pop. At Risk Upper Bound	LLINs Distributed Per Capita	LLINs Distributed Per Capita Lower Bound	LLINs Distributed Per Capita Upper Bound
Angola	AGO	1999	4.205	1.070	8.260	4.205	1.070	8.260	1.823343	0.2811190	10.45575	1.823343	0.2811190	10.45575	0.0008978	0.0006312	0.0011968
Angola	AGO	2000	5.470	2.615	8.670	5.470	2.615	8.670	2.560483	0.5011920	11.92120	2.560483	0.5011920	11.92120	0.0009440	0.0006260	0.0012728
Angola	AGO	2001	6.960	4.295	9.735	6.960	4.295	9.735	3.370769	0.7293910	14.14694	3.370770	0.7293910	14.14694	0.0009652	0.0006216	0.0013144
Angola	AGO	2002	6.795	3.620	10.220	6.795	3.620	10.220	3.264936	0.6668079	14.27508	3.264936	0.6668079	14.27508	0.0009839	0.0006222	0.0013374
Angola	AGO	2003	5.930	2.405	10.160	5.930	2.405	10.160	2.777820	0.5249189	13.33062	2.777820	0.5249189	13.33062	0.0010066	0.0006548	0.0013854
Angola	AGO	2004	5.670	2.540	10.095	5.670	2.540	10.095	2.642282	0.5099266	12.98258	2.642282	0.5099266	12.98258	0.0078725	0.0027322	0.0115660

Deaths, prevalence and incidence estimates

The GBD 2021 dataset contains the number, percent, and rate of malaria deaths, prevalence, and incidence for the following countries:

observed_cases <- read_csv("data/IHME-GBD_2021_DATA-05f6b66f-1.csv", show_col_types = FALSE) 

observed_cases |>  
  pull(location_name) |>  
  unique()

[1] "Republic of Angola"       "Republic of South Africa"
[3] "Kingdom of Eswatini"      "Republic of Zambia"      
[5] "Republic of Zimbabwe"     "Republic of Mozambique"  
[7] "Republic of Botswana"     "Republic of Namibia"
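The GBD extract uses formal country names ("Republic of South Africa"), while the ITN file uses short names and the former name "Swaziland", so the two sources need harmonised names before they can be joined. A minimal base-R sketch, assuming only the prefixes "Republic of " and "Kingdom of " occur (true for the eight names listed above); the `itn_names` vector is a hypothetical subset for illustration.

```r
# Location names as they appear in the GBD 2021 extract
gbd_names <- c("Republic of Angola", "Republic of South Africa",
               "Kingdom of Eswatini", "Republic of Zambia",
               "Republic of Zimbabwe", "Republic of Mozambique",
               "Republic of Botswana", "Republic of Namibia")

# Strip the formal prefix so the names match the short names in the ITN file
short_names <- sub("^(Republic|Kingdom) of ", "", gbd_names)

# Hypothetical subset of ITN country names; map "Swaziland" to "Eswatini"
itn_names <- c("Angola", "Swaziland", "South Africa")
itn_names[itn_names == "Swaziland"] <- "Eswatini"

# Names now shared by both sources
intersect(short_names, itn_names)
```

In the tidyverse workflow used elsewhere in this section, the same pattern can go inside `mutate()` on `location_name`.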

Inspecting the first rows for South Africa helps identify the indicators to use in the analysis:

library(gt)

observed_cases |>  
  filter(location_name == "Republic of South Africa") |> 
  head() |>  
  gt()

measure_id	measure_name	location_id	location_name	sex_id	sex_name	age_id	age_name	cause_id	cause_name	metric_id	metric_name	year	val	upper	lower
1	Deaths	196	Republic of South Africa	3	Both	22	All ages	345	Malaria	1	Number	2000	2.147995e+01	4.741927e+01	7.290654e+00
1	Deaths	196	Republic of South Africa	3	Both	22	All ages	345	Malaria	2	Percent	2000	4.000841e-05	8.857476e-05	1.354524e-05
1	Deaths	196	Republic of South Africa	3	Both	22	All ages	345	Malaria	3	Rate	2000	4.731551e-02	1.044540e-01	1.605968e-02
1	Deaths	196	Republic of South Africa	3	Both	22	All ages	345	Malaria	1	Number	2001	3.244936e+02	7.395696e+02	1.102811e+02
1	Deaths	196	Republic of South Africa	3	Both	22	All ages	345	Malaria	2	Percent	2001	5.616515e-04	1.284622e-03	1.917376e-04
1	Deaths	196	Republic of South Africa	3	Both	22	All ages	345	Malaria	3	Rate	2001	7.038036e-01	1.604074e+00	2.391920e-01
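The Number and Rate rows above are linked through the population: a rate per 100 000 equals number / population times 100 000. As a sanity check on units, the sketch below recovers the population implied by the South Africa, 2000, death rows in the table; it uses only those two values and base R.

```r
# Values copied from the South Africa, 2000, malaria deaths rows above
deaths_number <- 2.147995e+01
deaths_rate   <- 4.731551e-02   # deaths per 100 000 population

# rate = number / population * 100 000, so population = number / rate * 100 000
implied_population <- deaths_number / deaths_rate * 1e5
round(implied_population)
```

The result is roughly 45.4 million people, which confirms that the Rate metric is expressed per 100 000 population rather than per person.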

How to plot this data?

To access the data used in the next two plots:

Navigate to the ‘Sub-Saharan Africa Insecticide-Treated Bed Nets 1999-2008’ page on the IHME website.
Select the tab labelled ‘Files (1)’.
Click the link named ‘Insecticide-treated bed nets (Sub-Saharan Africa) 1999-2008’ to download the file.

Because lower and upper bounds are provided for estimated ITN ownership, a line plot with ribbons showing the uncertainty bounds is used to show how estimated ownership changed over time in the Frontline 4 countries (Botswana, Eswatini, Namibia, and South Africa).

# Modify the Country names to replace "Swaziland" with "Eswatini"
ihme_nets |> 
  mutate(Country = ifelse(Country == "Swaziland", "Eswatini", Country)) |> 
  # Filter the data for Frontline 4 countries
  filter(Country %in% c("Namibia", "South Africa", "Eswatini", "Botswana")) |> 
  # Create a ggplot object with the filtered data
  ggplot(aes(x = Year, y = `% ITN Ownership`, color = Country)) +
  # Add lines to the plot, coloured by Country
  geom_line(linewidth = 1) +
  # Add ribbons to the plot to represent the confidence intervals, filled by country
  geom_ribbon(aes(ymin = `% ITN Ownership Lower Bound`, ymax = `% ITN Ownership Upper Bound`, fill = Country), alpha = 0.2) +
  # Apply custom colour scale for lines
  scale_colour_manual_health_radar() +  
  # Apply custom fill scale for ribbons
  scale_fill_manual_health_radar() +   
  # Apply custom theme
  theme_health_radar() +
  # Add labels and title to the plot
  labs(
    title = "Estimated ITN ownership in Frontline 4 countries (1999-2008)",
    x = "Year",
    y = "ITN Ownership (%)",
    color = "Country",
    fill = "Country",  
    caption = str_wrap(
      "The estimated ITN ownership in the Frontline 4 countries from 1999 to 2008. Lines indicate the ownership estimates, while shaded areas represent the confidence bands. The uncertainty in ITN ownership increases over time for all countries. South Africa and Eswatini had the lowest levels of ITN ownership, but after 2004, there is a significant rise in ownership in Botswana and Namibia. Source: Institute for Health Metrics and Evaluation (IHME). Available from: https://doi.org/10.6069/DEE3-E887", 
      width = 100))

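Lower and upper bounds are also provided for ITN ownership in at-risk populations (the `% ITN Ownership Pop. At Risk Lower Bound` and `Upper Bound` columns), but here only the central estimates are plotted, as a simple line graph covering all eight E8 countries.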

# Modify the Country names to replace "Swaziland" with "Eswatini"
ihme_nets |> 
  mutate(Country = ifelse(Country == "Swaziland", "Eswatini", Country)) |> 
  # Filter the data for E8 countries
  filter(Country %in% e8_countries) |> 
  # Create a ggplot object with the filtered data
  ggplot(aes(x = Year, y = `% ITN Ownership Pop. At Risk`, color = Country)) +
  # Add lines to the plot, coloured by Country
  geom_line(linewidth = 1) +
  # Apply custom colour scale
  scale_colour_manual_health_radar() + 
  # Apply custom theme
  theme_health_radar() +
  # Add labels and title to the plot
  labs(
    title = "Estimated ITN ownership in at-risk populations (1999-2008)",
    x = "Year",
    y = "ITN Ownership (%)",
    colour = "E8 Country",
    caption = str_wrap(
      "The estimated ITN ownership among populations at risk of malaria in Elimination 8 (E8) countries, from 1999 to 2008. ITN ownership is estimated to rise in all countries, with Zambia, Botswana and Zimbabwe showing the highest percent ownership in 2008, followed by Namibia, Mozambique and Angola. South Africa and Eswatini maintain less than 20% ownership in at-risk populations. Source: Institute for Health Metrics and Evaluation (IHME). Available from: https://doi.org/10.6069/DEE3-E887", 
      width = 90))

The ‘observed_cases’ data can be accessed as follows:

Navigate to the IHME data portal.
In the search panel on the left-hand side, make the following selections:
- Set GBD Estimate to ‘Cause of death or injury’.
- Select the measures ‘Prevalence’, ‘Incidence’ and ‘Deaths’.
- Select the metrics ‘Number’, ‘Percent’ and ‘Rate’.
- Set Cause to ‘Malaria’.
- For Location, select the E8 Countries.
- Set Age to ‘All Ages’.
- Set Sex to ‘Both’.
- Select the years from 2000 to 2021.
Press the download button and follow the prompts to download your file.

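A stacked bar plot is used to show the estimated malaria death rate (deaths per 100 000 population) in the E8 countries from 2000 to 2021. Each segment shows one country's rate, so the total height of a bar is not itself a rate for the E8 region.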

# Convert the observed_cases data to a data frame
observed_cases |> 
  as.data.frame() |> 
  # Filter the data for deaths and rate metrics
  filter(measure_name == "Deaths", metric_name == "Rate") |> 
  # Group the data by location
  group_by(location_name) |> 
  # Create a ggplot object with the filtered and grouped data
  ggplot() +
  # Add stacked bar plots to the chart, filled based on location
  geom_bar(aes(x = year, y = val, fill = location_name), stat = "identity", position = "stack") +
  # Apply custom fill colour scale
  scale_fill_manual_health_radar() +   
  # Apply custom theme
  theme_health_radar() +
  # Add labels and title to the plot
  labs(
    title = "Estimated malaria death rate (2000-2021)",
    x = "Year",
    y = "Deaths (rate per 100 000)",
    fill = "E8 Country",
    caption = str_wrap(
      "The Institute for Health Metrics and Evaluation's (IHME) estimates of the malaria death rate in Elimination 8 (E8) countries for the 2000 to 2021 period. The highest death rate is estimated for Mozambique, followed by Angola. Substantially lower death rates are estimated for Namibia, Botswana, South Africa, and Eswatini during this period, but it should be noted that the populations at risk of malaria in countries such as South Africa are much smaller than those in countries such as Angola and Mozambique. Source: Institute for Health Metrics and Evaluation (IHME). Global Burden of Disease Study 2021 (GBD 2021) Results. Seattle, WA: IHME, University of Washington, 2024. Available from: https://vizhub.healthdata.org/gbd-results/", 
      width = 100))
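To summarise the E8 countries as a single group, a combined rate should be computed from total deaths and total population rather than by adding country rates, which is what the height of a stacked bar of rates represents. The sketch below uses two hypothetical countries with made-up counts to show how different the two numbers can be.

```r
# Hypothetical counts and populations for two countries (illustrative only)
deaths     <- c(country_a = 9000, country_b = 50)
population <- c(country_a = 30e6, country_b = 50e6)

# Country-specific rates per 100 000: 30 and 0.1
rates <- deaths / population * 1e5

# Height of a stacked bar: simply the sum of the rates (30.1)
stacked_height <- sum(rates)

# Rate for the two countries combined: total deaths over total population (about 11.3)
combined_rate <- sum(deaths) / sum(population) * 1e5
```

The combined rate is a population-weighted average of the country rates, so it always lies between the smallest and largest country rate; the stacked height does not.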

Upper and lower bounds were again available for the estimated malaria incidence rate, so a line plot with ribbons was chosen to visualise these data.

# Convert the observed_cases data to a data frame
observed_cases |> 
  as.data.frame() |> 
  # Filter the data for incidence and rate metrics
  filter(measure_name == "Incidence", metric_name == "Rate") |> 
  # Create a ggplot object with the filtered data
  ggplot(aes(x = year, y = val, color = location_name)) +
  # Add lines to the plot, coloured by location name
  geom_line(linewidth = 1) +
  # Add ribbons to the plot to represent the confidence intervals, filled based on location
  geom_ribbon(aes(ymin = lower, ymax = upper, fill = location_name), alpha = 0.2) +
  # Apply custom colour scale for lines
  scale_colour_manual_health_radar() +  
  # Apply custom fill scale for ribbons
  scale_fill_manual_health_radar() +
  # Apply custom theme
  theme_health_radar() +
  # Add labels and title to the plot
  labs(
    title = "Estimated malaria incidence rate (2000-2021)",
    x = "Year",
    y = "Incidence (rate per 100 000)",
    colour = "E8 Country",
    fill = "E8 Country",  
    caption = str_wrap(
      "The estimated malaria incidence rate per 100 000 people from 2000 to 2021 in Elimination 8 (E8) countries. The lines show estimated incidence rate and the shaded areas show the uncertainty intervals. Mozambique is estimated to have experienced malaria incidence of 47 452.43 cases per 100 000 people in 2000, which decreased to 32 050.40 cases per 100 000 by 2021. Zimbabwe is estimated to have experienced erratic malaria incidence, with rates fluctuating over the 2000 to 2021 period. Note that the low incidence rates estimated in countries such as South Africa and Botswana may be related to the low proportion of their total populations living in regions in which malaria is common. Source: Institute for Health Metrics and Evaluation (IHME). Global Burden of Disease Study 2021 (GBD 2021) Results. Seattle, WA: IHME, University of Washington, 2024. Available from: https://vizhub.healthdata.org/gbd-results/", 
      width = 85))
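The uncertainty intervals shown as ribbons can be carried into model simulations or sensitivity analyses by drawing plausible values from them. The lower and upper columns give a 95% uncertainty interval; treating that interval as the 2.5th and 97.5th percentiles of a log-normal distribution is an assumption made here for illustration, chosen because the interval is asymmetric around val and counts cannot be negative. The values are the South Africa, 2000, malaria death estimates from the GBD table shown earlier.

```r
# GBD estimate with its 95% uncertainty interval (South Africa, 2000, malaria deaths)
val   <- 21.47995
lower <- 7.290654
upper <- 47.41927

# Assumption: lower and upper are the 2.5th and 97.5th percentiles of a
# log-normal distribution; solve for its parameters on the log scale
z       <- qnorm(0.975)
meanlog <- (log(lower) + log(upper)) / 2
sdlog   <- (log(upper) - log(lower)) / (2 * z)

# Draw values to feed into repeated model runs or a sensitivity analysis
set.seed(42)
draws <- rlnorm(1000, meanlog = meanlog, sdlog = sdlog)
quantile(draws, c(0.025, 0.5, 0.975))
```

With this choice the median of the fitted distribution is sqrt(lower * upper), about 18.6, which is below val; if draw-level estimates are available to you, sampling from them directly avoids this approximation.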

How can this data be used in disease modelling?

GBD data on historical trends in disease metrics can be used to validate the predictions of transmission models. Malaria models often need to be calibrated against IHME estimates, for example by adjusting transmission rates until the modelled incidence or prevalence matches the estimated values.

Preparing the data

The Global Burden of Disease study has produced estimates of malaria prevalence worldwide. Prevalence is the total number of cases of a given cause in a specified population at a designated time. It differs from incidence, which is the number of new cases in the population over a given period. Observations from Plasmodium falciparum parasite rate (PfPR) surveys and from national routine surveillance systems (confirmed and unconfirmed diagnoses) are combined in a model of disease burden. More information on this methodology is provided here.
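The difference between the two measures can be made concrete with a handful of infection episodes. The sketch below is a base-R illustration with hypothetical start and end days for five infections; it counts infections present on one day (point prevalence) and infections that begin within a period (incidence).

```r
# Hypothetical infection episodes in a population of 1 000 people:
# the day each infection starts and the day it clears
episodes <- data.frame(
  id    = 1:5,
  start = c( 5, 20, 40, 55, 80),
  end   = c(30, 60, 70, 90, 95)
)
population <- 1000

# Point prevalence on day 50: infections that have started but not yet cleared
day <- 50
prevalent_cases  <- sum(episodes$start <= day & episodes$end > day)
point_prevalence <- prevalent_cases / population

# Incidence over days 1-60: infections that START within the period,
# regardless of whether they are still present at the end of it
new_cases      <- sum(episodes$start >= 1 & episodes$start <= 60)
incidence_rate <- new_cases / population  # per person over 60 days

c(prevalent_cases = prevalent_cases, new_cases = new_cases)
```

Here 2 infections are present on day 50 while 4 begin during days 1 to 60; long-lasting infections raise prevalence relative to incidence, which is one reason the two measures carry different information for calibration.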

# Load data from GBD
observed_df <- read_csv("data/IHME-GBD_2021_DATA-05f6b66f-1.csv", show_col_types = FALSE) |>  
  as.data.frame() |>  
  filter(measure_name == "Prevalence", metric_name == "Number", location_name == "Republic of South Africa") |>  
  rename(value = val)

observed_df |>  
ggplot() +
  geom_point(aes(x = year, y = value, colour = measure_name)) +
  geom_line(aes(x = year, y = value, colour = measure_name)) +
  scale_colour_manual_health_radar() +
  theme_health_radar() +
  labs(title = "Observed malaria point prevalence for South Africa",
    x = "Year",
    y = "Number of Cases",
    caption = str_wrap("Source: Institute for Health Metrics and Evaluation IHME | Global Burden of Disease Study (GBD) 2021.")) +
  guides(colour = "none")

Modeling prevalence

Prevalence data are well suited to calibrating transmission models. This is particularly useful when incidence data are limited, and for diseases such as malaria in which a large proportion of infections are asymptomatic. Fitting the model to both incidence and prevalence data gives a more complete picture of disease dynamics.

Point prevalence data come from cross-sectional surveys, which capture a single point in time, so they may miss dynamic trends in transmission and malaria immunity. In addition, the proportion of asymptomatic infections detected depends on the sensitivity of the diagnostic tool used.
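One standard way to account for imperfect diagnostics when using survey prevalence is the Rogan-Gladen estimator, which corrects the apparent (test-positive) prevalence using the test's sensitivity and specificity. It is not described in the source and is shown here as one possible adjustment; the sensitivity and specificity values below are illustrative assumptions, not properties of any particular malaria test.

```r
# Rogan-Gladen estimator: corrects an apparent (test-positive) prevalence
# for imperfect diagnostic sensitivity and specificity
rogan_gladen <- function(apparent, sensitivity, specificity) {
  true_prev <- (apparent + specificity - 1) / (sensitivity + specificity - 1)
  # Clamp to [0, 1]; the raw estimator can fall outside this range
  pmin(pmax(true_prev, 0), 1)
}

# Illustrative values only: 20% test-positive, a test that misses 20% of
# infections (e.g. low-density asymptomatic ones) and has 98% specificity
rogan_gladen(apparent = 0.20, sensitivity = 0.80, specificity = 0.98)
```

With these inputs the corrected prevalence is about 0.23, higher than the 0.20 observed, because the test misses more infections than it falsely detects.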

Malaria models for endemic countries also often assume that people can be bitten and infected by an infectious mosquito while still recovering from an initial infection (secondary infection). It may be difficult to estimate the true proportion of the population that has already recovered from an initial infection and is now susceptible to a secondary infection.

Model calibration

To calibrate the model, first build a transmission model that reflects the underlying health system dynamics in the country, the coverage of any control interventions deployed, and relevant mosquito, parasite, and human behaviours. Then select key parameters to fine-tune so that the model output better matches the observed data.

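We minimise the difference between the model output and the observed data, using either the sum of squared errors (least squares) or maximum likelihood estimation. Because R's optim() minimises by default, the objective function should return the sum of squared errors, or the negative log-likelihood. An example of such a function is shown in the code snippet below, followed by a plot of simulated results.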

# Not evaluated here: seacr(), initial_state, times, irs_cov and
# initial_parameters are assumed to be defined elsewhere in the model code
library(deSolve)

# Objective (cost) function: optim() minimises the value returned
objective_function <- function(params, observed_df) {
  
  # params is the candidate parameter vector proposed by optim(); its order
  # must match the bounds passed to optim() below:
  # a       - human biting rate
  # rho     - additional model parameter (tuned; see bounds below)
  # pa      - probability of asymptomatic infection
  # delta   - natural recovery rate
  # irs_eff - effectiveness of IRS at reducing transmission
  # r       - rate of loss of infectiousness
  
  # Run the ODE model with the candidate parameters
  output_df <- ode(y = initial_state, 
                   times = times, 
                   func = seacr, 
                   parms = params,
                   irs_cov = irs_cov)
    
  # Calculate annual prevalence from the model output
  prev_df <- output_df |>  
    as.data.frame() |>  
    mutate(Prv = c(0, diff(CPrv))) |> 
    pivot_longer(!time, names_to = "state", values_to = "value") |>  
    mutate(year = 1995 + ceiling(time / 365)) |>  
    filter(state %in% "Prv") |>  
    group_by(year) |> 
    slice_tail(n = 1) |>  # keep the last time step of each year
    ungroup() |> 
    filter(year >= 2000)  # adjust timeframe to match data
   
  # GBD estimates
  observed_prev <- filter(observed_df, measure_name == "Prevalence")$value
  
  # Model projections
  predicted_prev <- prev_df$value
  
  # Sum of squared errors (smaller is better)
  total <- sum((observed_prev - predicted_prev)^2)
  
  # Alternative: Poisson negative log-likelihood (also smaller is better)
  # total <- -sum(dpois(round(observed_prev), predicted_prev, log = TRUE))
  
  return(total)
}

# Run optimisation to minimise the objective
optim_result <- optim(par = initial_parameters, 
                      fn = objective_function, 
                      observed_df = observed_df, 
                      method = "L-BFGS-B",
                      lower = c(a = 0.1, rho = 1/500, pa = 0.01, delta = 1/280, irs_eff = 0.1, r = 1/21),
                      upper = c(a = 0.9, rho = 1/40, pa = 0.9, delta = 1/90, irs_eff = 0.9, r = 1/3))

# Best-fit parameters from calibration
cal_parameters <- optim_result$par

# Run model with the calibrated parameters 
calibrated_output <- ode(y = initial_state, 
                         times = times, 
                         func = seacr, 
                         parms = cal_parameters,
                         irs_cov = irs_cov)

# Plot prevalence of new model fit with ggplot

set.seed(123)

# Simulated values for illustration only (not real model output)
predicted_prev <- data.frame(
  year = 2000:2021,
  value = observed_df$value + rnorm(length(observed_df$value), mean = 3000, sd = 2000),
  key = "Initial model output"
) |> 
  bind_rows(
    data.frame(
      year = 2000:2021,
      value = observed_df$value + rnorm(length(observed_df$value), mean = 700, sd = 500)
    ) |>  
      mutate(
        std_error = 800, # for example
        lower = value - (1.96 * std_error),
        upper = value + (1.96 * std_error),
        key = "Calibrated projected prevalence"
      )
  )

merged_df <- bind_rows(
  predicted_prev,
  mutate(observed_df, key = "GBD estimates")
)

ggplot() +
  # Draw the model uncertainty ribbon first so the lines sit on top of it
  geom_ribbon(data = filter(merged_df, key == "Calibrated projected prevalence"),
              aes(x = year, ymin = lower, ymax = upper), fill = "grey", alpha = 0.4) +
  geom_line(data = filter(merged_df, key == "GBD estimates"), aes(x = year, y = value, colour = key)) +
  geom_pointrange(data = filter(merged_df, key == "GBD estimates"),
                  aes(x = year, y = value, ymin = lower, ymax = upper, colour = key)) + 
  geom_line(data = filter(merged_df, key == "Initial model output"), aes(x = year, y = value, colour = key)) +
  geom_line(data = filter(merged_df, key == "Calibrated projected prevalence"), aes(x = year, y = value, colour = key)) +
  scale_colour_manual_health_radar() +
  theme_health_radar() +
  labs(title = "Malaria point prevalence for South Africa from 2000 to 2021",
       subtitle = "Simulated projected prevalence from the transmission model alongside prevalence data",
       x = "Year",
       y = "Number of Cases",
       colour = "Variable",
       caption = str_wrap("In this simulated example, the calibrated model output aligns more closely with the prevalence estimates from the Global Burden of Disease (GBD) 2021 study than the initial model output. The grey ribbon represents the uncertainty in the calibrated model estimates; its overlap with the uncertainty intervals of the GBD estimates suggests a good fit.")
  )
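The sign of the returned objective matters because optim() minimises by default. The self-contained sketch below fits a single decay parameter of a hypothetical toy model (an exponential decline standing in for the transmission model, which is not shown here) to synthetic data, once by least squares and once by Poisson maximum likelihood; both objectives are written so that smaller is better.

```r
# Toy "transmission model": prevalence declining exponentially from p0
# (a hypothetical stand-in for an ODE model, used only to show the fitting step)
toy_model <- function(k, years, p0 = 50000) {
  p0 * exp(-k * (years - min(years)))
}

# Synthetic "observed" data generated with k = 0.1 plus Poisson noise
set.seed(1)
years    <- 2000:2021
observed <- rpois(length(years), toy_model(0.1, years))

# Least squares: return the sum of squared errors directly
sse <- function(k, observed, years) {
  sum((observed - toy_model(k, years))^2)
}

# Maximum likelihood: return the NEGATIVE Poisson log-likelihood
poisson_nll <- function(k, observed, years) {
  -sum(dpois(observed, lambda = toy_model(k, years), log = TRUE))
}

# One-parameter problems can use method = "Brent" with finite bounds
fit_sse <- optim(par = 0.5, fn = sse, observed = observed, years = years,
                 method = "Brent", lower = 0.001, upper = 1)
fit_nll <- optim(par = 0.5, fn = poisson_nll, observed = observed, years = years,
                 method = "Brent", lower = 0.001, upper = 1)

c(least_squares = fit_sse$par, poisson_ml = fit_nll$par)
```

Both fits recover a value close to the k = 0.1 used to generate the data; returning the log-likelihood without the minus sign would instead drive optim() towards the worst-fitting parameter in the allowed range.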

Calibrating disease models to real-world data can be challenging. Methods such as Approximate Bayesian Computation (ABC), which compare simulated and observed data instead of evaluating a likelihood, can help with complex models, but they may be computationally expensive. An example of model calibration can be found in the literature below:

Awine, T., & Silal, S. P. (2022). Assessing the effectiveness of malaria interventions at the regional level in Ghana using a mathematical modelling application. PLOS Global Public Health, 2(12), e0000474. https://doi.org/10.1371/journal.pgph.0000474
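Approximate Bayesian Computation, mentioned above, can be sketched in its simplest form, rejection ABC: draw parameters from a prior, simulate data for each draw, and keep the draws whose simulations are closest to the observations. The sketch below uses a hypothetical toy model (exponential decline with Poisson noise) in place of a transmission model; the prior range, number of draws, distance measure, and 1% acceptance fraction are all illustrative choices. More efficient variants, such as ABC with sequential Monte Carlo (ABC-SMC), exist for expensive models.

```r
# Hypothetical toy model standing in for a transmission model
toy_model <- function(k, years, p0 = 50000) p0 * exp(-k * (years - min(years)))

# Synthetic "observed" data generated with k = 0.1
set.seed(2)
years    <- 2000:2021
observed <- rpois(length(years), toy_model(0.1, years))

# 1. Draw candidate parameter values from a uniform prior
n_draws <- 20000
prior_k <- runif(n_draws, min = 0.01, max = 0.5)

# 2. Simulate noisy data for each candidate and measure its distance
#    (root mean squared difference) to the observations
distance <- vapply(prior_k, function(k) {
  simulated <- rpois(length(years), toy_model(k, years))
  sqrt(mean((simulated - observed)^2))
}, numeric(1))

# 3. Keep the 1% of candidates with the smallest distance; they approximate
#    the posterior distribution of k
accepted <- prior_k[distance <= quantile(distance, 0.01)]
c(posterior_mean = mean(accepted), quantile(accepted, c(0.025, 0.975)))
```

Because no likelihood is evaluated, the same loop works for any model that can be simulated, but every candidate requires a full model run, which is where the computational cost comes from.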

Policy implications

In general, calibration helps ensure that the model reflects local transmission dynamics and the burden of disease. Calibrating to prevalence data in particular means that the asymptomatic burden of infection, and its contribution to ongoing transmission, is represented in the model. This gives policymakers more credible forecasts and lets them simulate the impact of interventions aimed at the asymptomatic population, such as mass testing and treatment or mass drug administration.