# Reproduction number of a SIR model with mortality We are searching data for your request:

Forums and discussions:
Manuals and reference books:
Data from registers:
Wait the end of the search in all databases.
Upon completion, a link will appear to access the found materials.

We know that reproduction number \$mathcal{R}_0\$ is \$frac{alpha}{eta}\$ for the following system, such that if \$mathcal{R}_0>1\$, there is an epidemic in the population. Now, assume the system below with mortality rate of \$delta\$: I'm wondering which one of options below represents \${R}_0\$?

1. \${R}_0 = frac{alpha}{eta + delta}\$
2. \${R}_0 = frac{alpha}{eta delta}\$
3. \${R}_0 = frac{alpha}{eta}+frac{alpha}{delta}\$

Could you mark the right option with its justification?

Thanks

The only answer that makes numerical sense is 1. The product of two rates beta and delta (recovery * death) doesn't mean anything in SIR. And in answer three you're doubling the rate of infection (alpha). Looking at the other way, for R_0 it doesn't matter how people leave the Infected class, once you're either dead or recovered you no longer are transmitting the disease. Therefore you can say that some value Zeta is the exit from Infected class and Zeta would be the sum of all rates which remove a person from being Infectious

Granted, I haven't worked with SIR-models, but to me the answer is definitely nr. 1.

Fundamentally, \$R_0\$ is defined as the number of secondary infections from a single individual in an uninfected population. It is sometimes described as:

\$ R_0 = gamma *c * d \$,

where \$gamma\$ is the probability of transmitting the disease, \$c\$ is the average contact rate (encounter rate with other individuals), and \$d\$ the average length being infectious (which is \$1/eta\$, there \$eta\$ is the recovery rate). In your expression above \$alpha = gamma*c\$.

In this simple model of \$R_0\$ adding a mortality rate basically equates to adding another way of being removed from being infectious (you can either recover or die). Therefore, the expected time being infectious (previously \$1/eta\$) is now the sum of two rates, \$1/(eta + delta)\$, where \$delta\$ is the death rate (multiplying the two rates would be similar to estimating the probability that someone both recovers and dies from the infection). This means that \$R_0\$ with a death rate is:

\$\$ R_0 = frac {gamma c}{eta+delta} = frac {alpha}{eta+delta}\$\$

However, there are also more complicated (and realistic) ways of modelling situations with a death rate.

(I began my answer before @Artems nice answer, which is why I'm posting this as a complementary answer.)

## Estimating the time-varying reproduction number of COVID-19 with a state-space method

After slowing down the spread of the novel coronavirus COVID-19, many countries have started to relax their confinement measures in the face of critical damage to socioeconomic structures. At this stage, it is desirable to monitor the degree to which political measures or social affairs have exerted influence on the spread of disease. Though it is difficult to trace back individual transmission of infections whose incubation periods are long and highly variable, estimating the average spreading rate is possible if a proper mathematical model can be devised to analyze daily event-occurrences. To render an accurate assessment, we have devised a state-space method for fitting a discrete-time variant of the Hawkes process to a given dataset of daily confirmed cases. The proposed method detects changes occurring in each country and assesses the impact of social events in terms of the temporally varying reproduction number, which corresponds to the average number of cases directly caused by a single infected case. Moreover, the proposed method can be used to predict the possible consequences of alternative political measures. This information can serve as a reference for behavioral guidelines that should be adopted according to the varying risk of infection.

## The SIR Model for Spread of Disease - The Differential Equation Model

As the first step in the modeling process, we identify the independent and dependent variables. The independent variable is time  t,  measured in days. We consider two related sets of dependent variables.

The first set of dependent variables counts people in each of the groups, each as a function of time:

 S = S(t) is the number of susceptible individuals, I = I(t) is the number of infected individuals, and R = R(t) is the number of recovered individuals.

The second set of dependent variables represents the fraction of the total population in each of the three categories. So, if  N  is the total population (7,900,000 in our example), we have

 s(t) = S(t)/N, the susceptible fraction of the population, i(t) = I(t)/N, the infected fraction of the population, and r(t) = R(t)/N, the recovered fraction of the population.

It may seem more natural to work with population counts, but some of our calculations will be simpler if we use the fractions instead. The two sets of dependent variables are proportional to each other, so either set will give us the same information about the progress of the epidemic.

Under the assumptions we have made, how do you think  s(t)  should vary with time? How should  r(t)  vary with time? How should  i(t)  vary with time?

Next we make some assumptions about the rates of change of our dependent variables:

No one is added to the susceptible group, since we are ignoring births and immigration. The only way an individual leaves the susceptible group is by becoming infected. We assume that the time-rate of change of  S(t),  the number of susceptibles, 1 depends on the number already susceptible, the number of individuals already infected, and the amount of contact between susceptibles and infecteds. In particular, suppose that each infected individual has a fixed number  b  of contacts per day that are sufficient to spread the disease. Not all these contacts are with susceptible individuals. If we assume a homogeneous mixing of the population, the fraction of these contacts that are with susceptibles is  s(t).  Thus, on average, each infected individual generates  b s(t)  new infected individuals per day. [With a large susceptible population and a relatively small infected population, we can ignore tricky counting situations such as a single susceptible encountering more than one infected in a given day.]

Let's see what these assumptions tell us about derivatives of our dependent variables.

The Susceptible Equation . Explain carefully how each component of the differential equation

 (1)

follows from the text preceding this step. In particular,

Why is the factor of  I(t)  present?

 (3)

follows from one of the assumptions preceding Step 4.

 (4)

What assumption about the model does this reflect? Now explain carefully how each component of the equation

 (5)

follows from what you have done thus far. In particular,

Finally, we complete our model by giving each differential equation an initial condition. For this particular virus -- Hong Kong flu in New York City in the late 1960's -- hardly anyone was immune at the beginning of the epidemic, so almost everyone was susceptible. We will assume that there was a trace level of infection in the population, say, 10 people. 2  Thus, our initial values for the population variables are

 S(0) =ه,900,000 I(0) =㺊 R(0) =ـ

In terms of the scaled variables, these initial conditions are

 s(0) =ف i(0) =ف.27 x㺊 - 6 r(0) =ـ

(Note: The sum of our starting populations is not exactly  N,  nor is the sum of our fractions exactly  1.  The trace level of infection is so small that this won't make any difference.) Our complete model is

We don't know values for the parameters  b  and    yet, but we can estimate them, and then adjust them as necessary to fit the excess death data. We have already estimated the average period of infectiousness at three days, so that would suggest  k =ف/3.  If we guess that each infected would make a possibly infecting contact every two days, then  b  would be  1/2.  We emphasize that this is just a guess. The following plot shows the solution curves for these choices of  b  and  k

In steps 1 and 2, you recorded your ideas about what the solution functions should look like. How do those ideas compare with the figure above? In particular,

What do you think about the relatively low level of infection at the peak of the epidemic?

In Part 3, we will see how solution curves can be computed even without formulas for the solution functions.

1  Note that we have turned the adjective "susceptible" into a noun. It is common usage in epidemiology to refer to "susceptibles," "infecteds," and "recovereds" rather than always use longer phrases such as "population of susceptible people" or even "the susceptible group."

2  While I(0) is normally small relative to N, we must have I(0) >ـ for an epidemic to develop. Equation (5) says, quite reasonably, that if I =ـ at time 0 (or any time), then dI/dt =ـ as well, and there can never be any increase from the 0 level of infection.

David Smith and Lang Moore, "The SIR Model for Spread of Disease - The Differential Equation Model," Convergence (December 2004)

## Deriving the formulas

We now want to get the number of infected, susceptible and recovered for all days, just from β, γ and N. Now, it is difficult to obtain a direct formula for S(t), I(t) and R(t). However, it is quite simple to describe the change per day of S, I, and R, that is, how the number of susceptible/infected/recovered changes depending on the current numbers. Again, we’ll derive the formulas by example:

We are now on day t after outbreak of disease X. Still, the expected amount of people an infected person infects per day is 1 (so β=1) and the number of days that an infected person has and can spread the disease is 7 (so γ=1/7 and D=7).

Let’s say that on day t, 60 people are infected (so I(t)=60), the total population is 100 (so N=100), and 30 people are still susceptible (so S(t)=30 and R(t)=100–60–30=10). Now, how do S(t) and I(t) and R(t) change to the next day?

We have 60 infected people. Each of them infects 1 person per day (that’s β). However, only 30/100 =30% of people they meet are still susceptible and can be infected (that’s S(t) / N). So, they infect 60 ⋅ 1 ⋅ 30/100 = 18 people (again, think about it until it really makes sense: 60 infected that infect on average 1 person per day, but only 30 of 100 people can still be infected, so they do not infect 60⋅1 people, but only 60⋅1⋅30/100 = 18 people). So, 18 people of the susceptibles get infected, so S(t) changes by minus 18. Plugging in the variables, we just derived the first formula:

Change of S(t) to the next day = - β ⋅ I(t) ⋅ S(t) / N.

If you’re familiar with calculus, you know we have a term for describing the change of a function: the derivative S’(t) or dS/dt. (After we have derived and understood all the derivatives S’(t), I’(t) and R’(t), we can calculate the values of S(t), I(t) and R(t) for each day.)

Now, how does the amount of infected change? That’s easy: There are some new people infected, we just saw that. Exactly the amount of people that “leave” S(t) “arrive” at I(t). So, we have 18 new infected and we already know that the formula will be similar to this: I’(t) = + β ⋅ I(t) ⋅ S(t) / N (of course, we can omit the plus, it’s just to show you that we gain the exact amount that S(t) loses, so we just change the sign). There’s just one thing missing: some people recover. Remember, we have γ for that, it’s the proportion of infected recovering per day, that’s just what we need!

We have 60 infected and γ=1/3, so one third of the 60 recovers. That’s 1/3 ⋅ 60 = 20. Finally, we obtain the formula:

Again, think about this for a minute The first part is the newly infected from the susceptibles. The second part is the recoveries.

Finally, we get to the last formula, the change in recoveries. That’s easy: the newly recovered are exactly the 20 we just calculated there are no people leaving the “recovered”-compartment. Once recovered, they stay immune:

Great, we have now derived (and understood) all the formulas we need! Here they are again with a more common notation for the derivative and the “(t)” left out as is often done: Such equations are called ordinary differential equations (ODEs) (you won’t need any knowledge about them to follow this series).

We can now describe the change in the number of people susceptible, infected, and recovered. From these formulas, luckily, we can calculate the numbers we’re really interested in: S(t), I(t) and R(t), the number of people susceptible, infected, and recovered for each day t. Even more luckily, we do not have to do one bit ourselves, python provides many tools for solving ODEs!

## Reproduction number of a SIR model with mortality - Biology

The inter-epidemic period is calculated from the largest imaginary part of eigenvalues of the Jacobian matrix evaluated at the endemic equilibrium.

Infection trajectories show a numerical solution to the SEIRS equations with 1,000 time steps and initial parameters

\$S(0) = 0.999 - p\$ \$E(0) = 0.001\$ \$I(0) = 0\$ \$R(0) = p\$ \$S + E + I + R = N = 1\$

where p is the vaccination fraction.

### Points of Significance: The SEIRS model for infectious disease dynamics

Ottar Bjørnstad 1,2 , Katriona Shea 1 , Martin Krzywinski 3 , Naomi Altman 4

1. Department of Biology, The Pennsylvania State University, State College, PA, USA.

2. Department of Entomology, The Pennsylvania State University, State College, PA, USA.

4. Department of Statistics, The Pennsylvania State University, State College, PA, USA.

#### Version history

23 May 2020 v1.0.0 &mdash initial public release

#### Related columns

Shea, K., Bjørnstad, O., Krzywinski, M. & Altman, N. Points of Significance: Uncertainty and the management of epidemics. (2020) Nature Methods 17 (in press). (interactive figures, download code)

For very deadly diseases, this compartment is very important. For some other situations, you might want to add completely different compartments and dynamics (such as births and non-disease-related deaths when studying a disease over a long time) these models can get as complex as you want! Let’s think about how we can take our current transitions and add a Dead state. When can people die from the disease? Only while they are infected! That means that we’ll have to add a transition I → D. Of course, people don’t die immediately We define a new variable ρ (rho) for the rate at which people die (e.g. when it takes 6 days to die, ρ will be 1/6). There’s no reason for the rate of recovery, γ, to change. So our new model will look somehow like this: The only thing that’s missing are the probabilities of going from infected to recovered and from infected to dead. That’ll be one more variable (the last one for now!), the death rate α. For example, if α=5%, ρ = 1 and γ = 1 (so people die or recover in 1 day, that makes for an easier example) and 100 people are infected, then 5% ⋅ 100 = 5 people will die. That leaves 95% ⋅ 100 = 95 people recovering. So all in all, the probability for I → D is α and thus the probability for I → R is 1-α. We finally arrive at this model: Which naturally translates to these equations: ## How epidemiological models of COVID-19 help us estimate the true number of infections

Our World in Data presents the data and research to make progress against the world’s largest problems.
Our main publication on the pandemic is here: Coronavirus Pandemic (COVID-19).

We are grateful to the researchers whose work we cover in this post for giving helpful feedback and suggestions. Thank you.

We update the model estimates with the latest available data every week, usually on Monday. Last page update: 22 June 2021. 1

A key limitation in our understanding of the COVID-19 pandemic is that we do not know the true number of infections. Instead, we only know of infections that have been confirmed by a test – the confirmed cases. But because many infected people never get tested, 2 we know that confirmed cases are only a fraction of true infections. How small a fraction though?

To answer this question, several research groups have developed epidemiological models of COVID-19. These models use the data we have – confirmed cases and deaths, testing rates, and more – plus a range of assumptions and epidemiological knowledge to estimate true infections and other important metrics.

The chart here shows the mean estimates of the true number of daily new infections in the United States from four of the most prominent models. 3 For comparison, the number of confirmed cases is also shown.

Two things are clear from this chart: All four models agree that true infections far outnumber confirmed cases. But the models disagree by how much, and how infections have changed over time.

When the number of confirmed cases in the US reached a peak in late July 2020, the IHME and LSHTM models estimated that the true number of infections was about twice as high as confirmed cases, the ICL model estimated it was nearly three times as high, and Youyang Gu’s model estimated it was more than six times as high. Back in March the estimated discrepancy between confirmed cases and true infections was even many times higher.

In this post we examine these four models and how they differ by unpacking their essential elements: what they are used for, how they work, the data they are based on, and the assumptions they make.

We also aim to make the model estimates easily accessible in our interactive charts, allowing you to quickly explore different models of the pandemic for most countries in the world. To do this simply click 𠇌hange country” on each chart.

Three of the four models we look at are “SEIR” 4 models, 5 which simulate how individuals in a population move through four states of a COVID-19 infection: being Susceptible, Exposed, Infectious, and Recovered (or deceased). How individuals move through these states is determined by different model “parameters,” of which there are many. Two key ones are the effective reproduction number (Rt) 6 – how many other people a person with COVID-19 infects at a given time – and the infection fatality rate (IFR) – the percent of people infected with a disease who die from it.

Click to open interactive version

### Imperial College London (ICL)

##### Age-structured SEIR model focused on low- and middle-income countries (details as of 23 August 2020)

This chart shows the ICL model’s estimates of the true number of daily new infections in the United States. To see the estimates for other countries click 𠇌hange country.” The lines labeled “upper” and “lower” show the bounds of a 95% uncertainty interval. For comparison, the number of confirmed cases is also shown.

##### Regions covered

164 countries and territories across the world

##### Time covered

The first date covered is the estimated start of the pandemic for each country. The model makes projections that extend 90 days past the latest date of update. 7

##### What is the model?

The model is a stochastic SEIR variant with multiple infectious states to reflect different COVID-19 severities, such as mild or asymptomatic versus severe.

##### What is the model used for?

ICL describes its model as a tool to help countries understand at what stage the country is in its epidemic (e.g., before or after a peak) and how healthcare demand might change in the future under three policy scenarios. These scenarios are designed to provide a counterfactual of what could happen if current interventions were maintained, increased, or relaxed and are therefore not intended to forecast future mortality.

ICL uses the model estimates to write reports for individual low- and middle-income countries (LMICs) that are relatively early in their epidemics these reports are focused on the next 28 days. The downloadable model estimates additionally include data for some high-income countries later in their epidemics (e.g., the US and EU countries) and projections 90 days into the future.

Based on the model ICL publishes estimates of the following metrics:

• True infections (to-date and projected)
• Confirmed deaths (projected)
• Hospital and ICU demand (to-date and projected)
• Effective reproduction number, Rt (to-date and projected)
##### What data is the model based on?

The model is 𠇏it” to data on confirmed deaths 8 by using an estimated IFR to �k-calculate” how many infections would have been likely over the previous weeks to produce that number of deaths. It uses mobility data – from Google or, if unavailable, inferred from ACAPS government measures data – to modulate the Rt, the key parameter on how transmission is changing.

Additionally, the model uses age- and country-specific data on demographics, patterns of social contact, hospital availability, and the risk of hospitalization and death, though the availability of this data varies by country.

##### What are key assumptions and potential limitations?

The model uses an estimated IFR for each country calculated by applying age-specific IFRs observed in China and Europe (of about 0.6𠄱%) to that country’s age distribution. In countries like many LMICs with younger populations than in China and Europe, this results in IFR estimates of typically 0.2𠄰.3% because younger populations have lower associated mortality rates. These lower mortality rates, however, assume access to sufficient healthcare, which might not always be the case in LMICs. Differences between the estimated and true IFRs could impact the accuracy of model estimates.

The model assumes that the number of confirmed deaths is equal to the true number of deaths. But research on excess mortality and known limitations to testing and reporting capacity suggest that confirmed deaths are often fewer than true deaths. Where this is the case the model likely underestimates the true health burden.

The model assumes that the change in transmission over time is a function of average mobility trends for places like stores and workplaces but not parks and residential areas. 9 If these assumptions about mobility and transmission do not hold, the model might not accurately track the pandemic.

Like all models, this one makes many assumptions, and we cover only a few key ones here. For a full list see the model methods description.

Click to open interactive version

### Institute for Health Metrics and Evaluation (IHME)

##### Hybrid statistical/SEIR model (details as of 23 August 2020)

This chart shows the IHME model’s estimates of the true number of daily new infections in the United States. To see the estimates for other countries click 𠇌hange country.” The lines labeled “upper” and “lower” show the bounds of a 95% uncertainty interval. For comparison, the number of confirmed cases is also shown.

##### Regions covered

159 countries and territories across the world including subnational data for the US and several other countries

##### Time covered

The first date covered varies by country. The model makes projections that extend approximately 90� days past the latest date of update.

##### Update frequency

About once a week (though not all countries are updated each time)

##### What is the model?

The model is a hybrid with two main components: a statistical �th model” component produces death estimates that are used to fit an SEIR model component.

Note that the model has had two significant updates since its initial publication:

##### What is the model used for?

IHME describes its model as a tool to help government officials understand how different policy decisions could impact the course of the pandemic and to plan for changing healthcare demand.

The model makes deaths projections that have been highly publicized and sometimes criticized. 10 Though much of the criticism was leveled at a previous version of the model, known as 𠇌urveFit,” that was used before the SEIR component was added on 4 May. The projections are made under currently three scenarios. 11

Based on the model IHME publishes estimates of the following metrics:

• True infections (to-date and projected)
• Confirmed deaths (projected)
• Hospital, ICU, and ventilator demand (to-date and projected)
• Effective reproduction number, Rt (to-date and projected)
• Testing levels (projected)
• Mobility, as a proxy for social distancing (projected)
##### What data is the model based on?

The death model uses data on confirmed cases, confirmed deaths, 12 and testing. 13

The SEIR model is fit to the output of the death model by using an estimated IFR to back-calculate the true number of infections.

The model uses several other types of data to simulate transmission and disease progression: mobility, social distancing policies, population density, pneumonia seasonality and death rate, air pollution, altitude, smoking rates, and self-reported contacts and mask use. Details on the sources of these data can be found on the model FAQs and estimation updates pages.

##### What are key assumptions and potential limitations?

The model uses an estimated IFR based on data from the Diamond Princess cruise ship and New Zealand. Though IHME does not give numbers for these, the Diamond Princess IFR has been estimated at 0.6% (95% uncertainty interval of 0.2𠄱.3%). 14 Differences between the estimated and true IFRs could impact the accuracy of model estimates.

The death model makes several assumptions about the relationship between confirmed deaths, confirmed cases, and testing levels. For example, that a decreasing case fatality rate (CFR) – the ratio of confirmed deaths to confirmed cases 15 – is reflective of increasing testing and a shift toward testing mild or asymptomatic cases. But the CFR could also decrease for other reasons, such as improved treatment or a decline in the average age of infected people.

The model assumes that the change in transmission over time is a function of several data inputs (listed above), like mobility and population density. If these assumptions do not hold – for example, because the data is less relevant or its relationship with transmission is misspecified – the model might not accurately track the pandemic.

More details are discussed in the model FAQs and in different estimation update reports.

Click to open interactive version

### Youyang Gu (YYG)

##### Update: Youyang Gu announced that 5 October 2020 is the final model update

This chart shows the YYG model’s estimates of the true number of daily new infections in the United States. To see the estimates for other countries click 𠇌hange country.” The lines labeled “upper” and “lower” show the bounds of a 95% uncertainty interval. For comparison, the number of confirmed cases is also shown.

##### Regions covered

71 countries across the world including subnational data for the US and Canada

##### Time covered

The first date covered varies by country. The model makes projections that extend approximately 90 days past the latest date of update.

##### What is the model?

The model consists of an SEIR base with a machine learning layer on top to search for the parameters that minimize the error between the model estimates and the observed data.

##### What is the model used for?

Youyang describes his model as making projections of true infections and deaths that optimize for forecast accuracy. Though he also stresses that his projections cover a range of possible outcomes, and that projections are not “wrong” if they help shape a different outcome in the future.

Based on the model Youyang publishes estimates of the following metrics:

• True infections (to-date and projected)
• Confirmed deaths (projected)
• Effective reproduction number, Rt (to-date and projected)
• Tests per day targets (projected)

The model does not focus on projections under different scenarios, but has explored what would have happened if the US had mandated social distancing one week earlier or one week later, or if 20% of infected people immediately self-quarantined.

##### What data is the model based on?

The model is fit to data on confirmed deaths 16 by using an estimated IFR to back-calculate the true number of infections. Confirmed cases and hospitalization data are sometimes used to help set bounds for the machine learning parameter search.

##### What are key assumptions and potential limitations?

The model uses an estimated IFR for each region based initially on that region’s observed CFR. The IFR is then decreased 17 linearly over the span of three months until it is 30% of its initial value to reflect the lower average age of infections and improving treatments. Currently, the IFR is estimated to be 0.2𠄰.4% in most of the US and Europe. Differences between the estimated and true IFRs could impact the accuracy of model estimates.

The model assumes there will be unreported deaths for the 𠇏irst few weeks” of a region’s pandemic, and that this underreporting will decrease until the number of confirmed deaths equals true deaths. As noted before, this is often not the case, and thus the model might underestimate the true health burden.

The model makes assumptions about how reopening will affect social distancing and ultimately transmission. For example, if reopening causes a resurgence of infections, the model assumes regions will take action to reduce transmission, which is modeled by limiting the Rt. It also assumes a reopening date for regions (especially outside the US and Europe) where the true date is unknown.

The model was created and optimized for the US. Thus for other countries the model estimates might be less accurate.

For a full list of assumptions and limitations see the model �out” page.

Click to open interactive version

### London School of Hygiene & Tropical Medicine (LSHTM)

##### Statistical model estimating underreporting of infections (details as of 23 August 2020)

This chart shows the LSHTM model’s estimates of the true number of daily new infections in the United States. To see the estimates for other countries click 𠇌hange country.” The lines labeled “upper” and “lower” show the bounds of a 95% uncertainty interval. For comparison, the number of confirmed cases is also shown.

##### Regions covered

159 countries and territories across the world (those with at least 10 confirmed deaths out of a total of 210)

##### Time covered

The first date covered varies by country. The model does not make projections.

##### What is the model?

The model starts with a country’s CFR and adjusts it for the fact that there is a delay of roughly 2𠄳 weeks between case confirmation and death (or recovery). 18 This delay-adjusted CFR is then compared to a baseline, delay-adjusted CFR to estimate the 𠇊scertainment rate” – the proportion of all symptomatic infections that have actually been confirmed. 19

This estimated ascertainment rate is then used to adjust the number of confirmed cases 20 to estimate the true number of symptomatic infections. To finally estimate total infections, the symptomatic infections estimate is adjusted to include asymptomatic infections, which are estimated to compose between 10�% (median 50%) of total infections. 21

##### What is the model used for?

LSHTM describes its model as a tool to help understand the level of undetected epidemic progression and to aid response planning, such as when to introduce and relax control measures.

Based on the model LSHTM publishes estimates of the ascertainment rate.

##### What data is the model based on?

The model is based on data on confirmed deaths and confirmed cases. 22

##### What are key assumptions and potential limitations?

The model assumes a baseline, delay-adjusted CFR of 1.4% and that any difference between that and a country’s delay-adjusted CFR is entirely due to under-ascertainment. But many other factors likely play a role, such as the burden on the healthcare system, COVID-19 risk factors in the population, the ages of those infected, and more.

The assumed baseline CFR is based on data from China and does not account for different age distributions outside China. This causes the ascertainment rate to be overestimated in countries with younger populations and underestimated in countries with older populations. 23

The model assumes that the number of confirmed deaths is equal to the true number of deaths. As noted before, this is often not the case, and thus the model might underestimate the true health burden.

Reported deaths data is sometimes changed retroactively, which can be challenging for the model and might affect its estimates.

More assumptions and limitations are discussed in the full report.

Click to open interactive version

### How should we think about these models and their estimates?

All four models we looked at agree that true infections far outnumber confirmed cases, but they disagree by how much. We now have some insight into these differences: The models all differ to some degree in what they are used for, how they work, the data they are based on, and the assumptions they make.

Making these differences transparent helps us understand how we should think about these models and their estimates. For example, understanding that some models are used for scenario planning and not forecasting (like ICL’s) while others are optimized for forecast accuracy (like Youyang’s) puts their estimates in context. And the models all make different assumptions that each have limitations we can decide if those limitations are relevant to a given situation.

In the end, though, we still want to have confidence that models can track the pandemic accurately. We can calibrate our confidence in different models by giving their estimates a reality check.

One way to do this is to compare model estimates against some observed “ground truth” data. For example, if a model is forecasting the number of deaths four weeks from now, we can wait four weeks and compare the forecast to the deaths that actually occur. 24

But sometimes the ground truth is not easily observed, as is the case with the true number of infections. Here we have to look for converging evidence from other research, such as from seroprevalence studies that test for COVID-19 antibodies in the blood serum to estimate how many people have ever been infected. 25

By gaining a deeper, more nuanced understanding of these models and their strengths and weaknesses, we can use them as valuable tools to help make progress against the pandemic.

### Endnotes

The date this page was updated does not necessarily match the date the model estimates themselves were updated. For model estimate update dates see the individual model charts below. Check the model websites for the most recent updates.

Infected people might not get tested for several reasons, such as not having easy access to testing or not even knowing they are infected because they have no symptoms (though they are still able to transmit the virus). Such asymptomatic infections are estimated to be 10–70% of total infections. Source: CDC COVID-19 Pandemic Planning Scenarios.

There are many models in use besides these four, including other ones by the research groups we cover here. We chose these four models because they are prominent, have been used by policymakers, and have been updated regularly. We use them more for illustration than completeness.

Pronounced by saying each letter, “S-E-I-R.”

The London School model is not an SEIR model.

Also called “time-varying” reproduction number.

While projections are an important aspect of what this and some other models are used for, we do not cover them in this article.

As reported by the European Centre for Disease Prevention and Control (ECDC).

The model assumes that in parks “significant contact events are negligible” and that an “increase in residential movement will not change household contacts.”

For more details about the scenarios see the model FAQs.

Confirmed cases and deaths data as reported by Johns Hopkins University and several official sources.

As reported by the COVID Tracking Project (for US), official sources (Brazil and Dominican Republic), and Our World in Data (all other countries).

Russell et al (2020). Estimating the infection and case fatality ratio for coronavirus disease (COVID-19) using age-adjusted data from the outbreak on the Diamond Princess cruise ship. Eurosurveillance, 25(12). https://doi.org/10.2807/1560-7917.ES.2020.25.12.2000256

The CFR is similar to the IFR but uses the confirmed deaths and cases reported by countries. In contrast, the IFR uses true deaths and infections, which are generally not known and have to be estimated.

As reported by Johns Hopkins University. The data is smoothed before fitting.

Except in “later-impacted regions like Latin America, we wait an additional 3 months before beginning to decrease the IFR.”

The typical CFR calculation divides confirmed deaths by confirmed cases reported on the same day, but those deaths were actually caused by cases confirmed roughly 2–3 weeks before.

All but a trivial number of confirmed cases are assumed to be symptomatic.

This data is first smoothed.

In accordance with this methodology and in consultation with the LSHTM researchers, we perform these calculations to produce the estimates of total infections presented here.

Both as reported by the ECDC.

In a secondary analysis the LSHTM researchers do adjust the baseline CFR for different age distributions. But this has its own assumptions and limitations and is thus not clearly a better approach. More details can be found in the full report.

Though we still need to consider that such forecasts might not track what actually occurs if they help shape a different outcome in the future.

Some current efforts to score forecasts for accuracy are by Youyang Gu, IHME, The Zoltar Project, and Covid Compare.

The LSHTM researchers, for example, compared their model estimates to seroprevalence estimates and found good agreement. You can read more about this in their full report.

### Reuse our work freely

All visualizations, data, and code produced by Our World in Data are completely open access under the Creative Commons BY license. You have the permission to use, distribute, and reproduce these in any medium, provided the source and authors are credited.

The data produced by third parties and made available by Our World in Data is subject to the license terms from the original third-party authors. We will always indicate the original source of the data in our documentation, so you should always check the license of any such third-party data before use and redistribution.

## Using a different parameterization

The optim function allows you to read out the hessian

The hessian can be related to the variance of the parameters (In R, given an output from optim with a hessian matrix, how to calculate parameter confidence intervals using the hessian matrix?). But note that for this purpose you need the Hessian of the log likelihood which is not the same as the the RSS (it differs by a factor, see the code below).

Based on this you can see that the estimate of the sample variance of the parameters is very large (which means that your results/estimates are not very accurate). But also note that the error is a lot correlated. This means that you can change the parameters such that the outcome is not very correlated. Some example parameterization would be:

such that the old equations (note a scaling by 1/N is used):

which is especially appealing since you get this approximate \$I^prime = cI\$ for the beginning. This will make you see that you are basically estimating the first part which is approximately exponential growth. You will be able to very accurately determine the growth parameter, \$c = eta - gamma\$ . However, \$eta\$ and \$gamma\$ , or \$R_0\$ , can not be easily determined.

In the code below a simulation is made with the same value \$c=eta - gamma\$ but with different values for \$R_0 = eta / gamma\$ . You can see that the data is not capable to allow us differentiate which different scenario's (which different \$R_0\$ ) we are dealing with (and we would need more information, e.g. the locations of each infected individual and trying to see how the infection spread out).

It is interesting that several articles already pretend to have reasonable estimates of \$R_0\$ . For instance this preprint Novel coronavirus 2019-nCoV: early estimation of epidemiological parameters and epidemic predictions (https://doi.org/10.1101/2020.01.23.20018549) ## The SEIR Model¶

In the version of the SEIR model, all individuals in the population are assumed to be in a finite number of states.

The states are: susceptible (S), exposed (E), infected (I) and removed (R).

This type of compartmentalized model has many extensions (e.g. SEIRS relaxes lifetime immunity and allow transitions from \$ R o S \$).

• Those in state R have been infected and either recovered or died. Note that in some variations, R may refer only to recovered agents.
• Those who have recovered, and live, are assumed to have acquired immunity.
• Those in the exposed group are not yet infectious.

### Changes in the Infected State¶

Within the SEIR model, the flow across states follows the path \$ S o E o I o R \$.

We will ignore birth and non-covid death during our time horizon, and assume a large, constant, number of individuals of size \$ N \$ throughout.

With this, the symbols \$ S, E, I, R \$ are used for the total number of individuals in each state at each point in time, and \$ S(t) + E(t) + I(t) + R(t) = N \$ for all \$ t \$.

Since we have assumed that \$ N \$ is large, we can use a continuum approximation for the number of individuals in each state.

The transitions between those states are governed by the following rates

• \$ eta(t) \$ is called the transmission rate or effective contact rate (the rate at which individuals bump into others and expose them to the virus).
• \$ sigma \$ is called the infection rate (the rate at which those who are exposed become infected)
• \$ gamma \$ is called the recovery rate (the rate at which infected people recover or die)

The rate \$ eta(t) \$ is influenced by both the characteristics of the disease (e.g. the type and length of prolonged contact required for a transmission) and behavior of the individuals (e.g. social distancing, hygiene).

The SEIR model can then be written as

\$ egin frac & = - eta , S , frac frac & = eta , S , frac - sigma E frac & = sigma E - gamma I frac & = gamma I end ag <1>\$

Here, \$ dy/dt \$ represents the time derivative for the particular variable.

The first term of (1), \$ -eta , S , frac \$, is the flow of individuals moving from \$ S o E \$, and highlights the underlying dynamics of the epidemic

• Individuals in the susceptible state (S) have a rate \$ eta(t) \$ of prolonged contacts with other individuals where transmission would occur if either was infected
• Of these contacts, a fraction \$ frac\$ will be with infected agents (since we assumed that exposed individuals are not yet infectious)
• Finally, there are \$ S(t) \$ susceptible individuals.
• The sign indicates that the product of those terms is the outflow from the \$ S \$ state, and an inflow to the \$ E \$ state.

### Basic Reproduction Number¶

If \$ eta \$ was constant, then we could define \$ R_0 := eta / gamma \$. This is the famous basic reproduction number for the SEIR model. See [HSW05] for more details.

When the transmission rate is time-varying, we will follow notation in [FVJ20] and refer to \$ R_0(t) \$ as a time-varying version of the basic reproduction number.

Analyzing the system in (1) provides some intuition on the \$ R_0(t) := eta(t) / gamma \$ expression:

• Individual transitions from the infected to removed state occur at a Poisson rate \$ gamma \$, the expected time in the infected state is \$ 1/gamma \$
• Prolonged interactions occur at rate \$ eta \$, so a new individual entering the infected state will potentially transmit the virus to an average of \$ R_0 = eta imes 1 / gamma \$ others
• In more complicated models, see [HSW05] for a formal definition for arbitrary models, and an analysis on the role of \$ R_0 < 1 \$.

Note that the notation \$ R_0 \$ is standard in the epidemiology literature - though confusing, since \$ R_0 \$ is unrelated to \$ R \$, the symbol that represents the removed state. For the remainder of the lecture, we will avoid using \$ R \$ for removed state.

Prior to solving the model directly, we make a few changes to (1)

• Re-parameterize using \$ eta(t) = gamma R_0(t) \$
• Define the proportion of individuals in each state as \$ s := S/N \$ etc.
• Divide each equation in (1) by \$ N \$, and write the system of ODEs in terms of the proportions

\$ egin frac & = - gamma , R_0 , s , i frac & = gamma , R_0 , s , i - sigma e frac & = sigma e - gamma i frac & = gamma i end ag <2>\$

Since the states form a partition, we could reconstruct the “removed” fraction of the population as \$ r = 1 - s - e - i \$. However, keeping it in the system will make plotting more convenient.

### Implementation¶

We begin by implementing a simple version of this model with a constant \$ R_0 \$ and some baseline parameter values (which we discuss later).

First, define the system of equations

Given this system, we choose an initial condition and a timespan, and create a ODEProblem encapsulating the system.

With this, choose an ODE algorithm and solve the initial value problem. A good default algorithm for non-stiff ODEs of this sort might be Tsit5() , which is the Tsitouras 5/4 Runge-Kutta method).

We did not provide either a set of time steps or a dt time step size to the solve . Most accurate and high-performance ODE solvers appropriate for this problem use adaptive time-stepping, changing the step size based the degree of curvature in the derivatives.

Or, as an alternative visualization, the proportions in each state over time

While maintaining the core system of ODEs in \$ (s, e, i, r) \$, we will extend the basic model to enable some policy experiments and calculations of aggregate values.

### Extending the Model¶

First, we can consider some additional calculations such as the cumulative caseload (i.e., all those who have or have had the infection) as \$ c = i + r \$. Differentiating that expression and substituting from the time-derivatives of \$ i(t) \$ and \$ r(t) \$ yields \$ frac = sigma e \$.

We will assume that the transmission rate follows a process with a reversion to a value \$ ar_0(t) \$ which could conceivably be influenced by policy. The intuition is that even if the targeted \$ ar_0(t) \$ was changed through social distancing/etc., lags in behavior and implementation would smooth out the transition, where \$ eta \$ governs the speed of \$ R_0(t) \$ moves towards \$ ar_0(t) \$.

Finally, let \$ delta \$ be the mortality rate, which we will leave constant. The cumulative deaths can be integrated through the flow \$ gamma i \$ entering the “Removed” state.

Define the cumulative number of deaths as \$ D(t) \$ with the proportion \$ d(t) := D(t)/N \$.

While we could integrate the deaths given the solution to the model ex-post, it is more convenient to use the integrator built into the ODE solver. That is, we add \$ frac

d(t) \$ rather than calculating \$ d(t) = int_0^t delta gamma, i( au) d au \$ ex-post.

This is a common trick when solving systems of ODEs. While equivalent in principle to using the appropriate quadrature scheme, this becomes especially convenient when adaptive time-stepping algorithms are used to solve the ODEs (i.e. there is not a regular time grid). Note that when doing so, \$ d(0) = int_0^0 delta gamma i( au) d au = 0 \$ is the initial condition.

The system (2) and the supplemental equations can be written in vector form \$ x := [s, e, i, r, R₀, c, d] \$ with parameter tuple \$ p := (sigma, gamma, eta, delta, ar_0(cdot)) \$

Note that in those parameters, the targeted reproduction number, \$ ar_0(t) \$, is an exogenous function.

The model is then \$ frac = F(x,t) \$ where,

\$ F(x,t) := egin -gamma , R_0 , s , i gamma , R_0 , s , i - sigma e sigma , e - gamma i gamma i eta (ar_0(t) - R_0) sigma e delta , gamma , i end ag <5>\$

Note that if \$ ar_0(t) \$ is time-invariant, then \$ F(x, t) \$ is time-invariant as well.

### Parameters¶

The parameters, \$ sigma, delta, \$ and \$ gamma \$ should be thought of as parameters determined from biology and medical technology, and independent of social interactions.

As in Atkeson’s note, we set

• \$ sigma = 1/5.2 \$ to reflect an average incubation period of 5.2 days.
• \$ gamma = 1/18 \$ to match an average illness duration of 18 days.
• \$ ar_0(t) = R_0(0) = 1.6 \$ to match a basic reproduction number of 1.6, and initially time-invariant
• \$ delta = 0.01 \$ for a one-percent mortality rate

As we will initially consider the case where \$ R_0(0) = ar_0(0) \$, the parameter \$ eta \$ will not influence the first experiment.

## SIR Epidemic model for influenza A (H1N1): Modeling the outbreak of the pandemic in Kolkata, West Bengal, India, 2010

In this report, the spread of the pandemic influenza A (H1N1) that had an outbreak in Kolkata, West Bengal, India, 2010 is going to be simulated. The basic epidemic SIR model will be used, it describes three populations: a susceptible population, an infected population, and a recovered population and assumes the total population (sum of these 3 populations) as fixed over the period of study.

There are two parameters for this model: namely the attack rate (β) per infected person per day through contacts and the recovery rate (α). Initially there will be a small number of infected persons in the population. Now the following questions are to be answered with the simulation / analysis:

1. Whether the number of infected persons increase substantially, producing an epidemic, or the flue will fizzle out.
2. Assuming there is an epidemic, how will it end? Will there still be any susceptibles left when it is over?
3. How long will the epidemic last?

The Euler’s method will be primarily used to solve the system of differential equations for the SIR model and compute the equilibrium points (along with some
analytic solution attempts for a few simplified special cases). Here are the
conclusions obtained from the simulations:

1. When the recovery rate α is ≈ 0 or very very low compared to the attack rate β, the flu will turn out to be an epidemic and the entire population will be infected first (the higher β is the quicker the epidemic outbreak).

2. To be more precise, when the initial susceptible population S(0) is greater than the inverse of the basic reproduction number 1/ R0 = α / β, a proper epidemic will break out.

3. When the initial susceptible population S(0) is less than the inverse of the basic reproduction number 1/R0=α/β, then a proper epidemic will never break out.

4. If the initial susceptible population is non-zero, in the end (at equilibrium) there will always be some susceptible population.

5. When there is an epidemic, it will eventually end in the equilibrium point with 0 infected population, how fast it reaches the equilibrium depends upon the recovery rate (the higher α is the quicker the infection removal).

6. The time to reach the equilibrium can be computed using Euler’s method, it depends on the parameters α (the higher the quicker) and β (the higher the quicker) and the initial infected populated size I(0) (the higher the quicker). In 2010, the pandemic influenza A (H1N1) had an outbreak in Kolkata, West Bengal, India. An increased number of cases with influenza like illness (ILI) were reported in Greater Kolkata Metropolitan Area (GKMA) during July and August 2010, as stated in . The main motivation for this research project will be to understand the spread of the pandemic, compute the equilibrium points and find the impact of the initial values of the infected rate and the attack / recovery rate parameters on the spread of the epidemic, using simulations using the basic epidemic SIR model.

The Euler’s method will be primarily used to solve the system of differential equations
for SIR model and compute the equilibrium points. First a few simplified special cases will be considered and both analytic and numerical methods (with Euler method) will be used to compute the equilibrium points. Then the solution for the generic model will be found. As described in , the SIR model can also be effectively used (in a broader
context) to model the propagation of computer virus in computer networks, particularly for the networks with Erdos-Renyi type random graph topology.

### SIR Epidemic Model

The SIR model is an epidemiological model that computes the theoretical number of people infected with a contagious illness in a closed population over time. One of the basic one strain SIR models is Kermack-McKendrick Model. The Kermack-McKendrick Model is used to explain the rapid rise and fall in the number of infective patients observed in epidemics. It assumes that the population size is fixed (i.e., no births, no deaths due to disease nor by natural causes), incubation period of the infectious agent
is instantaneous, and duration of infectivity is the same as the length of the disease. It also assumes a completely homogeneous population with no age, spatial, or social structure.

The following figure 2.1 shows an electron microscope image of the re-assorted H1N1influenza virus photographed at the CDC Influenza Laboratory. The viruses are 80 − 120 nm in diameter . 2.1 Basic Mathematical Model

The starting model for an epidemic is the so-called SIR model, where S stands for susceptible population, the people that can be infected. I is the already infected population, the people that are contagious, and R stands for the recovered population, people who are not contagious any more.

2.1.1 Differential Equations

The SIR model can be defined using the following ordinary differential equations 2.1: • The terms dS/dt , dI/dt , dR/dt in the differential equations indicate the rate of change of the susceptible population size, the infected population size and the recovered population size, respectively.

• The terms β and α indicate the attack rate (number of susceptible persons get infected per day) and the recovery rate of the flu (inverse of the number of days a person remains infected), respectively.

• High value of α means a person will be infected by the flu for less number of days and high value of β means that the epidemic will spread quickly.

• Also, as can be seen from below, from the differential equations it can be shown that the population (S + I + R) is assumed to be constant. 2.1.2 Collected data, units and values for the constants

• As can be seen from the following figure 2.2, the focus of this analysis will be limited to the population in Kolkata Metropolitan Corporation (KMC, XII) area where the population can be assumed to be ≈ 4.5 million or 4500 thousands, as per .

Units

– All population (S, I, R) units will be in thousands persons (so that total population
N = 4500).
– As can be derived from the differential equations 2.1, the unit of β will be in
10^(−6) /persons/day (β = 25 will mean 25 persons in a million gets infected by
susceptible-infected contact per infected person per day).
– Similarly, the units of α will be in 10^(−3) / day (α = 167 will mean 167 × 10^(−3) /
day gets recovered from the flu per day).

• The attack rate is 20-29/100000 and the number of days infected (i.e. the inverse of recovery rate) = 5−7 days on average (with a few exceptions), as per .

• Typical values for β and α can be assumed to be 25 /person / day and 10^3/6 ≈ 167 / day, respectively. 2.2 Simplified Model 1 (with α = 0)

• At first a simplified model is is created assuming that α = 0 (/ day) and that R = 0, so once infected, a person stays contagious for ever. Because S(t) + I(t) + R(t) = S(t) + I(t) = N is constant (since population size N is fixed), S(t) can be eliminated and a single differential equation in just I(t) is obtained as shown in the equation below 2.2. • Also, let the (fixed) population size N = 4500 = S(0) + I(0), (in thousand persons), initially the number of persons infected = I(0) = 1 (in thousand persons) and number of persons susceptible S(0) = N −I(0) = 4499 (in thousand persons), respectively. Let β = 25 × 10^(−6) /persons / day) to start with.

2.2.1 Analytic Solution

• The analytic solution can be found by following the steps shown in the Appendix A and the final solution is shown in the below equations 2.3: • The following figure 2.3 shows the logistic (bounded) growth in I(t) (in thousands persons) w.r.t. the time (in days) for different values of attack rate β×10^(−6) (/ person / day). As expected, the higher the attack rate, the quicker all the persons in the population become infected. 2.2.2 Finding the equilibrium points for I

• The equilibrium points are the points where the rate of change in I is zero, the points that satisfy the following equation biol-2022/8175/image_5A02OYot2iOV6hgJA.png?w=150&h=86 150w" sizes="(max-width: 236px) 100vw, 236px" />

• Considering a small neighborhood of the equilibrium point at I = 0, it can be seen from the figure 2.4 that whenever I > 0, dI/dt > 0, so I increases and goes away from the equilibrium point.

• Hence, the equilibrium point at I = 0 is unstable.

• At I = N = 4500 (in thousand persons) it is a stable equilibrium. As can be seen from the following figure 2.4, in a small neighborhood of the equilibrium point at I = 4500, it always increases / decreases towards the equilibrium point.

• In a small ε > 0 neighborhood at I = 4500 (in thousand persons),

1. dI/dt > 0, so I increases when I <= 4500 − ε .
2. dI/dt > 0, so I decreases when I >= 4500 + ε .

• The same can be observed from the direction fields from the figure 2.5.

• Hence, the equilibrium at I = 4500 is stable.  2.2.3 Numerical Solution with the Euler’s Method

• The algorithm (taken from the course slides) shown in the following figure 2.6 will be used for numerical computation of the (equilibrium) solution using the Euler’s method. • As can be seen from the figure 2.6, then the infection at the next timestep can be (linearly) approximated (iteratively) by the summation of the the infection current timestep with the product of the difference in timestep and the derivative of the infection evaluated at the current timestep.

2.2.3.1 Finding the right step size (with β = 25 × 10^(−6)/person/day)

• In order to decide the best step size for the Euler method, first the Euler’s method is run with different step sizes as shown in the figure 2.7.

• As can be seen from the following table 2.1 and the figure 2.7, the largest differences in the value of I (with two consecutive step sizes) occurs around 78 days:  • As can be seen from the table in the Appendix B, the first time when the error becomes less than 1 person (in thousands) is with the step size 1/512 , hence this step size will be used for the Euler method.

2.2.3.2 Computing the (stable) equilibrium point

• Now, this timestep will be used to solve the problem to find the equilibrium time teq(in days). Find teq such that N − I(teq) < ε = 10^(−6) , the solution obtained is teq = 272.33398 days ≈ 273 days.

• Now, from the analytic solution 2.3 and the following figure 2.8, it can be verified that the teq solution that the Euler’s method obtained is pretty accurate (to the ε tolerance). 2.2.3.3 Results with β = 29 × 10^(−6) / person / day, I(0) = 1 person

• Following the same iterations as above, the steepest error is obtained at t = 67 days in this case, as shown in the figure 2.9.

• The first time when error becomes less than one person for t = 67 days with the Euler ‘s method is with step size 1/512 again.

• The solution obtained is teq = 234.76953125 days ≈ 235 days, so the equilibrium (when all the population becomes infected) is obtained earlier as expected, since the attack rate β is higher. 2.2.3.4 Results with β = 25 × 10−6 / person / day, with different initial values for infected persons (I(0))

• Following the same iterations as above, the equilibrium point is computed using the Euler’s method with different values of initial infected population I(0), as shown in the figure 2.10.

• The solutions obtained are teq = 272.33, 258.02, 251.85, 248.23, 245.66, 245.66 days for I(0) = 1, 5, 10, 15, 20 days, respectively. So the equilibrium is obtained earlier when the initial infected population size is higher, as expected. 2.3 Simplified Model 2 (with β = 0)

• Next, yet another simplified model is considered by assuming that β = 0 and that α > 0, so the flu can no more infect anyone (susceptible, if any, possibly because everyone got infected), an infected person recovers from flu with rate α. This situation can be described again with a single differential equation in just I(t) as shown in the equation below 2.4. biol-2022/8175/image_v25E7ScR9do2T4xDhL.png?w=150&h=30 150w" sizes="(max-width: 351px) 100vw, 351px" />

• Also, let the the entire population be infected, N = 4500 = I(0), (in thousand persons), initially the number of persons susceptible = S(0) = 0, respectively. Let α = 167 × 10^(−3)

2.3.1 Analytic Solution

• The analytic solution can be found by following the steps shown in the below equations 2.5: • The following figure 2.11 shows the exponential decay in I(t) (in thousand persons) w.r.t. the time (in days) for different values of recovery rate α × 10^(−3) (/ day). As expected, the higher the recovery rate, the quicker all the persons in the population get rid of the infection.  • Now, I(t) + R(t) = N (since S(t) = 0 forever, since no more infection) and I(0) = N, combining with the above analytic solution I(t) = I(0).exp(−αt) = N.exp(−αt), the following equation is obtained:

• The following figure 2.12 shows the growth in R(t) (in thousand persons) w.r.t. the time (in days) for different values of recovery rate α × 10^(−3) (/ day). As expected, the higher the recovery rate, the quicker all the persons in the population move to the removed state. 2.3.2 Numerical Solution with the Euler’s Method

2.3.2.1 Solution with α = 167 × 10−3 / day

• Following the same iterations as above, the steepest error is obtained at t = 6 in this case, as shown in the figure 2.16.

• The first time when error becomes less than one person for t = 67 with the Euler’s method is with step size 1/256 .

• The solution obtained with the Euler’s method is 133.076171875 days ≈ 133 days to remove the infection from population with 10^(−6) tolerance. From the analytic solution,
I(133) = N.exp(−αt) = 1.016478E−06, similar result is obtained.

2.3.2.2 Results

The following figure 2.16 shows the solutions obtained with different step sizes using the Euler’s method. 2.4 Generic Model (with α, β > 0)

First, the numeric solution will be attempted for the generic model (using the Euler’s method) and then some analytic insights will be derived for the generic model.

2.4.1 Numerical Solution with the Euler’s Method

• The following algorithm 2.14 shown in the next figure is going to be used to obtain the solution using Euler method (the basic program for Euler’s method, adapted to include three dependent variables and three differential equations). • As can be seen from the figure 2.14, first the vector X(0) is formed by combining the three variables S, I, R at timestep 0. Then value of the vector at the next timestep can be (linearly) approximated (iteratively) by the (vector) summation of the vector value at the current timestep with the product of the difference in timestep and the derivative of the
vector evaluated at the current timestep.

2.4.1.1 Equilibrium points

• At the equilibrium point, There will be no infected person at the equilibrium point (infection should get removed).

• As can be seen from the following figure 2.15 also, I = 0 is an equilibrium point, which is quite expected, since in the equilibrium all the infected population will move to the removed state.

• Also, at every point the invariant S + I + R = N holds.

• In this particular case shown in figure 2.15, the susceptible population S also becomes 0 at equilibrium (since all the population got infected initially, all of them need to move to removed state) and R = N = 4500 (in thousand persons). 2.4.1.2 Results with Euler method

• As explained in the previous sections, the same iterative method is to find the right stepsize for the Euler method. The minimum of the two stepsizes determined is
∆t = 1/512 day and again this stepsize is going to be used for the Euler’s method.

• The following figures show the solutions obtained with different values of α, β with the initial infected population size I(0) = 1 (in thousand persons). Higher values for the parameter β obtained from the literature are used for simulation, since β = 25 × 10^(−6) /person /day is too small (with the results not interesting) for the growth of the epidemic using the Euler’s method (at least till ∆t = 1/ 2^15), after which the iterative Euler’s
method becomes very slow).

• As can be seen, from the figures 2.16, 2.17 and 2.19, at equilibrium, I becomes zero.

• The solution (number of days to reach equilibrium) obtained at α = 167×10^(−3) /day and β = 25×10^(−5) /person /day is teq = 143.35546875 ≈ 144 days with I(0) = 1 (in thousand persons), the corresponding figure is figure 2.16.

• The solution (number of days to reach equilibrium) obtained at α = 167 × 10^(−3) /day and β = 5 × 10^(−5) /person /day is teq ≈ 542 days with I(0) = 1 (in thousand persons), the corresponding figure is figure 2.17.

• Hence, higher the β value, the equilibrium is reached much earlier.

• The solution obtained at α = 500 × 10^(−3) /day and β = 25 × 10^(−5) /person /day is
teq ≈ 78 days with I(0) = 1 (in thousand persons), the corresponding figure is figure 2.19.

• Hence, higher the α value, the equilibrium is reached earlier.

• The solution obtained at α = 167×10^(−3) /day and β = 25×10^(−5) /person /day is
teq = 140 days with I(0) = 10. Hence, as expected, higher the number of initial infected population size, quicker the equilibrium is reached.

• At equilibrium, S does not necessarily become close to zero, since sometimes the entire population may not get infected ever, as shown in the figure 2.17, where at equilibrium the susceptible population is non-zero.    • As can be seen from the phase planes from following figure 2.21, at equilibrium, the infected population becomes 0. 2.4.2 Analytic Solution and Insights

2.4.2.1 Basic Reproduction Number (R0)

The basic reproduction number (also called basic reproduction ratio) is defined by
R0 = β / α (unit is /day). As explained in , this ratio is derived as the expected number of new infections (these new infections are sometimes called secondary infections) from a single infection in a population where all subjects are susceptible. How the dynamics of the system depends on R0 will be discussed next.

2.4.2.2 The dynamics of the system as a function of R0

• By dividing the first equation by the third in 2.1, as done in , the following equation is obtained: • Now, at t → ∞, the equilibrium must have been already reached and all infections must have been removed, so that lim (t→∞) I(t) = 0.

• Then from the above equation 2.7, R∞ = N − S(0).exp(R0.(R∞−R(0)))
.
• As explained in , the above equation shows that at the end of an epidemic, unless
S(0) = 0, not all individuals of the population have recovered, so some must remain susceptible.

• This means that the end of an epidemic is caused by the decline in the number of infected individuals rather than an absolute lack of susceptible subjects .

• The role of the basic reproduction number is extremely important, as explained in . From the differential equation, the following equation can be obtained: S(t) > 1/R0dI(t)/dt > 0 ⇒ there will be a proper epidemic outbreak with an increase of the number of the infectious (which can reach a considerable fraction of the population).

S(t) < 1 R0 ⇒ dI(t) dt < 0 ⇒ independently from the initial size of the susceptible population the disease can never cause a proper epidemic outbreak.

• As can be seen from the following figures 2.21 and 2.22 (from the simulation results obtained with Euler method), when S(0) > 1/R0 , there is a peak in the infection curve, indicating a proper epidemic outbreak.

• Also, from the figures 2.21 and 2.22, when S(0) > 1/R0 , the higher the the gap between S(0) and 1/R0 , the higher the peak is (the more people get infected) and the quicker the peak is attained.

• Again, from the figure 2.22, when 4490 = S(0) < 1/R0 = 5000, it never causes a proper epidemic outbreak . biol-2022/8175/image_dx5xCoM76ghoZ2XN.png?w=197&h. 197w" sizes="(max-width: 619px) 100vw, 619px" /> • Again, by dividing the second equation by the first in 2.1, the following equation is obtained:  • As can be noticed from the above figure 2.23 that because the formulas differ only by an additive constant, these curves are all vertical translations of each other.

• The line I(t) = 0 consists of equilibrium points.

• Starting out at a point on one of these curves with I(t) > 0, as time goes on one needs to travel along the curve to the left (because dS/dt < 0), eventually approaching at some positive value of S(t).

• This must happen since on any of these curves, as I(t) → ∞, as S(t) → 0, from equation 2.8.

• So the answer to question (2) is that the epidemic will end as with approaching some positive value and thus there must always be some susceptibles left over.

• As can be seen from the following figure 2.24 (from the simulation results obtained with the Euler’s method), when S(0) > 1/R0 , lesser the the gap between S(0) and 1/R0 , the higher the population remains susceptible at equilibrium (or at t → ∞). Conclusions

In this report, the spread of the pandemic influenza A (H1N1) that had an outbreak in Kolkata, West Bengal, India, 2010 was simulated using the basic epidemic SIR model.Initially there will be a small number of infected persons in the population, most of the population had susceptible persons (still not infected but prone to be infected) and zero removed persons. Given the initial values of the variables and the parameter (attack and recovery rates of the flu) values, the following questions were attempted to be answered with the simulation / analysis:

1. Whether the number of infected persons increase substantially, producing an epidemic, or the flue will fizzle out.

2. Assuming there is an epidemic, how will it end? Will there still be any susceptibles left when it is over?

3. How long will the epidemic last?
The following conclusions are obtained after running the simulations with
different values of the parameters and the initial values of the variables:

1. When the recovery rate α is ≈ 0 or very very low compared to the attack rate β (so that R0 = β / α >> 1) and I(0) > 1, the flu will turn out to be an epidemic and the entire population will be infected first (the higher β is the quicker the epidemic break out).

2. To be more precise, when the initial susceptible population S(0) is greater than the inverse of the basic reproduction number 1/R0 = α / β, a proper epidemic will break out.

3. When the initial susceptible population S(0) is less than the inverse of the basic reproduction number 1/R0 = α/β, then a proper epidemic will never break out.

4. If the initial susceptible population is non-zero, in the end (at equilibrium) there will always be some susceptible population.

5. When there is an epidemic, it will eventually end in the equilibrium point with 0 infected population, how fast it reaches the equilibrium depends upon the recovery rate (the higher α is the quicker the infection removal).

6. The time to reach the equilibrium can be computed using Euler method, it depends on the parameters α (the higher the quicker) and β (the higher the quicker) and the initial infected populated size I(0) (the higher the quicker).

7. Scope of improvement: The SIR model could be extended to The Classic Endemic Model  where the birth and the death rates are also considered for the population (this will be particularly useful when a disease takes a long time to reach the equilibrium state). biol-2022/8175/image_ihrd507z8UFHhYc.png?w=137 137w, biol-2022/8175/image_ihrd507z8UFHhYc.png?w=274 274w" sizes="(max-width: 669px) 100vw, 669px" />