This document provides a brief overview of different types of simulation models, going beyond the ones that we focus on in this course.

- Be familiar with different types of simulation models.

As you learned previously, computational/mathematical models come in many different variations. We already discussed the major classification into phenomenological (statistical) and mechanistic simulation models. Within the category of simulation models, there are again different types. The simple models we have explored so far, and that we will keep focusing on in this course, are all compartmental. Some are deterministic, some stochastic. The following sections provide a brief description of different model types and formulations to place things into context.

There are many ways mechanistic simulation models can be formulated and implemented. Here are some ways of characterizing them:

- *Compartmental* or *Agent-based*
- *Discrete-time* or *Continuous-time*
- *Deterministic* or *Stochastic*
- *Non-spatial* or *Spatial*
- *Memory-less (Markov)* or *With Memory*
- *Small* or *Big*
- *Data-free* or *With Data*

The most common model type is one using ordinary differential equations (ODEs). Such models are usually compartmental, continuous time, deterministic, space-less, memory-less, and small(ish).

We’ll go through each of those categories to briefly illustrate the kinds of models that belong to one or the other. This is meant only as a quick survey, so you are familiar with other models. No details will be provided.

**Compartmental models** are models in which the units
we want to track (variables, e.g. bacteria and immune response) are
treated as homogeneous groups (compartments), and one only tracks
population numbers/sizes. This is the simplest type of model. Sometimes
one can perform analytic computations, and such models are often easy to
implement and run on a computer. Also, most available data comes in the form of total
numbers (e.g. virus load, levels of a specific cytokine), thus
compartmental models are natural choices for fitting to data. Because of
their simplicity, compartmental models are often the best starting
point. However, the assumption that each unit (e.g. each cell) is the
same (homogeneous) and interacts with other entities of the system in a
*well-mixed* manner (like gas molecules bumping into each other
randomly in a container) is clearly wrong. It is often a reasonable
approximation, and thus compartmental models are still the most widely
used. However, for certain questions or systems, this approximation
might not be appropriate and thus other model types are needed.
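To make the idea of tracking only total numbers concrete, here is a minimal Python sketch of a two-compartment model for bacteria and an immune response. The equations, the parameter values, and the function name `simulate_bacteria_immune` are assumptions chosen purely for illustration; they are not taken from the course models. A simple forward-Euler scheme stands in for a proper ODE solver.

```python
def simulate_bacteria_immune(B0=100.0, I0=1.0, g=1.0, Bmax=1e6,
                             k=1e-4, r=1e-5, d=0.5, dt=0.001, tmax=20.0):
    """Forward-Euler integration of a hypothetical two-compartment model.

    dB/dt = g*B*(1 - B/Bmax) - k*B*I   (bacteria: growth minus killing)
    dI/dt = r*B*I - d*I                (immune response: activation minus decay)
    """
    n_steps = int(round(tmax / dt))
    B, I = B0, I0
    times, bacteria, immune = [0.0], [B], [I]
    for step in range(1, n_steps + 1):
        dB = g * B * (1.0 - B / Bmax) - k * B * I
        dI = r * B * I - d * I
        # Only the two totals are updated; no individual bacterium is represented.
        B = max(B + dt * dB, 0.0)
        I = max(I + dt * dI, 0.0)
        times.append(step * dt)
        bacteria.append(B)
        immune.append(I)
    return times, bacteria, immune


times, bacteria, immune = simulate_bacteria_immune()
print(f"peak bacteria: {max(bacteria):.3g}, peak immune response: {max(immune):.3g}")
```

The whole state of the system is two numbers, which is why such models are quick to run and why they implicitly assume that all bacteria are identical and well mixed.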

**Agent/Individual based models (ABM/IBM)** track every
unit/agent/individual instead of just total numbers. Thus, instead of
just keeping track of total virus load, every virion is individually
modeled and its actions tracked. Such models allow (almost) no
mathematical analysis and tend to be harder to implement and run on a
computer. Often, IBM tend to include a lot of details of the system
under study. This means the model has many parameters and is “data
hungry”, i.e. one needs to know values for those parameters to be able
to run the model. Because each individual unit needs to be
tracked, such models take a long time to run on a computer. Further, most data
does not come in the form of individual level data, thus fitting such
models to data is more difficult. A good feature of IBM is that while
they tend to be difficult to build and run, they are often conceptually
easy to understand. One simply specifies all processes that one wants to
include and implements them in the form of computer code. Because IBM
are more detailed, they can potentially be the most realistic models. For
systems where we have a lot of information and we want detailed
predictions, IBM might be good options.

In general, it is useful to start with a simple model if the
system/question is new, even if the eventual goal is to move to an IBM.
While one can write IBM in `R`, it is not that well suited
due to speed limitations. Dedicated software exists for IBM, or modelers
often use general purpose languages such as C.
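The statement that an IBM is conceptually just a set of rules applied to every individual can be sketched in a few lines of Python. The example below is a hedged illustration, not a realistic model: the class `InfectedCell`, the function `run_abm`, the single rule (each infected cell dies at a constant rate), and all parameter values are assumptions made up for this sketch.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class InfectedCell:
    age: float = 0.0    # time since infection, tracked per individual
    alive: bool = True


def run_abm(n_cells=200, death_rate=0.5, dt=0.1, tmax=10.0, seed=1):
    """Track every infected cell individually and return the number alive over time."""
    rng = random.Random(seed)
    cells = [InfectedCell() for _ in range(n_cells)]
    # Probability that a cell dies during one time step of length dt.
    p_death = 1.0 - math.exp(-death_rate * dt)
    counts = [n_cells]
    for _ in range(int(round(tmax / dt))):
        for cell in cells:              # loop over every agent at every step
            if cell.alive:
                cell.age += dt
                if rng.random() < p_death:
                    cell.alive = False
        counts.append(sum(cell.alive for cell in cells))
    return counts


counts = run_abm()
print(f"cells alive at start: {counts[0]}, at end: {counts[-1]}")
```

The inner loop over all agents at every time step is what makes such models slow once the number of individuals becomes large.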

**Discrete time models** are models where the variables
are updated in discrete time-steps. Such models are good for systems
where there is a “natural” time step. For instance in malaria
infections, the parasite enters a red blood cell (RBC) and replicates
before bursting within a fairly determined time-window. One could model
such a system by setting the model time-step to the duration of the
replication phase inside the RBC. However, if one tracks the total
population, most processes are better approximated by a continuous
process. For instance in the malaria example, while each parasite spends
a discrete and similar amount of time inside an RBC, soon after the
infection the entering and exiting of RBC is out of sync and thus on a
population level, these processes occur continuously. Thus, for
biological reasons, a discrete time model is rarely needed (but often an
acceptable approximation). A reason why discrete time models are often
used is the fact that for complex models, such as ABM/IBM, one has to
update the system at discrete times for computational reasons. Note that
as previously discussed, if the time-step becomes small, a discrete-time
model approaches a continuous-time model. The exact meaning of “small”
depends on the system and can be numerically explored.
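The numerical exploration mentioned above can be sketched as follows. This minimal Python example assumes simple exponential growth (an assumption chosen because the continuous-time solution is known exactly) and compares a discrete-time update with time step `dt` against that exact solution. The function name `discrete_growth` and the parameter values are hypothetical.

```python
import math


def discrete_growth(x0, g, dt, tmax):
    """Discrete-time model: x(t + dt) = x(t) + dt * g * x(t)."""
    x = x0
    for _ in range(int(round(tmax / dt))):
        x += dt * g * x
    return x


x0, g, tmax = 10.0, 0.5, 4.0
exact = x0 * math.exp(g * tmax)   # solution of the continuous-time model dx/dt = g*x

for dt in (1.0, 0.1, 0.01):
    approx = discrete_growth(x0, g, dt, tmax)
    print(f"dt = {dt:<5} discrete result = {approx:.3f}  absolute error = {abs(approx - exact):.3f}")
```

As the time step shrinks, the discrete-time result moves closer to the continuous-time one. Repeating such a comparison for a model of interest is one way to decide what counts as a small enough step.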

**Continuous time models** are models that assume that
each process occurs continuously. This is generally the most parsimonious
approximation for compartmental models that track a large number of
individual units. Each unit (e.g. each cell or virus) undergoes certain
processes (birth, death, infection, etc.) at discrete times, but
for the population as a whole, these processes occur in an ongoing,
continuous manner. Continuous-time models are usually described by
differential equations. Ordinary differential equation (ODE) models are
the most common and simplest type. Stochastic compartmental models can
also be continuous-time, as described more below. The ODE models you
were introduced to in the previous section are all continuous time
models.

**Deterministic models** always produce the same
outcomes once parameters and initial conditions have been specified, no
matter how often you run the model (or what software you use to do so).
Such models are simple and easy to implement. All ODE models fall into
this category. They run quickly on a computer, are fairly easy to
analyze, and one can sometimes perform analytic calculations without the
need for simulations, at least for simple models. The drawback is that
real biological systems are never deterministic, though sometimes
approximately so. In general, when large numbers of entities (cells,
virus, etc.) are involved, the deterministic approximation tends to be
reasonable. If numbers are low (e.g. at the start or end of an
infection), stochastic processes might be important. All ODE models,
including the models you have encountered so far, are deterministic.

**Stochastic models** have inherent
noise/randomness/stochasticity built in. This means that even for the
same parameter values and starting conditions, you might get different
results. For instance in one simulation, you might get an infection, in
another the infection might not take off. That is due to randomness,
which is implemented in computer models by using random numbers to
decide how the model should react. Stochastic models are closer to the
real world, and stochastic effects are especially important if numbers
are low. For instance if you have 100 bacteria, it doesn’t matter if one
of them first divides, then dies, or the other way around. However, if
you have a single cell, the order in which things happen matters, since
if the cell dies, there is no possibility for later division.

In general, any scientific question of the form “what is the probability for X” requires some amount of stochasticity. Deterministic models only produce a single result, so they can’t help answer any probabilistic questions.
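As a sketch of such a probabilistic question, the Python example below estimates the probability that an infection started by a single cell dies out. It assumes a simple birth-death process in which each cell divides at rate `birth` and dies at rate `death`; the function names, the parameter values, and the `takeoff` threshold used to declare that the infection has taken off are all assumptions made for illustration. Only the order of events is simulated, because waiting times do not affect whether extinction happens.

```python
import random


def goes_extinct(rng, birth=1.0, death=0.5, n0=1, takeoff=100):
    """Simulate one run; return True if the population hits zero before `takeoff`."""
    n = n0
    p_birth = birth / (birth + death)   # chance that the next event is a division
    while 0 < n < takeoff:
        n += 1 if rng.random() < p_birth else -1
    return n == 0


def extinction_probability(n_runs=4000, seed=42, **kwargs):
    """Fraction of repeated stochastic runs that end in extinction."""
    rng = random.Random(seed)
    return sum(goes_extinct(rng, **kwargs) for _ in range(n_runs)) / n_runs


print(f"estimated extinction probability: {extinction_probability():.3f}")
```

For this particular process, theory gives an extinction probability of `death / birth` when starting from one cell and `birth > death`, so the estimate should be close to 0.5 for the values used here. The corresponding deterministic model would predict growth every time.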

The drawback of stochastic models is that they take longer to run on
a computer, since one now needs to run a model more than once to get a
distribution of outcomes, instead of just a single time for
deterministic models. That also makes model results a little bit harder
to analyze. Another drawback of stochastic models is that they are more
difficult to fit to data. Modern software packages (e.g. the
`pomp` package in `R`) make things somewhat
easier, but it is still technically and computationally more challenging
to fit stochastic models to data.

Note that sometimes, you might want to start with a deterministic model, just to quickly have one up and running, and then also build a stochastic one. It is important to keep in mind that for the same model with the same model settings, the average (mean or median) of the stochastic model results does not necessarily agree with the deterministic model. Agreement of the mean with the deterministic result is only guaranteed for linear models, and the simulation models we usually use are nonlinear.

Most IBM/ABM are stochastic models. This is not a requirement, but since ABM/IBM generally try to be as realistic as possible, and are rarely used for fitting data, it makes sense to include the additional realism that comes from stochasticity in the model. Deterministic ABM/IBM are certainly possible, but they tend to be rare.

**Non-spatial models** are models that do not contain
any explicit notion of space. The modeled entities (e.g. cells, virus)
are assumed to exist in some undefined, homogeneous space in which they
can interact, similar to a mix of gases that randomly move around and
bump into each other. This is also called the well-mixed assumption.
Most compartmental models make this assumption. All the models you have
encountered so far do not explicitly account for any type of spatial
structure.

**Spatial models** are needed if one wants to account
for spatial structure or ask questions that relate to spatial structure.
In such models, space is included in some fashion. There are different
ways one can do that. The simplest one is to stick with compartmental
models (e.g. ODEs) but have different sets of equations for different
locations. E.g. one could have one set of equations describing virus
infection in the lung, and another set describing dendritic cell and
T-cell dynamics in a lymph node, with migration terms between those
sites. Such a model, often called a patch or meta-population model, is
fairly simple to implement and follows the usual rules for ODE models
(or discrete/stochastic equivalents). The disadvantage is that within a
site, e.g. within the lymph node, the model still makes the well-mixed
assumption. Another type of model that can be used is a partial
differential equation (PDE) model, which includes time as one dimension (as in
our usual ODE models) and has further dimensions for space (1, 2,
or 3D). In general, given the structure of space within an infected
host, those models are not that widely applicable and they also tend
to be hard to work with (implement and run). Thus, PDE models are not
that common in immunology and within-host modeling. The last type of
model that accommodates space is the ABM/IBM. Each entity in an ABM needs
to be placed into some type of space. This could be a simple 2- or 3-
dimensional space, or it could be a more complex structure, e.g. a
network. Such network models are common when modeling infectious
diseases at the population level, less so for immunology.

Overall, if the question you want to address has a spatial component, or you suspect spatial structure plays an important role in shaping the dynamics of the system, then including space in some form is useful. My recommendation is to start with a compartmental meta-population model, and once you have explored that model, either use it or decide that it is not detailed enough and a full ABM is needed.
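A compartmental meta-population model of the kind recommended here can be sketched with very little code. The Python example below assumes two sites, lung and lymph node, with virus replicating in the lung, T-cells being activated in the lymph node, and T-cells migrating to the lung where they kill virus. The equations, parameter values, and the function name `simulate_two_patch` are hypothetical choices for illustration only, and forward Euler stands in for a proper ODE solver.

```python
def simulate_two_patch(V0=10.0, g=1.0, k=1e-3, a=0.1, m=0.5, d=0.1,
                       dt=0.01, tmax=20.0):
    """Forward-Euler integration of a hypothetical lung / lymph-node patch model.

    dV/dt      = g*V - k*V*T_lung     (virus in the lung)
    dT_ln/dt   = a*V - m*T_ln         (T-cells activated in the lymph node, then leaving)
    dT_lung/dt = m*T_ln - d*T_lung    (T-cells arriving in the lung, then dying)
    """
    V, T_ln, T_lung = V0, 0.0, 0.0
    history = [(0.0, V, T_ln, T_lung)]
    for step in range(1, int(round(tmax / dt)) + 1):
        dV = g * V - k * V * T_lung
        dT_ln = a * V - m * T_ln
        dT_lung = m * T_ln - d * T_lung
        V = max(V + dt * dV, 0.0)
        T_ln = max(T_ln + dt * dT_ln, 0.0)
        T_lung = max(T_lung + dt * dT_lung, 0.0)
        history.append((step * dt, V, T_ln, T_lung))
    return history


history = simulate_two_patch()
peak_time, peak_virus = max(((t, v) for t, v, _, _ in history), key=lambda tv: tv[1])
print(f"virus peaks at about t = {peak_time:.1f} with load {peak_virus:.3g}")
```

The migration term `m*T_ln` appears with opposite signs in the two T-cell equations, which is what couples the sites. Within each site, the model still assumes everything is well mixed.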

**Memory-less models** (also called Markovian or Markov
models) are models where the future only depends on the current state of
the system, and no information from the past needs to be explicitly
considered. ODE models are such types of models. This is an
approximation to real systems. For instance, it means that an infected
cell can produce virus, no matter how long ago it became infected.
Sometimes, this is a reasonable approximation, and many ODE models have
been used successfully in answering questions, despite this
approximation. However, sometimes one wants to keep track of the past,
e.g. one wants to track how long ago a cell became infected, and make
processes such as virus production or cell death dependent on this time
since infection. This requires models with memory.

**Models with memory** are needed if we want to keep
track of the past, e.g. if we want to let the chance of recovery depend
on the time since infection. In that case, we can’t use basic ODE
models. There are several ways one can include memory. One approach is
to retain the ODE model type, but include what is often called “dummy
compartments” (also known as “linear chain trick”). This introduces
additional compartments into the model and allows some level of tracking
of time. Some examples of this approach can be found in (Wearing, Rohani, and Keeling 2005; Lloyd
2001).
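The effect of dummy compartments can be illustrated by looking at how long an individual unit spends in the infected state. In the minimal Python sketch below, the infected stage is split into `k` sub-compartments, each left at rate `k * rate`, so the mean total duration stays at `1 / rate`. The function name `sample_dwell_times` and the parameter values are assumptions for illustration.

```python
import random
import statistics


def sample_dwell_times(k, rate, n=20000, seed=0):
    """Total time spent passing through k dummy compartments, each left at rate k*rate."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(k * rate) for _ in range(k)) for _ in range(n)]


rate = 0.5   # overall exit rate, i.e. mean duration of 1 / rate = 2 time units
for k in (1, 10):
    times = sample_dwell_times(k, rate)
    print(f"k = {k:2d}  mean = {statistics.mean(times):.2f}  sd = {statistics.stdev(times):.2f}")
```

With a single compartment the duration is exponentially distributed, so many units leave almost immediately. With more dummy compartments the mean stays the same while the spread shrinks, which is how the chain gives an ODE model some memory of how long ago a unit entered the stage.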

Another option is to use PDEs, where a variable such as age since infection
is added. As mentioned above, PDEs are hard to work with and thus may not be
the preferred approach unless your math background is strong. Delay
differential equations (DDEs) are another option; they are somewhat “in
between” ODEs with dummy compartments and PDEs. They are not too hard to
implement, e.g. the `deSolve` package in `R` has
functionality that allows for DDEs. However, they often have unstable
dynamic behavior and one thus has to be careful when working with them.
Finally, ABM are again an option. Since ABM are essentially just a list
of computer rules, and each entity is tracked, one can easily assign an
entity (e.g. a cell) features such as “age since infection” and by
keeping track of it, can make future processes (e.g. virus production,
cell death) dependent on the age since infection.

For some further discussion of the impact of different delays and distributions for within-host models, see e.g. (Holder and Beauchemin 2011).

The categorizations of models described so far are rather clear-cut: a model either is in a category or is not. The distinction between small and big models is fuzzier, since what constitutes a small versus a large model is somewhat subjective. I consider a compartmental model with few (not more than 6-7) equations/variables to be on the small side; everything else is large. It is generally a good idea to start with a small, simple model, and try to capture the most important aspects of the known dynamics of the system. Such a simple model is quick to write and fairly easy to analyze, and one can get a full understanding of the model behavior. This provides quick initial understanding of the model and the system. Simple models can also be fit to data, even if the data is sparse. As you probe the model, you might notice limitations (e.g. a poor fit to the data, or results that don’t match what is known about the system). Remember that for the kinds of models we discuss here, model rejection (e.g. poor agreement with or fit to data) is helpful: it teaches you that what you thought was going on in the system is not sufficient to describe the observed data and that you need to alter or extend your model.

At some point, you will likely want to extend your model to improve agreement with data, increase realism, or allow answering questions for which the initial model is not detailed enough. At that stage, you can add to the model and increase its complexity. Such an increase in complexity can mean a larger model. It could also mean a different model type, e.g. going from a deterministic to a stochastic model, or from a compartmental model to an ABM. Big models need to be almost entirely analyzed by running simulations and investigating the outputs. It is often hard to understand how the different components of the model influence the result. Careful analysis is needed. Since large models tend to overfit, they are also not the best suited when trying to fit data.

The advantage of large, complex models is that they can include a lot of detail and thus might be the most realistic. Large models are harder to build and analyze. They are warranted if you know a lot about the system and need a model that is realistic enough that you can make detailed predictions. It is my opinion that for most aspects of immunology, we do not yet understand all the pathogen-host interactions in enough detail to be able to build detailed, realistic models that lead to accurate and reliable predictions. The field of population-level modeling (epidemiology) is a bit further ahead in that area (though still not quite there). I expect that as within-host and immunological data will continue to increase in quantity and quality, it will be possible to build complex, detailed models that can be used to make reliable predictions.

We touched on the distinction between data-free models and models fit to data before when discussing model uses. If one wants to use a model for exploration and prediction, it should of course be built based on biological knowledge (data), but it does not need to be fitted to any specific data source to be useful. Such a model can be considered “data-free”, though this term certainly does not imply “reality-free”; the model still needs to be firmly grounded in what is known about a given system. Both simple and complex models can be and are used in this way.

If suitable data is available, one can fit simulation models to such data in a statistically rigorous manner. With this approach, one can discriminate between competing mechanisms/hypotheses and estimate parameter values. Models that are being fit to data need to be tailored to the data. They need to include any component for which information is available, and often, to keep the model simple and prevent overfitting, need to make strong simplifications for other components. For instance if one wanted to fit virus load data from some infection, and virus load data was the only available data, building a complicated model with many immune response components and trying to fit all those model parameters would be futile. Instead, one might need to limit oneself to a few immune response components and possibly even fix the parameters for several processes in the model based on a priori biological knowledge to be able to estimate the remaining parameters from the available data.
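The idea of fixing some parameters and estimating the rest can be sketched as follows. This minimal Python example assumes a deliberately simple model, exponential decline of virus load, with the initial virus load `v0` fixed a priori and only the clearance rate estimated by least squares on the log scale. The data are synthetic, generated from the model itself with added noise purely so the example can run. The function names, the parameter values, and the grid of candidate values are all hypothetical.

```python
import math
import random


def log10_virus(t, clearance, v0=1e6):
    """Model prediction on the log10 scale for V(t) = v0 * exp(-clearance * t)."""
    return math.log10(v0) - clearance * t / math.log(10)


def fit_clearance(times, log_data, candidates):
    """Return the candidate clearance rate with the smallest sum of squared errors."""
    def sse(c):
        return sum((log10_virus(t, c) - y) ** 2 for t, y in zip(times, log_data))
    return min(candidates, key=sse)


# Synthetic data (not real measurements): model output plus Gaussian noise.
rng = random.Random(3)
true_clearance = 0.8
times = [0, 1, 2, 3, 4, 5, 6]
log_data = [log10_virus(t, true_clearance) + rng.gauss(0.0, 0.1) for t in times]

candidates = [i / 100 for i in range(1, 201)]   # grid from 0.01 to 2.00
estimate = fit_clearance(times, log_data, candidates)
print(f"true clearance rate: {true_clearance}, estimated: {estimate}")
```

A grid search is used here only because it is easy to follow; in practice one would use an optimizer and a more careful statistical treatment of the error. With only virus load data, adding more free parameters to this model would quickly make them impossible to estimate separately.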

Simulation models are one type of computational/mathematical model. Within the category of simulation models, many different variants exist. The above paragraphs provided brief overviews. Not all combinations of features are equally common. For instance ODE models are very common. Those are generally deterministic, compartmental, continuous time, no-memory, non-spatial. In contrast, most agent-based models are stochastic, spatial, and contain memory. Some combinations are possible but not used much (e.g. deterministic ABM are possible but rare), others are not possible (ODE are by definition deterministic).

There is no single best kind of model; it depends on the
question/scenario. Choosing the right model type for a given project is
like choosing any tool or approach in science: Pick the one that’s best
and most suitable to answer the question. Choosing an appropriate model
for a given task is part of the *art* of good modeling; there is
unfortunately no recipe. Of course, choosing the best model solely based
on scientific considerations is the ideal. In practice, other
considerations come into play. The expertise of the person building the
model will play a role, as will feasibility (computation time, model
complexity), “environment” (what approaches others use), and
“marketing” (what kind of models are fashionable).

Holder, Benjamin P., and Catherine AA Beauchemin. 2011. “Exploring
the Effect of Biological Delays in Kinetic Models of Influenza Within a
Host or Cell Culture.” *BMC Public Health* 11 (1): S10. https://doi.org/10.1186/1471-2458-11-S1-S10.

Lloyd, A. L. 2001. “The Dependence of Viral Parameter Estimates on
the Assumed Viral Life Cycle: Limitations of Studies of Viral Load
Data.” *Proceedings of the Royal Society of London. Series B:
Biological Sciences* 268 (1469): 847–54. https://doi.org/10.1098/rspb.2000.1572.

Wearing, Helen J, Pejman Rohani, and Matt J Keeling. 2005.
“Appropriate Models for the Management of Infectious
Diseases.” *PLoS Med* 2 (7): e174. https://doi.org/10.1371/journal.pmed.0020174.