1. Introduction
Myxobolus cerebralis (Myxozoa, Bivalvulida: Myxobolidae) (
Hofer 1903) is a parasite that infects salmonids, including trout and whitefish, resulting in whirling disease. The parasite invades the fish via the skin and consumes cartilage, which can cause skeletal deformities and the characteristic fish “whirling” instead of normal swimming behavior (
Hedrick and El-Matbouli 2002;
DuBey et al. 2007). Whirling disease was first described in Europe in 1903 (
Halliday 1976), became established in the US from the 1950s to 2000, and was first detected in Canada in 2016 in the province of Alberta (
Alberta Environment and Parks 2017). Up to 90% of infected juvenile fish may die (
Elwell et al. 2010), resulting in economic (
Turner et al. 2014) and ecological costs (
Kerans and Zale 2002). Infection is preventable in fish culture settings by disrupting parasite transmission (e.g., breaking the parasite’s life cycle by inactivation of parasite spores by ultraviolet (UV) irradiation (
Hedrick et al. 2008), chlorination, or heating (
Wagner 2002)), but management in natural ecosystems is limited to mitigating the risk of parasite exposure and establishment because parasite elimination is impractical (
Ayre et al. 2014), if not impossible. Modeling the spread and establishment of the parasite can inform management by revealing areas of high risk, allowing these areas to be prioritized for management actions.
The parasite has a complex life cycle involving two obligate susceptible hosts, a salmonid fish and a tubificid oligochaete worm,
Tubifex tubifex (
Gilbert and Granath 2003;
Hedrick and El-Matbouli 2002), and two waterborne spore stages, myxospores and actinospores (triactinomyxons, or TAMs;
Wolf and Markiw 1984). Susceptibility varies among salmonids, and the development and severity of clinical signs including myxospore formation (necessary for parasite transmission) depend on salmonid age and size and exposure conditions (dose and environmental conditions;
Hoffman and Putz 1969;
O’Grodnick 1979;
Hedrick et al. 1999,
2001a,
2001b,
2003;
Downing et al. 2002;
Ryce et al. 2004,
2005). Myxospores are released when the fish decomposes or is eaten (
Hedrick and El-Matbouli 2002), or for some fish, such as brown trout (
Salmo trutta), while still alive (
Nehring et al. 2002). Myxospores settle out in the stream sediments, where they can be ingested by
T. tubifex. After 60–90 days, infected
T. tubifex release the neutrally buoyant TAM stage into the water column with feces, where they can go on to infect new fish hosts (
Hedrick and El-Matbouli 2002), completing the life cycle.
The typical construction of models for disease or parasite establishment and spread relies on first, identifying the covariates, and second, understanding their cumulative effects on the target variable. Researchers have used logistic regressions to identify influential covariates (
Schisler and Bergersen 2002), yet the complex disease dynamics are often de-emphasized by these additive models, which are unable to model complex interactions among the covariates. Others have employed mechanistic models to understand the spread dynamics of
M. cerebralis and other myxozoan parasites, but with few covariates: ordinary differential equations with constant environmental covariates (
Turner et al. 2014) and partial differential equations (PDEs) with the two non-constant covariates temperature and stream discharge (
Schakau et al. 2019). In addition to simplifying the computations and analysis, limiting the choice of covariates is often inevitable in mechanistic models as the exact mechanisms by which the covariates impact the target variable are unknown. However, for
M. cerebralis, many experiments have provided evidence for correlations between parasite establishment or spread and potential covariates (
Krueger 2002;
Shirakashi and El-Matbouli 2010;
Kaeser and Sharpe 2006). Although typically insufficient to build predictive models, these results provide the essentials for understanding the underlying processes. Based on the correlations and both descriptive and mechanistic knowledge from the literature, experts can intuitively develop a causal understanding of the establishment and spread of the parasite, based on which qualitative and quantitative models can be constructed. Communicating the obtained “understanding”, however, can be daunting and often tedious.
Researchers have described their “beliefs” about processes via directed acyclic graphs (DAGs), whose nodes represent the target variable and covariates and whose arcs represent direct causal relationships (
Wu et al. 2018;
Bode et al. 2017;
Eklöf et al. 2013;
Herring et al. 2015). Such DAGs provide a framework for evaluating whether individual covariates impact the response variable directly, and if not, what the intermediate variables are.
Defining the seemingly trivial notion of “
A causes
B” has been long and controversially discussed by scholars. The two most used definitions are Granger causality (
Granger 1969) and Pearl’s causality (
Pearl 2009). Viewing “
A causes
B” as “
A →
B” leads naturally to using DAGs to represent causal connections among multiple variables, an approach that has been exploited and rigorously characterized in recent decades (
Greenland et al. 1999;
Pearl 2009;
Pearl and Mackenzie 2018). However, the fairly complicated notions as well as often infeasible interventions have hindered the broad use of the rigorously defined frameworks. In many cases, the use of DAGs has been limited to representing the causal relationships among the variables, in only an intuitive sense.
Besides being graphical, and hence, easy to interpret, a DAG can be readily used to construct a Bayesian network (BN), which is a probabilistic graphical model that can be fitted with data to make predictions (see Section 4.3).
Ayre et al. (2014) have modeled the risk of establishment of
M. cerebralis at a given management unit via a three-layer tree-structured BN with six covariates, categorized into three latent (i.e., unobserved) variables: worm host environment, connectivity (to other infected management units), and (co-occurrence with and spawning of) fish host habitat. The model provides a simple explanation of a subset of the covariates involved in the parasite spread. Also,
Bartholomew et al. (2005) have constructed a partially directed graph to assess the risk of
M. cerebralis establishment. However, a more detailed model is needed to incorporate the role of myxospore dissemination and spread (hereinafter termed “propagule pressure”; e.g., angler movement and fish movement) and of other covariates such as landscape, stream discharge, and water chemistry.
Our objective was to synthesize literature findings and experts’ opinions on the spread dynamics of M. cerebralis in any given river system and to visualize them via a DAG. To build the DAG, we conducted three workshops to compile experts’ beliefs and merged these with empirical findings extracted from the literature. We identified the variables that have a considerable impact on the settlement of M. cerebralis and connected almost any pair of variables whenever one had an immediate impact on the other. The resulting DAG indicates the direct and indirect mechanisms by which the covariates affect parasite establishment, as well as the intermediate variables necessary to understand the indirect mechanisms. The graph compiles the current knowledge of the “causal relationships” between the variables potentially involved in the spread of M. cerebralis, and hence, can inform management and provide the basis for future model development.
DAGs are well suited for modeling the complex life cycle of
M. cerebralis. Because the life cycle involves two different susceptible hosts, salmonid fish and worms (
Tubifex tubifex), and two distinct waterborne spore stages, myxospores and triactinomyxons (TAMs) (
Gilbert and Granath 2003;
Hedrick and El-Matbouli 2002;
Wolf and Markiw 1984), each stage can be compartmentalized and linked via a series of nodes. Moreover, nodes can be created specifically for myxospore production in infected fish hosts that are distinct from those releasing myxospore to the environment. This characteristic of DAGs may be particularly relevant for
M. cerebralis because of the variability associated with parasite transmission and success in different phases of the life cycle. For example, myxospores are primarily released from juvenile fish host tissue and settle into the sediment when the fish dies and decomposes or is eaten (
Hedrick and El-Matbouli 2002); however, they may also be released from live, clinically asymptomatic, adult fish, such as brown trout (
Nehring et al. 2002). In contrast, the myxospores must be ingested by
T. tubifex before the TAM stage can be released, after around 60–90 days at appropriate temperatures (
Kerans et al. 2005). Infected worm hosts then release TAMs into the water column along with faeces, and TAMs in turn may infect fish hosts upon contact with the skin (
Hedrick and El-Matbouli 2002). DAGs possess the flexibility to carefully illustrate these complex dynamics.
2. Methods
The initial phase of the project was to construct a conceptual pathway for
M. cerebralis spread to identify the underlying covariates and mechanisms. To accomplish this we conducted three workshops with the following attendees from Alberta Environment (AE), University of Alberta (UA), and Oregon State University (OSU):
•
Aquaculture: Janelle Sloychuk (aquaculture specialist, AE), Trish Kelley (aquaculture specialist, AE);
•
Data scientist–management: Alicia Kennedy (GIS analyst, AE), Chad Sherburne (resource data biologist, AE), Laurie Gallagher (regional issues manager, AE);
•
Fisheries biology: Andrew Paul (population ecologist and environmental modeler, AE), Bev Larson (fish disease specialist, AE), Clayton James (fisheries biologist, AE), Dave Park (director fisheries management policy, AE), Laura MacPherson (fish stock assessment biologist, AE), Marie Veillard (fisheries biologist, AE), Michael Sullivan (fisheries scientist, AE);
•
Invertebrate ecology: Julie Alexander (invertebrate ecologist, OSU);
•
Modeling: Mark A. Lewis (mathematical biologist, UA), Pouria Ramazi (modeler, applied mathematician, UA), Russell Greiner (machine-learning expert, UA).
Collectively, this expert team gradually constructed a DAG representing the conceptual pathways during the workshops. The first workshop focused on host and parasite habitat, resulting in a rough initial DAG that covered most of the major covariate groups, including hydrology, weather, and propagule pressure. However, some details, especially those on the underlying mechanisms of the propagule pressure, remained untouched. In the second workshop, we clarified some unclear terms used to describe the nodes, such as “flashiness”, and modified some links to improve the representation of the causal relationships. We also incorporated the parasite and fish pressure and human movement variables into the graph. The final workshop focused on further clarifications and detailed discussions on the components of propagule pressure.
We then used the literature on whirling disease to improve the graph created during the three workshops. We focused on empirical studies identifying the driving factors of parasite establishment and spread. The literature supported many of the nodes and links. We modified the remaining unsupported nodes and links and added new ones to better explain those covariates and mechanisms that were not discussed in detail during the meetings.
During the workshops and literature-based refinements, we went through the following procedure to develop the graph: (0) Consider an arbitrary aquatic network, partitioned into several segments of roughly the same size with water flowing unidirectionally (i.e., downstream) from one section to another. (1) Consider an arbitrary segment and model the status of
M. cerebralis establishment by the response variable “parasite establishment”. (2) Model all major variables that affect parasite establishment as individual nodes, and link each of them to the parasite establishment node. These variables are either observable (i.e., well-defined and measurable), such as host worm density, or latent (i.e., abstract and not measurable), such as joint propagule pressure. (3) Repeat Step 2 for each of the secondary variables mentioned above and so on until we reach an observable node that is either often readily available to the managers or estimable by well-established models in the literature. So, in the end, we obtain a set of nodes $\mathcal{V}$ comprising the response, observable, and latent variables. In the following section, we describe each node, explain how to measure it, list its parenting nodes, and explain their influences on each other.
To obtain the arc set $\mathcal{E}$, we link a node
i to a node
j if there is scientific evidence or expert intuition that variable
i causes or has a considerable impact on variable
j. However, in rare situations, we may link two variables just because they are strongly correlated, and one may predict the other. For example, there is a mathematical relationship between discharge, yearly precipitation, and watershed size, allowing any of them to be computed from the other two. In such instances, we arrange the variables by linking the easily measurable or predictable ones to the less accessible ones. For our purpose, accessible variables are those that are easily measured and likely to be widely available (e.g., temperature, measured by temperature loggers), as opposed to variables that are less easily measured (e.g., parasite spore densities), and thus less likely to be widely available. Generally, if two variables represented similar quantities, we selected the one with the best mechanistic justification or, if the variables seemed similarly valid, the most accessible one. To simplify the network, we avoid linking a variable
A directly to another variable
C if there exists a path from
A to
C (e.g.,
A →
B →
C) that already explains the effect of
A on
C. For example, stream velocity affects or partially predicts flashiness (defined as frequency and rapidity of short-term changes in discharge;
Baker et al. 2004), which in turn affects host fish density, and hence, there is no direct link from stream velocity to host fish density. However, stream velocity and flashiness have different impacts on myxospore success, and hence, they are both linked to it. We constrain the arcs to be directional and prohibit directed cycles, resulting in the DAG $\mathcal{G} = (\mathcal{V}, \mathcal{E})$.
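The linking rules above can be illustrated with a minimal Python sketch. It assumes the graph is stored as a plain list of (parent, child) arcs; the node labels are shortened versions of the variables in the stream-velocity example, and the helper names (`build_children`, `is_acyclic`, `has_indirect_path`) are hypothetical and not part of the workshop procedure. The indirect-path check only flags a candidate direct link for review. Whether an existing path already explains the effect remains an expert judgment, which is why stream velocity and flashiness can both stay linked to myxospore success.

```python
from collections import deque

# Arcs from the stream-velocity example; labels are shortened for this sketch.
arcs = [
    ("stream_velocity", "flashiness"),
    ("flashiness", "host_fish_density"),
    ("stream_velocity", "myxospore_success"),
    ("flashiness", "myxospore_success"),
]


def build_children(arcs):
    """Return a dict mapping every node to the set of its children."""
    children = {}
    for parent, child in arcs:
        children.setdefault(parent, set()).add(child)
        children.setdefault(child, set())
    return children


def is_acyclic(children):
    """Kahn's algorithm: the graph is a DAG iff every node can be removed."""
    indegree = {node: 0 for node in children}
    for kids in children.values():
        for kid in kids:
            indegree[kid] += 1
    queue = deque(node for node, deg in indegree.items() if deg == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for kid in children[node]:
            indegree[kid] -= 1
            if indegree[kid] == 0:
                queue.append(kid)
    return removed == len(children)


def has_indirect_path(children, source, target):
    """True if target is reachable from source through at least one other node."""
    stack = [kid for kid in children[source] if kid != target]
    seen = set(stack)
    while stack:
        node = stack.pop()
        for kid in children[node]:
            if kid == target:
                return True
            if kid not in seen:
                seen.add(kid)
                stack.append(kid)
    return False


children = build_children(arcs)
print(is_acyclic(children))  # the example arcs contain no directed cycle
# A direct arc stream_velocity -> host_fish_density would be flagged for review:
print(has_indirect_path(children, "stream_velocity", "host_fish_density"))
```

Adding a hypothetical arc from host fish density back to stream velocity would make `is_acyclic` return `False`, which is the situation the no-directed-cycles constraint rules out.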
4. Discussion
Whirling disease can dramatically increase the mortality of some salmonid species (e.g., it was responsible for the near-to-complete collapse of rainbow trout in Colorado and Montana;
Nehring and Thompson 2003b), and there is now evidence of whirling disease impacting populations of rainbow trout in Alberta (
James et al. 2021). Although effects on recreational fishing and tourism may be negligible in some areas (
Elwell et al. 2010), the overall economic, ecological and social costs can be significant (
Ben-David et al. 2016;
Turner et al. 2014). Successful management requires a thorough spatial assessment of the parasite establishment risk. By synthesizing expert opinions and a wide body of literature, we have developed a graph displaying a detailed and clear overview of the many factors currently believed to affect
M. cerebralis establishment and the interplay of these factors. Consequently, the graph facilitates both our understanding and the management of whirling disease.
Our DAG provides an easy-access visualization of the existing results on and experts’ understanding of the establishment and spread of M. cerebralis. It additionally takes the key first step to identifying data requirements for quantitative analysis, building the corresponding BN, examining experts’ understandings in the form of hypothesis testing, and making probabilistic risk assessments.
4.1. Significance
Early detection of invasive species can improve the outcome of control and mitigation strategies, lowering both the impact and the costs incurred (
Finnoff et al. 2010;
Blackwood et al. 2010;
Epanchin-Niell and Wilen 2012). However, continuous monitoring of large river systems is costly and resource intensive. Therefore, models can help managers to focus monitoring efforts by assessing invasion risk and guiding management efforts to the locations where they are most effective.
A particular challenge for risk assessment models is the diversity of factors potentially impacting the ability of the invader — in this case,
M. cerebralis — to establish in a new location. As a result, insights from a variety of scientific fields are required to understand and predict the parasite invasion, establishment, and spread. Earlier models for
M. cerebralis spread either focused on specific mechanisms, thereby neglecting important environmental variables and human-mediated dispersal (
Ayre et al. 2014), or provided rather general frameworks for risk assessment without accounting for many insights from empirical research (
Bartholomew et al. 2005).
Our approach addresses both issues by merging empirical results from a large body of literature from different research areas to form a single comprehensive model for spatial prediction of M. cerebralis establishment. By being specific about mechanistic links, the model provides a framework into which future empirical findings can be embedded. The graph also facilitates the formulation of hypotheses that may be tested to improve our understanding of M. cerebralis spread.
In particular, the impact of propagule pressure on parasite spread is fairly well quantified, thanks to ongoing research on models of human, specifically angler, movement (
Erlander and Stewart 1990;
Leung et al. 2004;
Bossenbroek et al. 2007;
Mari et al. 2011). The effect of parasite-establishment-(past) is also captured by mechanistic models (
Turner et al. 2014;
Schakau et al. 2019); however, more research is required to quantify the relationship between the other two parasite-occurrence covariates (i.e., infected-fish-density and infected-worm-density) with the target variable parasite-establishment. The importance of the habitat-and-host variables in parasite establishment is confirmed in many studies (
Elwell et al. 2010;
Ryce et al. 2005;
Alexander et al. 2011;
Gilbert and Granath 2001), yet a model capturing the governing mechanisms is missing. The effect of climate covariates on the other nodes in the DAG is well-studied (
Krueger 2002;
Kerans et al. 2005;
Touazi et al. 2004). The role of water-chemistry nodes in parasite establishment is perhaps the most blurred, as there have been few related studies (
Sandell et al. 2001;
Smith et al. 2002) with many open questions. There is a rich body of literature on the hydrology-and-river-morphology variables (
Chaudhry 2007;
Gordon et al. 2004). Moreover, the effects of the two variables stream-velocity and discharge on the habitat-and-host nodes have been quantified using partial differential equations in
Schakau et al. (2019). However, the joint effect of flashiness, river-sediment, stream-velocity, and stream-slope remains uncharacterized. Similarly, more work is required to quantify the exact effect of landscape variables on other connected variables. Of all these variables, understanding the underlying mechanisms of those directly connected to the target node parasite-establishment is likely to matter most for modeling the parasite spread and establishment process.
Our DAG and the reviewed empirical results also provide the essentials for building future mechanistic, statistical, and machine-learning models. Modelers can start from the target variable parasite establishment and model the mechanism via which it is impacted by its parenting nodes. Modelers may repeat this process for the nodes higher in the hierarchy and stop at any variable for which data are available in the area of interest since then there is no need to estimate that variable using its predictors in the upper hierarchy.
The information provided in this synthesis largely summarizes the necessary variables needed to estimate the potential establishment and spread of
M. cerebralis or similar aquatic invasive species into novel environments. As such, the conceptual information presented herein would primarily appeal to jurisdictions where the parasite is not yet established but the risk of introduction is high. For example, British Columbia has no known detections of the parasite to date but has susceptible fish populations in close proximity to
M. cerebralis positive watersheds in Alberta and has recently implemented a surveillance program (
Freshwater Fisheries Society of BC 2018;
Government of British Columbia 2018). This would represent a scenario whereby our DAG may help inform sampling efforts in areas where a higher probability of detecting the parasite exists.
On the other hand, for locations such as Alberta, in which the distribution of the parasite is generally known but there exists a risk of whirling disease outbreak and loss to fish populations (
James et al. 2021), variables (nodes) more specific to disease outbreak (rather than parasite spread and establishment) should be considered in the predictive models. Such variables include (
i) a high abundance of the aquatic worm host, (
ii) the presence of vulnerable juvenile fish hosts, and (
iii) water temperatures ideal for the development of the parasite (
Elwell et al. 2010;
James et al. 2021).
4.2. Graph construction
The graph reveals the interaction hierarchy of the variables impacting M. cerebralis establishment. Thereby, direct effects (or “causalities”) are represented by links, and indirect effects via paths of links. Although many links in the graph represent direct causal relationships, the exact mechanism behind some links may be unknown, and for some relationships there is no causal direction. For example, discharge and velocity relate via a simple mathematical equation: discharge = velocity × stream width × stream depth. However, discharge and velocity are not causally related (i.e., neither velocity nor discharge “causes” the other). Hence, the link from discharge to velocity is not causal but indicates dependence or correlation.
There is no unique choice of variables to explain parasite establishment dynamics, even for parasites with a simple life cycle. The interactions between environmental and biotic variables are often highly complex and involve numerous intermediate factors. For example, vegetation and land use can affect river sediment via different mechanisms, including the erosion of soil that can then be flushed into water bodies. To reduce the graph complexity, we focused on the variables and interactions deemed most influential and neglected latent intermediate variables if they had a single predictor. However, if the intermediate variables were already included in the graph, we used them to represent mechanisms more accurately. For example, a positive correlation has been reported between sentinel rainbow trout infection and conductivity (that can serve as a proxy for nutrient enrichment) in
Sandell et al. (2001). This correlation may be explained via the positive effect of nutrient enrichment on host worm density, which increases TAM density, parasite establishment, and infected fish density, in order. Therefore, we did not include a direct link between nutrient enrichment (or conductivity) and fish infection prevalence in the graph.
In our effort to synthesize literature findings, we needed to consider potential intermediate variables for correlations and unify results by merging similar variables considered in different studies. Some mechanisms may be similarly well described via different environmental predictors. For example, maximal stream temperature is strongly correlated with degree days, and either of the two variables could be used to model a temperature-driven mechanism. Similarly, changes in conductivity can be used as a predictor for water pollution via nutrient enrichment (
Fondriest Environmental Inc. 2014). However, because there are multiple mechanisms that impact conductivity, we chose to focus on nutrient enrichment instead.
We excluded variables with little influence, such as the density of non-susceptible host fish (e.g., longnose suckers (
Catostomus catostomus) and carp). Although TAMs can still attach to these “non-target” fish, thereby reducing the density of TAMs available to infect target hosts (
Kallert et al. 2009), there are typically still enough TAMs to complete the parasite life cycle.
We have constructed the graph under the assumption that variables take value in the same fixed time interval, say over a year. Some variables such as annual precipitation and degree days are defined on this basis; others such as flashiness and parasite establishment may be defined at a specific time of the year, say the end of August. One may average out some of the variables over the whole year or the time interval of interest. For example, stream velocity may be averaged over the year, or just over the months that are believed to have maximal effect on the establishment of the parasite. Some variables may have been measured occasionally, and hence, at different time intervals than other variables. For example, it might be that water pH is measured in the year 2010, whereas the parasite establishment is studied in 2015. This is typically not a major issue as long as the spatial pattern of the variable remains unchanged over the years.
4.3. Extension to Bayesian networks
In addition to its qualitative use, our DAG can be used to assess the risk of parasite establishment quantitatively if the network is parameterized as a BN. If each node
$v$ in the DAG is understood as a random variable, we can parameterize its dependency on its parents by the conditional probability distribution (CPD). More specifically, associated with each node $v$ a random variable $X_v$ is defined, which can be either discrete or continuous. For example, the discrete random variable $X_{\text{discharge}}$ can be either Low or High to indicate the discharge level. Similarly, corresponding to the parents of this node, the binary random variables $X_{\text{watershed-area}}$ and $X_{\text{yearly-precipitation}}$ are defined. Being affected by its parental nodes, the probability distribution of discharge depends on that of its parents. This results in the CPD $P(X_{\text{discharge}} \mid X_{\text{watershed-area}}, X_{\text{yearly-precipitation}})$, which is presented in
Table 2. In general, the CPD of each random variable $X_v$ is written as $P(X_v \mid X_{\mathrm{pa}(v)})$, where $X_{\mathrm{pa}(v)}$ indicates the random variables corresponding to the parents of node $v$.
One may either manually specify the CPDs or learn them from data (see
Marcot et al. 2006 for guidelines). The network $\mathcal{G}$ together with the CPDs associated with each node results in a BN that factorizes the joint probability distribution of all of the variables $X_{\mathcal{V}} = \{X_v : v \in \mathcal{V}\}$ as follows (
Pearl 1988;
Koller and Friedman 2009):
$$P(X_{\mathcal{V}}) = \prod_{v \in \mathcal{V}} P(X_v \mid X_{\mathrm{pa}(v)})$$
The resulting BN can then be used to compute the chances of parasite establishment in a region of interest, given the specified values of some of the covariates.
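As an illustration of this factorization, the following Python sketch builds the three-node fragment around discharge and multiplies the node-wise CPDs to obtain the joint distribution. All probabilities are hypothetical placeholders chosen for this example and are not the values of Table 2; the state labels Small/Large for watershed area are likewise assumed.

```python
from itertools import product

# Hypothetical CPDs: placeholder numbers for illustration only,
# not the values reported in Table 2.
p_area = {"Small": 0.6, "Large": 0.4}      # P(X_watershed-area)
p_precip = {"Low": 0.7, "High": 0.3}       # P(X_yearly-precipitation)
p_discharge_high = {                       # P(X_discharge = High | parents)
    ("Small", "Low"): 0.05,
    ("Small", "High"): 0.40,
    ("Large", "Low"): 0.50,
    ("Large", "High"): 0.95,
}


def joint(area, precip, discharge):
    """Joint probability as the product of each node's CPD given its parents."""
    p_high = p_discharge_high[(area, precip)]
    p_discharge = p_high if discharge == "High" else 1.0 - p_high
    return p_area[area] * p_precip[precip] * p_discharge


# The factorized joint distribution sums to one over all configurations.
states = list(product(p_area, p_precip, ["Low", "High"]))
total = sum(joint(*state) for state in states)

# Marginal probability of high discharge, summing out both parents.
p_high_marginal = sum(joint(a, r, "High") for a in p_area for r in p_precip)
print(round(total, 6), round(p_high_marginal, 3))
```

With these placeholder numbers the marginal probability of high discharge is 0.347; learning the CPDs from data would replace the hand-specified dictionaries.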
The BN also encodes a precise set of probabilistic interdependencies among the variables. Any two nodes that are linked by a path of arcs without a “v-structure” (i.e., two arcs pointing towards each other:
a →
b ←
c) are probabilistically dependent. For example, TAM-success and infected-worm-density are probabilistically dependent as they are linked by the directed path TAM-success → parasite-establishment → infected-worm-density in
Fig. 1. So are TAM-success and myxospore-success as they are linked by the path TAM-success ← degree days → myxospore-success. However, conductivity and water-pH are probabilistically independent as they are only linked by the path conductivity → TAM-success ← water-pH, which is a v-structure. There are many other, more subtle ways to read off independencies (and their complement, dependencies), which can be used to help understand the connections. For example, two variables may become conditionally independent given another variable. One may refer to the notion of d-separation for identifying all conditional independencies between the variables in a BN (
Pearl 1988). Such (conditional) independencies can then be tested based on collected data, and any that are violated suggest ways to produce a more accurate DAG structure.
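A d-separation query can be sketched in a few lines of Python. The version below uses the standard moralized-ancestral-graph criterion: keep only the queried nodes and their ancestors, connect parents that share a child, drop arc directions, remove the conditioning nodes, and test whether the two query nodes are still connected. The graph is a small fragment assembled from the paths mentioned above, not the full graph of Fig. 1, and the function names are hypothetical.

```python
# Fragment of the DAG, stored as child -> list of parents.
parents = {
    "TAM_success": ["conductivity", "water_pH"],
    "parasite_establishment": ["TAM_success"],
    "infected_worm_density": ["parasite_establishment"],
}


def ancestors_of(parents, nodes):
    """Return the given nodes together with all of their ancestors."""
    found = set(nodes)
    stack = list(nodes)
    while stack:
        node = stack.pop()
        for parent in parents.get(node, ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def d_separated(parents, x, y, given=()):
    """True if x and y are d-separated by the nodes in `given`."""
    keep = ancestors_of(parents, {x, y, *given})
    # Moralize the ancestral subgraph: undirected parent-child edges
    # plus an edge between every pair of parents sharing a child.
    neighbors = {node: set() for node in keep}
    for child in keep:
        pas = list(parents.get(child, ()))  # all lie in `keep` by construction
        for p in pas:
            neighbors[p].add(child)
            neighbors[child].add(p)
        for i, p in enumerate(pas):
            for q in pas[i + 1:]:
                neighbors[p].add(q)
                neighbors[q].add(p)
    # Search from x while never stepping onto a conditioning node.
    blocked = set(given)
    stack, seen = [x], {x}
    while stack:
        node = stack.pop()
        if node == y:
            return False
        for nb in neighbors[node]:
            if nb not in seen and nb not in blocked:
                seen.add(nb)
                stack.append(nb)
    return True


print(d_separated(parents, "conductivity", "water_pH"))
print(d_separated(parents, "conductivity", "water_pH", given=["TAM_success"]))
```

In this fragment, conductivity and water-pH are d-separated marginally but become dependent once TAM-success is observed, which is the usual behavior of a v-structure.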
Compared to other mechanistic, statistical, and even most machine-learned models, BNs have the advantage of handling missing values. For example,
Fig. 1 shows more than 50 nodes; a standard model would not be able to estimate the chance of parasite-establishment unless we provide all values of these nodes (or at least, those of parasite-establishment’s Markov blanket, which are its parents, its children, and the “co-parents” of those children). However, as this BN is a probability distribution, one can simply marginalize out the unspecified variables to obtain the probability of parasite establishment.
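The marginalization step can be illustrated with a minimal Python sketch on the v-structure conductivity → TAM-success ← water-pH. The probabilities are hypothetical placeholders, and the function name `p_tam_high_given` is chosen for this example only. Because water-pH is a root node in this fragment, its prior can be used directly when its value is missing.

```python
# Hypothetical numbers for illustration only.
p_ph = {"Low": 0.3, "High": 0.7}   # prior P(X_water-pH)
p_tam_high = {                     # P(X_TAM-success = High | conductivity, water-pH)
    ("Low", "Low"): 0.1,
    ("Low", "High"): 0.3,
    ("High", "Low"): 0.4,
    ("High", "High"): 0.8,
}


def p_tam_high_given(conductivity, water_ph=None):
    """P(TAM-success = High) given conductivity and, if known, water-pH.

    When water-pH is missing (None), it is summed out using its prior.
    """
    if water_ph is not None:
        return p_tam_high[(conductivity, water_ph)]
    return sum(p_ph[ph] * p_tam_high[(conductivity, ph)] for ph in p_ph)


# Both parents observed versus water-pH missing:
print(p_tam_high_given("High", "High"))
print(round(p_tam_high_given("High"), 2))
```

The same idea extends to larger networks, although exact summation over many unobserved variables becomes expensive and dedicated inference algorithms are normally used instead.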
Moreover, some parts of the BN, for example the sub-graph consisting of the node propagule pressure and its parents and connecting links, can be modeled separately by models such as gravity models (
Muirhead and MacIsaac 2011;
Potapov et al. 2011) and then be connected to the remainder of the BN. This allows us to mechanistically model those parts with known mechanisms and estimate the rest from data. Finally, control can also be integrated into BNs via the introduction of decision nodes, representing management actions that can be taken at all or some of the partitioned regions, and utility nodes, indicating the associated costs and benefits (
Koller and Friedman 2009). The resulting BN is also referred to as a Bayesian decision network or influence diagram (
Nyberg et al. 2006).
Instead of manually constructing the DAG, as we did in this paper, one may learn it directly from data. That is, given the variables and an existing dataset, a BN-learning algorithm will attempt to produce the BN that best fits the data, with respect to some scoring function (e.g., log-likelihood;
Scutari 2009;
Koller and Friedman 2009;
Ramazi et al. 2021b). Although such data-driven BN structures are a good fit to the data and may suggest connections beyond the state-of-the-art knowledge, they may not make perfect ecological sense, especially if one attempts to read the links causally. Moreover, manually constructed BNs often suffer less from overfitting. Therefore, depending on the goal, one may use either the data-driven or manually constructed structure. The first is typically better at making predictions whereas the second may be better for interpreting the already known scientific hypotheses. Clearly, if a manually constructed BN encompasses accurately all causal relationships and underlying mechanisms, and for each instance to predict, we have values for all of the covariates, then we expect it to outperform other structures, including the data-driven ones. However, this is rarely the case in real world scenarios, motivating the parallel development of machine-learning approaches for prediction purposes.
4.4. Limitations
Although we have collected empirical results confirming the included links, our graph does not yield quantitative results. It describes only which variable affects another, but does not reveal how. For example, while it claims that water chemistry, river sediment, and nutrient enrichment affect host worm density, it does not provide the exact effecting mechanism. The mechanisms could be modeled either by using machine-learning algorithms (
Ramazi et al. 2021a,
2021c), based on data, or mechanistically with, for example, logistic regressions or PDEs (
Schakau et al. 2019), based on the literature (cf. the references that we provided for each node). In particular, evolutionary game theory provides a framework to model human behavior in terms of “cooperation” or “defection”, that is, limiting or accelerating disease spread by following or violating decontamination policies (
Nowak and Sigmund 2005;
Govaert et al. 2017,
2021;
Riehl et al. 2018).
The presented graph is directed and acyclic, and does not model the full life cycle of the parasite. This is because a DAG does not allow for modeling reciprocal relationships between the variables. For example, high fish density enhances parasite establishment; however, the establishment of the parasite may decrease fish density. Having modeled the first effect, we could not include the second, reciprocal one without creating a directed cycle. Nevertheless, a temporal extension of the model allows capturing these effects. For example, if parasite establishment and fish density are present at years
t and
t + 1 in the graph, then we can link fish density at year
t to parasite establishment at year
t + 1, and similarly, parasite establishment at year
t to fish density at year
t + 1. Such temporal connections lead to the construction of “temporal Bayesian networks” (
Koller and Friedman 2009).
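The temporal extension can be sketched as follows. The Python example below copies a two-node template over consecutive years, linking each variable at year t to the other at year t + 1, and checks that the unrolled graph is acyclic even though the reciprocal pair within a single year is not. The function names and the (name, year) node encoding are assumptions made for this illustration.

```python
def unroll(across_years, n_years):
    """Link (parent, t) to (child, t + 1) for every consecutive pair of years."""
    arcs = []
    for t in range(n_years - 1):
        for parent, child in across_years:
            arcs.append(((parent, t), (child, t + 1)))
    return arcs


def is_acyclic(arcs):
    """Depth-first search that reports False as soon as a directed cycle is found."""
    children = {}
    for parent, child in arcs:
        children.setdefault(parent, []).append(child)
        children.setdefault(child, [])
    state = {}  # node -> "visiting" or "done"

    def visit(node):
        if state.get(node) == "done":
            return True
        if state.get(node) == "visiting":
            return False  # reached a node on the current path: cycle
        state[node] = "visiting"
        ok = all(visit(kid) for kid in children[node])
        state[node] = "done"
        return ok

    return all(visit(node) for node in children)


# The reciprocal relationship described in the text.
reciprocal = [
    ("fish_density", "parasite_establishment"),
    ("parasite_establishment", "fish_density"),
]

print(is_acyclic(reciprocal))             # within one year: a directed cycle
unrolled = unroll(reciprocal, n_years=3)
print(len(unrolled), is_acyclic(unrolled))  # across years: a valid DAG
```

Because every arc in the unrolled graph points from one year to the next, no directed cycle can form, whatever the number of years.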
Besides the limitations due to the acyclic structure, DAGs may not easily capture highly dynamic processes on a mechanistic level, and other approaches such as differential equations may be better suited for this task (
Schakau et al. 2019). On a time-aggregated and phenomenological level, however, graphical models can yield accurate results (
Koller and Friedman 2009), and the option to include a wide variety of environmental variables in the model may offset mechanistic deficiencies.
Finally, we have conducted a thorough literature review to collect empirical knowledge about mechanisms deemed significant by experts and described in the literature. However, it is difficult to survey the full diversity of relevant scientific studies, and future research may suggest new, or even different, predictor variables and interactions. Nonetheless, the presented DAG provides a framework that can be extended to accommodate such results in a relatively easy, modular fashion.