Data
The USGS conducts separate surveys to sample nearshore and offshore locations in Lake Superior (
Fig. 1). Nearshore sampling sites were located around the perimeter of the lake with sampling typically occurring in May and June (∼3% of trawls took place in April, July, or August) during daylight hours. Trawls were conducted with a 12 m Yankee bottom trawl with either a chain or 15 cm rubber roller foot rope (
Vinson et al. 2018). The roller foot rope was used at sites with steeper, rockier bottoms to reduce snagging. A mean of 77 nearshore sites were sampled annually (range: 52–87). Nearshore trawls were conducted across depth contours. Depth was measured at the start and end of each trawl. Start depths ranged from 9 to 68 m (median: 16 m) and end depths ranged from 12 to 144 m (median: 59 m). The median increase in depth within a trawl was 41 m. The mean of start and end depth was used to represent depth in the analyses, with mean depth ranging from 15 to 92 m (median: 39 m).
The offshore survey occurred in July except for 2012 when sampling was conducted in August. A mean of 38 offshore sites were sampled annually (range: 30–53). The variability in the number of samples across years was due to 2011 and 2016 when additional shallower sites (∼85 m) were sampled; the number of locations sampled in other years ranged from 30–36 sites. Offshore trawls were conducted along depth contours with sampling depths ranging from 87 to 315 m (median: 181 m).
Trawl collections were counted and weighed by species. Species-specific fish abundance (number·ha−1) and biomass density (kg·ha−1) were estimated by dividing sample counts and weights by the bottom area fished during each trawl.
Statistical analysis
The addition of offshore trawls in 2011 was a significant change to the survey. Therefore, the dataset was split and analysed separately. The first analysis included only nearshore trawls from 1989–2018 with the purpose of identifying long-term trends in occurrence and biomass density of PWF (hereinafter referred to as long-term trend analysis). The second analysis included nearshore and offshore trawls from 2011–2018 with the purpose of estimating lake-wide biomass (hereinafter referred to as lake-wide biomass analysis). The long-term trend dataset consisted of 2314 trawls, of which 949 (41%) caught PWF. The lake-wide biomass dataset consisted of 919 trawls, of which 314 (34%) caught PWF.
Due to the high percentage of trawls in which no PWF were caught, we used hurdle models in the analysis. A hurdle model is a two-part model: a Bernoulli process to analyse the presence–absence data and a continuous process (here a gamma model) to analyse the positive catches. Presence–absence,
, at location
i and time
t was modelled as follows (
Zuur and Ieno 2018):
and positive biomass density (
, kg·ha
−1) was modelled as follows (
Zuur and Ieno 2018):
where
π represents mean occurrence in the catch and
μ represents the mean biomass density when PWF was present, with
r the gamma distribution shape parameter.
Zi,t represents all fixed effects (e.g., intercept or intercept and covariates). The trend in time,
ut, was modelled as a random walk function with noise term
wt, such that (
Zuur et al. 2017)
where
Additional random effects incorporated into the models are represented by
e. For each analysis, we fit three models that differed in their random structure to account for different levels of dependencies in the data (
Table 1). Hyperparameters for the random effects were estimated separately for the presence–absence and positive catch portions of the hurdle models. The simplest model included no additional random structure (
e = 0) and is referred to as the generalized linear model (GLM). The second model included a random effect for sampling location accounting for the potential dependency at each sampling location across years where
e = site and the random effects are independent and identically distributed with a mean of 0 and variance of
. This model is referred to as the mixed model. The third model included correlated random effects at sampling locations accounting for potential dependencies at each sampling location across years as well as potential correlations among observations that are closer together in space. This model is referred to as the spatial model. For this model,
e =
vi, where
vi is a GMRF with mean 0 and covariance matrix
Σ. The covariance matrix
Σ included variance
and correlations estimated by the Matérn correlation function using SPDE (
Lindgren et al. 2011). In INLA, the Matérn correlation function is defined by an estimated range parameter,
R, which describes the distance at which the correlation between observations drops to 0.1. The GMRF is estimated over a mesh consisting of non-overlapping triangles constructed with built-in INLA functions (see online Supplement; Fig. S1
1).
The long-term trend analysis consisted of 30 years of data. To allow for possible shifts in spatial distribution over time, multiple spatial fields were estimated. Rather than estimate a spatial field for each year of data, which is computationally intensive, four spatial fields were estimated at evenly distributed knots,
k, in the time series (1992, 2000, 2008, and 2016). At each knot a spatial field was estimated as the weighted average of data from the surrounding years. The knot years were selected such that each spatial field had an approximately equal weight (total number of years in time series influenced by each spatial field). Correlation between spatial fields was assumed following an AR1 process with correlation Φ; therefore, the spatial field was a function of the previous field and the new spatial effect,
si,k, such that the random spatial effect becomes (
Krainski et al. 2019)
The lake-wide biomass analysis represents only eight years of data and therefore a single spatial field was estimated for the model assuming that PWF distribution did not change over that time.
In the long-term trend analysis, we were only interested in the temporal trend in occurrence and biomass density, so additional covariates were not incorporated into the models. In the lake-wide biomass analysis, additional covariates were necessary to make lake-wide projections (
Table 1). We included depth as a second-degree polynomial in all models. Preliminary analysis revealed a nonlinear effect of depth on occurrence and biomass density with the second-degree polynomial providing a superior fit to higher degree polynomials. As well, positional data (as eastings and northings (km)) were included as continuous linear variables in the GLM and mixed model. Positional data were not included as fixed effects in the spatial model as this information is represented in the spatial field. Vague priors were used for all model components.
The lake-wide biomass analysis models were used to make spatial projections of PWF biomass for 2018 based on the probability of occurrence and biomass density predictions. The projections were made on a 1 km
2 grid across the lake. The projections were summed to give estimates of lake-wide population size. Occupied habitat and preferred habitat were estimated as the amount of habitat with projected biomass density >0 kg·ha
−1 and the amount of habitat with >50% predicted occurrence, respectively. To incorporate random effects from the mixed model the values were interpolated using inverse distance weighting and the package intamap (
Pebesma et al. 2011).