Predictions of the distribution of groundfish species are needed to support ongoing marine spatial planning initiatives in Canadian Pacific waters. Data to inform species distribution models are available from several fishery-independent surveys. However, no single survey covers the entire region and different gear types are required to survey the range of relevant habitat. Here, we demonstrated a method for integrating presence–absence data across surveys and gear types that allows us to predict the coastwide distributions of 65 groundfish species in British Columbia. Our model leverages data from multiple surveys to estimate how species respond to environmental gradients while accounting for differences in survey catchability. We find that this method has two main benefits: (1) it increases the accuracy of predictions in data-limited surveys and regions while having negligible impacts on accuracy when data are already sufficient, and (2) it reduces uncertainty, resulting in tighter confidence intervals on predicted occurrences. These benefits are particularly relevant in areas of our coast where our understanding of habitat suitability is limited due to a lack of spatially comprehensive long-term groundfish surveys.


Coastwide predictions of the distribution of groundfish species are needed to support ongoing marine spatial planning (e.g., PNCIMA 2017) and oil spill response planning in Canada’s Pacific Region. Data to inform these predictions are available from several fishery-independent surveys (Anderson et al. 2019). However, these surveys cover different spatial extents, target different substrate types, use different gear types, and thus vary in their ability to catch different species. Therefore, no single survey can capture the full coastwide extent for which distribution information is desired and needed to support marine spatial planning initiatives. To date, species distribution models (SDMs) applied in this region have restricted their inference to data from single surveys or, in some cases, multiple surveys that use the same gear type (Nephin et al. 2020; Thompson et al. 2022). However, if models could be developed that are jointly informed by all surveys, we could have higher confidence in applying them to predict the coastwide distribution of species.
Methods that integrate different data types and sources have been demonstrated to improve SDMs by increasing the accuracy of the predictions, by reducing uncertainty, and by increasing the spatial and temporal extent of the model (Fletcher et al. 2019; Miller et al. 2019; Isaac et al. 2020; Rufener et al. 2021; Watson et al. 2021). The benefits of integrated models have been demonstrated when models combine higher and lower quality datasets (e.g., expert surveys and presence only data Pacifici et al. 2017; Robinson et al. 2020), as well as when models combine data from surveys that differ in their spatial extent or method of sampling (e.g., Pirtle et al. 2019). However, constructing integrated SDMs requires careful consideration and accounting for differences and biases in the various data sources (Fletcher et al. 2019; Isaac et al. 2020) or models risk providing less accurate predictions compared with models based on the best single source of data (Simmonds et al. 2020). Integrated models rely on the assumption that all data sources are providing estimates of the same underlying species distribution (Miller et al. 2019). Thus, while each data source may provide an incomplete and potentially biased sample of that distribution, the final predicted distribution must be consistent with the data from all sources. Therefore, integrated models produce a single set of model parameters associated with environmental covariates and any latent variables, while other parameters account for differences in data source and the method of sampling.
While SDMs based on individual surveys are still the norm in marine systems (Grüss and Thorson 2019), models that integrate multiple surveys and gear types are becoming more common (Grüss et al. 2018a, 2018b, 2021; Runnebaum et al. 2018; Grüss and Thorson 2019; Pirtle et al. 2019; Webster et al. 2020; Monnahan et al. 2021; Watson et al. 2021). Studies that have assessed how integrating influences model performance have found that combining surveys can improve model precision (Grüss and Thorson 2019; Rufener et al. 2021) and accuracy (Runnebaum et al. 2018; Rufener et al. 2021). However, these comparisons were only based on a single species and it is unknown whether this would hold when applying integrated models across multiple taxonomic groups.
When considering fishery-independent groundfish survey data in Canadian Pacific waters, a major difference that must be accounted for is the sampling gear (i.e., longline vs. trawl). Longlines attract fish using bait on a finite number of hooks and species vary in the degree to which they are likely to be caught on a hook and the degree to which they will compete for hooks with other species (Kuriyama et al. 2019). In contrast, bottom trawls catch a wider range of species, and each trawl typically covers a larger spatial extent than would be sampled by a single longline set (Anderson et al. 2019). Thus, combining these two data types to jointly predict species’ abundance or biomass is a considerable challenge (but see Runnebaum et al. 2018; Webster et al. 2020 for single-species examples). However, data from these surveys can be converted to presence–absence data with the assumption that the probability of occurrence may differ across surveys because of differences in catchability by the different gear types (e.g., Grüss et al. 2018a). This allows us to integrate data across models by including the survey associated with each observation as a fixed effect that accounts for the average difference in catchability between surveys.
In this analysis, we combine presence–absence data from three fishery-independent scientific surveys that employ longline or bottom trawl methods to develop integrated SDMs for 65 species of groundfish that span the entire continental shelf to the toe of the slope in Pacific Canadian waters. We test the hypothesis: if integrating data from multiple surveys improves SDMs, then across the majority of species investigated, integrated models should have higher predictive power and reduced uncertainty compared with models that rely on only data from a single survey. We then used these integrated models to make coastwide predictions of groundfish occurrence and diversity that can be used for marine spatial planning and oil spill response in Pacific Canadian waters.


Survey data

Our SDMs are based on data from three fishery-independent scientific surveys conducted within Canadian Pacific waters (Fig. 1): the Fisheries and Oceans Canada (DFO) groundfish synoptic bottom trawl surveys (Sinclair et al. 2003; Anderson et al. 2019), the DFO groundfish hard bottom longline surveys (Lochead and Yamanaka 2006, 2007; Doherty et al. 2019), and the International Pacific Halibut Commission fishery-independent setline survey (IPHC 2021). In all cases, we based our analyses on data from 2003 to 2020 and used presence–absence records to facilitate integrating across surveys and gear types. Our analysis included all species that were present in at least 150 sets (to exclude species that are rarely sampled by the surveys) across all the survey datasets, resulting in a final set of 65 species (Table S1).

Groundfish synoptic bottom trawl surveys

The groundfish synoptic bottom trawl surveys (Trawl) cover the upper continental shelf and most of the British Columbia coast (Figs. 1a1c). The surveys cover five subregions: Queen Charlotte Sound (QCS), Hecate Strait (HS), West Coast Vancouver Island (WCVI), West Coast Haida Gwaii (WCHG), and the Strait of Georgia (SoG). The surveys are conducted between May and September and follow a random depth-stratified design with 2 km2 sampling blocks. The surveys use an Atlantic Western IIA box trawl connected to 1500 kg Thyboron Type II doors (Nottingham et al. 2018), except in the SoG where a smaller Yankee 36 Trawl net connected to 1135 kg U.S.A Jet doors were used to minimize catch of depleted species and to allow for a smaller area of deployment for fishing events (King et al. 2013). The QCS and HS surveys have been conducted in odd years since 2003 and 2005, respectively, although QCS was also sampled in 2004. WCVI and WCHG surveys have been conducted in even years since 2004 and 2006, respectively, although WCHG was also sampled in 2007. Our analysis included 5696 trawls, of which 93 come from the SoG. In this manuscript, Trawl refers to the entire survey dataset, Outer Trawl refers to the dataset excluding the SoG, and SoG Trawl refers to the data just from the SoG.

Groundfish hard bottom longline surveys

The groundfish hard bottom longline surveys (HBLL) cover most of the nearshore, hard bottom habitat of coastal British Columbia where trawl surveys are not possible (Fig. 1d). Our analysis includes data from 2003 (the first year of the survey) to 2020. Like the trawl surveys, the Longline Surveys follow a random depth-stratified design with 2 km2 sampling blocks. The inner surveys cover the inlets and protected waters east of Vancouver Island. Northern (Johnstone Strait and Broughton Archipelago) and southern sites (Desolation Sound, the Strait of Georgia, and southern Gulf Islands) are sampled on alternating years (Lochead and Yamanaka 2007). The outer surveys cover most of the remaining coastline of British Columbia (Fig. 1). Northern (northern Milbanke Sound, Dixon Entrance, and Haida Gwaii) and southern (southern Milbanke Sound, Queen Charlotte Sound, and the north and west coasts of Vancouver Island) sites are sampled on alternating years in summer months (July–September; Doherty et al. 2019). The surveys use a standardized “snap and swivel” longline hook gear baited with frozen squid and set on the ocean floor. The inner surveys use a size 13/0 circle hook and the outer surveys use a size 14/0 circle hook. Despite this minor difference in hook size, we elected to combine the inner and outer surveys in our analysis. Cases when the majority of hooks on a longline set are taken by fish indicate that fish may be competing for hooks and so some species that are present may not be caught by the surveys (Obradovich 2018). To avoid this, we excluded all sets where fewer than 20% of hooks were recovered with bait, as opposed to having caught a fish or where the bait was lost. This threshold was determined visually as the approximate point below which sets tended to be dominated by North Pacific Spiny Dogfish and the occurrence of other species decreased (Fig. S1). This resulted in excluding 757 of the total 3403 sets, leaving 2646 sets from the HBLL surveys in our final analysis.
Fig. 1.
Fig. 1. Map of the study area with names of geographic locations referred to in the text (a) and maps of the locations where the survey data were collected (be). Each point in panels (be) represents a single trawl or longline set. Map projection is NAD83/BC Albers and the coastline polygon was produced by the Canadian Hydrographic Service, DFO. The inset map projection in panel (a) is NAD83/Albers Canada.

International Pacific Halibut Commission fishery-independent setline survey

The International Pacific Halibut Commission fishery-independent setline survey (IPHC) surveys nearly the entire commercial fishing grounds of the Pacific Halibut fishery in both the U.S.A. and Canada in summer months. Our analysis is restricted to surveys conducted in British Columbian waters (Fig. 1e) and includes all years from 2003 to 2019, with the exception of 2013. Each year a set of standard stations set on a 10 × 10 nautical mile square grid, limited to seafloor depths of 37–503 m, is surveyed. In 2018, an additional 131 stations were sampled to capture inlets and nearshore waters not typically covered by the survey. The survey uses between five and seven 549 m skates, each with 100 (16/0) circle hooks per skate, at each station (IPHC 2021). This survey offers the benefit of using similar methods to the HBLL, while overlapping with the footprints of both the trawl and the HBLL surveys, making it easier to link the three surveys using a single integrated model. Frozen chum salmon (Oncorhynchus keta) are used as bait. Hook competition is also a potential issue in the IPHC survey. Here, we excluded stations where fewer than 12% of hooks were recovered with bait. We found that below this threshold, sets tended to be dominated by North Pacific Spiny Dogfish and the occurrence of other species decreased (Fig. S2). This resulted in excluding 1963 of the total 2824 sets, leaving 861 sets from the IPHC surveys in our final analysis.

Environmental predictors

As environmental predictors, we used log seafloor depth, bathymetric position index (BPI), substrate indices for muddiness and rockiness, log tidal speed (m·s−1), mean summer ocean current speed (m·s−1), mean summer salinity (practical salinity units, PSU), and mean summer salinity range (PSU). We opted not to include temperature as a predictor because it is highly correlated with depth in this region, and because year-to-year variation in temperature has been estimated to only explain a small fraction of the variation in species’ occurrence after accounting for depth (Thompson et al. 2022). Environmental predictors for the Northern and Southern Shelf Bioregions were at a resolution of 100 m and predictors in the Salish Sea were at a resolution of 20 m, but were aggregated to the same 100 m based on the mean value in each 100 m grid cell. Details on the data sources and pre-processing methods are described in Nephin et al. (2020). In our SDM (described below), we included all environmental predictors other than depth and BPI as linear fixed effects. We included depth and BPI as second-order polynomials to allow for non-linearity in response curves. We also included year as a penalized spline (i.e., generalized additive model; Wood 2017) to account for year-to-year variation in species’ occurrences. All environmental predictors were standardized by subtracting their mean and dividing by their standard deviation prior to model fitting. Environmental data were matched to each survey set by taking the median value across 10 equally spaced points between the starting and ending coordinates for the trawl or longline. To ensure a good match between our environmental values and the surveys, we excluded any sets for which the median depth estimated based on the coordinates and bathymetric layer differed by more than 100 m from the depth recorded at the time of sampling.
The Salish Sea and BC Shelf bathymetry layers were derived by mosaicking four digital terrain models, using the GDAL library, to cover the entire study area (Gregr 2012; Carignan et al. 2013; Davies et al. 2019; Kung 2021). The Salish Sea bathymetry layer had a 20 m resolution and the BC Shelf bathymetry layer that covered the rest of the study area had a 100 m resolution. The BPI layer is a derivative of the bathymetry layer, calculated with the Benthic Terrain Modeller 3.0 Toolbox in ArcGIS Pro 2.5 (Walbridge et al. 2018). BPI captured topographic features within a 1 to 10 km neighbourhood. Substrate layers were derived from a predictive model of categorical substrate types (Gregr et al. 2021). From the categorical substrate, continuous indices for sandy and muddy substrate, ranging from 0 to 1, that represent the amount of each substrate category within a 10 cell radius circular neighbourhood were calculated with the raster package (Hijmans 2022) in R (R Core Team 2021). The substrate index layers aimed to reduce patchiness and uncertainty in the categorical layer by integrating substrate information across a wider area.
Bottom salinity (mean and range) and ocean current speed (tidal and circulation) predictors were sourced from two regional circulation models: a 1998–2007 hindcast from a BC Shelf model with a roughly 3 km resolution (Masson and Fine 2012) and a 2017 hindcast from a Salish Sea model with a roughly 500 m resolution (Soontiens et al. 2016; Soontiens and Allen 2017). Mean summer bottom salinity and bottom ocean circulation current speed were calculated from April to September outputs of these models. Salinity range and tidal current speed were calculated from model outputs from April to September values in all years. Current speeds were calculated from zonal (u) and meridional (v) velocities from the two regional circulation models using the root mean square method to calculate tidal current speed and using the monthly means of u and v velocities to calculate ocean circulation current speed. Salinity and current speed layers were downscaled to match the resolution of the bathymetry and substrate layers using the spline with barriers function in ArcGIS Pro 2.5 with a smoothed coastline polygon as a barrier to interpolation. Additional smoothing using a 13 cell circular neighbourhood was applied to the downscaled Salish Sea current speed layers to reduce sharp changes in current speed values that were visible between depth levels in the regional model. These were likely artefacts of extracting only the bottom cells from the depth-level regional model, where the height of the bottom cell varies with depth (M. Dunphy (personal communication)).

Species distribution model

We fitted SDMs using generalized linear mixed effects models with the sdmTMB package (Anderson et al. 2022). We chose to use the sdmTMB package because it allowed us to include spatial and spatiotemporal random fields. These spatial and spatiotemporal random effects capture additional variation in species occurrences (Anderson et al. 2022). This could include variation that is due to environmental variables that were not included in the model as well as spatiotemporal variation due to dispersal patterns, biotic interactions, or stochasticity (Ovaskainen and Abrego 2020; Thompson et al. 2020). The spatial random effects estimate spatial patterns that are present across all years, while the spatiotemporal random effects estimate patterns that vary from year to year. Although estimating temporal change was not the goal of this study (but see Thompson et al. 2022), we included the spatiotemporal random effects to improve the model fit so that the environmental responses could be more accurately estimated and for valid statistical inference by accounting for year-to-year spatial autocorrelation (Dormann et al. 2007).
For each species, we modelled the occurrence Ys,t at location s and time t using a binomial observation model and a logit link:
where μs,t represents the mean occurrence. The symbol Xs,t represents the vector of environmental predictors (described above) corresponding to location s and time t, and β represents a vector of corresponding fixed effect parameters. The spatial and spatiotemporal random effects are represented by ωs and ϵs,t, respectively (Fig. S3). These random effects were assumed to be drawn from a Gaussian random field with covariance matrices Σω and Σε that were constrained by Matérn covariance functions (Cressie and Wikle 2011). The spatial components of the analysis were modelled as random fields using a predictive process approach with a triangulated mesh (Lindgren and Rue 2015) and bilinear interpolation between vertices (Fig. S4). We constructed the mesh such that vertices had a minimum gap (“cutoff”) of 15 km. We applied a coastline physical barrier in which we assumed the spatial range (distance at which two data points are effectively independent) was 0.2 of the in-water range (Bakka et al. 2019). This causes the spatial correlation to decay five times faster with distance across land than water. A comparison of models with different minimum distances between vertices showed that meshes with smaller minimum distances had slightly higher predictive accuracy, but models using a 15 km value were nearly as accurate as models with 5 km values, while being much faster to fit (Fig. S5). sdmTMB fits the model by maximum marginal likelihood using a mesh constructed by INLA (Rue et al. 2009; Lindgren et al. 2011; Lindgren and Rue 2015), a model template coded in TMB (Kristensen et al. 2016), the marginal likelihood function maximized with the non-linear minimizer nlminb (Gay 1990) in the R statistical language (R Core Team 2021), and the random effects integrated over via the Laplace approximation (Kristensen et al. 2016).

Candidate models

For each species, we compared the predictive power of three classes of models. These were models fit on the individual surveys separately (HBLL, IPHC, Outer Trawl, and the SoG Trawl), models that integrated surveys that shared the same gear type (integrated gear model), and a model that integrated both gear types and all surveys (fully integrated model). We excluded surveys that included fewer than 10 occurrences for the focal species because including these surveys resulted in poor model convergence. Thus, for some species, the fully integrated model or one or both (i.e., trawl and longline) of the integrated gear models were not run. The SoG Trawl survey was only conducted in 2012 and 2015 and is typically excluded in analyses of the broader dataset (Anderson et al. 2019) because of the differences in gear and the limited temporal coverage. Thus, we considered the SoG Trawl survey as a separate survey in our single-survey models but not in our integrated models (see the “Model Comparison” section for details).
A summary of the models is provided in Table 1. All models included the fixed and spatiotemporal random effects described previously except for the SoG Trawl single-survey model where fixed effects other than depth were dropped because more complex models failed to converge due to the low number of observations in that dataset. We integrated the surveys using a categorical fixed effect (i.e., HBLL vs. IPHC vs. Trawl) to account for methodological differences between the surveys. This survey effect was included in both the integrated longline model and in the fully integrated model. Because we did not use a survey effect to distinguish between the Outer and SoG trawls, there was no survey effect in the integrated gear trawl model. Species responses to the environmental conditions were shared across surveys in the integrated models, but the overall probability of occurrence could vary depending on the survey. Although there are some methodological differences in the trawl surveys in the SoG compared with the other regions, we assume that these differences would have minimal effect on presence–absence data. Furthermore, it would not be possible to distinguish these effects from the spatial and environmental differences that exist between the SoG and the outer shelf, which we assume have a larger impact on species occurrences. Thus, we elected to consider all trawl surveys as a single survey in the integrated models.
Table 1.
Table 1. Summary of models including the model name, the level of integration (Type), the surveys that the model was trained on (Survey inputs), the fixed effects, the random effects, and the surveys for which the model can make predictions (Prediction surveys).

Model comparison

We compared models based on their predictive accuracy estimated using spatial block cross-validation. We used three-fold cross-validation with folds arranged sequentially over 24 blocks that divided the coastline equally from north to south (Fig. S6). We assessed predictive accuracy using the area under the curve (AUC; Pearce and Ferrier 2000) and the expected log pointwise predicted density (ELPD; Vehtari et al. 2017) of the withheld data in the cross-validation. Because the folds did not contain equal numbers of observations, we elected not to estimate our predictive accuracy within each fold and then average. Instead, we collected the predicted values from all three cross-validation folds and then estimated predictive accuracy from the full dataset. We also estimated our predictive accuracy statistics separately for data from each individual survey. This allowed us to compare the accuracy of the single-survey models with the integrated models when predicting the same data. In addition, we calculated model residuals using the DHARMa package (Hartig 2022), simulating new observations with all parameters at their maximum likelihood estimates, and visually inspected model residuals using quantile–quantile plots (Fig. S7).

Estimated response curves

We estimated species’ response curves for each environmental gradient while setting all other environmental values to their mean across all surveys and setting year as 2015. We used 2015 because this is the most recent year in which we have data from all surveys. For each environmental gradient, we estimated the response curves across the full range of values present for which we make our spatial predictions (see next section), regardless of whether that range of conditions was sampled by the specific surveys used to train the model. We set the random field values to zero, thereby removing their uncertainty from the response curves.

Selection of final model for each species

After comparing the predictive accuracy of the various models, we selected a model for each species for our final spatial predictions. We selected this final model for each species by applying a series of decisions:
Exclude SoG Trawl single-survey models because of their spatial coverage constraints.
Exclude all models with a cross-validation AUC less than 0.7 because of low predictive accuracy.
Select the highest level of integration within the remaining models.
When a fully integrated model is selected, select the survey for making predictions (i.e., survey fixed effect) based on which survey has the higher proportion of occurrences for that species, with two exceptions—we elected not to select the IPHC survey for Pacific Halibut and North Pacific Spiny Dogfish as these species have an extremely high affinity for longlines and were caught in more than 98% of all sets. Although the IPHC survey is well suited to estimating spatial variation in biomass for these species (Webster et al. 2020), this high catch rate makes it poorly suited for a presence–absence model.

Spatial species distribution predictions

We made predictions of species occurrence for our selected models at a 1 km resolution and at a resolution of 3 km for our non-selected models. This lower resolution was used for computational efficiency as these predictions were only used to illustrate differences between the outputs of the various models. Predictions were made across the full British Columbia coast for areas where seafloor depth was between 10 and 1400 m and mean summer bottom salinity was above 28 PSU. These values correspond to the approximate range of environmental conditions sampled in the surveys (Fig. S8). Environmental values were obtained by averaging the 100 m resolution environmental data described previously within each 1 or 3 km grid cell. The 1 km resolution for our selected models was chosen because it corresponds with the average distance between the start and end coordinates of the HBLL surveys (1.1 km) and is in the same magnitude as the average trawl length (1.8 km). Finer scale predictions could be made if necessary (e.g., Nephin et al. 2020). However, this would require making the assumption that a model trained on survey data, which integrates multiple kilometres of habitat, can make predictions at finer spatial resolutions. All species distribution predictions were made using models fit to the full dataset, rather than using the cross-validation models. Predictions were made for 2012–2015 and then averaged across years in each grid cell. We elected to use this time span because the SoG Trawl survey was only conducted in 2012 and 2015 and so our predictions would not require extrapolation beyond the years in which we have data. Using the mean value across multiple years ensured that our predictions are not heavily influenced by year-to-year variation in predicted species occurrences.
For each model, we repeated our predictions for each survey that was included in the model. For the single-survey methods, this corresponds to a single prediction across the region, which generally requires considerable extrapolation to areas and environmental conditions that were not sampled by that survey. In the integrated models, this corresponds to multiple predictions across the coast, each of which are our predictions for the probability of catching a species if that survey were used across the entire spatial extent. Again, this requires extrapolation into areas that were not sampled by that survey, but now these predictions are also informed by the other surveys that did sample in those regions.
To obtain confidence estimates of species occurrences, we drew 500 simulated values from the joint precision matrix of our model, assuming a multivariate normal distribution on the parameters as transformed internally in the model (sensu Gelman and Hill 2007). We calculated the standard error (SE) of our predictions as the standard deviation of these simulated values. To compare how integrating the surveys changes prediction uncertainty, we calculated the ratio between the integrated model SE and the single-survey model SE for each grid cell and year combination: . We made this comparison for both the integrated gear vs. single model as well as for the fully integrated vs. single model. We then calculated the median value across all grid cells and years to obtain a central value of uncertainty change for each species.
To identify occurrence hotspots for the individual species, we calculated the proportion of simulated values in each grid cell that exceeded a high occurrence threshold. This threshold was the lower of (1) 0.8 and (2) the 80th percentile of the predicted probability of occurrence values across all grid cells that had a probability of occurrence greater than 0.05. Excluding grid cells with a probability of occurrence lower than 0.05 ensured that habitats that were unsuitable were not included. Using an upper bound of 0.8 for the hotspot threshold ensured that thresholds were not unreasonably high for species with high occurrence probability where it is predicted to be found (e.g., >0.999 for Longspine Thornyhead).

Species richness predictions

We predicted species richness by summing the predicted species occurrences within each grid cell across all the selected final models (sensu Ovaskainen and Abrego 2020). This value should be interpreted as the number of species that would be expected to be caught if that location were surveyed using the most appropriate survey method for each species. This assumes, for example, that if one species has a probability of occurrence of 0.6 and another species has a probability of 0.8 in a given location, then a survey would catch an average of 1.4 species at that location if sampled repeatedly.


There was large variation in the estimated environmental response curves based on the models fit to the individual surveys and these response curves often had high uncertainty, particularly in portions of the environmental range that were not well sampled by the survey (e.g., Fig. 2). Integrating the models aligned these response curves and tended to reduce uncertainty.
Fig. 2.
Fig. 2. Estimated environmental response curves for Pacific cod (Gadus macrocephalus) as an example of how the integrated models estimate coefficients that are consistent with other data sources. Estimates are with all other predictors set at their mean values. Only a select set of environmental covariates used in the model is shown. The models trained on just one survey are shown in yellow, the integrated gear models (trained on the HBLL and IPHC together, or the Outer Trawl and SoG Trawl together) are shown in red, and the fully integrated models that are trained on all four surveys are shown in blue. Each row shows the survey (fixed effect) used to make the predictions, labelled on the right-hand side. The lines show the mean estimated values and the bands show the 95% confidence intervals, which include intercept and main effect uncertainty but exclude random field uncertainty. The ticks at the top of the panels show the environmental conditions for all the individual survey sets. Note that only depth is included in the single model for the SoG Trawl. Our decision rules recommended making predictions for Pacific cod using the fully integrated model with fixed effect for survey set to Outer Trawl. See Supplementary Material C or our Shiny app (https://msea.science/integrated_gf_shiny/) for the environmental response curves for other species.
The single-survey models differed in their predicted spatial distributions (e.g., Fig. 3). The single-survey model predictions tended to differ the most in regions where one or more surveys did not sample. Integrating the surveys aligned the spatial pattern of the predictions, but the models differed in their estimated occurrence values depending on for which survey the predictions were made. For example, the Outer Trawl survey tended to make poor predictions in the Strait of Georgia, but this was corrected when the data from the SoG Trawl were integrated.
Fig. 3.
Fig. 3. Example of how integrating alters the predicted spatial distribution of the species, using Pacific cod (Gadus macrocephalus) as an example. Panel (a) shows the locations of the sets for each of the four surveys (panels), and sets that contained Pacific cod are shown in red; sets without Pacific cod are shown in black. Panel (b) shows the predicted probability of occurrence of Pacific cod across the full coast at a 3 km grid cell scale. The three types of models—single, integrated gear, and fully integrated—are shown in different columns. Each row shows the survey (fixed effect) used to make the predictions, labelled on the right-hand side. Our decision rules recommended making predictions for Pacific cod using the fully integrated model with fixed effect for survey set to Outer Trawl. See Supplementary Material D or our Shiny app (https://msea.science/integrated_gf_shiny/) for the predicted occurrence maps for other species. Map projection is NAD83/BC Albers and the coastline polygon was produced by the Canadian Hydrographic Service, DFO.
All 65 species included in our analysis had at least one model that met our inclusion criteria of having an AUC above 0.7. The final models selected included 29 fully integrated models, 33 integrated gear models, and 3 single-survey models (Fig. 4). In the fully integrated models, our decision rules recommended making predictions using Outer Trawl as the survey fixed effect for 19 species, with HBLL as the fixed effect in 8 species, and with IPHC as the fixed effect in 2 species. All but one selected integrated gear models were based on the longline surveys and for these our decision rules recommended making predictions using the HBLL survey. All single-survey models are based on the HBLL survey. The species-specific outputs from the selected models (i.e., occurrence prediction maps, hotspot probability maps, environmental response curves) are available in Supplementary Materials B–D and in our Shiny app (https://msea.science/integrated_gf_shiny/).
Fig. 4.
Fig. 4. Predictive accuracy (AUC) of the selected model for each species. Species are distinguished based on whether the selected model was based on a single survey or whether it integrated surveys of common gear type (integrated gear) or integrated all surveys (fully integrated). The colour of each point indicates either the corresponding survey for the single-survey models or the survey that our decision rules recommend for making further predictions (e.g., maps) in the integrated models.
Integration resulted in an average increase in model AUC for predicting data from the two surveys with the fewest samples (IPHC and SoG Trawl), and negligible differences for the surveys with more data (HBLL and Outer Trawl; Table 2; Figs. 5a and 5b). The greatest improvement in AUC was for predicting data from the IPHC survey (both integrated models) and for predicting the SOG Trawl survey with the fully integrated model. Integration resulted in no median change or minimal changes to the AUC when predicting the HBLL and Outer Trawl data. Integration also resulted in minimal changes in ELPD (Table 2), with increases and decreases being roughly balanced (Fig. S9).
Table 2.
Table 2. Comparison predictive accuracy change (ΔAUC, ΔELPD) and standard error (SE) change between the single-survey models and the integrated gear or fully integrated models.
Integration generally reduced uncertainty in the model predictions but there was variation across species and surveys (Fig. 6) and across space (Fig. 7). The largest reductions in uncertainty were for the predictions based on the IPHC and SoG Trawl surveys, but uncertainty also decreased for the other surveys (Table 2). For all surveys, the median reduction in uncertainty was greater for the fully integrated model compared with the integrated gear model. The spatial pattern of uncertainty change from integration was heterogeneous, but in general, integration tended to reduce uncertainty in areas that are not well sampled by the surveys but have minor changes in areas that are well sampled (Fig. 7).
Fig. 5.
Fig. 5. Predictive accuracy difference, as measured by area under the curve (AUC) based on spatial block cross-validation, between the single-survey models and the integrated gear (a) or fully integrated (b) models. Positive values (green) indicate an increase in predictive accuracy in the integrated model over the single-survey model. Each point represents a single species.
Fig. 6.
Fig. 6. Comparison across species of the change in the size of the standard error of the predicted occurrence values in the integrated models and the single-survey models. Each point represents a single species and is the median value of the ratio between the integrated standard error and the single-survey standard error across all 3 km grid cells in the model. Values below 1 (green) represent a decrease in standard error in the integrated models. Panel (a) shows the comparison for the integrated gear model. Panel (b) shows the comparison for the fully integrated model.
Fig. 7.
Fig. 7. Map of the proportional change in the standard error of the species occurrence predictions due to integration in the various surveys. Values are medians across all species within each 3 km grid cell. All standard error ratios are based on the value in the integrated model (i.e., fully integrated or integrated gear) divided by the value from the single-survey model. Values below 1 (blue) represent a proportional reduction in uncertainty in the integrated models; values above 1 (red) represent proportional increases in uncertainty in the integrated models. The two levels of integration are shown in the columns; the four surveys are shown in the rows. Map projection is NAD83/BC Albers and the coastline polygon was produced by the Canadian Hydrographic Service, DFO.
Predicted species richness ranged spatially from 1.2 to 25 with a median value of 15.6 (Fig. 8a). Predicted species richness was generally higher on the outer shelf than in the waters to the east and south of Vancouver Island (e.g., Strait of Georgia, Juan de Fuca Strait, Johnstone Strait; Fig. 1a). The median SE of predicted occurrences across species ranged spatially from 0.67 to 5.1 with a median value of 1.2 (Fig. 8b). These SEs were lowest on the outer shelf where the Outer Trawl survey is conducted and was highest in the inlets, the Strait of Georgia, and the Juan de Fuca Strait, where less data are available.
Fig. 8.
Fig. 8. Map of the predicted number of species in each location (i.e., species richness) (a) and the median standard error of predicted occurrence in link space across all species (b). Predictions are made at a 1 km resolution. Colour scale in panel (b) is log transformed. See our Shiny app (https://msea.science/integrated_gf_shiny/) for an interactive version of panel (a). Map projection is NAD83/BC Albers and the coastline polygon was produced by the Canadian Hydrographic Service, DFO.


Our analyses suggest that integrating data from multiple scientific surveys can provide improved predictions of groundfish species occurrence and biodiversity patterns in Canadian Pacific waters. Integration allowed us to develop models that were informed by data that cover a much broader environmental and spatial range, resulting in model parameters that were consistent with the full range of survey data in the region. Integration resulted in increased predictive accuracy for the majority of species. This increase in accuracy was most often seen when predicting data from the more data-limited surveys (i.e., SoG Trawl and IPHC for some species), whereas integration had negligible impacts on predictive accuracy for the Outer Trawl and HBLL surveys where data are already sufficient in most cases. However, the main benefit of integration was that it resulted in higher model certainty as quantified by the changes in the SE of the predictions. Thus, the integrated model predictions represent a substantial improvement over the single-survey models because they allowed us to make coastwide predictions with higher certainty, while increasing the accuracy of predictions in areas that are not as frequently sampled.
We expected that the integrated models would have increased predictive accuracy over individual models because integration should allow for more accurate estimates of how the environmental covariates shape species’ distributions (Miller et al. 2019). This was likely the case for the many species for which the integrated models had higher AUC when predicting occurrences, particularly in the more data-limited SoG Trawl and IPHC (for some species) surveys. These increases in model accuracy are consistent with improved model performance that integration provided in a model of cusk in the Gulf of Maine (Runnebaum et al. 2018). However, there are also species that showed a decrease in AUC in the integrated models—again primarily in the SoG Trawl and IPHC surveys. These cases likely reflect single-survey models that were overfit to the data and so had higher predictive accuracy but were not consistent with data from the rest of the surveys. However, it is also possible that, for some species, the reduction in AUC from integration could be a signal that the surveys are sampling different subsets of the species that differ in their environmental responses (e.g., Dahlke et al. 2020). For example, if juveniles inhabit nearshore habitats and are sampled by one survey, and adults live in deeper habitats and are sampled by another survey. These surveys are not well suited to catch very young fish and there tends to be high overlap in the length distributions of individuals caught in the HBLL and trawl surveys (Fig. S10). However, lengths do differ for some species, which indicates that the surveys may be catching different life stages. In such cases, integration would result in environmental response curves that are broader than those that would define any individual life stage. Nevertheless, the fact that integration tended to increase the predictive accuracy of the data-limited surveys, while having minimal impact on the predictive accuracy of the data-rich surveys (i.e., HBLL and the Outer Trawl survey), suggests that integration tends to result in models that are as good, or better than the single-survey models.
In contrast to the mixed effects on predictive accuracy, integration had more consistent benefits for reducing model uncertainty. This was expressed both in the fact that integration decreased model uncertainty for most species (Fig. 6), and in most locations on the coast (Fig. 7). This is consistent with the reduction in uncertainty in integrated models of Gulf of Mexico Red Snapper (Grüss and Thorson 2019) or when integrating commercial with survey data (Rufener et al. 2021). Across the diversity of species included in our analysis, these reductions in uncertainty were common. Like with predictive accuracy, the largest changes in uncertainty occurred when making predictions based on the more data-limited SoG Trawl and IPHC (for some species) surveys. This reinforces the conclusion that integrated models tend to have the greatest benefit when data are limited, while having minimal influence in more data-rich scenarios. Nevertheless, there was still considerable spatial variation in the uncertainty of our model predictions. Areas such as the continental shelf that are well sampled had small confidence intervals, but our predictions were much less certain in areas off the continental shelf break, within the Strait of Georgia, the Juan de Fuca Strait, Johnstone Strait, and in the inlets and fjords. This suggests that our understanding of the groundfish community in these areas would benefit greatly from additional sampling effort focused in these areas.
When selecting the final model for estimating species occurrence distributions, we balanced the ability of the models to predict data from the individual surveys with our confidence in using the models to make predictions in areas that are less frequently sampled. This meant that we did not necessarily select the model that most accurately predicted the survey data that it was trained on. Instead, we opted to select the model that integrated across the most surveys possible, provided it had adequate predictive accuracy (i.e., AUC > 0.7). For the species where the fully integrated model was not selected, this tended to be because the species was not caught in sufficient frequency in the other surveys to allow the fully integrated model to be fit. We have less confidence in applying these models to the full spatial extent of the region because more extrapolation is involved compared with when full integration is possible.
A downside of using presence–absence data to integrate different surveys is that our models cannot provide predictions of abundance or biomass. Models that integrate long line and trawl data to predict biomass have been developed, but so far have been limited to single species (e.g., Runnebaum et al. 2018; Webster et al. 2020). Scaling these methods up to make predictions at the scale of the entire groundfish community will require further consideration, especially with respect to length selectivity, but is a promising future direction.
While integrating data from multiple surveys resulted in clear benefits for our predictions of almost all species, this does not suggest that integration will always improve model performance. Integration requires that data sources contain sufficient overlap in environmental and geographic space so that differences in observation probability between data sources can be accurately estimated (Isaac et al. 2020). Therefore, integrating surveys that target distinct geographic areas or environmental conditions will require making additional assumptions about which factors are responsible for the differences in observation probability. In addition, many integrated models couple high-quality survey data covering a limited spatial and temporal extent with lower quality observation data (e.g., from databases such as the Global Biodiversity Information Facility and the Ocean Biodiversity Information System) that cover a much broader extent (Fletcher et al. 2019). While such integration may allow for predictions across a broader extent, these models may sacrifice accuracy compared with models trained only on higher quality data. Nevertheless, simulations have shown that such reductions in accuracy can be avoided or minimized when biases in the data are properly accounted for (Simmonds et al. 2020). Integrating our three fishery-independent surveys was relatively straightforward because they overlap in space, have good spatial balance, and provide presence–absence estimates. However, a wealth of additional data exist in the marine environment from sources such as commercial catch, opportunistic observations, museum collections, and eDNA. Point process models could allow these diverse, but often highly biased, data types to be integrated with fishery-independent surveys, which may provide additional insight into the distribution of marine species (e.g., Rufener et al. 2021; Watson et al. 2021).
Our final selected models provide predictions of species occurrence for 65 groundfish species that can inform ongoing marine spatial planning initiatives in Canadian Pacific waters. Previous models of groundfish distribution and diversity in this region have been informed by data from single-survey types, and thus their predictions have been spatially limited to subsets of the region (Nephin et al. 2020). Here, integration allowed us to make predictions that span nearly the full extent of the continental shelf and slope in Canadian Pacific waters up to a depth of 1400 m. In particular, we provided predictions in areas such as the fjords and inlets and the Strait of Georgia, which are areas where we previously lacked this information (Anderson et al. 2019; Carrasquilla-Henao et al. 2019, Thompson et al. 2022). These models have the potential to identify the diversity of the groundfish community and the specific species that are likely to be present in locations that may be impacted by oil spills or other anthropogenic or environmental impacts and will be used to inform ongoing marine spatial planning initiatives (e.g., PNCIMA 2017; MPA Network 2022). In addition, because our models identify areas of our coast where uncertainty is highest, they can be used to target areas for additional sampling to benefit models in the future. Overall, this analysis demonstrates how existing surveys can be integrated to increase the accuracy, confidence, and spatial extent of biodiversity and species occurrence predictions to support the spatial management of marine ecosystems.


We would like to thank the DFO and IPHC groundfish research survey teams and the DFO and the IPHC for making these data available. We thank Cole Fields, who provided data analysis and processing for the substrate index layers. We thank Robert Skelly for assistance in setting up the Shiny App server. We thank the Salish Sea MEOPAR project team (https://salishsea.eos.ubc.ca/nemo/) for making their ocean model data available. We thank two anonymous reviewers and the associate editor for helpful comments that greatly improved this paper.


Anderson S.C., Keppel E.A., Edwards A.M. 2019. A reproducible data synopsis for over 100 species of British Columbia groundfish. Can. Sci. Advis. Sec. Res. Doc. 041: 321.
Anderson S.C., Ward E.J., English P.A., Barnett L.A.K. 2022. sdmTMB: an R package for fast, flexible, and user-friendly generalized linear mixed effects models with spatial and spatiotemporal random fields. bioRxiv: 2022.03.24.485545.
Bakka H., Vanhatalo J., Illian J.B., Simpson D., Rue H. 2019. Nonstationary Gaussian models with physical barriers. Spat. Stat. 29: 268–288.
Carignan K., Eakins B., Love M., Sutherland M., McLean S. 2013. Bathymetric digital elevation model of British Columbia, Canada: procedures, data sources, and analysis. NOAA National Geophysical Data Center (NGDC), Boulder, CO. Available from https://www.ngdc.noaa.gov.
Carrasquilla-Henao M., Yamanaka K.L., Haggarty D., Juanes F. 2019. Predicting important rockfish (Sebastes spp.) habitat from large-scale longline surveys for southern British Columbia, Canada. Can. J. Fish. Aquat. Sci. 76(5): 682–694.
Cressie N.A.C., Wikle C.K. 2011. Statistics for spatio-temporal data. Wiley, Hoboken, NJ.
Dahlke F.T., Wohlrab S., Butzin M., Pörtner H.-O. 2020. Thermal bottlenecks in the life cycle define climate vulnerability of fish. Science, 369(6499): 65–70.
Davies S.C., Gregr E.J., Lessard J., Bartier P., Willis P. 2019. Coastal digital elevation models integrating ocean bathymetry and land topography for marine ecological analyses in Pacific Canadian waters. Can. Tech. Rep. Fish. Aquat. Sci. 3321: 38.
Doherty B., Benson A.J., Cox S.P. 2019. Data summary and review of the PHMA hard bottom longline survey in British Columbia after the first 10 years (2006–2016). Can. Tech. Rep. Fish. Aquat. Sci. 3276: 86.
Dormann C.F., McPherson M.J., Araújo B.M., Bivand R., Bolliger J., Carl G., 2007. Methods to account for spatial autocorrelation in the analysis of species distributional data: a review. Ecography, 30(5): 609–628.
Edwards A.M., Anderson S.C., Keppel E.A., Grandin C. 2021. gfiphc: data extraction and analysis for groundfish data from the IPHC longline survey in BC. Available from https://github.com/pbs-assess/gfiphc [accessed February 2, 2022].
Fletcher R.J., Hefley T.J., Robertson E.P., Zuckerberg B., McCleery R.A., Dorazio R.M. 2019. A practical guide for combining data to model species distributions. Ecology, 100: e02710.
Gay D.M. 1990. Usage summary for selected optimization routines. At&T Bell Laboratories, Murray Hill, NJ. p. 07974.
Gelman A., Hill J. 2007. Data analysis using regression and multilevel/hierarchical models. In Data analysis using regression and multilevel/hierarchical models. Cambridge University Press, New York, NY. p. 625.
Gregr E.J. 2012. BC_EEZ_100m: a 100 m raster of the Canadian Pacific Exclusive Economic Zone. SciTech Environmental Consulting, Vancouver, BC. Available from https://bcmca.ca/datafeatures/eco_physical_bathymetry/.
Gregr E.J., Haggarty D.R., Davies S.C., Fields C., Lessard J. 2021. Comprehensive marine substrate classification applied to Canada’s Pacific shelf. PLoS ONE, 16(10): e0259156.
Grüss A., Thorson J.T. 2019. Developing spatio-temporal models using multiple data types for evaluating population trends and habitat usage. ICES J. Mar. Sci. 76(6): 1748–1761.
Grüss A., Perryman H.A., Babcock E.A., Sagarese S.R., Thorson J.T., Ainsworth C.H.,. 2018a. Monitoring programs of the U.S Gulf of Mexico: inventory, development and use of a large monitoring database to map fish and invertebrate spatial distributions. Rev. Fish. Biol. Fish. 28(4): 667–691.
Grüss A., Thorson J.T., Babcock E.A., Tarnecki J.H. 2018b. Producing distribution maps for informing ecosystem-based fisheries management using a comprehensive survey database and spatio-temporal models. ICES J. Mar. Sci. 75(1): 158–177.
Grüss A., Pirtle J.L., Thorson J.T., Lindeberg M.R., Neff A.D., Lewis S.G., Essington T.E. 2021. Modeling nearshore fish habitats using Alaska as a regional case study. Fish. Res. 238: 105905.
Hartig F. 2022. DHARMa: residual diagnostics for hierarchical (multi-level/mixed) regression models. Available from https://CRAN.R-project.org/package=DHARMa [accessed January 14, 2022].
Hijmans R.J. 2022. Raster: geographic data analysis and modeling. Available from https://CRAN.R-project.org/package=raster [accessed May 16, 2022].
IPHC. 2021. International Pacific Halibut Commission Fishery-Independent Setline Survey Sampling Manual(2021). IPHC, Seattle, WA.
Isaac N.J.B., Jarzyna M.A., Keil P., Dambly L.I., Boersch-Supan P.H., Browning E.,. 2020. Data integration for large-scale models of species distributions. Trends Ecol. Evol. 35(1): 56–67.
Keppel E.A., Anderson S.C., Edwards A.M., Grandin C. 2022. Gfdata: data extraction for DFO PBS groundfish stocks. Available from https://github.com/pbs-assess/gfdata [accessed February 2, 2022].
King J.R., Surry M.R., Wyeth M.R., Olsen N., Workman G. 2013. Strait of Georgia Groundfish Bottom Trawl Survey, March 14–24, 2012. Can. Tech. Rep. Fish. Aquat. Sci. 3056.
Kristensen K., Nielsen A., Berg C.W., Skaug H., Bell B.M. 2016. TMB: automatic differentiation and Laplace approximation. J. Stat. Softw. 70(5): 1–21.
Kung R. 2021. Canada west coast topo-bathymetric digital elevation model. Available from https://open.canada.ca/data/en/dataset/e6e11b99-f0cc-44f7-f5eb-3b995fb1637e [accessed May 16, 2022].
Kuriyama P.T., Branch T.A., Hicks A.C., Harms J.H., Hamel O.S. 2019. Investigating three sources of bias in hook-and-line surveys: survey design, gear saturation, and multispecies interactions. Can. J. Fish. Aquat. Sci. 76(2): 192–207.
Lindgren F., Rue H. 2015. Bayesian spatial modelling with R-INLA. J. Stat. Softw. 63(19): 1–25.
Lindgren F., Rue H., Lindström J. 2011. An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach: link between Gaussian fields and Gaussian Markov random fields. J. Roy. Stat. Soc.: Series B, Stat. Methodol. 73(4): 423–498.
Lochead J.K., Yamanaka K.L. 2006. Summary report for the inshore rockfish (Sebastes spp.) longline survey conducted in statistical areas 12 and 13, August 24–September 10, 2004. Can. Tech. Rep. Fish. Aquat. Sci. 2627: 77.
Lochead J.K., Yamanaka K.L. 2007. Summary report for the inshore rockfish (Sebastes spp.) longline survey conducted in statistical areas 14 to 20, 28 and 29, from August 11 to September 6, 2005. Can. Tech. Rep. Fish. Aquat. Sci. 2690: 63.
Masson D., Fine I. 2012. Modeling seasonal to interannual ocean variability of coastal British Columbia. J. Geophys. Res. 117(C10): 10019.
Miller D.A.W., Pacifici K., Sanderlin J.S., Reich B.J. 2019. The recent past and promising future for data integration methods to estimate species’ distributions. Methods Ecol. Evol. 10(1): 22–37.
Monnahan C.C., Thorson J.T., Kotwicki S., Lauffenburger N., Ianelli J.N., Punt A.E. 2021. Incorporating vertical distribution in index standardization accounts for spatiotemporal availability to acoustic and bottom trawl gear for semi-pelagic species. ICES J. Mar. Sci. 78(5): 1826–1839.
MPA Network. 2022. MPA Network BC Northern Shelf. Available from http://mpanetwork.ca/bcnorthernshelf/ [accessed 11 March 2022].
Nephin J., Gregr E.J., Germain C.S., Fields C., Finney J.L. 2020. Development of a species distribution modelling framework and its application to twelve species on Canada’s Pacific Coast. Can. Sci. Advis. Sec. Res. Doc. 2020/004: 107. Available from http://www.dfo-mpo.gc.ca/csas-sccs/Publications/ResDocs-DocRech/2020/2020_004-eng.pdf.
Nottingham M.K., Williams D.C., Wyeth M.R., Olsen N. 2018. Summary of the Queen Charlotte Sound Synoptic Bottom Trawl Survey, July 4–August 1, 2017. Can. Manuscr. Rep. Fish. Aquat. Sci. 3050.
Obradovich S.G. 2018. Evaluating key assumptions of a hook-based relative abundance index derived from the catch of bottom longlines. Ph.D. thesis, University of British Columbia. Available from https://doi.org/10.14288/1.0365702.
Ovaskainen O., Abrego N. 2020. Joint species distribution modelling: with applications in R. Cambridge University Press, Cambridge.
Pacifici K., Reich B.J., Miller D.A.W., Gardner B., Stauffer G., Singh S., 2017. Integrating multiple data sources in species distribution modeling: a framework for data fusion. Ecology, 98(3): 840–850.
Pearce J., Ferrier S. 2000. Evaluating the predictive performance of habitat models developed using logistic regression. Ecol. Model. 133(3): 225–245.
Pirtle J.L., Shotwell S.K., Zimmermann M., Reid J.A., Golden N. 2019. Habitat suitability models for groundfish in the Gulf of Alaska. Deep Sea Res. Part II: Topical Stud. Oceanogr. 165:303–321.
PNCIMA. 2017. Pacific North Coast Integrated Management Area Plan. p. 78 [accessed december 15, 2021].
R Core Team. 2021. R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. Available from https://www.R-project.org/.
Robinson O.J., Ruiz Gutierrez V., Reynolds M.D., Golet G.H., Strimas Mackey M., Fink D. 2020. Integrating citizen science data with expert surveys increases accuracy and spatial extent of species distribution models. Divers. Distrib. 26(8): 976–986.
Rue H., Martino S., Chopin N. 2009. Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. J. Roy. Stat. Soc.: Series B, Stat. Methodol. 71(2): 319–392.
Rufener M., Kristensen K., Nielsen J.R., Bastardie F. 2021. Bridging the gap between commercial fisheries and survey data to model the spatiotemporal dynamics of marine species. Ecol. Appl. 31: e02453.
Runnebaum J., Guan L., Cao J., O’Brien L., Chen Y. 2018. Habitat suitability modeling based on a spatiotemporal model: an example for cusk in the Gulf of Maine. Can. J. Fish. Aquat. Sci. 75(11): 1784–1797.
Simmonds E.G., Jarvis S.G., Henrys P.A., Isaac N.J.B., O’Hara R.B. 2020. Is more data always better? A simulation study of benefits and limitations of integrated distribution models. Ecography, 43(10): 1413–1422.
Sinclair A., Schnute J., Haigh R., Starr P., Stanley R., Fargo J., Workman G. 2003. Feasibility of multispecies groundfish bottom trawl surveys on the BC coast. Can. Sci. Advis. Sec. Res. Doc. 049. Available from http://waves-vagues.dfo-mpo.gc.ca/Library/274196.pdf.
Soontiens N., Allen S.E. 2017. Modelling sensitivities to mixing and advection in a sill-basin estuarine system. Ocean Model. 112: 17–32.
Soontiens N., Allen S.E., Latornell D., Souëf K.L., Machuca I., Paquin J.-P., 2016. Storm surges in the Strait of Georgia simulated with a regional model. Atmos.-Ocean, 54(1): 1–21.
Thompson P.L., Anderson S.C., Nephin J., Haagarty D., Peña M.A., English P.A., 2022. Disentangling the impacts of environmental change and commercial fishing on demersal fish biodiversity in a northeast Pacific ecosystem. Mar. Ecol. Prog. Ser. 689: 137–154.
Thompson P.L., Guzman L.M., Meester L.D., Horváth Z., Ptacnik R., Vanschoenwinkel B.,. 2020. A process-based metacommunity framework linking local and regional scale community ecology. Ecol. Lett. 23(9): 1314–1329.
Vehtari A., Gelman A., Gabry J. 2017. Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Stat. Comput. 27(5): 1413–1432.
Walbridge S., Slocum N., Pobuda M., Wright D.J. 2018. Unified geomorphological analysis workflows with Benthic Terrain Modeler. Geosciences, 8(3): 94.
Watson J., Joy R., Tollit D., Thornton S.J., Auger-Méthé M. 2021. Estimating animal utilization distributions from multiple data types: a joint spatiotemporal point process framework. Ann. Appl. Stat. 15(4): 1872–1896.
Webster R.A., Soderlund E., Dykstra C.L., Stewart I.J. 2020. Monitoring change in a dynamic environment: spatiotemporal modelling of calibrated data from different types of fisheries surveys of Pacific halibut. Can. J. Fish. Aquat. Sci. 77(8): 1421–1432.
Wood S.N. 2017. Generalized additive models: an introduction with R. 2nd ed. Chapman and Hall/CRC, Boca Raton, FL.

Supplementary material

Supplementary Material 1 (PDF / 7.88 MB).
Supplementary Material 2 (PDF / 33.2 MB).
Supplementary Material 3 (PDF / 10.7 MB).
Supplementary Material 4 (PDF / 32.9 MB).

Information & Authors


Published In

cover image Canadian Journal of Fisheries and Aquatic Sciences
Canadian Journal of Fisheries and Aquatic Sciences
Volume 80Number 1January 2023
Pages: 195 - 210


Received: 27 May 2022
Accepted: 10 September 2022
Accepted manuscript online: 3 October 2022
Version of record online: 8 December 2022

Data Availability Statement

Data from the trawl and HBLL surveys were obtained from the gfdata package (Keppel et al. 2022). Data from the IPHC surveys were obtained from the gfiphc package (Edwards et al. 2021).
Code used for analysis is available from https://gitlab.com/dfo-msea/integrated-groundfish-sdm.

Key Words

  1. demersal fish
  2. species distribution modelling
  3. integrated model
  4. spatiotemporal random field
  5. species richness



Institute of Ocean Sciences, Fisheries and Oceans Canada, Sidney, BC V8L 5T5, Canada
Department of Zoology, University of British Columbia, Vancouver, BC V6T 1Z4, Canada
Pacific Biological Station, Fisheries and Oceans Canada, Nanaimo, BC V9T 6N7, Canada
Department of Mathematics, Simon Fraser University, Burnaby, BC V5A 1S6, Canada
Jessica Nephin
Institute of Ocean Sciences, Fisheries and Oceans Canada, Sidney, BC V8L 5T5, Canada
Pacific Regional Headquarters, Fisheries and Oceans Canada, Vancouver, BC V6C 3S4, Canada
Pacific Biological Station, Fisheries and Oceans Canada, Nanaimo, BC V9T 6N7, Canada
Institute of Ocean Sciences, Fisheries and Oceans Canada, Sidney, BC V8L 5T5, Canada
Pacific Biological Station, Fisheries and Oceans Canada, Nanaimo, BC V9T 6N7, Canada
Department of Biology, University of Victoria, Victoria, BC V6T 1Z4, Canada
Institute of Ocean Sciences, Fisheries and Oceans Canada, Sidney, BC V8L 5T5, Canada
Department of Forest and Conservation Sciences, University of British Columbia, Vancouver, BC V8W 2Y2, Canada

Author Contributions

Conceptualization: PLT, SCA, CKR, DRH, EMR
Data curation: PLT, JN, BP, DRH
Formal analysis: PLT, SCA
Investigation: PLT, SCA, AEP
Methodology: PLT, SCA, JN, AEP
Project administration: PLT, CKR
Software: SCA
Supervision: EMR
Validation: PLT
Visualization: PLT, BP
Writing – original draft: PLT
Writing – review & editing: SCA, JN, CKR, BP, AEP, DRH, EMR

Competing Interests

The authors declare that there are no competing interests.

Funding Information

This project was funded in part by the Fisheries and Oceans Canada’s Marine Spatial Planning Initiative.

Metrics & Citations


Other Metrics


Cite As

Export Citations

If you have the appropriate software installed, you can download article citation data to the citation manager of your choice. Simply select your manager software from the list below and click Download.

Cited by

1. Integrating disparate survey data in species distribution models demonstrate the need for robust model evaluation
2. Spatially varying catchability for integrating research survey data with other data sources: case studies involving observer samples, industry-cooperative surveys, and predators as samplers
3. Response of Pacific halibut (Hippoglossus stenolepis) to future climate scenarios in the Northeast Pacific Ocean
4. Application of the machine learning method to estimate the biomass of pacific cod in the North Kuril zone

View Options

View options


View PDF

Get Access

Login options

Check if you access through your login credentials or your institution to get full access on this article.


Click on the button below to subscribe to Canadian Journal of Fisheries and Aquatic Sciences

Purchase options

Purchase this article to get full access to it.

Restore your content access

Enter your email address to restore your content access:

Note: This functionality works only for purchases done as a guest. If you already have an account, log in to access the content to which you are entitled.





Share Options


Share the article link

Share on social media

Cookies Notification

We use cookies to improve your website experience. To learn about our use of cookies and how you can manage your cookie settings, please see our Cookie Policy.