The adaptation of marine and anadromous fishes to novel, freshwater environments requires major physiological shifts in functions related to osmoregulation, immunity, and metabolism. For Atlantic salmon (Salmo salar) populations, such changes have occurred independently in many landlocked populations that were formed as a result of extensive hydrological shifts in North America around 10 000 years ago. We compared patterns of genetic variation between two landlocked and one anadromous population of Atlantic salmon to identify loci that may have played an important evolutionary role in facilitating the transition from an anadromous to an entirely freshwater life history. Outlier loci included single nucleotide polymorphisms (SNPs) in genes related to functions including immunity, growth, and osmoregulation. We also used these same populations to characterize loci associated with distinct hatchery rearing environments. This additional comparison identified outlier SNPs annotated to genes related to wound healing, consistent with findings from other genetic studies of domestication selection in fishes. Together, our results highlight putative responses to both natural selection imposed by major environmental changes and artificial selection levied by differing hatchery environments.
L’adaptation des poissons marins et anadromes à de nouveaux milieux d’eau douce nécessite d’importantes modifications physiologiques de fonctions associées à l’osmorégulation, à l’immunité et au métabolisme. En ce qui concerne le saumon atlantique (Salmo salar), de tels changements se sont produits de manière indépendante dans de nombreuses populations intérieures qui se sont formées dans la foulée de vastes changements hydrologiques en Amérique du Nord il y a environ 10 000 ans. Nous avons comparé les motifs de variation génétique de deux populations intérieures et d’une population anadrome de saumons atlantiques afin de cerner les locus qui pourraient avoir joué un rôle important pour faciliter le passage d’un cycle biologique anadrome à un cycle biologique passé entièrement en eau douce. Parmi les locus atypiques figurent des polymorphismes mononucléotidiques (SNPs) dans des gènes reliés à différentes fonctions, dont l’immunité, la croissance et l’osmorégulation. Nous avons utilisé les mêmes populations pour caractériser les locus associés à différents milieux d’élevage en écloserie. Cette comparaison a permis de cerner des SNP atypiques annotés à des gènes reliés à la cicatrisation de plaies, ce qui concorde avec les constatations d’autres études génétiques de la sélection par domestication chez des poissons. Collectivement, nos résultats font ressortir des réactions présumées à la sélection naturelle imposée par d’importants changements environnementaux et à la sélection artificielle modulée par des milieux d’écloserie différents. [Traduit par la Rédaction]
Atlantic salmon (Salmo salar) are a culturally, ecologically, and economically important fish species that historically ranged from New York (USA) to Iceland in North America and from Portugal to northwestern Russia in Europe (MacCrimmon and Gots 1979). Over the past several decades, however, their native range has exhibited a shrinking southern margin, with many of the southernmost populations in North America and Europe now either extirpated or in strong decline (Parrish et al. 1998; Lehnert et al. 2019). These declines have been attributed to myriad threats, including loss of habitat quality and quantity, warming winter temperatures, increasing marine mortality rates, and introduced pathogens and parasites (Chaput 2012; Forseth et al. 2017; Lehnert et al. 2019). Furthermore, declines in population sizes have coincided with decreases in effective population sizes (Ne) and genome-wide changes in patterns of adaptive variation (Lehnert et al. 2019). These genetic losses may be exacerbated by introgression of maladaptive alleles from escaped farmed salmon or hatchery-reared fish into remaining wild populations (Bourret et al. 2011; Ozerov et al. 2016; Karlsson et al. 2016; Wringe et al. 2018). In contrast to these losses of genetic diversity, Atlantic salmon populations continue to maintain a wide variety of life history and migratory strategies, with numerous anadromous and landlocked populations distributed throughout their range (Hutchings et al. 2019).
In North America, landlocked populations of salmon formed around 10 000 years ago, coinciding with the end of the Last Glacial Maximum and the gradual rise of land masses previously depressed by the weight of glaciers (Yesner et al. 1984; Lumme et al. 2016). In many instances, this uprising disrupted hydrological connections between freshwater lakes and the ocean. Affected lake populations of salmon became physically isolated from their anadromous progenitor populations and began migrating between freshwater lakes and rivers, rather than between marine and freshwater environments, to complete their life cycles. Adaptation to this new set of environments required shifts in a broad suite of functions, including osmoregulatory, immunological, behavioral, and metabolic traits, as has been described for other fish species with populations secondarily adapted to fresh water (Deagle et al. 2013; Kozak et al. 2014; Willoughby et al. 2018). Despite these adaptations, Atlantic salmon populations are declining throughout their native range, with many landlocked and anadromous populations supported by hatchery supplementation (Parrish et al. 1998; Naish et al. 2007; Lehnert et al. 2019).
Hatchery programs are often used to support imperiled fish populations — particularly for salmonids — either via the restoration of previously extirpated populations or by supplementing declining populations. Initiation of hatchery rearing can result in genetic bottlenecks, increasing the effects of genetic drift and decreasing heterozygosity and Ne in subsequent hatchery-supplemented generations (Ryman and Laikre 1991; Christie et al. 2012). Overall, the effects of increased genetic drift are predicted to reduce genetic diversity throughout the genome due to the random sampling of small numbers of individuals for breeding. More targeted reductions in genetic diversity can result from new agents of selection imposed by hatchery rearing practices, often called domestication selection, affecting genes involved in metabolism, behavior, immunity, and wound healing (Christie et al. 2016; Naval-Sanchez et al. 2020; Jin et al. 2020). Both genetic drift and domestication selection can reduce genetic variation at loci under selection in the natural environment and can diminish individual and population fitness after release into the wild (Araki et al. 2007; Milot et al. 2013; Willoughby and Christie 2019).
To identify candidate loci associated with adaptation to an entirely freshwater environment, we examined genome-wide patterns of genetic variation for landlocked and anadromous populations of Atlantic salmon. Two landlocked populations, Lake Champlain and Sebago Lake, were compared against the anadromous LaHave River population (Fig. 1), and we examined 23 326 SNPs from RNA-seq data to identify outlier loci associated with differences between landlocked and anadromous life history strategies. We used the same populations to identify candidate loci that have responded to differences in selection between distinct hatchery rearing environments. Individuals from the Sebago Lake and LaHave River populations were maintained in the same hatchery facility prior to sampling, whereas individuals from Lake Champlain were reared in a different hatchery. To determine whether distinct hatchery rearing environments may have led to detectable patterns of selection, we compared Lake Champlain to both Sebago Lake and LaHave River and identified outlier loci shared by the two comparisons. We detect putative signals of selection across the three populations with respect to both life history strategy and hatchery rearing environment, and we highlight strongly differentiated SNPs in genes with physiological functions relevant to freshwater adaptation and hatchery-imposed selection. Our results provide further support for the hypothesis that genetic adaptation to the hatchery environment can occur over brief timescales and demonstrate that associated genetic patterns can be shared among populations with disparate life history strategies. Over broader evolutionary timescales, we find that adaptation to an entirely freshwater life cycle has involved changes in genes related to immune, osmoregulatory, and metabolic functions, consistent with previous investigations of adaptive differences between freshwater and marine fish populations.
In Lake Champlain (Vermont, New York, Quebec; Fig. 1A), landlocked Atlantic salmon were extirpated in the mid-1800s, and reintroduction efforts began with sustained stocking in the 1970s (Marsden and Langdon 2012). Despite these efforts, there has been little to no documented natural reproduction, although 123 natural-origin young-of-the-year were observed in 2016 — the first in the lake since the late 1800s (Prévost et al. 2020). Individuals from potential source populations have been experimentally introduced to the lake by New York and Vermont state agencies and the US Fish and Wildlife Service, including landlocked salmon sourced from Sebago Lake (Maine), Lake Memphremagog (Vermont, Quebec), Little Clear Lake (New York), and West Grand Lake (Maine). Of these sources, the Vermont Department of Fish and Wildlife determined that the broodstock developed from Sebago Lake individuals exhibited the strongest returns and continued introducing them to Lake Champlain (Dimond and Smitka 2005). Thus, the direct descendants of wild Sebago Lake Atlantic salmon therefore likely comprise a large portion of the Lake Champlain gene pool. Because of the complex supplementation history in Lake Champlain, it is difficult to estimate the number of generations of captive ancestry for the individuals sampled in this study.
Landlocked Atlantic salmon in Sebago Lake (Maine) comprise one of only four native Atlantic salmon populations remaining in Maine, with 62% of the population designated as wild-born in 2013 (Pellerin and Pierce 2015). The Sebago Lake individuals used in this study are descended from fertilized eggs of natural-origin Sebago Lake salmon received by the Ontario Ministry of Natural Resources and Forestry’s Harwood Fish Culture Station (FCS) in 2006 (Houde 2015). Specifically, single-pair matings were conducted in November 2011 at Harwood Fish Culture Station (Harwood, Ontario), and the sequenced individuals were second-generation hatchery fish (He et al. 2017, 2018).
Fertilized eggs from the anadromous LaHave River (Nova Scotia) population were received by Normandale FCS from 1989 to 1995, and these eggs were sourced from a naturally reproducing population (Houde 2015). Fish were maintained at Normandale FCS from 1996 until 2006 when they were moved to Harwood FCS. Individuals sequenced by He et al. (2018) were produced in November 2011 at Harwood FCS and were fourth-generation hatchery fish (2–3 generations at Normandale, followed by 1–2 generations at Harwood; He et al. 2017).
Sampling and sequenced individuals
Thirty-six Lake Champlain Atlantic salmon were sampled in February 2018 at 95 days post-fertilization and comprise four offspring from each of nine single-pair matings conducted in November 2017. Whole fry were sampled and sequenced as described in Harder et al. (2020). Briefly, total RNA was extracted using an RNeasy kit (Qiagen), separate libraries were prepared for all individuals using poly(A) enrichment followed by the TruSeq Stranded mRNA protocol (Illumina), and each cDNA library was sequenced at the Purdue Genomics Core on an Illumina NovaSeq 6000.
Sebago Lake and LaHave River fish were produced from crosses in fall 2011 and were sampled in July 2013 by He et al. (2018). RNA was extracted from spleen tissue of 20 subadult Atlantic salmon from each of the two populations, and poly(A) enrichment was applied to the samples. For each of these populations, equal concentrations of mRNA from four individuals were combined into five pools (n = 20 individuals per population) and sequenced on an Illumina HiSeq 2000. We obtained these reads from the NCBI Sequence Read Archive (BioProject PRJNA335629).
We next aligned our cleaned reads to the annotated Atlantic salmon reference genome (S. salar ICSASG_v2 assembly, NCBI accession GCA_000233375.4; Lien et al. 2016) using HISAT2 v2.1.0 (Kim et al. 2015) with default settings. The BAM files of mapped reads were sorted using SAMtools v1.8 (Li et al. 2009), and single nucleotide polymorphisms (SNPs) were called using GATK v3.8.0 and v4 (McKenna et al. 2010). We used the “HaplotypeCaller” algorithm in Genomic Variant Call Format (GVCF) mode and called genotypes jointly with “GenotypeGVCFs” as recommended for RNA-seq data (Brouard et al. 2019). Due to high memory requirements for this step, we called genotypes across all individuals for one chromosome at a time, resulting in one VCF file for each of the 29 nuclear chromosomes (i.e., excluding scaffold and mitochondrial sequences). “VariantFiltration” was applied to these chromosome-level VCF files to flag SNPs for later removal that occurred in clusters (≥3 SNPs in a 35-bp window), exhibited strand bias (FisherStrand values > 30), or had low call confidence (QualByDepth < 2). Finally, chromosome-level VCF files were combined into one VCF file for all individuals and chromosomes using the “concat” command in BCFtools (Li et al. 2009). The resulting file was imported in R v3.6.2 (R Core Team 2019) using the “vcfR” package (Knaus and Grünwald 2017) for application of filtering criteria. We removed indels, loci with >2 alleles, and SNPs that did not pass the GATK filters described above. We retained genotype calls supported by a read depth of ≥5 per individual (Brouard et al. 2019) and required genotype calls for ≥20 individuals per locus to match the number of individuals sampled from Sebago Lake and LaHave River. After filtering, we retained genotypes for 231 819 loci in the Lake Champlain individuals.
Sebago Lake and LaHave River variant calling and filtering
Cleaned reads for Sebago Lake and LaHave River samples were aligned to the Atlantic salmon reference genome using HISAT2 as described above. The resulting BAM files (1 file per pool; n = 10) were sorted and SAMtools was used to generate a pileup file for each pool. Pileup files were converted to sync files using PoPoolation2 (Kofler et al. 2011), requiring a minimum base quality of 20. Loci were filtered such that all loci in the final sync file had read depths ≥1 in all 10 pools, leaving 271 692 SNPs for further filtering.
We read this sync file into R using the “poolfstat” package (Hivert et al. 2018) and applied a more stringent read depth requirement, keeping loci with read depth ≥8 in at least 4 pools per population. Therefore, we required a minimum of 32× coverage per locus and population. To prevent any single pool within a population from having an outsized effect on determining population allele frequencies, we calculated the coefficient of variation (CV) in read depth for each locus and intended to remove loci with a CV greater than 3 standard deviations from the mean calculated for all 10 pools. However, application of the read depth filter described above removed all loci with a CV greater than our cutoff value.
Calculating allele frequencies and estimating population genetic parameters
Genotypes for Lake Champlain individuals were used to calculate population allele frequencies, which were used to calculate pooled FST and to identify loci putatively under selection (see below). For Sebago Lake and LaHave River, we estimated population allele frequencies by first calculating allele frequencies within each pool (reference allele depth/total read depth at a locus) and then averaging these frequencies across pools within each population. By calculating pool frequencies first, rather than summing read depths across all pools within a population, we attempted to mitigate the possibility of one pool within a population having an outsized impact on population allele frequency due to interpool variability in gene expression and, subsequently, read depth (see online Supplementary Material1, “Pool and population allele frequency calculations”). Because “poolfstat” randomly designates an observed allele as the reference allele, we checked all reference allele identities in the Sebago Lake – LaHave River data and corrected these assignments to match allele identities in the published reference genome and in the Lake Champlain data. We combined the calculated allele frequencies for all three populations into a single data set and only retained loci that were polymorphic in at least one population that had sufficient coverage in all three populations and with minor allele frequencies > 1/76 (i.e., removing doubletons). Furthermore, for all loci with alleles with minor allele frequencies between 1/76 and 4/76 (0.013 to 0.053; n = 3,148), we examined read depth for all minor alleles in Sebago Lake and LaHave River to ensure that they were not driven by only a handful of reads.
To visualize relationships among the populations, we first replicated the structure of the Sebago Lake – LaHave River data for the nonpooled Lake Champlain population by randomly assigning individuals to one of nine pools (replicating four individuals per pool) and calculating pooled allele frequencies using the same approach as for the Sebago Lake – LaHave River populations (i.e., reference allele depth for the pool/total read depth at a locus). Using allele frequencies for the nine bioinformatically pooled Lake Champlain samples and the ten physically pooled Sebago Lake – LaHave River samples, we estimated pairwise pooled FST among all 19 pools, substituting number of individuals per pool for read depth at each locus because read depth always exceeded number of individuals sampled. The script we used to calculate FST is part of the GOPOPS suite of tools for analyzing pooled genetic data and implements Weir and Cockerham’s (1984) estimator modified for pooled data (https://github.com/jacobtennessen/GOPOPS/blob/master/FstFromPooledFreqs.pl) (Willoughby et al. 2018). We supplied the resulting matrix of pairwise FST values to the “hclust” function in R to perform hierarchical clustering using the unweighted pair group method with arithmetic mean (UPGMA) approach and plotted the resulting dendrogram.
To estimate genetic diversity within each population, we calculated pooled heterozygosity (HP) at each locus as HP = 2 × ∑nMAJ × ∑nMIN/(∑nMAJ + ∑nMIN)2, where ∑nMAJ and ∑nMIN are the sums of the major and minor allele counts at a locus (Rubin et al. 2010). For pooled genomic data, HP is typically calculated in windows, summing major and minor allele counts across each window. However, because RNA-seq reads produce uneven read depths across the genome due to variation in transcript expression levels, we calculated HP for each locus separately. We then averaged locus-specific HP over each chromosome and across the whole genome. To statistically test for differences in HP between populations, we used a randomization test. Specifically, we pooled all chromosome-specific HP values for the two populations being compared and randomly drew 29 values without replacement (i.e., the number of nuclear chromosomes) and calculated the mean value. We repeated this process 10 000 times per population comparison and compared the distribution of sample mean values against the known mean chromosome-specific HP values for the two populations being compared. Because the Lake Champlain individuals comprise nine sets of four full siblings, we also evaluated the effect of relatedness among individuals on HP by sampling one individual per family (n = 9), calculating whole-genome HP as described above, repeating this process 1000 times, and calculating the mean and standard deviation of HP across all subsamples.
To identify genetic differentiation among populations, we estimated FST from population allele frequencies (averaged across pools for Sebago Lake and LaHave River) as described above. For the Sebago Lake – LaHave River comparison, we also used the “computeFST” function in “poolfstat” to calculate FST across loci and compared the results against those produced by the GOPOPS approach. To estimate genome-wide differentiation among populations, we averaged FST across all loci for each population comparison. We also examined the effects of relatedness among Lake Champlain individuals on the estimation of pooled FST by again randomly sampling one individual per family (n = 9) and calculating allele frequencies for this subsample. We repeated this process 10 000 times, each time calculating the absolute difference between FST values for the full and subsampled datasets. To examine any potential effects of relatedness, we determined the mean difference in FST between the full and subsampled data sets at each locus.
Identifying SNPs putatively under selection
To identify SNPs putatively under selection across the three populations, we used a k-nearest neighbor (kNN) approach. Implemented in the R “PopGenome” package (Pfeifer et al. 2014) and relying on ELKI software (Schubert and Zimek 2019), this method uses pairwise FST values from all population comparisons to define the locations of all loci in multidimensional space (Pfeifer et al. 2020). Loci in putative neutrally evolving genomic regions cluster together in this space and loci identified as global outliers represent candidate SNPs subject to selection. We used the “kNN_tau_window” function to determine the appropriate value for k and performed the kNN scan with the “kNNCallElkiSINGLE” function. We applied a 0.01 quantile to identify SNPs putatively under selection. Separately, we also applied a Z-transformation to pairwise FST values and only retained loci for which FST was ≥3 standard deviations from the mean (i.e., Z(FST) ≥ 3) in at least one pairwise comparison between two of the three populations. Loci were considered to be outliers if they were identified as putatively under selection by the kNN approach and if Z(FST) ≥ 3 in at least one comparison. To assess the sensitivity of outlier identification to inter-pool variability in the Sebago Lake and LaHave River samples, we also conducted a leave-one-pool-out approach (see online Supplementary Material1, “Assessing effects of inter-pool variability on population allele frequencies”).
Outlier SNPs were annotated and features were identified using the reference genome and SnpEff v4.3 (Cingolani et al. 2012). We retained all error-free annotations comprising unique SNP × gene combinations, excluding SNPs mapped to pseudogenes. Annotations were categorized as coding (missense variants), regulatory (synonymous, intron, and UTR variants), or distant (within 5 kb of a gene, but not located within an annotated transcribed sequence). We categorized synonymous SNPs as “regulatory” rather than “coding” because while synonymous SNPs do not affect the amino acid sequence of a protein, they may have consequences for translation and post-translational processes (reviewed in Shabalina et al. 2013). For SNPs associated with uncharacterized features in the reference annotation, we searched translated sequences against the NCBI database of nonredundant protein sequences using “blastx” and recorded the top characterized result for each feature (Altschul et al. 1997).
Finally, we partitioned the outlier SNPs by considering patterns of differentiation across multiple population comparisons. Sebago Lake and LaHave River fish had been reared in the same hatchery facility for >1 generation prior to sampling, and shared differences between these two populations and Lake Champlain may be due to differences in hatchery practices. To identify outlier SNPs that may be associated with the effects of hatchery rearing practices, we filtered loci with Z(FST) ≥ 3 in both the Lake Champlain – Sebago Lake and Lake Champlain – LaHave River comparisons. We used a similar approach to identify SNPs putatively associated with differences in life history strategy (i.e., anadromous LaHave River population vs. landlocked Lake Champlain and Sebago Lake populations) by requiring Z(FST) ≥ 3 in both the Lake Champlain – LaHave River and Sebago Lake – LaHave River comparisons. Differences among the populations in tissue type (whole fry vs. spleen) and developmental stage sampled (fry vs. subadult) likely limited the number of transcripts that could be detected in all three populations and, therefore, these SNPs likely represent a subset of loci involved in freshwater and hatchery adaptation.
Sequence processing and variant filtering
Our final sets of cleaned reads comprised an average of 39 348 214 read pairs per Lake Champlain individual and an average of 20 377 740 read pairs per pool for the Sebago Lake – LaHave River data sets. After all SNP filtering steps, we retained 234 012 SNPs for Lake Champlain and 135 839 SNPs for Sebago Lake – LaHave River. A total of 23 326 SNPs were shared among all three data sets and these loci were used for all downstream analyses. For Sebago Lake – LaHave River, the majority of SNPs in this final set had at least 8× coverage in each of the five pools per population (i.e., minimum 40× coverage) (83.7% and 82.8% of SNPs in Sebago Lake and LaHave River, respectively). Additionally, mean read depth for minor alleles with frequencies between 1/76 and 4/76 was 11.77 (SD = 22.50) in Sebago Lake and 16.30 (SD = 31.35) for LaHave River. Only 833 and 633 minor alleles had a total depth of ≤5 reads in Sebago Lake and LaHave River, respectively, and were not at any loci later identified as outliers.
Relationships among populations and genetic diversity
The dendrogram constructed using allele frequencies for all three populations illustrates that the landlocked Lake Champlain and Sebago Lake populations are more closely related to each other than either population is to the anadromous LaHave River population (Fig. 1B). Estimates of genome-wide genetic differentiation are consistent with this pattern, with the lowest pairwise FST estimate for the Lake Champlain – Sebago Lake comparison (FST = 0.03) and higher estimates for Lake Champlain – LaHave River (FST = 0.13) and Sebago Lake – LaHave River (FST = 0.12). For the Sebago Lake – LaHave River comparison, FST values calculated using the GOPOPS approach and the “computeFST” function were identical across loci, and we used the FST estimates generated by the GOPOPS script for all downstream analyses. Genome-wide pooled heterozygosity (HP) was highest in the Lake Champlain population (HP = 0.278), followed by Sebago Lake (HP = 0.255) and LaHave River (HP = 0.247) (Fig. 1C). Heterozygosity was significantly higher in Lake Champlain than in both Sebago Lake and LaHave River (p < 0.0001), which did not significantly differ from one another (p > 0.05; Fig. S11).
Relatedness among Lake Champlain individuals had minimal impact on estimation of HP, with a mean HP value for family-sampling permutations of 0.270 (SD = 0.001; Fig. S21). We also found relatedness to have little effect on estimation of FST. For the Lake Champlain – Sebago Lake population comparison, the mean difference between FST estimated for nine and 36 individuals was ≤0.05 for 78% of loci (median value = 0.006) and was ≤0.05 for 49% of loci for the Lake Champlain – LaHave River comparison (median value = 0.053; Fig. S31).
Outlier SNPs putatively under selection
In total, the kNN approach identified 232 SNPs putatively under selection, and there were 1020 SNPs with Z(FST) ≥ 3 in at least one population comparison. The intersection of these two lists of SNPs comprises 209 SNPs and represents the final set of outlier loci (Supplementary Table S11). The Lake Champlain – Sebago Lake comparison included the fewest outlier SNPs (37 SNPs; Fig. 2A), with similar numbers of outlier SNPs for the Sebago Lake – LaHave River and Lake Champlain – LaHave River comparisons (145 and 160 SNPs, respectively; Figs. 2B, 2C). We found that interpool variability minimally affected the identity of detected outlier SNPs, with the vast majority of outliers retained in all analyses of subsampled pools (see Supplementary Material1, “Assessing effects of inter-pool variability on population allele frequencies”).
Outlier SNP annotations
Of the 209 outlier SNPs, 190 were successfully annotated to functional protein-coding features in the reference genome. A total of 77 SNPs were annotated to more than one feature (for example, a SNP can occur in the 3′ UTR of a gene and also within 5 kb of another gene), resulting in a total of 290 unique annotations (Supplementary Table S21). Three genes contained multiple nonsynonymous missense outlier variants; one gene is uncharacterized in the reference annotation (but its best “blastx” match was gypsy retrotransposon integrase 1-like protein (Clarias magur); all “blastx” results are provided in Supplementary Table S31), and the other two genes play roles in transcription, including a transcriptional repressor known as zinc finger and BTB domain-containing protein 18-like (LOC106610379; Fig. 3) and eukaryotic translation initiation factor 5B-like (LOC106574637).
We identified 21 SNPs as outliers in both the Lake Champlain – Sebago Lake and Lake Champlain – LaHave River comparisons, and these loci may be associated with differences in hatchery rearing environments (i.e., putative hatchery-associated SNPs). Error-free annotations were produced for 16/21 of hatchery-associated SNPs (Supplementary Table S41), with 13 SNPs annotated to transcribed sequences and four of these SNPs were identified as missense variants. We also identified 108 SNPs that were outliers in both the Lake Champlain – LaHave River and Sebago Lake – LaHave River comparisons and that may be associated with adaptation to fresh water in Lake Champlain and Sebago Lake (i.e., potential life history-associated SNPs). For life history-associated SNPs, 99/108 were annotated (Supplementary Table S51). Eleven of these variants were classified as missense, including one SNP in a gene involved in osmoregulation (Fig. 4).
Of the three populations compared in this study, we found the closest genetic relationship between Lake Champlain and Sebago Lake individuals. This result likely reflects the inclusion of Sebago Lake salmon in the Lake Champlain breeding program. In addition to receiving genetic input from Sebago Lake individuals, the Lake Champlain population has also been supplemented by other source populations (Dimond and Smitka 2005). Although salmon originating from Sebago Lake were deemed most successful in Lake Champlain by management agencies, the high genetic diversity (i.e., HP) in Lake Champlain relative to both Sebago Lake and the LaHave River suggests that fish sourced from other populations may have also been reproductively successful in the Lake Champlain system and genetically contributed to the contemporary population. Alternatively, elevated HP in Lake Champlain may be due to differences in breeding designs or variance in family sizes between Lake Champlain hatcheries and hatcheries that have propagated Sebago Lake and LaHave River individuals (i.e., the Normandale and Harwood Fish Culture Stations). Disentangling these hypotheses would require a more extensive genetic survey of hatchery broodstock through time and the identification of all sources for Lake Champlain reintroduction efforts.
The history of shared ancestry between Lake Champlain and Sebago Lake also likely explains the relatively low number of outlier SNPs shared between these two populations, but this trend may also be influenced by environmental similarities between the two populations. Both Lake Champlain and Sebago Lake Atlantic salmon are landlocked (i.e., no fish migrate to the ocean), and both lakes also contain prey species with high concentrations of thiaminase in their tissues (Dimond and Smitka 2005). Thiaminase is an enzyme that degrades thiamine (vitamin B1), and consumption of this enzyme has been linked to thiamine deficiency in salmonids (reviewed in Harder et al. 2018). Relative to Sebago Lake salmon, LaHave River fish (with a natural diet lacking thiaminase) have been shown to experience greater decreases in liver tissue thiamine concentrations when fed an experimental diet containing thiaminase, suggesting that the Sebago Lake population may have genetically adapted to dietary thiaminase (Houde et al. 2015). The genetic variation associated with this putative adaptation may also be advantageous in Lake Champlain, and evidence suggests that the genetic variation required for adaptation to low thiamine availability may already be present in the Lake Champlain population (Harder et al. 2020).
Of the 37 outlier SNPs identified between Lake Champlain and Sebago Lake, 21 are also outliers in the Lake Champlain – LaHave River comparison. From 2006 until sampling in 2011, the Sebago Lake and LaHave River fish were reared in the same hatchery (Houde 2015), and shared differences between these two populations and Lake Champlain may reflect parallel patterns of hatchery-imposed selection. Responses to selection in hatcheries can occur over short timescales (e.g., a single generation; Christie et al. 2016) and may also vary among specific rearing strategies (Berejikian et al. 2000; Venney et al. 2021). We identified putative hatchery-associated SNPs annotated to genes with roles in transcription, protein folding, and intracellular signaling, among other functions. Two SNPs are annotated to genes important for blood platelet formation (transcription factor NF-E2 45 kDA subunit-like; LOC106572584) and epithelial cell adhesion (chloride intracellular channel 4; CLIC4) (Shivdasani et al. 1995; Argenzio et al. 2014). Specifically, CLIC4 knockdown and suppression experiments have demonstrated its critical roles in skin and corneal wound healing (Padmakumar et al. 2012), and expression of NF-E2 is required for platelet production by megakaryocytes (Shivdasani et al. 1995). Differentiation at these genes is consistent with previous findings that the earliest stages of domestication in fishes can lead to new patterns of gene expression in wound healing pathways (Christie et al. 2016; Konstantinidis et al. 2020).
We used a second set of comparisons to identify loci that may be associated with life history by comparing the landlocked Lake Champlain and Sebago Lake populations with the anadromous LaHave River population. For putative life history-associated SNPs located in regulatory regions, 77 SNPs were annotated to 89 unique gene sequences (Supplementary Table S51). Of these genes, one was previously identified as differentially expressed in response to supplemental thiamine treatment and with respect to family-level survival in the face of thiamine deficiency in Lake Champlain Atlantic salmon by Harder et al. (2020) (Supplementary Fig. S41). This gene, suppressor of cytokine signaling 2-like (LOC106576177; SOCS2), is a negative regulator of growth hormone signaling and may also be important in adaptation to salinity (Greenhalgh et al. 2005; Komoroske et al. 2016; Dalongeville 2018). In anadromous Arctic char (Salvelinus alpinus), SOCS2 is strongly upregulated in response to fasting, indicating a role for this gene in maintenance of energy homeostasis in another migratory fish species (Jørgensen et al. 2013). The expression patterns of SOCS2 associated with thiamine deficiency in Lake Champlain coupled with strong differences in allele frequencies between the two lake populations and the LaHave River suggest that differences in diet and ecology in the two types of environments may have led to adaptive differences.
Twelve additional life history-associated SNPs were missense variants annotated to genes with roles in intracellular signaling, translation, and immune function, including one MHC class I gene (major histocompatibility complex class I-related gene protein-like; LOC106588402). Of the genes involved in intracellular signaling, one encodes adenylate cyclase type 6-like (ADCY6; LOC106571946) — a protein that promotes increased water reabsorption from urine in kidney collecting ducts (Konno et al. 2010; Jung and Kwon 2016; Olesen and Fenton 2017) (Fig. 4) Specifically, activated ADCY6 initiates a signaling cascade that leads to insertion of water channel proteins (aquaporin-2) into collecting duct cell membranes. This allows for increased water reabsorption from urine, a process critical for maintenance of tissue water homeostasis (Jung and Kwon 2016). Variation in the coding sequence of the ADCY6 gene between the two landlocked populations and the LaHave River population is likely driven by different osmotic demands in freshwater and marine environments, as has been described for other osmoregulatory genes in fish populations secondarily adapted to entirely freshwater life cycles (Deagle et al. 2013; Kozak et al. 2014; Willoughby et al. 2018).
When considering all outlier SNPs (i.e., with Z(FST) ≥ 3 in at least one population comparison), only three genes contained multiple missense variants. One of these genes is zinc finger and BTB domain-containing protein 18-like (LOC106610379; ZBTB18). Three missense SNPs located in exons of ZBTB18 are Z(FST) outliers in the Lake Champlain – LaHave River comparison (Figs. 3A, 3B). Substantial allele frequency shifts were also detected between Sebago Lake and LaHave River at these three missense SNPs (mean FST = 0.77, but they were not identified as Z(FST) outliers (Figs. 3B, 3C). ZBTB18 is a transcriptional repressor involved in a wide variety of functions including brain and muscle development (Yokoyama et al. 2009; Okado 2019) and is also differentially expressed in response to hyperosmotic conditions in cell culture (Boyd et al. 2005; Ni et al. 2017). Differentiation at multiple SNPs within this gene between landlocked and anadromous populations may be related to differences in growth and development (e.g., age and size at maturity; Riley and Power 1987; Hutchings et al. 2019) or to differences in salinity between the two environments.
It is important to note that differences among the populations in tissue type (whole fry vs. spleen) and developmental stage sampled (fry vs. subadult) likely limited the number of loci that could be detected in all three populations (i.e., locus-specific transcripts upregulated in one study may have been absent in the other, and vice versa). However, because our analyses focused on SNPs that were shared among all three populations, and individual SNPs were only called from reads that aligned to the exact same genomic position, there should not be any bias related to tissue type or stage in the allele frequency estimates of the loci that we did use in this study. Nevertheless, we may be missing additional SNPs important to freshwater and hatchery adaptation, and future studies should use similar experimental designs (e.g., matching tissue types and developmental stage) to maximize the number of shared SNPs for outlier analyses.
By comparing landlocked and anadromous Atlantic salmon populations, we identified outlier SNPs in genes related to immune, osmoregulatory, and metabolic functions, consistent with previous investigations of adaptive differences between freshwater and marine fish populations. In addition to SNPs that may be subject to selection in the natural environment, we also identified outlier loci putatively associated with hatchery rearing conditions by comparing Lake Champlain against Sebago Lake and LaHave River. Some of these SNPs were annotated to genes critical for proper wound healing, similar to findings of other genetic studies examining domestication selection in fishes. Together, these two sets of results highlight putative responses to both natural selection imposed by major environmental changes and artificial selection imposed by differing hatchery environments. Our findings reinforce the importance of minimizing the number of generations spent in captivity, particularly for species already experiencing range-wide declines in population sizes and genetic diversity, and highlight the utility of genomic data for answering questions in both evolutionary biology and fisheries management.
The authors declare there are no competing interests.
AMH and MRC designed the project, AMH analyzed the data, and AMH and MRC wrote the paper. Both authors approved the final manuscript.
We thank T. Chairvolotti, T. Jones and K. Kelsey of the Vermont Fish and Wildlife Department and W. Ardren, H. Bouchard, P. Boynton, S. Frost, T. Jones, E. Lehnert, W. Olmstead, N. Staats, W. Wayman and D. Wong of the USFWS for coordination of Lake Champlain spawning and gamete collection events. We also thank B. Neff, A. Houde, and X. He for sharing information regarding the hatchery histories of the Sebago Lake and LaHave River populations. Additionally, we thank the Purdue Genomics Core for their sequencing efforts and members of the Christie laboratory group for constructive comments on an earlier draft of this manuscript. This research was supported by the Waser Graduate Research Assistantship in Ecology and Evolutionary Biology (Purdue University) awarded to AMH and by funding from the Purdue Department of Biological Sciences to MRC.
Altschul S.F., Madden T.L., Schäffer A.A., Zhang J., Zhang Z., Miller W., and Lipman D.J. 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25(17): 3389–3402.
Berejikian B.A., Tezak E.P., Flagg T.A., LaRae A.L., Kummerow E., and Mahnken C.V. 2000. Social dominance, growth, and habitat use of age-0 steelhead (Oncorhynchus mykiss) grown in enriched and conventional hatchery rearing environments. Can. J. Fish. Aquat. Sci. 57(3): 628–636.
Bourret V., O'Reilly P.T., Carr J.W., Berg P.R., and Bernatchez L. 2011. Temporal change in genetic integrity suggests loss of local adaptation in a wild Atlantic salmon (Salmo salar) population following introgression by farmed escapees. Heredity, 106(3): 500–510.
Cingolani P., Platts A., Wang L.L., Coon M., Nguyen T., Wang L., et al. 2012. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly, 6(2): 80–92.
Dalongeville A., Benestan L., Mouillot D., Lobreaux S., and Manel S. 2018. Combining six genome scan methods to detect candidate genes to salinity in the Mediterranean striped red mullet (Mullus surmuletus). BMC Genomics, 19: 217.
Deagle B.E., Jones F.C., Absher D.M., Kingsley D.M., and Reimchen T.E. 2013. Phylogeography and adaptation genetics of stickleback from the Haida Gwaii archipelago revealed using genome-wide single nucleotide polymorphism genotyping. Mol. Ecol. 22(7): 1917–1932.
Greenhalgh C.J., Rico-Bautista E., Lorentzon M., Thaus A.L., Morgan P.O., Willson T.A., et al. 2005. SOCS2 negatively regulates growth hormone action in vitro and in vivo. J. Clin. Invest. 115(2): 397–406.
Hutchings J.A., Ardren W.R., Barlaup B.T., Bergman E., Clarke K.D., Greenberg L.A., et al. 2019. Life-history variability and conservation status of landlocked Atlantic salmon: an overview. Can. J. Fish. Aquat. Sci. 76(10): 1697–1708.
Jin Y., Olsen R.E., Harvey T.N., Østensen M., Li K., Santi N., et al. 2020. Comparative transcriptomics reveals domestication‐associated features of Atlantic salmon lipid metabolism. Mol. Ecol. 29(10): 1860–1872.
Jørgensen E.H., Martinsen M., Strøm V., Hansen K.E.R., Ravuri C.S., Gong N., and Jobling M. 2013. Long-term fasting in the anadromous Arctic charr is associated with downregulation of metabolic enzyme activity and upregulation of leptin A1 and SOCS expression in the liver. J. Exp. Biol. 216(17): 3222–3230.
Karlsson S., Diserud O.H., Fiske P., Hindar K., and Stewart Grant W. 2016. Widespread genetic introgression of escaped farmed Atlantic salmon in wild salmon populations. ICES J. Mar. Sci. 73(10): 2488–2498.
Konno N., Kurosawa M., Kaiya H., Miyazato M., Matsuda K., and Uchiyama M. 2010. Molecular cloning and characterization of V2-type receptor in two ray-finned fish, gray bichir, Polypterus senegalus and medaka, Oryzias latipes. Peptides, 31(7): 1273–1279.
Konstantinidis I., Sætrom P., Mjelle R., Nedoluzhko A.V., Robledo D., and Fernandes J.M.O. 2020. Major gene expression changes and epigenetic remodelling in Nile tilapia muscle after just one generation of domestication. Epigenetics, 15(10): 1052–1067.
Kozak G.M., Brennan R.S., Berdan E.L., Fuller R.C., and Whitehead A. 2014. Functional and population genomic divergence within and between two species of killifish adapted to different osmotic niches. Evolution, 68(1): 63–80.
Lumme, J., Ozerov, M.Y., Veselov, A.E., and Primmer, C.R. 2016. The formation of landlocked populations of Atlantic salmon. In Evolutionary biology of the Atlantic salmon. Edited by T. Vladić and E. Petersson. CRC Press, New York. pp. 26–43.
McKenna A., Hanna M., Banks E., Sivachenko A., Cibulskis K., Kernytsky A., et al. 2010. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20(9): 1297–1303.
Naish K.A., Taylor J.E., Levin P.S., Quinn T.P., Winton J.R., Huppert D., and Hilborn R. 2007. An evaluation of the effects of conservation and fishery enhancement hatcheries on wild populations of salmon. Adv. Mar. Biol. 61–194.
Padmakumar V.C., Speer K., Pal-Ghosh S., Masiuk K.E., Ryscavage A., Dengler S.L., et al. 2012. Spontaneous skin erosions and reduced skin and corneal wound healing characterize C LIC4NULL mice. Am. J. Pathol. 181(1): 74–84.
Prévost A.D., Hill N.L., Grant J.W.A., Ardren W.R., and Fraser D.J. 2020. Patterns of reproductive success among reintroduced Atlantic salmon in two Lake Champlain tributaries. Conserv. Genet. 21: 149–159.
Shivdasani R.A., Rosenblatt M.F., Zucker-Franklin D., Jackson C.W., Hunt P., Saris C.J.M., and Orkin S.H. 1995. Transcription factor NF-E2 is required for platelet formation independent of the actions of thrombopoietin/MGDF in megakaryocyte development. Cell, 81: 695–704.
Wringe B.F., Jeffery N.W., Stanley R.R.E., Hamilton L.C., Anderson E.C., Fleming I.A., et al. 2018. Extensive hybridization following a large escape of domesticated Atlantic salmon in the Northwest Atlantic. Commun. Biol. 1(1): 108.
Yokoyama S., Ito Y., Ueno-Kudoh H., Shimizu H., Uchibe K., Albini S., et al. 2009. A systems approach reveals that the myogenesis genome network is regulated by the transcriptional repressor RP58. Dev. Cell. 17(6): 836–848.
Avril M. Harder and Mark R. Christie. 2022. Genomic signatures of adaptation to novel environments: hatchery and life history-associated loci in landlocked and anadromous Atlantic salmon (Salmo salar). Canadian Journal of Fisheries and Aquatic Sciences.
79(5): 761-770. https://doi.org/10.1139/cjfas-2021-0066
If you have the appropriate software installed, you can download article citation data to the citation manager of your choice. Simply select your manager software from the list below and click Download.
1. Genomic Signatures of Freshwater Adaptation in Pacific Herring (Clupea pallasii)