Standardized IMGT® Nomenclature of Salmonidae IGH Genes, the Paradigm of Atlantic Salmon and Rainbow Trout: From Genomics to Repertoires

In teleost fish as in mammals, humoral adaptive immunity is based on B lymphocytes expressing highly diverse immunoglobulins (IG). During B cell differentiation, IG loci are subjected to genomic rearrangements of V, D, and J genes, producing a unique antigen receptor expressed on the surface of each lymphocyte. During the course of an immune response to infections or immunizations, B cell clones specific for epitopes of the immunogen are expanded and activated, leading to production of specific antibodies. Among teleost fish, salmonids comprise key species for aquaculture. Rainbow trout (Oncorhynchus mykiss) and Atlantic salmon (Salmo salar) are especially important from a commercial point of view and have emerged as critical models for fish immunology. The growing interest in capturing accurate and comprehensive antibody responses against common pathogens and vaccines has resulted in recent efforts to sequence the IG repertoire in these species. In this context, a unified and standardized nomenclature of salmonid IG heavy chain (IGH) genes is urgently required, to improve the accuracy of annotation of adaptive immune receptor repertoire datasets generated by high-throughput sequencing (AIRRseq) and to facilitate comparisons between studies and species. The assembly of salmonid IGH genomic sequences is challenging due to the presence of two large duplicated IGH loci and high numbers of IG genes and pseudogenes. We used data available for Atlantic salmon to establish an IMGT standardized nomenclature of IGH genes in this species and then applied the IMGT rules to the rainbow trout IGH loci to set up a nomenclature that takes into account the specificities of Salmonid loci. This unique, consistent nomenclature for Salmonid IGH genes was then used to construct IMGT sequence reference directories allowing accurate annotation of AIRRseq data.
The complex issues raised by the genetic diversity of salmon and trout strains are discussed in the context of IG repertoire annotation.

INTRODUCTION
Vertebrate species with jaws (Gnathostomata), which appeared more than 400 million years ago, are all characterized by an adaptive immune system based on B and T cells along with the huge diversity and specificity of their antigen receptors, the immunoglobulins (IG) or antibodies and the T cell receptors (TR), respectively (1, 2). The analysis of the germline IGH locus defines the genomic repertoire with the identification of the functional variable (V), diversity (D), and joining (J) genes that participate in the synthesis of VH domains. It also allows the identification of the functional constant (C) genes that encode the constant regions of the heavy chains and define their isotypes (3)(4)(5)(6)(7).
In teleost fish, B cell clonal responses are induced by infection or immunization, as described in humans or mice. Antibodies constitute a key factor for fish specific immunity and for the protection afforded by vaccines. As key species in aquaculture, Salmonids (family Salmonidae) including rainbow trout (Oncorhynchus mykiss; Oncmyk) and Atlantic salmon (Salmo salar; Salsal) constitute important models for the study of antibodies and B cell responses in fish.
Several groups started to clone and sequence IGH cDNA from rainbow trout in the early 1990s (8)(9)(10)(11)(12). Comparison of VH domains (V-D-J-REGION) expressed in trout stocks from Sweden, France, and the US revealed differences in IGHV subgroup usage: subgroups named 8, 9, 10, and 11 were found only in Swedish stocks while subgroups 4 and 7 were only found in French stocks and subgroup 5 (now part of IGHV1) was found in Swedish, French, and US stocks. These observations suggested genetic differences between the IGHV gene germline repertoires of different populations, but this was not fully clear due to the very small numbers of sampled individuals. In 1996, expressed VH domain sequences were classified into a set of 11 IGHV subgroups, defining a first unified nomenclature for rainbow trout (13). A more extensive study performed in 2006 on American trout by the group of Steve Kaattari found all these subgroups expressed, indicating that IGHV subgroups may have a wider distribution than previously suggested. Two additional subgroups expressed at low frequency were also discovered in this survey (14), leading to a repertoire of 13 IGHV subgroups. These subgroups were used for an IMGT gene table created in 2009, with a provisional gene nomenclature (letter S) for rainbow trout IGHV [path to access: IMGT Repertoire (IG and TR) >1. Locus and genes > Gene tables > IGHV > Rainbow trout (O. mykiss)] 1 .
In Atlantic salmon, Solem et al. described in 2001 nine IGHV subgroups (15), seven of which corresponded to IGHV subgroups defined in rainbow trout (1, 2, 3, 6, 8, 9, and 11). Southern blot experiments suggested that the number of genes per subgroup could vary between 1 and 7-10. This work also clearly established that Atlantic salmon IGHV genes were rearranged and transcribed from both of the two Atlantic salmon IGH loci (IGH locus A on chromosome 6 and IGH locus B on chromosome 3), which were most likely produced by the salmonid whole genome duplication. These data suggested that genes from some subgroups could be expressed only from a single locus, while genes from other subgroups were expressed from both A and B loci. This analysis was later extended and refined in 2010 by Yasuike et al. from a complete assembly of the Atlantic salmon IGH A and B loci based on sequences of 24 bacterial artificial chromosomes (BAC) (16). This study provided a first map of the organization of the duplicated IGH loci of a salmonid species. Ninety-nine IGHV genes were found in locus A, and 103 in locus B; 23 IGHV genes are functional in locus A, and 32 in locus B. Using the IMGT threshold of 75% identity for the V-REGION, 18 IGHV subgroups were defined in this work (16). Subgroups that did fit with the IGHV subgroups established in rainbow trout were given a subgroup number consistent with the online 2009 IMGT gene table [IMGT Repertoire (IG and TR) > 1. Locus and genes > Gene tables > IGHV > Atlantic salmon (Salmo salar)] 1 .
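The 75% identity threshold mentioned above can be illustrated with a small sketch. This is not the IMGT procedure: it assumes V-REGION sequences that are already aligned to the same length, uses invented toy sequences, and groups them by single-linkage clustering, which is only one possible way to apply a pairwise threshold. The function names `percent_identity` and `cluster_by_identity` are hypothetical.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where either sequence carries a gap ('.' or '-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x not in ".-" and y not in ".-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


def cluster_by_identity(seqs: dict, threshold: float = 75.0) -> list:
    """Single-linkage grouping with a union-find structure.

    Two sequences end up in the same group when a chain of pairwise
    identities >= threshold connects them.
    """
    parent = {name: name for name in seqs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if percent_identity(seqs[a], seqs[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


# Toy, invented sequences (not real IGHV genes).
toy = {
    "v_a": "ACGTACGTACGTACGTACGT",
    "v_b": "ACGTACGTACGTACGTAAAA",  # differs from v_a at 3 of 20 positions
    "v_c": "TTTTTTTTTTGGGGGGGGGG",  # unrelated to both
}
groups = cluster_by_identity(toy)
```

With these toy inputs, `v_a` and `v_b` fall into one group and `v_c` stays alone. A real subgroup assignment would also need a proper alignment step and a decision on how to treat partial pseudogenes, neither of which is attempted here.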
As new genome assemblies of Atlantic salmon and rainbow trout have been recently made available, we decided to annotate the IGH locus of these species and to establish a common nomenclature of IGH genes based on IMGT rules. We used data previously published for Atlantic salmon (16) to develop a prototype for the Salmonid IMGT standardized nomenclature. We also applied the IMGT rules to the rainbow trout IGH loci as a novel example of IMGT genomic annotation. The objective was to take into account the specificities of the Salmonid loci and to develop a unique, consistent nomenclature, while respecting the IMGT Scientific chart rules and standards. These standards are based on the concepts of identification (keywords), classification (gene and allele nomenclature), description (labels), and numbering (IMGT unique numbering and IMGT Collier de Perles) (3). It is important to note that a consistent nomenclature is crucial to build IMGT reference directory sets that are constituted by the V-REGION, D-REGION, and J-REGION of each IMGT reference allele from IMGT/LIGM-DB (same accession numbers as GenBank, ENA, and DDBJ) (17). These reference directory sets are the fundamental basis for annotation of repertoire datasets produced by high-throughput AIRRseq approaches for the analysis of expressed repertoires, in particular to define expressed clonotypes (18)(19)(20). The IMGT reference directories are built following the classification of the V, D, J, and C genes and alleles according to the IMGT rules and the assignment of the IMGT functionality: functional (F), open reading frame (ORF), or pseudogene (P) (IMGT Scientific chart > IMGT functionality) 1 (3). These rules ensure that the nomenclature is consistent within and between species, and can be updated when more sequence data become available. 
Reference directory sets are used by IMGT/V-QUEST and IMGT/JunctionAnalysis (21,22) for detailed analysis of nucleotide (nt) sequences of V domains [V-(D)-J-REGION]; by IMGT/DomainGapAlign, which provides alignments of amino acid (AA) sequences with the closest V and J regions for V domains and the closest C exons for C domains (23); by IMGT Collier de Perles based on the IMGT unique numbering for V and C domains (24,25); and by IMGT/HighV-QUEST (26,27) for high-throughput sequence analysis of expressed IGH repertoires and clonotype definition (18)(19)(20). Importantly, IMGT reference directory sets are freely available for the academic community and can be used by other programs developed for repertoire analysis.
In this work, we produced reference directory sets for IGH loci of Atlantic salmon and rainbow trout, based on a unique nomenclature developed for salmonids and following IMGT rules. We show how the particularities of salmonid IGH loci (duplicated loci in each haplotype, large number of genes and pseudogenes) were taken into account and how reference directory sets can be used for annotation of IGH expression datasets. We also discuss how the nomenclature and reference directories can be updated with new data and extended to other salmonid species.
For obtaining IMGT gene names, newly identified Atlantic salmon and rainbow trout IGH genes and alleles from genome assemblies were submitted to the IG, T cell receptors (TR), and major histocompatibility (MH) Nomenclature Sub-Committee (IMGT-NC) of the International Union of Immunological Societies (IUIS) Nomenclature Committee 2,3 . Two IMGT_NC reports #2019-5-0131 and #2019-7-0220 2 comprise the

RESULTS
The complete and correct assembly of the Salmonidae IGH loci is a significant challenge owing to (i) the existence of two duplicated loci due to the tetraploidization (named locus A and locus B), (ii) the large size of each locus, (iii) the high number of different IGHV subgroups compared to mammals, (iv) the internal amplification and potential gene conversion that occurred inside each locus during their evolution, and (v) the very high number of pseudogenes, many of them partial, relative to the functional genes.
We therefore explored how the standardized IMGT nomenclature could allow the identification and classification of genes and alleles in incomplete or not yet fully annotated genome assemblies. The IGH data published for Atlantic salmon (16), largely based on BAC sequencing, were used as a prototype for establishing the standardized IMGT nomenclature for salmonids and for dealing, by comparison, with newly identified IGH genes from both Atlantic salmon and rainbow trout genome assemblies. The particularities of these IGH loci (in particular the tetraploidization) were taken into consideration for consistency between salmonid species.
From IG Classes to IMGT Constant (C) Gene Names
Three antibody classes have been identified in fish, namely, IgM, IgD, and IgT, while IgG, IgA, and IgE are absent (28). IgM and IgD are generally co-expressed at the cell surface of the same B cells through alternative splicing, as in mammals. Soluble IgM are tetrameric and constitute the main antibody class in serum. A third class, IgT, is expressed in most fish groups including salmonids. Interestingly, the IG-Heavy-Tau chains of IgT have a VH domain that results from independent V-D-J rearrangements, and is not obtained by a switch process (29). IgT has been found only in bony fish and is particularly involved in mucosal immunity and protection (30). IGHD was cloned and characterized in rainbow trout and Atlantic salmon, in parallel to the discovery of IGHT encoding the third fish IG-Heavy-Tau isotype (28,29) and then in Atlantic salmon (31).
By convention, IMGT groups are designated by the locus and gene type. Based on the four gene types, V (variable), D (diversity), J (joining), and C (constant), the IGH genes belong to four groups: IGHV, IGHD, IGHJ, and IGHC. For the IGH locus, the constant genes are designated by the letter (and, if relevant, number) corresponding to the encoded isotype (IGHT, IGHM, and IGHD), instead of using the letter C.
The salmonid IGHC genes belong to three subgroups IGHM, IGHD, and IGHT and encode, when functional, the C-REGION of the heavy chain defining these three isotypes, IG-Heavy-Mu (heavy chain of the IgM class), IG-Heavy-Delta (heavy chain of the IgD class), and IG-Heavy-Tau (heavy chain of the IgT class) ( Table 1). Salmonid locus A and locus B were assigned based on the literature, with the letter D (for "duplicated") added to the conventional gene names for locus B.
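The locus convention described above (a letter D marking locus B) can be read off a gene name mechanically. The following is a minimal sketch, assuming IGHV allele names of the form seen in this article, such as IGHV1D-25*01 or IGHV1-73*01; the `parse_ighv` helper and the `IghvName` record are hypothetical, and names for D, J, and C genes are deliberately not handled.

```python
import re
from typing import NamedTuple, Optional


class IghvName(NamedTuple):
    subgroup: int          # IGHV subgroup number
    locus: str             # "A", or "B" when the name carries the letter D
    number: int            # gene number following the hyphen
    allele: Optional[str]  # allele label such as "01", or None if absent


# Assumed pattern: IGHV<subgroup>[D]-<number>[*<allele>]
_IGHV = re.compile(r"^IGHV(\d+)(D?)-(\d+)(?:\*(\d+))?$")


def parse_ighv(name: str) -> IghvName:
    """Split a salmonid IGHV gene or allele name into its components."""
    # Extracted text sometimes has spaces around '*', so drop all spaces.
    m = _IGHV.match(name.replace(" ", ""))
    if m is None:
        raise ValueError(f"not a recognised IGHV name: {name!r}")
    subgroup, dup, number, allele = m.groups()
    return IghvName(int(subgroup), "B" if dup else "A", int(number), allele)


locus_b_gene = parse_ighv("IGHV1D-25*01")
locus_a_gene = parse_ighv("IGHV1-73*01")
```

Keeping the parser strict (it raises on anything it does not recognise) is a design choice for this sketch: provisional names, or names written differently in other sources, would then be flagged rather than silently assigned to locus A.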

Atlantic Salmon IGH Constant Genes and Associated D and J Genes
The Atlantic salmon IGH locus A, which is in a reverse (REV) orientation on chromosome 6 and spans 660 kilobases (kb) (with the V genes encompassing 600 kb) (Figure 1) includes 7 IGHC genes with 17 associated IGHD genes and 13 IGHJ genes. The Atlantic salmon IGH locus B, which is in forward (FWD) orientation on chromosome 3 and spans 720 kb (with the V genes encompassing 670 kb) (Figure 2) includes 5 IGHC genes with 11 associated IGHD genes and 8 IGHJ genes. The constant region of the IG-Heavy-Mu chain and of the IG-Heavy-Delta are encoded by a unique gene per locus (IGHM and IGHD for locus A and IGHMD and IGHDD for locus B) preceded by a D-J cluster. There are several IG-Heavy-Tau genes (IGHT), but the associated D-J cluster may be incomplete (lacking D and/or J genes). In Atlantic salmon, there is only one IGHT functional (F) gene per locus, IGHT4 for locus A and IGHT2D for locus B, each one having a complete D-J cluster ( Table 2).
In the Atlantic salmon locus A, the D and J genes associated to IGHT genes comprise two D (IGHD1T2 and IGHD2T2) and two J (IGHJ1T2 and IGHJ2T2) upstream of the pseudogene (P) IGHT2, two J (IGHJ1T3 and IGHJ2T3) upstream of IGHT3 (P), five D (IGHD1T4 to IGHD5T4) and two J (IGHJ1T4 and IGHJ2T4) genes, all of them functional, upstream of IGHT4 (F) and one D (IGHD1T5) and two J (IGHJ1T5 and IGHJ2T5) upstream of IGHT5 (P). There is no IGHD or IGHJ upstream of IGHT1 (P) ( Table 2). The D and J associated to IGHM and IGHD comprise nine D (IGHD1 to IGHD9), all of them functional and five J genes, three of them functional (IGHJ1, IGHJ3, and IGHJ4), one with ORF, the IGHJ2, and one with alleles F or ORF (IGHJ5). They are located upstream of IGHM (F) and shared with the IGHD constant gene (F) ( Table 2 and Figures S1, S2). Eleven IGHD not directly associated to constant genes are dispersed in locus A (IGHD-1 to IGHD-11).

Rainbow Trout IGH Constant Genes and Associated D and J Genes
Similar to the Atlantic salmon, the rainbow trout has one functional gene per IGH locus encoding the constant region of the IG-Heavy-Mu (IGHM gene in locus A and IGHMD gene in locus B), the constant region of the IG-Heavy-Delta (IGHD gene in locus A and IGHDD gene in locus B), and the constant region of the IG-Heavy-Tau (IGHT2 gene in locus A and IGHT1D gene in locus B).
The rainbow trout IGH locus A, which spans 360 kb and is in a forward (FWD) orientation on chromosome 13, includes 11 IGHD genes, 10 IGHJ genes, and 4 IGHC genes ( Table 3). There are three D and two J genes upstream of IGHT1 (P), two D and two J genes upstream of IGHT2 (F), and six D and six J genes (all of them F) upstream of IGHM (F) and shared with the IGHD (F) constant gene (Figures S1, S2).
The rainbow trout IGH locus B, which spans 485 kb and is in a forward (FWD) orientation on chromosome 12, includes 13 IGHD genes, 9 IGHJ genes, and 3 IGHC genes ( Table 3).
There are four D genes (1 ORF and 3 F) and two J genes (both F) upstream of IGHT1D (F), and six D and seven J genes (all of them F) upstream of IGHMD (F) and shared with the IGHDD (F) constant gene (Figures S1, S2).
The conclusion that there is only one complete rainbow trout IG-Heavy-Delta gene per locus (IGHD in locus A and IGHDD in locus B), and that both genes are functional, rests on applying the nomenclature of the Atlantic salmon IGH loci and on the interpretation of expression data and published references (15,16,29,31). The anomalies (partial IGHD and IGHDD genes with exons in aberrant localizations or in reverse-complementary orientation) are likely artifacts of the current genome assembly. For that reason, the functionality of the IGHD and IGHDD, deduced from literature data and supported by sequences external to the genome assembly, is shown in parentheses in Table 3.
Based on the percentage of identity between nucleotide sequences of the V-REGION (threshold 75%), the Atlantic salmon 303 IGHV genes can be classified into 16

Rainbow Trout IGH Variable Genes
A total of 129 IGHV genes were identified in the rainbow trout genome, of which 57 can be considered fully functional or with an ORF without stop codon. A number of other sequences were identified as IGHV fragments in the assembly and were not included in the annotation. On chromosome 13 (locus A), 44 IGHV genes were found upstream of the functional IGHT2 gene, as well as 5 IGHV genes between the D-J-IGHT2 cluster and the D-J-IGHM-IGHD cluster. Eighty IGHV genes were found on chromosome 12 (locus B): 70 IGHV were located upstream of the functional IGHT1D gene and 10 IGHV were found between the D-J-IGHT1D cluster and the D-J-IGHMD-IGHDD cluster. The 129 rainbow trout IGHV genes could be classified into the same 16 subgroups defined for the Atlantic salmon IGHV genes, containing from only 1 pseudogene (i.e., IGHV5, IGHV13, and IGHV14 subgroups) to 35 genes, i.e., IGHV1 subgroup, which includes 12 F, 2 ORF, and 21 P IGHV genes. Figure 3 shows a phylogenetic tree based on nucleotide sequences of IGHV genes (F and ORF) present in Atlantic salmon and rainbow trout IGH loci. While some IGHV subgroups are not represented in both species, as far as we know, this tree illustrates how rainbow trout IGHV genes nicely cluster with their Atlantic salmon counterparts.

Expressed Repertoire Analysis
IMGT/V-QUEST and its high-throughput version, IMGT/HighV-QUEST, can perform analysis of nucleotide sequences of the IG and TR variable domains (21,22,26,27). These tools run against the IMGT/V-QUEST reference directory database that includes several sets (per group and per species) and are built based on the IMGT standards (3) (annotation in IMGT/LIGM-DB, Gene tables, Alignments of alleles, Protein display, entry in IMGT/GENE-DB). The IMGT/V-QUEST sets comprise IMGT reference sequences from all functional (F) and ORF genes and alleles (in Advanced parameters, Selection of IMGT reference directory set "F + ORF"). The sets also include IMGT reference sequences from pseudogenes (P) and alleles with an in-frame V-REGION for versatile genomic analysis (proposed by default, in Advanced parameters IMGT reference directory set "F + ORF + in-frame P").
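For readers who use the reference sequences outside IMGT/V-QUEST, the selection of a set by functionality can be sketched as follows. This is an illustration only: it assumes a pipe-delimited FASTA header in which the second field is the allele name and the fourth field is the functionality (F, ORF, or P, possibly in parentheses or brackets), and it uses invented records. The helper names are hypothetical.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def functionality(header):
    """Return the functionality field, without parentheses or brackets."""
    fields = header.split("|")
    if len(fields) < 4:
        raise ValueError(f"unexpected header layout: {header!r}")
    return fields[3].strip("()[] ")


def select_reference(text, allowed=("F", "ORF")):
    """Map allele name -> sequence for records whose functionality is allowed."""
    return {
        header.split("|")[1]: seq
        for header, seq in parse_fasta(text)
        if functionality(header) in allowed
    }


# Invented toy records; accessions, names, and sequences are placeholders.
TOY = """\
>ACC0001|IGHVtoy1*01|Toy species|F|V-REGION|
acgtacgt
acgt
>ACC0002|IGHVtoy2*01|Toy species|ORF|V-REGION|
ttgacc
>ACC0003|IGHVtoy3*01|Toy species|P|V-REGION|
ggatcc
>ACC0004|IGHVtoy4*01|Toy species|(F)|V-REGION|
cccggg
"""

f_orf_set = select_reference(TOY)
all_set = select_reference(TOY, allowed=("F", "ORF", "P"))
```

The sketch does not try to reproduce the "in-frame P" criterion, which depends on the V-REGION sequence itself and cannot be read from the header alone.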
We then investigated the functionality and expression level of IGHV genes from the two species using the standardized nomenclature based on genomic annotation. To do so, adaptive immune receptor repertoire datasets generated by high-throughput sequencing (AIRRseq) were submitted to IMGT/HighV-QUEST analysis.

Atlantic Salmon
AIRRseq data from head kidney of Atlantic salmon were generated based on 5′ RACE and specific primers for IGHM constant region [data from reference (32)]. Using the Atlantic salmon reference dataset updated in 2019, a total of 50 IGHV genes (42 functional "F," 4 "ORF," and 4 pseudogenes "P") were expressed in the dataset (Figure 4A). More than 80% of submitted sequences presented IGHV F genes. Interestingly, the majority of expressed V genes were from locus B (chromosome 3). This difference was reflected in the abundance of rearrangements (∼66% from locus B) and in the diversity of IGHV genes expressed: 25 IGHV from locus B vs. 17 IGHV from locus A (Figure 4A). On average, IGHV1D-25*01, IGHV6D-18*01, IGHV6D-16*01, and IGHV1-73*01 were the most abundant IGHV functional genes, accounting for 30% of the expressed repertoire.
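The share of rearrangements attributed to each locus can be derived directly from the assigned allele names, since locus B genes carry the letter D. A minimal sketch of that tally is given here; the `locus_usage` function is hypothetical and the counts are invented for illustration, not taken from the dataset analyzed in this work.

```python
import re
from collections import Counter

# A 'D' directly after the subgroup number marks a locus B gene.
_LOCUS_B = re.compile(r"^IGHV\d+D")


def locus_usage(counts):
    """Return the percentage of rearrangements per IGH locus.

    `counts` maps an assigned IGHV allele name to its number of
    rearrangements (for example, productive sequences per allele).
    """
    totals = Counter()
    for allele, n in counts.items():
        locus = "B" if _LOCUS_B.match(allele) else "A"
        totals[locus] += n
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("no rearrangements to summarise")
    return {locus: 100.0 * n / grand_total for locus, n in totals.items()}


# Invented counts, for illustration only.
toy_counts = {"IGHV1D-25*01": 6, "IGHV6D-18*01": 3, "IGHV1-73*01": 3}
usage = locus_usage(toy_counts)
```

With the toy counts above, 9 of the 12 rearrangements use a locus B gene, so `usage` reports 75% for locus B and 25% for locus A.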

Rainbow Trout
In this species, we analyzed AIRRseq datasets from fish intraperitoneally immunized with a killed bacterial pathogen, Yersinia ruckeri [data from reference (33)]. 5′ RACE PCR products were produced from spleen of immunized fish, using specific primers for IGHM constant region and with unique molecular identifiers (UIDs) for better data normalization (33). Only in-frame productive rearrangements (CDR3-IMGT without stop codons) were analyzed. Trout used in this study belonged to the isogenic line derived from Swanson strain that was selected for the rainbow trout genome project. Hence, these AIRRseq data express IGH genes from the very same repertoire, which was annotated in the current IMGT reference directories. These data therefore provided a quantitative assessment of the expression of IGHV genes in the spleen of three genetically similar individuals responding to a pathogen. In this dataset, IMGT/HighV-QUEST unambiguously identified the IGHV gene in 94% of submitted sequences, 90% of them with at least 99% of sequence identity (52% with 100% of identity). A total of 55 IGHV genes (35 functional "F," 9 "ORF," and 7 pseudogenes "P") were expressed. Interestingly, these rearrangements are from both IGH loci (A and B) in relatively similar proportions.
In each trout sample, about 17% of sequences corresponded to IGHV ORF genes and 1.7-4.7% corresponded to IGHV pseudogenes (most of them IGHV1D-12*01 P or IGHV1-21*01 P) involved in in-frame junction rearrangements. This feature could be detected because we selected the IMGT/HighV-QUEST directory set "F + ORF + in-frame P," which also includes pseudogenes with an in-frame stop codon in the V region or a defect in the leader or recombination signal (RS) sequences (3). Although IGH transcripts with stop codon are generally rare in mammals, they are typically much more frequent in fish, perhaps because nonsense-mediated mRNA decay (NMD) may work differently (28,32,34,35).
Hence, about 80% of submitted rainbow trout sequences presented functional IGHV genes (Figure 4B). IGHV4D-24*01 F, IGHV6D-40*01 F, IGHV1-18*01 F, and IGHV11-25*01 F were the most expressed on average, with a limited interindividual variation as expected from the genetic constitution of the fish analyzed. In this dataset, for about 6% of submitted sequences, IMGT/HighV-QUEST provided two results assigned to distinct duplicated germline IGHV with alleles having identical or close sequences (for example, IGHV12D56*01/IGHV12D57*01, or IGHV8-30*01/IGHV8-40*01) owing to the gene duplication in salmonids.

FIGURE 3 | Phylogenetic tree of Atlantic salmon and rainbow trout IGHV genes. Nucleotide sequences of IGHV genes (F and ORF) from the IGH loci of both species (see Table 5) were compared. Only one allele per gene was included in the analysis (allele *01 for all but two, IGHV8-58*02 F and IGHV8-53*03 F). Nodes with a bootstrap support higher than 75% are indicated.

Although the datasets analyzed here for salmon and trout were not selected for direct comparison, they suggest that these two species (at least, the fish strains analyzed here) do not use the two loci in the same way (see above). A rigorous and comprehensive comparison of expressed repertoires between rainbow trout and Atlantic salmon will require a systematic comparison of AIRRseq data from multiple strains.

Genetic Variability of IG Genes in Salmonids
Making available a full annotation and versatile nomenclature also offers the possibility to better integrate new data about variability of IG (or TR) genes. This issue is of particular interest in Salmonids for two main reasons: (1) variations of IG gene sequences may affect the repertoire of specificities targeted by Abs, in turn impacting the quality and efficiency of responses against pathogens, and (2) salmonid IG loci are particularly complex with high numbers of functional genes and pseudogenes located in two regions; therefore, they constitute interesting models to understand mechanisms of short-term evolution of such loci and the potential importance of homogenization vs. diversification of IG sequences.
To get preliminary data about IGHV variation in a salmonid species, we took advantage of the full genome sequencing of 19 isogenic lines of rainbow trout. These lines were produced using a mitogynogenesis-based strategy by Quillet et al. (36,37). They represent 19 haplotypes randomly picked from the so-called INRA-SY "synthetic" population. This population was created about 35 years ago by a planned random mating (i.e., panmictic) mixture of French, Danish, and American domestic populations, and has been maintained since without any voluntary selection. The 19 isogenic lines analyzed here do not appear to be closely related to the Swanson trout generated at Washington State University using androgenesis, which has been sequenced and constitutes the reference genome (38,39).
The numbers of indel and SNP detected within IGHV genes and pseudogenes are indicated in Table 6. Genetic variation between isogenic lines overall appears to be relatively modest at this level. It seems to be more frequent in the locus located in chromosome 13 (67 SNP and 1 indel for 29 functional genes, 41 SNP and 3 indel for 20 pseudogenes) compared to chromosome 12 (23 SNP and no indel for 29 functional genes, and 53 SNP and 10 indel for 51 pseudogenes). The proportion of silent vs. non-silent mutations was not significantly different between the two regions (40NS/67 SNP for chromosome 13 and 13NS/23 SNP for chromosome 12), suggesting that these genes did not evolve under strong positive selection. Indel and SNP were not significantly more frequent in pseudogenes.

[Table header only; data not recovered: IGHV subgroup; for Atlantic salmon and for rainbow trout, Nb of genes, Nb of alleles, and F, ORF, and P counts in IGH locus A and IGH locus B.]
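The comparison of non-silent SNP proportions between the two chromosomes (40 of 67 on chromosome 13, 13 of 23 on chromosome 12) can be checked with a standard test on a 2x2 table. The text does not state which test was used; the sketch below assumes that NS denotes non-silent SNP and applies a two-sided Fisher exact test, implemented with the standard library only. The function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    The p-value is the total probability, under fixed margins, of all
    tables that are no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Small relative tolerance so that ties with p_obs are included.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


# Rows: chromosome 13, chromosome 12; columns: non-silent, silent.
p_value = fisher_exact_two_sided(40, 67 - 40, 13, 23 - 13)
not_significant = p_value > 0.05
```

The two proportions are close (about 60% and 57% non-silent), and the resulting p-value is well above 0.05, in line with the statement that the difference is not significant.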
Variants were filtered to eliminate all assembly artifacts, but these data will have to be fully validated by resequencing, and the impact of variation on the gene status evaluated. We have indications that several new genes are present in productive and expressed rearrangements. This might be due to the absence of such genes in the genome of the Swanson strain or to gaps in the current reference genome assembly. In this context, it is of interest to evaluate the variability of IGHV gene numbers between the different haplotypes. Future assemblies will allow a more accurate description of the IGH diversity and variability. Incompleteness of the annotated repertoire may constitute a problem for repertoire analysis (for example, when a missing gene is used by a clonotype clonally selected in a response). Hence, sequences of genes that are not localized in the current assembly may be added to the IMGT reference directory sets, provided that sufficient evidence is available to demonstrate their existence and expression. These sequences will be given a provisional name (with S) until their location and presence in the germline genomic sequence are validated. If new genes appear that do not belong to any of the IGHV subgroups identified and described in this work, a new subgroup may have to be defined. This is not impossible, but seems unlikely, since we believe that the large set of IGHV sequences analyzed from Atlantic salmon and rainbow trout probably contains at least one representative of all subgroups. Such additions will be validated by the IG, TR, and MH Nomenclature Sub-Committee (IMGT-NC) (6, 7) of the IUIS Nomenclature Committee 2,3 , following a procedure analogous to the one used, for example, for inferred alleles in human.

CONCLUSION
Genome assemblies are available for both Atlantic salmon and rainbow trout, representing the two main genera of Salmonids (Salmo and Oncorhynchus). More genomic (and transcriptome) data are coming from a number of genomic backgrounds, which will provide a rich source of knowledge about variations of potential antibody repertoires in these species. We therefore revisited the description and annotation of the two IGH loci present in these two species, currently from cDNA and BAC clone sequences, based on the IMGT biocuration and nomenclature for Salmonid IGH genes that will facilitate the analysis of AIRRseq data. The IG or antibody repertoire sequencing has started to develop both in rainbow trout and in Atlantic salmon, reflecting a growing interest for an accurate and comprehensive description of the response against common pathogens and vaccines. As full genome assemblies are now available for several salmonid species (Atlantic salmon, rainbow trout, coho salmon, and chinook salmon), comparative analysis of the IGH locus structure in these closely related tetraploidized species is of great interest. It also appears very important to investigate the level of variation between germline repertoires of IG genes across commercial and wild salmonid stocks. This variation may have significant implications for practical issues in aquaculture and conservation; it will also be of significant interest for the basic comparative immunology community, in particular to address accurately the mechanisms of gene conversion, somatic hypermutation, and memory in these species and during vertebrate evolution.

FIGURE 4 | IGHV usage determined by IMGT/HighV-QUEST tool. Analysis of AIRRseq datasets obtained previously from head kidney of three healthy Atlantic salmon (A) and from spleen of three rainbow trout that were intraperitoneally immunized with killed Yersinia ruckeri (B). Libraries were generated by 5′ RACE using specific primers for IGHM constant region. IGHV usage is expressed as the percentage of total productive rearrangements.

DATA AVAILABILITY STATEMENT
The datasets generated for this study can be found at www.imgt.org; accession numbers can be found within the manuscript. Any other data supporting the conclusions of this manuscript will be made available by the authors, without undue reservation, to any qualified researcher.

SUPPLEMENTARY MATERIAL
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fimmu.2019.02541/full#supplementary-material
Figure S1 | Alignment of the D-GENE-UNIT sequences of the IGHD (diversity) genes located upstream of the IGHM (locus A) and IGHMD (locus B) genes of Salmo salar (Salsal) and Oncorhynchus mykiss (Oncmyk) (A) and located upstream of IGHT genes (B). Genes of locus B are identified by the letter D which follows the gene number. Labels are according to the D-GENE prototype (IMGT Scientific chart > 1. Sequence and 3D structure identification and description > IMGT prototypes table > D-GENE) 1 .
Figure S2 | Alignment of the J-REGION amino acid sequences of the IGHJ (joining) genes located upstream of the IGHM or IGHT (locus A) and IGHMD or IGHTD (locus B) genes of Salmo salar (Salsal) and Oncorhynchus mykiss (Oncmyk). Genes of locus B are identified by the letter D which follows the gene number. Labels are according to the J-GENE prototype (IMGT Scientific chart > 1. Sequence and 3D structure identification and description > IMGT prototypes table > J-GENE) 1 . The highly conserved FDYWGKGTXVT motif is highlighted in pink and residues that deviate from it are in red.