BRIEF RESEARCH REPORT article
Sec. Evolutionary and Population Genetics
Whole Genome Sequencing Reveals Signatures for Artificial Selection for Different Sizes in Japanese Primitive Dog Breeds
- 1College of Animal Science and Veterinary Medicine, Shenyang Agricultural University, Shenyang, China
- 2Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, ON, Canada
Body size is an important trait in companion animals. Recently, a primitive Japanese dog breed, the Shiba Inu, has experienced artificial selection for smaller body size, resulting in the “Mame Shiba Inu” breed. To identify loci and genes that might explain the difference in the body size of these Shiba Inu dogs, we applied whole genome sequencing of pooled samples (pool-seq) on both Shiba Inu and Mame Shiba Inu. We identified a total of 13,618,261 unique SNPs in the genomes of these two breeds of dog. Using selective sweep approaches, including FST, Hp and XP-CLR with sliding windows, we identified a total of 12 genomic windows that show signatures of selection that overlap with nine genes (PRDM16, ZNF382, ZNF461, ERGIC2, ENSCAFG00000033351, CCDC61, ALDH3A2, ENSCAFG00000011141, and ENSCAFG00000018533). These results provide candidate genes and specific sites that might be associated with body size in dogs. Some of these genes are associated with body size in other mammals, but 8 of the 9 genes are novel candidate genes that need further study.
As our best friend, dogs are crucial to modern human society and are involved in many aspects of our life. During domestication, many traits found in dogs diverged greatly from their ancestors. The Shiba Inu (“inu” means dog in Japanese) is an ancient and native dog breed in Japan, and it is a basal spitz breed (Parker et al., 2017). Shiba Inus typically range in height from 34 to 41 cm (Figure 1A)1 although in recent years, artificial selection has been imposed to yield smaller individuals, which are called mini or Mame Shiba Inu (Mame is a Japanese word for “bean,” representing “small”). It is bred only from Shiba Inu to keep “pure ancestry,” and shares some traits with Shiba Inu, such as coat color. The body height of an adult Mame Shiba Inu should be not more than 34 cm (Figure 1B). Although this new breed, the Mame Shiba Inu, has not yet been approved by the American Kennel Club (AKC), it is approved by Kennel Club of Japan and well recognized as a breed in Japan and China.
Regulation of dog body size, or weight, in the dog has been extensively researched, with many studies conducted at the genomic level (Ostrander et al., 2017). Through these works, many genes and loci have been identified as associated with the regulation of body size in the dog. For instance, IGF1 is a well-studied body size gene. Using a genome-wide scan in Portuguese water dogs, a single IGF1 SNP haplotype was linked as a major contributor to small body size (Sutter et al., 2007). Consistently, results reported from a study involving 93 dog breeds found that individuals with different IGF1 SNP and SINE insertion alleles have significant differences in body weight (Rimbault et al., 2013). An IGF1 SNP was also reported in a study that applied both selective sweep and GWAS analyses in 722 canids, including purebred dogs and wolfs (Plassais et al., 2019). SNPs not only in IGF1, but also its receptor, IGF1R (e.g., a non-synonymous SNP at chr3: 44,706,389) was found to be associated with reduced size in dogs (Hoopes et al., 2012). GWAS studies have also implicated other genes with SNPs linked to body size. HMGA2 was reported to be associated with body size where individuals with the ancestral SNP allele in the HMGA2 5′UTR have higher body weight than those with the derived allele (Rimbault et al., 2013), as well as another variation between exon 1 and 2 (Chr10: 8351907) was also reported to have an effect on body weight (Plassais et al., 2019). A genomic region within CDK4 was reported to be associated with size through an across-breed GWAS analysis that involved 8 different dog breeds (Vaysse et al., 2011). With a genome-wide scan on two German shepherd dog families with microsatellite markers identified a contracted DNA repeat in intron 5 of LHX3 that was associated with dwarfism (Voorbij et al., 2011). A quantitative GWAS was also used on 1,873 dogs, from 158 breeds, which identified an interval on Chromosome X that is upstream of the ARHGAP36, IGSF1, and OR5AK2 genes, which was strongly associated with body size (Hayward et al., 2016). Using a GWAS analyses of 690 dogs, three genes (IRS4, IGSF1, and ACSL4) were found to be associated with body weight, where IRS4 and IGSF1 are both involved in the GH/IGF1 and thyroid hormonal pathways involved in body size regulation (Plassais et al., 2017).
While many studies have focused on the variation of body size in dogs, it is still possible that other novel genes and loci have been involved in the selection for body size in dogs. The change in body size between Shiba Inu and Mame Shiba Inu is relatively mild, but distinct, making them good source for investigating mechanisms that change body size. In addition, these breeds are highly divergent from the other domesticated dogs. Here we performed whole-genome pool-sequencing (pool-seq) on Shiba Inu and Mame Shiba Inu dogs to try to identify key regions that experienced selection and may regulate their different body sizes.
Materials and Methods
Owners of the dogs studied here provided informed consent for the use data of their dogs in this scientific research. All procedures used in this study was approved by the Animal Care and Use Committee of Shenyang Agricultural University.
Sample Preparation and Sequencing
Blood samples from a total of 94 dogs (59 Shiba Inu and 35 Mame Shiba Inu dogs) were collected by venipuncture of their forelimb. For genome sequencing, four pooled samples were generated. The first pooled sample was from 35 Mame Shiba Inu dogs, while the other three were from 29, 22, and 8 Shiba Inu dogs. For each pool, an equivalent volume of blood from each individual of the group was mixed to yield a total volume of 1 mL. Genomic DNA was then extracted from each pool sample using a whole blood genomic DNA extraction kit (TIANGEN, DP318). Genomic DNA were checked for concentration and purity, and then sent to the Beijing Genome Institute (Beijing, China) for library construction (insert size ∼ 300 bp) and genome sequencing. After adapter ligation and DNA cluster preparation, the libraries were then subjected to Illumina HiSeq X Ten for sequencing (150 bp, paired-end). To yield similar sequencing depths for the two dog breeds, the Mame Shiba Inu pool sample was sequenced to 30× depth while each of the three Shiba Inu pool samples were only sequenced to a 10× depth (thus generating 30× for the breed).
Mapping and SNP Calling
To find loci that might be associated with the body size difference between the two breeds, reads from the three Shiba Inu dog pool-seq samples were merged into one pair of fastq files. Raw reads were processed using the PoolParty pipeline (Micheletti and Narum, 2018). Briefly, raw reads were filtered using BBDuk2, with thresholds of 20 for base quality and 25 for minimum read length. Clean reads were mapped to the reference dog genome (assembly CanFam 3.1)3 using BWA 0.7.12-r1039 (Li and Durbin, 2009), and the BAM (Binary Alignment Map) files were sorted by coordinate using the Picard tools4. To eliminate bias caused by PCR during library preparation, SAMBLASTER (Faust and Hall, 2014) was used to mark duplicates. Subsequently, SAMtools (Li et al., 2009) and BCFtools (Li, 2011) were used to call variants and generate mpileup files, which were used in the downstream analyses. SNPs with minor allele frequency (MAF) < 0.05, depth < 10, quality < 20 and located within 15 bp of an indel were discarded.
Detection of Selective Sweeps
To identify genomic regions affected by selection, we applied three types of selective sweep analyses, the fixation index (FST), the pooled heterozygosity (Hp) and the cross-population composite likelihood ratio test (XP-CLR) approaches. FST in 25-kb non-overlapped sliding windows between the two breeds was calculated using the “fst-sliding.pl” module in Popoolation2 (Kofler et al., 2011), according to the Weir and Cockerham method (Weir and Cockerham, 1984). In the case that we had two populations and their different pooling strategies, the FST estimator is:
where ni is the sample size and is the sample allele frequency of the populations (Bhatia et al., 2013). Hp and negative Z-transformed Hp (−ZHp) of the Mame Shiba Inu were calculated using a custom python3 script from 25-kb non-overlapped sliding windows using the following formulas:
Z transformation allowed us to compare the two breeds at the same level, since ZHp values indicate the number of standard deviations by which the Hp value deviates from the mean (Rubin et al., 2010). XP-CLR statistics (Chen et al., 2010) between Mame Shiba Inu and Shiba Inu was calculated using xpclr5 in 25-kb non-overlapped windows. Windows containing < 10 SNPs were discarded to prevent spurious signals. Windows that were all in the top 1% of the FST, -ZHp and normalized XP-CLR score distributions were considered to be candidate selective sweep regions. Genes overlapping with these regions were identified using the ENSEMBL dog genome gene annotations (CanFam 3.1).
Sequencing Data, Mapping, and SNPs Calling
In total, we obtained 192.5 Gb of clean sequence data with an average sequencing depth of 39.98× for the two breeds, which was sufficient for the downstream analyses (Table 1). The average coverage against the reference genome was 98.65%, indicating a good quality for the data. We identified 13,618,261 unique variants in the two breeds, of which 11,552,094 (84.8%) were retained after quality control that included 9,242,589 SNPs and 2,309,505 indels. We compared these variants to dbSNP database, 28.22% of these variants matched 58.91% of the records in dbSNP. The Ts/Tv ratios were 1.848 and 1.856 for Shiba Inu and Mame Shiba Inu, while the ratio was 2.09 when computing with dbSNP dataset.
Detection of Selective Sweeps
After discarding windows containing < 10 SNP in the FST sliding window approach, 86,886 windows were retained. The mean FST value was 0.058, with the highest and lowest FST values being 0.311 and 0.008, respectively. A total of 148 gene colocalized in the 869 windows in the top 1% of the FST distribution (FST > 0.146, Figure 2A). Co-localized genes included FGFRL1, GBX1, IQCA1L, RNF212 and FAM189A1, as well as some previously well-studied body size genes such as IGF1, SMAD2 and LCORL (Supplementary Table 1).
Figure 2. Genomic regions with signatures of selective sweeps identified using 25-kb non-overlapping windowed Fst and Hp approaches. (A) Plot of the Fst values in the Shiba Inu vs. Mame Shiba Inu. Red line indicates the significance threshold of the top 1% (Fst = 0.146). (B) Plot of the –ZHp values of Mame Shiba Inu. Red line indicates the significance threshold of the top 1% (–ZHp = 2.860).
Similarly, the mean −ZHp value for the windows was −0.029, with the highest and lowest −ZHp values being 3.767 and −2.019, respectively. We identified 411 genes (Supplementary Table 2) in the 869 windows in the top 1% of the −ZHp distribution of the Mame Shiba Inu (−ZHp > 2.86, Figure 2B). SLTM and SRGAP1 were found in windows with the greatest -ZHp divergence.
In XP-CLR approach, 861 putative selection windows were detected using the threshold of 1% cutoff of normalized XP-CLR scores (Figure 3), with normalized XP-CLR scores ranged from 3.07 to 40.58. We identified 442 genes in these windows (Supplementary Table 3), and the windows with highest XP-CLR scores co-localize with genes ACSBG2, UBA5, and NPHP3.
Figure 3. Genomic regions with signatures of selective sweeps identified using 25-kb non-overlapping windowed XP-CLR approaches. Red line indicates the significance threshold of the top 1% (XP-CLR = 3.07).
Notably, only 12 windows were located all in the top 1% of the FST, -ZHp, and XP-CLR score distributions. Within these 12 windows with selective signatures, we identified nine co-localizing genes (PRDM16, ZNF382, ZNF461, ERGIC2, ENSCAFG00000033351, CCDC61, ALDH3A2, ENSCAFG00000011141, and ENSCAFG00000018533) that are strong candidate selected genes. These windows also included genomic regions, including on Chromosome 3, 8, and 36, which did not overlap with any annotated gene (Table 2).
Due to human imposed artificial selection, some Shiba Inus now have body sizes that are smaller than their ancestors. In this study, we conducted whole-genome pool sequencing on two breeds of primitive Japanese dogs, Shiba Inu and Mame Shiba Inu, to investigate the molecular mechanisms responsible for the differences in their body size, and to identify candidate genes that may regulate body size in all dogs. In the case that only allele frequencies were needed, pool-seq is well suited for detecting selective sweeps (Boitard et al., 2012). The Ts/Tv ratios were relatively low, probably due to selective forces (Wakeley, 1996). 28.22% of the variants matched dbSNP records, suggesting that there might be breed-specific variants for the two breeds. Using selective sweep analyses, we identified several genomic regions with signatures consistent with selection. When compared to the reference dog genome, we found several genes that are possibly involved in the change in body size of Shiba Inus.
With the FST, Hp and XP-CLR with sliding window approaches, nine genes (PRDM16, ZNF382, ZNF461, ERGIC2, ENSCAFG00000033351, CCDC61, ALDH3A2, ENSCAFG00000011141, and ENSCAFG00000018533) were found to show selection signatures. Among these genes, PRDM16 was reported to be associated with growth traits in some species, such as bovine and chicken (Wang et al., 2010; Wu et al., 2021). It is known that PRDM16 controls a brown fat/skeletal muscle switch (Seale et al., 2008), and it might take a role in the growth of Mame Shiba Inu. Additionally, there are genes identified by two approaches, for instance, NCAPG located in a window that had both high FST and low Hp values (Supplementary Tables 1, 2). NCAPG-LCORL region, has previously been reported to be associated with body size or weight in dogs and several other species. The most significant SNP, located on Chromosome 6, identified in a GWAS of 1,781 sheep associated with body weight is in a genomic region that contains the LAP3, NCAPG, and LCORL genes (Al-Mamun et al., 2015). An indel in LCORL (Chr3: 105,755,416) and a SNP (Chr3: 105,363,241) 246 kb upstream of NCAPG-LCORL were identified to be associated with body size in the Tennessee Walking Horse (Staiger et al., 2016). A SNP (Chr3: 105,547,002) in the NCAPG-LCORL region was reported to be associated with the sizes of horses (Makvandi-Nejad et al., 2012). In dogs, a genomic region in LCORL (Chr3: 93,933,450-93,944,095) was previously reported to be under selection in dogs (Vaysse et al., 2011). To date, however, there has been no reports linking PRDM16 or NCAPG with body size in dogs. We identified a genomic region (Chr3: 91,275,000–91,300,000) that has both a high FST value (0.212) and low Hp value (0.045), thus showing a selective signal, between two breeds of Shiba Inu. We also identified selective signals in three well-studied body size genes (IGF1, SMAD2, and LCORL) by the FST approach (Supplementary Table 1), however they did not show extremely low Hp values, which might indicate that they involved in the change in body size of Shiba Inus during artificial selection, but not in key roles.
Although eight of the nine identified genes (ZNF382, ZNF461, ERGIC2, ENSCAFG00000033351, ALDH3A2, ENSCAFG00000011141, and ENSCAFG00000018533) and the two genomic regions that do not overlap with any gene have not previously been associated with body size, they all showed signals for selection with the FST, Hp, and XP-CLR approaches. We suggest that these genes and genomic regions might possibly have an effect on body size in dogs, they also might be involved in some other traits. More in-depth studies of these genomic regions are required to resolve this question.
In conclusion, using pool-seq data and selective analyses, using the FST, Hp, and XP-CLR approaches, we identified nine genes and two genomic regions that experienced selection, including one that were have previously been reported to be associated with body size in mammals (PRDM16). Our study identified eight novel genes (ZNF382, ZNF461, ERGIC2, ENSCAFG00000033351, ALDH3A2, ENSCAFG00000011141, and ENSCAFG00000018533) that need further study. Our results are consistent with previous findings, and provide novel genes and genomic regions that potentially are associated with body size in dogs and other mammals.
Data Availability Statement
Pool-seq data was submitted to NCBI Sequence Read Archive (accession number PRJNA636107).
The animal study was reviewed and approved by Animal Care and Use Committee of Shenyang Agricultural University. Written informed consent was obtained from the owners for the participation of their animals in this study.
GL participated in the design of this study. GL, SR, and WD collected the samples. GL, CF, and SZ performed the experiments. GL, SZ, and ZW analyzed the data. GL, DI, and ZW wrote the manuscript. SZ and ZW conceived, designed the study, and supervised the work. All authors approved the final version of the manuscript.
This work was supported by grants from the Educational Department of Liaoning Province (Climbing Scholar) and the Organization Department of Liaoning Provincial Committee of China (No. XLYC1907018).
Conflict of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
We thank the owner of the dogs, Zhihui Ma and Yiyang Li, for providing the samples, and the veterinarian Chensu Li for collecting the samples.
The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fgene.2021.671686/full#supplementary-material
- ^ http://www.akc.org
- ^ http://sourceforge.net/projects/bbmap/
- ^ https://www.ncbi.nlm.nih.gov/assembly/GCA_000002285.2
- ^ http://broadinstitute.github.io/picard/
- ^ https://github.com/hardingnj/xpclr
Al-Mamun, H. A., Kwan, P., Clark, S. A., Ferdosi, M. H., Tellam, R., and Gondro, C. (2015). Genome-wide association study of body weight in Australian merino sheep reveals an orthologous region on OAR6 to human and bovine genomic regions affecting height and weight. Genet. Sel. Evol. 47:66. doi: 10.1186/s12711-015-0142-4
Boitard, S., Schlötterer, C., Nolte, V., Pandey, R. V., and Futschik, A. (2012). Detecting selective sweeps from pooled next-generation sequencing samples. Mol. Biol. Evol. 29, 2177–2186. doi: 10.1093/molbev/mss090
Hayward, J. J., Castelhano, M. G., Oliveira, K. C., Corey, E., Balkman, C., Baxter, T. L., et al. (2016). Complex disease and phenotype mapping in the domestic dog. Nat. Commun. 7:10460. doi: 10.1038/ncomms10460
Hoopes, B. C., Rimbault, M., Liebers, D., Ostrander, E. A., and Sutter, N. B. (2012). The insulin-like growth factor 1 receptor (IGF1R) contributes to reduced size in dogs. Mamm. Genome 23, 780–790. doi: 10.1007/s00335-012-9417-z
Kofler, R., Pandey, R. V., and Schlötterer, C. (2011). PoPoolation2: identifying differentiation between populations using sequencing of pooled DNA samples (Pool-Seq). Bioinformatics 27, 3435–3436. doi: 10.1093/bioinformatics/btr589
Li, H. (2011). A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27, 2987–2993. doi: 10.1093/bioinformatics/btr509
Makvandi-Nejad, S., Hoffman, G. E., Allen, J. J., Chu, E., Gu, E., Chandler, A. M., et al. (2012). Four loci explain 83% of size variation in the horse. PLoS One 7:e39929. doi: 10.1371/journal.pone.0039929
Ostrander, E. A., Wayne, R. K., Freedman, A. H., and Davis, B. W. (2017). Demographic history, selection and functional diversity of the canine genome. Nat. Rev. Genet. 18, 705–720. doi: 10.1038/nrg.2017.67
Parker, H. G., Dreger, D. L., Rimbault, M., Davis, B. W., Mullen, A. B., Carpintero-Ramirez, G., et al. (2017). Genomic analyses reveal the influence of geographic origin, migration, and hybridization on modern dog breed development. Cell Rep. 19, 697–708. doi: 10.1016/j.celrep.2017.03.079
Plassais, J., Kim, J., Davis, B. W., Karyadi, D. M., Hogan, A. N., Harris, A. C., et al. (2019). Whole genome sequencing of canids reveals genomic regions under selection and variants influencing morphology. Nat. Commun. 10:1489. doi: 10.1038/s41467-019-09373-w
Plassais, J., Rimbault, M., Williams, F. J., Davis, B. W., Schoenebeck, J. J., and Ostrander, E. A. (2017). Analysis of large versus small dogs reveals three genes on the canine X chromosome associated with body weight, muscling and back fat thickness. PLoS Genet. 13:e1006661. doi: 10.1371/journal.pgen.1006661
Rimbault, M., Beale, H. C., Schoenebeck, J. J., Hoopes, B. C., Allen, J. J., Kilroy-Glynn, P., et al. (2013). Derived variants at six genes explain nearly half of size reduction in dog breeds. Genome Res. 23, 1985–1995. doi: 10.1101/gr.157339.113
Rubin, C. J., Zody, M. C., Eriksson, J., Meadows, J. R., Sherwood, E., Webster, M. T., et al. (2010). Whole-genome resequencing reveals loci under selection during chicken domestication. Nature 464, 587–591. doi: 10.1038/nature08832
Staiger, E. A., Al Abri, M. A., Pflug, K. M., Kalla, S. E., Ainsworth, D. M., Miller, D., et al. (2016). Skeletal variation in tennessee walking horses maps to the LCORL/NCAPG gene region. Physiol. Genom. 48, 325–335. doi: 10.1152/physiolgenomics.00100.2015
Sutter, N. B., Bustamante, C. D., Chase, K., Gray, M. M., Zhao, K., Zhu, L., et al. (2007). A single IGF1 allele is a major determinant of small size in dogs. Science 316, 112–115. doi: 10.1126/science.1137045
Vaysse, A., Ratnakumar, A., Derrien, T., Axelsson, E., Rosengren Pielberg, G., Sigurdsson, S., et al. (2011). Identification of genomic regions associated with phenotypic variation between dog breeds using selection mapping. PLoS Genet. 7:e1002316. doi: 10.1371/journal.pgen.1002316
Voorbij, A. M., van Steenbeek, F. G., Vos-Loohuis, M., Martens, E. E., Hanson-Nilsson, J. M., van Oost, B. A., et al. (2011). A contracted DNA repeat in LHX3 intron 5 is associated with aberrant splicing and pituitary dwarfism in German shepherd dogs. PLoS One 6:e27940. doi: 10.1371/journal.pone.0027940
Wakeley, J. (1996). The excess of transitions among nucleotide substitutions: new methods of estimating transition bias underscore its significance. Trends Ecol. Evol. 11, 158–162. doi: 10.1016/0169-5347(96)10009-4
Wang, J., Li, Z. J., Lan, X. Y., Hua, L. S., Huai, Y. T., Huang, Y. Z., et al. (2010). Two novel SNPs in the coding region of the bovine PRDM16 gene and its associations with growth traits. Mol. Biol. Rep. 37, 571–577. doi: 10.1007/s11033-009-9816-8
Keywords: selective sweep, body size, dog, pool-seq, genome
Citation: Lyu G, Feng C, Zhu S, Ren S, Dang W, Irwin DM, Wang Z and Zhang S (2021) Whole Genome Sequencing Reveals Signatures for Artificial Selection for Different Sizes in Japanese Primitive Dog Breeds. Front. Genet. 12:671686. doi: 10.3389/fgene.2021.671686
Received: 24 February 2021; Accepted: 24 June 2021;
Published: 14 July 2021.
Edited by:Stephane Boissinot, New York University Abu Dhabi, United Arab Emirates
Reviewed by:Samuele Bovo, University of Bologna, Italy
Brian Davis, West Texas A&M University, Texas A&M University System, United States
Copyright © 2021 Lyu, Feng, Zhu, Ren, Dang, Irwin, Wang and Zhang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Shuyi Zhang, email@example.com