Whole Genome Sequencing Reveals Signatures for Artificial Selection for Different Sizes in Japanese Primitive Dog Breeds

Body size is an important trait in companion animals. Recently, a primitive Japanese dog breed, the Shiba Inu, has experienced artificial selection for smaller body size, resulting in the “Mame Shiba Inu” breed. To identify loci and genes that might explain the difference in the body size of these Shiba Inu dogs, we applied whole genome sequencing of pooled samples (pool-seq) on both Shiba Inu and Mame Shiba Inu. We identified a total of 13,618,261 unique SNPs in the genomes of these two breeds of dog. Using selective sweep approaches, including FST, Hp and XP-CLR with sliding windows, we identified a total of 12 genomic windows that show signatures of selection that overlap with nine genes (PRDM16, ZNF382, ZNF461, ERGIC2, ENSCAFG00000033351, CCDC61, ALDH3A2, ENSCAFG00000011141, and ENSCAFG00000018533). These results provide candidate genes and specific sites that might be associated with body size in dogs. Some of these genes are associated with body size in other mammals, but 8 of the 9 genes are novel candidate genes that need further study.


INTRODUCTION
As our best friend, dogs are crucial to modern human society and are involved in many aspects of our life. During domestication, many traits found in dogs diverged greatly from their ancestors. The Shiba Inu ("inu" means dog in Japanese) is an ancient and native dog breed in Japan, and it is a basal spitz breed (Parker et al., 2017). Shiba Inus typically range in height from 34 to 41 cm ( Figure 1A) 1 although in recent years, artificial selection has been imposed to yield smaller individuals, which are called mini or Mame Shiba Inu (Mame is a Japanese word for "bean, " representing "small"). It is bred only from Shiba Inu to keep "pure ancestry, " and shares some traits with Shiba Inu, such as coat color. The body height of an adult Mame Shiba Inu should be not more than 34 cm ( Figure 1B). Although this new breed, the Mame Shiba Inu, has not yet been approved by the American Kennel Club (AKC), it is approved by Kennel Club of Japan and well recognized as a breed in Japan and China.
Regulation of dog body size, or weight, in the dog has been extensively researched, with many studies conducted at the genomic level . Through these works, many genes and loci have been identified as associated with the regulation of body size in the dog. For instance, IGF1 is a wellstudied body size gene. Using a genome-wide scan in Portuguese water dogs, a single IGF1 SNP haplotype was linked as a major contributor to small body size (Sutter et al., 2007). Consistently, results reported from a study involving 93 dog breeds found that individuals with different IGF1 SNP and SINE insertion alleles have significant differences in body weight (Rimbault et al., 2013). An IGF1 SNP was also reported in a study that applied both selective sweep and GWAS analyses in 722 canids, including purebred dogs and wolfs (Plassais et al., 2019). SNPs not only in IGF1, but also its receptor, IGF1R (e.g., a non-synonymous SNP at chr3: 44,706,389) was found to be associated with reduced size in dogs (Hoopes et al., 2012). GWAS studies have also implicated other genes with SNPs linked to body size. HMGA2 was reported to be associated with body size where individuals with the ancestral SNP allele in the HMGA2 5 UTR have higher body weight than those with the derived allele (Rimbault et al., 2013), as well as another variation between exon 1 and 2 (Chr10: 8351907) was also reported to have an effect on body weight (Plassais et al., 2019). A genomic region within CDK4 was reported to be associated with size through an across-breed GWAS analysis that involved 8 different dog breeds (Vaysse et al., 2011). With a genome-wide scan on two German shepherd dog families with microsatellite markers identified a contracted DNA repeat in intron 5 of LHX3 that was associated with dwarfism (Voorbij et al., 2011). A quantitative GWAS was also used on 1,873 dogs, from 158 breeds, which identified an interval on Chromosome X that is upstream of the ARHGAP36, IGSF1, and OR5AK2 genes, which was strongly associated with body size (Hayward et al., 2016). Using a GWAS analyses of 690 dogs, three genes (IRS4, IGSF1, and ACSL4) were found to be associated with body weight, where IRS4 and IGSF1 are both involved in the GH/IGF1 and thyroid hormonal pathways involved in body size regulation (Plassais et al., 2017). While many studies have focused on the variation of body size in dogs, it is still possible that other novel genes and loci have been involved in the selection for body size in dogs. The change in body size between Shiba Inu and Mame Shiba Inu is relatively mild, but distinct, making them good source for investigating mechanisms that change body size. In addition, these breeds are highly divergent from the other domesticated dogs. Here we performed whole-genome pool-sequencing (pool-seq) on Shiba Inu and Mame Shiba Inu dogs to try to identify key regions that experienced selection and may regulate their different body sizes.

Ethics Statement
Owners of the dogs studied here provided informed consent for the use data of their dogs in this scientific research. All procedures used in this study was approved by the Animal Care and Use Committee of Shenyang Agricultural University.

Sample Preparation and Sequencing
Blood samples from a total of 94 dogs (59 Shiba Inu and 35 Mame Shiba Inu dogs) were collected by venipuncture of their forelimb. For genome sequencing, four pooled samples were generated. The first pooled sample was from 35 Mame Shiba Inu dogs, while the other three were from 29, 22, and 8 Shiba Inu dogs. For each pool, an equivalent volume of blood from each individual of the group was mixed to yield a total volume of 1 mL. Genomic DNA was then extracted from each pool sample using a whole blood genomic DNA extraction kit (TIANGEN, DP318). Genomic DNA were checked for concentration and purity, and then sent to the Beijing Genome Institute (Beijing, China) for library construction (insert size ∼ 300 bp) and genome sequencing. After adapter ligation and DNA cluster preparation, the libraries were then subjected to Illumina HiSeq X Ten for sequencing (150 bp, paired-end). To yield similar sequencing depths for the two dog breeds, the Mame Shiba Inu pool sample was sequenced to 30× depth while each of the three Shiba Inu pool samples were only sequenced to a 10× depth (thus generating 30× for the breed).

Mapping and SNP Calling
To find loci that might be associated with the body size difference between the two breeds, reads from the three Shiba Inu dog pool-seq samples were merged into one pair of fastq files. Raw reads were processed using the PoolParty pipeline (Micheletti and Narum, 2018). Briefly, raw reads were filtered using BBDuk 2 , with thresholds of 20 for base quality and 25 for minimum read length. Clean reads were mapped to the reference dog genome (assembly CanFam 3.1) 3 using BWA 0.7.12-r1039 (Li and Durbin, 2009), and the BAM (Binary Alignment Map) files were sorted by coordinate using the Picard tools 4 . To eliminate bias caused by PCR during library preparation, SAMBLASTER (Faust and Hall, 2014) was used to mark duplicates. Subsequently, SAMtools  and BCFtools (Li, 2011) were used to call variants and generate mpileup files, which were used in the downstream analyses. SNPs with minor allele frequency (MAF) < 0.05, depth < 10, quality < 20 and located within 15 bp of an indel were discarded.

Detection of Selective Sweeps
To identify genomic regions affected by selection, we applied three types of selective sweep analyses, the fixation index (F ST ), the pooled heterozygosity (H p ) and the cross-population composite likelihood ratio test (XP-CLR) approaches. F ST in 25-kb non-overlapped sliding windows between the two breeds was calculated using the "fst-sliding.pl" module in Popoolation2 (Kofler et al., 2011), according to the Weir and Cockerham method (Weir and Cockerham, 1984). In the case that we had two populations and their different pooling strategies, the F ST estimator is: where n i is the sample size andp i is the sample allele frequency of the populations (Bhatia et al., 2013). H p and negative Z-transformed Hp (−ZH p ) of the Mame Shiba Inu were calculated using a custom python3 script from 25-kb nonoverlapped sliding windows using the following formulas: Z transformation allowed us to compare the two breeds at the same level, since ZH p values indicate the number of standard deviations by which the H p value deviates from the mean (Rubin et al., 2010). XP-CLR statistics (Chen et al., 2010) between Mame Shiba Inu and Shiba Inu was calculated using xpclr 5 in 25-kb non-overlapped windows. Windows containing < 10 SNPs were discarded to prevent spurious signals.
Windows that were all in the top 1% of the F ST , -ZH p and normalized XP-CLR score distributions were considered to be candidate selective sweep regions. Genes overlapping with these regions were identified using the ENSEMBL dog genome gene annotations (CanFam 3.1).

Sequencing Data, Mapping, and SNPs Calling
In total, we obtained 192.5 Gb of clean sequence data with an average sequencing depth of 39.98× for the two breeds, which 5 https://github.com/hardingnj/xpclr was sufficient for the downstream analyses (

Detection of Selective Sweeps
After discarding windows containing < 10 SNP in the F ST sliding window approach, 86,886 windows were retained. The mean F ST value was 0.058, with the highest and lowest F ST values being 0.311 and 0.008, respectively. A total of 148 gene colocalized in the 869 windows in the top 1% of the F ST distribution (F ST > 0.146, Figure 2A). Co-localized genes included FGFRL1, GBX1, IQCA1L, RNF212 and FAM189A1, as well as some previously well-studied body size genes such as IGF1, SMAD2 and LCORL (Supplementary Table 1). Similarly, the mean −ZH p value for the windows was −0.029, with the highest and lowest −ZH p values being 3.767 and −2.019, respectively. We identified 411 genes (Supplementary Table 2) in the 869 windows in the top 1% of the −ZH p distribution of the Mame Shiba Inu (−ZH p > 2.86, Figure 2B). SLTM and SRGAP1 were found in windows with the greatest -ZH p divergence.
In XP-CLR approach, 861 putative selection windows were detected using the threshold of 1% cutoff of normalized XP-CLR scores (Figure 3), with normalized XP-CLR scores ranged from 3.07 to 40.58. We identified 442 genes in these windows (Supplementary Table 3), and the windows with highest XP-CLR scores co-localize with genes ACSBG2, UBA5, and NPHP3.
Notably, only 12 windows were located all in the top 1% of the F ST , -ZH p , and XP-CLR score distributions. Within these 12 windows with selective signatures, we identified nine co-localizing genes (PRDM16, ZNF382, ZNF461, ERGIC2, ENSCAFG00000033351, CCDC61, ALDH3A2, ENSCAFG00000011141, and ENSCAFG00000018533) that are strong candidate selected genes. These windows also included genomic regions, including on Chromosome 3, 8, and 36, which did not overlap with any annotated gene ( Table 2).

DISCUSSION
Due to human imposed artificial selection, some Shiba Inus now have body sizes that are smaller than their ancestors. In this  study, we conducted whole-genome pool sequencing on two breeds of primitive Japanese dogs, Shiba Inu and Mame Shiba Inu, to investigate the molecular mechanisms responsible for the differences in their body size, and to identify candidate genes that may regulate body size in all dogs. In the case that only allele frequencies were needed, pool-seq is well suited for detecting selective sweeps (Boitard et al., 2012). The Ts/Tv ratios were relatively low, probably due to selective forces (Wakeley, 1996). 28.22% of the variants matched dbSNP records, suggesting that there might be breed-specific variants for the two breeds. Using selective sweep analyses, we identified several genomic regions with signatures consistent with selection. When compared to the reference dog genome, we found several genes that are possibly involved in the change in body size of Shiba Inus. With the F ST , H p and XP-CLR with sliding window approaches, nine genes (PRDM16, ZNF382, ZNF461, ERGIC2, ENSCAFG00000033351, CCDC61, ALDH3A2, ENSCAFG00000011141, and ENSCAFG00000018533) were found to show selection signatures. Among these genes, PRDM16 was reported to be associated with growth traits in some species, such as bovine and chicken (Wang et al., 2010;Wu et al., 2021). It is known that PRDM16 controls a brown fat/skeletal muscle switch (Seale et al., 2008), and it might take a role in the growth of Mame Shiba Inu. Additionally, there are genes identified by two approaches, for instance, NCAPG located in a window that had both high F ST and low H p values (Supplementary Tables 1, 2). NCAPG-LCORL region, has previously been reported to be associated with body size or weight in dogs and several other species. The most significant SNP, located on Chromosome 6, identified in a GWAS of 1,781 sheep associated with body weight is in a genomic region that contains the LAP3, NCAPG, and LCORL genes (Al-Mamun et al., 2015). An indel in LCORL (Chr3: 105,755,416) and a SNP (Chr3: 105,363,241) 246 kb upstream of NCAPG-LCORL were identified to be associated with body size in the Tennessee Walking Horse (Staiger et al., 2016). A SNP (Chr3: 105,547,002) in the NCAPG-LCORL region was reported to be associated with the sizes of horses (Makvandi-Nejad et al., 2012). In dogs, a genomic region in LCORL (Chr3: 93,933,944,095) was previously reported to be under selection in dogs (Vaysse et al., 2011). To date, however, there has been no reports linking PRDM16 or NCAPG with body size in dogs. We identified a genomic region  (Chr3: 91,275,000-91,300,000) that has both a high F ST value (0.212) and low H p value (0.045), thus showing a selective signal, between two breeds of Shiba Inu. We also identified selective signals in three well-studied body size genes (IGF1, SMAD2, and LCORL) by the F ST approach (Supplementary Table 1), however they did not show extremely low H p values, which might indicate that they involved in the change in body size of Shiba Inus during artificial selection, but not in key roles.
Although eight of the nine identified genes (ZNF382, ZNF461, ERGIC2, ENSCAFG00000033351, ALDH3A2, ENSCAFG00000011141, and ENSCAFG00000018533) and the two genomic regions that do not overlap with any gene have not previously been associated with body size, they all showed signals for selection with the F ST , H p , and XP-CLR approaches. We suggest that these genes and genomic regions might possibly have an effect on body size in dogs, they also might be involved in some other traits. More in-depth studies of these genomic regions are required to resolve this question.
In conclusion, using pool-seq data and selective analyses, using the F ST , H p , and XP-CLR approaches, we identified nine genes and two genomic regions that experienced selection, including one that were have previously been reported to be associated with body size in mammals (PRDM16). Our study identified eight novel genes (ZNF382, ZNF461, ERGIC2, ENSCAFG00000033351, ALDH3A2, ENSCAFG00000011141, and ENSCAFG00000018533) that need further study. Our results are consistent with previous findings, and provide novel genes and genomic regions that potentially are associated with body size in dogs and other mammals.

DATA AVAILABILITY STATEMENT
Pool-seq data was submitted to NCBI Sequence Read Archive (accession number PRJNA636107).

ETHICS STATEMENT
The animal study was reviewed and approved by Animal Care and Use Committee of Shenyang Agricultural University. Written informed consent was obtained from the owners for the participation of their animals in this study.