Characterization of the Bronchoalveolar Lavage Fluid by Single Cell Gene Expression Analysis in Healthy Dogs: A Promising Technique

Single-cell mRNA-sequencing (scRNA-seq) is a technique which enables unbiased, high throughput and high-resolution transcriptomic analysis of the heterogeneity of cells within a population. This recent technique has been described in humans, mice and other species in various conditions to cluster cells in populations and identify new subpopulations, as well as to study the gene expression of cells in various tissues, conditions and origins. In dogs, a species for which markers of cell populations are often limiting, scRNA-seq presents with elevated yet untested potential for the study of tissue composition. As a proof of principle, we used scRNA-seq to identify cellular populations of the bronchoalveolar lavage fluid (BALF) in healthy dogs (n = 4). A total of 5,710 cells were obtained and analyzed by scRNA-seq. Fourteen distinct clusters of cells were identified, further identified as macrophages/monocytes (4 clusters), T cells (2 clusters) and B cells (1 cluster), neutrophils (1 cluster), mast cells (1 cluster), mature or immature dendritic cells (1 cluster each), ciliated or non-ciliated epithelial cells (1 cluster each) and cycling cells (1 cluster). We used for the first time in dogs the scRNA-seq to investigate cellular subpopulations of the BALF of dog. This study hence expands our knowledge on dog lung immune cell populations, paves the way for the investigation at single-cell level of lower respiratory diseases in dogs, and establishes that scRNA-seq is a powerful tool for the study of dog tissue composition.


INTRODUCTION
The cells can be considered as the fundamental unit in biology. They are working in concert to respond to stimuli in order to maintain health. However, until recently, they were only characterized and distinguished using microscopy-based methods, flow cytometry, or bulk RNA sequencing, all techniques that are quite limiting for demonstration of cell heterogeneity (1)(2)(3)(4). With the development of next-generation sequencing technologies, it is now possible to profile the transcriptome of each individual cell composing a sample. The single-cell mRNA sequencing (scRNA-seq) enables high throughput and high-resolution transcriptomic analysis of the cellular heterogeneity with an unbiased assessment of the cells as it gives the opportunity to identify cells without relying on previously known cell markers. It has become a powerful tool to identify cell subpopulations sharing similar transcriptome within a population, as well as to provide information related to cell fate, development, lineage, physiology, homeostasis and underlying molecular mechanisms (2,5,6). The use of this recent technique has been described in humans (7)(8)(9), mice (10) and other species (11,12) in various conditions and samples. In dogs, the use of this technique has not yet been reported so far.
Bronchoscopy and combined analysis of the bronchoalveolar lavage fluid (BALF) are largely used in the diagnosis of canine lower airway diseases either acute or chronic (13,14). In dogs, bronchoalveolar lavage is a well-tolerated procedure and few adverse effects are reported (13,14). Common analyses performed on BALF include determination of total (TCC) and differential cell counts (DCC) (including macrophages, neutrophils, lymphocytes, eosinophils, and mast cells count), cytological examination of cytospin preparations, bacterial cultures and detection of specific respiratory pathogens using quantitative polymerase chain reactions (13,14). Only few studies have characterized the lymphocyte populations in the canine BALF by flow cytometry (15)(16)(17)(18)(19) while the other cell types have not been studied. In depth examination of BALF cellular composition and subpopulations as well as the comparison of these cell subpopulations in healthy and diseased conditions could lead to the identification of new cell subsets involved in disease and could help to better understand the pathophysiology of lung diseases, the cell adaptations in disease context as well as to find new or more specific therapeutic targets (6,20).
In this study, we aimed to use the scRNA-seq technique in healthy client-owned dogs to analyze BALF cell subpopulations. Results will contribute to provide a base resource regarding cell subpopulations composing the BALF of healthy dogs which could be of great interest for further investigations of the BALF cell subpopulations in disease.

Dog Population
For the scRNA-seq analysis, BALFs were obtained from healthy dogs prospectively recruited at the veterinary clinic of the University of Liège (CVU, Liège, Belgium) between December 2017 and June 2018. All dogs were privately owned, and samples were obtained with owners' consent. The study was validated by the ethical committee of the University of Liège (approval no. 1435).
The healthy status of the dogs was confirmed by history, normal physical examination, blood work (plasma biochemistry and hematology), bronchoscopy and analysis of the BALF (including a macroscopic evaluation, a TCC and a DCC). Dogs from various breed and age were chosen to better represent the diversity of the canine population.

Samples Collection
BALFs were obtained under anesthesia with butorphanol at 0.2 mg/kg (Butomidor R , Richter Pharma AG, Wels, Austria) as premedication and propofol (Diprivan R , Asen Pharma Trading Limited, Dublin, Irland) infusion on demand.
Dogs were not intubated for the procedure. A bronchoscope (FUJINON© Pediatric Video-Bronchoscope EB-530S), cleaned and disinfected using the washer-disinfector Serie 4 (Soluscope R , Aubagne, France), was inserted into the bronchi until the extremity was wedged. Three to four mL/kg of a sterile NaCl 0.9% solution divided into 3 aliquots were instilled through the endoscope channel into the lung (2 aliquots were obtained in the right diaphragmatic lobe and one in the left diaphragmatic lobe) and directly reabsorbed by gentle suction into the same sterile recipient. About 1 mL of BALF was kept for total and differential cell count calculation performed using, respectively, a hemocytometer and a cytospin preparation (centrifugation at 221 g, for 4 min at 20 • C, Thermo Shandon Cytospin©4), by counting a total of 200 cells at high power field. The rest of the BALF was then transferred within 15 to 20 min following collection on ice to the GIGA laboratory of Cellular and Molecular Immunology.

BALF Samples Preparation
BALFs were filtered to remove mucus and total cell count was assessed using a hemocytometer and Türk coloration (Supplementary Table 1). BALFs were then centrifuged at 400 g for 7 min and the pellet resuspended in phosphate-buffered saline solution (Gibco TM 1x DPBS, Cat.14190-169) to obtain a cell concentration around 1,000 cells/µL. A second filtration through a cell strainer (BD Falcon TM , Biosciences, USA, Cat.352350) was performed to remove any remaining cell debris and large clumps and cells were again counted with Trypan blue staining to assess cell viability considered as acceptable above 70% (Supplementary Table 1). The volume of the cell suspension was then adjusted to obtain a final cell concentration between 500 and 1,000 cells/µL suspended in phosphate-buffered saline solution containing 0.04% (w/v) bovine serum albumin (Supplementary Table 1).
For each sample, ∼3,500 cells (Supplementary Table 1) were loaded into the Chromium TM Controller (10x Genomics, Pleasanton, CA, USA) ∼30 min after the first filtration and were then partitioned into nanoliter scale vesicles containing 10x barcoded beads from Chromium TM Single Cell 3 ′ Gel Bead Kit v2 (10x Genomics, Pleasanton, CA, USA) according to manufacturer's instructions. The following steps take place in the vesicles containing cell: [1] cell lysis, [2] capture of polyadenylated mRNAs oligonucleotides containing cell specific 16 bp barcode and 10 bp Unique Molecular Identifier (UMI) and [3] reverse transcription of mRNAs into cell specific barcoded cDNAs on a Veriti© 96-Well Thermal Cycler (ThermoFisher Scientific, Merelbeke, Belgium).

Single-Cell Library Preparation and Sequencing
Emulsion breakage, cDNA amplification and libraries construction were performed using Chromium TM Single Cell 3 ′ Reagent kit v2 (10x Genomics, Pleasanton, CA, USA) according to manufacturer's instructions as already described (21). Briefly, cDNAs obtained were amplified in a Veriti© 96-Well Thermal Cycler. Amplified cDNA products were cleaned up, quality controlled and quantified. Illumina's P5, P7, and Read2 primers, as well as Sample Index were then added to generate sequencing libraries. The barcoded sequencing libraries were also quality controlled and quantified by quantitative PCR (KAPA Biosystems Library Quantification Kit for Illumina platforms). Sequencing libraries were loaded on an Illumina NextSeq500. The sequencing depth was set at 50,000 reads per cell, taking into account that ∼2,000 cells should be captured (55-60% efficiency). Cell Ranger software (v1.2.0) (10x Genomics, Pleasanton, CA, USA) was used to demultiplex Illumina BCL files to FASTQ files (cellranger mkfastq), to perform alignment to dog genome (CanFam3.1, GenBank assembly accession: GCA_000002285.2), filtering, UMIs counting and to produce gene-barcode matrices (cellranger count).

Data Analysis and Visualization
Analyses were performed using R package Seurat (version 3.1.2) (22). Briefly, we have first selected cells with a minimum of 100 and a maximum of 2,500 unique mapped genes to exclude lowquality cells or empty droplets and cell doublets or multiplets, respectively. Only genes present in at least 3 different cells were kept. Expression values were normalized to 10,000 transcripts per cell and the "FindVariableFeatures" function was used to identify the top 2,000 variable genes in each BALF sample. "FindIntegrationAnchors" and "IntegrateData" functions were used to combine the data of all BALF specimens, while minimizing batch effects. Next, a linear transformation using the "ScaleData" function was applied so that highly-expressed genes do not dominate. A principal component analysis (PCA) was performed on the scaled data using the command "RunPCA." The statistically principal components taken into account for the next analysis were identified using the "PCElbowPlot" and the "DimHeatmap" functions and were set to 1:30. A Knearest neighbor graph, based on the Euclidean distance in PCA space and the Jaccard similarity index, was obtained using the "FindNeighbors" function. Cells were then clustered with the "FindClusters" command based on the Louvain algorithm. Several cluster resolutions were tested, and the resolution of 0.3 was chosen, since higher resolutions created additional subdivisions of non-well-defined clusters or clusters containing singlets, which were considered not biologically relevant. The data were visualized by a non-linear dimensional reduction, the t-distributed stochastic neighbor embedding (t-SNE) plots, using the "RunTSNE" function, with the number of dimensions to use set to 30 (PC 1:30).
Cell types within each cluster were characterized based on the identification of differentially expressed genes (DEGs) specific for each cluster compared to all others. The "FindMarkers" function was used to identify DEGs across clusters. Clusters with the same identified cell type were also further characterized by comparing DEGs between each other. The differential expression was measured using non-parametric Wilcoxon rank sum tests adjusted for multiple testing with Bonferroni correction. Only DEGs with an adjusted P < 0.05 were retained. The genes not well-annotated were further blasted on the Ensembl genome browser (v99.31) (23) for dog species to increase the annotation rate. Specific cell markers average expression and percentage of cells expressing the indicated genes within clusters were visualized with the "DotPlot" function. Alternatively, the "FeaturePlot" function was used to show specific gene expression within single cells.
The different common biological processes between clusters with the same identified cell type were also assessed using the gene set enrichment analysis (GSEA) using the online GSEA-P software (24). GSEA was carried out by computing overlaps between significantly enriched genes calculated between clusters with the same identified cell type and gene ontology (GO) biological process gene sets using hypergeometric tests with Benjamini Hochberg correction for multiple testing (P-value adjusted). Only the 10 first gene sets that best overlapped with our gene set were retained.

Statistical Analysis
Single-cell mRNA sequencing data from the 4 samples were pooled for all analysis. A P-value lower than 0.05 was considered as significant. Details about statistical analysis for the scRNA-seq data and the gene set enrichment analysis can be found in the "Data analysis and visualization" section above.

Dogs Population Characteristics
Four healthy client-owned dogs were recruited for the bronchoalveolar lavage procedure. The cohort was exclusively composed by adult females including one 4-year-old French bulldog, one 6-year-old Australian shepherd, one 9-yearold West Highland white terrier and one 11-year-old Yorkshire terrier.

BALF Cells Analysis
Information about TCC and DCC for each BALF can be found in the Table 1.

Single-Cell RNA Sequencing
The transcriptomic profile from a total of 5,710 cells was obtained from each of the four BALF specimens using 10x Genomics based scRNA-seq analysis. Cells had a mean read depth of ∼54,000 reads per cell. Summary of sequencing and mapping quality control metrics for each BALF sample is presented in Table 2. The distribution of transcripts and genes counts can be found in the Supplementary Figures 1A,B.
Cells from all dogs were compiled after identification of anchors using Seurat. The clustering in Seurat allowed the detection of 14 well-defined clusters ( Figure 1A). The contribution of each individual sample in the compiled t-SNE figure is displayed in Figures 1B,C. Cells coming from each of the four BALF specimens were present in all identified clusters except for the cluster 11 which did not contain cells from BALF 1 ( Figure 1C). The average expression of all transcripts detected by clusters is provide in the Supplementary Table 2.
The cell identity of each cluster was determined based on the DEGs in each cluster compared to all others. All DEGs are reported in the Supplementary Table 3. In each cluster, a selection of the most overexpressed transcripts able to differentiate cell types according to the literature is displayed in Table 3. Cells of clusters 0, 3, 5, and 8 expressing MARCO and/or MSR1 and/or HLA-DRB1 and/or CD163 and/or CD86 and/or MRC1 and/or CD68 and/or CD63 were identified as macrophages/monocytes (25)(26)(27)(28)(29)(30)(31)(32). Cells of cluster 1 and 2 expressing CD3 markers were identified as T lymphocytes (28,33). Cells of clusters 4 and 12 expressing TFF1, TFF3 and KRT19 or just KRT19, respectively, were identified as epithelial cells (34,35). Cells of clusters 7 and 13 expressing CD83 and either CD1E or CCR7, respectively, were identified as dendritic cells (DC). Finally, cells of cluster 6 expressing CD62L and ITGAM were identified as neutrophils (36), cells of cluster 9 expressing PCLAF, TOP2A and Ki-67 as cycling cells (8), cells of cluster 10 expressing BCR, FCRLA and CD19 as B lymphocytes (37)(38)(39) and finally, cells of cluster 11 expressing MS4A2, FCER1G, KIT and CD63 as mast cells (40) ( Table 3 and Figure 2). The proportions of the different identified cell types in the global dataset corresponded to 50.4% of macrophages/monocytes, 28.9% of lymphocytes B and T, 9.5% of epithelial cells, 4.1% of neutrophils, 3.9% of DC 2.2% of cycling cells and 1.0% of mast cells. Of note, we were not able to identify eosinophils, cells known to be present in BALF (13). DEGs and biological processes were further compared between clusters sharing the same cell identity, namely macrophages/monocytes, T lymphocytes, epithelial cells and DC, to better characterize each cluster.
The graph-based clustering of merged single-cells identified four transcriptionally distinct clusters of macrophages/monocytes. In this study, MARCO, a class A scavenger receptor involved in host defense and demonstrated to be highly expressed in embryonic-derived or alveolar macrophages (AMs) and not expressed in monocyte-derived macrophages was used to identify AMs (25,27). MARCO was overexpressed in the clusters 0, 3, and 5 compared to all remaining clusters (Figure 2 and Supplementary Table 3).
The first cluster of AMs (cluster 0) represented the majority of the macrophages/monocytes cells and showed a unique transcriptional signature including upregulation of transcripts coding for cell surface markers such as MHC-II molecules (e.g., DLA-DQA1, DLA-DRA, DLA-DMA), CD63, the Fc fragment of IgG receptor IIIa (FCGR3A, encoding CD16), the selectin L (SELL, encoding CD62L), the CD36 molecule, the CD68 molecule and the lysosomal associated membrane protein 2 (LAMP2) (Supplementary Table 4). Other most upregulated transcripts (average log2 fold change (avg_logFC) > 0.5, P < 0.05) included the apolipoprotein E (APOE) known as an anti-inflammatory, anti-proliferative and immunemodulatory protein (41) and transcripts involved in the immune response such as for example the bactericidal permeability increasing protein (BPI) and the complement C1q A chain (C1QA) (Figure 3 and Supplementary Table 4). The principal biological functions exerted by cells in cluster 0 are reported in Table 4.
Compared with macrophages composing clusters 0, 5 and 8, the AMs composing cluster 3 overexpressed (avg_logFC > 0.5, P < 0.05) transcripts encoding cell surface markers such Data were generated after passing quality control including the exclusion of cells with <100 and >2,500 genes. Reads mapped confidently to genome are the number of reads that mapped only to the genome. Reads mapped confidently to transcriptome are the fraction of the reads mapped to a unique gene in the transcriptome and are considered for UMI counting. Median genes per cell correspond to the median number of genes with at least one UMI count. Total genes detected is the detected number of genes with at least one UMI count in any cell. BALF 1, female Yorkshire terrier of 11-year-old; BALF 2, female French bulldog of 4-year-old; BALF 3, female West Highland white terrier of 9-year-old; BALF 4, female Australian shepherd of 6-year-old; UMI, unique molecular identifier.
Frontiers in Immunology | www.frontiersin.org as the macrophage mannose receptor (MRC1, encoding CD206), the integrin subunit alpha 5 (ITGA5), the scavenger receptor CD163, the CD80 molecule and the CD83 molecule (Supplementary Table 4). The cells in cluster 3 also largely overexpressed transcripts (avg_logFC > 0.5, P < 0.05) encoding cytokines, including the interleukin 18 (IL18), the C-C motif chemokine ligand 4 and 5 (CCL4 and CCL5) and the interleukin 10 (IL10) (Figure 3 and Supplementary Table 4). Such combination of pro-inflammatory and immunoregulatory cytokines is consistent with the enriched functional properties of the cells in cluster 3 which include regulation of the immune response and cell activation ( Table 4). AMs in cluster 5, in comparison with cells from clusters 0, 3 and 8 also overexpressed transcripts encoding cell surface markers (avg_logFC > 0.5, P < 0.05), including the CD9 molecule, the CD5 molecule like (CD5L), CD68, the carcinoembryonic antigen related cell adhesion molecule 5 (CEACAM5) and the CD300C molecule (Supplementary Table 4). The principal functions of AMs composing cluster 5 were quite similar to those associated with AMs in cluster 0. However, those cells seemed to be more involved in cellular homeostasis ( Table 4), mostly metal ion homeostasis. Indeed, the most enriched transcripts in the cluster 5 were the metallothionein 1X (MT1X) and the metallothionein 2A (MT2A) which encode anti-oxidant proteins that are important in the homeostasis of metal in the cell, and in the detoxification of heavy metals (Figure 3 and Supplementary Table 4) (42,43).
The cells in cluster 8 compared with other clusters did not overexpress the transcript encoding MARCO and were not considered as AMs. Overexpressed transcripts coding for surface markers in cluster 8 compared with clusters 0, 3, and 5 included MHC-II (DLA-DQA1) and MHC-I (DLA-88) molecules, the tumor necrosis factor superfamily member 13b (TNFSF13B), the colony stimulating factor 2 receptor subunit beta (CSF2RB), the integrin subunit alpha X (ITGAX) and the CD1e molecule (Supplementary Table 4). The cells in cluster 8 were characterized by the overexpression (avg_logFC > 0.5, P < 0.05) of transcripts encoding cytokines including the interleukin 1 receptor antagonist (IL1RN), the C-C motif chemokine ligand 23 (CCL23), the C-C motif chemokine ligand 2 (CCL2) and the C-X-C motif chemokine ligand 10 (CXCL10) (Figure 3 and  Supplementary Table 4). The high level of cytokine transcripts in cluster 8 is consistent with the enrichment for processes related to the inflammatory response, the defense response and the response to cytokines ( Table 4). By looking at the DEGs and the GSEA between cluster 1 and cluster 2 corresponding each to T cells (Supplementary Table 5 and Table 5), we were able to characterize cells in cluster 1 as cytotoxic or CD8 + T cells. Indeed, the transcripts encoding granzyme B, K, and A (GZMB, GZMK, and GZMA, respectively) were overexpressed with an avg_logFC > 1 in cluster 1 compared to cluster 2 (Supplementary Table 5). Those genes are expressed by cytotoxic T lymphocytes and natural killer cells  (44,45). Other transcripts with an avg_logFC > 1 in cluster 1 compared to cluster 2 included the killer cell lectin like receptor D1 and K1 (KLRD1 and KLRK1, respectively) also expressed primarily in natural killer cells and CD8 + T cells (45,46). Finally, the CD8b molecule was also overexpressed in cluster 1 (Supplementary Table 5).
When comparing cluster 2 to cluster 1, enriched biological processes were more in favor of CD4 + T cells, as reported in Table 5. Although classical surface marker of this cell type was not expressed by the cells in our dataset (e.g., CD4), the cells in cluster 2 overexpressed transcripts encoding for the interleukin 7 receptor (IL7R) and the CD40 ligand (Cd40L) commonly found in CD4 + T cells (47,48). Principal overexpressed transcripts in cluster 2 included ICOS (inducible T cell costimulatory) an important costimulatory factor expressed in activated T cells (49)(50)(51), PLAC8 (placenta associated 8), GATA3 (GATA binding protein 3) and ANXA1 (annexin A1) (Supplementary Table 5). Those transcripts are associated with the activation of CD4 + T cells, T cell differentiation in CD4 + T cells and immune response modulation (52)(53)(54)(55), which is coherent with the functions of CD4 + T cells as reported in Table 5.
Diverse epithelial populations were captured and corresponded to clusters 4 and 12. Cluster 12 was identified as composed by ciliated epithelial cells as it included functions such as cilium movement, microtubule-based process and movement, cytoskeleton and cell projection organization ( Table 6). Indeed, the most enriched transcript in cluster 12 compared to cluster 4 included notably the transcripts encoding SNTN (sentan, cilia apical structure protein), DPCD [deleted in primary ciliary dyskinesia homolog (mouse)], ROPN1L (rhophilin associated tail protein 1 like), SPA17 (sperm autoantigenic protein 17) and WDR78 (WD repeat domain 78) for example (Supplementary Table 6 Table 7) and their Gene set enrichment analysis carried on by computing overlaps between significantly upregulated genes (P < 0.05, avg_logFC > 0.25) and the gene ontology biological process gene set. "Genes in Gene Set" refers to the number of genes in the gene set, "Positive DEGs included" corresponds to the number of positive differentially expressed genes in the cluster of interest compared to the others and "Genes in Overlap" to the number of genes upregulated in the cluster and involved in the biological process. Avg_logFC, average log2 fold change.
major functions concerned the activation of immune cells and the defense response.
Overexpressed surface marker transcripts identified in cells of cluster 13 compared to cluster 7 (avg_logFC > 1) included the C-C motif chemokine receptor 7 and the C-X-C motif chemokine receptor 4 (CCR7 and CXCR4, respectively), the CD83 molecule and the programmed cell death 1 ligand 2 (PDCD1LG2) (Supplementary Table 7). The major functions of cells in cluster 13 concerned mostly the regulation of the activation of immune cells ( Table 7). Because of their overexpression of CD83 and CCR7, we considered that DC of cluster 13 correspond to mature DC (56,57). Gene set enrichment analysis carried on by computing overlaps between significantly upregulated genes (P < 0.05, avg_logFC > 0.25) and the gene ontology biological process gene set. "Genes in Gene Set" refers to the number of genes in the gene set, "Positive DEGs included" corresponds to the number of positive differentially expressed genes in the cluster of interest compared to the other and "Genes in Overlap" to the number of genes upregulated in the cluster and involved in the biological process. Avg_logFC, average log2 fold change. Gene set enrichment analysis carried on by computing overlaps between significantly upregulated genes (P < 0.05, avg_logFC > 0.25) and the gene ontology biological process gene set. No overlap was found between the gene set and upregulated genes of the cluster 4 compared to the cluster 12. "Genes in Gene Set" refers to the number of genes in the gene set, "Positive DEGs included" corresponds to the number of positive differentially expressed genes in the cluster of interest compared to the other and "Genes in Overlap" to the number of genes upregulated in the cluster and involved in the biological process. Avg_logFC, average log2 fold change.

DISCUSSION
In this paper, we report for the first-time a comprehensive singlecell expression profiling of the canine BALF cells in healthy condition. We were able to cluster cells in 14 distinct subsets identified as macrophages/monocytes, CD8 + T cells, CD4 + T cells, epithelial cells, ciliated epithelial cells, mature DC and DC, neutrophils, B cells, mast cells and cycling cells.
Until recently, cells of the dog BALF were only characterized by microscopic evaluation or, in rare cases, by flow cytometry. The cell populations identified by these techniques included macrophages, CD4 + and CD8 + lymphocytes, neutrophils, eosinophils, mast cells and epithelial cells (13)(14)(15)(16)(17)(18)(19)58). With the use of the scRNA-seq, we highlighted the presence of 14 subpopulations of cells using an unbiased technique. We were able to characterized the cells composing those subpopulations Gene set enrichment analysis carried on by computing overlaps between significantly upregulated genes (P < 0.05, avg_logFC > 0.25) and the gene ontology biological process gene set. "Genes in Gene Set" refers to the number of genes in the gene set, "Positive DEGs included" corresponds to the number of positive differentially expressed genes in the cluster of interest compared to the other and "Genes in Overlap" to the number of genes upregulated in the cluster and involved in the biological process. Avg_logFC, average log2 fold change.
in depth and to deduce their main functions based on their transcriptome. In addition to offer a way to overcome the lack of qualitative reagents designed for flow cytometry in dogs, the scRNA-seq allows a better characterization of cell heterogeneity without prior knowledge by highlighting, in better agreement with pulmonary physiology, all cell types and cell functions. Indeed, the scRNA-seq provides comprehensive profiles of cells without limitations due to pre-selected cells by probing a few selected markers (5,6,20,59,60). Four subpopulations of macrophages/monocytes were found. Among them, 3 subpopulations corresponded to AMs based on their expression of MARCO (25,27). AMs are the most abundant cells found in the airways in homeostatic conditions. They are self-maintaining with minimal contributions from circulating monocytes in healthy conditions (25,61,62). The first subpopulation of AMs, representing the major cell subpopulation, exerted functions involved in immune defense and response. The second was enriched in a combination of pro and anti-inflammatory cytokines transcripts and exerted functions involved in the regulation of the immune response. Finally, the third population had similar functions as the first with more implications in the homeostasis and the detoxification of metal ions. The last subpopulation was not considered as AMs and could correspond to monocytesderived macrophages or monocytes. Indeed, the cluster was the smallest and expressed macrophages markers but not MARCO.
We found a large population of non-ciliated cells and a small population of ciliated cells corresponding to tracheobronchial epithelial cells.
T lymphocytes were subdivided into 2 subpopulations identified as CD8 + and CD4 + with a majority of CD8 + T cells which was already reported in healthy dogs particularly in aged animals (16,17). Indeed, cells in cluster 1 expressed the CD8b molecule. However, cells in cluster 2 did not express neither CD8 nor CD4 while they overexpressed markers associated with classical CD4 + T cells such as GATA3, IL7R and the CD40 ligand (47,48,53,54). The absence of CD4 mRNA expression could be due to weakness or absence of transcription of this protein (60). Indeed, in dogs, a population of CD8 − CD4 − T cell has been described representing ∼15% of the TCRαβ + T cells in the lung, also expressing GATA3 (63). Cells from cluster 2 cells possibly belong to this population. B cells were identified using BCR, FCRLA and CD19 markers (37)(38)(39). CD19 has only recently been described as B cell marker in dogs (39) which highlights the benefice of the scRNA-seq for the identification of new surface markers to better isolate different cell types (5,6). Common B cell surface markers used and described in dogs include CD21 and CD79A (64). In this study, CD21 mRNA was not detected which could be due to absent or weak transcription of this protein (60). CD79A was expressed in B cells but its expression was low and it was not significantly differentially expressed in the cluster 10 compared to others.
The identified granulocytic populations included neutrophils and mast cells. Basophils and mast cells shared common markers including MS4A2, KIT and FCER1G used in our study (40). However, overexpressed DEGs in cluster 11 cells also included chymase (encoding CMA1) and tryptase enzymes (encoding PPSAB1 and TSP2) which are almost entirely mast cell-specific (40,65,66). The cells in cluster 11 also expressed CD63 which is considered as one of the most useful markers of mast cell and basophil activation (40,66). We did not identify mast cells in BALF 1 which is probably due to the small proportion of that cell type into BALF samples (13). Indeed, it represents only 1.0% of the total cells recovered in this analysis and it is possible that rare cell populations may not be properly captured with the 10X Genomics Chromium system (60). We were not able to identify a cluster of eosinophils, which are normally present in dog BALF specimens (13). Although the number of eosinophils found in the BALF of healthy dogs is rather low (13) and may not be properly captured, their total absence from our dataset is most likely related to their high content of RNase (67) inducing the rapid degradation of mRNA, thus preventing their detection by scRNA-seq.
Finally, two subpopulations of DC were also found one being identified as mature DC because of its higher expression of CD83 and CCR7 (56,57).
The use of the scRNA-seq in dogs has some limitations. The principal impediment to apply scRNA-seq to canine samples is the necessity to map sequenced RNAs on a sufficiently wellannotated database to be able to identify genes. The percentage of reads mapped confidently to the transcriptome in this study was considered as low (∼28%) as it is expected to be > 30% 1 . This can be due to a poor annotation of the reference transcriptome (overlapping genes for example), but could also be related to a poor library, sequencing or reads quality 1 . However, despite this suboptimal mapping, we were able to identify cell clusters and deduce clusters principal functions based on the cell transcriptome obtained. Another limitation is the lack of information for the identification of specific cell markers in dogs. For example, in the 4 clusters identified as macrophages/monocytes, AMs were recognized only by their expression of MARCO. In the literature in human and mouse, the expression of SIGLECF, MERTK, CD14, CCR2, and Ly-6c are commonly used to distinguish AMs from monocytes and monocyte-derived macrophages (8,21,26,27,68,69). However, those markers were not detected in our dataset. The use of the 10X Genomics Chromium system although being unbiased, time saving and allowing high throughput and high-resolution transcriptomic analysis, also implies that rare cell populations may not be properly captured and that sensitivity is reduced decreasing the detection of weakly expressed genes (60). It is possible that common markers used to identify different cell types are only weakly expressed making cell populations difficult to identify. Finally, a limited number of dogs was used in the study. Indeed, as the use of the scRNA-seq is quite expensive, only 4 BALF samples were analyzed with a relatively low 1 https://support.10xgenomics.com/single-cell-gene-expression/software/ pipelines/latest/troubleshooting#alerts (accessed March 16, 2020). median number of cells and reads per sample (∼1,300 cells and ∼54,000 reads, respectively). The 4 selected dogs included young to old adult dogs from 4 breeds differing in size and body conformation, in order to be, as much as possible, representative of the whole healthy canine population, even if no males were sampled. However, we don't expect the sex to alter BALF cells transcriptome. While it has been shown that the age could alter the cell proportions in the BALF from healthy dogs (70), we are not aware of studies assessing its effect on BALF cells transcriptome either. To our knowledge, no study has specifically investigated the effect of the sex and the breed on canine BALF cells proportions and transcriptome. Besides, in the present study, cells coming from each of the four BALF specimens were present in nearly all identified clusters indicating that similar cell populations were present in all dogs.

CONCLUSION
ScRNA-seq is a new technique which enables unbiased, high throughput and high-resolution transcriptomic analysis and which can be used to identify cell populations in the BALF of healthy dogs. In this study, we provide a comprehensive single-cell transcriptome tool. It represents a highly informative dataset for the identification and subsequent interpretation of cell populations and molecular signatures alterations in lung diseases in dogs.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: https://www.ebi.ac.uk/ arrayexpress/, E-MTAB-9265.

ETHICS STATEMENT
The animal study was reviewed and approved by Ethical committee of the University of Liège, Liège, Belgium. Written informed consent was obtained from the owners for the participation of their animals in this study.

AUTHOR CONTRIBUTIONS
AF, CC, and FB designed the study. AF, DP, LF, and CC conducted experiments and acquired the data. AF and DP analyzed the data. FB, CD, and TM provided their expertise in lung immune cells. AF wrote the manuscript. All authors interpreted the results and reviewed and approved the final version of the manuscript.

FUNDING
This work was supported by a grant from the Fonds Spéciaux de la Recherche from the University of Liège.