Single-Cell Transcriptome Profiling of Immune Cell Repertoire of the Atlantic Cod Which Naturally Lacks the Major Histocompatibility Class II System

The Atlantic cod’s unusual immune system, which entirely lacks the Major Histocompatibility class II pathway, has prompted intriguing questions about the mechanisms used to combat bacterial infections and how immunological memory is generated. Using single-cell RNA sequencing, we report an in-depth characterisation of the cell types found in two immune compartments of the Atlantic cod: the spleen and peripheral blood leukocytes. Unbiased transcriptional clustering revealed eleven distinct immune cell signatures. Single-cell resolution enabled characterisation of the major cell subsets, including cytotoxic T cells, B cells, erythrocytes, thrombocytes, neutrophils, and macrophages. Additionally, to our knowledge, we are the first to identify cell subsets in Atlantic cod that may represent dendritic cells, natural killer-like cells, and a population of cytotoxic cells expressing GATA-3, a master transcription factor of T helper 2 cells. We further identify putative gene markers for each cluster and describe the relative proportions of each cell type in the spleen and peripheral blood leukocytes. Of the major haematopoietic cell populations, lymphocytes make up 55% and 68% of the spleen and peripheral blood leukocytes, respectively, while myeloid cells make up 45% and 32%. Through single-cell analysis, this study provides the most detailed molecular and cellular characterisation of the Atlantic cod immune system to date.


Supplementary figure 2.
Cumulative distribution of reads used to identify STAMPs (single-cell transcriptomes attached to microparticles) in a pool of amplified beads. Numbers 1-11 denote the samples. Drop-seq generates single-cell RNA profiles by encapsulating a single cell together with a single barcoded bead in a droplet. Most amplified beads are not encapsulated with a cell and therefore are not exposed to any cell's RNA, only to the ambient RNA present in solution after droplet breakage. To identify the cell barcodes that correspond to STAMPs, the cell barcodes from each sample were ranked by decreasing number of reads and the cumulative fraction of reads was plotted. The expected number of cells in each sample was estimated during the laboratory procedure from the counted and aliquoted beads; however, the curves show no clear inflection point at these estimated numbers. To ensure that all salient information was retained, the first 600-3000 cell barcodes (black line) were included and filtered further at later steps.
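The barcode-ranking step described above can be sketched in Python as follows. This is a minimal illustration, not the Drop-seq pipeline's own code: it assumes that each barcode and its read count have already been extracted upstream, and the function names `cumulative_read_fraction` and `top_barcodes` are hypothetical. It ranks barcodes by read count, computes the cumulative fraction of reads, and keeps a fixed number of top barcodes, mirroring the fixed 600-3000 cut-off used here rather than an automatically detected inflection point.

```python
import numpy as np

def cumulative_read_fraction(reads_per_barcode):
    """Rank barcodes by read count (highest first) and return the ranked
    counts together with the cumulative fraction of all reads."""
    counts = np.sort(np.asarray(reads_per_barcode, dtype=float))[::-1]
    cumulative = np.cumsum(counts) / counts.sum()
    return counts, cumulative

def top_barcodes(barcodes, reads_per_barcode, n_keep):
    """Keep the n_keep barcodes with the most reads (a fixed cut-off,
    like the 600-3000 barcodes retained per sample)."""
    reads = np.asarray(reads_per_barcode)
    order = np.argsort(-reads, kind="stable")  # descending; ties keep input order
    return [barcodes[i] for i in order[:n_keep]]
```

With toy counts such as `[5000, 20, 3000, 10, 15]`, the two highest-read barcodes already account for over 99% of reads; the real samples show the more gradual curves seen in the figure, which is why a generous fixed cut-off followed by later filtering was used.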

Supplementary figure 3.
Feature scatter plots (top) and violin plots (bottom) showing the 11 samples before (left) and after (right) quality-control cut-offs were applied. Cells with a gene count (nFeature_RNA) of fewer than 150 or more than 1500, or with a total number of molecules (nCount_RNA) of more than 4000, were filtered out to remove low-quality cells and possible cell multiplets.
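The cell-level cut-offs above can be expressed as a boolean mask over a genes-by-cells count matrix. This NumPy sketch only approximates what Seurat reports as nFeature_RNA (genes with non-zero counts per cell) and nCount_RNA (total molecules per cell). The function name `qc_mask` is hypothetical, and treating the thresholds as inclusive (cells with exactly 150 or 1500 genes, or exactly 4000 molecules, are kept) is an assumption based on the wording of the caption.

```python
import numpy as np

# Thresholds from the caption; boundary handling (inclusive) is an assumption.
MIN_FEATURES = 150
MAX_FEATURES = 1500
MAX_COUNTS = 4000

def qc_mask(counts):
    """counts: genes x cells matrix of molecule counts.
    Returns a boolean mask marking the cells that pass all cut-offs."""
    counts = np.asarray(counts)
    n_feature = (counts > 0).sum(axis=0)  # genes detected per cell (cf. nFeature_RNA)
    n_count = counts.sum(axis=0)          # molecules per cell (cf. nCount_RNA)
    return (
        (n_feature >= MIN_FEATURES)
        & (n_feature <= MAX_FEATURES)
        & (n_count <= MAX_COUNTS)
    )
```

Applying the mask as `counts[:, qc_mask(counts)]` keeps only the passing cells.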

Supplementary figure 4.
Elbow plot ranking the principal components by the percentage of variance explained by each. An 'elbow' appears around PC25-30, suggesting that most of the true signal is captured by the first 30 PCs.
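The quantity ranked in the elbow plot, the percentage of variance explained by each principal component, can be computed from the singular values of the centred data matrix. The sketch below is a generic NumPy illustration under that assumption (cells as rows, genes as columns, already normalised and scaled upstream); `percent_variance_explained` is a hypothetical helper, not a Seurat function, and it expresses each PC's variance relative to the total over all components.

```python
import numpy as np

def percent_variance_explained(X, n_pcs=None):
    """X: cells x genes matrix (e.g. scaled expression values).
    Returns the percentage of total variance explained by each PC."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # centre each gene
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, descending
    var = s ** 2                             # proportional to each PC's variance
    pct = 100.0 * var / var.sum()
    return pct if n_pcs is None else pct[:n_pcs]
```

Plotting the returned values against PC index gives an elbow plot; choosing where the curve flattens (here around PC25-30) remains a visual judgement.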

Supplementary table 1.
Overview of sample origin and sequencing results. Fresh blood and spleen samples were taken from two Atlantic cod, and each sample was flow-sorted into four populations based on forward-scatter and side-scatter gating: B1-B4 for the blood and S1-S4 for the spleen. The populations of interest (B1, containing lymphocytes, and S3, containing myeloid cells) were sent for sequencing. Peripheral blood leukocytes (PBL) were obtained by centrifugation and likewise flow-sorted, and the populations P1 (containing mostly lymphocytes), P2 (containing mostly lymphocytes and thrombocytes) and P3 (containing mostly myeloid cells) were sent for sequencing. On average, 61% of sequencing reads mapped to the Atlantic cod genome (gadmor3). After the Drop-seq computational pipeline and the Seurat filtering steps, gene expression data from 8180 cells were retained, with an average of 409 genes expressed per cell.
*Numbers of transcripts per cell are given after accounting for PCR amplification.
**Averages are weighted by the number of cells per sample.
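The weighted averages in the table (marked **) give each sample influence in proportion to its number of cells. A minimal Python sketch of that arithmetic follows; the helper name `weighted_average` and the input values in the usage example are hypothetical and are not taken from the table.

```python
def weighted_average(values, n_cells):
    """Average of per-sample values, weighted by the number of cells per sample."""
    if len(values) != len(n_cells):
        raise ValueError("values and n_cells must have the same length")
    total_cells = sum(n_cells)
    if total_cells <= 0:
        raise ValueError("the total number of cells must be positive")
    return sum(v * n for v, n in zip(values, n_cells)) / total_cells
```

For two hypothetical samples with 300 and 500 genes per cell and 1000 and 3000 cells, `weighted_average([300, 500], [1000, 3000])` returns 450.0, whereas the unweighted mean would be 400.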

Supplementary figure 5.
Cell types from unsorted blood and spleen, visualised in two dimensions using UMAP. Cells are from a pilot study using wild-caught Atlantic cod from the Oslo Fjord. Putative cluster labels are based on the differential expression of known mammalian markers. The same parameters as in the main study were applied: cells with a gene count of fewer than 150 or more than 1500, or with a total number of molecules of more than 4000, were filtered out to remove low-quality cells and possible cell multiplets. Genes expressed in fewer than 5 cells were excluded. Thirty principal components were included, and the clustering resolution was 0.35. At this resolution, the T cell and erythrocyte clusters are each split into two subclusters due to batch effects.
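The gene-level filter (genes expressed in fewer than 5 cells are excluded) can be sketched in the same way, assuming a genes-by-cells count matrix with one gene name per row. `filter_genes` is a hypothetical helper, similar in intent to the `min.cells` argument of Seurat's `CreateSeuratObject` but not a reproduction of it.

```python
import numpy as np

MIN_CELLS = 5  # genes detected in fewer than this many cells are dropped

def filter_genes(counts, gene_names, min_cells=MIN_CELLS):
    """counts: genes x cells count matrix.
    Keep genes with non-zero counts in at least min_cells cells."""
    counts = np.asarray(counts)
    n_cells_expressing = (counts > 0).sum(axis=1)  # cells in which each gene is detected
    keep = n_cells_expressing >= min_cells
    return counts[keep], [g for g, k in zip(gene_names, keep) if k]
```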