Cost-Effective Mapping of Genetic Interactions in Mammalian Cells

Comprehensive maps of genetic interactions in mammalian cells are daunting to construct because of the large number of potential interactions, ~ 2 × 108 for protein coding genes. We previously used co-inheritance of distant genes from published radiation hybrid (RH) datasets to identify genetic interactions. However, it was necessary to combine six legacy datasets from four species to obtain adequate statistical power. Mapping resolution was also limited by the low density PCR genotyping. Here, we employ shallow sequencing of nascent human RH clones as an economical approach to constructing interaction maps. In this initial study, 15 clones were analyzed, enabling construction of a network with 225 genes and 2,359 interactions (FDR < 0.05). Despite its small size, the network showed significant overlap with the previous RH network and with a protein-protein interaction network. Consumables were ≲$50 per clone, showing that affordable, high quality genetic interaction maps are feasible in mammalian cells.


INTRODUCTION
Intelligent intervention in normal and diseased mammalian cells requires a comprehensive map of their biological networks. Protein-protein interactions (PPIs) have been identified using a variety of technologies, including yeast two hybrid assays and immunoprecipitation-mass spectrometry (Yugandhar et al., 2019;Luck et al., 2020). Although these approaches are resource intensive, most human PPIs have been evaluated using large experimental efforts and cataloged in publicly available databases (Bajpai et al., 2020;Luck et al., 2020).
Because of the cost, a survey of all PPIs in various cell types is not feasible. Further, PPIs do not provide information on the cellular consequences of the relevant interactions. For example, proteins may have no physical interaction, even though their genes show strong interactions.
Genetic interactions have been evaluated in a wide variety of organisms, ranging from bacteria to Drosophila (Costanzo et al., 2019). The most thorough catalog is for the yeast Saccharomyces cerevisiae (Costanzo et al., 2016). This network has provided surprising new information on the connections between cellular pathways. Data in other organisms is far less complete. Assuming 20,000 protein coding genes in mammals, the number of potential interactions is ∼ 2 × 10 8 . The addition of non-coding genes trebles the number of genes, bringing the number of possible interactions to ∼ 1.8 × 10 9 .
Multiple opportunities for therapeutic intervention in cancer will emerge from genetic interaction maps . In particular, geneticists are beginning to use CRISPR-Cas9 genetic editing technology to study the viability of cancer cells with double loss-of-function mutations in trans. One recent study employed CRISPR interference (CRISPRi) to identify genetic interactions among 222,784 gene pairs in two cancer cell lines (Horlbeck et al., 2018). However, even this effort only evaluated ∼ 0.1% of all possible coding gene pairs.
Overexpression alters cell physiology differently to loss-offunction, and is a complementary strategy to understanding cellular networks in both yeast and mammalian cells (Sopko et al., 2006;Prelich, 2012;Khan et al., 2020). One recent approach to obtaining targeted increases in gene expression is CRISPR activation (CRISPRa), but this method causes constitutive and non-physiological overexpression (Gilbert et al., 2014;Kampmann, 2018).
Radiation hybrid (RH) mapping has been widely used to construct genetic maps for the genome projects (Goss and Harris, 1975;Cox et al., 1990;Walter et al., 1994;McCarthy, 1996;McCarthy et al., 2000;Avner et al., 2001;Hudson et al., 2001;Olivier et al., 2001;Kwitek et al., 2004). In this technique, lethal doses of radiation are used to randomly fragment the genome of a human cell line ( Figure 1A) (Goss and Harris, 1975). RH clones are then created by transferring a sample of the DNA fragments to living hamster cells using cell fusion. Linked markers are likely to reside on the same fragment, and hence FIGURE 1 | Creating RH interaction networks. (A) RH clones are made by lethally irradiating human cells (HEK293). The human cells are fused to living hamster cells (A23) to rescue the DNA fragments, and RH clones selected using TK1. Conventional RH mapping exploits the fact that nearby makers tend to be found on the same DNA fragment and thus co-inherited (green triangles). RH interaction networks seek co-inheritance of distant markers (green and pink triangles). (B) Human DNA copy number, clone 8, Chromosome 8. One of the human fragments encompasses the centromere. co-inherited in the RH clones. Because of the small size of the DNA fragments, genotyping a panel of RH clones allows high resolution mapping.
We previously used publicly available RH data to construct a genetic interaction map for mammalian cells (Lin et al., 2010;Lin and Smith, 2015). We reasoned that significant co-inheritance of marker pairs separated from each other in the human genome would disclose genetic interactions promoting cell survival. Rather than loss-of-function, this interaction network depends on extra gene copies. Moreover, the genes are expressed using their natural promoters instead of constitutively.
We increased the statistical power of the RH network by combining PCR genotyping datasets from six RH panels: three human, one mouse, one rat, and one dog. The network consisted of 18,324 genes linked by 7,248,479 interactions. The overwhelming majority of interactions consisted of higher than expected co-retention of the gene pairs. Despite its statistical power, the quality of the network suffered because of the need to combine datasets from different species. In addition, the mapping resolution was limited by the legacy PCR genotyping.
In this study, we demonstrate the cost-effectiveness of the RH approach in creating genetic interaction maps for the mammalian genome. We used low pass DNA sequencing to genotype emergent human RH clones, decreasing labor and cell culture costs, while also improving mapping resolution. Expanding this strategy will render construction of whole genome interaction networks feasible.

Human DNA Retention in the RH Clones
We exploited the sensitivity of modern DNA technologies to analyze RH clones upon appearance, avoiding the customary need for additional growth. Our strategy cuts costs and saves time. Fifteen independent RH clones were evaluated using low pass sequencing at a depth of 0.31 ± 0.03 times the human genome (14.5 ± 1.4 M single reads of 65 bp). Reads were aligned to the human genome and those that also aligned to hamster were discarded. Only human-specific reads remained for subsequent analyses using 1 Mb windows and 10 kb steps (Khan et al., 2020).
The human DNA fragment length was 2.3 ± 0.1 Mb and the retention frequency was 0.25 ± 0.08 (Figures 1B-D,  Supplementary Figures 1A-D). Due to the small number of clones, 3% of the human genome had zero retention. Clone 2 had the highest retention (0.97) and harbored a nearly complete copy of the human genome, plus additional fragments (Supplementary Figure 2). The human genome was represented with 3.8-fold ± 1.3 redundancy in the 15 clones.
Centromeres showed increased retention, since they stabilize the donated human DNA fragments (Wang et al., 2011;Khan et al., 2020) (Figures 1B-D, Supplementary Figure 1E). Fragments containing centromeres were also significantly longer than other fragments, due to the large size of these chromosomal elements (Supplementary Figure 1F). The human TK1 gene was used as the marker to select hybrids, and its copy number was therefore 1.

Mapping Accuracy
Based on the average fragment length, retention frequency and panel size, the expected mapping resolution was 0.3 ± 1.2 Mb . We further estimated mapping resolution by evaluating the distance at which peak − log 10 P-values for cis linked 1 Mb windows decreased by one (−1 log 10 P-values) (Supplementary Figure 3). The −1 log 10 P-value was 1.0 Mb using − log 10 P-values plotted against distance and 2.2 Mb using distance against − log 10 P.
As a surrogate measure of mapping accuracy, we also evaluated the distance between the retention peak of TK1 and its known location (Khan et al., 2020). We did the same for the centromeres. The mapping resolution for TK1 was 33.3 ± 59.5 kb and for the centromeres, −0.5 ± 2.5 Mb (Supplementary Figure 4), neither of which were significantly different from the expected value of zero (P = 0.06 and 0.80, respectively).
Although there were substantial differences between the mapping accuracy estimates for this small RH panel, the resolution is probably of the order of 1 Mb.

Human Genetic Interactions
We identified human genetic interactions by seeking coinheritance of distant genes (>2.4 Mb apart) across the 15 clones (Figure 2). Interactions are hence identified using data from all clones, rather than single clones. For example, if a gene shows a retention pattern of {100110001000000} across the clones and a distant gene shows the same pattern, there is significant coinheritance. Fisher's exact test was used to assess significance (false discovery rate, FDR < 0.05).
Interacting genes were identified as genes closest to a − log 10 P peak consisting of a single window pair, with neighboring window pairs showing decreased significance. Because the selection procedure for interaction peaks ignores "plateaus" of − log 10 P-values, the realized mapping resolution may be better than our empirical estimates. In fact, the resolution may be of the order of the 10 kb steps, close to the estimate using TK1 retention.
We restricted our analysis to coding region genes to facilitate comparison with the legacy RH-PCR network as well as PPI networks. A total of 2,359 interactions connecting 225 genes were found in the RH-Seq network, with a mean degree of 21.0 ± 1.4 ( Figures 3A,B Table 1).
All interactions in the RH-Seq network were "attractive, " in which gene co-retention occurred more often than expected by chance. Gene pairs that interact as extra copies thus promote cell growth. The interactions in our original RH-PCR network were also overwhelmingly attractive (Lin et al., 2010). This finding contrasts with loss-of-function alleles in yeast and cancer cells, where some allele combinations promoted, while others inhibited, growth (Costanzo et al., 2016(Costanzo et al., , 2019Horlbeck et al., 2018).
PCDH7 had the largest number of interactions in the RH-Seq network, 81 (Figures 3B-D). In addition, PCDH7 regulated the expression of the most genes (614) in a mouse RH panel (Park et al., 2008;Ahn et al., 2009). Perhaps genes with many interactions also regulate the expression of many genes.

Clone Numbers and Genetic Interactions
Because the RH-Seq network used only 15 clones, the − log 10 P plots displayed discrete values (Figure 2). This phenomenon was also shown by the number of interactions for each gene ( Figure 3C). The null expectation for the number of clones harboring two distant genes is 0.96 ± 0.04 , while the observed number for the significant genes was 3.35 ± 0.01 (Supplementary Table 1). The minimum clone number for significant interactions was 2, corresponding to 104 interactions (4.4%).
There was no obvious relationship between the number of clones harboring interacting genes and the discrete interaction numbers of 24-28 (number of clones, 4.0 ± 0.0) and 54-57 (number of clones, 3 ± 0) ( Figure 3C). In particular, PCDH7, which showed the largest number of interactions, had 3.3 ± 0.1 clones harboring the gene and its interacting partners. Thus, genes with many interactions did not appear to be driven by an inordinately small number of clones.

Overlaps With RH-PCR and PPI Networks
We found highly significant FDR corrected overlap between the RH-Seq interaction network and the legacy RH-PCR network (Figures 4A,B). The overlap significance diminishes as threshold increases, due to decreased numbers (maximum overlap, FDR < 2.2 × 10 −16 , odds ratio = 3.1, 97 common interactions, 32 expected). Considering the limited power of the RH-Seq dataset, the overlap is encouraging.
We also found less strong, but still significant, overlap between the RH-Seq interaction network and the STRING v11 PPI database (maximum overlap, FDR = 4.64 × 10 −2 , odds ratio = 2.23, 25 common interactions, 11 expected) (Figures 4C,D) (Szklarczyk et al., 2019). The decreased overlap may reflect the different data ascertainment methods of RH-Seq and STRING.
The peak overlap significance between RH-Seq and STRING occurred at intermediate thresholds and is likely driven by two competing trends-as the STRING score increases, the quality of interactions improves while the number of potential overlaps decreases. Concerns that the overlap is only detected at specific thresholds is alleviated by the observation that it survives FDR correction.
The overlap between the RH-Seq and RH-PCR networks is not caused by a few genes with many interactions, since the most significant overlap included all significant interactions (Figures 4A,B). Similarly, the RH-Seq genes at the most significant overlap of the RH-Seq network with STRING ( Figures 4C,D) showed significantly lower degree (10.2 ± 0.8) than the null (P = 7.4 × 10 −11 ).
We found no overlap of the RH-Seq network with four other PPI datasets, BioGRID, HIPPIE, HINT and a yeast two-hybrid dataset, reflecting the modest size of the RH-Seq network (Das and Yu, 2012;Alanis-Lobato and Schaefer, 2020;Bajpai et al., 2020;Luck et al., 2020;Oughtred et al., 2021).
Nevertheless, the overlap of the RH-Seq and STRING networks suggests that gene products which interact to promote cell growth may also interact physically. A similar overlap was found between loss-of-function genetic interactions using CRISPRi in two cancer cell lines and the STRING database (Horlbeck et al., 2018). There was no overlap between the RH-Seq and the CRISPRi networks, but it is difficult to draw firm conclusions about the similarities and differences of these networks given that each are of limited size.

Overlaps With Disease and Gene Ontology Networks
The original RH-PCR interaction network showed significant overlap with networks constructed using a gene-disease database and using gene ontology (GO) (Lin et al., 2010;Pletscher-Frankild et al., 2015;Gene Ontology Consortium, 2021). We found that the RH-Seq network also overlapped with these networks (Supplementary Figure 6).
Due to diminishing interaction numbers, the overlap significance of the RH-Seq and gene-disease networks decreased as the − log 10 P threshold increased (Supplementary Figure 6A). Since maximum significance occurred when all interactions were considered, the overlaps were not dominated by a small number of genes with many interactions.
The RH-Seq and GO networks showed maximum overlap at intermediate thresholds, representing the competing trends of improving interaction quality and decreasing overlap numbers (Supplementary Figure 6B). The RH-Seq network genes at the peak overlap had significantly lower degree (5.56 ± 0.6) than the null (P < 2.2 × 10 −16 ), suggesting that the overlap was not due to genes of high degree. The significant overlaps of RH-Seq with the gene-disease and GO datasets suggest that interacting genes may cause similar diseases and have similar functions.

Clustering of the RH-Seq Network
The RH-Seq network appeared to have many mutually interacting genes, or cliques (Figure 3). The clustering coefficient (or transitivity) of a network evaluates its propensity to be divided into clusters (Pavlopoulos et al., 2011). The RH-Seq network had a much higher clustering coefficient (0.82 ± 0.02) than the RH-PCR network (0.13 ± 9.6 × 10 −5 ).
We constructed a subnetwork from the RH-PCR dataset using the same genes as the RH-Seq network. This subnetwork showed a significantly increased clustering coefficient (0.31 ± 1.58 × 10 −3 .), although not as high as the original RH-PCR network (P < 2.2 × 10 −16 , all comparisons). Nevertheless, the increased clustering coefficient of the RH-Seq network is likely genuine. Genes with many mutual interactions may be preferentially ascertained in small networks. Larger RH-Seq datasets are likely to have decreased clique sizes and clustering coefficients.
In each cluster, we evaluated the number of RH clones containing both interacting genes. Of the five cliques, three had increased clone numbers and two had decreased. Cluster 4 had the most clones (5 ± 0) and cluster 5 had the least (2 ± 0). The cliques did not appear to be driven solely by small clone numbers.
We examined the functional enrichment of the five largest clusters using the biological process term of GO (Gene Ontology Consortium, 2021) (Figure 5). The genes in each cluster showed nominally significant enrichment (P < 0.05, but FDR > 0.05). For example, cluster 1 was enriched in genes related to transcription and cell division, cluster 2 in morphogenesis and cluster 4 in transmembrane ion transport. This enrichment suggests that the clusters represent functionally relevant gene groups.

RH-Seq Network and Growth Genes
We recently used selection of pooled RH cells to identify mammalian growth genes (Khan et al., 2020). Consistent with the idea that the RH-Seq interaction network is involved in control of cell proliferation, there was significant overlap between the RH-Seq network and the RH growth genes (Figures 6A,B). In contrast, both the RH-Seq network genes and RH growth genes (Khan et al., 2020) showed significant non-overlap with growth genes identified in CRISPR loss-of-function screens, depending on the cell type (Hart et al., 2015;Wang et al., 2015) (Figures 6C-F). These observations support the idea that over-expression alters cell physiology differently from lossof-function, and suggests that the two approaches will give complementary interaction networks.

RH-Seq Network and Expression
The genes in the RH-Seq network showed decreased expression in a human RH panel (Supplementary Figure 7A), but not in the GTEx dataset (v8) of human tissue expression (Supplementary Figure 7B) (Wang et al., 2011;GTEx Consortium, 2020). Similarly, the RH growth genes showed decreased expression in both the human RH panel and the GTEx dataset (Khan et al., 2020). Genes identified for their interactions and growth effects as a result of an extra copy are likely to be evolutionarily selected for decreased expression.

Functional Enrichment
The interacting genes in the RH-Seq network were highly enriched in cancer terms from a disease database (FDR < 0.05), with the top eight terms being cancer related (Supplementary Figures 8A,B, 9A) (Pletscher-Frankild et al., 2015;Kuleshov et al., 2016). This observation is consistent with a key role for the RH-Seq network in cell proliferation. Examples of cancer related genes in the network included GAS6, MAP3K4, PRKD1, PTPRD, and VEGFA.
The RH-Seq interaction network was also significantly enriched in the catalog of human genome-wide association studies (GWAS) (Kuleshov et al., 2016) and in the NCBI Database of Genotypes and Phenotypes (dbGaP; https://www.ncbi.nlm. nih.gov/gap/) (FDR < 0.05; Supplementary Figures 8C, 9B,C). The same was true for the RH growth genes (Khan et al., 2020). In contrast, growth genes from loss-of-function CRISPR screens showed no such enrichment. The effects of an extra gene copy in the RH cells may be closer to the mild effects of common disease variants than the more severe effects of a knockout.
Gene ontology analysis (GO) of all the genes in the RH-Seq network revealed enrichment in a number of categories related to cell growth (P < 0.05, but FDR > 0.05), including cell proliferation, p38MAPK, growth factor, tyrosine kinase and replication fork (Supplementary Figures 10, 11) (Gene Ontology Consortium, 2021).

Evolutionary Properties of RH-Seq Network Genes
The RH-Seq network genes displayed significantly decreased numbers of duplications (paralogs) (Supplementary Figure 12A). The evolutionary selection against duplication of the RH-Seq network genes is consistent with their effects on cell survival as an extra copy. Similarly, the RH growth genes also showed decreased numbers of paralogs (Khan et al., 2020).
The RH-Seq network genes exhibited increased evolutionary conservation, with decreased tolerance to loss-of-function (LOF) variants (Supplementary Figure 12B) and decreased mousehuman sequence divergence (Supplementary Figure 12C). However, there was no significant increase in the number of species with orthologs of the network genes ("phyletic retention"), another measure of evolutionary conservation (Supplementary Figure 12D). The RH growth genes also displayed increased evolutionary conservation (Khan et al., 2020). Both the RH-Seq network genes and the RH growth genes had increased gene lengths (Supplementary Figures 12E-H).

DISCUSSION
We created a genetic interaction network using 15 RH clones. Gene interactions were identified by seeking unlinked gene pairs that showed significant co-retention. We used low pass sequencing of nascent RH clones to save labor and consumable costs, while obtaining high quality genotyping.
Unlike approaches such as weighted correlation network analysis (WGCNA), which use similarity of gene expression to assign function (Langfelder and Horvath, 2008), the endpoint of the RH approach is cell proliferation. The RH network therefore plays a causative, rather than correlative, role in cell viability.
The RH-Seq network showed highly significant overlap with the original RH-PCR network. Unidentified systematic error is unlikely to the cause of this agreement. The RH-Seq and RH-PCR networks were obtained from separate data sources (our laboratory vs. six different laboratories studying four species), distinct genotyping technologies (low pass genome sequencing vs. PCR) and independent coding and bioinformatics pipelines. Overlap of RH-Seq network and RH growth genes, thresholded on RH-Seq network − log 10 P. Overlap significance diminishes as threshold increases due to decreased numbers. (B) Overlap of RH-Seq network and RH growth genes, thresholded on RH growth gene − log 10 P. (C) RH-Seq interaction genes (RH+) show lower growth effects when inactivated using CRISPR than non-interaction genes (RH−). Higher −CS values (CRISPR score times −1) mean stronger growth effects of CRISPR null alleles (Wang et al., 2015). P-values shown above comparisons. (D) RH-Seq interaction genes show lower growth effects when inactivated using CRISPR. Higher Bayes factors (BF) means stronger growth effect of CRISPR null alleles (Hart et al., 2015). (E) Significant non-overlap of RH-Seq network genes and CRISPR growth genes, thresholded on −CS score. (F) Non-overlap of RH-Seq network genes and CRISPR growth genes, thresholded on BF score. Horizontal red lines, FDR = 0.05.
The RH-Seq network also showed less strong, but still significant, overlap with the STRING PPI database.
The large number of potential interactions in the networks means that significance can occur even with modest overlaps. Nevertheless, considering the small size of the RH-Seq dataset, the significant overlaps suggest that this strategy is a reproducible and scalable approach to genetic interaction networks.
The high clustering coefficient of the RH-Seq network is likely genuine, since a RH-PCR subnetwork with the same genes also showed significantly increased clustering. We speculate that the high modularity of the RH-Seq network may reflect its derivation from nascent clones. Newly emerging clones may rely heavily on three way and higher order interactions for viability, leading to the apparent increased clustering of the network (Crona et al., 2017;Costanzo et al., 2019). A properly powered analysis of this supposition will require a larger dataset.
The RH-Seq network genes displayed significant overlap with RH growth genes but non-overlap with growth genes identified using CRISPR loss-of-function screens. The extra gene copies used by the RH approach will provide complementary insights into cell physiology compared to loss-of-function networks.
The interacting RH-Seq genes were enriched in terms related to cancer in a gene-disease database, suggesting that the RH network strategy is relevant to cell proliferation and may provide new therapeutic insights into tumorigenesis.
Both the RH-Seq interaction genes and the RH growth genes showed significant enrichment in the GWAS and dbGaP databases, while the CRISPR growth genes did not (Khan et al., 2020). Most variants that contribute to complex traits are in non-coding regions and affect gene expression (Gallagher and Chen-Plotkin, 2018). The significant enrichment of the RH-Seq network genes in the GWAS and dbGaP databases may reflect the milder physiological effects of an extra gene copy driven by its natural promoter, compared to the more severe effects caused by CRISPR loss-of-function alleles or overexpression using CRISPRa.
It has been suggested that cataloging genetic interactions will help illuminate the "missing heritability" in complex traits (Costanzo et al., 2019). The enrichment of the RH-Seq network genes in the GWAS and dbGaP catalogs suggests that the RH genetic interaction networks will be particularly relevant to understanding complex traits.
The cost of RH-Seq for mapping genetic interactions compares favorably to other approaches. Comprehensive coverage of the protein encoding interactome using CRISPR would require ∼20,000 separate experiments. The RH-Seq approach offers the cost savings of low pass sequencing and lack of clone propagation. Further, each RH clone harbors ∼0.25 times the human genome and therefore evaluates multiple genes in each cell. In contrast, CRISPR strategies evaluate only a single gene in each cell. The additional layer of multiplexing in RH-Seq lends efficiency to the construction of genetic interactions maps. In fact, the size of the RH-Seq network obtained using 15 clones (225 genes) is comparable to a recent large CRISPRi study, which examined genetic interactions for <450 genes in cancer cells (Horlbeck et al., 2018).
The consumable cost per clone is $50 in RH-Seq, making the cost of analyzing 1,000 clones feasible. The labor costs of data acquisition are also far lower than the other strategies. One individual could easily create and analyze 1,000 clones. In addition to high statistical power, such a panel would have mapping resolution of 4 ± 18 kb , sufficient to confidently map individual genes (Lin et al., 2010).
The cost advantages of the RH-Seq strategy will allow it to be applied to wider areas. Analyzing nascent clones identifies the genetic interactions necessary for initial viability. Further culture of the clones will allow the evolution of genetic interactions to be evaluated over time. Expanding the cell fusion reaction to isogenic human cell lines will allow genetic networks to be mapped in the presence or absence of oncogenic mutations. Although these more ambitious experiments will increase costs, using 1,000 RH clones would only be 5 % of the consumable cost of CRISPR approaches.
An additional advantage of the RH-Seq approach is that non-coding genes can be incorporated into the genetic network on an equal footing with coding genes (Khan et al., 2020). This extension will not increase the cost of the RH-Seq network, but would increase the cost of a CRISPR network nine-fold. In addition, the RH-Seq approach can evaluate haploinsufficient genes, which are difficult to assess using lossof-function methods.
Low pass sequencing of nascent RH clones is a realistic and affordable approach to constructing comprehensive genetic interaction maps of the human genome. The strategy can be scaled in a cost-effective fashion to evaluate genetic networks that contribute to cancer and other complex disorders, providing new therapeutic insights.

Sequencing
DNA was purified from 24 clones (20 BrdU and 4 non-BrdU) using the Illumina Nextera TM DNA Flex Library kit, following manufacturer's instructions. Human DNA quantities as low as 100 ng can be sequenced using this kit. Illumina sequencing employed 65 bp single reads.
We obtained human-specific reads by aligning reads to the GRCh38/hg38 human genome assembly (hg38.fa) at high stringency, allowing only one mismatch (Khan et al., 2020). Reads that also aligned to the Chinese hamster (Cricetulus griseus) genome assembly (RAZU01) (Rupp et al., 2018) were then discarded, leaving human-specific reads. Alignments were quantitated using the number of human-specific reads per 1 Mb window with 10 kb steps (Khan et al., 2020).
A total of 16.0 ± 1.3 M reads were obtained for each of the 24 clones. Based on the similarity of human-specific reads across the genome a total of 9 clones were duplicates, presumably due to cell dispersion before clone picking (R > 0.9). Seven clones were duplicates of clone 1, and two were duplicates of clone 9. The 15 independent clones, 11 from BrdU and 4 from non-BrdU, were used to construct the interaction network.
There was no significant difference in the retention frequency of the BrdU (0.25 ± 0.10) and non-BrdU clones (0.27 ± 0.18, P = 0.90). While there were sufficient BrdU clones to identify significant co-inheritance (P ≥ 2.2 × 10 −3 ), there were insufficient non-BrdU clones by themselves (P ≥ 0.25). In the independent RH clones, there were 589 ± 135 reads in 1 Mb windows containing human fragments and 18 ± 3 in those without.

Evaluating Mapping Accuracy
To measure mapping resolution, we used the significance of cis linkage for windows separated by <20 Mb. Linear models were employed to estimate initial slopes relating the − log 10 P-values and the distances between the windows and vice versa. The slopes were then employed to quantitate −1 log 10 P-values.
We also used the human DNA retention profiles averaged across the 15 clones to evaluate mapping accuracy. The midpoints of retention peaks for TK1 or the centromeres were taken as the estimated location of each element (Khan et al., 2020). The distance between the estimated and actual midpoint positions was the mapping resolution. Standard errors of the mean were assessed by bootstrapping.

Genetic Interactions
Relative copy numbers in the 1 Mb windows were calculated by normalizing human-specific sequence reads to those at TK1 in each clone. TK1 has a retention frequency of one, since it is the marker used to select RH clones. A window was deemed to harbor a human DNA fragment if log 2 (relative copy number + 1) > 0.2, corresponding to the upper 28th percentile of values and a relative copy number of 0.15.
Fisher's exact test was used to identify pairs of windows separated by > 2.4 Mb (upper 49th percentile of fragment lengths) with significant co-inheritance (FDR < 0.05). FDR correction used a sample of P-values from the 4.5 × 10 5 window pairs, a conservative procedure (Benjamini and Hochberg, 1995).
To ensure uniform sampling of the P-values while preventing systematic bias, the rows of the interaction matrix were sampled with a spacing of 1 in 100, and the columns were sampled pseudorandomly, with a minimum spacing of one window, a maximum of 2.4 × 10 4 and a mean of 1.2 × 10 4 . FDR values were calculated from the 3.9 × 10 4 sampled P-values.
A genetic interaction was ascertained if the − log 10 P peak was a single window pair with FDR < 0.05, and adjacent window pairs had decreased − log 10 P-values. The genes corresponding to the interaction peak were taken as the nearest protein coding genes using GENCODE v31 (Frankish et al., 2021).

Networks
The gene-disease network was constructed using a literature based database (Pletscher-Frankild et al., 2015). Genes sharing the same disease were linked together, yielding a network with 16,254 genes and 23,983,208 interactions. The GO network was constructed by linking genes sharing the same GO categories containing ≥70 and ≤1,000 members (Lin et al., 2010). The network consisted of 16,902 genes and 9,581,750 interactions.
The significance of network overlaps were FDR corrected for the number of thresholded comparisons, as previously described (Ahn et al., 2009).

Public Data
GO analyses used the 2021-01-01 release and the PANTHER Overrepresentation Test, as well as Enrichr (Kuleshov et al., 2016;Gene Ontology Consortium, 2021). The duplicated genes database (DGD) together with Ensembl release 71 was used to identify paralogs (Ouedraogo et al., 2012).
Intolerance of human genes to predicted loss-of-function (pLoF) variants was evaluated using the observed/expected ratio from the Genome Aggregation Database (gnomAD) release 2.1.1. (Karczewski et al., 2020). We assessed the evolutionary divergence of mouse-human homologs using the ratio of nonsynonymous to synonymous substitutions (dN/dS) in Ensembl release 97 (Cunningham et al., 2019). The number of species with gene orthologs was evaluated using Homologene release 68 (Sayers et al., 2019). Unless otherwise noted, tests of significance used Welch's Two Sample t-test.

AUTHOR CONTRIBUTIONS
AK: acquisition of data, analysis and interpretation of data, and drafting or revising the article. DS: conception and design, analysis and interpretation of data, and drafting or revising the article. Both authors contributed to the article and approved the submitted version.