Plasma cortisol-linked gene networks in hepatic and adipose tissues implicate corticosteroid-binding globulin in modulating tissue glucocorticoid action and cardiovascular risk

Genome-wide association meta-analysis (GWAMA) by the Cortisol Network (CORNET) consortium identified genetic variants spanning the SERPINA6/SERPINA1 locus on chromosome 14 associated with morning plasma cortisol, cardiovascular disease (CVD), and SERPINA6 mRNA expression encoding corticosteroid-binding globulin (CBG) in the liver. These and other findings indicate that higher plasma cortisol levels are causally associated with CVD; however, the mechanisms by which variations in CBG lead to CVD are undetermined. Using genomic and transcriptomic data from The Stockholm Tartu Atherosclerosis Reverse Networks Engineering Task (STARNET) study, we identified plasma cortisol-linked single-nucleotide polymorphisms (SNPs) that are trans-associated with genes from seven different vascular and metabolic tissues, finding the highest representation of trans-genes in the liver, subcutaneous fat, and visceral abdominal fat, [false discovery rate (FDR) = 15%]. We identified a subset of cortisol-associated trans-genes that are putatively regulated by the glucocorticoid receptor (GR), the primary transcription factor activated by cortisol. Using causal inference, we identified GR-regulated trans-genes that are responsible for the regulation of tissue-specific gene networks. Cis-expression Quantitative Trait Loci (eQTLs) were used as genetic instruments for identification of pairwise causal relationships from which gene networks could be reconstructed. Gene networks were identified in the liver, subcutaneous fat, and visceral abdominal fat, including a high confidence gene network specific to subcutaneous adipose (FDR = 10%) under the regulation of the interferon regulatory transcription factor, IRF2. These data identify a plausible pathway through which variation in the liver CBG production perturbs cortisol-regulated gene networks in peripheral tissues and thereby promote CVD.


S1.2 Causal gene network reconstruction
Cis-eQTL discovery was carried out to identify genetic instruments to be used for causal inference analysis with Findr 2 (Figure S4).An automated pipeline was established to use the secondary linkage test (P2) to calculate SNP-gene associations when supplied with a list of genes.SNP-gene associations were obtained between all SNPs within 1 Mb of the trans-gene and all other transcripts using the same tissue dataset as the trans-gene.
Associations between all SNPs and the trans-gene were extracted from the output.A primary cis-eQTL was selected for each gene, defined as the SNP-gene association with the highest Findr P2 score for the trans-gene.An alternate, independent, cis-eQTL was selected as the second strongest cis-association not in LD with the primary cis-eQTL.LD between SNPs was calculated as the Pearson correlation coefficient between the primary cis-eQTL genotype and all other SNP genotypes.The alternate cis-eQTL was defined as the top cis-association, which was not in LD with the primary cis-eQTL (R 2 < 0.5).
To test for pleiotropy between the selected instrument and other cis-genes, cis associations between all cis genes (± 1 Mb of the trans-gene) and the primary instrument were obtained, as detailed in supplementary results (section S2.2) All genes with a valid cis-eQTL (P2 > 0.75) were taken forward for causal analysis with Findr.
Causal relationships were inferred between these cis-eQTL genes (A-genes) and all other transcripts expressed in the same tissue (B-genes).The input was as follows: (dg) array of eQTL genotypes A-gene in 012 format, (d) array of normalised A-gene expression levels, (dt) array of expression levels for all B-genes in the relevant tissue sorted with d appearing on top.
The output of all tests in Findr was calculated using the pijs_gassist function from the Findr Python package.The posterior probability of a causal interaction (P(A →B)) was calculated from the product of the alternative hypotheses from the secondary linkage test (P2) and the controlled test (P5).The controlled test (P5) is a likelihood ratio test, which can be used as a composite test with secondary linkage (P2*P5) to infer a causal A →B relationship while using a cis-eQTL, E, as an instrumental variable.P5 examines whether A and B are not associated independently with E (i.e.whether they are still coexpressed after adjusting for E), while P2 tests for a direct association between E →B.Previous work has demonstrated that most cis-eQTLs are only associated with a single gene 3 , therefore selecting cis-eQTLs specifically as an instrument allows E →B to be used as a proxy for estimating causal effects between A →B.When combined with P2, P5 can then be used to account for the comparatively few instances where E is a cis-eQTL for more than one gene, although in such cases a false positive may still occur when A and B are confounded by a common regulator 2,4 .Therefore, we also examined manually all cis-associations for selected E of interest, to account for any sources of pleiotropy that may have been missed by P5 (section S2.2).This approach was undertaken for each A-gene in a given tissue in a iterative fashion.Following completion of analysis for all A-genes in a tissue, the output was converted from the default matrix format to a Pandas DataFrame.Each tissue-specific gene set of A →B pairwise interactions was filtered according to a local precision FDR threshold (Findr score) for each interaction, to correspond to a global FDR for all interactions in the tissue set.
Networks were assembled, using the network visualisation tool, Cytoscape (version 3.8.0),from FDR thresholded pairwise gene interactions previously described.These were assembled as directed networks where the A-gene acts as the parent node and the B-gene as the child node, with the posterior probability of an A →B interaction forming the network edge.
The Findr score for a given A →B pairwise interaction, or E →B in the case of P2 testing (Table S3), is calculated as 1 minus the probability of that interaction being a false positive.To obtain the probability of a false positive across all interactions in a gene set, this was calculated as 1 minus the mean of all local precision FDR scores for a given tissue.A Findr score cut off was then set to obtain interaction sets at 10%, 15% and 20% global FDR thresholds 5 .

S1.3 Functional annotation and clustering for GO enrichment
Gene sets were functionally annotated using the Database for Annotation, Visualization and Integrated Discovery (DAVID) 6 .This web-based application allows for the generation of gene clusters that have been grouped in relation to an enrichment of functional terms, including but not limited to Gene Ontology (GO) terms.The strength of the gene-term interactions are measured by EASE scores, a modified Fisher's exact test.An enrichment score for a given cluster is generated as the geometric mean of all the EASE scores within a cluster that has undergone -log transformation.For all analyses Ensembl Gene IDs were used as the input format for DAVID as opposed to universal gene symbols.
For the analyses conducted, all of the default annotation options were selected in addition to: GAD DISEASE, GO TERM BP FAT, GO TERM CC FAT, GO TERM MF FAT, PUBMED ID, REACTOME PATHWAY, BIOGRID INTERACTIONS and UP TISSUE.Gene sets were then run using DAVID and functionally enriched clusters generated using high classification stringency.Tissue-specific gene sets from STARNET RNA-seq datasets were used as background for enrichment (Table S13).

S2.1.1 Liver
A trans-association was identified between cortisol-associated SNP rs4905194 and CPEB2 in STARNETliver (Figure 2A).Following cis-eQTL discovery for CPEB2, a SNP peak was identified upstream of the CPEB2 TSS represented by the instrument rs62410848, which was used as an instrumental variable for network reconstruction (Figure S5).48 causal interactions were obtained at a global 10% FDR threshold (Posterior probability > 0.855) (Table S9).When filtering to a minimum of 4 targets, the only GR-regulated trans-gene that remained was CPEB2 (Figure 2E).Notably, this was also the trans-gene that appeared in the most GR target datasets, forming a network with 44 target genes.Functional enrichment was performed using DAVID for all CPEB2 target genes (Table S8).The strongest cluster was related to fatty acid beta oxidation and lipid metabolism, including 5 genes related to GO:0006635fatty acid beta-oxidation (adj p-value = 0.002).Other enrichments stem from 8 genes related to acquired immunodeficiency syndrome and disease progression (adj p-value = 0.003).
The strongest causal relationship within this network was between CPEB2 and the gene HADHA (Posterior Probability = 0.99), responsible for encoding the alpha subunit of the mitochondrial trifunctional protein 7 .Mutations affecting this protein have been linked to long-chain 3-hydroxyacyl-CoA dehydrogenase (LCHAD) deficiency, which affects the ability to metabolise fatty acids in the liver 8 .These mutations have also been linked to maternal acute fatty liver during pregnancy 9 .

S2.1.2 Subcutaneous fat
In STARNET subcutaneous fat, 486 causal relationships were detected at a 10% FDR threshold (Posterior probability = 0.87), which is the most out of all tissues examined (Table S10).When fil-represented under the regulation of the genes RNF13 and IRF2.This includes a total of 343 causal relationships across both sub-networks, including two genes shared by both sub-networks.
RNF13 was found to be trans-associated with the cortisol-linked SNP rs11622665 (Figure 2E).
A cis-eQTL peak of SNPs associated with RNF13 in STARNET subcutaneous fat was identified upstream of the RNF13 TSS, represented by the lead SNP rs9853321 which was used as a causal instrument in the reconstruction of the causal network driven by RNF13 (Figure S5).
RNF13 represents the largest subcutaneous fat sub-network with 215 gene targets at a 10% FDR threshold (Figure 2F).The strongest functional enrichment term for RNF13 targets is related to Poly(A) RNA binding, where 33 targets are included for this term, GO:0044822 poly(A) RNA binding (adj p-value = 0.01), and 39 targets are included for RNA binding, GO:0003723 RNA binding (adj p-value = 0.04).Other notable terms include 23 genes related to Zinc finger motifs (adj p-value = 0.05).
IRF2 was found to be associated with the cortisol-linked SNP rs8022616 (Figure 2C).Cis-eQTL discovery revealed associations between rs34985265 and IRF2 expression in subcutaneous fat to obtain an instrument that could be used for causal network reconstruction (Figure S5).
The IRF2 sub-network contains 128 targets at a 10% FDR threshold (Figure 2D).Following functional enrichment of IRF2 targets, the strongest enrichment term included 19 genes related to Poly(A) RNA binding (p-value = 0.009), however this association was not retained following multiple testing correction.Some notable targets of IRF2 include LDB2 (Posterior probability = 0.94) and LIPA (Posterior probability = 0.91).GWAS suggests functions for LIPA related to CAD and ischaemic cardiomyopathy and LDB2 has been demonstrated to be involved in the development of atherosclerosis 10 .Additionally, cortisol has been shown to induce a 5-fold reduction in LDB2 expression in adipocytes 11 .
An additional subcutaneous fat gene network was identified for the transcription factor PBX2 containing 138 targets at a 10% FDR threshold.However, the cis-eQTL instrument that was used to reconstruct this network was found to be associated with many other genes at the PBX2 locus, which include causal targets within the PBX2 network.This indicates that PBX2 is not independently linked to this instrument and the PBX2 causal network could be driven by a cis-gene other than PBX2.

S2.1.3 Visceral abdominal fat
In STARNET visceral abdominal fat, trans-associations were identified for the genes CD163 and LUC7L3 with the same cortisol associated SNP rs2005945 (Figure S6A-B).Although STARNETvisceral abdominal fat contained the largest number of trans-associations with cortisol SNPs, the fewest causal relationships were detected in this tissue at 10% FDR (Table S11).Two small subnetworks were detected, regulated by the genes LUC7L3 and CD163 composed of eleven and four targets (Figure S6C).Interestingly, when the FDR threshold is reduced to 15% the sub-network for CD163 is expanded to include 378 targets, a much more dramatic expansion compared to reducing the threshold to 15% FDR with other regulators.The networks for CD163 and LUC7L3 were identified using the cis-eQTLs rs73059776 and rs6504682, respectively (Figure S6D).Due to the small size of the 10% FDR networks, functional enrichment and clustering was not carried out for either of the networks identified in visceral abdominal fat.

S2.2 Application of independent genetic instruments for gene network reconstruction
To study the impact of instrument selection on the reconstruction of causal networks we examined the distribution of local cis-eQTLs for each of the GR-regulated trans-genes that was found to regulate a network.Primary instruments were selected as the strongest cis-eQTL within a 1 Mb window of the associated gene, as determined by secondary linkage test posterior probability.However, the landscape of gene expression-linked genetic variation can involve several loci associated with the expression of the same gene to differing degrees.In addition to selecting a primary cis-eQTL as an instrument, alternate independent instruments were also identified.These were defined as the second strongest cis SNP-gene association which was not in LD with the primary instrument (R 2 < 0.5) (Figure S5).
Causal relationships in STARNET liver were defined by a GR-regulated network under the regulation of CPEB2 (FDR = 10%).The genetic instrument used to construct this network, rs62410848 (posterior probability = 0.90), is the strongest cis-eQTL for CPEB2, located less than 100 Kb upstream of the CPEB2 locus.An independent peak was identified 400 Kb upstream of CPEB2, represented by rs6847363 as the top cis-association in this region (posterior probability = 0.48).
As this independent instrument fell below the required threshold (posterior probability > 0.75), causal analysis was not carried out using rs6847363 as an instrument.
To determine the robustness of the primary instrument, we examined cis-associations with other genes within this locus (± 1 Mb).While CPEB2 was the strongest cis-eQTL association in this region, rs62410848 was also seen to be associated with CD38 (posterior probability = 0.85), a gene ∼800 Kb downstream of CPEB2.Although CD38 is not associated with any cortisol variants at the SERPINA6/SERPINA1 locus, it has been identified as being regulated by glucocorticoids in smooth muscle cells 12 and has been identified as a GR target in ENCODE.However, CD38 does not appear as a target of CPEB2, which suggests a low P5 score.This suggests that CPEB2 and CD38 are independently associated with rs62410848 and that CPEB2 is the true network regulator in this cis region.
In STARNET subcutaneous fat, the IRF2 sub-network was generated using the SNP rs34985265 (posterior probability = 0.94) located ∼500 Kb upstream of IRF2.The strongest independent cis-eQTL for IRF2, rs2171838 (posterior probability = 0.72), is located closer to IRF2, ∼300 Kb of the IRF2 TSS.This association did not reach the association threshold for use as a causal instrument (posterior probability = 0.72).Examining cis-associations between rs34985265 and all genes within 1 Mb of IRF2, IRF2 is the only gene to show an association with this SNP.
For RNF13 the primary instrument, rs9853321 (posterior probability = 0.81), was located in a peak 400 Kb upstream of the RNF13 transcription start site.The strongest independent cis-eQTL, rs62282739, is located nearly 1 Mb downstream of RNF13 and was too weak to be taken froward for causal analysis (posterior probability = 0.70).Cis-associations for rs9853321 in this region, include an association with the gene TM4SF1 (posterior probability = 0.93) at a higher level than the association with RNF13.There is some indication of a causal relationship between RNF13 and TM4S1 (posterior probability = 0.73), however TM4S1 is not a target of RNF13 at either a 10% or 15% global FDR threshold, suggesting that TM4S1 is independently associated with rs9853321.
The third subcutaneous fat sub-network was predicted using the SNP rs35571244 as a cis-eQTL for PBX2 (posterior probability = 0.93).This SNP is located ∼800 Kb downstream of the PBX2 transcription start site and is the strongest cis-eQTL for a peak of SNPs in this region.An alternate cis-eQTL, rs3128947 (posterior probability = 0.73), is located 500 Kb upstream of the PBX2 transcription start site.Again, this cis-eQTL was too weak to be taken forward for causal analysis.There are 31 cis-associations between rs35571244 and genes within a 1 Mb window of PBX2 at a 15% FDR threshold (posterior probability > 0.8), of which PBX2 is the 7th strongest association.Of these cis associations, 10 are causal targets of PBX2 when using rs35571244 as a genetic instrument at a 15% FDR threshold and 4 are targets at a 10% FDR threshold.This suggests that these genes are not independently linked to rs35571244, which raises the possibility that these targets would be predicted from other cis-genes and not just PBX2 specifically.
The primary instrument used to reconstruct the CD163 sub-network in visceral abdominal fat, rs7954905 (posterior probability = 0.86), is located less than 100 Kb downstream of CD163.
The strongest independent cis-eQTL, rs2377237 (posterior probability = 0.72), is located ∼500 Kb upstream of the CD163 transcription start site, however this SNP was below the threshold for use as a causal instrument.There were 6 cis-associations at a 15% FDR threshold (posterior probability > 0.78).One of these cis-genes is a target of CD163 (posterior probability = 0.86) at a 15% FDR threshold, but no genes are targets at a 10% FDR threshold.This target gene is CD163L1, which is a paralog of CD163 located downstream of of CD163.The peak represented by rs7954905 is located in the CD163L1 gene body.CD163L1 arose as a gene duplication of CD163 and colocalises with CD163 13 .The primary instrument used to reconstruct the LUC7L3 sub-network, rs6504682 (posterior probability = 0.8), is located within the LUC7L3 gene body.An independent cis-eQTL, rs2412130 (posterior probability = 0.7) is located in a peak ∼1000 Kb upstream of LUC7L3.Again, this alternate cis-eQTL did not meet the threshold for use as an instrument.There is only one other cis-gene associated with rs6504682, ANKRD40 (Findr score = 0.81), however this gene is not a target of LUC7L3 in either the 15% or 10% FDR causal networks in visceral adipose.Table S4: Cortisol associated trans-genes from STARNET-liver (FDR = 15%) with evidence of GR regulation.Transcription factor db includes ENCODE, TRANSFAC and CHEA transcription factor datasets.

Table S5:
Cortisol associated trans-genes from STARNET-visceral adipose fat (FDR = 15%) with evidence of GR regulation.Transcription factor db includes ENCODE, TRANSFAC and CHEA transcription factor datasets.ChIP-seq and Microarray fields are from Yu et al experiments in adipocytes 14 .* Indicates genes that have been identified as GR targets from both global TF binding and perturbation experiments.Direction of effect is estimated from the Pearson correlation coefficient of the gene expression level and cortisol associated genotype.
Table S6: Cortisol associated trans-genes from STARNET-subcutaneous fat (FDR = 15%) with evidence of GR regulation.Transcription factor db includes ENCODE, TRANSFAC and CHEA transcription factor datasets.ChIP-seq and Microarray fields are from Yu et al experiments in adipocytes 14 .Murine dex is from dexamethasone treated adrenalectomised mice 15 .Indicates genes that have been identified as GR targets from both global TF binding and perturbation experiments.Direction of effect is estimated from the Pearson correlation coefficient of the gene expression level and cortisol associated genotype.
Table S8: Functional enrichment of causal network targets using DAVID.Filtered to enrichment score > 1.

Table S12:
Transcription factor enrichment within gene network targets.Fisher's exact test for IRF2 targets from TRANSFAC predicted targets and GR targets from ENCODE.STARNET subcutaneous fat genes used as background for enrichment.

Figure S1 :
Figure S1: Distribution of RNA-seq read counts across all STARNET tissues.

Figure S2 :
Figure S2: Principal component analysis of gene expression samples across all STARNET tissues.

Figure S3 :
Figure S3: Correlations between randomly sampled genes from discovery and replication datasets both pre and post correction.

Figure S4 :
Figure S4: Instrument selection for causal analysis with Findr.Flowchart depicts identification of cis-eQTLs for use as genetic instruments for causal analysis with Findr.

Figure S5 :
Figure S5: Cis-eQTL discovery for network regulators.SNP-gene associations within a 1 Mb window of the associated gene calculated using the Findr secondary linkage test (P2) and presented as 1-findr score (-log 10 ) with LocusZoom.Lead cis-eQTL is primary instrument used for causal analysis.Red circle indicates independent (R 2 < 0.5) alternate instrument.

Figure S6 :
Figure S6: 10% FDR gene network in STARNET visceral abdominal fat driven by LUC7L3 and CD163.(A) Gene expression boxplot in STARNET visceral abdominal fat showing trans-association with cortisol-linked SNP rs20005945 and CD163 (B) and LUC7L3 (p-value obtained from Kruskal Wallis test statistic).(C) Causal gene network reconstructed from pairwise interactions from GR-regulated trans-genes against all other STARNET visceral abdominal fat genes.Edges represent Bayesian posterior probabilities of pairwise interaction between genes (nodes) exceeding 10% global FDR.Arrow indicates direction of regulation and interactions were only retained where parent node had at least 4 targets.(D) LocusZoom plot showing cis-eQTLs for CD163 and LUC7L3, with lead SNP used as instrumental variable indicated in purple.Significance of association is indicated on the y axis as -log10(1-fdr), where fdr represents the local false discovery rate as estimated by Findr.(E) Correlations between network targets in discovery vs replication datasets for LUC7L3 and (F) CD163.Kruskal Wallis test calculated for distribution of correlations between network targets compared to correlations within random gene set of same size.

Table S13 :
Summary of tissue specific gene expression data from STARNET.