Multiple sclerosis genetic and non-genetic factors interact through the transient transcriptome


MS-associated GWAS signals colocalize with regulatory regions of DNA plausibly coding for trRNAs

We set up our region-of-interest (ROI) inside GWAS catalogue32 by considering all MS GWAS that were published, extracting all SNP positions, and creating a single set of genomic coordinates that therefore encompass all GWAS-derived or GWAS-verified signal for MS. We then refined the SNP list by pruning out about 1.5% of the SNPs as they did not contain intelligible genomic annotations or were duplicates. The final ROI list is reported in Additional File: Table S1 and consists of 603 unique single-nucleotide regions; to provide a “threshold” against which the match ROI <  > Database would be benchmarked, we used 107,423 regions as Universe, that corresponded to the signals coming from the entire GWAS Catalog.

Next, we matched through colocalization analyses our ROI with lists of regions resulting from the work by Michel et al., which mapped the transient and stable transcriptome captured by TT-seq after T cell stimulation24. We found a significant enrichment of MS-associated genetic variants in the transient transcriptome (p-value = 2.80 × 10−9; Table 1). Of note, when we split the transcriptome list in two subsets for long (≥ 60 min) and short (< 60 min) half-life, we found that only the short half-life subset significantly colocalized with the ROI (p-value 2.06 × 10−8 vs. 0.09). This finding was indicative of the relationship between MS-associated GWAS signals and the regulatory regions of DNA coding for trRNAs.

Table 1 Enrichment of MS-associated genetic variants in lists of T-cell transient transcripts extracted from Michel et al., 2017.

When we further dissected the mapping of the ROI colocalization signals, we found a significant excess of intergenic and intron regions (as anticipated), as well as their prevalent distribution away from the transcription start site (TSS; Figure Supplement 1A). Notably, when we extended this analysis to GWAS data coming from other multifactorial diseases or traits, dividing immune-mediated and other complex conditions, we found highly comparable profiles (supplementary Fig. S1B–C, Additional File: Table S2), suggesting that the colocalization between MS-associated DNA intervals and intergenic or intronic sequences, plausibly referring to trRNA coding regions, is shared by the genetic architecture of most multifactorial disorders.

To consolidate this result and gain a deeper biological insight, we extended the colocalization analysis matching the ROI with a vast set of databases of regulatory DNA regions, including enhancers and super-enhancers, derived from experiments on diverse tissue types (a total of 4,697,782 DNA regions, plausibly coding for trRNA, were extracted from a wide variety of raw data sources; referenced in Additional File: Table S3). To improve interpretability of the results through ranking, we implemented a harmonic score (HS), based on the Odd Ratio, the − log (p-value), and the support of each match. Statistically significant results came from sets included in SEA, seDB, dbSuper and other single lists of enhancers and non-coding RNAs derived from experiments on diverse tissue types, not necessarily belonging to the immune cells lineages (Fig. 1, black dots; Additional File: Table S4). On another hand, we found a strong enrichment of MS-associated genetic variants in cell lines of hematopoietic lineage, including CD19 + and CD20 + B lymphocytes, CD4 + T helper cells, and CD14 + monocytes. This is in line with the GWAS data and the known immunopathogenesis of the disease, as well as with the fact that we consider a TT collection coming from a lymphoid line for the co-localization analysis. Moreover, among the significant hits, we found collections coming from brain resident cell, in particular microglial-specific enhancers, which is in line with recent reports on brain cell type-specific enhancer-promoter interactome activities, and the latest GWAS on MS genomic mapping33,34. Non-relevant tissues serving as controls (such as kidney, muscle, glands, etc.) scored low in the ranking, crowding the bottom-left corner of Fig. 1 (grey dots; see also Additional File: Table S4).

Figure 1
figure 1

Enrichment of MS-associated SNPs in databases of regulatory elements, sorted by experiment/cell lines. X-axis shows the Odd Ratio, y-axis shows the − log (pValue); dot size is proportional to to the Harmonic Score (HS), a comprehensive estimation of the relevance of hits, as derived by merging and balancing the OR, pValue and Support ( i.e., the number of hits resulting from the colocalization analysis) of each match. Thus, prioritized hits are represented by dots that occupy the upper-right area of the chart. Dots are coloured by cell type. Labeled points have HS > 50. Labels were arbitrarily designated according to the database of origin and the cell lineage where the enrichment occurred. Supplementary Table S4displays in detail the results that generated this plot.

Genetic and non-genetic factors for MS etiology converge in genomic regions plausibly coding for the transient transcriptome

Independent studies support the fact that MS GWAS intervals are enriched with DNA binding regions (DBRs) for protein ‘transducers’ mediating non-genetic factors of putative etiologic relevance in MS, such as vitamin D deficiency or EBV latent infection17,19. Therefore, we further inquired whether DNA regions plausibly coding for trRNA would share these features (i.e., they colocalize with such DRBs). We set up 4 new ROIs corresponding to the DBRs for VDR, activation-induced cytidine deaminase (AID), EBNA2, and Epstein Barr nuclear antigen 3 (EBNA3C), chosen among viral or host’s nuclear factors potentially associated to MS etiopathogenesis35,36,37. The DBRs for each nuclear factor were derived from recent literature (Additional File: Table S5) and matched with the GWAS-derived MS signals to confirm and expand previous results. We found statistically significant results for VDR, EBNA2, and AID for all the SNP position extensions (± 50, 100, 200 kb up- and down-stream), while for EBNA3C significant results came out at extension of ± 100 and 200 kilobases. This finding suggests that several DBRs can impact on the MS-associated DNA intervals through colocalization (Table 2).

Table 2 Enrichment of MS-GWAS regions (at ± 50,100,200 kb range of extension) in lists (number in brackets in the right-most column) of DNA binding sites of human and viral molecular transducers; significant results (p < 0.05, corresponding to a − log (p) > 1.301) in bold.

Building once again on the work by Michel et al.25, we inquired whether there was a colocalization between genomic regions containing MS-associated variants, DBRs for VDR, EBNA2, EBNA3C, AID, and DNA intervals plausibly coding for trRNA. To this end, we considered the transient transcriptome that proved to be enriched with MS-associated variants (Table 1), and we then matched the corresponding coding regions with the DBRs for the four molecular transducers. For this analysis DBRs for EBNA2 (6880 regions), EBNA3C (3835 regions), AID (4823 regions), and VDR (23,409 regions), represented the ROI, while the ENCODE database of Transcription Factors Binding Sites served as Universe (13,202,334 regions; Fig. 2a). We report the results of this analysis in Table 3, which shows the significant colocalization of DNA regions plausibly coding for trRNAs with both MS-relevant GWAS signals, and DBR of 3 out of 4 factors active at nuclear level, and potentially associated with MS. The DBR for EBNA3C did not reach statistical significance, though it showed higher values of support for short half-life transcripts.

Figure 2
figure 2

Colocalization analysis of DBRs for human and viral molecular transducers, MS-associated SNPs and DNA regulatory regions derived from databases. (A) Schematic representation of the colocalization analysis. (ROI region of interest, DBR DNA binding regions, ENCODE TFBS transcription factor binding site). The figure shows the tracks we considered for the colocalization analyses. In brief, the ROI included the DBRs of MS-related viral and human transducers and was matched with MS-associated SNPs extended by 50, 100, and 200 kilobases that colocalize with regions plausibly coding for trRNAs (Database). As a control (Universe), we took from ENCODE the entire list of transcription factors binding sites. Results were considered significant if a colocalization was found across ROI and a Databases element without occurring in the Universe as a statistically significant match. (BE), Colocalization results of EBNA2, EBNA3C, AID, VDR. The charts display results of all matches, i.e., with MS-associated SNPs and their extension at ± 50, 100, 200 kb. X-axis shows the Odd Ratio, y-axis shows the − log (pValue); dot size is proportional to to the Harmonic Score (HS). Thus, prioritized hits are represented by dots that occupy the upper-right area of the chart. Dots are coloured by cell type. Top-scoring hits in each subgroup are labeled; labels were arbitrarily designated according to the database of origin and the cell lineage where the enrichment occurred.

Table 3 Colocalization of human and viral transducer DBRs and MS-GWAS positions (at ± 50,100,200 kb range of extension) in DNA regions coding for transient transcripts; significant results (p < 0.05, corresponding to a − log (p) > 1.301) in bold.

To review and confirm previous colocalizations, we considered the genomic regions resulting from the above reported match between the MS-associated GWAS intervals and the databases of regulatory DNA regions, containing enhancers and super-enhancers, plausibly enriched in trRNA-coding sequences (Fig. 2 and the online data resource). We therefore matched these DNA regions with the DBR for VDR, EBNA2, EBNA3C and AID, finding significant enrichments that allow to contextualize and prioritize genomic positions, cell/tissue identity or cell status associated to MS. Considering the harmonic score obtained from these colocalization analyses, the top hits in EBNA2, EBNA3C, and AID involved lymphoid (CD19 + B cell lines and lymphomas; T regulatory cells; tonsils) and monocyte-macrophage lineages (peripheral macrophages; dendritic cells) from experiments included in the ENCODE, dbsuper, roadmapEpigenomics databases; however, also global collections of superenhancers/enhancers and brain resident lineages appeared far from the bottom-left corner of Fig. 2 (the control datasets) (Fig. 2A–C, see also Additional File: Table S6 and the online resource). Even though immune cells prevailed also in VDR top hits, a less stringent polarization was seen, somehow reflecting the wide-spreading actions of this transducer in human biology (Fig. 2D). However, with a more stringent cutoff of Harmonic Score > 40 that selects the most significant hits (Fig. Supplement 2), a core subset of MS-relevant cell lineages, shared across all four examined transducers, became evident (Additional File: Table S7).

A data resource for future research on transcriptional regulation in MS

A public web interface for browsing the results of our colocalization analysis is freely available at This is a comprehensive genomic atlas disentangling specific aspects of MS gene-environment interactions to support further research on transcriptional regulation in MS. It includes the whole list of results derived from ROI, DBRs and database matches (Fig. 3a) across all performed experiments that yielded significant results. The user can navigate across the results and perform tailored queries searching and filtering for a variety of parameters, including MS-associated variant, DBR, experimental cell type, other match details (see Fig. 3b for all available search and filter modalities). Moreover, personalized HS, p-value, support and Odd Ratio threshold can easily be set to screen results, that are readily displayed in tabular format. To provide an example, we select “AID, EBNA2, EBNA3C, VDR” in the ‘Matched DBR region (s)’ panel and obtain the list of MS-associated SNPs (that proved to be enriched in genomic regions plausibly coding for trRNA) targeted by all four transducers (Fig. 3b,c). Through this approach we searched for MS-associated regions shared by the DBRs analyzed, and we were able to prioritize 275 genomic regions (almost half of the MS-associated GWAS SNPs) capable of binding at least 2 molecular transducers. These regions are ‘hotspots’ of interactions between genetic and nongenetic modifier of MS risk/protection: all four proteins (VDR, AID, EBNA2, EBNA3C) proved to target 24 regions, 3 of them 115 regions, and 2 of them 136 regions. A detailed legend and more example queries may be found on the online data resource website.

Figure 3
figure 3

A comprehensive genomic atlas on gene-environment interactions regulating transcription in MS. (a) Searchable results at derive from the matches of GWAS MS regions, DNA binding regions of selected genomic transducers, and more than 4 million of regions annotated as plausible transient RNAs. (b) The user interface includes text panels and range sliders allowing extremely personalized queries, that combine statistical significance level (including Odd Ratio, pValue, support, and Harmonic Score), study source, SNP or reported gene, and so on. Filtered results are shown as tables ranked by HS, that can be saved, printed or shared through URL. In the example, the cursor selects ‘AID, EBNA2, EBNA3C, VDR’ in the ‘matched DBR region (s)’ panel looking for MS-associated SNPs (from the ROI, Additional File: Table S1) and their extensions at ± 50, 100, 200 kb that colocalized within DNA binding regions of the molecular transducers. The top hit represents the colocalization of the DBRs, a super-enhancer region derived from experiments on CD19 + B cells included in sedb, and the rs8007846 MS-associated SNP on chromosome 14. (c) The Venn diagram shows the number of non-redundant MS-associated SNPs derived from the query: for each transducer, SNPs are considered only once if present in more than one match. Intersections show the numbers of regions colocalizing with DBRs of multiple transducers. For instance: 8 regions colocalize with both EBNA2 and EBNA3C DBRs, but not with AID nor VDR DBRs; 24 regions colocalize with all four DBRs, and could be identified as regulatory “hotspots” in MS.

Finally, to obtain a functional mapping of MS-TrRNA regions, we attempted to identify MS-relevant genes by integrating our results with the ‘activity-by-contact’ (ABC) model (Fulco et al., 2019; Nasser et al., 2021), which was recently developed to define cell-specific, gene-enhancers connections according to chromatin conformation and accessibility, as well as to histone acethylation-methylation status. We retrieved a total of 77 gene-enhancers pairings (Additional File: Table S8), enriched in IL6-JAK-STAT3, IL-18, IL2RB pathways. Among these, we focused on MS variants-trRNA colocalization hotspots targeted by all four (AID, EBNA2, EBNA3C, VDR, n = 24) or three (AID, EBNA2, VDR, n = 60; see also Fig. 3c) molecular transducers, excluding EBNA3C, as it did not reach statistical significance in previous analysis (Table 3): ABC gene-enhancers connections were found for for 10 out of 84 hotspot SNPs, corresponding to 31 genes (Table 4 and Fig. 4). As expected from the pleiotropy of enhancer activity, many MS-trRNA hotspots were linked to multiple genes differentially regulated in distinct cell types: for example, the MS-trRNA hotspot in rs11026091 was linked to MRGPRE in T cells and MRGPRG-AS1 in B cells (see also Additional File: Table S9). Results included regulators of immune cell activity (MAP3K8, GIMAP8, TMEM176A, TMEM176B), ion channels and solute carriers (KCNH2, KCNMA1, SLC25A42), and transcriptional modulators (ICE2, SIN3B, NWD1).

Table 4 Activity-By-Contact (ABC) functional mapping of MS-trRNA hotspots bound by 4/4 MS-relevant transducers or only 3/4 (AID,EBNA2, VDR).
Figure 4
figure 4

ABC gene-enhancer mapping of MS-trRNAs regulatory hotspots. The figure displays the positional mapping of selected MS-trRNAs-DBRs hotspots, their colocalization with regions encoding for regulatory elements active in specific cell types (Mononuclear phagocytes, B cells, T cells, Other haematopoietic cells, epithelial cells, Other), and the enhancer-gene connections that determine the ABC mapping. The gene with the highest ABC score for each hotspot is highlighted in pink.

Moreover, in most cases, the ABC-identified genes differed from the candidate genes reported in MS GWAS, underscoring the relevance of integrative approaches to annotate statistical genomic associations.


Source link

Leave a Comment

Your email address will not be published.

%d bloggers like this: