Multiple sclerosis (MS) is one of the most common autoimmune diseases and is typically diagnosed and monitored using magnetic resonance imaging (MRI) in combination with clinical manifestations. The purpose of this review is to highlight the main applications of machine learning (ML) models in the MS field using MRI, together with their performance. We reviewed articles from the last decade and grouped them by application of ML to MRI data in MS into four categories: 1) Automated diagnosis of MS, 2) Prediction of MS disease progression, 3) Differentiation of MS stages, 4) Differentiation of MS from similar disorders.
Background: 16S sequencing results are often used for Machine Learning (ML) tasks. 16S gene sequences are represented as feature counts, which are associated with taxonomic representation. Raw feature counts may not be the optimal representation for ML.
Methods: We checked multiple preprocessing steps and tested the optimal combination for 16S sequencing-based classification tasks. We computed the contribution of each step to the accuracy as measured by the Area Under Curve (AUC) of the classification.
Results: We show that the log of the feature counts is much more informative than the relative counts. We further show that merging features associated with the same taxonomy at a given level, through a dimension reduction step for each group of bacteria, improves the AUC. Finally, we show that z-scoring has a very limited effect on the results.
Conclusions: The preprocessing of microbiome 16S data is crucial for optimal microbiome-based Machine Learning. These preprocessing steps are integrated into the MIPMLP - Microbiome Preprocessing Machine Learning Pipeline, which is available as a stand-alone version at: https://github.com/louzounlab/microbiome/tree/master/Preprocess or as a service at http://mip-mlp.math.biu.ac.il/Home. Both contain the code and standard test sets.
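The three preprocessing steps named in the Results (log transform, merging features that share a taxonomy at a given level, z-scoring) can be illustrated with a minimal sketch. This is not the MIPMLP code: the function names, the pseudocount `epsilon`, the order of the steps, and the semicolon-separated taxonomy strings are all assumptions made for illustration, and a plain mean stands in for the per-group dimension reduction step described in the abstract.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def log_transform(counts, epsilon=0.1):
    """Return log10(count + epsilon) per feature; epsilon is an assumed pseudocount."""
    return [[math.log10(c + epsilon) for c in row] for row in counts]


def merge_by_taxonomy(table, taxonomy, level):
    """Merge columns whose taxonomy agrees on the first `level` ranks.

    A mean is used here as a simple stand-in for a dimension reduction step.
    """
    groups = defaultdict(list)
    for col, tax in enumerate(taxonomy):
        key = ";".join(tax.split(";")[:level])
        groups[key].append(col)
    names = sorted(groups)
    merged = [[mean(row[c] for c in groups[name]) for name in names] for row in table]
    return names, merged


def z_score_columns(table):
    """Scale each feature (column) to zero mean and unit variance across samples."""
    scaled_columns = []
    for col in zip(*table):
        mu, sigma = mean(col), pstdev(col)
        scaled_columns.append([(v - mu) / sigma if sigma > 0 else 0.0 for v in col])
    return [list(row) for row in zip(*scaled_columns)]


# Hypothetical toy table: 3 samples (rows) x 3 features (columns).
counts = [[10, 0, 5], [100, 20, 0], [1, 3, 50]]
taxonomy = [
    "k__Bacteria;p__Firmicutes;c__Clostridia",
    "k__Bacteria;p__Firmicutes;c__Bacilli",
    "k__Bacteria;p__Bacteroidetes;c__Bacteroidia",
]
names, merged = merge_by_taxonomy(log_transform(counts), taxonomy, level=2)
scaled = z_score_columns(merged)
```

With `level=2` the two Firmicutes features collapse into one phylum-level column, so the classifier would see two features instead of three.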
There is an urgent need to identify biomarkers for diagnosis and disease activity monitoring in rheumatoid arthritis (RA). We leveraged publicly available microarray gene expression data in the NCBI GEO database for whole blood (N=1,885) and synovial (N=284) tissues from RA patients and healthy controls. We developed a robust machine learning feature selection pipeline with validation on five independent datasets culminating in 13 genes: TNFAIP6, S100A8, TNFSF10, DRAM1, LY96, QPCT, KYNU, ENTPD1, CLIC1, ATP6V0E1, HSP90AB1, NCL and CIRBP which define the RA score and demonstrate its clinical utility: the score tracks the disease activity DAS28 (p = 7e-9), distinguishes osteoarthritis (OA) from RA (OR 0.57, p = 8e-10) and polyJIA from healthy controls (OR 1.15, p = 2e-4) and monitors treatment effect in RA (p = 2e-4). Finally, the immunoblotting analysis of six proteins on an independent cohort confirmed two proteins, TNFAIP6/TSG6 and HSP90AB1/HSP90.
Purpose: To explore alterations in the intrinsic functional connectivity (FC) of the primary visual cortex (V1) in individuals with iridocyclitis compared with healthy controls (HCs) using resting-state functional magnetic resonance imaging (fMRI), and to investigate whether FC findings can be used to differentiate patients with iridocyclitis from HCs.
Methods: Twenty-six patients with iridocyclitis and twenty-eight well-matched HCs were recruited in our study and underwent resting-state fMRI examinations. The fMRI data were analyzed with Statistical Parametric Mapping (SPM12), Data Processing and Analysis for Brain Imaging (DPABI), and Resting State fMRI Data Analysis Toolkit (REST) software. FC signal values of the V1 were compared between the individuals with iridocyclitis and HCs using independent two-sample t-tests. FC values that differed significantly between the two groups were chosen as classification features for distinguishing individuals with iridocyclitis from HCs using a support vector machine (SVM) classifier. Classifier performance was evaluated using permutation test analysis.
Results: Compared with HCs, patients with iridocyclitis displayed significantly increased FC between the left V1 and left cerebellum crus1, left cerebellum 10, bilateral inferior temporal gyrus, right hippocampus, and left superior occipital gyrus. Moreover, patients with iridocyclitis displayed significantly lower FC between the left V1 and both the bilateral calcarine and bilateral postcentral gyrus. Patients with iridocyclitis also exhibited significantly higher FC values between the right V1 and left cerebellum crus1, bilateral thalamus, and left middle temporal gyrus; while they displayed significantly lower FC between the right V1 and both the bilateral calcarine and bilateral postcentral gyrus (voxel-level P<0.01, Gaussian random field correction, cluster-level P<0.05). Our results showed that 63.46% of the participants were correctly classified using the leave-one-out cross-validation technique with an SVM classifier based on the FC of the left V1; and 67.31% of the participants were correctly classified based on the FC of the right V1 (P<0.001, non-parametric permutation test).
Conclusion: Patients with iridocyclitis displayed significantly disturbed FC between the V1 and various brain regions, including vision-related, somatosensory, and cognition-related regions. The FC variability could distinguish patients with iridocyclitis from HCs with substantial accuracy. These findings may aid in identifying the potential neurological mechanisms of impaired visual function in individuals with iridocyclitis.
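The evaluation scheme described in this abstract, leave-one-out cross-validation followed by a non-parametric permutation test, can be sketched as follows. This is an illustrative sketch only: to stay dependency-free it swaps in a nearest-centroid classifier in place of the SVM used in the study, and all function names, the number of permutations, and the toy data are assumptions.

```python
import random


def nearest_centroid_predict(train_x, train_y, sample):
    """Assign `sample` to the class whose mean feature vector is closest.

    Stand-in for the SVM of the study; chosen only to avoid dependencies.
    """
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        members = [x for x, y in zip(train_x, train_y) if y == label]
        centroid = [sum(col) / len(members) for col in zip(*members)]
        dist = sum((a - b) ** 2 for a, b in zip(sample, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_accuracy(features, labels):
    """Hold out each participant once, train on the rest, score the held-out one."""
    hits = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if nearest_centroid_predict(train_x, train_y, features[i]) == labels[i]:
            hits += 1
    return hits / len(features)


def permutation_p_value(features, labels, n_permutations=200, seed=0):
    """Estimate how often shuffled labels reach the observed accuracy."""
    rng = random.Random(seed)
    observed = loocv_accuracy(features, labels)
    shuffled = list(labels)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if loocv_accuracy(features, shuffled) >= observed:
            at_least_as_good += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (at_least_as_good + 1) / (n_permutations + 1)


# Hypothetical one-feature FC values for six participants (0 = HC, 1 = patient).
fc_values = [[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]]
groups = [0, 0, 0, 1, 1, 1]
accuracy, p_value = permutation_p_value(fc_values, groups)
```

The permutation test asks how often randomly relabelled participants would be classified as well as the real ones, which is how an accuracy can be judged against chance.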
Systemic lupus erythematosus (SLE) is a chronic autoimmune disease characterized by the production of autoantibodies predominantly to nuclear material. Many aspects of disease pathology are mediated by the deposition of nucleic acid-containing immune complexes, which also induce the type 1 interferon response, a characteristic feature of SLE. Notably, SLE is remarkably heterogeneous, with a variety of organs involved in different individuals, who also show variation in disease severity related to their ancestries. Here, we probed one potential contribution to disease heterogeneity as well as a possible source of immunoreactive nucleic acids by exploring the expression of human endogenous retroviruses (HERVs). We investigated the expression of HERVs in SLE and their potential relationship to SLE features and the expression of biochemical pathways, including the interferon gene signature (IGS). Towards this goal, we analyzed available and new RNA-Seq data from two independent whole blood studies using Telescope. We identified 481 locus-specific HERV-encoding regions that are differentially expressed between case and control individuals, with only 14% overlap of differentially expressed HERVs between these two datasets. We identified significant differences between differentially expressed HERVs and non-differentially expressed HERVs between the two datasets. We also characterized the host differentially expressed genes and tested their association with the differentially expressed HERVs. We found that differentially expressed HERVs were significantly more physically proximal to host differentially expressed genes than non-differentially expressed HERVs. Finally, we capitalized on the locus-specific resolution of HERV mapping to identify key molecular pathways impacted by differential HERV expression in people with SLE.
Cystatin F encoded by CST7 is a cysteine peptidase inhibitor known to be expressed in natural killer (NK) and CD8+ T cells during steady-state conditions. However, little is known about its expression during inflammatory disease states in humans. We have developed an analytic approach capable of not only identifying previously poorly characterized disease-associated genes but also defining regulatory mechanisms controlling their expression. By exploring multiple cohorts of public transcriptome data comprising 43 individual datasets, we showed that CST7 is upregulated in the blood during a diverse set of infectious and non-infectious inflammatory conditions. Interestingly, this upregulation of CST7 was neutrophil-specific, as its expression was unchanged in NK and CD8+ T cells during sepsis. Further analysis demonstrated that known microbial products or cytokines commonly associated with inflammation failed to increase CST7 expression, suggesting that its expression in neutrophils is induced by an endogenous serum factor commonly present in human inflammatory conditions. Overall, through the identification of CST7 upregulation as a marker of acute inflammation in humans, our study demonstrates the value of publicly available transcriptome data in knowledge generation and potential biomarker discovery.
Celiac disease (CeD) is a common autoimmune disorder caused by an abnormal immune response to dietary gluten proteins. The disease has high heritability. HLA is the major susceptibility factor, and the HLA effect is mediated via presentation of deamidated gluten peptides by disease-associated HLA-DQ variants to CD4+ T cells. In addition to gluten-specific CD4+ T cells the patients have antibodies to transglutaminase 2 (autoantigen) and deamidated gluten peptides. These disease-specific antibodies recognize defined epitopes and they display common usage of specific heavy and light chains across patients. Interactions between T cells and B cells are likely central in the pathogenesis, but how the repertoires of naïve T and B cells relate to the pathogenic effector cells is unexplored. To this end, we applied machine learning classification models to naïve B cell receptor (BCR) repertoires from CeD patients and healthy controls. Strikingly, we obtained a promising classification performance with an F1 score of 85%. Clusters of heavy and light chain sequences were inferred and used as features for the model, and signatures associated with the disease were then characterized. These signatures included amino acid (AA) 3-mers with distinct bio-physiochemical characteristics and enriched V and J genes. We found that CeD-associated clusters can be identified and that common motifs can be characterized from naïve BCR repertoires. The results may indicate a genetic influence by BCR encoding genes in CeD. Analysis of naïve BCRs as presented here may become an important part of assessing the risk of individuals to develop CeD. Our model demonstrates the potential of using BCR repertoires and in particular, naïve BCR repertoires, as disease susceptibility markers.
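Two quantities mentioned in this abstract, amino acid 3-mers and the F1 score, can be made concrete with a short sketch. It is not the authors' pipeline: the function names, the example sequences, and the toy labels are hypothetical, and the sketch only shows how overlapping k-mers are counted and how F1 combines precision and recall.

```python
from collections import Counter


def kmer_counts(sequence, k=3):
    """Count overlapping amino acid k-mers in one receptor sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def f1_score(true_labels, predicted_labels, positive=1):
    """Harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical CDR3-like fragment; real input would be full BCR sequences.
motifs = kmer_counts("CARDYW")

# Hypothetical labels: 1 = CeD patient, 0 = healthy control.
score = f1_score([1, 1, 0, 0], [1, 0, 1, 0])
```

Counting 3-mers per sequence cluster gives fixed-length features that a classifier can use, and F1 summarizes its performance in a single number when both false positives and false negatives matter.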
Although widely prevalent, Lyme disease is still under-diagnosed and misunderstood. Here we followed 73 acute Lyme disease patients and uninfected controls over a period of a year. At each visit, RNA-sequencing was applied to profile patients' peripheral blood mononuclear cells in addition to extensive clinical phenotyping. Based on the projection of the RNA-seq data into lower dimensions, we observe that the cases are separated from controls, and almost all cases never return to cluster with the controls over time. Enrichment analysis of the differentially expressed genes between clusters identifies up-regulation of immune response genes. This observation is also supported by deconvolution analysis to identify the changes in cell type composition due to Lyme disease infection. Importantly, we developed several machine learning classifiers that attempt to perform various Lyme disease classifications. We show that Lyme patients can be distinguished from the controls as well as from COVID-19 patients, but classification was not successful in distinguishing those patients with early Lyme disease cases that would advance to develop post-treatment persistent symptoms.
Within the last decade, numerous studies have demonstrated changes in the gut microbiome associated with specific autoimmune diseases. Due to differences in study design, data quality control, analysis and statistical methods, many results of these studies are inconsistent and incomparable. To better understand the relationship between the intestinal microbiome and autoimmunity, we have completed a comprehensive re-analysis of 42 studies focusing on the gut microbiome in 12 autoimmune diseases to identify a microbial signature predictive of multiple sclerosis (MS), inflammatory bowel disease (IBD), rheumatoid arthritis (RA) and general autoimmune disease using both 16S rRNA sequencing data and shotgun metagenomics data. To do this, we used four machine learning algorithms (random forest, eXtreme Gradient Boosting (XGBoost), ridge regression, and support vector machine with radial kernel and recursive feature elimination) to rank disease-predictive taxa, comparing disease vs. healthy participants and performing pairwise comparisons of each disease. Comparing the performance of these models, we found that the two tree-based methods, XGBoost and random forest, which are the most capable of handling sparse multidimensional data, consistently produced the best results. Through this modeling, we found a number of taxa consistently dysregulated in a general autoimmune disease model, including Odoribacter, Lachnospiraceae Clostridium, and Mogibacteriaceae, implicating all as potential factors connecting the gut microbiome to autoimmune response. Further, we computed pairwise comparison models to identify disease-specific taxa signatures, highlighting a role for Peptostreptococcaceae and Ruminococcaceae Gemmiger in IBD and Akkermansia, Butyricicoccus, and Mogibacteriaceae in MS. We then connected a subset of these taxa with potential metabolic alterations based on metagenomic/metabolomic correlation analysis, identifying 215 metabolites associated with autoimmunity-predictive taxa.