<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Artif. Intell.</journal-id>
<journal-title>Frontiers in Artificial Intelligence</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Artif. Intell.</abbrev-journal-title>
<issn pub-type="epub">2624-8212</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/frai.2023.1237542</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Artificial Intelligence</subject>
<subj-group>
<subject>Technology and Code</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Application note: TDbasedUFE and TDbasedUFEadv: bioconductor packages to perform tensor decomposition based unsupervised feature extraction</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Taguchi</surname> <given-names>Y-h.</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/80793/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Turki</surname> <given-names>Turki</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/795479/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>Department of Physics, Chuo University</institution>, <addr-line>Tokyo</addr-line>, <country>Japan</country></aff>
<aff id="aff2"><sup>2</sup><institution>Department of Computer Sciences, King Abdulaziz University</institution>, <addr-line>Jeddah</addr-line>, <country>Saudi Arabia</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Enrico Capobianco, Jackson Laboratory, United States</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Varij Nayan, Central Institute for Research on Buffaloes (ICAR), India; Shigao Huang, Fourth Military Medical University, China</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Y-h. Taguchi <email>tag&#x00040;granular.com</email></corresp>
</author-notes>
<pub-date pub-type="epub">
<day>01</day>
<month>09</month>
<year>2023</year>
</pub-date>
<pub-date pub-type="collection">
<year>2023</year>
</pub-date>
<volume>6</volume>
<elocation-id>1237542</elocation-id>
<history>
<date date-type="received">
<day>22</day>
<month>06</month>
<year>2023</year>
</date>
<date date-type="accepted">
<day>07</day>
<month>08</month>
<year>2023</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2023 Taguchi and Turki.</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>Taguchi and Turki</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract>
<sec>
<title>Motivation</title>
<p>Tensor decomposition (TD)-based unsupervised feature extraction (FE) has proven effective for a wide range of bioinformatics applications, from biomarker identification to the identification of disease-causing genes and drug repositioning. However, TD-based unsupervised FE has failed to gain widespread acceptance owing to the lack of user-friendly tools for non-experts.</p>
</sec>
<sec>
<title>Results</title>
<p>We developed two Bioconductor packages, TDbasedUFE and TDbasedUFEadv, that enable researchers unfamiliar with TD to utilize TD-based unsupervised FE. The packages facilitate the identification of differentially expressed genes and multiomics analysis. TDbasedUFE was found to outperform two state-of-the-art methods, DESeq2 and DIABLO.</p>
</sec>
<sec>
<title>Availability and implementation</title>
<p>TDbasedUFE and TDbasedUFEadv are freely available as R/Bioconductor packages, which can be accessed at <ext-link ext-link-type="uri" xlink:href="https://bioconductor.org/packages/TDbasedUFE">https://bioconductor.org/packages/TDbasedUFE</ext-link> and <ext-link ext-link-type="uri" xlink:href="https://bioconductor.org/packages/TDbasedUFEadv">https://bioconductor.org/packages/TDbasedUFEadv</ext-link>, respectively.</p>
</sec></abstract>
<kwd-group>
<kwd>tensor decomposition</kwd>
<kwd>feature selection</kwd>
<kwd>unsupervised learning</kwd>
<kwd>gene expression</kwd>
<kwd>multiomics</kwd>
</kwd-group>
<counts>
<fig-count count="1"/>
<table-count count="0"/>
<equation-count count="6"/>
<ref-count count="19"/>
<page-count count="5"/>
<word-count count="3165"/>
</counts>
<custom-meta-wrap>
<custom-meta>
<meta-name>section-at-acceptance</meta-name>
<meta-value>Medicine and Public Health</meta-value>
</custom-meta>
</custom-meta-wrap>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>1. Introduction</title>
<p>Tensor decomposition (TD)-based unsupervised feature extraction (FE) has been successfully applied to a wide range of problems (Taguchi, <xref ref-type="bibr" rid="B9">2020</xref>) since it was introduced several years ago (Taguchi, <xref ref-type="bibr" rid="B8">2017</xref>). Despite its success, the method has failed to gain widespread acceptance, possibly due to the lack of practical tools to perform TD. To this end, we have developed two Bioconductor packages, TDbasedUFE and TDbasedUFEadv, which allow researchers to perform TD-based unsupervised FE easily without detailed knowledge of TD. The purpose of this manuscript is not to demonstrate superiority over other methods, which has already been demonstrated in numerous studies cited below, but simply to report the implementation of the established method in an easy-to-use environment.</p>
</sec>
<sec sec-type="methods" id="s2">
<title>2. Methods</title>
<p>TD-based unsupervised FE (Taguchi, <xref ref-type="bibr" rid="B8">2017</xref>) was derived from principal component analysis (PCA)-based unsupervised FE (Taguchi and Murakami, <xref ref-type="bibr" rid="B10">2013</xref>), which was introduced 10 years ago. As datasets grew in complexity and began to include multiple measurement conditions, such as comparisons of multiple tissues from human subjects rather than a single tissue from patients, tensors were employed instead of matrices. Tensors, which can have more than two indices, each of which can represent a separate comparison criterion, better accommodate complex data structures. For example, a three-mode tensor <italic>x</italic><sub><italic>ijk</italic></sub> can naturally store the expression of the <italic>i</italic>th gene in the <italic>k</italic>th tissue of the <italic>j</italic>th human subject. In contrast, matrices, which have only two indices corresponding to rows and columns, require the tissue index and the subject index to be combined into a single column index, which makes the data difficult to interpret.</p>
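<p>The difference between the two layouts can be illustrated with a minimal sketch. It is written in Python with NumPy purely for illustration and is not the R interface of the packages; the sizes and variable names are hypothetical, and the data are random numbers.</p>
<preformat><![CDATA[
```python
import numpy as np

# Toy sizes (hypothetical): N genes, M subjects, K tissues.
N, M, K = 5, 3, 2
rng = np.random.default_rng(0)

# Three-mode tensor: x[i, j, k] = expression of gene i in tissue k of subject j.
x = rng.normal(size=(N, M, K))

# Matrix layout: subject and tissue are merged into one column index (j, k).
x_unfolded = x.reshape(N, M * K)

# With NumPy's default (row-major) reshape, column j * K + k holds
# subject j, tissue k, so the pairing must be tracked by hand.
j, k = 2, 1
same = np.array_equal(x_unfolded[:, j * K + k], x[:, j, k])
print(x.shape, x_unfolded.shape, same)
```
]]></preformat>
<p>In the tensor layout the subject and the tissue remain separate axes, whereas in the matrix layout their correspondence to columns has to be reconstructed from the column position.</p>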
<p>TDbasedUFE and TDbasedUFEadv are user-friendly packages that allow individuals who are unfamiliar with tensors to perform unsupervised feature extraction. Since a matrix can be considered a two-mode tensor, these packages can also be used to apply PCA-based unsupervised FE to a dataset. TDbasedUFE focuses on two popular applications of TD-based unsupervised FE: the identification of differentially expressed genes (DEGs) and multiomics analysis. For DEG identification, the basic algorithm is based on a recent study (Taguchi and Turki, <xref ref-type="bibr" rid="B14">2022b</xref>) that established a new standard deviation (SD) optimization approach. For multiomics analysis, the basic algorithm is based on another study (Taguchi and Turki, <xref ref-type="bibr" rid="B15">2022c</xref>); however, TDbasedUFE also incorporates SD optimization, which was not available when that study was published. Although the algorithm was not specifically designed for DNA methylation profiles, we found that the approach described in the former study (Taguchi and Turki, <xref ref-type="bibr" rid="B14">2022b</xref>) is also applicable to DNA methylation profiles (Taguchi and Turki, <xref ref-type="bibr" rid="B16">2023</xref>). In this regard, any type of differential analysis on single omics data can be performed by functions implemented in TDbasedUFE. In fact, we have shown (Turki et al., <xref ref-type="bibr" rid="B17">2023</xref>) that histone modification profiles can be analyzed using the same algorithm (Taguchi and Turki, <xref ref-type="bibr" rid="B14">2022b</xref>).</p>
<p>TDbasedUFE and TDbasedUFEadv accept omics profile datasets formatted as tensors. TD is applied to such a dataset in the form of Tucker decomposition computed with the higher-order singular value decomposition (HOSVD) algorithm (Taguchi, <xref ref-type="bibr" rid="B9">2020</xref>). For instance, if <inline-formula><mml:math id="M1"><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:mi>N</mml:mi><mml:mo>&#x000D7;</mml:mo><mml:mi>M</mml:mi><mml:mo>&#x000D7;</mml:mo><mml:mi>K</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> represents the expression of the <italic>i</italic>th gene in the <italic>k</italic>th tissue of the <italic>j</italic>th human subject (<xref ref-type="fig" rid="F1">Figure 1</xref> left), TD is applied to <italic>x</italic><sub><italic>ijk</italic></sub>, and the following equation is obtained:</p>
<disp-formula id="E1"><label>(1)</label><mml:math id="M2"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x02113;</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x02113;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>M</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x02113;</mml:mi></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>K</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:mi>G</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x02113;</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>&#x02113;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>&#x02113;</mml:mi></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo 
stretchy="false">)</mml:mo></mml:mrow><mml:msub><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x02113;</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x02113;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x02113;</mml:mi></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>G</italic> &#x02208; &#x0211D;<sup><italic>N</italic>&#x000D7;<italic>M</italic>&#x000D7;<italic>K</italic></sup> is a core tensor that represents the weight with which the product <italic>u</italic><sub>&#x02113;<sub>1</sub><italic>i</italic></sub><italic>u</italic><sub>&#x02113;<sub>2</sub><italic>j</italic></sub><italic>u</italic><sub>&#x02113;<sub>3</sub><italic>k</italic></sub> contributes to <italic>x</italic><sub><italic>ijk</italic></sub>, and <inline-formula><mml:math id="M3"><mml:msub><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x02113;</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:mi>N</mml:mi><mml:mo>&#x000D7;</mml:mo><mml:mi>N</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>, <inline-formula><mml:math id="M4"><mml:msub><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x02113;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:mi>M</mml:mi><mml:mo>&#x000D7;</mml:mo><mml:mi>M</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>, and <inline-formula><mml:math id="M5"><mml:msub><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x02113;</mml:mi></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:mi>K</mml:mi><mml:mo>&#x000D7;</mml:mo><mml:mi>K</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> are singular value matrices, all of which are orthogonal.
First, the singular value vectors attributed to samples, <italic>u</italic><sub>&#x02113;<sub>2</sub><italic>j</italic></sub> and <italic>u</italic><sub>&#x02113;<sub>3</sub><italic>k</italic></sub>, are inspected to identify those of interest: for instance, a <italic>u</italic><sub>&#x02113;<sub>2</sub><italic>j</italic></sub> that represents the distinction between healthy controls and patients, and a <italic>u</italic><sub>&#x02113;<sub>3</sub><italic>k</italic></sub> that represents tissue specificity (e.g., expression only in the heart). Then, the singular value vector attributed to genes (i.e., features), <italic>u</italic><sub>&#x02113;<sub>1</sub><italic>i</italic></sub>, that shares the <italic>G</italic> of the largest absolute value with the identified <italic>u</italic><sub>&#x02113;<sub>2</sub><italic>j</italic></sub> and <italic>u</italic><sub>&#x02113;<sub>3</sub><italic>k</italic></sub> is selected. Features (<italic>i</italic>s) with large absolute values of <italic>u</italic><sub>&#x02113;<sub>1</sub><italic>i</italic></sub> are identified based on <italic>P</italic>-values computed under the null hypothesis that <italic>u</italic><sub>&#x02113;<sub>1</sub><italic>i</italic></sub> obeys a Gaussian distribution, as follows:</p>
<disp-formula id="E2"><label>(2)</label><mml:math id="M6"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:msup><mml:mrow><mml:mi>&#x003C7;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:msub><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mo>&#x0003E;</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:msub><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x02113;</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x02113;</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <inline-formula><mml:math id="M7"><mml:msub><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:msup><mml:mrow><mml:mi>&#x003C7;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:msub><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mo>&#x0003E;</mml:mo><mml:mi>x</mml:mi></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula> is the cumulative probability of the &#x003C7;<sup>2</sup> distribution that the argument is larger than <italic>x</italic>, and &#x003C3;<sub>&#x02113;<sub>1</sub></sub> is the standard deviation optimized such that <italic>u</italic><sub>&#x02113;<sub>1</sub><italic>i</italic></sub> obeys a Gaussian distribution as closely as possible (see Taguchi and Turki, <xref ref-type="bibr" rid="B14">2022b</xref> for more details about how to optimize &#x003C3;<sub>&#x02113;<sub>1</sub></sub>). The <italic>P</italic><sub><italic>i</italic></sub>s are then adjusted using the Benjamini&#x02013;Hochberg criterion to correct for multiple comparisons. Finally, <italic>i</italic>s with adjusted <italic>P</italic><sub><italic>i</italic></sub> less than a threshold value (typically 0.01) are selected.</p>
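<p>The decomposition in Equation (1) and the choice of &#x02113;<sub>1</sub> through the core tensor can be sketched as follows. This is a minimal Python/NumPy illustration on random data, not the implementation used in the packages; the function name <monospace>hosvd</monospace>, the toy sizes, and the 0-based indices are assumptions made for the example.</p>
<preformat><![CDATA[
```python
import numpy as np

def hosvd(x):
    """Return the core tensor G and one orthogonal factor matrix per mode."""
    factors = []
    for mode in range(x.ndim):
        # Unfold along `mode`: rows index that mode, columns all the others.
        unfolded = np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)
        u, _, _ = np.linalg.svd(unfolded, full_matrices=False)
        factors.append(u)  # column l is the l-th singular value vector
    core = x
    for u in factors:
        # Contract the leading axis with u; the new axis is appended last,
        # so after all modes the axes are back in their original order.
        core = np.tensordot(core, u, axes=(0, 0))
    return core, factors

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4, 3))   # toy tensor: genes x subjects x tissues
G, (u1, u2, u3) = hosvd(x)

# Equation (1): the weighted sum of products of singular value vectors
# restores x up to floating-point error.
x_rebuilt = np.einsum("abc,ia,jb,kc->ijk", G, u1, u2, u3)
max_error = float(np.abs(x - x_rebuilt).max())

# Suppose l2 and l3 were chosen by inspecting u2 and u3 (0-based here);
# l1 is then the index with the largest |G| for that pair.
l2, l3 = 0, 0
l1 = int(np.argmax(np.abs(G[:, l2, l3])))
print(max_error, l1)
```
]]></preformat>
<p>The sketch uses the economy-size singular value decomposition, so when the number of features exceeds the product of the other two dimensions, the feature-side matrix has fewer columns than rows; the reconstruction remains exact.</p>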
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>Schematic diagram that explains TD-based unsupervised FE. <bold>Left</bold>: DEG identification, (1) <italic>u</italic><sub>&#x02113;<sub>2</sub><italic>j</italic></sub> associated with the distinction between patients and healthy controls is selected. (2) <italic>u</italic><sub>&#x02113;<sub>3</sub><italic>k</italic></sub> associated with tissue specificity is selected. (3) <italic>G</italic>(&#x02113;<sub>1</sub>&#x02113;<sub>2</sub>&#x02113;<sub>3</sub>) is investigated with fixed &#x02113;<sub>2</sub> and &#x02113;<sub>3</sub>. (4) <italic>u</italic><sub>&#x02113;<sub>1</sub><italic>i</italic></sub> with <italic>G</italic> of the largest absolute value is selected. (5) <italic>i</italic>s (indicated in red) whose absolute values are significantly larger than expected are selected. <bold>Right</bold>: Multiomics analysis, (1) <italic>u</italic><sub>&#x02113;<sub>1</sub><italic>j</italic></sub> associated with the distinction between patients and healthy controls is selected. (2) <italic>u</italic><sub>&#x02113;<sub>1</sub><italic>i</italic></sub> is computed from <italic>u</italic><sub>&#x02113;<sub>1</sub><italic>j</italic></sub>. (3) <italic>i</italic>s (indicated in red) whose absolute values are significantly larger than expected are selected.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="frai-06-1237542-g0001.tif"/>
</fig>
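<p>Step (5) in the left panel, that is, Equation (2) followed by the Benjamini&#x02013;Hochberg adjustment, can be sketched as follows in Python. Two points are assumptions of this illustration and are not taken from the packages: the &#x003C7;<sup>2</sup> distribution is given one degree of freedom, and the standard deviation is supplied by hand rather than optimized. The function names and the toy data are hypothetical.</p>
<preformat><![CDATA[
```python
import math
import numpy as np

def chi2_upper_tail(values, sigma):
    """P[chi-squared with 1 d.o.f. > (u / sigma)^2], following Equation (2)."""
    # For one degree of freedom the upper tail equals erfc(|z| / sqrt(2)).
    return np.array([math.erfc(abs(v) / sigma / math.sqrt(2.0)) for v in values])

def benjamini_hochberg(p):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    p = np.asarray(p, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity from the largest P-value downwards, then cap at 1.
    adjusted_sorted = np.minimum(np.minimum.accumulate(scaled[::-1])[::-1], 1.0)
    adjusted = np.empty(n)
    adjusted[order] = adjusted_sorted
    return adjusted

rng = np.random.default_rng(1)
sigma = 0.01                          # assumed known here, not optimized
u = rng.normal(scale=sigma, size=1000)  # toy vector: mostly null features
u[:5] = 0.2                           # five features with large |u|

adjusted = benjamini_hochberg(chi2_upper_tail(u, sigma))
selected = np.flatnonzero(adjusted < 0.01)
print(selected)
```
]]></preformat>
<p>In this toy example the five features with large absolute values are always among those selected, because their <italic>P</italic>-values are many orders of magnitude below the threshold.</p>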
<p>When TDbasedUFE is applied to multiomics datasets (<xref ref-type="fig" rid="F1">Figure 1</xref> right), the multiomics profiles are formatted as <inline-formula><mml:math id="M8"><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>&#x000D7;</mml:mo><mml:mi>M</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> (i.e., the <italic>k</italic>th omics dataset contains <italic>N</italic><sub><italic>k</italic></sub> features). For each <italic>k</italic>, the product of <italic>x</italic><sub><italic>i</italic><sub><italic>k</italic></sub><italic>j</italic></sub> with itself is summed over the feature index <italic>i</italic><sub><italic>k</italic></sub> to obtain the following tensor:</p>
<disp-formula id="E3"><label>(3)</label><mml:math id="M9"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:msup><mml:mrow><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:munderover></mml:mstyle><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:msup><mml:mrow><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:mi>M</mml:mi><mml:mo>&#x000D7;</mml:mo><mml:mi>M</mml:mi><mml:mo>&#x000D7;</mml:mo><mml:mi>K</mml:mi></mml:mrow></mml:msup></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>HOSVD is then applied to <inline-formula><mml:math id="M10"><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:msup><mml:mrow><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> as follows:</p>
<disp-formula id="E4"><label>(4)</label><mml:math id="M11"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:msup><mml:mrow><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x02113;</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>M</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x02113;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>M</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x02113;</mml:mi></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>K</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:mi>G</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x02113;</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>&#x02113;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>&#x02113;</mml:mi></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo 
stretchy="false">)</mml:mo></mml:mrow><mml:msub><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x02113;</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x02113;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:msup><mml:mrow><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x02113;</mml:mi></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>.</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>After identifying <italic>u</italic><sub>&#x02113;<sub>1</sub><italic>j</italic></sub> coincident with labels (e.g., patients and healthy controls), the singular value vectors attributed to the individual features of the <italic>k</italic>th omics are computed as follows:</p>
<disp-formula id="E5"><label>(5)</label><mml:math id="M12"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x02113;</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>M</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:msub><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x02113;</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msup></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p><italic>P</italic><sub><italic>i</italic><sub><italic>k</italic></sub></sub> is then computed as follows:</p>
<disp-formula id="E6"><label>(6)</label><mml:math id="M13"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:msup><mml:mrow><mml:mi>&#x003C7;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:msub><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mo>&#x0003E;</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:msub><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x02113;</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x003C3;</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x02113;</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>and <italic>i</italic><sub><italic>k</italic></sub>s associated with adjusted <italic>P</italic><sub><italic>i</italic><sub><italic>k</italic></sub></sub> less than 0.01 are selected.</p>
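<p>Equations (3) and (5) can be sketched as follows. The example is a hedged Python/NumPy illustration on random data, not the packages' R code; the sizes, the variable names, and the choice &#x02113;<sub>1</sub> &#x0003D; 1 (index 0 in the code) are hypothetical. Only the sample-side factor of the HOSVD in Equation (4) is computed, since it is all that Equation (5) needs.</p>
<preformat><![CDATA[
```python
import numpy as np

rng = np.random.default_rng(0)
M = 6                                  # samples shared by all omics layers
omics = [rng.normal(size=(50, M)),     # x_{i_1 j}: N_1 = 50 features
         rng.normal(size=(30, M))]     # x_{i_2 j}: N_2 = 30 features

# Equation (3): x_{jj'k} = sum over i_k of x_{i_k j} x_{i_k j'}  (M x M x K).
x_jjk = np.stack([x_k.T @ x_k for x_k in omics], axis=2)

# Sample singular value vectors u_{l1 j}: left singular vectors of the
# first-mode unfolding, which is how HOSVD obtains that factor.
u_sample, _, _ = np.linalg.svd(x_jjk.reshape(M, -1), full_matrices=False)

# Equation (5): project each omics profile onto the chosen sample vector
# to get one value per feature, u_{l1 i_k}.
l1 = 0                                 # hypothetical choice, 0-based
u_feature = [x_k @ u_sample[:, l1] for x_k in omics]
print(x_jjk.shape, [u.shape for u in u_feature])
```
]]></preformat>
<p>Because the features are summed out in Equation (3), the tensor that is decomposed has only <italic>M</italic> &#x000D7; <italic>M</italic> &#x000D7; <italic>K</italic> elements regardless of how many features each omics layer contains.</p>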
<p>In contrast to TDbasedUFE, which can perform only two tasks, TDbasedUFEadv can perform more complicated tasks. For example, TDbasedUFEadv can perform an integrated analysis of two omics profiles that share samples while reducing the memory required by summing over the sample index (Ng and Taguchi, <xref ref-type="bibr" rid="B5">2020</xref>). It can also perform an integrated analysis of two omics profiles that share features (Taguchi and Turki, <xref ref-type="bibr" rid="B11">2019</xref>), as well as integrated analyses of multiple (more than two) omics profiles that share features (Taguchi and Turki, <xref ref-type="bibr" rid="B13">2022a</xref>) or samples (Taguchi and Turki, <xref ref-type="bibr" rid="B12">2021</xref>).</p>
</sec>
<sec sec-type="results" id="s3">
<title>3. Results</title>
<p>The full list of identified features, as well as the results of the enrichment analysis in this section, is presented in <xref ref-type="supplementary-material" rid="SM1">Supplementary material</xref>. For further details, please also refer to the <xref ref-type="supplementary-material" rid="SM1">Supplementary Document</xref>.</p>
<p>Numerous applications of TD-based unsupervised FE have been proposed since the publication of our book (Taguchi, <xref ref-type="bibr" rid="B9">2020</xref>). Here, we present a few examples to demonstrate the usefulness of TDbasedUFE based on the ACC.rnaseq data from the RTCGA.rnaseq (Kosinski, <xref ref-type="bibr" rid="B3">2023</xref>) package in Bioconductor. The labels used to select singular value vectors attributed to samples were patient.stage_event.pathologic_stage, which is composed of four classes (&#x0201C;stage i&#x0201D; to &#x0201C;stage iv&#x0201D;). A tensor <inline-formula><mml:math id="M14"><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:mi>N</mml:mi><mml:mo>&#x000D7;</mml:mo><mml:mn>9</mml:mn><mml:mo>&#x000D7;</mml:mo><mml:mn>4</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula> represents the expression of the <italic>i</italic>th gene in the <italic>j</italic>th replicate of the <italic>k</italic>th stage. HOSVD was applied to <italic>x</italic><sub><italic>ijk</italic></sub>, and we obtained the TD shown in Equation (1) (please refer to the <xref ref-type="supplementary-material" rid="SM1">Supplementary Document</xref> for the R code to perform DEG identification using TDbasedUFE). Since <italic>u</italic><sub>&#x02113;<sub>2</sub><italic>j</italic></sub> is attributed to replicates, it is expected to take a constant value regardless of <italic>j</italic>, and &#x02113;<sub>2</sub> &#x0003D; 1 turned out to satisfy this requirement (<xref ref-type="supplementary-material" rid="SM1">Supplementary Figure S1</xref> left).
On the other hand, <italic>u</italic><sub>&#x02113;<sub>3</sub><italic>k</italic></sub> is expected to depend monotonically on <italic>k</italic> (<xref ref-type="supplementary-material" rid="SM1">Supplementary Figure S1</xref> right), and we found that &#x02113;<sub>3</sub> &#x0003D; 3 was most coincident with this requirement. Once &#x02113;<sub>2</sub> and &#x02113;<sub>3</sub> are selected by the user through the interactive interface, TDbasedUFE automatically selects the <italic>u</italic><sub>&#x02113;<sub>1</sub><italic>i</italic></sub> with which <italic>i</italic>s are selected. As a result, 1,692 genes were selected with a threshold of 0.01 for the adjusted <italic>P</italic>-value.</p>
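<p>In TDbasedUFE this choice is made by the user by inspecting the singular value vectors. Purely as an illustration of the two criteria (constant across replicates, monotonic across stages), they could be scored numerically as in the following Python sketch. The scoring functions, their names, and the toy vectors are hypothetical and are not part of the package; linear correlation with the stage index is used as a simple proxy for monotonic dependence, and indices are 0-based.</p>
<preformat><![CDATA[
```python
import numpy as np

def most_constant(u):
    """Index of the column of u with the smallest spread relative to its mean."""
    spread = u.std(axis=0) / np.abs(u.mean(axis=0)).clip(min=1e-12)
    return int(np.argmin(spread))

def most_monotonic(u):
    """Index of the column of u most strongly correlated with the row index."""
    k = np.arange(u.shape[0], dtype=float)
    k -= k.mean()
    centered = u - u.mean(axis=0)
    norm = np.linalg.norm(centered, axis=0) * np.linalg.norm(k)
    # Columns with no variation get a score of zero instead of NaN.
    corr = np.abs(k @ centered) / np.maximum(norm, 1e-12)
    return int(np.argmax(corr))

# Toy singular value vectors (columns), not taken from the ACC data.
u_replicate = np.column_stack([
    np.full(9, 1 / 3),                     # constant across nine replicates
    np.tile([0.4, -0.4, 0.0], 3),          # varies between replicates
])
u_stage = np.column_stack([
    np.full(4, 0.5),                       # flat across the four stages
    np.array([0.5, -0.5, -0.5, 0.5]),      # non-monotonic
    np.array([-0.67, -0.22, 0.22, 0.67]),  # increases with stage
])

l2 = most_constant(u_replicate)
l3 = most_monotonic(u_stage)
print(l2, l3)
```
]]></preformat>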
<p>To evaluate the ability of TDbasedUFE to select genes, we applied DESeq2 (Love et al., <xref ref-type="bibr" rid="B4">2014</xref>), a state-of-the-art method, to the same data. Because DESeq2 takes a matrix as input, it was applied not to the tensor <italic>x</italic><sub><italic>ijk</italic></sub> itself but to the unfolded matrix <inline-formula><mml:math id="M15"><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mi>k</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:mi>N</mml:mi><mml:mo>&#x000D7;</mml:mo><mml:mn>36</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula>, where <italic>j</italic> and <italic>k</italic> are merged into a single column index (see the <xref ref-type="supplementary-material" rid="SM1">Supplementary Document</xref> for the R code to perform DEG identification using DESeq2). DESeq2 identified as few as 138 genes associated with adjusted <italic>P</italic>-values less than 0.01. Thus, in terms of the number of identified DEGs, TDbasedUFE (1,692 genes) clearly exceeds DESeq2 (138 genes).</p>
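<p>The unfolding step can be illustrated with a short Python sketch. It is an illustration only, not the R code in the Supplementary Document: the function names (unfold, column_labels), the nested-list representation, and the column order <italic>c</italic> &#x0003D; 4<italic>j</italic> &#x0002B; <italic>k</italic> are assumptions, and any fixed order works as long as the stage label of each column is tracked consistently.</p>
<preformat>
```python
def unfold(tensor):
    """Flatten x[i][j][k] into rows x[i][c] with column index c = j * K + k."""
    return [[value for replicate in gene for value in replicate]
            for gene in tensor]


def column_labels(n_replicates, n_stages):
    """Return the (j, k) pair behind each unfolded column, in column order."""
    return [(j, k) for j in range(n_replicates) for k in range(n_stages)]


# Toy tensor with N = 2 genes, 9 replicates, and 4 stages (made-up values).
N, J, K = 2, 9, 4
tensor = [[[100 * i + 10 * j + k for k in range(K)] for j in range(J)]
          for i in range(N)]
matrix = unfold(tensor)
labels = column_labels(J, K)
stages = [k for _, k in labels]  # per-column stage label for a design table

print(len(matrix), len(matrix[0]))  # 2 36
```
</preformat>
<p>The list stages gives the class of each of the 36 columns. A matrix-based method needs this information because unfolding discards the separate replicate and stage modes of the tensor.</p>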
<p>However, identifying a higher number of DEGs does not necessarily mean that all of the identified DEGs are biologically relevant. To evaluate the biological relevance of the DEGs selected by TDbasedUFE, we used the enrichR (Jawaid, <xref ref-type="bibr" rid="B2">2023</xref>) package in CRAN, as demonstrated in the vignette &#x0201C;Enrichment&#x0201D; in the TDbasedUFEadv package, considering the &#x0201C;KEGG 2021 HUMAN,&#x0201D; &#x0201C;GO Molecular Function 2015,&#x0201D; &#x0201C;GO Cellular Component 2015,&#x0201D; and &#x0201C;GO Biological Process 2015&#x0201D; categories. When the 1,692 genes selected by TDbasedUFE were considered, 129, 151, 143, and 923 terms were associated with adjusted <italic>P</italic>-values less than 0.05 for the &#x0201C;KEGG 2021 HUMAN,&#x0201D; &#x0201C;GO Molecular Function 2015,&#x0201D; &#x0201C;GO Cellular Component 2015,&#x0201D; and &#x0201C;GO Biological Process 2015&#x0201D; categories, respectively. In contrast, when the 138 genes selected by DESeq2 were considered, only 0, 0, 3, and 12 terms were associated with adjusted <italic>P</italic> &#x0003C; 0.05 for the same categories. Thus, in terms of the number of biologically relevant terms identified, TDbasedUFE outperforms DESeq2.</p>
<p>To demonstrate the capabilities of TDbasedUFE on a multiomics dataset, we used the curatedTCGA (Ramos et al., <xref ref-type="bibr" rid="B6">2020</xref>) package to retrieve profiles in addition to gene expression for the ACC dataset in TCGA (please refer to the <xref ref-type="supplementary-material" rid="SM1">Supplementary Document</xref> for the R code to perform DEG identification using TDbasedUFE). We collected miRNA (<inline-formula><mml:math id="M16"><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:mn>1046</mml:mn><mml:mo>&#x000D7;</mml:mo><mml:mn>79</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula>), gene expression (<inline-formula><mml:math id="M17"><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:mn>20501</mml:mn><mml:mo>&#x000D7;</mml:mo><mml:mn>79</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula>), and methylation data (<inline-formula><mml:math id="M18"><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:mn>485577</mml:mn><mml:mo>&#x000D7;</mml:mo><mml:mn>79</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula>) from curatedTCGA and applied TDbasedUFE to these data. 
After applying HOSVD to the generated tensor <inline-formula><mml:math id="M19"><mml:msub><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:msup><mml:mrow><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x02032;</mml:mi></mml:mrow></mml:msup><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>&#x02208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0211D;</mml:mi></mml:mrow><mml:mrow><mml:mn>79</mml:mn><mml:mo>&#x000D7;</mml:mo><mml:mn>79</mml:mn><mml:mo>&#x000D7;</mml:mo><mml:mn>3</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula>, we found that <italic>u</italic><sub>7<italic>j</italic></sub> (<xref ref-type="supplementary-material" rid="SM1">Supplementary Figure S2</xref> upper) is associated with the distinction between four stages, and <italic>u</italic><sub>1<italic>k</italic></sub> (<xref ref-type="supplementary-material" rid="SM1">Supplementary Figure S2</xref> lower) is constant regardless of <italic>k</italic> (i.e., omics). <italic>P</italic><sub><italic>i</italic><sub><italic>k</italic></sub></sub> is attributed to <italic>i</italic><sub><italic>k</italic></sub> by Equation (6) using <italic>u</italic><sub>7<italic>i</italic><sub><italic>k</italic></sub></sub> generated from <italic>u</italic><sub>7<italic>j</italic></sub> by Equation (5). After correcting <italic>P</italic><sub><italic>i</italic><sub><italic>k</italic></sub></sub>, we found that 23 out of 1,046 miRNAs, 1,016 out of 20,501 mRNAs, and 7,295 out of 485,577 methylation probes are associated with adjusted <italic>P</italic><sub><italic>i</italic><sub><italic>k</italic></sub></sub> less than 0.01 (these features are expected to be distinct between the four stages as well).</p>
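<p>The chain from a sample-level singular value vector to a list of selected features can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the code of TDbasedUFE or TDbasedUFEadv. We assume that each 79 &#x000D7; 79 slice of the tensor is the product of an omics matrix with its own transpose, that Equation (5) is the projection <italic>u</italic><sub><italic>i</italic></sub> &#x0003D; &#x003A3;<sub><italic>j</italic></sub> <italic>x</italic><sub><italic>ij</italic></sub><italic>u</italic><sub><italic>j</italic></sub>, that Equation (6) assigns <italic>P</italic>-values through a chi-squared distribution with one degree of freedom, that the scale is estimated as the root mean square of the projected values, and that the correction is the Benjamini-Hochberg procedure. All function names and data are hypothetical.</p>
<preformat>
```python
import math


def gram(matrix):
    """Assumed tensor slice: x_jj' = sum_i x_ij * x_ij' for one omics layer."""
    n_samples = len(matrix[0])
    return [[sum(row[j] * row[jp] for row in matrix) for jp in range(n_samples)]
            for j in range(n_samples)]


def project(matrix, u_sample):
    """Assumed form of Equation (5): u_i = sum_j x_ij * u_j."""
    return [sum(x * u for x, u in zip(row, u_sample)) for row in matrix]


def chi2_upper_tail(x):
    """P(X > x) for a chi-squared variable with one degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted P-values, returned in input order."""
    n = len(pvals)
    descending = sorted(range(n), key=lambda i: pvals[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for offset, i in enumerate(descending):
        rank = n - offset  # 1-based rank of pvals[i] in ascending order
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def select(matrix, u_sample, threshold=0.01):
    """Indices of features whose adjusted P-value is below the threshold."""
    u_feature = project(matrix, u_sample)
    # Assumption: the scale is the root mean square of the projected values.
    sigma = math.sqrt(sum(u * u for u in u_feature) / len(u_feature))
    pvals = [chi2_upper_tail((u / sigma) ** 2) for u in u_feature]
    adjusted = bh_adjust(pvals)
    return [i for i, p in enumerate(adjusted) if threshold > p]


# Toy data: 200 features by 3 samples; only feature 50 follows u_sample.
u_sample = [-0.7, 0.0, 0.7]
matrix = [[1.0, 1.0, 1.0 + 0.1 * (-1) ** i] for i in range(200)]
matrix[50] = [1.0, 3.0, 8.0]
print(select(matrix, u_sample))
```
</preformat>
<p>In this toy example only feature 50 is selected, because its projection onto the sample-level vector is far larger than the spread of the other projections. The same function could be called once per omics layer with that layer's matrix, which is how a single sample-level vector can yield separate lists of miRNAs, mRNAs, and methylation probes.</p>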
<p>To compare the performance of TDbasedUFE with that of SOTA methods, we employed DIABLO, which is implemented in the mixOmics package (Rohart et al., <xref ref-type="bibr" rid="B7">2017</xref>) in Bioconductor (please refer to the <xref ref-type="supplementary-material" rid="SM1">Supplementary Document</xref> for the R code to perform multiomics analysis using DIABLO). Even when we used the minimum setup (folds=2, nrepeat=1), DIABLO failed to converge to a solution within 3 h. When the recommended setup in the vignette (folds=10, nrepeat=10) was employed, DIABLO did not converge to a solution with sufficiently few errors for up to 10 components (ncomp=10) and showed no tendency for errors to decrease as the number of components increased (<xref ref-type="supplementary-material" rid="SM1">Supplementary Figure S3</xref>). As a result, we were unable to select features using DIABLO and had to conclude that TDbasedUFE outperformed DIABLO for this multiomics dataset.</p>
<p>To evaluate the biological relevance of the miRNAs, mRNAs, and methylation probes identified by TDbasedUFE, we uploaded them to various databases. First, we uploaded the identified miRNAs to DIANA-miRPath v3.0 (Vlachos et al., <xref ref-type="bibr" rid="B18">2015</xref>) and found that many cancer-related KEGG pathways were enriched (please refer to the <xref ref-type="supplementary-material" rid="SM1">Supplementary Document</xref> for the URL to DIANA-miRPath using these miRNAs). Next, we uploaded the identified mRNAs to Enrichr (Xie et al., <xref ref-type="bibr" rid="B19">2021</xref>) and found many cancer-related pathways in the &#x0201C;KEGG 2021 Human&#x0201D; category and various cancer cell lines. Finally, we uploaded 2,668 unique gene symbols associated with the identified 7,295 probes to Enrichr and found several cancer-related pathways in &#x0201C;KEGG 2021 Human&#x0201D; and various cancer cell lines. In conclusion, the miRNAs, mRNAs, and methylation probes identified by TDbasedUFE are biologically relevant.</p>
</sec>
<sec sec-type="discussion" id="s4">
<title>4. Discussion</title>
<p>Here, we have introduced TDbasedUFE and TDbasedUFEadv, two packages that can perform TD-based unsupervised FE without requiring extensive knowledge of tensor decompositions. Our results demonstrated that these packages outperform two SOTA methods, DESeq2 and DIABLO, when applied for DEG identification and multiomics analysis, respectively. With TDbasedUFE and TDbasedUFEadv, users can perform TD-based unsupervised FE easily and effectively.</p>
<p>In this implementation, TDbasedUFE/TDbasedUFEadv can seamlessly accept a variety of datasets generated by high-throughput sequencing and/or conventional microarrays. TDbasedUFE/TDbasedUFEadv can also accept various combinations of these profiles as inputs (multiomics analysis). TDbasedUFE/TDbasedUFEadv outputs a list of features associated with (adjusted) <italic>P</italic>-values. The output features depend on the input features: when genes are input, the output features are genes, and when genomic regions are input, the output features are genomic regions. The list of features can then be subjected to enrichment analysis in downstream analyses to understand its biological meaning.</p>
<p>The current implementation has no known specific limitations, since the implemented methods have already been tested on various topics in the numerous previous publications cited in this study. We do not outline future directions here, since this is a report describing the implementation of an established method.</p>
<p>As for other unsupervised gene selection methods, readers may consult the review article by Ang et al. (<xref ref-type="bibr" rid="B1">2016</xref>), although it lists as few as fifteen studies ranging from 2006 to 2012, which is relatively few compared with the number of our publications cited in this paper.</p>
</sec>
<sec sec-type="data-availability" id="s5">
<title>Data availability statement</title>
<p>Publicly available datasets were analyzed in this study. This data can be found here: <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.18129/B9.bioc.TDbasedUFE">https://doi.org/10.18129/B9.bioc.TDbasedUFE</ext-link>; <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.18129/B9.bioc.TDbasedUFEadv">https://doi.org/10.18129/B9.bioc.TDbasedUFEadv</ext-link>.</p>
</sec>
<sec sec-type="author-contributions" id="s6">
<title>Author contributions</title>
<p>Y-hT and TT wrote the original manuscript, reviewed the manuscript, and validated the results. Y-hT developed the packages and performed the analysis. All authors contributed to the article and approved the submitted version.</p>
</sec>
</body>
<back>
<sec sec-type="funding-information" id="s7">
<title>Funding</title>
<p>This study was supported in part by funds from the Chuo University (TOKUTEI KADAI KENKYU).</p>
</sec>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s8">
<title>Publisher&#x00027;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<sec sec-type="supplementary-material" id="s9">
<title>Supplementary material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/frai.2023.1237542/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/frai.2023.1237542/full#supplementary-material</ext-link></p>
<supplementary-material xlink:href="Data_Sheet_1.ZIP" id="SM1" mimetype="application/zip" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ang</surname> <given-names>J. C.</given-names></name> <name><surname>Mirzal</surname> <given-names>A.</given-names></name> <name><surname>Haron</surname> <given-names>H.</given-names></name> <name><surname>Hamed</surname> <given-names>H. N. A.</given-names></name></person-group> (<year>2016</year>). <article-title>Supervised, unsupervised, and semi-supervised feature selection: a review on gene selection</article-title>. <source>IEEE/ACM Trans. Comput. Biol. Bioinform</source>. <volume>13</volume>, <fpage>971</fpage>&#x02013;<lpage>989</lpage>. <pub-id pub-id-type="doi">10.1109/TCBB.2015.2478454</pub-id><pub-id pub-id-type="pmid">26390495</pub-id></citation></ref>
<ref id="B2">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Jawaid</surname> <given-names>W.</given-names></name></person-group> (<year>2023</year>). <article-title>enrichR: Provides an R Interface to &#x02018;Enrichr&#x02019;</article-title>, in <source>R Package Version 3.2</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="https://cran.r-project.org/web/packages/enrichR/">https://cran.r-project.org/web/packages/enrichR/</ext-link></citation>
</ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kosinski</surname> <given-names>M.</given-names></name></person-group> (<year>2023</year>). <article-title>RTCGA.rnaseq: RNA-seq datasets from the cancer genome atlas project</article-title>, in <source>R Package Version 20151101.30.30</source>. <pub-id pub-id-type="doi">10.18129/B9.bioc.RTCGA.rnaseq</pub-id></citation>
</ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Love</surname> <given-names>M. I.</given-names></name> <name><surname>Huber</surname> <given-names>W.</given-names></name> <name><surname>Anders</surname> <given-names>S.</given-names></name></person-group> (<year>2014</year>). <article-title>Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2</article-title>. <source>Genome Biol</source>. <volume>15</volume>, <fpage>550</fpage>. <pub-id pub-id-type="doi">10.1186/s13059-014-0550-8</pub-id><pub-id pub-id-type="pmid">25516281</pub-id></citation></ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ng</surname> <given-names>K.-L.</given-names></name> <name><surname>Taguchi</surname> <given-names>Y. H.</given-names></name></person-group> (<year>2020</year>). <article-title>Identification of miRNA signatures for kidney renal clear cell carcinoma using the tensor-decomposition method</article-title>. <source>Sci. Rep</source>. <volume>10</volume>, <fpage>15149</fpage>. <pub-id pub-id-type="doi">10.1038/s41598-020-71997-6</pub-id><pub-id pub-id-type="pmid">32938959</pub-id></citation></ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ramos</surname> <given-names>M.</given-names></name> <name><surname>Geistlinger</surname> <given-names>L.</given-names></name> <name><surname>Oh</surname> <given-names>S.</given-names></name> <name><surname>Schiffer</surname> <given-names>L.</given-names></name> <name><surname>Azhar</surname> <given-names>R.</given-names></name> <name><surname>Kodali</surname> <given-names>H.</given-names></name> <etal/></person-group>. (<year>2020</year>). <article-title>Multiomic integration of public oncology databases in bioconductor</article-title>. <source>JCO Clin. Cancer Inform</source>. <volume>4</volume>, <fpage>958</fpage>&#x02013;<lpage>971</lpage>. <pub-id pub-id-type="doi">10.1200/CCI.19.00119</pub-id><pub-id pub-id-type="pmid">33119407</pub-id></citation></ref>
<ref id="B7">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rohart</surname> <given-names>F.</given-names></name> <name><surname>Gautier</surname> <given-names>B.</given-names></name> <name><surname>Singh</surname> <given-names>A.</given-names></name> <name><surname>L&#x000EA; Cao</surname> <given-names>K.-A.</given-names></name></person-group> (<year>2017</year>). <article-title>mixOmics: an R package for &#x02018;omics feature selection and multiple data integration</article-title>. <source>PLoS Comput. Biol</source>. <volume>13</volume>, <fpage>1</fpage>&#x02013;<lpage>19</lpage>. <pub-id pub-id-type="doi">10.1371/journal.pcbi.1005752</pub-id><pub-id pub-id-type="pmid">29099853</pub-id></citation></ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Taguchi</surname> <given-names>Y.-H.</given-names></name></person-group> (<year>2017</year>). <article-title>Tensor decomposition-based unsupervised feature extraction applied to matrix products for multi-view data processing</article-title>. <source>PLoS ONE</source> <volume>12</volume>, <fpage>1</fpage>&#x02013;<lpage>36</lpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0183933</pub-id><pub-id pub-id-type="pmid">30020990</pub-id></citation></ref>
<ref id="B9">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Taguchi</surname> <given-names>Y.-H.</given-names></name></person-group> (<year>2020</year>). <source>Unsupervised Feature Extraction Applied to Bioinformatics</source>. <publisher-loc>Cham</publisher-loc>: <publisher-name>Springer International Publishing</publisher-name>.</citation>
</ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Taguchi</surname> <given-names>Y.-H.</given-names></name> <name><surname>Murakami</surname> <given-names>Y.</given-names></name></person-group> (<year>2013</year>). <article-title>Principal component analysis based feature extraction approach to identify circulating microRNA biomarkers</article-title>. <source>PLoS ONE</source> <volume>8</volume>, <fpage>1</fpage>&#x02013;<lpage>12</lpage>. <pub-id pub-id-type="doi">10.1371/journal.pone.0066714</pub-id><pub-id pub-id-type="pmid">23874370</pub-id></citation></ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Taguchi</surname> <given-names>Y.-H.</given-names></name> <name><surname>Turki</surname> <given-names>T.</given-names></name></person-group> (<year>2019</year>). <article-title>Tensor decomposition-based unsupervised feature extraction applied to single-cell gene expression analysis</article-title>. <source>Front. Genet</source>. <volume>10</volume>, <fpage>864</fpage>. <pub-id pub-id-type="doi">10.3389/fgene.2019.00864</pub-id><pub-id pub-id-type="pmid">31608111</pub-id></citation></ref>
<ref id="B12">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Taguchi</surname> <given-names>Y.-H.</given-names></name> <name><surname>Turki</surname> <given-names>T.</given-names></name></person-group> (<year>2021</year>). <article-title>Tensor-decomposition-based unsupervised feature extraction in single-cell multiomics data analysis</article-title>. <source>Genes</source> <volume>12</volume>, <fpage>1442</fpage>. <pub-id pub-id-type="doi">10.3390/genes12091442</pub-id><pub-id pub-id-type="pmid">34573424</pub-id></citation></ref>
<ref id="B13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Taguchi</surname> <given-names>Y.-H.</given-names></name> <name><surname>Turki</surname> <given-names>T.</given-names></name></person-group> (<year>2022a</year>). <article-title>A tensor decomposition-based integrated analysis applicable to multiple gene expression profiles without sample matching</article-title>. <source>Sci. Rep</source>. <volume>12</volume>, <fpage>21242</fpage>. <pub-id pub-id-type="doi">10.1038/s41598-022-25524-4</pub-id><pub-id pub-id-type="pmid">36481877</pub-id></citation></ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Taguchi</surname> <given-names>Y.-H.</given-names></name> <name><surname>Turki</surname> <given-names>T.</given-names></name></person-group> (<year>2022b</year>). <article-title>Adapted tensor decomposition and PCA based unsupervised feature extraction select more biologically reasonable differentially expressed genes than conventional methods</article-title>. <source>Sci. Rep</source>. <volume>12</volume>, <fpage>17438</fpage>. <pub-id pub-id-type="doi">10.1038/s41598-022-21474-z</pub-id><pub-id pub-id-type="pmid">36261574</pub-id></citation></ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Taguchi</surname> <given-names>Y.-H.</given-names></name> <name><surname>Turki</surname> <given-names>T.</given-names></name></person-group> (<year>2022c</year>). <article-title>Novel feature selection method via kernel tensor decomposition for improved multi-omics data analysis</article-title>. <source>BMC Med. Genomics</source> <volume>15</volume>, <fpage>37</fpage>. <pub-id pub-id-type="doi">10.1186/s12920-022-01181-4</pub-id><pub-id pub-id-type="pmid">35209912</pub-id></citation></ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Taguchi</surname> <given-names>Y.-H.</given-names></name> <name><surname>Turki</surname> <given-names>T.</given-names></name></person-group> (<year>2023</year>). <article-title>Principal component analysis- and tensor decomposition-based unsupervised feature extraction to select more suitable differentially methylated cytosines: optimization of standard deviation versus state-of-the-art methods</article-title>. <source>Genomics</source> <volume>115</volume>, <fpage>110577</fpage>. <pub-id pub-id-type="doi">10.1016/j.ygeno.2023.110577</pub-id><pub-id pub-id-type="pmid">36804268</pub-id></citation></ref>
<ref id="B17">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Turki</surname> <given-names>T.</given-names></name> <name><surname>Roy</surname> <given-names>S. S.</given-names></name> <name><surname>Taguchi</surname> <given-names>Y.-H.</given-names></name></person-group> (<year>2023</year>). <article-title>Optimized tensor decomposition and PCA outperforming state-of-the-art methods when analyzing histone modification ChIP-seq profiles</article-title>. <source>Algorithms</source> <volume>16</volume>, <fpage>401</fpage>. <pub-id pub-id-type="doi">10.3390/a16090401</pub-id></citation>
</ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vlachos</surname> <given-names>I. S.</given-names></name> <name><surname>Zagganas</surname> <given-names>K.</given-names></name> <name><surname>Paraskevopoulou</surname> <given-names>M. D.</given-names></name> <name><surname>Georgakilas</surname> <given-names>G.</given-names></name> <name><surname>Karagkouni</surname> <given-names>D.</given-names></name> <name><surname>Vergoulis</surname> <given-names>T.</given-names></name> <etal/></person-group>. (<year>2015</year>). <article-title>DIANA-miRPath v3.0: deciphering microRNA function with experimental support</article-title>. <source>Nucleic Acids Res</source>. <volume>43</volume>, <fpage>W460</fpage>&#x02013;<lpage>W466</lpage>. <pub-id pub-id-type="doi">10.1093/nar/gkv403</pub-id><pub-id pub-id-type="pmid">25977294</pub-id></citation></ref>
<ref id="B19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xie</surname> <given-names>Z.</given-names></name> <name><surname>Bailey</surname> <given-names>A.</given-names></name> <name><surname>Kuleshov</surname> <given-names>M. V.</given-names></name> <name><surname>Clarke</surname> <given-names>D. J. B.</given-names></name> <name><surname>Evangelista</surname> <given-names>J. E.</given-names></name> <name><surname>Jenkins</surname> <given-names>S. L.</given-names></name> <etal/></person-group>. (<year>2021</year>). <article-title>Gene set knowledge discovery with Enrichr</article-title>. <source>Curr. Protoc</source>. <volume>1</volume>, <fpage>e90</fpage>. <pub-id pub-id-type="doi">10.1002/cpz1.90</pub-id><pub-id pub-id-type="pmid">33780170</pub-id></citation></ref>
</ref-list> 
</back>
</article> 