<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="review-article" dtd-version="2.3" xml:lang="EN">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Oncol.</journal-id>
<journal-title>Frontiers in Oncology</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Oncol.</abbrev-journal-title>
<issn pub-type="epub">2234-943X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fonc.2022.960984</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Oncology</subject>
<subj-group>
<subject>Review</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Semi-supervised learning in cancer diagnostics</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name>
<surname>Eckardt</surname>
<given-names>Jan-Niklas</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
<xref ref-type="author-notes" rid="fn001">
<sup>*</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/1849076"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Bornh&#xe4;user</surname>
<given-names>Martin</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="aff" rid="aff3">
<sup>3</sup>
</xref>
<xref ref-type="aff" rid="aff4">
<sup>4</sup>
</xref>
<uri xlink:href="https://loop.frontiersin.org/people/667412"/>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Wendt</surname>
<given-names>Karsten</given-names>
</name>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
<xref ref-type="aff" rid="aff5">
<sup>5</sup>
</xref>
</contrib>
<contrib contrib-type="author">
<name>
<surname>Middeke</surname>
<given-names>Jan Moritz</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
<xref ref-type="aff" rid="aff2">
<sup>2</sup>
</xref>
</contrib>
</contrib-group>
<aff id="aff1">
<sup>1</sup>
<institution>Department of Internal Medicine I, University Hospital Carl Gustav Carus</institution>, <addr-line>Dresden</addr-line>, <country>Germany</country>
</aff>
<aff id="aff2">
<sup>2</sup>
<institution>Else Kr&#xf6;ner Fresenius Center for Digital Health, Technical University Dresden</institution>, <addr-line>Dresden</addr-line>, <country>Germany</country>
</aff>
<aff id="aff3">
<sup>3</sup>
<institution>German Consortium for Translational Cancer Research</institution>, <addr-line>Heidelberg</addr-line>, <country>Germany</country>
</aff>
<aff id="aff4">
<sup>4</sup>
<institution>National Center for Tumor Disease (NCT)</institution>, <addr-line>Dresden</addr-line>, <country>Germany</country>
</aff>
<aff id="aff5">
<sup>5</sup>
<institution>Institute of Software and Multimedia Technology, Technical University Dresden</institution>, <addr-line>Dresden</addr-line>, <country>Germany</country>
</aff>
<author-notes>
<fn fn-type="edited-by">
<p>Edited by: Humberto Rocha, University of Coimbra, Portugal</p>
</fn>
<fn fn-type="edited-by">
<p>Reviewed by: Niccol&#xf2; Marini, HES-SO Valais-Wallis, Switzerland; Alireza Sadeghian, Toronto Metropolitan University, Canada; Wenbin Chen, Southern Medical University, China</p>
</fn>
<fn fn-type="corresp" id="fn001">
<p>*Correspondence: Jan-Niklas Eckardt, <email xlink:href="mailto:jan-niklas.eckardt@uniklinikum-dresden.de">jan-niklas.eckardt@uniklinikum-dresden.de</email>
</p>
</fn>
<fn fn-type="other" id="fn002">
<p>This article was submitted to Cancer Imaging and Image-directed Interventions, a section of the journal Frontiers in Oncology</p>
</fn>
</author-notes>
<pub-date pub-type="epub">
<day>14</day>
<month>07</month>
<year>2022</year>
</pub-date>
<pub-date pub-type="collection">
<year>2022</year>
</pub-date>
<volume>12</volume>
<elocation-id>960984</elocation-id>
<history>
<date date-type="received">
<day>03</day>
<month>06</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>24</day>
<month>06</month>
<year>2022</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#xa9; 2022 Eckardt, Bornh&#xe4;user, Wendt and Middeke</copyright-statement>
<copyright-year>2022</copyright-year>
<copyright-holder>Eckardt, Bornh&#xe4;user, Wendt and Middeke</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/">
<p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p>
</license>
</permissions>
<abstract>
<p>In cancer diagnostics, a considerable amount of data is acquired during routine work-up. Recently, machine learning has been used to build classifiers that are tasked with cancer detection and aid in clinical decision-making. Most of these classifiers are based on supervised learning (SL), which requires time- and cost-intensive manual labeling of samples by medical experts for model training. Semi-supervised learning (SSL), however, works with only a fraction of labeled data by including unlabeled samples for information abstraction and can thus exploit the vast discrepancy between available labeled data and overall available data in cancer diagnostics. In this review, we provide a comprehensive overview of essential functionalities and assumptions of SSL and survey key studies with regard to cancer care, differentiating between image-based and non-image-based applications. We highlight current state-of-the-art models in histopathology, radiology and radiotherapy, as well as genomics. Further, we discuss potential pitfalls in SSL study design, such as discrepancies in data distributions and comparison to baseline SL models, and point out future directions for SSL in oncology. We believe that well-designed SSL models can strongly contribute to computer-guided diagnostics in malignant disease by overcoming current hindrances in the form of sparse labeled and abundant unlabeled data.</p>
</abstract>
<kwd-group>
<kwd>semi-supervised learning</kwd>
<kwd>cancer</kwd>
<kwd>diagnostics</kwd>
<kwd>artificial intelligence</kwd>
<kwd>machine learning</kwd>
</kwd-group>
<contract-sponsor id="cn001">Deutsche Krebshilfe<named-content content-type="fundref-id">10.13039/501100005972</named-content>
</contract-sponsor>
<counts>
<fig-count count="2"/>
<table-count count="3"/>
<equation-count count="0"/>
<ref-count count="46"/>
<page-count count="10"/>
<word-count count="5321"/>
</counts>
</article-meta>
</front>
<body>
<sec id="s1" sec-type="intro">
<title>Introduction</title>
<p>In the daily routine of cancer diagnostics, an abundance of medical data in the form of images, health records and genetic assays is gathered. Potentially, these data can serve as training input for supervised machine learning (ML) classifiers. However, the availability of large-scale labeled datasets represents a substantial bottleneck that limits the advancement of supervised learning (SL) techniques for diagnostic purposes. As currently the most popular technique in ML-guided diagnostics, SL requires data with high-quality labels to train a classifier, which is subsequently evaluated by how accurately it predicts labels in a test set that is withheld from training. The major obstacle in this setting is the disparity between overall available data and available data with labels. The latter is the essential prerequisite for supervised learning. However, obtaining a sufficiently large set of labeled data is time- and cost-intensive, especially in highly specialized domains such as cancer diagnostics. The discrepancy between an increasing number of cancer patients in an aging society and the receding physician workforce, as well as the correspondingly ever-growing workload of radiologists, pathologists and oncologists, poses a further constraint on the labeling process, as their experience and knowledge are needed to provide high-quality labels. Yet time and resources for the generation of such large-scale labeled data sets are often lacking (<xref ref-type="bibr" rid="B1">1</xref>, <xref ref-type="bibr" rid="B2">2</xref>). Therefore, strategies are needed that leverage the overall amount of available data while imposing manageable needs for labeling.</p>
<p>Conceptually, semi-supervised learning (SSL) is positioned midway between unsupervised learning (UL), where no labels are provided and algorithms extract patterns from unlabeled data, e.g. for cluster analysis, and SL, where a classifier is trained on labeled data to correctly map labels to unseen data from the same distribution (<xref ref-type="bibr" rid="B3">3</xref>). Hence, SSL offers the opportunity to leverage the vast amounts of unlabeled medical data that are acquired in clinical routine to boost classification performance in a diagnostic setting without the need for extensive fully labeled data sets. Nevertheless, SSL rests on critical assumptions, and models have to be conceptualized and developed with diligence in order to actually provide a performance boost compared to SL models.</p>
<p>In this review, we aim to provide medical professionals with an outline of the key concepts of SSL and of how to apply it to medical data, with a focus on oncology. First, we introduce the main functionalities of SSL and distinguish it from SL and UL. Subsequently, we provide an overview of SSL techniques applied to cancer diagnostics and care, differentiating between image-based and non-image-based use-cases. Finally, we discuss pitfalls in SSL research design for medical applications and provide an outlook on future prospects.</p>
</sec>
<sec id="s2">
<title>What is semi-supervised learning?</title>
<p>SL, SSL and UL are delineated by the labeling process, i.e. by whether labeled data are used at all and, if so, how they are processed. Labeling refers to the process of attaching meaningful information for classification to raw data. One way to do this is to have experts, e.g. medical doctors, evaluate the raw data, e.g. medical images (<xref ref-type="bibr" rid="B4">4</xref>). For example, whole-slide images (WSI) of tumor tissue can be labeled by pathologists, or chest CAT scans can be labeled by radiologists for potentially malignant lesions. Alternatively, in SSL, a limited number of labels can be used to iteratively self-train an algorithm that attaches labels to unlabeled raw data, and a classifier is subsequently trained on these self-labeled data (<xref ref-type="bibr" rid="B5">5</xref>). Conceptually, labeled data provide the basis for training SL algorithms (training stage) that are subsequently supposed to apply previously learned patterns to unseen data and assign correct labels (testing stage, <xref ref-type="fig" rid="f1">
<bold>Figure&#xa0;1A</bold>
</xref>) (<xref ref-type="bibr" rid="B6">6</xref>). UL, on the other hand, does not use any labeled data at all. In UL, unlabeled data are sorted according to inherent patterns that delineate different clusters (<xref ref-type="bibr" rid="B7">7</xref>), e.g. UL can identify patient clusters with co-occurring genetic variants (<xref ref-type="fig" rid="f1">
<bold>Figure&#xa0;1B</bold>
</xref>). SSL uses both labeled and unlabeled data: labeled data are used to train a classifier for a given use-case, and unlabeled data are added to provide additional information and thus boost classification performance (<xref ref-type="fig" rid="f1">
<bold>Figure&#xa0;1C</bold>
</xref>) (<xref ref-type="bibr" rid="B8">8</xref>). It is therefore advantageous when a large dataset is available for which only a limited number of labels can be obtained, e.g. due to time or cost constraints, as is usually the case for medical data.</p>
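<p>The self-training idea described above can be illustrated with a minimal sketch. It is not taken from any of the cited studies: the nearest-centroid classifier, the function names <monospace>centroids</monospace> and <monospace>self_train</monospace>, the confidence threshold <monospace>min_margin</monospace> and the toy two-feature data are all assumptions chosen for brevity. In each round, unlabeled points that the current classifier assigns with a sufficiently large distance margin receive a pseudo-label and join the training set; ambiguous points stay unlabeled.</p>
<preformat>
```python
import math

def centroids(points, labels):
    """Mean position of each class among the currently labeled points."""
    sums = {}
    for (x, y), lab in zip(points, labels):
        sx, sy, n = sums.get(lab, (0.0, 0.0, 0))
        sums[lab] = (sx + x, sy + y, n + 1)
    return {lab: (sx / n, sy / n) for lab, (sx, sy, n) in sums.items()}

def self_train(labeled, labels, unlabeled, min_margin=1.0, max_rounds=10):
    """Iteratively pseudo-label unlabeled points the classifier is confident about.

    Assumes at least two classes. min_margin is a hypothetical confidence
    threshold: the gap between the distances to the two closest centroids.
    """
    labeled, labels, pool = list(labeled), list(labels), list(unlabeled)
    for _ in range(max_rounds):
        cents = centroids(labeled, labels)
        remaining, added = [], 0
        for p in pool:
            # Rank classes by distance from p to each class centroid.
            ranked = sorted((math.dist(p, c), lab) for lab, c in cents.items())
            margin = ranked[1][0] - ranked[0][0]
            if margin >= min_margin:
                labeled.append(p)            # accept the pseudo-label
                labels.append(ranked[0][1])
                added += 1
            else:
                remaining.append(p)          # too ambiguous, keep unlabeled
        pool = remaining
        if added == 0:
            break                            # nothing new learned, stop
    return labeled, labels, pool

# Toy data (assumed): two expert labels and five unlabeled samples.
seed_points = [(0.0, 0.0), (10.0, 10.0)]
seed_labels = ["benign", "malignant"]
unlabeled_points = [(1.0, 1.0), (2.0, 1.5), (9.0, 9.5), (8.0, 8.5), (5.0, 5.0)]
all_points, all_labels, still_unlabeled = self_train(
    seed_points, seed_labels, unlabeled_points)
```
</preformat>
<p>In this toy example, the four samples close to one of the two labeled points receive pseudo-labels, whereas the sample lying halfway between the classes remains unlabeled. In practice, the threshold determines how many wrong pseudo-labels are fed back into training and would have to be tuned for the use-case at hand.</p>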
<fig id="f1" position="float">
<label>Figure&#xa0;1</label>
<caption>
<p>Inputs and outputs of supervised, unsupervised and semi-supervised learning. In supervised learning <bold>(A)</bold>, all data are labeled. Labels are used to train a classifier that maps learned labels to previously unseen data. Unsupervised learning <bold>(B)</bold> does not use labels. Data are clustered into groups based on inherent patterns. Semi-supervised learning <bold>(C)</bold> uses both labeled and unlabeled data. Labels are used to train a classifier, which is augmented by unlabeled data of the same distribution to derive additional information in order to boost performance.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fonc-12-960984-g001.tif"/>
</fig>
<p>While the addition of unlabeled data can be advantageous, it can also cause model performance to stagnate or even degrade if crucial assumptions of SSL design are not met (<xref ref-type="bibr" rid="B9">9</xref>). For SSL models to work robustly, the unlabeled data must contain information that is relevant for label prediction. Therefore, it is crucial that both labeled and unlabeled data follow the same distribution (<xref ref-type="bibr" rid="B10">10</xref>). For example, if a classifier is trained on labeled histopathological images of colorectal cancer, the unlabeled data should ideally encompass the same tumor entity, the same staining procedure and the same magnification. Under this condition, the algorithm can assume that two samples that are close to each other at the input level (according to their features) should also be close to each other at the output level, i.e. should receive the same label (smoothness assumption) (<xref ref-type="bibr" rid="B8">8</xref>). If these high-dimensional data points at the input level are mapped to a lower-dimensional Euclidean space, they usually cluster along low-dimensional structures, so-called manifolds. Data points that lie on the same manifold should therefore be of the same class (manifold assumption) (<xref ref-type="bibr" rid="B8">8</xref>). If both previous assumptions hold, i.e. inputs with similar feature vectors are close to each other in an <italic>n</italic>-dimensional feature space and are located on the same manifold if mapped to a lower-dimensional space, the decision boundary of a classifier should lie in an area of low density, i.e. where data points are separated and belong to different classes (low-density assumption) (<xref ref-type="bibr" rid="B8">8</xref>). Thus, the inclusion of unlabeled data (as long as it is from the same distribution as the labeled data) can improve the placement of the decision boundary and therefore boost classification performance (<xref ref-type="fig" rid="f2">
<bold>Figure&#xa0;2</bold>
</xref>).</p>
<fig id="f2" position="float">
<label>Figure&#xa0;2</label>
<caption>
<p>How does unlabeled data boost classification performance? Consider a number of features <italic>n</italic> at the input level, which corresponds to an <italic>n</italic>-dimensional feature space. In such an <italic>n</italic>-dimensional coordinate system, every input is located according to its feature vector given by its <italic>n</italic> features and can thus be related to other inputs by similarities and differences, which are represented by the proximity or distance of points in the feature space. For clarity, we only consider two features (x, y) in a two-dimensional feature space. When labeled data is sparse <bold>(A)</bold>, as is often the case in medical data sets, the decision boundary of a classifier is less constrained. This may lead to inaccuracies and poor generalization on external data. If many labels are given <bold>(B)</bold>, the decision boundary is more constrained, yielding a more accurate classifier that can potentially generalize better. However, manual labeling of such large data sets is often time- and cost-intensive. Unlabeled data is often available in abundance <bold>(C)</bold> and can be used to constrain the decision boundary of a classifier in the way a large labeled data set would, but without the need for excessive labeling. The decision boundary then lies in an area with low density. Nevertheless, as can be derived from <bold>(B)</bold> and <bold>(C)</bold>, the performance gap between supervised and semi-supervised learning shrinks as the amount of labeled data grows if no further unlabeled samples are provided.</p>
</caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fonc-12-960984-g002.tif"/>
</fig>
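<p>The effect of unlabeled data on the decision boundary can be made concrete with a deliberately simple one-dimensional sketch. The functions <monospace>supervised_boundary</monospace> and <monospace>low_density_boundary</monospace> and the toy values are assumptions for illustration only. With a single labeled sample per class, a purely supervised rule can do little better than the midpoint between both samples, whereas the low-density assumption suggests placing the boundary in the widest empty gap between them.</p>
<preformat>
```python
def supervised_boundary(x_neg, x_pos):
    """With one labeled point per class, place the boundary at their midpoint."""
    return (x_neg + x_pos) / 2.0

def low_density_boundary(x_neg, x_pos, unlabeled):
    """Place the boundary in the widest empty gap between the labeled points.

    Assumes x_pos > x_neg and a single feature; a crude stand-in for
    low-density separation, not a production algorithm.
    """
    inside = sorted(x for x in unlabeled if x_pos > x > x_neg)
    pts = [x_neg] + inside + [x_pos]
    # Each gap is stored as (width, center) so that max() picks the widest.
    gaps = [(b - a, (a + b) / 2.0) for a, b in zip(pts, pts[1:])]
    width, center = max(gaps)
    return center

# Toy data (assumed): one dense group up to 4.5, another from 7.0 upwards.
unlabeled = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 7.0, 8.0, 8.5, 9.0]
sl_cut = supervised_boundary(0.5, 7.5)
ssl_cut = low_density_boundary(0.5, 7.5, unlabeled)
```
</preformat>
<p>Here the supervised midpoint (4.0) cuts through the dense group of samples around the first labeled point, while the gap-based boundary (5.75) falls into the sparsely populated region between both groups. If the unlabeled samples came from a different distribution than the labeled ones, the widest gap could lie in the wrong place, which is why the shared-distribution requirement matters.</p>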
<p>As is the case for most machine learning applications, there is no &#x2018;one-size-fits-all&#x2019; approach, and different methods and algorithms have to be evaluated for any given use-case. Model selection in SSL is further complicated by a non-standardized taxonomy of methodologies, which makes it harder to reproduce techniques proposed in the literature. Van Engelen et&#xa0;al. (<xref ref-type="bibr" rid="B3">3</xref>) recently proposed a taxonomy based on the distinction between inductive and transductive methods. The former encompass methods such as clustering with subsequent label assignment, pseudo-labeling or self- and co-training, i.e. methods that assign labels to unseen data and thus can potentially generalize. The latter include graph-based methods that transfer information along the connections of dataset-specific graphs, which include only the data points of a given sample and therefore cannot be generalized to data outside that sample (<xref ref-type="bibr" rid="B3">3</xref>). As the development of robust, generalizable algorithms is desirable for utilization in clinical practice, most techniques applied in cancer diagnostics should be developed as inductive methods.</p>
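<p>To illustrate why graph-based methods are transductive, the following sketch propagates two known labels along a k-nearest-neighbour graph. The function <monospace>propagate_labels</monospace>, its parameters and the toy coordinates are hypothetical and only meant to convey the principle: the result is a label for every node of this particular graph, not a reusable decision rule.</p>
<preformat>
```python
import math

def propagate_labels(points, seed_labels, k=2, n_iter=50):
    """Transductive label propagation on a k-nearest-neighbour graph.

    seed_labels maps a point index to +1.0 or -1.0 for the few labeled points;
    all other points start at 0.0 and repeatedly average their neighbours.
    """
    n = len(points)
    neighbours = []
    for i in range(n):
        # Link every node to its k closest other nodes.
        order = sorted((math.dist(points[i], points[j]), j)
                       for j in range(n) if j != i)
        neighbours.append([j for _, j in order[:k]])
    scores = [seed_labels.get(i, 0.0) for i in range(n)]
    for _ in range(n_iter):
        updated = []
        for i in range(n):
            if i in seed_labels:
                updated.append(seed_labels[i])   # clamp the known labels
            else:
                updated.append(sum(scores[j] for j in neighbours[i]) / k)
        scores = updated
    return [1 if s > 0 else -1 for s in scores]

# Toy data (assumed): two separated groups, one labeled sample in each.
points = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 0), (11, 0), (12, 0), (13, 0)]
predicted = propagate_labels(points, {0: 1.0, 7: -1.0})
```
</preformat>
<p>Each group inherits the label of its single labeled member. A sample that was not part of <monospace>points</monospace> cannot be classified without rebuilding the graph and rerunning the propagation, which is the practical limitation of transductive methods noted above.</p>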
</sec>
<sec id="s3">
<title>Studies on semi-supervised learning in cancer diagnostics</title>
<p>Research efforts in applying SSL for diagnostics and care in oncology can broadly be divided according to whether image or non-image data are used for model development. Naturally, image-based use-cases most frequently stem from the fields of histopathology, radiology and radiotherapy, while non-image-based applications most frequently include genetic data.</p>
<sec id="s3_1">
<title>Image-based semi-supervised learning for cancer detection</title>
<sec id="s3_1_1">
<title>Histopathology</title>
<p>In histopathology, classification tasks using computer vision are divided into patch-level and image-level diagnosis, i.e. whether areas with suspected malignancies should be distinguished from normal surrounding tissue or whether the sample as a whole should be labeled &#x2018;malignant&#x2019; if any sign of neoplastic tissue is present. Importantly for model building, patch-level classification requires image segmentation prior to classification, i.e. different areas of the sample have to be discriminated according to, e.g., shapes, patterns and colors. Using a multi-center dataset of &gt;13,000 colorectal cancer WSI, Yu et&#xa0;al. (<xref ref-type="bibr" rid="B11">11</xref>) developed a mean teacher model to detect malignant patches that achieves an area under the curve (AUC) comparable to that of a multi-pathologist benchmark. They report a substantial improvement of SSL over SL when only a limited number of labels is available, and also validate their model on lung cancer and lymph node samples, but add that with a fully labeled set (with well above 10,000 labels) no difference between SSL and SL was detected. Similarly, Shaw et&#xa0;al. (<xref ref-type="bibr" rid="B12">12</xref>) deploy a student-teacher chain model, an iterative process in which each trained student model becomes the teacher for the following student, which allows them to detect colorectal adenocarcinoma from WSI using only 0.5% labeled data. Wenger et&#xa0;al. (<xref ref-type="bibr" rid="B13">13</xref>) utilized consistency regularization and self-ensembling to detect and grade bladder cancer samples and report a 19% higher accuracy than baseline SL using only 3% labeled data. Jaiswal et&#xa0;al. (<xref ref-type="bibr" rid="B14">14</xref>) compared pre-trained models in detecting neoplastic infiltration of lymph node WSI and reported a high risk of overfitting after short training epochs, which was tackled using ensemble learning.
Addressing the challenge of variation within classes and similarities between classes, Su et&#xa0;al. (<xref ref-type="bibr" rid="B15">15</xref>) propose association cycle consistency loss and maximal conditional association to optimize the loss function, reporting improved performance over learning by association on breast cancer histopathological images. Comparing SL and SSL, Al Azzam et&#xa0;al. (<xref ref-type="bibr" rid="B16">16</xref>) report similar accuracies for SSL when using only half the number of labels needed for SL in breast cancer prediction from fine needle aspirates. To grade breast cancer samples, Das et&#xa0;al. (<xref ref-type="bibr" rid="B17">17</xref>) employ a Generative Adversarial Network (GAN) in which the discriminator uses an unsupervised model that is stacked over a supervised model with shared parameters to utilize both labeled and unlabeled samples. Kapil et&#xa0;al. (<xref ref-type="bibr" rid="B18">18</xref>) report an Auxiliary Classifier GAN for non-small cell lung cancer tissue needle aspirates that divides samples into malignant and benign patches, which allows for subsequent pixel-based PD-L1 scoring. Both Marini et&#xa0;al. (<xref ref-type="bibr" rid="B19">19</xref>) and Li et&#xa0;al. (<xref ref-type="bibr" rid="B20">20</xref>) address the challenge of Gleason scoring prostate cancer samples. The former use a teacher-student approach with different combinations of a pseudo-labeling teacher training a student model, utilizing both SSL and semi-weakly supervised learning, and compare these to a student-only baseline (<xref ref-type="bibr" rid="B19">19</xref>).
The latter use a pixel-based approach on prostate WSI with expectation maximization by a fully convolutional encoder-decoder net that incorporates both internally annotated and external weakly annotated image data, and compare it to a model trained on a fully labeled dataset alone (<xref ref-type="bibr" rid="B20">20</xref>). Both report performance improvements for the SSL methods using additional un- or weakly-labeled data. Lastly, to detect melanoma, Masood et&#xa0;al. (<xref ref-type="bibr" rid="B21">21</xref>) train deep belief networks in parallel to support vector machines that are supposed to counteract misclassified data with adjusted weights. They compare their model to several SL-based models and report superior performance for their SSL-based approach. <xref ref-type="table" rid="T1">
<bold>Table&#xa0;1</bold>
</xref> provides an overview of recent studies that use SSL in histopathology.</p>
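<p>Several of the studies above rely on teacher-student schemes. As a minimal sketch of the two ingredients of a mean teacher model in its general form (not the implementation of any cited study), the teacher weights can be kept as an exponential moving average of the student weights, and a consistency term can penalize disagreement between student and teacher predictions on the same unlabeled input. The function names, the smoothing factor <monospace>alpha</monospace> and the use of plain lists instead of network tensors are assumptions made for readability.</p>
<preformat>
```python
def ema_update(teacher, student, alpha=0.99):
    """Move each teacher weight a small step towards the student weight.

    alpha close to 1.0 means the teacher changes slowly and therefore
    averages over many past student states.
    """
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]

def consistency_loss(student_probs, teacher_probs):
    """Mean squared difference between student and teacher predictions.

    Needs no labels, so it can be computed on unlabeled samples.
    """
    diffs = [(s - t) ** 2 for s, t in zip(student_probs, teacher_probs)]
    return sum(diffs) / len(diffs)

# Toy values (assumed) standing in for network weights and class probabilities.
new_teacher = ema_update([0.0, 1.0], [1.0, 1.0], alpha=0.5)
penalty = consistency_loss([0.5, 0.5], [1.0, 0.0])
```
</preformat>
<p>In a full training loop, the consistency term would typically be added, with a weighting factor, to the supervised loss computed on the labeled samples, and the teacher would be updated after every optimization step of the student.</p>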
<table-wrap id="T1" position="float">
<label>Table&#xa0;1</label>
<caption>
<p>Overview of Studies on Semi-Supervised Learning in Histopathology.</p>
</caption>
<table frame="hsides">
<thead>
<tr>
<th valign="top" align="left">Authors and Reference</th>
<th valign="top" align="center">Entity</th>
<th valign="top" align="center">Objective</th>
<th valign="top" align="center">Technique</th>
<th valign="top" align="center">Publicly Available Code</th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Yu et&#xa0;al. (<xref ref-type="bibr" rid="B11">11</xref>)</td>
<td valign="top" align="center">colorectal and lung cancer as well as lymph nodes</td>
<td valign="top" align="center">detecting malignant patches in WSI</td>
<td valign="top" align="center">mean teacher</td>
<td valign="top" align="center">yes</td>
</tr>
<tr>
<td valign="top" align="left">Shaw et&#xa0;al. (<xref ref-type="bibr" rid="B12">12</xref>)</td>
<td valign="top" align="center">colorectal cancer</td>
<td valign="top" align="center">detecting malignant patches in WSI</td>
<td valign="top" align="center">student-teacher-chain</td>
<td valign="top" align="center">no</td>
</tr>
<tr>
<td valign="top" align="left">Wenger et&#xa0;al. (<xref ref-type="bibr" rid="B13">13</xref>)</td>
<td valign="top" align="center">bladder cancer</td>
<td valign="top" align="center">detection and grading</td>
<td valign="top" align="center">consistency regularization and self-ensembling</td>
<td valign="top" align="center">no</td>
</tr>
<tr>
<td valign="top" align="left">Jaiswal et&#xa0;al. (<xref ref-type="bibr" rid="B14">14</xref>)</td>
<td valign="top" align="center">metastasized tumors</td>
<td valign="top" align="center">detecting metastases in lymph node WSI</td>
<td valign="top" align="center">pseudo-labeling</td>
<td valign="top" align="center">no</td>
</tr>
<tr>
<td valign="top" align="left">Su et&#xa0;al. (<xref ref-type="bibr" rid="B15">15</xref>)</td>
<td valign="top" align="center">breast cancer</td>
<td valign="top" align="center">detecting malignant samples in WSI</td>
<td valign="top" align="center">combination of association cycle consistency loss and maximal conditional association loss</td>
<td valign="top" align="center">no</td>
</tr>
<tr>
<td valign="top" align="left">Das et&#xa0;al. (<xref ref-type="bibr" rid="B17">17</xref>)</td>
<td valign="top" align="center">breast cancer</td>
<td valign="top" align="center">grading samples</td>
<td valign="top" align="center">stacked semi-supervised GAN</td>
<td valign="top" align="center">no</td>
</tr>
<tr>
<td valign="top" align="left">Al Azzam et&#xa0;al. (<xref ref-type="bibr" rid="B16">16</xref>)</td>
<td valign="top" align="center">breast cancer</td>
<td valign="top" align="center">cancer detection from nuclei morphologies</td>
<td valign="top" align="center">comparison of 9 SL and SSL classifiers</td>
<td valign="top" align="center">no</td>
</tr>
<tr>
<td valign="top" align="left">Kapil et&#xa0;al. (<xref ref-type="bibr" rid="B18">18</xref>)</td>
<td valign="top" align="center">lung cancer</td>
<td valign="top" align="center">PD-L1 scoring</td>
<td valign="top" align="center">auxiliary classifier GAN and pixel-based quantification</td>
<td valign="top" align="center">no</td>
</tr>
<tr>
<td valign="top" align="left">Marini et&#xa0;al. (<xref ref-type="bibr" rid="B19">19</xref>)</td>
<td valign="top" align="center">prostate cancer</td>
<td valign="top" align="center">Gleason scoring</td>
<td valign="top" align="center">teacher-student chain and pseudo-labeling</td>
<td valign="top" align="center">yes</td>
</tr>
<tr>
<td valign="top" align="left">Li et&#xa0;al. (<xref ref-type="bibr" rid="B20">20</xref>)</td>
<td valign="top" align="center">prostate cancer</td>
<td valign="top" align="center">Gleason scoring</td>
<td valign="top" align="center">expectation maximization-based fully convolutional encoder-decoder network</td>
<td valign="top" align="center">no</td>
</tr>
<tr>
<td valign="top" align="left">Masood et&#xa0;al. (<xref ref-type="bibr" rid="B21">21</xref>)</td>
<td valign="top" align="center">melanoma</td>
<td valign="top" align="center">detecting malignant samples</td>
<td valign="top" align="center">Co-training of Deep Belief Network and advised SVM</td>
<td valign="top" align="center">no</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>GAN, generative adversarial networks; SL, supervised learning; SSL, semi-supervised learning; SVM, support vector machines; WSI, whole-slide images.</p>
</fn>
</table-wrap-foot>
</table-wrap>
</sec>
<sec id="s3_1_2">
<title>Radiology and radiotherapy</title>
<p>The detection of lung nodules in computer-assisted tomography (CAT) scans is a common theme in SSL-based research in radiology. Khosravan et&#xa0;al. (<xref ref-type="bibr" rid="B22">22</xref>) use a multi-tasking CNN to concomitantly learn nodule segmentation and false positive nodule reduction on chest CAT scans incorporating SSL to accommodate for unlabeled data in the segmentation process and report high accuracies compared to baseline. Xie et&#xa0;al. (<xref ref-type="bibr" rid="B23">23</xref>) address the task of differentiating between benign and malignant nodules using a semi-supervised adversarial model with an autoencoder unsupervised reconstruction net, learnable transition layers, and a supervised classification net and report high accuracies on a benchmark dataset for lung nodule classification. Using a similarity metric function to iteratively include unlabeled samples <italic>via</italic> SSL, Shi et&#xa0;al. (<xref ref-type="bibr" rid="B24">24</xref>) use a transfer learning approach with a pre-trained network that differentiates between nodules and nodule-like tissue to identify lung nodules and report high accuracies in their initial dataset, but acknowledge performance drops in an independent validation set. For breast cancer detection in mammogram images, both Sun et&#xa0;al. (<xref ref-type="bibr" rid="B25">25</xref>) and Azary et&#xa0;al. (<xref ref-type="bibr" rid="B26">26</xref>) use a co-training approach. In the former study, a three-step method of adjusting weights, selecting features and co-training-based labeling is proposed and a 7.4% performance gain for the combination of labeled and unlabeled data compared to labeled data only is reported (<xref ref-type="bibr" rid="B25">25</xref>). The latter study incorporates SSL in pixel-based tumor segmentation and proposes co-training with support vector machines and Bayesian classifiers (<xref ref-type="bibr" rid="B26">26</xref>). 
Using breast ultrasound images for tumor detection in a joint dataset of many weakly and few strongly annotated images, Shin et&#xa0;al. (<xref ref-type="bibr" rid="B27">27</xref>) propose a self-training method and report similar accuracies for only ten strongly annotated images joined by a large number of weakly annotated ones compared to 800 strongly annotated images only. Wodzinski et&#xa0;al. (<xref ref-type="bibr" rid="B28">28</xref>) aim to identify target volumes for postoperative tumor bed irradiation in breast cancer using a semi-supervised volume penalty <italic>via</italic> a multi-level encoder decoder architecture and report a decrease in target registration error and tumor volume ratio. For brain tumor detection, Ge et&#xa0;al. (<xref ref-type="bibr" rid="B29">29</xref>), Chen et&#xa0;al. (<xref ref-type="bibr" rid="B30">30</xref>), and Meier et&#xa0;al. (<xref ref-type="bibr" rid="B31">31</xref>) investigate brain magnetic resonance imaging (MRI) scans. Ge et&#xa0;al. (<xref ref-type="bibr" rid="B29">29</xref>) utilize a graph-based approach to create pseudo-labels and accommodate for moderate-sized data sets by generating additional images with GANs. They use their model for glioma grading and IDH-mutation status prediction (<xref ref-type="bibr" rid="B29">29</xref>). In a step-wise approach, Chen et&#xa0;al. (<xref ref-type="bibr" rid="B30">30</xref>) deploy a student-teacher-based model and extract hierarchical features using an adversarial network to detect lesions in brain MRI scans that correspond to either multiple sclerosis, ischemic stroke or tumor tissue. In a pre- and postoperative comparative setting, Meier et&#xa0;al. (<xref ref-type="bibr" rid="B31">31</xref>) investigate residual tumor tissue in brain MRI scans of ten high-grade glioma patients with semi-supervised decision forest and report improved performance and computation time compared to conventional segmentation methods. Lastly, Turk et&#xa0;al. 
(<xref ref-type="bibr" rid="B32">32</xref>) address thyroid cancer detection in ultrasound texture data, with linked clinical scoring systems as additional features, using an autoencoder-based model. By applying synthetic minority oversampling, they report a high sensitivity despite their imbalanced dataset. <xref ref-type="table" rid="T2">
<bold>Table&#xa0;2</bold>
</xref> provides an overview of studies using SSL in radiology or radiotherapy.</p>
<table-wrap id="T2" position="float">
<label>Table&#xa0;2</label>
<caption>
<p>Overview of Studies on Semi-Supervised Learning in Radiology and Radiotherapy.</p>
</caption>
<table frame="hsides">
<thead>
<tr>
<th valign="top" align="left">Authors and Reference</th>
<th valign="top" align="center">Entity</th>
<th valign="top" align="center">Objective</th>
<th valign="top" align="center">Technique</th>
<th valign="top" align="center">Publicly Available Code</th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Khosravan et&#xa0;al. (<xref ref-type="bibr" rid="B22">22</xref>)</td>
<td valign="top" align="center">lung cancer</td>
<td valign="top" align="center">detecting malignant nodules in chest CAT scans</td>
<td valign="top" align="center">SSL-based multi-task network</td>
<td valign="top" align="center">no</td>
</tr>
<tr>
<td valign="top" align="left">Xie et&#xa0;al. (<xref ref-type="bibr" rid="B23">23</xref>)</td>
<td valign="top" align="center">lung cancer</td>
<td valign="top" align="center">detecting malignant nodules in chest CAT scans</td>
<td valign="top" align="center">semi-supervised adversarial autoencoders, learnable transition layers, and supervised classification</td>
<td valign="top" align="center">no</td>
</tr>
<tr>
<td valign="top" align="left">Shi et&#xa0;al. (<xref ref-type="bibr" rid="B24">24</xref>)</td>
<td valign="top" align="center">lung cancer</td>
<td valign="top" align="center">detecting malignant nodules in chest CAT scans</td>
<td valign="top" align="center">transfer learning and semi-supervised feature matching</td>
<td valign="top" align="center">no</td>
</tr>
<tr>
<td valign="top" align="left">Sun et&#xa0;al. (<xref ref-type="bibr" rid="B25">25</xref>)</td>
<td valign="top" align="center">breast cancer</td>
<td valign="top" align="center">detecting breast cancer in mammogram images</td>
<td valign="top" align="center">co-training</td>
<td valign="top" align="center">no</td>
</tr>
<tr>
<td valign="top" align="left">Azary et&#xa0;al. (<xref ref-type="bibr" rid="B26">26</xref>)</td>
<td valign="top" align="center">breast cancer</td>
<td valign="top" align="center">detecting breast cancer in mammogram images</td>
<td valign="top" align="center">co-training</td>
<td valign="top" align="center">no</td>
</tr>
<tr>
<td valign="top" align="left">Shin et&#xa0;al. (<xref ref-type="bibr" rid="B27">27</xref>)</td>
<td valign="top" align="center">breast cancer</td>
<td valign="top" align="center">detecting breast cancer in ultrasound images</td>
<td valign="top" align="center">joint weakly- and strongly-supervised framework and self-training</td>
<td valign="top" align="center">yes</td>
</tr>
<tr>
<td valign="top" align="left">Wodzinski et&#xa0;al. (<xref ref-type="bibr" rid="B28">28</xref>)</td>
<td valign="top" align="center">breast cancer</td>
<td valign="top" align="center">identifying target volumes for radiotherapy</td>
<td valign="top" align="center">semi-supervised multilevel encoder-decoder</td>
<td valign="top" align="center">yes</td>
</tr>
<tr>
<td valign="top" align="left">Ge et&#xa0;al. (<xref ref-type="bibr" rid="B29">29</xref>)</td>
<td valign="top" align="center">brain tumor</td>
<td valign="top" align="center">glioma grading and IDH-mutation prediction in MRI scans</td>
<td valign="top" align="center">GAN-augmented networks in a graph-based framework</td>
<td valign="top" align="center">no</td>
</tr>
<tr>
<td valign="top" align="left">Chen et&#xa0;al. (<xref ref-type="bibr" rid="B30">30</xref>)</td>
<td valign="top" align="center">brain tumor, multiple sclerosis, ischemic stroke</td>
<td valign="top" align="center">detecting pathological samples in MRI scans</td>
<td valign="top" align="center">student-teacher chain combined with adversarial learning</td>
<td valign="top" align="center">yes</td>
</tr>
<tr>
<td valign="top" align="left">Meier et&#xa0;al. (<xref ref-type="bibr" rid="B31">31</xref>)</td>
<td valign="top" align="center">brain tumor</td>
<td valign="top" align="center">detecting residual tumor tissue in postoperative brain MRI</td>
<td valign="top" align="center">semi-supervised decision forest</td>
<td valign="top" align="center">no</td>
</tr>
<tr>
<td valign="top" align="left">Turk et&#xa0;al. (<xref ref-type="bibr" rid="B32">32</xref>)</td>
<td valign="top" align="center">thyroid cancer</td>
<td valign="top" align="center">detecting thyroid cancer from ultrasound textures and clinical scoring systems</td>
<td valign="top" align="center">autoencoders and synthetic minority oversampling</td>
<td valign="top" align="center">no</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn>
<p>CAT, computer-assisted tomography; GAN, generative adversarial network; IDH, isocitrate dehydrogenase; MRI, magnetic resonance imaging; SSL, semi-supervised learning.</p>
</fn>
</table-wrap-foot>
</table-wrap>
</sec>
</sec>
<sec id="s3_2">
<title>Non-image-based semi-supervised learning for cancer management</title>
<p>While the aforementioned image-based studies primarily focus on the detection of cancer, research efforts of SSL in oncology that do not use images predominantly address the tasks of estimating survival, predicting relapse and identifying genetic subtypes. Examining gene expression data from patients with breast, lung, gastric and liver cancer, Chai et&#xa0;al. (<xref ref-type="bibr" rid="B33">33</xref>) use a semi-supervised self-paced learning framework with Cox proportional hazard and accelerated failure time models to classify cancer patients and predict censored data, and report improved separation of survival curves for their model compared to baseline supervised models. Also using gene expression data but in the context of colorectal and breast cancer, Shi et&#xa0;al. (<xref ref-type="bibr" rid="B34">34</xref>) predict recurrence <italic>via</italic> low density separation. They report that SSL outperforms baseline SL classifiers and that its accuracy increases with the number of unlabeled samples. Addressing the same task in the same tumor entities, Park et&#xa0;al. (<xref ref-type="bibr" rid="B35">35</xref>) resort to a semi-supervised graph regularization algorithm to identify functionally similar gene pairs and thereby predict recurrence in breast and colorectal cancer gene expression data including labeled and unlabeled nodes. Hassanzadeh et&#xa0;al. (<xref ref-type="bibr" rid="B36">36</xref>) designed an ensemble model based on decision trees and boosting to predict survival for patients with kidney, ovarian, or pancreatic cancer for whom only incomplete clinical data was available and report improved accuracy for SSL compared to SL baselines. Cristovao et&#xa0;al. (<xref ref-type="bibr" rid="B37">37</xref>) compared SL and SSL in subtyping breast cancer using multi-omic data but did not find any performance improvement for SSL over baseline logistic regression. Also investigating multi-omics data, Ma et&#xa0;al. 
(<xref ref-type="bibr" rid="B38">38</xref>) developed Affinity Network Fusion to cluster patients based on their specific omics profiles into lung, kidney, uterus or adrenal gland cancer groups. The authors report a high predictive accuracy when training with less than one percent labeled data. Sherafat et&#xa0;al. (<xref ref-type="bibr" rid="B39">39</xref>) developed a positive-unlabeled learning model using automated machine learning (Auto-ML) to predict tumor-rejection mediating neoepitopes from exome sequencing data in ovarian cancer. The authors report improved performance over model-based classifiers for somatic variant calling and peptide identification. Both Camargo et&#xa0;al. (<xref ref-type="bibr" rid="B40">40</xref>) and Livieris et&#xa0;al. (<xref ref-type="bibr" rid="B41">41</xref>) propose novel active learning models that are tested on data from either acute myeloid leukemia, E. coli, and plant leaves, or breast and lung cancer, respectively. In both studies, the authors report higher accuracies for their respective models, root distance boundary sampling (<xref ref-type="bibr" rid="B40">40</xref>) and improved CST voting (<xref ref-type="bibr" rid="B41">41</xref>), compared to both SSL and SL classifiers. <xref ref-type="table" rid="T3">
<bold>Table&#xa0;3</bold>
</xref> summarizes non-image-based applications of SSL with relevance to cancer detection and management.</p>
<table-wrap id="T3" position="float">
<label>Table&#xa0;3</label>
<caption>
<p>Overview of Studies on Semi-Supervised Learning Using Non-Image-Based Data.</p>
</caption>
<table frame="hsides">
<thead>
<tr>
<th valign="top" align="left">Authors and Reference</th>
<th valign="top" align="center">Entity</th>
<th valign="top" align="center">Objective</th>
<th valign="top" align="center">Technique</th>
<th valign="top" align="center">Publicly Available Code</th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Chai et&#xa0;al. (<xref ref-type="bibr" rid="B33">33</xref>)</td>
<td valign="top" align="center">breast, lung, gastric and liver cancer</td>
<td valign="top" align="center">predicting survival</td>
<td valign="top" align="center">self-paced learning with Cox proportional hazard and accelerated failure time models</td>
<td valign="top" align="center">no</td>
</tr>
<tr>
<td valign="top" align="left">Shi et&#xa0;al. (<xref ref-type="bibr" rid="B34">34</xref>)</td>
<td valign="top" align="center">colorectal and breast cancer</td>
<td valign="top" align="center">predicting relapse</td>
<td valign="top" align="center">low density separation</td>
<td valign="top" align="center">no</td>
</tr>
<tr>
<td valign="top" align="left">Park et&#xa0;al. (<xref ref-type="bibr" rid="B35">35</xref>)</td>
<td valign="top" align="center">colorectal and breast cancer</td>
<td valign="top" align="center">predicting relapse</td>
<td valign="top" align="center">graph-based regularization</td>
<td valign="top" align="center">no</td>
</tr>
<tr>
<td valign="top" align="left">Hassanzadeh et&#xa0;al. (<xref ref-type="bibr" rid="B36">36</xref>)</td>
<td valign="top" align="center">kidney, ovarian and pancreatic cancer</td>
<td valign="top" align="center">predicting survival</td>
<td valign="top" align="center">ensemble learning with robust boost and decision trees</td>
<td valign="top" align="center">no</td>
</tr>
<tr>
<td valign="top" align="left">Cristovao et&#xa0;al. (<xref ref-type="bibr" rid="B37">37</xref>)</td>
<td valign="top" align="center">breast cancer</td>
<td valign="top" align="center">subtyping, model comparison</td>
<td valign="top" align="center">comparison of different SL and SSL algorithms</td>
<td valign="top" align="center">no</td>
</tr>
<tr>
<td valign="top" align="left">Ma et&#xa0;al. (<xref ref-type="bibr" rid="B38">38</xref>)</td>
<td valign="top" align="center">lung, kidney, uterus and adrenal gland cancer</td>
<td valign="top" align="center">predicting primary tumor site</td>
<td valign="top" align="center">Affinity Network Fusion</td>
<td valign="top" align="center">yes</td>
</tr>
<tr>
<td valign="top" align="left">Sherafat et&#xa0;al. (<xref ref-type="bibr" rid="B39">39</xref>)</td>
<td valign="top" align="center">ovarian cancer</td>
<td valign="top" align="center">predicting tumor-rejection mediating neoepitopes</td>
<td valign="top" align="center">Positive-unlabeled Learning using Auto-ML</td>
<td valign="top" align="center">no</td>
</tr>
<tr>
<td valign="top" align="left">Camargo et&#xa0;al. (<xref ref-type="bibr" rid="B40">40</xref>)</td>
<td valign="top" align="center">acute myeloid leukemia, E. coli, plant leaves</td>
<td valign="top" align="center">model comparison</td>
<td valign="top" align="center">root distance boundary sampling</td>
<td valign="top" align="center">yes</td>
</tr>
<tr>
<td valign="top" align="left">Livieris et&#xa0;al. (<xref ref-type="bibr" rid="B41">41</xref>)</td>
<td valign="top" align="center">breast and lung cancer</td>
<td valign="top" align="center">model comparison</td>
<td valign="top" align="center">self- and co-training with ensemble learning</td>
<td valign="top" align="center">no</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
</sec>
<sec id="s4" sec-type="discussion">
<title>Discussion</title>
<p>SSL represents a viable approach to the dilemma of big data in cancer medicine, especially in the context of image data, which is usually acquired in abundance during routine clinical work-ups but whose adequate labeling by medical experts is often time-consuming and thus costly. The main goal of SSL in this context is to achieve classification performances that surpass those of SL alone when labeled data is limited and, at the same time, abundant unlabeled data is available. Crucially, SSL models have to satisfy the above-mentioned assumptions: i) both labeled and unlabeled data have to be drawn from the same distribution, ii) similarity of data on the input level results in similarity of data at the output level (smoothness), iii) hence data points on the same low-dimensional structures (manifolds) receive the same labels and thus, iv) the decision boundary runs through an area of low density, i.e. where data points are separated and of different classes. Violating these key assumptions can lead not only to performance stagnation but also to degradation, as unlabeled data then acts as noise that blurs information abstraction of the classifier (<xref ref-type="bibr" rid="B42">42</xref>). Importantly, this is what delineates SSL from transfer learning, where a classifier is first trained on one use-case and subsequently transferred to another similar use-case where it is supposed to perform a similar task (<xref ref-type="bibr" rid="B43">43</xref>), e.g. a classifier trained to identify alteration A in immunohistochemistry on WSI in a supervised setting could potentially be transferred to also identify alteration B if staining is similar. Therefore, the most important question before conducting SSL experiments is whether labeled and unlabeled data are actually from the same distribution and, if so, whether including the unlabeled samples might lead to a performance gain over baseline SL.</p>
<p>Several of the above-mentioned studies reported substantial performance gains for SSL as long as the model was short on labeled data; when the amount of labeled data was increased or only labeled data was used, however, the gap between SSL and SL performance shrank. The frequent lack of a comparison between baseline SL and SSL classifiers further complicates the evaluation of such studies: only few studies report baseline comparisons (<xref ref-type="bibr" rid="B11">11</xref>, <xref ref-type="bibr" rid="B13">13</xref>, <xref ref-type="bibr" rid="B19">19</xref>, <xref ref-type="bibr" rid="B22">22</xref>, <xref ref-type="bibr" rid="B33">33</xref>, <xref ref-type="bibr" rid="B37">37</xref>), and even fewer report equal tuning of hyperparameters (<xref ref-type="bibr" rid="B11">11</xref>, <xref ref-type="bibr" rid="B19">19</xref>) for SSL and SL classifiers to make results comparable. When it comes to model design, it is essential to note that different algorithms may perform differently with regard to different tasks (<xref ref-type="bibr" rid="B9">9</xref>). While this sounds obvious, often only a single algorithm is reported, which may be due either to a lack of comparative testing or to publication bias, as only the successful algorithm is selected for a given manuscript. To evaluate suitable model designs for different tasks, we advocate for a full report on tested algorithms, ideally including a comparison between different SSL model set-ups, their SL baseline, adequate hyperparameter tuning for both SSL and SL, and the models&#x2019; individual performance in comparison. Further, varying the amount of labeled and unlabeled data for both training and testing sets seems warranted to find the equilibrium of optimal performance for different tasks in future studies of SSL in oncology. 
The lack of reproducibility in research on artificial intelligence in general (<xref ref-type="bibr" rid="B44">44</xref>) is also likely to be a future issue in biomedical use-cases of SSL as unfortunately only a minority of studies provide publicly accessible code to support their results (<xref ref-type="bibr" rid="B11">11</xref>, <xref ref-type="bibr" rid="B19">19</xref>, <xref ref-type="bibr" rid="B27">27</xref>, <xref ref-type="bibr" rid="B28">28</xref>, <xref ref-type="bibr" rid="B30">30</xref>, <xref ref-type="bibr" rid="B38">38</xref>, <xref ref-type="bibr" rid="B40">40</xref>). As is evident from previous studies on SSL in oncology, use cases mainly include tumor entities with high prevalence such as breast (<xref ref-type="bibr" rid="B15">15</xref>&#x2013;<xref ref-type="bibr" rid="B17">17</xref>, <xref ref-type="bibr" rid="B25">25</xref>&#x2013;<xref ref-type="bibr" rid="B28">28</xref>, <xref ref-type="bibr" rid="B33">33</xref>&#x2013;<xref ref-type="bibr" rid="B35">35</xref>, <xref ref-type="bibr" rid="B37">37</xref>, <xref ref-type="bibr" rid="B41">41</xref>), lung (<xref ref-type="bibr" rid="B18">18</xref>, <xref ref-type="bibr" rid="B22">22</xref>, <xref ref-type="bibr" rid="B23">23</xref>, <xref ref-type="bibr" rid="B33">33</xref>, <xref ref-type="bibr" rid="B34">34</xref>, <xref ref-type="bibr" rid="B38">38</xref>, <xref ref-type="bibr" rid="B41">41</xref>), and colorectal cancer (<xref ref-type="bibr" rid="B11">11</xref>, <xref ref-type="bibr" rid="B12">12</xref>, <xref ref-type="bibr" rid="B34">34</xref>, <xref ref-type="bibr" rid="B35">35</xref>) where single centers can amass sufficiently sized data sets to conduct SSL experiments. This is also reflected in the overwhelming absence of studies on SSL in hematology with only one single study (<xref ref-type="bibr" rid="B40">40</xref>) including any hematological neoplasm at all. Therefore, data-sharing is crucial in order to expand use-cases to rare tumor entities. 
Slight differences between centers in how training data is handled (e.g. differences in the imaging devices used and thus consequent differences in image format, shape, contrast, resolution and brightness) may also influence individual models. A model trained solely on single-center image data may therefore drop significantly in performance when applied to data from another source. Hence, pooling heterogeneous data from different sources for initial model training is useful in order to obtain classifiers that generalize widely beyond in-house use at single institutions. Not only may the crowd-sourcing of research in biomedical SSL vastly enlarge the pool of unlabeled (and possibly labeled) data, but it may also help identify and modify promising models for multi-center prospective validation. The lack of such validation is another key shortcoming of previous studies, which were often confined to single centers and retrospective evaluation. Thus, publicly available code, data-sharing for both labeled and unlabeled data and prospective collaborative research efforts will be key to evaluate models for future clinical applicability. Shared data and models may then also enable the evaluation of a variety of tumor entities in the same diagnostic modality, i.e. differential diagnosis of tumor entities in histopathological WSI.</p>
<p>This, however, leads to a frequent problem of artificial intelligence in general that is even more pronounced in the sensitive context of oncology, where diagnostic accuracy is essential to provide high-quality care to patients with life-threatening diseases: explainability of ML models. ML, and especially deep learning, has often been referred to as a &#x2018;black box&#x2019; (<xref ref-type="bibr" rid="B45">45</xref>), as the path of decision making within a model is hard to interpret. While this is already a key issue in SL, SSL adds to the confusion as information is also derived from unlabeled samples. The apparent lack of interpretability when it comes to clinical validation of model outputs stresses the urgent need to incorporate mechanisms of explainability into SSL models that make outputs or even intermediate steps such as label assignment on unlabeled samples traceable for clinical experts. The virtual lack thereof in previous studies signals a discrepancy between what is technologically possible and what is clinically acceptable for routine use, as &#x2018;black box&#x2019; models will likely be harder to integrate into routine clinical workflows due to a lack of acceptance in diagnostic specialties and ethical concerns in cancer management (<xref ref-type="bibr" rid="B46">46</xref>). Still, given the large unlabeled data sets that are often routinely acquired in cancer diagnostics, combined with the trend of a shrinking physician workforce that is occupied with complex tasks that have to be performed in increasingly shorter periods of time (<xref ref-type="bibr" rid="B1">1</xref>), SSL provides a low-cost and potentially high-benefit solution to develop clinically meaningful ML models for diagnostic tasks in oncology.</p>
</sec>
<sec id="s5">
<title>Conclusion</title>
<p>While SSL provides a possible solution to the vast discrepancy between available labeled and unlabeled data in cancer diagnostics, it should not be considered a silver bullet in the development of accurate classifiers for cancer detection. Adequate selection of labeled and unlabeled data of the same distribution as well as comparisons to baseline SL, among others, are crucial to build robust SSL models. While previous research efforts of SSL in oncology have mainly comprised retrospective single-center studies, future research is warranted in multi-center prospective model evaluation to design robust and explainable classifiers for implementation in the clinical routine of cancer diagnostics.</p>
</sec>
<sec id="s6" sec-type="author-contributions">
<title>Author contributions</title>
<p>J-NE performed the literature search and wrote the initial draft. All authors provided critical scientific insights, reviewed and edited the draft and approved its final version for submission. All authors agree to be accountable for the content of the work. All authors contributed to the article and approved the submitted version.</p>
</sec>
<sec id="s7" sec-type="funding-information">
<title>Funding</title>
<p>J-NE is grateful for a research scholarship from the Mildred-Scheel-Nachwuchszentrum Dresden (German Cancer Aid). The funder had no role in the design and conduct of the study, analysis, and interpretation of the data, preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.</p>
</sec>
<sec id="s8" sec-type="COI-statement">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec id="s9" sec-type="disclaimer">
<title>Publisher&#x2019;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
</body>
<back>
<ref-list>
<title>References</title>
<ref id="B1">
<label>1</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhang</surname> <given-names>X</given-names>
</name>
<name>
<surname>Lin</surname> <given-names>D</given-names>
</name>
<name>
<surname>Pforsich</surname> <given-names>H</given-names>
</name>
<name>
<surname>Lin</surname> <given-names>VW</given-names>
</name>
</person-group>. <article-title>Physician workforce in the United States of America: forecasting nationwide shortages</article-title>. <source>Hum Resour Health</source> (<year>2020</year>) <volume>18</volume>:<elocation-id>8</elocation-id>. doi:&#xa0;<pub-id pub-id-type="doi">10.1186/s12960-020-0448-3</pub-id>
</citation>
</ref>
<ref id="B2">
<label>2</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Metter</surname> <given-names>DM</given-names>
</name>
<name>
<surname>Colgan</surname> <given-names>TJ</given-names>
</name>
<name>
<surname>Leung</surname> <given-names>ST</given-names>
</name>
<name>
<surname>Timmons</surname> <given-names>CF</given-names>
</name>
<name>
<surname>Park</surname> <given-names>JY</given-names>
</name>
</person-group>. <article-title>Trends in the US and Canadian pathologist workforces from 2007 to 2017</article-title>. <source>JAMA Netw Open</source> (<year>2019</year>) <volume>2</volume>:<fpage>e194337</fpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1001/jamanetworkopen.2019.4337</pub-id>
</citation>
</ref>
<ref id="B3">
<label>3</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>van Engelen</surname> <given-names>JE</given-names>
</name>
<name>
<surname>Hoos</surname> <given-names>HH</given-names>
</name>
</person-group>. <article-title>A survey on semi-supervised learning</article-title>. <source>Mach Learn</source> (<year>2020</year>) <volume>109</volume>:<fpage>373</fpage>&#x2013;<lpage>440</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1007/s10994-019-05855-6</pub-id>
</citation>
</ref>
<ref id="B4">
<label>4</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Willemink</surname> <given-names>MJ</given-names>
</name>
<name>
<surname>Koszek</surname> <given-names>WA</given-names>
</name>
<name>
<surname>Hardell</surname> <given-names>C</given-names>
</name>
<name>
<surname>Wu</surname> <given-names>J</given-names>
</name>
<name>
<surname>Fleischmann</surname> <given-names>D</given-names>
</name>
<name>
<surname>Harvey</surname> <given-names>H</given-names>
</name>
<etal/>
</person-group>. <article-title>Preparing medical imaging data for machine learning</article-title>. <source>Radiology</source> (<year>2020</year>) <volume>295</volume>:<fpage>4</fpage>&#x2013;<lpage>15</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1148/radiol.2020192224</pub-id>
</citation>
</ref>
<ref id="B5">
<label>5</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Triguero</surname> <given-names>I</given-names>
</name>
<name>
<surname>Garc&#xed;a</surname> <given-names>S</given-names>
</name>
<name>
<surname>Herrera</surname> <given-names>F</given-names>
</name>
</person-group>. <article-title>Self-labeled techniques for semi-supervised learning: taxonomy, software and empirical study</article-title>. <source>Knowl Inf Syst</source> (<year>2015</year>) <volume>42</volume>:<page-range>245&#x2013;84</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1007/s10115-013-0706-y</pub-id>
</citation>
</ref>
<ref id="B6">
<label>6</label>
<citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname>Cunningham</surname> <given-names>P</given-names>
</name>
<name>
<surname>Cord</surname> <given-names>M</given-names>
</name>
<name>
<surname>Delany</surname> <given-names>SJ</given-names>
</name>
</person-group>. <article-title>Supervised learning</article-title>. In: <person-group person-group-type="editor">
<name>
<surname>Cord</surname> <given-names>M</given-names>
</name>
<name>
<surname>Cunningham</surname> <given-names>P</given-names>
</name>
</person-group>, editors. <source>Machine learning techniques for multimedia: case studies on organization and retrieval. Cognitive Technologies</source>. <publisher-loc>Berlin, Heidelberg</publisher-loc>: <publisher-name>Springer</publisher-name> (<year>2008</year>). p. <fpage>21</fpage>&#x2013;<lpage>49</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1007/978-3-540-75171-7_2</pub-id>
</citation>
</ref>
<ref id="B7">
<label>7</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Barlow</surname> <given-names>HB</given-names>
</name>
</person-group>. <article-title>Unsupervised learning</article-title>. <source>Neural Comput</source> (<year>1989</year>) <volume>1</volume>:<fpage>295</fpage>&#x2013;<lpage>311</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1162/neco.1989.1.3.295</pub-id>
</citation>
</ref>
<ref id="B8">
<label>8</label>
<citation citation-type="book">
<person-group person-group-type="editor">
<name>
<surname>Chapelle</surname> <given-names>O</given-names>
</name>
<name>
<surname>Sch&#xf6;lkopf</surname> <given-names>B</given-names>
</name>
<name>
<surname>Zien</surname> <given-names>A</given-names>
</name>
</person-group> eds. <source>Semi-supervised learning</source>. <publisher-loc>Cambridge, MA, USA</publisher-loc>: <publisher-name>MIT Press</publisher-name> (<year>2006</year>). <fpage>528 p</fpage>.</citation>
</ref>
<ref id="B9">
<label>9</label>
<citation citation-type="web">
<person-group person-group-type="author">
<name>
<surname>Oliver</surname> <given-names>A</given-names>
</name>
<name>
<surname>Odena</surname> <given-names>A</given-names>
</name>
<name>
<surname>Raffel</surname> <given-names>C</given-names>
</name>
<name>
<surname>Cubuk</surname> <given-names>ED</given-names>
</name>
<name>
<surname>Goodfellow</surname> <given-names>IJ</given-names>
</name>
</person-group>. <article-title>Realistic evaluation of deep semi-supervised learning algorithms</article-title> (<year>2019</year>) (Accessed <access-date>March 9, 2022</access-date>).</citation>
</ref>
<ref id="B10">
<label>10</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Zhu</surname> <given-names>X</given-names>
</name>
<name>
<surname>Goldberg</surname> <given-names>AB</given-names>
</name>
</person-group>. <article-title>Introduction to semi-supervised learning</article-title>. <source>Synthesis Lectures Artif Intell Mach Learn</source> (<year>2009</year>) <volume>3</volume>:<fpage>1</fpage>&#x2013;<lpage>130</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.2200/S00196ED1V01Y200906AIM006</pub-id>
</citation>
</ref>
<ref id="B11">
<label>11</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Yu</surname> <given-names>G</given-names>
</name>
<name>
<surname>Sun</surname> <given-names>K</given-names>
</name>
<name>
<surname>Xu</surname> <given-names>C</given-names>
</name>
<name>
<surname>Shi</surname> <given-names>X-H</given-names>
</name>
<name>
<surname>Wu</surname> <given-names>C</given-names>
</name>
<name>
<surname>Xie</surname> <given-names>T</given-names>
</name>
<etal/>
</person-group>. <article-title>Accurate recognition of colorectal cancer with semi-supervised deep learning on pathological images</article-title>. <source>Nat Commun</source> (<year>2021</year>) <volume>12</volume>:<fpage>6311</fpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1038/s41467-021-26643-8</pub-id>
</citation>
</ref>
<ref id="B12">
<label>12</label>
<citation citation-type="web">
<person-group person-group-type="author">
<name>
<surname>Shaw</surname> <given-names>S</given-names>
</name>
<name>
<surname>Pajak</surname> <given-names>M</given-names>
</name>
<name>
<surname>Lisowska</surname> <given-names>A</given-names>
</name>
<name>
<surname>Tsaftaris</surname> <given-names>SA</given-names>
</name>
<name>
<surname>O&#x2019;Neil</surname> <given-names>AQ</given-names>
</name>
</person-group>. <article-title>Teacher-student chain for efficient semi-supervised histology image classification</article-title> (Accessed <access-date>February 22, 2022</access-date>).</citation>
</ref>
<ref id="B13">
<label>13</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wenger</surname> <given-names>K</given-names>
</name>
<name>
<surname>Tirdad</surname> <given-names>K</given-names>
</name>
<name>
<surname>Dela Cruz</surname> <given-names>A</given-names>
</name>
<name>
<surname>Mari</surname> <given-names>A</given-names>
</name>
<name>
<surname>Basheer</surname> <given-names>M</given-names>
</name>
<name>
<surname>Kuk</surname> <given-names>C</given-names>
</name>
<etal/>
</person-group>. <article-title>A semi-supervised learning approach for bladder cancer grading</article-title>. <source>Mach Learn Appl</source> (<year>2022</year>) <volume>9</volume>:<elocation-id>100347</elocation-id>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.mlwa.2022.100347</pub-id>
</citation>
</ref>
<ref id="B14">
<label>14</label>
<citation citation-type="web">
<person-group person-group-type="author">
<name>
<surname>Jaiswal</surname> <given-names>AK</given-names>
</name>
<name>
<surname>Panshin</surname> <given-names>I</given-names>
</name>
<name>
<surname>Shulkin</surname> <given-names>D</given-names>
</name>
<name>
<surname>Aneja</surname> <given-names>N</given-names>
</name>
<name>
<surname>Abramov</surname> <given-names>S</given-names>
</name>
</person-group>. <article-title>Semi-supervised learning for cancer detection of lymph node metastases</article-title> (<year>2019</year>) (Accessed <access-date>February 22, 2022</access-date>).</citation>
</ref>
<ref id="B15">
<label>15</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Su</surname> <given-names>L</given-names>
</name>
<name>
<surname>Liu</surname> <given-names>Y</given-names>
</name>
<name>
<surname>Wang</surname> <given-names>M</given-names>
</name>
<name>
<surname>Li</surname> <given-names>A</given-names>
</name>
</person-group>. <article-title>Semi-HIC: A novel semi-supervised deep learning method for histopathological image classification</article-title>. <source>Comput Biol Med</source> (<year>2021</year>) <volume>137</volume>:<elocation-id>104788</elocation-id>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.compbiomed.2021.104788</pub-id>
</citation>
</ref>
<ref id="B16">
<label>16</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Al-Azzam</surname> <given-names>N</given-names>
</name>
<name>
<surname>Shatnawi</surname> <given-names>I</given-names>
</name>
</person-group>. <article-title>Comparing supervised and semi-supervised machine learning models on diagnosing breast cancer</article-title>. <source>Ann Med Surg</source> (<year>2021</year>) <volume>62</volume>:<fpage>53</fpage>&#x2013;<lpage>64</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.amsu.2020.12.043</pub-id>
</citation>
</ref>
<ref id="B17">
<label>17</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Das</surname> <given-names>A</given-names>
</name>
<name>
<surname>Mishra</surname> <given-names>S</given-names>
</name>
<name>
<surname>Mishra</surname> <given-names>DK</given-names>
</name>
<name>
<surname>Gopalan</surname> <given-names>SS</given-names>
</name>
</person-group>. <article-title>Machine learning to predict 5-year survival among pediatric acute myeloid leukemia patients and development of OSPAM-c online survival prediction tool</article-title>. <source>medRxiv</source> (<year>2020</year>) <volume>2020</volume>:<elocation-id>4</elocation-id>. doi:&#xa0;<pub-id pub-id-type="doi">10.1101/2020.04.16.20068221</pub-id>
</citation>
</ref>
<ref id="B18">
<label>18</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Kapil</surname> <given-names>A</given-names>
</name>
<name>
<surname>Meier</surname> <given-names>A</given-names>
</name>
<name>
<surname>Zuraw</surname> <given-names>A</given-names>
</name>
<name>
<surname>Steele</surname> <given-names>KE</given-names>
</name>
<name>
<surname>Rebelatto</surname> <given-names>MC</given-names>
</name>
<name>
<surname>Schmidt</surname> <given-names>G</given-names>
</name>
<etal/>
</person-group>. <article-title>Deep semi supervised generative learning for automated tumor proportion scoring on NSCLC tissue needle biopsies</article-title>. <source>Sci Rep</source> (<year>2018</year>) <volume>8</volume>:<fpage>17343</fpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1038/s41598-018-35501-5</pub-id>
</citation>
</ref>
<ref id="B19">
<label>19</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Marini</surname> <given-names>N</given-names>
</name>
<name>
<surname>Ot&#xe1;lora</surname> <given-names>S</given-names>
</name>
<name>
<surname>M&#xfc;ller</surname> <given-names>H</given-names>
</name>
<name>
<surname>Atzori</surname> <given-names>M</given-names>
</name>
</person-group>. <article-title>Semi-supervised training of deep convolutional neural networks with heterogeneous data and few local annotations: an experiment on prostate histopathology image classification</article-title>. <source>Med Image Anal</source> (<year>2021</year>) <volume>73</volume>:<elocation-id>102165</elocation-id>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.media.2021.102165</pub-id>
</citation>
</ref>
<ref id="B20">
<label>20</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Li</surname> <given-names>J</given-names>
</name>
<name>
<surname>Speier</surname> <given-names>W</given-names>
</name>
<name>
<surname>Ho</surname> <given-names>KC</given-names>
</name>
<name>
<surname>Sarma</surname> <given-names>KV</given-names>
</name>
<name>
<surname>Gertych</surname> <given-names>A</given-names>
</name>
<name>
<surname>Knudsen</surname> <given-names>BS</given-names>
</name>
<etal/>
</person-group>. <article-title>An EM-based semi-supervised deep learning approach for semantic segmentation of histopathological images from radical prostatectomies</article-title>. <source>Comput Med Imaging Graph</source> (<year>2018</year>) <volume>69</volume>:<page-range>125&#x2013;33</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.compmedimag.2018.08.003</pub-id>
</citation>
</ref>
<ref id="B21">
<label>21</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Masood</surname> <given-names>A</given-names>
</name>
<name>
<surname>Al-Jumaily</surname> <given-names>A</given-names>
</name>
</person-group>. <article-title>Semi-advised learning model for skin cancer diagnosis based on histopathalogical images</article-title>. <source>Annu Int Conf IEEE Eng Med Biol Soc</source> (<year>2016</year>) <volume>2016</volume>:<page-range>631&#x2013;4</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1109/EMBC.2016.7590781</pub-id>
</citation>
</ref>
<ref id="B22">
<label>22</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Khosravan</surname> <given-names>N</given-names>
</name>
<name>
<surname>Bagci</surname> <given-names>U</given-names>
</name>
</person-group>. <article-title>Semi-supervised multi-task learning for lung cancer diagnosis</article-title>. <source>Annu Int Conf IEEE Eng Med Biol Soc</source> (<year>2018</year>) <volume>2018</volume>:<page-range>710&#x2013;3</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1109/EMBC.2018.8512294</pub-id>
</citation>
</ref>
<ref id="B23">
<label>23</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Xie</surname> <given-names>Y</given-names>
</name>
<name>
<surname>Zhang</surname> <given-names>J</given-names>
</name>
<name>
<surname>Xia</surname> <given-names>Y</given-names>
</name>
</person-group>. <article-title>Semi-supervised adversarial model for benign-malignant lung nodule classification on chest CT</article-title>. <source>Med Image Anal</source> (<year>2019</year>) <volume>57</volume>:<page-range>237&#x2013;48</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.media.2019.07.004</pub-id>
</citation>
</ref>
<ref id="B24">
<label>24</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shi</surname> <given-names>F</given-names>
</name>
<name>
<surname>Chen</surname> <given-names>B</given-names>
</name>
<name>
<surname>Cao</surname> <given-names>Q</given-names>
</name>
<name>
<surname>Wei</surname> <given-names>Y</given-names>
</name>
<name>
<surname>Zhou</surname> <given-names>Q</given-names>
</name>
<name>
<surname>Zhang</surname> <given-names>R</given-names>
</name>
<etal/>
</person-group>. <article-title>Semi-supervised deep transfer learning for benign-malignant diagnosis of pulmonary nodules in chest CT images</article-title>. <source>IEEE Trans Med Imaging</source> (<year>2021</year>) <volume>41</volume>(<issue>4</issue>):<page-range>771&#x2013;81</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1109/TMI.2021.3123572</pub-id>
</citation>
</ref>
<ref id="B25">
<label>25</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sun</surname> <given-names>W</given-names>
</name>
<name>
<surname>Tseng</surname> <given-names>T-LB</given-names>
</name>
<name>
<surname>Zhang</surname> <given-names>J</given-names>
</name>
<name>
<surname>Qian</surname> <given-names>W</given-names>
</name>
</person-group>. <article-title>Computerized breast cancer analysis system using three stage semi-supervised learning method</article-title>. <source>Comput Methods Programs Biomed</source> (<year>2016</year>) <volume>135</volume>:<fpage>77</fpage>&#x2013;<lpage>88</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.cmpb.2016.07.017</pub-id>
</citation>
</ref>
<ref id="B26">
<label>26</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Azary</surname> <given-names>H</given-names>
</name>
<name>
<surname>Abdoos</surname> <given-names>M</given-names>
</name>
</person-group>. <article-title>A semi-supervised method for tumor segmentation in mammogram images</article-title>. <source>J Med Signals Sens</source> (<year>2020</year>) <volume>10</volume>:<page-range>12&#x2013;8</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.4103/jmss.JMSS_62_18</pub-id>
</citation>
</ref>
<ref id="B27">
<label>27</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shin</surname> <given-names>YS</given-names>
</name>
<name>
<surname>Lee</surname> <given-names>S</given-names>
</name>
<name>
<surname>Yun</surname> <given-names>IlD</given-names>
</name>
<name>
<surname>Kim</surname> <given-names>SM</given-names>
</name>
<name>
<surname>Lee</surname> <given-names>KM</given-names>
</name>
</person-group>. <article-title>Joint weakly and semi-supervised deep learning for localization and classification of masses in breast ultrasound images</article-title>. <source>IEEE Trans Med Imaging</source> (<year>2019</year>) <volume>38</volume>:<page-range>762&#x2013;74</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1109/TMI.2018.2872031</pub-id>
</citation>
</ref>
<ref id="B28">
<label>28</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Wodzinski</surname> <given-names>M</given-names>
</name>
<name>
<surname>Ciepiela</surname> <given-names>I</given-names>
</name>
<name>
<surname>Kuszewski</surname> <given-names>T</given-names>
</name>
<name>
<surname>Kedzierawski</surname> <given-names>P</given-names>
</name>
<name>
<surname>Skalski</surname> <given-names>A</given-names>
</name>
</person-group>. <article-title>Semi-supervised deep learning-based image registration method with volume penalty for real-time breast tumor bed localization</article-title>. <source>Sensors (Basel)</source> (<year>2021</year>) <volume>21</volume>:<elocation-id>4085</elocation-id>. doi:&#xa0;<pub-id pub-id-type="doi">10.3390/s21124085</pub-id>
</citation>
</ref>
<ref id="B29">
<label>29</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ge</surname> <given-names>C</given-names>
</name>
<name>
<surname>Gu</surname> <given-names>IY-H</given-names>
</name>
<name>
<surname>Jakola</surname> <given-names>AS</given-names>
</name>
<name>
<surname>Yang</surname> <given-names>J</given-names>
</name>
</person-group>. <article-title>Deep semi-supervised learning for brain tumor classification</article-title>. <source>BMC Med Imaging</source> (<year>2020</year>) <volume>20</volume>:<fpage>87</fpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1186/s12880-020-00485-0</pub-id>
</citation>
</ref>
<ref id="B30">
<label>30</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chen</surname> <given-names>G</given-names>
</name>
<name>
<surname>Ru</surname> <given-names>J</given-names>
</name>
<name>
<surname>Zhou</surname> <given-names>Y</given-names>
</name>
<name>
<surname>Rekik</surname> <given-names>I</given-names>
</name>
<name>
<surname>Pan</surname> <given-names>Z</given-names>
</name>
<name>
<surname>Liu</surname> <given-names>X</given-names>
</name>
<etal/>
</person-group>. <article-title>MTANS: Multi-scale mean teacher combined adversarial network with shape-aware embedding for semi-supervised brain lesion segmentation</article-title>. <source>Neuroimage</source> (<year>2021</year>) <volume>244</volume>:<elocation-id>118568</elocation-id>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.neuroimage.2021.118568</pub-id>
</citation>
</ref>
<ref id="B31">
<label>31</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Meier</surname> <given-names>R</given-names>
</name>
<name>
<surname>Bauer</surname> <given-names>S</given-names>
</name>
<name>
<surname>Slotboom</surname> <given-names>J</given-names>
</name>
<name>
<surname>Wiest</surname> <given-names>R</given-names>
</name>
<name>
<surname>Reyes</surname> <given-names>M</given-names>
</name>
</person-group>. <article-title>Patient-specific semi-supervised learning for postoperative brain tumor segmentation</article-title>. <source>Med Image Comput Comput Assist Interv</source> (<year>2014</year>) <volume>17</volume>:<page-range>714&#x2013;21</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1007/978-3-319-10404-1_89</pub-id>
</citation>
</ref>
<ref id="B32">
<label>32</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Turk</surname> <given-names>G</given-names>
</name>
<name>
<surname>Ozdemir</surname> <given-names>M</given-names>
</name>
<name>
<surname>Zeydan</surname> <given-names>R</given-names>
</name>
<name>
<surname>Turk</surname> <given-names>Y</given-names>
</name>
<name>
<surname>Bilgin</surname> <given-names>Z</given-names>
</name>
<name>
<surname>Zeydan</surname> <given-names>E</given-names>
</name>
</person-group>. <article-title>On the identification of thyroid nodules using semi-supervised deep learning</article-title>. <source>Int J Numer Method Biomed Eng</source> (<year>2021</year>) <volume>37</volume>:<fpage>e3433</fpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1002/cnm.3433</pub-id>
</citation>
</ref>
<ref id="B33">
<label>33</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Chai</surname> <given-names>H</given-names>
</name>
<name>
<surname>Li</surname> <given-names>Z</given-names>
</name>
<name>
<surname>Meng</surname> <given-names>D</given-names>
</name>
<name>
<surname>Xia</surname> <given-names>L</given-names>
</name>
<name>
<surname>Liang</surname> <given-names>Y</given-names>
</name>
</person-group>. <article-title>A new semi-supervised learning model combined with Cox and SP-AFT models in cancer survival analysis</article-title>. <source>Sci Rep</source> (<year>2017</year>) <volume>7</volume>:<fpage>13053</fpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1038/s41598-017-13133-5</pub-id>
</citation>
</ref>
<ref id="B34">
<label>34</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Shi</surname> <given-names>M</given-names>
</name>
<name>
<surname>Zhang</surname> <given-names>B</given-names>
</name>
</person-group>. <article-title>Semi-supervised learning improves gene expression-based prediction of cancer recurrence</article-title>. <source>Bioinformatics</source> (<year>2011</year>) <volume>27</volume>:<page-range>3017&#x2013;23</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1093/bioinformatics/btr502</pub-id>
</citation>
</ref>
<ref id="B35">
<label>35</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Park</surname> <given-names>C</given-names>
</name>
<name>
<surname>Ahn</surname> <given-names>J</given-names>
</name>
<name>
<surname>Kim</surname> <given-names>H</given-names>
</name>
<name>
<surname>Park</surname> <given-names>S</given-names>
</name>
</person-group>. <article-title>Integrative gene network construction to analyze cancer recurrence using semi-supervised learning</article-title>. <source>PLoS One</source> (<year>2014</year>) <volume>9</volume>:<fpage>e86309</fpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1371/journal.pone.0086309</pub-id>
</citation>
</ref>
<ref id="B36">
<label>36</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hassanzadeh</surname> <given-names>HR</given-names>
</name>
<name>
<surname>Phan</surname> <given-names>JH</given-names>
</name>
<name>
<surname>Wang</surname> <given-names>MD</given-names>
</name>
</person-group>. <article-title>A semi-supervised method for predicting cancer survival using incomplete clinical data</article-title>. <source>Annu Int Conf IEEE Eng Med Biol Soc</source> (<year>2015</year>) <volume>2015</volume>:<page-range>210&#x2013;3</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1109/EMBC.2015.7318337</pub-id>
</citation>
</ref>
<ref id="B37">
<label>37</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cristovao</surname> <given-names>F</given-names>
</name>
<name>
<surname>Cascianelli</surname> <given-names>S</given-names>
</name>
<name>
<surname>Canakoglu</surname> <given-names>A</given-names>
</name>
<name>
<surname>Carman</surname> <given-names>M</given-names>
</name>
<name>
<surname>Nanni</surname> <given-names>L</given-names>
</name>
<name>
<surname>Pinoli</surname> <given-names>P</given-names>
</name>
<etal/>
</person-group>. <article-title>Investigating deep learning based breast cancer subtyping using pan-cancer and multi-omic data</article-title>. <source>IEEE/ACM Trans Comput Biol Bioinform</source> (<year>2022</year>) <volume>19</volume>:<page-range>121&#x2013;34</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1109/TCBB.2020.3042309</pub-id>
</citation>
</ref>
<ref id="B38">
<label>38</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Ma</surname> <given-names>T</given-names>
</name>
<name>
<surname>Zhang</surname> <given-names>A</given-names>
</name>
</person-group>. <article-title>Affinity network fusion and semi-supervised learning for cancer patient clustering</article-title>. <source>Methods</source> (<year>2018</year>) <volume>145</volume>:<fpage>16</fpage>&#x2013;<lpage>24</lpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1016/j.ymeth.2018.05.020</pub-id>
</citation>
</ref>
<ref id="B39">
<label>39</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Sherafat</surname> <given-names>E</given-names>
</name>
<name>
<surname>Force</surname> <given-names>J</given-names>
</name>
<name>
<surname>M&#x103;ndoiu</surname> <given-names>II</given-names>
</name>
</person-group>. <article-title>Semi-supervised learning for somatic variant calling and peptide identification in personalized cancer immunotherapy</article-title>. <source>BMC Bioinf</source> (<year>2020</year>) <volume>21</volume>:<fpage>498</fpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1186/s12859-020-03813-x</pub-id>
</citation>
</ref>
<ref id="B40">
<label>40</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Camargo</surname> <given-names>G</given-names>
</name>
<name>
<surname>Bugatti</surname> <given-names>PH</given-names>
</name>
<name>
<surname>Saito</surname> <given-names>PTM</given-names>
</name>
</person-group>. <article-title>Active semi-supervised learning for biological data classification</article-title>. <source>PLoS One</source> (<year>2020</year>) <volume>15</volume>:<elocation-id>e0237428</elocation-id>. doi:&#xa0;<pub-id pub-id-type="doi">10.1371/journal.pone.0237428</pub-id>
</citation>
</ref>
<ref id="B41">
<label>41</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Livieris</surname> <given-names>I</given-names>
</name>
<name>
<surname>Pintelas</surname> <given-names>E</given-names>
</name>
<name>
<surname>Kanavos</surname> <given-names>A</given-names>
</name>
<name>
<surname>Pintelas</surname> <given-names>P</given-names>
</name>
</person-group>. <article-title>An improved self-labeled algorithm for cancer prediction</article-title>. <source>Adv Exp Med Biol</source> (<year>2020</year>) <volume>1194</volume>:<page-range>331&#x2013;42</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1007/978-3-030-32622-7_31</pub-id>
</citation>
</ref>
<ref id="B42">
<label>42</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Cozman</surname> <given-names>F</given-names>
</name>
<name>
<surname>Cohen</surname> <given-names>I</given-names>
</name>
</person-group>. <article-title>Risks of semi-supervised learning: How unlabeled data can degrade performance of generative classifiers</article-title>. <source>Semi-Supervised Learning</source>. <publisher-name>MIT Press</publisher-name> (<year>2006</year>):<page-range>57&#x2013;71</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.7551/mitpress/9780262033589.003.0004</pub-id>
</citation>
</ref>
<ref id="B43">
<label>43</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Weiss</surname> <given-names>K</given-names>
</name>
<name>
<surname>Khoshgoftaar</surname> <given-names>TM</given-names>
</name>
<name>
<surname>Wang</surname> <given-names>D</given-names>
</name>
</person-group>. <article-title>A survey of transfer learning</article-title>. <source>J Big Data</source> (<year>2016</year>) <volume>3</volume>:<fpage>9</fpage>. doi:&#xa0;<pub-id pub-id-type="doi">10.1186/s40537-016-0043-6</pub-id>
</citation>
</ref>
<ref id="B44">
<label>44</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Hutson</surname> <given-names>M</given-names>
</name>
</person-group>. <article-title>Artificial intelligence faces reproducibility crisis</article-title>. <source>Science</source> (<year>2018</year>) <volume>359</volume>(<issue>6377</issue>):<page-range>725&#x2013;6</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1126/science.359.6377.725</pub-id>
</citation>
</ref>
<ref id="B45">
<label>45</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Castelvecchi</surname> <given-names>D</given-names>
</name>
</person-group>. <article-title>Can we open the black box of AI?</article-title> <source>Nature</source> (<year>2016</year>) <volume>538</volume>:<page-range>20&#x2013;3</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1038/538020a</pub-id>
</citation>
</ref>
<ref id="B46">
<label>46</label>
<citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname>Grote</surname> <given-names>T</given-names>
</name>
<name>
<surname>Berens</surname> <given-names>P</given-names>
</name>
</person-group>. <article-title>On the ethics of algorithmic decision-making in healthcare</article-title>. <source>J Med Ethics</source> (<year>2020</year>) <volume>46</volume>:<page-range>205&#x2013;11</page-range>. doi:&#xa0;<pub-id pub-id-type="doi">10.1136/medethics-2019-105586</pub-id>
</citation>
</ref>
</ref-list>
</back>
</article>