<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Psychol.</journal-id>
<journal-title>Frontiers in Psychology</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Psychol.</abbrev-journal-title>
<issn pub-type="epub">1664-1078</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fpsyg.2021.765754</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Psychology</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>An MRI Study on Effects of Math Education on Brain Development Using Multi-Instance Contrastive Learning</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name><surname>Zhang</surname> <given-names>Yupei</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1310548/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Liu</surname> <given-names>Shuhui</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1212310/overview"/>
</contrib>
<contrib contrib-type="author" corresp="yes">
<name><surname>Shang</surname> <given-names>Xuequn</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/898708/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>School of Computer Science, Northwestern Polytechnical University</institution>, <addr-line>Xi&#x00027;an</addr-line>, <country>China</country></aff>
<aff id="aff2"><sup>2</sup><institution>Key Laboratory of Big Data Storage and Management, Ministry of Industry and Information Technology</institution>, <addr-line>Xi&#x00027;an</addr-line>, <country>China</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Zhongyu Wei, Fudan University, China</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Boran Zhou, Emory University, United States; Jianguo Chen, A*STAR Graduate Academy (A*STAR), Singapore; Zichao Want, Rice University, United States</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Xuequn Shang <email>shang&#x00040;nwpu.edu.cn</email></corresp>
<fn fn-type="other" id="fn001"><p>This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology</p></fn></author-notes>
<pub-date pub-type="epub">
<day>24</day>
<month>11</month>
<year>2021</year>
</pub-date>
<pub-date pub-type="collection">
<year>2021</year>
</pub-date>
<volume>12</volume>
<elocation-id>765754</elocation-id>
<history>
<date date-type="received">
<day>27</day>
<month>08</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>21</day>
<month>10</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2021 Zhang, Liu and Shang.</copyright-statement>
<copyright-year>2021</copyright-year>
<copyright-holder>Zhang, Liu and Shang</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license> </permissions>
<abstract><p>This paper explores whether mathematical education has effects on brain development from the perspective of brain MRIs. While biochemical changes in the left middle frontal gyrus region of the brain have been investigated, we proposed to classify students by using MRIs from the intraparietal sulcus (IPS) region that was left untouched in the previous study. On the cropped IPS regions, the proposed model extended the popular contrastive learning (CL) framework to solve the problem of multi-instance representation learning. The resulting data representations were then fed into a linear neural network to identify whether students were in the math group or the non-math group. Experiments were conducted on 123 adolescent students, including 72 math students and 51 non-math students. The proposed model achieved an accuracy of 90.24% for student classification, an improvement of more than 5% over the classical CL framework. Our study provides not only a multi-instance extension to CL but also an MRI insight into the impact of mathematical studying on brain development.</p></abstract>
<kwd-group>
<kwd>educational cognitive</kwd>
<kwd>MRI</kwd>
<kwd>mathematical learning</kwd>
<kwd>multi-instance learning</kwd>
<kwd>contrastive learning</kwd>
<kwd>brain development</kwd>
</kwd-group>
<contract-num rid="cn001">61802313</contract-num>
<contract-num rid="cn001">U1811262</contract-num>
<contract-sponsor id="cn001">National Natural Science Foundation of China<named-content content-type="fundref-id">10.13039/501100001809</named-content></contract-sponsor>
<counts>
<fig-count count="6"/>
<table-count count="2"/>
<equation-count count="10"/>
<ref-count count="42"/>
<page-count count="9"/>
<word-count count="5720"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>1. Introduction</title>
<p>Mathematical learning has significant impacts on the brain&#x00027;s plasticity and cognitive functions and has been associated with many quality-of-life and development indices (Beddington et al., <xref ref-type="bibr" rid="B6">2008</xref>; Zacharopoulos et al., <xref ref-type="bibr" rid="B33">2021</xref>). Understanding these associations could help in utilizing mathematical learning to benefit the individual&#x00027;s development (Baglama et al., <xref ref-type="bibr" rid="B3">2017</xref>; Steffe, <xref ref-type="bibr" rid="B29">2017</xref>; Zacharopoulos et al., <xref ref-type="bibr" rid="B33">2021</xref>). Toward a better understanding of education behaviors, researchers have devoted considerable effort and produced a wide range of educational discoveries and tools, from psychological measurements to artificial intelligence (AI) techniques (Steffe, <xref ref-type="bibr" rid="B29">2017</xref>; Barzagar Nazari and Ebersbach, <xref ref-type="bibr" rid="B5">2018</xref>; Mammarella et al., <xref ref-type="bibr" rid="B19">2018</xref>; Zhang et al., <xref ref-type="bibr" rid="B35">2020a</xref>, <xref ref-type="bibr" rid="B34">2021a</xref>; Peng et al., <xref ref-type="bibr" rid="B22">2021a</xref>,<xref ref-type="bibr" rid="B23">b</xref>).</p>
<p>This paper reviewed related works for Educational Information Science and Engineering (EISE) from four aspects, i.e., psychological measurement (Mammarella et al., <xref ref-type="bibr" rid="B19">2018</xref>), biological analysis (Zacharopoulos et al., <xref ref-type="bibr" rid="B33">2021</xref>), educational computer engineering (Robertson and Howells, <xref ref-type="bibr" rid="B25">2008</xref>), and educational data science (Zhang et al., <xref ref-type="bibr" rid="B35">2020a</xref>, <xref ref-type="bibr" rid="B34">2021a</xref>). Psychological measurement aims to quantify education behaviors and understand the learning process from sociality and mentality by using statistical and cognitive models, e.g., item response theory (IRT) (Zhang et al., <xref ref-type="bibr" rid="B36">2019</xref>, <xref ref-type="bibr" rid="B35">2020a</xref>). Leslie reviewed the studies from 1901 to the present and argued that the mathematics curricula should be constructed following children&#x00027;s psychology (Steffe, <xref ref-type="bibr" rid="B29">2017</xref>). Yupei et al. extended the classical psychological IRT model by seeking latent factors in response records to predict student responses to exam questions (Zhang et al., <xref ref-type="bibr" rid="B36">2019</xref>). Robert et al. explored the nature of the relations among prior information to show the effectiveness of the social cognitive theory (Lent et al., <xref ref-type="bibr" rid="B17">1993</xref>). While psychology explores learning behaviors from phenotypes, biological analysis is used to extract the intrinsic impact of education on individuals from brain structure or genotypes (Liu et al., <xref ref-type="bibr" rid="B18">2021</xref>; Peng et al., <xref ref-type="bibr" rid="B24">2021c</xref>; Zacharopoulos et al., <xref ref-type="bibr" rid="B33">2021</xref>). By investigating the numerical cognition in the brain, Korbinian et al. 
determined that numerical cognition is subserved by a frontoparietal network that connects the cortex, basal ganglia, and thalamus (Moeller et al., <xref ref-type="bibr" rid="B20">2015</xref>). Annie et al. explored the association between neural changes and behaviors, suggesting teachers could help students remedy misconceptions (Brookman-Byrne and Dumontheil, <xref ref-type="bibr" rid="B8">2020</xref>). Brian et al. reviewed specific learning disabilities to understand the complex etiology and co-occurrences, and accordingly underpin the optimization of learning contexts for individual learners (Butterworth and Kovas, <xref ref-type="bibr" rid="B9">2013</xref>). Based on the understanding of learning behaviors, computer engineering is introduced to create automatic tools or intelligent games to aid student learning and instructor teaching (Ng and Chan, <xref ref-type="bibr" rid="B21">2019</xref>; Alur et al., <xref ref-type="bibr" rid="B1">2020</xref>). Oi-Lam et al. examined students&#x00027; mathematics learning with computer-aided learning software and found that the students used 3D CAD to develop spatial skills and to achieve mathematics learning far beyond using formulas and performing procedures (Ng and Chan, <xref ref-type="bibr" rid="B21">2019</xref>). Christos et al. showed mobile game-based learning could further assist students in higher education toward advancing their knowledge level (Troussas et al., <xref ref-type="bibr" rid="B30">2020</xref>). Alberto built a multi-view early warning system with genetic-programming classification rules and the multi-view learning strategy to enhance the prediction (Cano and Leonard, <xref ref-type="bibr" rid="B10">2019</xref>). In this era of big data, educational data science creates a new path toward educational understanding and increasingly becomes a hopeful prospect for education revolution (Bienkowski et al., <xref ref-type="bibr" rid="B7">2012</xref>). 
With a sparsity learning model (Zhang and Liu, <xref ref-type="bibr" rid="B39">2020</xref>), Yupei et al. proposed a meta-knowledge dictionary learning model that learnt the latent meta-knowledge instead of the traditional manual Q-matrix (Zhang et al., <xref ref-type="bibr" rid="B35">2020a</xref>). They also used the technique of matrix factorization, integrating the side information of students and courses to predict the learning performance on the next-term course (Zhang et al., <xref ref-type="bibr" rid="B42">2020c</xref>). Through assessing the relations between controlling and autonomy-supportive teaching behaviors on 672 students, Nuria et al. showed that controlling teaching behaviors are negatively associated with psychological needs satisfaction and positively associated with procrastination (Codina et al., <xref ref-type="bibr" rid="B13">2018</xref>). More works in educational data science can be referred to in Cristobal&#x00027;s recent review (Romero and Ventura, <xref ref-type="bibr" rid="B27">2020</xref>). Nevertheless, data science needs to consider a wider range of data types in education research.</p>
<p>In recent years, the impact of mathematical learning on brain development has attracted great attention, and neuroimaging is the commonly adopted technique (Kershner, <xref ref-type="bibr" rid="B15">2020</xref>; Zacharopoulos et al., <xref ref-type="bibr" rid="B33">2021</xref>). Mariano et al. discussed four specific cases in which neuroscience synergizes with other disciplines to serve education, ranging from very general physiological aspects of human learning to brain architectures, showing that neuroscience methods, tools, and theoretical frameworks have broadened our understanding of the mind in a way that is highly relevant to educational practices (Sigman et al., <xref ref-type="bibr" rid="B28">2014</xref>). Marie et al. used quantitative meta-analyses of fMRI studies to identify brain regions concordant among studies on number and calculation, yielding a topographical brain atlas of arithmetic (Arsalidou and Taylor, <xref ref-type="bibr" rid="B2">2011</xref>). Ching-Lin et al. reviewed the MRI neuroimaging approach in education studies and the kinds of learning themes investigated in MRI research and provided objective and empirical evidence to connect learning processes and outcomes with brain mechanisms (Wu et al., <xref ref-type="bibr" rid="B31">2021</xref>). Karin et al. used fMRIs to observe brain activation in mathematical calculation, revealing similar parietal and prefrontal activation patterns in children with developmental dyscalculia compared to controls for various conditions (Kucian et al., <xref ref-type="bibr" rid="B16">2006</xref>). To probe the impact of a lack of mathematical education on brain development, George et al. took more than 120 fMRIs from adolescent students who were allowed to stop studying math in the United Kingdom (Zacharopoulos et al., <xref ref-type="bibr" rid="B33">2021</xref>). 
By examining the neurotransmitter concentrations in the brain, they found that the &#x003B3;-aminobutyric acid (GABA) concentration in the middle frontal gyrus (MFG) is closely associated with mathematical learning and mathematical reasoning. This is evidence that the lack of math education has effects on brain plasticity and cognitive functions.</p>
<p>However, few studies investigated the effects of education on brain development from the perspective of structural neuroimages. Medical imaging is a technique for probing the intrinsic structure of the human body that is often utilized in disease diagnosis and therapy (Zhang et al., <xref ref-type="bibr" rid="B37">2020b</xref>, <xref ref-type="bibr" rid="B38">2021b</xref>). While the GABA in the MFG was investigated (Zacharopoulos et al., <xref ref-type="bibr" rid="B33">2021</xref>), in this paper we examined the math-learning impact on brain development from the intraparietal sulcus (IPS) region, which is also frequently reported in neuroimaging studies of arithmetic. This study assessed whether math students and non-math students could be separated by using brain MRIs. The method first cropped the voxel of interest (VOI), i.e., IPS, from the MRI and then fed all VOI image patches to our proposed multi-instance contrastive learning (MiCL) model, followed by a linear classifier for student identification. Our contributions can be summarized in two aspects: (1) We extended the classical CL model to the setting of multi-instance learning to solve our problem formulation. (2) We explored the impact of mathematical education from structural brain MRIs.</p></sec>
<sec sec-type="materials and methods" id="s2">
<title>2. Materials and Methods</title>
<p>This study aims to identify math and non-math students by using MRI data to understand the impact of math learning on brain structure in the IPS region. With this purpose, we designed the following workflow: (1) acquiring MRIs from adolescent students including math students and non-math students and cropping all images into the IPS region (Zacharopoulos et al., <xref ref-type="bibr" rid="B33">2021</xref>), (2) designing a classification tool by using CL for image representations and a linear classifier (Chen et al., <xref ref-type="bibr" rid="B12">2020</xref>; Xu et al., <xref ref-type="bibr" rid="B32">2021</xref>), and (3) evaluating the performance and experiment analyses on the student classification.</p>
<sec>
<title>2.1. The Used MRIs</title>
<p>The MRI data used (XNAT Project ID: PN21) were acquired from 16-year-old adolescents who chose to stop or continue math learning in the United Kingdom. Math education was controlled as a single variable to form a math group with 72 students who engaged in A-level math and a non-math group containing 51 students who were not engaged in A-level math. In total, 123 MRIs were acquired on a 3T Siemens MAGNETOM Prisma MRI System equipped with a 32-channel receive-only head coil at the Oxford Centre for Functional MRI of the Brain (FMRIB). With an MPRAGE sequence, the anatomical high-resolution T1-weighted MRI was acquired with 192 slices, where echo time TE = 3.97 ms, repetition time TR = 1,900 ms, and voxel size = 1 &#x000D7; 1 &#x000D7; 1 mm. The IPS regions of 20 &#x000D7; 20 &#x000D7; 20 mm were manually defined on the individual&#x00027;s T1-weighted images while the student was lying down in the MR scanner (Zacharopoulos et al., <xref ref-type="bibr" rid="B33">2021</xref>). Acquisition time was 10&#x02013;15 min per voxel, including planning and shimming. <xref ref-type="fig" rid="F1">Figure 1</xref> shows the T1-weighted MRIs used together with the left IPS region. In this study, we cropped the left IPS region from the T1-weighted MRIs, yielding 3D VOI image patches of 20 &#x000D7; 20 &#x000D7; 20 mm. To facilitate computation in deep learning, we normalized all voxels of the image patches by
<disp-formula id="E1"><label>(1)</label><mml:math id="M1"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mrow><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>m</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mi>m</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
where <italic>I</italic><sub><italic>ij</italic></sub> is an arbitrary voxel of the VOI image patches; <italic>I</italic><sub><italic>max</italic></sub> and <italic>I</italic><sub><italic>min</italic></sub> are the maximal and minimal values among all VOI image voxels, respectively. To train the model in a supervised scheme, we shuffled all image slices and took the student&#x00027;s label (i.e., class 1: non-math group, class 0: math group) as slice labels.</p>
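<p>The normalization of Equation (1) and the slice labelling described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: the function names <monospace>min_max_normalize</monospace> and <monospace>label_and_shuffle</monospace>, the nested-list data layout, the fixed random seed, and the toy 2 &#x000D7; 2 slices are all assumptions made for the example.</p>
<preformat>
```python
import random


def min_max_normalize(student_slices):
    """Scale every voxel with the global minimum and maximum, as in Equation (1).

    student_slices[s][k] is the k-th 2D slice (a list of rows) of student s.
    """
    voxels = [v for slices in student_slices for sl in slices for row in sl for v in row]
    i_min, i_max = min(voxels), max(voxels)
    span = i_max - i_min
    if span == 0:
        raise ValueError("all voxels are equal, so Equation (1) is undefined")
    return [
        [[[(v - i_min) / span for v in row] for row in sl] for sl in slices]
        for slices in student_slices
    ]


def label_and_shuffle(student_slices, student_labels, seed=0):
    """Give each slice the label of its student, then shuffle the (slice, label) pairs."""
    pairs = [
        (sl, label)
        for slices, label in zip(student_slices, student_labels)
        for sl in slices
    ]
    random.Random(seed).shuffle(pairs)  # hypothetical fixed seed for repeatability
    return pairs


# Toy data: two students with two 2 x 2 slices each (real patches are far larger).
raw = [
    [[[0.0, 50.0], [100.0, 150.0]], [[200.0, 250.0], [300.0, 350.0]]],
    [[[400.0, 300.0], [200.0, 100.0]], [[50.0, 0.0], [150.0, 400.0]]],
]
labels = [0, 1]  # class 0: math group, class 1: non-math group

normalized = min_max_normalize(raw)
pairs = label_and_shuffle(normalized, labels)
```
</preformat>
<p>Because the minimum and maximum are taken over all VOI voxels rather than per slice, relative intensity differences between students are preserved after scaling to the unit interval.</p>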
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>Positions of VOI in a representative T1-weighted MRI for IPS. Three cyan boxes show the IPS from sagittal, coronal, and axial views, respectively. <bold>(A)</bold> Sagittal slice, <bold>(B)</bold> coronal slice, and <bold>(C)</bold> axial slice.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpsyg-12-765754-g0001.tif"/>
</fig></sec>
<sec>
<title>2.2. Multi-Instance Contrastive Learning</title>
<p>The proposed multi-instance contrastive learning (MiCL) model aims to deal with the problem of student classification where each student involves 20 2D image slices. MiCL includes an input layer of 20 slices per student, a data transform layer for data augmentations, a hidden layer for slice representation learning, a feature layer for student representation learning, and a loss subspace layer for loss computation. <xref ref-type="fig" rid="F2">Figure 2</xref> shows the framework of the proposed MiCL.</p>
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>The proposed MiCL model. <italic>T</italic><sub>1</sub> and <italic>T</italic><sub>2</sub> are two data augmentation operators; <italic>F</italic><sub>1</sub> and <italic>F</italic><sub>2</sub> are the ResNets; and <italic>G</italic> is a multi-layer perceptron.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpsyg-12-765754-g0002.tif"/>
</fig>
<sec>
<title>2.2.1. Formulation</title>
<p>Let <bold>X</bold> &#x0003D; {<italic>X</italic><sub>1</sub>, <italic>X</italic><sub>2</sub>, &#x022EF;&#x02009;, <italic>X</italic><sub>20</sub>} represent student data consisting of 20 instances, where <italic>X</italic><sub><italic>i</italic></sub> represents an instance for an image slice. All students are denoted by <inline-formula><mml:math id="M2"><mml:mi mathvariant="-tex-caligraphic">D</mml:mi><mml:mo>=</mml:mo><mml:msubsup><mml:mrow><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>X</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>y</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula>, where <italic>N</italic> is the number of students, and <italic>y</italic><sub><italic>i</italic></sub> is the label of the <italic>i</italic>-th student. Note that <italic>y</italic><sub><italic>i</italic></sub> &#x0003D; 1 is for students that have stopped math education, while <italic>y</italic><sub><italic>i</italic></sub> &#x0003D; 0 is for students that have continued mathematical studying. The problem we will handle in this study is
<disp-formula id="E2"><label>(2)</label><mml:math id="M3"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>g</mml:mi><mml:mo class="qopname">min</mml:mo><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:mi mathvariant="-tex-caligraphic">Q</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi mathvariant="-tex-caligraphic">G</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mtext class="textit" mathvariant="italic">F</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>X</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>y</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
where <italic>F</italic> aims to extract the representations from 20 instances per student; <inline-formula><mml:math id="M14"><mml:mi mathvariant="-tex-caligraphic">G</mml:mi></mml:math></inline-formula> is a classifier that maps the representation of <bold>X</bold><sub><italic>i</italic></sub> to its label <italic>y</italic><sub><italic>i</italic></sub>; and <inline-formula><mml:math id="M15"><mml:mi mathvariant="-tex-caligraphic">Q</mml:mi></mml:math></inline-formula> is the loss function. In this formulation, the major problem is to learn student representations from all the 20 instances, i.e., the function <italic>F</italic>. A simple approach is to fuse the 20 instances into one student representation, which has been investigated in Dongkuan&#x00027;s work (Xu et al., <xref ref-type="bibr" rid="B32">2021</xref>). While their model focuses on time series data in a supervised setting, in this study we proposed a new unsupervised model to learn student representations in a multi-instance setting.</p></sec>
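<p>To make the composition in Equation (2) concrete, the sketch below evaluates the objective on toy data using the simple fusion baseline mentioned above. It is not the MiCL model: mean pooling for <italic>F</italic>, a sigmoid linear classifier for the classifier, binary cross-entropy for the loss, and all names, weights, and feature values are assumptions chosen only for illustration.</p>
<preformat>
```python
import math


def fuse_instances(instance_features):
    """F (assumed): average the per-slice feature vectors into one student vector."""
    n = len(instance_features)
    dim = len(instance_features[0])
    return [sum(f[d] for f in instance_features) / n for d in range(dim)]


def classify(representation, weights, bias):
    """G (assumed): a linear classifier with a sigmoid, returning P(y = 1)."""
    score = sum(w * r for w, r in zip(weights, representation)) + bias
    return 1.0 / (1.0 + math.exp(-score))


def cross_entropy(prob, label):
    """Q (assumed): binary cross-entropy for a single student."""
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(prob + eps) + (1 - label) * math.log(1.0 - prob + eps))


def objective(students, labels, weights, bias):
    """Sum of Q(G(F(X_i)), y_i) over all students, following Equation (2)."""
    return sum(
        cross_entropy(classify(fuse_instances(x), weights, bias), y)
        for x, y in zip(students, labels)
    )


# Two toy students, each a bag of 20 instances with 2-dimensional features.
students = [
    [[1.0, 0.0]] * 20,
    [[0.0, 1.0]] * 20,
]
labels = [1, 0]  # 1: stopped math education, 0: continued mathematical studying
total_loss = objective(students, labels, weights=[2.0, -2.0], bias=0.0)
```
</preformat>
<p>Minimizing this quantity over the parameters of the fusion and classification functions is the optimization that Equation (2) states; mean pooling is only one possible way of collapsing a bag of instances.</p>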
<sec>
<title>2.2.2. Contrastive Learning</title>
<p>Recently, contrastive learning (CL) has become a popular scheme for robust image representation learning and has been widely used in many fields, e.g., text classification (Gao et al., <xref ref-type="bibr" rid="B14">2021</xref>), image classification (Chen et al., <xref ref-type="bibr" rid="B12">2020</xref>), and medical image segmentation (Chaitanya et al., <xref ref-type="bibr" rid="B11">2020</xref>). CL learns latent image features by training a nonlinear model on two noisy versions of each data point so as to minimize the difference between them. SimCLR is a representative CL framework that trains a ResNet for image representations and a multiple-layer perceptron (MLP) for loss calculations (Chen et al., <xref ref-type="bibr" rid="B12">2020</xref>). Mathematically, SimCLR seeks an optimal solution to the following problem,
<disp-formula id="E3"><label>(3)</label><mml:math id="M4"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>g</mml:mi><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo class="qopname">min</mml:mo></mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">L</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="-tex-caligraphic">R</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn><mml:mi>N</mml:mi></mml:mrow></mml:mfrac><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:mi mathvariant="-tex-caligraphic">L</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi mathvariant="-tex-caligraphic">R</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="italic"><mml:mi>T</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>X</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mi mathvariant="-tex-caligraphic">R</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="italic"><mml:mi>T</mml:mi></mml:mstyle></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle 
mathvariant="bold"><mml:mtext>X</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
where <italic>T</italic><sub>1</sub> and <italic>T</italic><sub>2</sub> are the two data augmentation operations from the same family of augmentations; <inline-formula><mml:math id="M16"><mml:mi mathvariant="-tex-caligraphic">R</mml:mi></mml:math></inline-formula> is the classical ResNet for <italic>F</italic><sub>1</sub> and <italic>F</italic><sub>2</sub>. <inline-formula><mml:math id="M17"><mml:mi mathvariant="-tex-caligraphic">L</mml:mi></mml:math></inline-formula> is the contrastive loss, which is defined in detail as <inline-formula><mml:math id="M18"><mml:mi mathvariant="-tex-caligraphic">L</mml:mi></mml:math></inline-formula> &#x0003D; <italic>l</italic>(<bold>z</bold><sub><italic>i</italic></sub>, <bold>z</bold><sub><italic>j</italic></sub>) &#x0002B; <italic>l</italic>(<bold>z</bold><sub><italic>j</italic></sub>, <bold>z</bold><sub><italic>i</italic></sub>), where <bold>z</bold><sub><italic>i</italic></sub> and <bold>z</bold><sub><italic>j</italic></sub> are the results from <inline-formula><mml:math id="M19"><mml:mi mathvariant="-tex-caligraphic">R</mml:mi></mml:math></inline-formula>(<italic>T</italic><sub>1</sub>(&#x000B7;)) and <inline-formula><mml:math id="M20"><mml:mi mathvariant="-tex-caligraphic">R</mml:mi></mml:math></inline-formula>(<italic>T</italic><sub>2</sub>(&#x000B7;)), respectively. The loss function <italic>l</italic>(&#x000B7;) is
<disp-formula id="E4"><label>(4)</label><mml:math id="M5"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:mstyle mathvariant="italic"><mml:mi>l</mml:mi></mml:mstyle><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>z</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>z</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mo>-</mml:mo><mml:mi>l</mml:mi><mml:mi>o</mml:mi><mml:mi>g</mml:mi><mml:mfrac><mml:mrow><mml:mi>e</mml:mi><mml:mi>x</mml:mi><mml:mi>p</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mi>m</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>z</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>z</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>/</mml:mo><mml:mi>&#x003C4;</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mstyle displaystyle="true"><mml:msubsup><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn><mml:mi>N</mml:mi></mml:mrow></mml:msubsup></mml:mstyle><mml:msub><mml:mrow><mml:mstyle 
mathvariant="bold"><mml:mn>1</mml:mn></mml:mstyle></mml:mrow><mml:mrow><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>&#x02260;</mml:mo><mml:mi>i</mml:mi></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow></mml:msub><mml:mi>e</mml:mi><mml:mi>x</mml:mi><mml:mi>p</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mi>m</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>z</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>z</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>/</mml:mo><mml:mi>&#x003C4;</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
where &#x003C4; is a temperature parameter; <bold>1</bold> is an indicator function; and <inline-formula><mml:math id="M6"><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mi>m</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>z</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>z</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msubsup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>z</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msubsup><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>z</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>/</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mo>|</mml:mo><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>z</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mo>|</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>|</mml:mo><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>z</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mo>|</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> is the cosine similarity, as in SimCLR (Chen et al., <xref ref-type="bibr" rid="B12">2020</xref>).</p></sec>
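<p>As a rough illustration of Equation (4), the following minimal Python sketch computes the pairwise loss for a toy batch. It assumes that <italic>sim</italic> is the cosine similarity used in SimCLR; the function names, the toy vectors, and the temperature value of 0.5 are hypothetical choices made for illustration and are not taken from our implementation.</p>
<preformat>
```python
import math


def cosine_similarity(u, v):
    """Cosine similarity: dot product divided by the product of L2 norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pair_loss(z, i, j, tau=0.5):
    """Loss l(z_i, z_j) of Equation (4) for the positive pair (i, j).

    z is a list of 2N latent vectors and tau is the temperature.
    The default tau=0.5 is an illustrative assumption.
    """
    numerator = math.exp(cosine_similarity(z[i], z[j]) / tau)
    # The indicator 1[k != i] removes the self-similarity term from the sum.
    denominator = sum(
        math.exp(cosine_similarity(z[i], z[k]) / tau)
        for k in range(len(z))
        if k != i
    )
    return -math.log(numerator / denominator)


# Toy batch with N = 2: indices (0, 1) and (2, 3) play the role of positive pairs.
z = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_positive = pair_loss(z, 0, 1)   # similar vectors, small loss
loss_negative = pair_loss(z, 0, 2)   # dissimilar vectors, larger loss
```
</preformat>
<p>In this toy batch, the loss for the similar pair is smaller than the loss for the dissimilar pair, which is the behavior the contrastive objective rewards.</p>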
<sec>
<title>2.2.3. Objective Function</title>
<p>The objective function in Equation (3), however, cannot handle our multi-instance problem of student classification. We therefore extended SimCLR into MiCL as
<disp-formula id="E5"><label>(5)</label><mml:math id="M7"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>g</mml:mi><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo class="qopname">min</mml:mo></mml:mrow><mml:mrow><mml:mi mathvariant="-tex-caligraphic">L</mml:mi><mml:mo>,</mml:mo><mml:mi>G</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>F</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>F</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:munder></mml:mstyle><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn><mml:mi>N</mml:mi></mml:mrow></mml:mfrac><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:mi mathvariant="-tex-caligraphic">L</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mstyle mathvariant="italic"><mml:mi>G</mml:mi></mml:mstyle><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>z</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>&#x02295;</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>z</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>&#x02295;</mml:mo><mml:mo>&#x022EF;</mml:mo><mml:mo>&#x02295;</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>z</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mn>20</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mstyle mathvariant="italic"><mml:mi>G</mml:mi></mml:mstyle><mml:mrow><mml:mo 
stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>z</mml:mtext></mml:mstyle></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>&#x02295;</mml:mo><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>z</mml:mtext></mml:mstyle></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>&#x02295;</mml:mo><mml:mo>&#x022EF;</mml:mo><mml:mo>&#x02295;</mml:mo><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>z</mml:mtext></mml:mstyle></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>20</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
where &#x02295; is the concatenation operation; <bold>z</bold><sub><italic>i</italic></sub> and <inline-formula><mml:math id="M8"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>z</mml:mtext></mml:mstyle></mml:mrow><mml:mo>^</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> (<italic>i</italic> &#x0003D; 1, &#x022EF;&#x02009;, 20) are latent representations for the two transformed versions of an input image <italic>X</italic><sub><italic>i</italic></sub>, i.e., <bold>z</bold><sub><italic>i</italic></sub> &#x0003D; <italic>F</italic><sub>2</sub>(<italic>F</italic><sub>1</sub>(<italic>X</italic><sub><italic>i</italic></sub>)). As shown in <xref ref-type="fig" rid="F2">Figure 2</xref>, we implemented <italic>T</italic><sub>1</sub> and <italic>T</italic><sub>2</sub> with random cropping and resizing, Gaussian blur, translation, and distortions; <italic>F</italic><sub>1</sub> and <italic>F</italic><sub>2</sub> with the classical ResNet; <italic>G</italic> with an MLP; and the CL loss with Equation (4). After all mappings were learned, we used the outputs of the feature layer as student representations for the subsequent classification tasks.</p></sec>
<sec>
<title>2.2.4. Linear Classifier</title>
<p>To implement the final student classification, this study employs the single-layer neural network that was used in the evaluation of SimCLR (Chen et al., <xref ref-type="bibr" rid="B12">2020</xref>). Denoting by <bold>h</bold><sub><italic>i</italic></sub> the resulting representation of the <italic>i</italic>-th student, the classifier minimizes the cost function
<disp-formula id="E6"><label>(6)</label><mml:math id="M9"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mi mathvariant="-tex-caligraphic">L</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:mfrac><mml:mstyle displaystyle="true"><mml:munderover accentunder="false" accent="false"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:munderover></mml:mstyle><mml:mo>-</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>y</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mi>l</mml:mi><mml:mi>o</mml:mi><mml:mi>g</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi mathvariant="-tex-caligraphic">C</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>h</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>&#x0002B;</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>-</mml:mo><mml:msub><mml:mrow><mml:mi>y</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mi>l</mml:mi><mml:mi>o</mml:mi><mml:mi>g</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>-</mml:mo><mml:mi mathvariant="-tex-caligraphic">C</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>h</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo 
stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
where <bold>h</bold> denotes the representation obtained from Equation (5) and <inline-formula><mml:math id="M21"><mml:mi mathvariant="-tex-caligraphic">C</mml:mi></mml:math></inline-formula>(&#x000B7;) &#x0003D; <italic>Sigmoid</italic>(&#x000B7;) is the activation function mapping student representations to the label space. Equation (6) measures the binary cross-entropy between the target and the output.</p></sec></sec>
<sec>
<title>2.3. Model Setting and Evaluation</title>
<p>The proposed model shown in <xref ref-type="fig" rid="F2">Figure 2</xref> was set up as follows. All instances share the same <italic>F</italic><sub>1</sub> and <italic>F</italic><sub>2</sub>, both of which are implemented with the ResNet. The ResNet comprises a convolutional layer with a kernel size of 3 &#x000D7; 3, three residual modules of four bottleneck blocks, and an average pooling layer. The numbers of channels are 64, 128, 256, 512, 256, 128, and 64, respectively. Each bottleneck block is composed of three convolutional layers with ReLU, and batch normalization (BN) is applied after each convolutional layer. Our model maps each image instance into a 128-dimensional space, so concatenating the 20 instances of a student gives a 2,560-dimensional student feature (20 &#x000D7; 128 = 2,560). The MLP for <italic>G</italic> is composed of two fully connected layers with 1,024 and 128 channels. Finally, the linear classifier maps from 2,560 dimensions to 1 and employs the Sigmoid as the activation function to yield the prediction probability. The model was trained for 2,000 iterations with a learning rate of 0.001, and the linear classifier was trained for 1,000 iterations with a learning rate of 0.005.</p>
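<p>The dimensions stated above can be traced with a small Python sketch. It only illustrates how 20 instance vectors of 128 dimensions form a 2,560-dimensional student vector that a linear layer with a Sigmoid maps to one probability, together with the per-student term of Equation (6). The random vectors and weights are hypothetical stand-ins for learned quantities, and all function names are assumptions made for illustration.</p>
<preformat>
```python
import math
import random

NUM_SLICES = 20      # image instances per student
INSTANCE_DIM = 128   # dimension of each instance representation
STUDENT_DIM = NUM_SLICES * INSTANCE_DIM  # 20 x 128 = 2,560


def concatenate(instances):
    """Join the per-slice vectors into one student representation."""
    student = []
    for vector in instances:
        student.extend(vector)
    return student


def classify(h, weights, bias):
    """Linear map from 2,560 dimensions to 1, followed by the Sigmoid."""
    logit = sum(w * x for w, x in zip(weights, h)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def binary_cross_entropy(y, p):
    """Per-student term of the cost in Equation (6)."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


rng = random.Random(0)
# Random stand-ins for the 20 learned slice representations (hypothetical).
instances = [[rng.gauss(0.0, 1.0) for _ in range(INSTANCE_DIM)]
             for _ in range(NUM_SLICES)]
# Small random weights stand in for a trained classifier (hypothetical).
weights = [rng.gauss(0.0, 0.01) for _ in range(STUDENT_DIM)]

h = concatenate(instances)
probability = classify(h, weights, bias=0.0)
loss = binary_cross_entropy(1, probability)
```
</preformat>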
<p>In this study, we calculated accuracy (ACC), F1-score (F1), and area under the ROC curve (AUC) on the 123 MRIs used. From the confusion matrix, we obtained the four counts, i.e., true positive (TP), false positive (FP), false negative (FN), and true negative (TN). ACC and F1 are calculated by
<disp-formula id="E7"><label>(7)</label><mml:math id="M10"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:mi>A</mml:mi><mml:mi>C</mml:mi><mml:mi>C</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mi>T</mml:mi><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mi>F</mml:mi><mml:mi>P</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mi>T</mml:mi><mml:mi>N</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mi>F</mml:mi><mml:mi>N</mml:mi></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E8"><label>(8)</label><mml:math id="M11"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:mi>P</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mi>F</mml:mi><mml:mi>P</mml:mi></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E9"><label>(9)</label><mml:math id="M12"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:mi>R</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi><mml:mi>l</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mi>F</mml:mi><mml:mi>N</mml:mi></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="E10"><label>(10)</label><mml:math id="M13"><mml:mtable class="eqnarray" columnalign="right center left"><mml:mtr><mml:mtd><mml:mi>F</mml:mi><mml:mn>1</mml:mn><mml:mo>=</mml:mo><mml:mn>2</mml:mn><mml:mo>&#x000D7;</mml:mo><mml:mfrac><mml:mrow><mml:mi>P</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mo>&#x000D7;</mml:mo><mml:mi>R</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi><mml:mi>l</mml:mi></mml:mrow><mml:mrow><mml:mi>P</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mo>&#x0002B;</mml:mo><mml:mi>R</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi><mml:mi>l</mml:mi></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
and AUC is defined as the area under the ROC curve. In addition, the two-tailed <italic>t</italic>-test is adopted to compute the <italic>p</italic>-value for the statistical significance test (Zhang et al., <xref ref-type="bibr" rid="B41">2018</xref>). Because the dataset is small, we conducted five-fold cross-validation on the 123 students. That is, the model was trained on four folds and tested on the remaining fold, and the evaluations were averaged over the five folds.</p></sec></sec>
<sec id="s3">
<title>3. Result</title>
<p>For comparison with SimCLR (Chen et al., <xref ref-type="bibr" rid="B12">2020</xref>), we implemented the student classification by first learning an image representation for each slice of each student, then concatenating the 20 representations, and finally reducing them into a 2,560-dimensional PCA subspace (Zhang et al., <xref ref-type="bibr" rid="B40">2017</xref>). For brevity, we refer to this method as SimCLR in what follows.</p>
<sec>
<title>3.1. Visualization</title>
<p><xref ref-type="fig" rid="F3">Figure 3</xref> plots all 123 student representations from SimCLR and MiCL in the 2D subspace. All obtained representations were first reduced into 50-dimensional PCA subspaces and then into 2-dimensional t-SNE subspaces. Class 1 comprises the 51 students who stopped math education and class 0 the 72 students who continued studying math, colored brown and blue in the figures, respectively. As shown, the student representations yielded by MiCL are more easily separated into class 1 and class 0 than those from SimCLR in the 2D t-SNE subspace. This observation suggests that joint learning of the 20 image slices in a multi-instance setting could yield more discriminative student representations.</p>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p>Visualization of the learned representation in 2D subspaces. There are in total 123 students, including 51 students in class 1 and 72 students in class 0. <bold>(A)</bold> SimCLR and <bold>(B)</bold> MiCL.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpsyg-12-765754-g0003.tif"/>
</fig></sec>
<sec>
<title>3.2. Overall Evaluation</title>
<p><xref ref-type="fig" rid="F4">Figure 4</xref> shows the confusion matrices from SimCLR and the proposed MiCL. Note that this study took the non-math group as the positive class and the math group as the negative class. TP<sub><italic>SimCLR</italic></sub> &#x0003E; TP<sub><italic>MiCL</italic></sub> shows that SimCLR favors non-math students, while TN<sub><italic>MiCL</italic></sub> &#x0003E; TN<sub><italic>SimCLR</italic></sub> shows that MiCL favors math students. SimCLR has a large FN while MiCL has a large FP, where FN<sub><italic>SimCLR</italic></sub> = FP<sub><italic>MiCL</italic></sub>. That is, SimCLR is better at identifying non-math students, while MiCL is better at identifying math students. Overall, however, the proposed MiCL classifies better than SimCLR.</p>
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p>Confusion matrix. The two matrices show TN, FP, FN, and TP from the classification results of SimCLR and MiCL, respectively. Non-math students are considered the positive class. <bold>(A)</bold> SimCLR and <bold>(B)</bold> MiCL.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpsyg-12-765754-g0004.tif"/>
</fig>
<p><xref ref-type="table" rid="T1">Table 1</xref> reports the overall evaluations in terms of the various metrics. Since SimCLR favors non-math students, it achieves higher precision than MiCL. MiCL, however, obtains a higher recall and consequently a higher F1 score. In addition, the proposed MiCL gains significant improvements in ACC and AUC of more than 5 and nearly 3 percentage points, respectively, with <italic>p</italic> &#x0003C; 0.01. The AUC was obtained from the ROC curves shown in <xref ref-type="fig" rid="F5">Figure 5</xref>. The ROC curves plot the true positive rate (TPR) against the false positive rate (FPR), showing the classification performance at various thresholds. As shown, MiCL achieves a higher TPR than SimCLR at a low FPR. Controlling FPR is an important research topic in many fields, e.g., disease diagnosis and drug discovery (Romano et al., <xref ref-type="bibr" rid="B26">2020</xref>). Although SimCLR performs better at a high FPR, MiCL attains a higher AUC. Overall, the proposed MiCL achieves better classification performance than SimCLR while keeping the FPR under control.</p>
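<p>For readers who wish to recompute such metrics, the following minimal Python sketch derives ACC, precision, recall, and F1 from the four confusion-matrix counts using the standard definitions. The function name and the example counts are hypothetical and chosen only for illustration; they are not the counts shown in <xref ref-type="fig" rid="F4">Figure 4</xref>.</p>
<preformat>
```python
def classification_metrics(tp, fp, fn, tn):
    """Return ACC, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


# Hypothetical counts for illustration only.
acc, precision, recall, f1 = classification_metrics(tp=40, fp=10, fn=5, tn=45)
```
</preformat>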
<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>Evaluation results.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th/>
<th valign="top" align="center"><bold>ACC</bold></th>
<th valign="top" align="center"><bold>Precision</bold></th>
<th valign="top" align="center"><bold>Recall</bold></th>
<th valign="top" align="center"><bold>F1</bold></th>
<th valign="top" align="center"><bold>AUC</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">SimCLR</td>
<td valign="top" align="center">0.8455</td>
<td valign="top" align="center">0.8431</td>
<td valign="top" align="center">0.7963</td>
<td valign="top" align="center">0.8190</td>
<td valign="top" align="center">0.9289</td>
</tr>
<tr>
<td valign="top" align="left">MiCL</td>
<td valign="top" align="center">0.9024</td>
<td valign="top" align="center">0.7843</td>
<td valign="top" align="center">0.9756</td>
<td valign="top" align="center">0.8696</td>
<td valign="top" align="center">0.9578</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>Student classification results were calculated for all 123 students in terms of the mentioned metrics, where non-math students were used as positive samples</italic>.</p>
</table-wrap-foot>
</table-wrap>
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p>ROCs. The ROCs show the classification performance of SimCLR and MiCL.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpsyg-12-765754-g0005.tif"/>
</fig></sec>
<sec>
<title>3.3. Individual Evaluation</title>
<p><xref ref-type="fig" rid="F6">Figure 6</xref> shows the classification probability for the two classes yielded by SimCLR and MiCL. The probability was calculated by normalizing the two outputs to sum to 1. That is, the probability of belonging to class 1 and the probability of belonging to class 0 sum to 100%. In this study, we identified a student as a math student if the corresponding probability is less than 0.5; otherwise, we identified the student as a non-math student. As shown, SimCLR places most of the probabilities in [0.2, 0.4) for class 0 and in [0.5, 0.7) for class 1, whereas MiCL yields classification probabilities concentrated in [0.0, 0.3) for class 0 and in [0.6, 0.9) for class 1. On the other hand, SimCLR leads to more students having a probability greater than 0.5 for class 0, while MiCL gives rise to more students having a probability less than 0.5 for class 1. These observations show that MiCL could yield more convincing classifications for the correct predictions than SimCLR. Besides, SimCLR leads to more stable predictions for non-math students, even though the probability is concentrated near 0.5.</p>
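<p>The decision rule described above can be sketched in a few lines of Python. The sketch assumes that the two outputs are positive scores and that the thresholded quantity is the normalized probability of class 1 (non-math); both the function names and the example outputs are hypothetical.</p>
<preformat>
```python
def class_probabilities(output_class0, output_class1):
    """Normalize two positive outputs so that they sum to 1."""
    total = output_class0 + output_class1
    return output_class0 / total, output_class1 / total


def predict_label(prob_class1, threshold=0.5):
    """Return 1 (non-math) at or above the threshold, otherwise 0 (math)."""
    if prob_class1 >= threshold:
        return 1
    return 0


# Hypothetical raw outputs for a single student.
prob_class0, prob_class1 = class_probabilities(3.0, 1.0)
label = predict_label(prob_class1)
```
</preformat>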
<fig id="F6" position="float">
<label>Figure 6</label>
<caption><p>The probability distribution. The distribution of the classification probability for math students and non-math students by SimCLR and MiCL. <bold>(A)</bold> Math students and <bold>(B)</bold> Non-math students.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpsyg-12-765754-g0006.tif"/>
</fig>
<p><xref ref-type="table" rid="T2">Table 2</xref> summarizes the mean and the standard deviation of the classification probability for SimCLR and MiCL. As shown, MiCL has a smaller mean and a smaller standard deviation than SimCLR when identifying math students. For non-math students, the two methods have nearly the same mean (0.6501 vs. 0.6500), while SimCLR has a smaller standard deviation. Nevertheless, MiCL yields more confident predictions, having benefited from multi-instance joint learning.</p>
<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p>Means and standard deviations.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th/>
<th valign="top" align="center" colspan="2" style="border-bottom: thin solid #000000;"><bold>Math student</bold></th>
<th valign="top" align="center" colspan="2" style="border-bottom: thin solid #000000;"><bold>Non-math student</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td/>
<td valign="top" align="center"><bold>Mean</bold></td>
<td valign="top" align="center"><bold>Standard deviation</bold></td>
<td valign="top" align="center"><bold>Mean</bold></td>
<td valign="top" align="center"><bold>Standard deviation</bold></td>
</tr> <tr>
<td valign="top" align="left">SimCLR</td>
<td valign="top" align="center">0.3215</td>
<td valign="top" align="center">0.1581</td>
<td valign="top" align="center">0.6500</td>
<td valign="top" align="center">0.1332</td>
</tr>
<tr>
<td valign="top" align="left">MiCL</td>
<td valign="top" align="center">0.2291</td>
<td valign="top" align="center">0.1469</td>
<td valign="top" align="center">0.6501</td>
<td valign="top" align="center">0.1703</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>The results were calculated for all 123 classification probabilities</italic>.</p>
</table-wrap-foot>
</table-wrap></sec></sec>
<sec id="s4">
<title>4. Conclusion and Discussion</title>
<p>In this paper, we attempted to distinguish students who have stopped studying mathematics from students who have continued their mathematical education by using deep learning. To deal with the 3D images, we formulated this problem as multi-instance learning and extended a classical contrastive learning framework to a multi-instance setting.</p>
<p>The proposed MiCL learns the image representation by sharing the weights among the 20 instances and then concatenates the 20 image representations into the final student representation. The contrastive loss is employed to minimize the difference between the two transformed versions of each student. For the 123 students, composed of 51 non-math students and 72 math students, MiCL achieves an accuracy of 90.24%, an improvement of more than 5 percentage points over SimCLR. Benefitting from the multi-instance joint learning, MiCL also improves recall, F1, and AUC.</p>
<p>The MRI data have the potential to be used in identifying whether a student has stopped their mathematical education. Both SimCLR and MiCL achieve good accuracy on the task of classifying math and non-math students. Moreover, SimCLR identifies non-math students more stably, while MiCL is better at identifying math students. Since math and non-math students could be separated with high accuracy using MRIs, mathematical education has a potential impact on adolescent brain development in terms of white matter and gray matter in the IPS region. This conclusion has also been investigated in earlier work (Kucian et al., <xref ref-type="bibr" rid="B16">2006</xref>; Zacharopoulos et al., <xref ref-type="bibr" rid="B33">2021</xref>).</p>
<p>Two points should be noted. (1) MiCL gains an insubstantial improvement in accuracy in the 2,560-dimensional subspace in comparison with the 2-dimensional subspace. This may mean that feature selection could be utilized to discover the brain atlas for mathematical study. (2) Multi-instance joint features may contribute more to math-student identification. This potentially means that the impact of mathematical study varies across the image slices.</p>
<p>Hence, in future work, we should uncover the brain atlas that is affected by mathematical education and further discuss its impact on the future attainment of adolescents. The attention mechanism could provide more explanations for understanding the latent representation, which is another direction we will consider (Zhang et al., <xref ref-type="bibr" rid="B34">2021a</xref>). Besides, we will investigate more brain regions that are also related to math learning, e.g., the middle frontal gyrus (Zacharopoulos et al., <xref ref-type="bibr" rid="B33">2021</xref>), and conduct more experiments to probe the associations between the MRI images and other problems, e.g., student psychology and math anxiety (Barroso et al., <xref ref-type="bibr" rid="B4">2021</xref>).</p></sec>
<sec sec-type="data-availability" id="s5">
<title>Data Availability Statement</title>
<p>Publicly available datasets were analyzed in this study. The data can be found at <ext-link ext-link-type="uri" xlink:href="http://central.xnat.org">http://central.xnat.org</ext-link>.</p></sec>
<sec id="s6">
<title>Ethics Statement</title>
<p>The studies involving human participants were reviewed and approved by the School of Computer Science at Northwestern Polytechnical University, Xi&#x00027;an, China. The participants provided their written informed consent to participate in this study.</p></sec>
<sec id="s7">
<title>Author Contributions</title>
<p>YZ and XS work with the School of Computer Science (SCS) at Northwestern Polytechnical University (NPU), Xi&#x00027;an, China and funded this study. SL is a Ph.D. student in SCS at NPU, Xi&#x00027;an, China and collected the data and plotted the figures. YZ designed the study, conducted experiments, and wrote the manuscript. All authors contributed to the article and approved the submitted version.</p></sec>
<sec sec-type="funding-information" id="s8">
<title>Funding</title>
<p>This study was supported in part by the National Natural Science Foundation of China (Nos. 61802313 and U1811262), the Key Research and Development Program of China (No. 2020AAA0108500), and Reformation Research on Education and Teaching at Northwestern Polytechnical University (No. 2021JGY31).</p></sec>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p></sec>
<sec sec-type="disclaimer" id="s9">
<title>Publisher&#x00027;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p></sec>
</body>
<back>
<ack><p>All authors thank the editors and the reviewers for their helpful comments.</p>
</ack>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Alur</surname> <given-names>R.</given-names></name> <name><surname>Baraniuk</surname> <given-names>R.</given-names></name> <name><surname>Bodik</surname> <given-names>R.</given-names></name> <name><surname>Drobnis</surname> <given-names>A.</given-names></name> <name><surname>Gulwani</surname> <given-names>S.</given-names></name> <name><surname>Hartmann</surname> <given-names>B.</given-names></name> <etal/></person-group>. (<year>2020</year>). <article-title>Computer-aided personalized education</article-title>. <source>arXiv preprint arXiv</source>:2007.03704.</citation>
</ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Arsalidou</surname> <given-names>M.</given-names></name> <name><surname>Taylor</surname> <given-names>M. J.</given-names></name></person-group> (<year>2011</year>). <article-title>Is 2&#x0002B; 2= 4? meta-analyses of brain areas needed for numbers and calculations</article-title>. <source>Neuroimage</source> <volume>54</volume>, <fpage>2382</fpage>&#x02013;<lpage>2393</lpage>. <pub-id pub-id-type="doi">10.1016/j.neuroimage.2010.10.009</pub-id><pub-id pub-id-type="pmid">20946958</pub-id></citation></ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Baglama</surname> <given-names>B.</given-names></name> <name><surname>Yucesoy</surname> <given-names>Y.</given-names></name> <name><surname>Uzunboylu</surname> <given-names>H.</given-names></name> <name><surname>&#x000D6;zcan</surname> <given-names>D.</given-names></name></person-group> (<year>2017</year>). <article-title>Can infographics facilitate the learning of individuals with mathematical learning difficulties</article-title>. <source>Int. J. Cogn. Res. Sci. Eng. Educ</source>. <volume>5</volume>, <fpage>119</fpage>&#x02013;<lpage>128</lpage>. <pub-id pub-id-type="doi">10.5937/ijcrsee1702119B</pub-id></citation>
</ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Barroso</surname> <given-names>C.</given-names></name> <name><surname>Ganley</surname> <given-names>C. M.</given-names></name> <name><surname>McGraw</surname> <given-names>A. L.</given-names></name> <name><surname>Geer</surname> <given-names>E. A.</given-names></name> <name><surname>Hart</surname> <given-names>S. A.</given-names></name> <name><surname>Daucourt</surname> <given-names>M. C.</given-names></name></person-group> (<year>2021</year>). <article-title>A meta-analysis of the relation between math anxiety and math achievement</article-title>. <source>Psychol. Bull</source>. <volume>147</volume>, <fpage>134</fpage>. <pub-id pub-id-type="doi">10.1037/bul0000307</pub-id><pub-id pub-id-type="pmid">33119346</pub-id></citation></ref>
<ref id="B5">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Barzagar Nazari</surname> <given-names>K.</given-names></name> <name><surname>Ebersbach</surname> <given-names>M.</given-names></name></person-group> (<year>2018</year>). <article-title>Distributed practice: rarely realized in self-regulated mathematical learning</article-title>. <source>Front. Psychol</source>. <volume>9</volume>:<fpage>2170</fpage>. <pub-id pub-id-type="doi">10.3389/fpsyg.2018.02170</pub-id><pub-id pub-id-type="pmid">30524328</pub-id></citation></ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Beddington</surname> <given-names>J.</given-names></name> <name><surname>Cooper</surname> <given-names>C. L.</given-names></name> <name><surname>Field</surname> <given-names>J.</given-names></name> <name><surname>Goswami</surname> <given-names>U.</given-names></name> <name><surname>Huppert</surname> <given-names>F. A.</given-names></name> <name><surname>Jenkins</surname> <given-names>R.</given-names></name> <etal/></person-group>. (<year>2008</year>). <article-title>The mental wealth of nations</article-title>. <source>Nature</source> <volume>455</volume>, <fpage>1057</fpage>&#x02013;<lpage>1060</lpage>. <pub-id pub-id-type="doi">10.1038/4551057a</pub-id><pub-id pub-id-type="pmid">18948946</pub-id></citation></ref>
<ref id="B7">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Bienkowski</surname> <given-names>M.</given-names></name> <name><surname>Feng</surname> <given-names>M.</given-names></name> <name><surname>Means</surname> <given-names>B.</given-names></name></person-group> (<year>2012</year>). <source>Enhancing Teaching and Learning Through Educational Data Mining and Learning Analytics: An Issue Brief</source>. Office of Educational Technology, US Department of Education.</citation>
</ref>
<ref id="B8">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Brookman-Byrne</surname> <given-names>A.</given-names></name> <name><surname>Dumontheil</surname> <given-names>I.</given-names></name></person-group> (<year>2020</year>). <article-title>Brain and cognitive development during adolescence: implications for science and mathematics education</article-title>, in <source>The &#x0201C;BrainCanDo&#x0201D; Handbook of Teaching and Learning</source> (<publisher-loc>London</publisher-loc>: <publisher-name>David Fulton Publishers</publisher-name>), <fpage>205</fpage>&#x02013;<lpage>221</lpage>.</citation>
</ref>
<ref id="B9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Butterworth</surname> <given-names>B.</given-names></name> <name><surname>Kovas</surname> <given-names>Y.</given-names></name></person-group> (<year>2013</year>). <article-title>Understanding neurocognitive developmental disorders can improve education for all</article-title>. <source>Science</source> <volume>340</volume>, <fpage>300</fpage>&#x02013;<lpage>305</lpage>. <pub-id pub-id-type="doi">10.1126/science.1231022</pub-id><pub-id pub-id-type="pmid">23599478</pub-id></citation></ref>
<ref id="B10">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cano</surname> <given-names>A.</given-names></name> <name><surname>Leonard</surname> <given-names>J. D.</given-names></name></person-group> (<year>2019</year>). <article-title>Interpretable multiview early warning system adapted to underrepresented student populations</article-title>. <source>IEEE Trans. Learn. Technol</source>. <volume>12</volume>, <fpage>198</fpage>&#x02013;<lpage>211</lpage>. <pub-id pub-id-type="doi">10.1109/TLT.2019.2911079</pub-id></citation></ref>
<ref id="B11">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Chaitanya</surname> <given-names>K.</given-names></name> <name><surname>Erdil</surname> <given-names>E.</given-names></name> <name><surname>Karani</surname> <given-names>N.</given-names></name> <name><surname>Konukoglu</surname> <given-names>E.</given-names></name></person-group> (<year>2020</year>). <article-title>Contrastive learning of global and local features for medical image segmentation with limited annotations</article-title>. <source>arXiv preprint arXiv</source>:2006.10511.</citation>
</ref>
<ref id="B12">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Chen</surname> <given-names>T.</given-names></name> <name><surname>Kornblith</surname> <given-names>S.</given-names></name> <name><surname>Norouzi</surname> <given-names>M.</given-names></name> <name><surname>Hinton</surname> <given-names>G.</given-names></name></person-group> (<year>2020</year>). <article-title>A simple framework for contrastive learning of visual representations</article-title>, in <source>International Conference on Machine Learning</source> (<publisher-loc>PMLR</publisher-loc>), <fpage>1597</fpage>&#x02013;<lpage>1607</lpage>.</citation>
</ref>
<ref id="B13">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Codina</surname> <given-names>N.</given-names></name> <name><surname>Valenzuela</surname> <given-names>R.</given-names></name> <name><surname>Pestana</surname> <given-names>J. V.</given-names></name> <name><surname>Gonzalez-Conde</surname> <given-names>J.</given-names></name></person-group> (<year>2018</year>). <article-title>Relations between student procrastination and teaching styles: autonomy-supportive and controlling</article-title>. <source>Front. Psychol</source>. <volume>9</volume>:<fpage>809</fpage>. <pub-id pub-id-type="doi">10.3389/fpsyg.2018.00809</pub-id><pub-id pub-id-type="pmid">29875731</pub-id></citation></ref>
<ref id="B14">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Gao</surname> <given-names>T.</given-names></name> <name><surname>Yao</surname> <given-names>X.</given-names></name> <name><surname>Chen</surname> <given-names>D.</given-names></name></person-group> (<year>2021</year>). <article-title>SimCSE: Simple contrastive learning of sentence embeddings</article-title>. <source>arXiv preprint arXiv</source>:2104.08821.</citation>
</ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kershner</surname> <given-names>J. R.</given-names></name></person-group> (<year>2020</year>). <article-title>Neuroscience and education: cerebral lateralization of networks and oscillations in dyslexia</article-title>. <source>Laterality</source> <volume>25</volume>, <fpage>109</fpage>&#x02013;<lpage>125</lpage>. <pub-id pub-id-type="doi">10.1080/1357650X.2019.1606820</pub-id><pub-id pub-id-type="pmid">30987535</pub-id></citation></ref>
<ref id="B16">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kucian</surname> <given-names>K.</given-names></name> <name><surname>Loenneker</surname> <given-names>T.</given-names></name> <name><surname>Dietrich</surname> <given-names>T.</given-names></name> <name><surname>Dosch</surname> <given-names>M.</given-names></name> <name><surname>Martin</surname> <given-names>E.</given-names></name> <name><surname>Von Aster</surname> <given-names>M.</given-names></name></person-group> (<year>2006</year>). <article-title>Impaired neural networks for approximate calculation in dyscalculic children: a functional MRI study</article-title>. <source>Behav. Brain Funct</source>. <volume>2</volume>, <fpage>1</fpage>&#x02013;<lpage>17</lpage>. <pub-id pub-id-type="doi">10.1186/1744-9081-2-31</pub-id><pub-id pub-id-type="pmid">16953876</pub-id></citation></ref>
<ref id="B17">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lent</surname> <given-names>R. W.</given-names></name> <name><surname>Lopez</surname> <given-names>F. G.</given-names></name> <name><surname>Bieschke</surname> <given-names>K. J.</given-names></name></person-group> (<year>1993</year>). <article-title>Predicting mathematics-related choice and success behaviors: test of an expanded social cognitive model</article-title>. <source>J. Vocat. Behav</source>. <volume>42</volume>, <fpage>223</fpage>&#x02013;<lpage>236</lpage>. <pub-id pub-id-type="doi">10.1006/jvbe.1993.1016</pub-id></citation>
</ref>
<ref id="B18">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>S.</given-names></name> <name><surname>Zhang</surname> <given-names>Y.</given-names></name> <name><surname>Shang</surname> <given-names>X.</given-names></name> <name><surname>Zhang</surname> <given-names>Z.</given-names></name></person-group> (<year>2021</year>). <article-title>ProTICS reveals prognostic impact of tumor infiltrating immune cells in different molecular subtypes</article-title>. <source>Brief Bioinform</source>. <volume>22</volume>:<fpage>bbab164</fpage>. <pub-id pub-id-type="doi">10.1093/bib/bbab164</pub-id><pub-id pub-id-type="pmid">33963834</pub-id></citation></ref>
<ref id="B19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mammarella</surname> <given-names>I. C.</given-names></name> <name><surname>Caviola</surname> <given-names>S.</given-names></name> <name><surname>Giofr&#x000E8;</surname> <given-names>D.</given-names></name> <name><surname>Sz&#x00171;cs</surname> <given-names>D.</given-names></name></person-group> (<year>2018</year>). <article-title>The underlying structure of visuospatial working memory in children with mathematical learning disability</article-title>. <source>Br. J. Dev. Psychol</source>. <volume>36</volume>, <fpage>220</fpage>&#x02013;<lpage>235</lpage>. <pub-id pub-id-type="doi">10.1111/bjdp.12202</pub-id><pub-id pub-id-type="pmid">28833308</pub-id></citation></ref>
<ref id="B20">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Moeller</surname> <given-names>K.</given-names></name> <name><surname>Willmes</surname> <given-names>K.</given-names></name> <name><surname>Klein</surname> <given-names>E.</given-names></name></person-group> (<year>2015</year>). <article-title>A review on functional and structural brain connectivity in numerical cognition</article-title>. <source>Front. Hum. Neurosci</source>. <volume>9</volume>:<fpage>227</fpage>. <pub-id pub-id-type="doi">10.3389/fnhum.2015.00227</pub-id><pub-id pub-id-type="pmid">26029075</pub-id></citation></ref>
<ref id="B21">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ng</surname> <given-names>O.-L.</given-names></name> <name><surname>Chan</surname> <given-names>T.</given-names></name></person-group> (<year>2019</year>). <article-title>Learning as making: Using 3D computer-aided design to enhance the learning of shape and space in STEM-integrated ways</article-title>. <source>Br. J. Educ. Technol</source>. <volume>50</volume>, <fpage>294</fpage>&#x02013;<lpage>308</lpage>. <pub-id pub-id-type="doi">10.1111/bjet.12643</pub-id></citation>
</ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Peng</surname> <given-names>J.</given-names></name> <name><surname>Guan</surname> <given-names>J.</given-names></name> <name><surname>Hui</surname> <given-names>W.</given-names></name> <name><surname>Shang</surname> <given-names>X.</given-names></name></person-group> (<year>2021a</year>). <article-title>A novel subnetwork representation learning method for uncovering disease-disease relationships</article-title>. <source>Methods</source> <volume>192</volume>, <fpage>77</fpage>&#x02013;<lpage>84</lpage>. <pub-id pub-id-type="doi">10.1016/j.ymeth.2020.09.002</pub-id><pub-id pub-id-type="pmid">32946974</pub-id></citation></ref>
<ref id="B23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Peng</surname> <given-names>J.</given-names></name> <name><surname>Han</surname> <given-names>L.</given-names></name> <name><surname>Shang</surname> <given-names>X.</given-names></name></person-group> (<year>2021b</year>). <article-title>A novel method for predicting cell abundance based on single-cell RNA-seq data</article-title>. <source>BMC Bioinformatics</source> <volume>22</volume>, <fpage>1</fpage>&#x02013;<lpage>15</lpage>. <pub-id pub-id-type="doi">10.1186/s12859-021-04187-4</pub-id><pub-id pub-id-type="pmid">34433409</pub-id></citation></ref>
<ref id="B24">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Peng</surname> <given-names>J.</given-names></name> <name><surname>Xue</surname> <given-names>H.</given-names></name> <name><surname>Wei</surname> <given-names>Z.</given-names></name> <name><surname>Tuncali</surname> <given-names>I.</given-names></name> <name><surname>Hao</surname> <given-names>J.</given-names></name> <name><surname>Shang</surname> <given-names>X.</given-names></name></person-group> (<year>2021c</year>). <article-title>Integrating multi-network topology for gene function prediction using deep neural networks</article-title>. <source>Brief Bioinform</source>. <volume>22</volume>, <fpage>2096</fpage>&#x02013;<lpage>2105</lpage>. <pub-id pub-id-type="doi">10.1093/bib/bbaa036</pub-id><pub-id pub-id-type="pmid">32249297</pub-id></citation></ref>
<ref id="B25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Robertson</surname> <given-names>J.</given-names></name> <name><surname>Howells</surname> <given-names>C.</given-names></name></person-group> (<year>2008</year>). <article-title>Computer game design: Opportunities for successful learning</article-title>. <source>Comput. Educ</source>. <volume>50</volume>, <fpage>559</fpage>&#x02013;<lpage>578</lpage>. <pub-id pub-id-type="doi">10.1016/j.compedu.2007.09.020</pub-id></citation>
</ref>
<ref id="B26">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Romano</surname> <given-names>Y.</given-names></name> <name><surname>Sesia</surname> <given-names>M.</given-names></name> <name><surname>Cand&#x000E8;s</surname> <given-names>E.</given-names></name></person-group> (<year>2020</year>). <article-title>Deep knockoffs</article-title>. <source>J. Am. Stat. Assoc</source>. <volume>115</volume>, <fpage>1861</fpage>&#x02013;<lpage>1872</lpage>. <pub-id pub-id-type="doi">10.1080/01621459.2019.1660174</pub-id></citation>
</ref>
<ref id="B27">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Romero</surname> <given-names>C.</given-names></name> <name><surname>Ventura</surname> <given-names>S.</given-names></name></person-group> (<year>2020</year>). <article-title>Educational data mining and learning analytics: an updated survey</article-title>. <source>Wiley Interdiscip. Rev. Data Min. Knowl. Discov</source>. <volume>10</volume>, <fpage>e1355</fpage>. <pub-id pub-id-type="doi">10.1002/widm.1355</pub-id><pub-id pub-id-type="pmid">25855820</pub-id></citation></ref>
<ref id="B28">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sigman</surname> <given-names>M.</given-names></name> <name><surname>Pe&#x000F1;a</surname> <given-names>M.</given-names></name> <name><surname>Goldin</surname> <given-names>A. P.</given-names></name> <name><surname>Ribeiro</surname> <given-names>S.</given-names></name></person-group> (<year>2014</year>). <article-title>Neuroscience and education: prime time to build the bridge</article-title>. <source>Nat. Neurosci</source>. <volume>17</volume>, <fpage>497</fpage>&#x02013;<lpage>502</lpage>. <pub-id pub-id-type="doi">10.1038/nn.3672</pub-id><pub-id pub-id-type="pmid">24671066</pub-id></citation></ref>
<ref id="B29">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Steffe</surname> <given-names>L. P.</given-names></name></person-group> (<year>2017</year>). <article-title>Psychology in mathematics education: past, present, and future</article-title>, in <source>Proceedings of the 39th Annual Meeting of the North American Chapter of the International Group for the Psychology of Mathematics Education</source> (<publisher-loc>Indianapolis, IN</publisher-loc>), <fpage>27</fpage>&#x02013;<lpage>56</lpage>.</citation>
</ref>
<ref id="B30">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Troussas</surname> <given-names>C.</given-names></name> <name><surname>Krouska</surname> <given-names>A.</given-names></name> <name><surname>Sgouropoulou</surname> <given-names>C.</given-names></name></person-group> (<year>2020</year>). <article-title>Collaboration and fuzzy-modeled personalization for mobile game-based learning in higher education</article-title>. <source>Comput. Educ</source>. <volume>144</volume>:<fpage>103698</fpage>. <pub-id pub-id-type="doi">10.1016/j.compedu.2019.103698</pub-id></citation>
</ref>
<ref id="B31">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wu</surname> <given-names>C.-L.</given-names></name> <name><surname>Lin</surname> <given-names>T.-J.</given-names></name> <name><surname>Chiou</surname> <given-names>G.-L.</given-names></name> <name><surname>Lee</surname> <given-names>C.-Y.</given-names></name> <name><surname>Luan</surname> <given-names>H.</given-names></name> <name><surname>Tsai</surname> <given-names>M.-J.</given-names></name> <etal/></person-group>. (<year>2021</year>). <article-title>A systematic review of MRI neuroimaging for education research</article-title>. <source>Front. Psychol</source>. <volume>12</volume>:<fpage>1763</fpage>. <pub-id pub-id-type="doi">10.3389/fpsyg.2021.617599</pub-id><pub-id pub-id-type="pmid">34093308</pub-id></citation></ref>
<ref id="B32">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Xu</surname> <given-names>D.</given-names></name> <name><surname>Cheng</surname> <given-names>W.</given-names></name> <name><surname>Ni</surname> <given-names>J.</given-names></name> <name><surname>Luo</surname> <given-names>D.</given-names></name> <name><surname>Natsumeda</surname> <given-names>M.</given-names></name> <name><surname>Song</surname> <given-names>D.</given-names></name> <etal/></person-group>. (<year>2021</year>). <article-title>Deep multi-instance contrastive learning with dual attention for anomaly precursor detection</article-title>, in <source>Proceedings of the 2021 SIAM International Conference on Data Mining</source> (<publisher-loc>SIAM</publisher-loc>), <fpage>91</fpage>&#x02013;<lpage>99</lpage>.</citation>
</ref>
<ref id="B33">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zacharopoulos</surname> <given-names>G.</given-names></name> <name><surname>Sella</surname> <given-names>F.</given-names></name> <name><surname>Cohen Kadosh</surname> <given-names>R.</given-names></name></person-group> (<year>2021</year>). <article-title>The impact of a lack of mathematical education on brain development and future attainment</article-title>. <source>Proc. Natl. Acad. Sci. U.S.A</source>. <volume>118</volume>:<fpage>e2013155118</fpage>. <pub-id pub-id-type="doi">10.1073/pnas.2013155118</pub-id><pub-id pub-id-type="pmid">34099561</pub-id></citation></ref>
<ref id="B34">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>Y.</given-names></name> <name><surname>An</surname> <given-names>R.</given-names></name> <name><surname>Cui</surname> <given-names>J.</given-names></name> <name><surname>Shang</surname> <given-names>X.</given-names></name></person-group> (<year>2021a</year>). <article-title>Undergraduate grade prediction in Chinese higher education using convolutional neural networks</article-title>, in <source>LAK21: 11th International Learning Analytics and Knowledge Conference</source>, <fpage>462</fpage>&#x02013;<lpage>468</lpage>.</citation>
</ref>
<ref id="B35">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>Y.</given-names></name> <name><surname>Dai</surname> <given-names>H.</given-names></name> <name><surname>Yun</surname> <given-names>Y.</given-names></name> <name><surname>Liu</surname> <given-names>S.</given-names></name> <name><surname>Lan</surname> <given-names>A.</given-names></name> <name><surname>Shang</surname> <given-names>X.</given-names></name></person-group> (<year>2020a</year>). <article-title>Meta-knowledge dictionary learning on 1-bit response data for student knowledge diagnosis</article-title>. <source>Knowl. Based Syst</source>. <volume>205</volume>:<fpage>106290</fpage>. <pub-id pub-id-type="doi">10.1016/j.knosys.2020.106290</pub-id></citation>
</ref>
<ref id="B36">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>Y.</given-names></name> <name><surname>Dai</surname> <given-names>H.</given-names></name> <name><surname>Yun</surname> <given-names>Y.</given-names></name> <name><surname>Shang</surname> <given-names>X.</given-names></name></person-group> (<year>2019</year>). <article-title>Student knowledge diagnosis on response data via the model of sparse factor learning</article-title>, in <source>International Conference on Educational Data Mining</source> (<publisher-loc>Montreal, CA</publisher-loc>), <fpage>691</fpage>&#x02013;<lpage>694</lpage>.</citation>
</ref>
<ref id="B37">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>Y.</given-names></name> <name><surname>He</surname> <given-names>X.</given-names></name> <name><surname>Tian</surname> <given-names>Z.</given-names></name> <name><surname>Jeong</surname> <given-names>J. J.</given-names></name> <name><surname>Lei</surname> <given-names>Y.</given-names></name> <name><surname>Wang</surname> <given-names>T.</given-names></name> <etal/></person-group>. (<year>2020b</year>). <article-title>Multi-needle detection in 3D ultrasound images using unsupervised order-graph regularized sparse dictionary learning</article-title>. <source>IEEE Trans. Med. Imaging</source> <volume>39</volume>, <fpage>2302</fpage>&#x02013;<lpage>2315</lpage>. <pub-id pub-id-type="doi">10.1109/TMI.2020.2968770</pub-id><pub-id pub-id-type="pmid">31985414</pub-id></citation></ref>
<ref id="B38">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>Y.</given-names></name> <name><surname>Lei</surname> <given-names>Y.</given-names></name> <name><surname>Lin</surname> <given-names>M.</given-names></name> <name><surname>Curran</surname> <given-names>W.</given-names></name> <name><surname>Liu</surname> <given-names>T.</given-names></name> <name><surname>Yang</surname> <given-names>X.</given-names></name></person-group> (<year>2021b</year>). <article-title>Region of interest discovery using discriminative concrete autoencoder for COVID-19 lung CT images</article-title>, in <source>Medical Imaging 2021: Computer-Aided Diagnosis, Vol. 11597</source> (<publisher-name>International Society for Optics and Photonics</publisher-name>), <fpage>115970U</fpage>.</citation>
</ref>
<ref id="B39">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>Y.</given-names></name> <name><surname>Liu</surname> <given-names>S.</given-names></name></person-group> (<year>2020</year>). <article-title>Integrated sparse coding with graph learning for robust data representation</article-title>. <source>IEEE Access</source> <volume>8</volume>:<fpage>161245</fpage>&#x02013;<lpage>161260</lpage>. <pub-id pub-id-type="doi">10.1109/ACCESS.2020.3021081</pub-id></citation></ref>
<ref id="B40">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>Y.</given-names></name> <name><surname>Xiang</surname> <given-names>M.</given-names></name> <name><surname>Yang</surname> <given-names>B.</given-names></name></person-group> (<year>2017</year>). <article-title>Low-rank preserving embedding</article-title>. <source>Pattern Recognit</source>. <volume>70</volume>, <fpage>112</fpage>&#x02013;<lpage>125</lpage>. <pub-id pub-id-type="doi">10.1016/j.patcog.2017.05.003</pub-id></citation>
</ref>
<ref id="B41">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>Y.</given-names></name> <name><surname>Xiang</surname> <given-names>M.</given-names></name> <name><surname>Yang</surname> <given-names>B.</given-names></name></person-group> (<year>2018</year>). <article-title>Hierarchical sparse coding from a Bayesian perspective</article-title>. <source>Neurocomputing</source> <volume>272</volume>, <fpage>279</fpage>&#x02013;<lpage>293</lpage>. <pub-id pub-id-type="doi">10.1016/j.neucom.2017.06.076</pub-id></citation>
</ref>
<ref id="B42">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>Y.</given-names></name> <name><surname>Yun</surname> <given-names>Y.</given-names></name> <name><surname>Dai</surname> <given-names>H.</given-names></name> <name><surname>Cui</surname> <given-names>J.</given-names></name> <name><surname>Shang</surname> <given-names>X.</given-names></name></person-group> (<year>2020c</year>). <article-title>Graphs regularized robust matrix factorization and its application on student grade prediction</article-title>. <source>Appl. Sci</source>. <volume>10</volume>, <fpage>1755</fpage>. <pub-id pub-id-type="doi">10.3390/app10051755</pub-id></citation>
</ref>
</ref-list>
</back>
</article>