<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Archiving and Interchange DTD v2.3 20070202//EN" "archivearticle.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="methods-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Educ.</journal-id>
<journal-title>Frontiers in Education</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Educ.</abbrev-journal-title>
<issn pub-type="epub">2504-284X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/feduc.2023.989836</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Education</subject>
<subj-group>
<subject>Methods</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Ecological diversity methods improve quantitative examination of student language in short constructed responses in STEM</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Shiroda</surname> <given-names>Megan</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1680872/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Fleming</surname> <given-names>Michael P.</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1860292/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Haudek</surname> <given-names>Kevin C.</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1483864/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>CREATE for STEM Institute, Michigan State University</institution>, <addr-line>East Lansing, MI</addr-line>, <country>United States</country></aff>
<aff id="aff2"><sup>2</sup><institution>Department of Biological Sciences, California State University Stanislaus, One University Circle</institution>, <addr-line>Turlock, CA</addr-line>, <country>United States</country></aff>
<aff id="aff3"><sup>3</sup><institution>Department of Biochemistry and Molecular Biology, Michigan State University</institution>, <addr-line>East Lansing, MI</addr-line>, <country>United States</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Subramaniam Ramanathan, Nanyang Technological University, Singapore</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Liangliang Zhang, Case Western Reserve University, United States; Jiawei Xiong, University of Georgia, United States; Hyo Jeong Shin, Sogang University, Republic of Korea</p></fn>
<corresp id="c001">&#x002A;Correspondence: Megan Shiroda, <email>shirodam@msu.edu</email></corresp>
<fn fn-type="other" id="fn004"><p>This article was submitted to Assessment, Testing and Applied Measurement, a section of the journal Frontiers in Education</p></fn>
</author-notes>
<pub-date pub-type="epub">
<day>02</day>
<month>02</month>
<year>2023</year>
</pub-date>
<pub-date pub-type="collection">
<year>2023</year>
</pub-date>
<volume>8</volume>
<elocation-id>989836</elocation-id>
<history>
<date date-type="received">
<day>08</day>
<month>07</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>09</day>
<month>01</month>
<year>2023</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x00A9; 2023 Shiroda, Fleming and Haudek.</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>Shiroda, Fleming and Haudek</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract>
<p>We applied established ecological methods in a novel way to quantify and compare language diversity within a corpus of short written student texts. Constructed responses (CRs) are a common form of assessment but are difficult to evaluate using traditional methods of lexical diversity due to text length restrictions. Herein, we examined the utility of ecological diversity measures and ordination techniques to quantify differences in short texts by applying these methods, in parallel with traditional text analysis methods, to a corpus of previously studied college student CRs. The CRs were collected at two time points (Timing), from three types of higher education institutions (Type), and across three levels of student understanding (Thinking). Based on previous work, we predicted that we would observe the greatest differences based on Thinking, then Timing, and did not expect differences based on Type, allowing us to test the utility of these methods for categorical examination of the corpus. We found that the ecological diversity metrics that compare CRs to each other (Whittaker&#x2019;s beta, species turnover, and Bray&#x2013;Curtis Dissimilarity) were informative and correlated well with our predicted differences among categories and other text analysis methods. Other ecological measures, including Shannon&#x2019;s and Simpson&#x2019;s diversity, measure the diversity of language within a single CR. Additionally, ordination provided meaningful visual representations of the corpus by reducing complex word frequency matrices to two-dimensional graphs. Using the ordination graphs, we were able to observe patterns in the CR corpus that further supported our predictions for the data set. This work establishes novel approaches to measuring language diversity within short texts that can be used to examine differences in student language and possible associations with categorical data.</p>
</abstract>
<kwd-group>
<kwd>text analysis</kwd>
<kwd>ecological diversity</kwd>
<kwd>constructed response</kwd>
<kwd>assessment</kwd>
<kwd>student thinking</kwd>
<kwd>ordination</kwd>
</kwd-group>
<counts>
<fig-count count="3"/>
<table-count count="3"/>
<equation-count count="6"/>
<ref-count count="60"/>
<page-count count="16"/>
<word-count count="15172"/>
</counts>
</article-meta>
</front>
<body>
<sec id="S1" sec-type="intro">
<title>1. Introduction</title>
<sec id="S1.SS1">
<title>1.1. Assessment of student thinking in STEM through constructed response</title>
<p>Assessment of student understanding and skills is an essential component of teaching, learning, and education research. For this reason, science education standards have pushed for increased use of assessment practices that test authentic scientific practices, such as constructing explanations, and assessments that measure knowledge-in-use (<xref ref-type="bibr" rid="B32">NGSS Lead States, 2013</xref>; <xref ref-type="bibr" rid="B11">Gerard and Linn, 2016</xref>; <xref ref-type="bibr" rid="B23">Krajcik, 2021</xref>). Constructed responses (CRs) are an increasingly used type of assessment that provide valuable insight to both instructors and researchers, as students express their understanding or demonstrate their ability using their own words (<xref ref-type="bibr" rid="B3">Birenbaum et al., 1992</xref>; <xref ref-type="bibr" rid="B31">Nehm and Schonfeld, 2008</xref>; <xref ref-type="bibr" rid="B11">Gerard and Linn, 2016</xref>). Through CRs, students reveal differing levels of performance, complex thinking, and unexpected language in a variety of STEM topics including evolution (<xref ref-type="bibr" rid="B30">Nehm and Reilly, 2007</xref>), tracking mass across scales (<xref ref-type="bibr" rid="B44">Sripathi et al., 2019</xref>), statistics (<xref ref-type="bibr" rid="B21">Kaplan et al., 2014</xref>), mechanistic reasoning in chemistry and genetics (<xref ref-type="bibr" rid="B33">Noyes et al., 2020</xref>; <xref ref-type="bibr" rid="B48">Uhl et al., 2022</xref>), and covariational reasoning (<xref ref-type="bibr" rid="B39">Scott et al., 2022</xref>). Due to their value and expanded use, it is increasingly important for assessment developers and researchers to have methods to carefully and quantitatively examine the language within CRs. 
Such methods could allow for comparison of expert and novice language, determine whether substantial differences in student language occur due to instruction, region, or institutional type, or help examine bias in written assessments. Unfortunately, quantitative methods of examining and comparing the words within corpora of short texts, such as CRs, are limited.</p>
</sec>
<sec id="S1.SS2">
<title>1.2. Current methods of written language analysis and their limitations</title>
<p>Text analysis falls into two major categories: qualitative and quantitative. For qualitative text analysis, researchers typically use &#x201C;coding,&#x201D; in which expert coders categorize &#x201C;the text in order to establish a framework of thematic ideas about it&#x201D; (p. 38; <xref ref-type="bibr" rid="B12">Gibbs, 2007</xref>). Coding is the most common approach for qualitative analysis in content based CRs in STEM, as it gives insight into student thinking by examining student produced text or words. In previous work with CRs, coding has reflected various frameworks in STEM, including cognitive models such as learning progressions (<xref ref-type="bibr" rid="B18">Jescovitch et al., 2021</xref>; <xref ref-type="bibr" rid="B39">Scott et al., 2022</xref>), the use of scientific skills (<xref ref-type="bibr" rid="B49">Uhl et al., 2021</xref>; <xref ref-type="bibr" rid="B60">Zhai et al., 2022</xref>), or the presence of key conceptual ideas (<xref ref-type="bibr" rid="B31">Nehm and Schonfeld, 2008</xref>, <xref ref-type="bibr" rid="B44">Sripathi et al., 2019</xref>; <xref ref-type="bibr" rid="B33">Noyes et al., 2020</xref>). Qualitative coding can be done by reading the responses or using text mining programs that use computer-based dictionaries and natural language processing to pull out themes from the text. Through these qualitative methods, researchers often observe words or phrases that are associated with the coding of the text. These observations can often be statistically supported using quantitative analysis. Quantitative text analysis is typically performed <italic>via</italic> content or dictionary analysis, in which the text is reduced to word and phrase frequency lists that can be examined and/or compared between CRs or groupings of the CRs that are based on the qualitative coding. These types of analyses can be useful; however, these approaches do not examine the CRs holistically or examine the diversity of language used. 
While dictionary analysis allows for comparison of individual words or phrases between groups, this analysis seems overly reductive, since human coders typically interpret words and phrases as part of the overall response. To address this gap, machine learning and natural language processing have also been used to better analyze texts for meaning (<xref ref-type="bibr" rid="B5">Boumans and Trilling, 2016</xref>). One approach currently used in text analysis to holistically examine language is latent semantic analysis (LSA). LSA uses natural language processing and machine learning to compare the language in different texts to each other based on the words within the texts (<xref ref-type="bibr" rid="B9">Deerwester et al., 1990</xref>; <xref ref-type="bibr" rid="B24">Landauer and Psotka, 2000</xref>). While this method and others related to it have been used to help identify themes in CRs (<xref ref-type="bibr" rid="B44">Sripathi et al., 2019</xref>) and even in the creation of computer scoring models for automated analysis of student thinking (<xref ref-type="bibr" rid="B26">LaVoie et al., 2020</xref>), their purpose is to identify meaning or common topics in the text. The identified themes or topics must then be interpreted for relevance by an expert in the domain. In contrast, we are interested in comparing and quantifying the diversity of words students use in written explanations.</p>
<p>Our interest in comparing the words students use could also be approached through lexical diversity, which measures the range of words in a given text, with high lexical diversity values indicating more varied language (<xref ref-type="bibr" rid="B17">Jarvis, 2013</xref>). Many lexical diversity measures, most commonly the type-token ratio (TTR) and its derivatives, calculate the proportion of words in a text that are unique. These measures are helpful predictors of linguistic traits, including vocabulary and language proficiency (<xref ref-type="bibr" rid="B27">Malvern et al., 2004</xref>; <xref ref-type="bibr" rid="B51">Voleti et al., 2020</xref>). Unfortunately, these lexical diversity measures are poorly suited to CRs, as many are sensitive to text length and cannot be applied to texts under 100 words (<xref ref-type="bibr" rid="B47">Tweedie and Baayen, 1998</xref>; <xref ref-type="bibr" rid="B22">Koizumi, 2012</xref>; <xref ref-type="bibr" rid="B7">Choi and Jeong, 2016</xref>). Although some lexical diversity measures, such as MATTR (<xref ref-type="bibr" rid="B8">Covington and McFall, 2010</xref>; <xref ref-type="bibr" rid="B59">Zenker and Kyle, 2021</xref>), allow use of shorter texts of 50&#x2013;100 words, content-based CRs in STEM are frequently as short as 25&#x2013;35 words (<xref ref-type="bibr" rid="B15">Haudek et al., 2012</xref>; <xref ref-type="bibr" rid="B41">Shiroda et al., 2021</xref>). Beyond the length requirement, we find these lexical measures somewhat lacking for our intended use in that they do not present a full picture of diversity, as they only measure the repetition of words within a single response. In contrast to linguistics, in which repetition often indicates language proficiency, word repetition is not necessarily indicative of proficiency in STEM assessments. This could be especially true when considering the importance of discipline-specific language, which restricts word choice. 
In particular, we are interested in holistically comparing responses to one another based on word frequency. Such an approach could be used to determine if certain variables (e.g., question prompt, timing) are associated with more similar or varied language in student CRs.</p>
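<p>The length sensitivity of TTR can be illustrated with a minimal sketch. It assumes simple whitespace tokenization and a hypothetical example sentence; the function names are illustrative, and falling back to plain TTR for texts no longer than the window is an assumption of this sketch rather than part of any published MATTR definition.</p>
<preformat><![CDATA[
```python
def type_token_ratio(tokens):
    """Proportion of tokens that are distinct types (unique words)."""
    return len(set(tokens)) / len(tokens)


def mattr(tokens, window):
    """Moving-average TTR: mean TTR over every window of `window` tokens.

    Assumption of this sketch: texts no longer than the window fall back
    to the plain TTR of the whole text.
    """
    if len(tokens) <= window:
        return type_token_ratio(tokens)
    ratios = [
        type_token_ratio(tokens[start:start + window])
        for start in range(len(tokens) - window + 1)
    ]
    return sum(ratios) / len(ratios)


# Hypothetical response, and the same vocabulary repeated three times.
short = "the mass leaves the body as carbon dioxide".split()
longer = short * 3

print(type_token_ratio(short))    # TTR of the 8-token text
print(type_token_ratio(longer))   # TTR drops although no new words appear
print(mattr(longer, window=8))    # windowed average is stable across length
```
]]></preformat>
<p>In this sketch the plain TTR falls as the text grows even though the vocabulary is unchanged, whereas the windowed average does not. A window is only meaningful when the text is longer than the window, which is the constraint that short CRs often fail to meet.</p>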
<p>Quantifying such diversity between two CRs or within a group of CRs is more similar to measures of ecological diversity than any current form of text analysis. Indeed, <xref ref-type="bibr" rid="B17">Jarvis (2013)</xref> previously compared lexical diversity to ecological diversity (ED) approaches and proposed applying ecological definitions and practices to texts. Within his work, Jarvis comments, &#x201C;Both fields view diversity as a matter of complexity, but ecologists have gone much further in modeling and developing measures for the different aspects of that complexity. Ecologists have also held to a literal and intuitive understanding of diversity, and this has resulted in a highly developed, intricate picture of what diversity entails.&#x201D; (p. 99; 2013). Indeed, ED metrics quantify not only diversity within a sample but between samples within data sets. Further, ecologists also commonly use a data reduction technique called ordination to explore data sets and test hypotheses. To our knowledge, this idea of applying ecological methods to language has never been empirically tested and its application to a corpus of short, content rich CRs is novel.</p>
</sec>
<sec id="S1.SS3">
<title>1.3. Ecological diversity metrics</title>
<p>Robert Whittaker articulated three diversity metrics that are now central to ecology: alpha, gamma, and beta diversity (<xref ref-type="fig" rid="F1">Figure 1A</xref>, <xref ref-type="bibr" rid="B55">Whittaker, 1972</xref>). Alpha (&#x03B1; or species richness) diversity is the count of the number of species in a sample. This idea is similar to counting unique words (also called Types in lexical diversity) in a CR. For example, as shown in <xref ref-type="fig" rid="F1">Figure 1A</xref>, Sample A has a higher alpha than Sample B. Both samples have four individuals, but all four in Sample A are unique, while Sample B has three of the same species. Gamma (&#x03B3;) is the count of the total number of species in a pair or set of samples, similar to the total words (also called Tokens in lexical diversity) in a CR. Beta diversity (&#x03B2;) compares the species occurrences between samples (<xref ref-type="bibr" rid="B53">Whittaker, 1967</xref>, <xref ref-type="bibr" rid="B54">1969</xref>) and does not have an equivalent in lexical diversity or text analysis. Whittaker&#x2019;s &#x03B2;, which is derived from &#x03B3; and &#x03B1;, is the simplest calculation of &#x03B2; diversity; however, other metrics can be used to represent this kind of relatedness, including absolute species turnover (<xref ref-type="bibr" rid="B46">Tuomisto, 2010</xref>; <xref ref-type="bibr" rid="B28">McCune and Mefford, 2018</xref>). The species turnover measure uses presence-absence data of species in samples and is considered a better indicator of relatedness than &#x03B2;, as &#x03B2; can be heavily affected by rare species (<xref ref-type="bibr" rid="B50">Vellend, 2001</xref>; <xref ref-type="bibr" rid="B25">Lande, 1996</xref>). Another method of comparing two or more samples is using dissimilarity measures, such as Bray&#x2013;Curtis dissimilarity (<xref ref-type="bibr" rid="B6">Bray and Curtis, 1957</xref>). This is calculated by comparing the abundance of each species across two samples. 
While these measures may appear redundant, each can be biased in different ways (<xref ref-type="bibr" rid="B38">Roswell et al., 2021</xref>). Examining a collection of diversity metrics results in a more equitable description of the data, in much the same way that mean, median, and mode all offer different values for a measure of central tendency (<xref ref-type="bibr" rid="B58">Zelen&#x00FD;, 2021</xref>).</p>
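<p>As a minimal sketch of how these between-sample metrics translate to text, the following treats each response as a sample and each word as a species. Several choices here are assumptions for illustration only: whitespace tokenization, the two hypothetical responses, the multiplicative form of Whittaker&#x2019;s &#x03B2; (&#x03B3; divided by mean &#x03B1;; a variant that subtracts 1 is also in common use), and the standard abundance-based form of Bray&#x2013;Curtis dissimilarity.</p>
<preformat><![CDATA[
```python
from collections import Counter


def word_counts(response):
    """Word frequencies for one response (assumes whitespace tokenization)."""
    return Counter(response.lower().split())


def alpha(counts):
    """Alpha: number of unique words (species) in one response (sample)."""
    return len(counts)


def gamma(count_list):
    """Gamma: number of unique words across a set of responses."""
    return len(set().union(*count_list))


def whittaker_beta(count_list):
    """Whittaker's beta as gamma / mean alpha (multiplicative form)."""
    mean_alpha = sum(alpha(c) for c in count_list) / len(count_list)
    return gamma(count_list) / mean_alpha


def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity: 0 = identical counts, 1 = no shared words."""
    words = set(counts_a) | set(counts_b)
    shared = sum(min(counts_a[w], counts_b[w]) for w in words)
    total = sum(counts_a.values()) + sum(counts_b.values())
    return 1 - 2 * shared / total


# Two hypothetical responses, used only to exercise the functions.
a = word_counts("the mass was exhaled as carbon dioxide")
b = word_counts("the mass was burned as energy")

print(alpha(a), alpha(b), gamma([a, b]))
print(whittaker_beta([a, b]))
print(bray_curtis(a, b))
```
]]></preformat>
<p>Because Bray&#x2013;Curtis dissimilarity uses word counts while &#x03B2; uses only presence and absence, the two can diverge once words are repeated within a response.</p>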
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption><p>Schematics of ecological diversity terms. <bold>(A)</bold> For ecological diversity, three samples (open circles) are shown with differing numbers of individuals, representing a different species (filled shapes). Alpha values are given for each sample, and beta values are given for each pairing and the overall data set. Example calculations are provided for beta between Sample A and B and the data set overall. <bold>(B)</bold> For language applications, responses are compared instead of samples, while words are treated as individuals. Repeated words are equivalent to being the same species. While only single sentences are shown here, our data set contains many CRs that contain more than one sentence that are still treated as single samples. Alpha values are given for each response, and beta values are given for each pairing and the overall data set. Example calculations are provided for beta between response A and B and the data set overall.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="feduc-08-989836-g001.tif"/>
</fig>
<p>In addition to comparing species between samples, other measures examine the diversity of individual communities or samples. These types of measures include Evenness (E), Shannon&#x2019;s diversity index (H&#x2019;; <xref ref-type="bibr" rid="B40">Shannon, 1948</xref>) and Simpson&#x2019;s diversity index (D; <xref ref-type="bibr" rid="B43">Simpson, 1949</xref>). Evenness describes the proportional abundance of species across a given sample and indicates if a sample is dominated by one or a few species. Just as Whittaker&#x2019;s &#x03B2;, species turnover and Bray&#x2013;Curtis Dissimilarity offer alternative ways to compare samples, H&#x2019; and D both represent the diversity of a single community or sample but are calculated slightly differently. H&#x2019; represents the uncertainty in predicting the species of a <italic>single</italic> randomly selected individual, while D is the probability that two randomly selected individuals belong to the same species. Each measure has potential biases associated with it, resulting in most researchers examining both metrics for a clearer picture of the data (<xref ref-type="bibr" rid="B58">Zelen&#x00FD;, 2021</xref>).</p>
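<p>A minimal sketch of these within-sample measures, applied to words in a single response, is given here. The specific formulas are assumptions, since several variants exist: H&#x2019; uses the natural logarithm, D is the sum of squared proportions (sampling with replacement; it is often reported as 1 &#x2212; D), and Evenness is taken as Pielou&#x2019;s ratio of H&#x2019; to its maximum. The example responses are hypothetical.</p>
<preformat><![CDATA[
```python
import math
from collections import Counter


def proportions(response):
    """Relative frequency of each unique word in one response."""
    counts = Counter(response.lower().split())
    total = sum(counts.values())
    return [n / total for n in counts.values()]


def shannon(response):
    """Shannon's H': uncertainty about the identity of one random word."""
    return -sum(p * math.log(p) for p in proportions(response))


def simpson(response):
    """Simpson's D: chance that two random draws (with replacement) match."""
    return sum(p * p for p in proportions(response))


def evenness(response):
    """Pielou's evenness: H' divided by its maximum, ln(unique words).

    Undefined for a single unique word; this sketch returns 0.0 there.
    """
    props = proportions(response)
    if len(props) < 2:
        return 0.0
    return shannon(response) / math.log(len(props))


# Hypothetical responses: one repeats a word, the other does not.
repeated = "energy is released as energy"
unique = "fat is converted to energy"

for text in (repeated, unique):
    print(text, shannon(text), simpson(text), evenness(text))
```
]]></preformat>
<p>In this sketch the response with a repeated word has the lower H&#x2019; and the higher D, so the two indices move in opposite directions as diversity increases.</p>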
</sec>
<sec id="S1.SS4">
<title>1.4. Ecological diversity visualization</title>
<p>In addition to diversity metrics, ecological studies also apply ordination methods to visualize and extract patterns from complex data (<xref ref-type="bibr" rid="B10">Gauch, 1982</xref>; <xref ref-type="bibr" rid="B45">Syms, 2008</xref>; <xref ref-type="bibr" rid="B34">Palmer, n.d.</xref>). Ordination methods use dimension reduction to project multivariate data into two or three dimensions that can be visualized in a map-like graph. This technique arranges samples with greater similarity more closely to each other as points in the graph, while samples with lower similarity are further apart. These ordination methods are often used in combination with ED metrics as the ordination techniques provide unique benefits. First, diversity is sufficiently complex that no individual measure, or even a collection of measures, fully captures it. <xref ref-type="bibr" rid="B19">Jost (2006)</xref> said, &#x201C;a diversity index itself is not necessarily a &#x2018;diversity.&#x2019; The radius of a sphere is an index of its volume but is not itself the volume and using the radius in place of the volume in engineering equations will give dangerously misleading results&#x201D; (p. 363). Ordination summarizes diversity differently from ED metrics, by extracting patterns while accounting for as much variation in the data as possible. Second, extracting and prioritizing patterns that best explain the data focuses researchers on the most important patterns, allowing them to ignore noise in the data. Ecologists have found that even if ordinations result in a low percentage of variance in the data being explained, the ordinations are still meaningful and, more importantly, provide insight into the system being studied (<xref ref-type="bibr" rid="B13">Goodrich et al., 2014</xref>). Third, different patterns can be observed when a data set is examined holistically as opposed to examination of categorical sub-groups. 
In comparison, ED metrics need to be calculated by defining subsets of the data to obtain a single value for categorical data, while ordination analysis is performed on the entire data set and categorical data is overlaid. Finally, ordination results in an intuitive graph whose patterns can be more easily interpreted to better understand communities and how they relate to each other. For these reasons, ordination is used in diverse fields including image analysis, psychology, education research, and text analysis. Within education research, <xref ref-type="bibr" rid="B14">Graesser et al. (2011)</xref> used ordination to examine attributes of long texts in order to curate reading assignments for students. <xref ref-type="bibr" rid="B4">Borges et al. (2018)</xref> proposed the use of ordination to predict student performance and gain understanding of important student attributes, while another group used ordination to create models to evaluate teacher quality (<xref ref-type="bibr" rid="B42">Si, 2006</xref>; <xref ref-type="bibr" rid="B56">Xian et al., 2016</xref>).</p>
<p>For any of these applications, a data matrix is created that contains the objects of interest as rows and their attributes as columns. In ecological work, the rows of the matrix are samples and the columns are the species recorded in these samples (<xref ref-type="fig" rid="F2">Figure 2A</xref>). The species in each row are compared for every pair in the matrix, resulting in a pairwise comparison of the entire matrix. The resulting distance or similarity values are a necessary prerequisite for distance-based ordination methods [e.g., principal coordinate analysis (PCoA)] and eigen analysis-based methods [e.g., detrended correspondence analysis (DCA)], both of which we use in this work. The patterns found in these data are used to create a map-like visualization that projects the distances or similarities between samples in two or three dimensions. While the idea of ordination is maintained, different methods of ordination vary in how they work. Each has its own strengths and weaknesses; therefore, it is common in ecology to apply multiple ordination methods in order to strengthen the conclusions made <italic>via</italic> one method. Selection between the different methods is based on the overarching question being investigated, the qualities of the data matrix, and the advantages or disadvantages of each method (<xref ref-type="bibr" rid="B36">Peck, 2010</xref>; <xref ref-type="bibr" rid="B28">McCune and Mefford, 2018</xref>; <xref ref-type="bibr" rid="B35">Palmer, 2019</xref>). Ordination methods fall into two general categories: indirect (unconstrained) and direct (constrained) methods (<xref ref-type="bibr" rid="B45">Syms, 2008</xref>). Indirect ordination is used to explore data for patterns from a species matrix (described above), while direct ordination is used to test if patterns in the species matrix are attributable to a secondary matrix of data (measured environmental factors associated with samples). 
In general, indirect ordination is considered exploratory and is used to generate hypotheses, while direct ordination is confirmatory and used to test hypotheses. Since we want to use ordination methods to explore our data set, we selected only indirect methods of ordination. When selecting a specific ordination method, it is important to recognize the limitations of the method and the data itself. For example, many ordination methods, including Principal Component Analysis (PCA) and Non-metric multidimensional scaling (NMDS), do not handle high numbers of zeros in the data set well (<xref ref-type="bibr" rid="B36">Peck, 2010</xref>). However, high-zero data exists in many instances, and methods exist to circumvent this limitation, including DCA and PCoA.</p>
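<p>The distance-based route to an ordination can be sketched in a few steps: square the pairwise distances, double-center the result, and take the leading eigenvector as the first axis. The sketch here is a deliberately small, pure-Python illustration of that idea for a single PCoA axis and is not the software used in this work. It uses power iteration in place of a full eigendecomposition and assumes the dominant eigenvalue is positive, which holds for the toy Euclidean distances shown but is not guaranteed for every dissimilarity measure. The function name and the toy distances are hypothetical.</p>
<preformat><![CDATA[
```python
import math


def pcoa_first_axis(dist, iterations=200):
    """Coordinates of each sample on the first principal coordinate.

    `dist` is a symmetric list-of-lists matrix of pairwise distances.
    Assumes the dominant eigenvalue of the centered matrix is positive.
    """
    n = len(dist)
    # Step 1: transform distances to A = -0.5 * d^2.
    a = [[-0.5 * dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    # Step 2: double-center A (subtract row and column means, add grand mean).
    row_mean = [sum(row) / n for row in a]
    grand_mean = sum(row_mean) / n
    b = [
        [a[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]
    # Step 3: power iteration for the dominant eigenvector of B.
    v = [float(i + 1) for i in range(n)]  # arbitrary non-constant start
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
        eigenvalue = norm
    # Step 4: scale the unit eigenvector by sqrt(eigenvalue).
    return [x * math.sqrt(eigenvalue) for x in v]


# Toy distances between three samples that lie on a line at 0, 1 and 5.
toy = [
    [0.0, 1.0, 5.0],
    [1.0, 0.0, 4.0],
    [5.0, 4.0, 0.0],
]
coords = pcoa_first_axis(toy)
print(coords)  # centered coordinates; the sign of the axis is arbitrary
```
]]></preformat>
<p>For these toy distances a single axis reproduces the original spacing between samples. With real word-frequency data more than one axis is needed, and an established ordination package would normally be used.</p>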
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption><p>Sample matrices. <bold>(A)</bold> For ecological data matrices, samples are rows, while species are columns. Values in individual cells are the frequency of the given species in the sample. <bold>(B)</bold> In this example, each response is a row, while each word is a column. Values in cells are the frequency of a word within the response.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="feduc-08-989836-g002.tif"/>
</fig>
</sec>
<sec id="S1.SS5">
<title>1.5. Applying ecological methods to language analysis and its potential benefits</title>
<p>Addressing the challenge of language analysis and comparisons for short texts, we propose applying ecological methods of diversity analysis to a corpus of CRs, in which each individual response is equivalent to a sample, and each word is analogous to a species within that sample. In <xref ref-type="fig" rid="F1">Figure 1B</xref>, each response is a single sentence; however, in our data set, CRs can range from one word to multiple sentences. They are still counted as a single CR. Similarly, for each of the measures described above, we substitute the species with unique words in a single CR. With this application, &#x03B1; is the count of unique words in a CR and &#x03B3; is the total abundance of words in a pair or larger grouping of responses. &#x03B2; diversity reflects differences in word inclusion between two responses (<xref ref-type="fig" rid="F1">Figure 1B</xref>). H&#x2019; and D are similar to the lexical diversity measures (e.g., TTR and its derivatives) described above. However, in contrast, H&#x2019; and D do not have specific cutoffs for their use with smaller sample sizes (i.e., number of words in a CR). Low alpha data sets are common in ecology as some environments do not support a large variety of species (e.g., <xref ref-type="bibr" rid="B38">Roswell et al., 2021</xref>). Similarly, it is common to observe large differences in &#x03B1; within ecological samples. These differences are often accounted for using a standardization method, such as equalizing effort, sample size or coverage. In this work, we are using an equalizing effort approach in that each student was presented the same opportunity (assessment item and online text box) to supply their CR (sample). However, it is important to note that ED metrics are still sensitive to &#x03B1; as many are calculated using &#x03B1; either directly or indirectly. They should therefore be interpreted carefully if there are stark differences in &#x03B1;. 
In addition to offering a solution to the length requirement of lexical diversity measures, Whittaker&#x2019;s &#x03B2;, species turnover, and Bray&#x2013;Curtis Dissimilarity allow holistic comparison of the CRs to each other in a way that no current text analysis methods do.</p>
<p>Ordination methods add to this holistic comparison by visualizing language differences in the CR corpus. To accomplish this, each CR is a row in our matrix, each column is a word, and each cell holds the frequency of that word in the CR, similar to a term-document matrix in text analyses (<xref ref-type="fig" rid="F2">Figure 2B</xref>). The nature of a large corpus of CRs results in a high number of zeros as the majority of words are used infrequently, resulting in a sparse data set. The high percentage of zeros results in a non-normal distribution of the data, restricting the ordination methods that can be used. However, these types of data sets are increasingly common in microbial diversity studies, which have established best practices for sparse data sets, including Principal Coordinate Analysis (PCoA). We elected to use this method because it is most commonly used for sparse data but note one potential drawback in its utility for language diversity in comparison to an ecological study. PCoA ignores zero-zero pairs (when two separate rows being compared each have matching zero values). In ecology, zeros can mean that a species was not detected or that the species is truly not present, making it, in a way, favorable to ignore them. In comparison, with language a zero represents a known absence, and this absence can be as important as its presence. To ensure ignoring zero-zero pairs does not drastically change the observed patterns, we also applied another ordination approach. DCA is one of the most widely used methods in ecology (<xref ref-type="bibr" rid="B35">Palmer, 2019</xref>, <xref ref-type="bibr" rid="B34">Palmer, n.d.</xref>). This method is a type of Correspondence Analysis (CA) that reduces the dimensionality of a data set with categorical data. In addition to handling sparse data, this method has an additional benefit for our purposes as the <italic>x</italic>-axis is uniquely scaled in beta-diversity units, which allows users to calculate species turnover. 
DCA and PCoA complement each other, and agreement between the two strengthens the conclusions drawn from either. These approaches to diversity are similar to other text analysis techniques, including LSA, whose results can also be visualized using ordination. An important difference is that DCA and PCoA do not attempt to extract meaning from the texts and instead compare and contrast responses based solely on word frequencies without any weighting or dictionaries. This distinction is important to our goals because we are interested in measuring language diversity, not meaning.</p>
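<p>The response-by-word matrix and its sparsity can be sketched as follows. This is a minimal illustration that assumes whitespace tokenization with lowercasing and uses three hypothetical responses. Any preprocessing applied to the actual corpus is not represented here.</p>
<preformat><![CDATA[
```python
from collections import Counter


def build_matrix(responses):
    """Return (vocabulary, matrix) with one row per response.

    Each cell holds the frequency of a word in a response; columns follow
    the alphabetically sorted vocabulary.
    """
    counts = [Counter(r.lower().split()) for r in responses]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, matrix


def zero_fraction(matrix):
    """Share of cells equal to zero, a simple measure of sparsity."""
    cells = [value for row in matrix for value in row]
    return cells.count(0) / len(cells)


# Hypothetical responses, used only to show the shape of the matrix.
responses = [
    "the mass was exhaled",
    "the fat was burned",
    "mass became energy",
]
vocabulary, matrix = build_matrix(responses)

print(vocabulary)
for row in matrix:
    print(row)
print(zero_fraction(matrix))
```
]]></preformat>
<p>Even with three short responses, more than half of the cells in this sketch are zero. The proportion grows with corpus size because most words appear in only a few responses.</p>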
<p>Finally, in addition to the methods themselves, we appreciate the approach of ecology in interpreting diversity. Specifically, each metric is treated as a single view of the diversity, meaning that interpretation of diversity is done by taking into account each measure to provide a more comprehensive picture (<xref ref-type="bibr" rid="B19">Jost, 2006</xref>). This multifaceted approach will allow for full appreciation of the diversity of language students use in STEM CRs and will be more likely to reveal differences observed based on categorical data.</p>
</sec>
<sec id="S1.SS6">
<title>1.6. Present study</title>
<p>To test the application of ecological methods in analysis of short CRs, we utilized a corpus of 418 explanatory CRs collected from undergraduates that explore student understanding of the Pathways and Transformations of Energy and Matter (<xref ref-type="bibr" rid="B1">American Association for the Advancement of Science, 2011</xref>) within the context of human weight loss. The question asks &#x201C;You have a friend who lost 15 pounds on a diet. Where did the mass go?&#x201D; We chose this data set as we have worked heavily with it and are very familiar with the language within the student CRs. Additionally, this corpus has three types of categorical data that can be used to test the method&#x2019;s ability to find differences in the corpus based on word usage, as we have expectations about which categories are likely to have different language. First, the CRs were previously coded for the presence or absence of seven ideas, categorized as normative (correct) or non-normative (na&#x00EF;ve) (<xref ref-type="table" rid="T1">Table 1</xref>; <xref ref-type="bibr" rid="B44">Sripathi et al., 2019</xref>). Using the presence and absence of these ideas, the CRs can be further categorized into Developing, Mixed, or Scientific Thinking (<xref ref-type="bibr" rid="B44">Sripathi et al., 2019</xref>). We expect this categorization to result in the greatest difference in language, as the language in the CRs should directly reflect the ideas expressed by students. In addition, these CRs were collected before and after an online tutorial on cellular respiration (Timing), and from three different institutional Types (<xref ref-type="bibr" rid="B41">Shiroda et al., 2021</xref>; <xref ref-type="bibr" rid="B49">Uhl et al., 2021</xref>). We have previously found that student performance was affected by engaging with the tutorial (<xref ref-type="bibr" rid="B49">Uhl et al., 2021</xref>) and therefore expect some differences in language to be observed based on Timing.
In previous work, we did not observe striking differences in student ideas based on the institutional type [i.e., Research Intensive Colleges and Universities (RICUs), Primarily Undergraduate Institutions (PUIs), and Two Year Colleges (TYCs)]; therefore, we expect these categories to result in the lowest language differences in this analysis.</p>
<table-wrap position="float" id="T1">
<label>TABLE 1</label>
<caption><p>Coding rubric and description.</p></caption>
<table cellspacing="5" cellpadding="5" frame="box" rules="all">
<thead>
<tr>
<td valign="top" align="left" style="color:#ffffff;background-color: #7f8080;">Rubric idea</td>
<td valign="top" align="center" style="color:#ffffff;background-color: #7f8080;">Brief description</td>
<td valign="top" align="center" style="color:#ffffff;background-color: #7f8080;">Example responses</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">Correct Products<xref ref-type="table-fn" rid="t1fns1"><sup>N</sup></xref></td>
<td valign="top" align="center">Responses in this category include the idea that the products of cellular respiration, primarily carbon dioxide in any form, are the result of mass loss.</td>
<td valign="top" align="center">The mass went to <underline>water and CO<sub>2</sub></underline>.</td>
</tr>
<tr>
<td valign="top" align="left">Exhalation<xref ref-type="table-fn" rid="t1fns1"><sup>N</sup></xref></td>
<td valign="top" align="center">Responses in this category include the idea that excess mass is exhaled or exits the body.</td>
<td valign="top" align="center">As glucose was burned off the mass was also <underline>shed in the form of CO<sub>2</sub></underline> and H<sub>2</sub>O (sweat)</td>
</tr>
<tr>
<td valign="top" align="left">Molecular Mechanism<xref ref-type="table-fn" rid="t1fns1"><sup>N</sup></xref></td>
<td valign="top" align="center">Responses in this category include the idea that mass loss occurs due to correct molecular processes (e.g., cellular metabolism, beta oxidation), or describe these processes in specific detail.</td>
<td valign="top" align="center">That mass was broken down into energy that was used through <underline>cellular respiration</underline>.</td>
</tr>
<tr>
<td valign="top" align="left">General Metabolism<xref ref-type="table-fn" rid="t1fns1"><sup>NN</sup></xref></td>
<td valign="top" align="center">Responses in this category include the idea that mass loss occurs due to some kind of molecular conversion, even if it is only partially correct.</td>
<td valign="top" align="center"><underline>Fats are converted into glucose, glucose is then broken down into energy and CO<sub>2</sub></underline>, which then get expelled when you breathe.</td>
</tr>
<tr>
<td valign="top" align="left">Matter to Energy<xref ref-type="table-fn" rid="t1fns1"><sup>NN</sup></xref></td>
<td valign="top" align="center">Responses in this category include the idea that mass loss occurs through vague conversions from matter to energy.</td>
<td valign="top" align="center">Because the friend is not taking in as much as they had before, <underline>the body turned the mass into energy</underline> to do work.</td>
</tr>
<tr>
<td valign="top" align="left">Excretion<xref ref-type="table-fn" rid="t1fns1"><sup>NN</sup></xref></td>
<td valign="top" align="center">Responses in this category state that the mass is excreted out of the body. Responses must specifically indicate the physiological process of excretion by explicitly using the term &#x201C;excreted&#x201D; or similar or indicating physiological waste (i.e., sweat, feces or urine) in their responses.</td>
<td valign="top" align="center">I think the friend must have <underline>gone to the bathroom and either pooped or peed it out.</underline></td>
</tr>
<tr>
<td valign="top" align="left">How to Lose Weight<xref ref-type="table-fn" rid="t1fns1"><sup>NN</sup></xref></td>
<td valign="top" align="center">Responses in this category include ideas about societal discussions of weight loss, such as &#x201C;calories in&#x201D; greater than &#x201C;calories out&#x201D; or exercise.</td>
<td valign="top" align="center">It was lost due <underline>to a lower caloric intake</underline>.</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="t1fns1"><p>Rubric ideas are marked with superscript to denote if ideas are normative (N) or non-normative (NN). These ideas are used to categorize CRs into Thinking categories. Developing Thinking responses contain one or more non-normative ideas and no normative ones. Scientific responses contain one or more normative ideas and no non-normative ideas. Mixed responses contain at least one normative and at least one non-normative idea. All categories can occur in the same response with the exception of Molecular Mechanism and General Metabolism. Molecular Mechanism is coded instead of both. Example responses are provided with the important words or phrases for that idea underlined. Spelling is corrected for clarity.</p></fn>
</table-wrap-foot>
</table-wrap>
<p>In this paper, we apply common text analysis techniques to support our expectations that these three categorizations (Thinking, Timing, and Types) have varying amounts of difference in student language. Next, we outline the various methods and ED measures we applied to examine differences in short texts and demonstrate which ED methods reflect the differences in the categorical data to support their use in the analysis of short texts.</p>
</sec>
</sec>
<sec id="S2" sec-type="materials|methods">
<title>2. Materials and methods</title>
<sec id="S2.SS1">
<title>2.1. Constructed response (CR) corpus collection and description</title>
<p>CRs were collected in collaboration with the SimBiotic Company as described by <xref ref-type="bibr" rid="B49">Uhl et al. (2021)</xref>. Subsequently, <xref ref-type="bibr" rid="B41">Shiroda et al. (2021)</xref> examined a subset of 418 student responses. These studies were considered exempt by an institutional review board (x10&#x2013;577). Briefly, college students enrolled in biology courses were asked to write a response to the prompt &#x201C;You have a friend who lost 15 pounds on a diet. Where did the mass go?&#x201D; in an online system. The subset of CRs used by <xref ref-type="bibr" rid="B41">Shiroda et al. (2021)</xref> and in this study are from 239 students from 19 colleges and universities across the USA. <xref ref-type="bibr" rid="B41">Shiroda et al. (2021)</xref> grouped the colleges and universities into three general categories of institutional type: Two Year Colleges (TYCs; <italic>n</italic> = 137), Primarily Undergraduate Institutions (PUIs; <italic>n</italic> = 142), and Research-Intensive Colleges and Universities (RICUs; <italic>n</italic> = 139). This information is reflected in the categorical data as <italic>Type</italic>. Students answered the prompt both before (<italic>n</italic> = 205) and after (<italic>n</italic> = 213) completing an online tutorial on cellular respiration. This information is reflected in the categorical data as <italic>Timing</italic>. For this study, we required that each response had at least one idea assigned to it (described below) to be included in the study. Therefore, student responses are not paired pre- and post-tutorial.</p>
<p>As part of previous work, <xref ref-type="bibr" rid="B41">Shiroda et al. (2021)</xref> coded these CRs using a rubric previously described by <xref ref-type="bibr" rid="B44">Sripathi et al. (2019</xref>; <xref ref-type="table" rid="T1">Table 1</xref>). Each response is dichotomously scored for each of the seven ideas, to indicate the presence (1) or absence (0) of the underlying idea in the rubric (described below). Briefly, a previous study validated ideas predicted for each response using a machine-learning model. As part of that validation process, an expert (MS) with a Ph.D. in biology independently assigned ideas using the rubric for the full set of 418 responses. Human and computer assigned ideas were then compared; any disagreements between human and computer ideas were examined by a second coder (KH) with a Ph.D. in biology. The two human coders discussed all human-human disagreements until agreement was met between the two human coders. The full coding procedure and validation are detailed further in <xref ref-type="bibr" rid="B41">Shiroda et al. (2021)</xref>. This produced a data set with each response having values for seven ideas (i.e., a zero or one for each of seven ideas).</p>
<p>The applied rubric targets seven common ideas used by college students in response to the assessment item: <underline>Correct Molecular Products</underline> (carbon dioxide and water), physiological <underline>Exhalation</underline> (the weight leaves the body <italic>via</italic> exhalation in the form of carbon dioxide and water), and <underline>Molecular Mechanism</underline> (cellular respiration), <italic>General Metabolism, Matter Converted to Energy, How to Lose Weight</italic>, and <italic>Excretion</italic> (described further in <xref ref-type="table" rid="T1">Table 1</xref>). The first three ideas (underlined) are normative or scientific. The last four (italics) are non-normative or na&#x00EF;ve ideas, in that they are not a part of an expert answer (<xref ref-type="bibr" rid="B44">Sripathi et al., 2019</xref>). All ideas can co-occur within the same answer, except General Metabolism and Molecular Mechanism. Molecular Mechanism is more specific than General Metabolism; therefore, Molecular Mechanism is coded in preference to General Metabolism if they both occur in the same CR.</p>
<p>Using these seven ideas, CRs were further categorized into one of three exclusive Thinking groups (Developing, Mixed, or Scientific) based on the inclusion of ideas associated with normative and non-normative ideas (<xref ref-type="bibr" rid="B44">Sripathi et al., 2019</xref>). This information is reflected in the categorical data as <italic>Thinking</italic>. Briefly, Developing responses contain one or more non-normative ideas and no normative ones (<italic>n</italic> = 181). Scientific responses contain one or more normative ideas and no non-normative ideas (<italic>n</italic> = 88). Mixed responses contain at least one normative and at least one non-normative idea (<italic>n</italic> = 149). Responses that have none of the seven coded ideas were not included in the study.</p>
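<p>The categorization rule above can be written as a short sketch. The idea names follow Table 1; the function name and the dictionary-based input format are hypothetical choices made for illustration.</p>
<preformat>
```python
NORMATIVE = {"Correct Products", "Exhalation", "Molecular Mechanism"}
NON_NORMATIVE = {
    "General Metabolism",
    "Matter to Energy",
    "Excretion",
    "How to Lose Weight",
}


def thinking_category(codes):
    """Map dichotomous idea codes (1 = present, 0 = absent) to a Thinking group.

    Returns None when no idea is coded; such responses were not included.
    """
    has_normative = any(codes.get(idea, 0) == 1 for idea in NORMATIVE)
    has_non_normative = any(codes.get(idea, 0) == 1 for idea in NON_NORMATIVE)
    if has_normative and has_non_normative:
        return "Mixed"
    if has_normative:
        return "Scientific"
    if has_non_normative:
        return "Developing"
    return None


print(thinking_category({"Exhalation": 1, "Matter to Energy": 1}))
```
</preformat>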
</sec>
<sec id="S2.SS2">
<title>2.2. Text analysis</title>
<p>We compared the frequencies of words between or among the categories of data (Thinking, Timing, or Type) in WordStat (v.8.0.23, 2004&#x2013;2018, Provalis Research). We used the default program settings, including a Word Exclusion list, which removes common words, and a preprocessing step of stemming (English snowball). Stemming removes the end of a word in order to mitigate the effect of different tenses, singular/plural forms, and common spelling errors. Words that have undergone stemming are noted in the text as the stemmed root with a dash (e.g., releas-). We post-processed the text to keep only words with a frequency greater than or equal to 30 in the whole data set, and a maximum of 300 words were kept based on TF-IDF. TF-IDF stands for Term Frequency&#x2013;Inverse Document Frequency and is a common statistic in text analysis used to reflect the importance of a word in a corpus. This measure weights words based on how much they are used but also accounts for those that are consistently used, meaning conjunctions and articles are not prioritized (<xref ref-type="bibr" rid="B37">Rajaraman and Ullman, 2011</xref>). In combination, these are the default settings in WordStat and are a way of focusing the results and avoiding arbitrary, unmeaningful statistical differences that arise by chance (<xref ref-type="bibr" rid="B52">Welbers et al., 2017</xref>). Significance was determined by tabulating case occurrence in each grouping and applying a Chi-square test. Words with <italic>p</italic> &#x003C; 0.05 were considered significant.</p>
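<p>As a rough illustration of a Chi-square test on case occurrence, the sketch below computes a Pearson statistic for one word from the number of responses containing it in each group. It is an assumption that this matches the computation WordStat performs internally; the function name and the counts in the example are hypothetical. The closed-form <italic>p</italic>-values cover only the two- and three-group comparisons used here.</p>
<preformat>
```python
import math


def chi_square_case_occurrence(present, totals):
    """Pearson chi-square for a 2 x k table (word present/absent by group).

    present[i] is the number of responses in group i containing the word and
    totals[i] is the group size. The word must occur in some but not all
    responses, otherwise an expected count is zero.
    """
    absent = [n - p for p, n in zip(present, totals)]
    grand_total = sum(totals)
    statistic = 0.0
    for row in (present, absent):
        row_total = sum(row)
        for observed, n in zip(row, totals):
            expected = row_total * n / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = len(totals) - 1
    if df == 1:
        p_value = math.erfc(math.sqrt(statistic / 2))  # exact for 1 d.f.
    elif df == 2:
        p_value = math.exp(-statistic / 2)  # exact for 2 d.f.
    else:
        raise ValueError("this sketch only handles two or three groups")
    return statistic, p_value


# Hypothetical counts: a word used in 10 of 100 and 30 of 100 responses.
statistic, p_value = chi_square_case_occurrence([10, 30], [100, 100])
print(statistic, p_value)
```
</preformat>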
</sec>
<sec id="S2.SS3">
<title>2.3. Calculations and ED measures</title>
<p>All ED metrics were calculated in PC-ORD (version 7.08; <xref ref-type="bibr" rid="B28">McCune and Mefford, 2018</xref>). An ecological example of these calculations is provided in <xref ref-type="fig" rid="F1">Figure 1A</xref>, while <xref ref-type="fig" rid="F1">Figure 1B</xref> provides a text example. For the analyses presented in the body of this paper, words were stemmed using Snowball (English) to limit the effect of tense. Misspellings were not corrected. No words were excluded. Other processing settings that we tried are described below. The resulting raw matrix has 418 rows (responses) and 694 columns (words).</p>
<p>Richness (S or &#x03B1;) is the number of non-zero elements in a row, or the number of unique words within a single response. Values provided for a categorical group are the averaged values for each response for the group.</p>
<p>Evenness (E) describes how equally the species (or words) in an environment (or CR) are represented. A sample that is heavily dominated by a given species or word has a low evenness (approaching 0), while a sample that has the exact same frequency of each word has an evenness of 1. For example, in <xref ref-type="fig" rid="F1">Figure 1A</xref>, samples A and C each have an evenness of 1, as every species within each sample occurs at exactly the same frequency. In contrast, sample B is more dominated by triangles, resulting in a lower evenness value. This is calculated using the following equation:</p>
<disp-formula id="S2.Ex1">
<mml:math id="M1">
<mml:mrow>
<mml:mrow>
<mml:mpadded width="+3.3pt">
<mml:mi>E</mml:mi>
</mml:mpadded>
<mml:mo rspace="5.8pt">=</mml:mo>
<mml:mfrac>
<mml:msup>
<mml:mi>H</mml:mi>
<mml:mo>&#x2032;</mml:mo>
</mml:msup>
<mml:mrow>
<mml:mi>ln</mml:mi>
<mml:mo>&#x2061;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:mi>S</mml:mi>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
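<p>A minimal sketch of Richness and Evenness for a single response is given below, with a row of word counts as input. Returning <monospace>None</monospace> when a response contains a single unique word (where ln(S) is zero) is a choice made for this sketch and may differ from how PC-ORD handles that case; the function names are hypothetical.</p>
<preformat>
```python
import math


def richness(row):
    """S: the number of non-zero elements (unique words) in a response."""
    return sum(1 for count in row if count > 0)


def shannon(row):
    """H' from the word proportions within one response."""
    total = sum(row)
    return -sum((c / total) * math.log(c / total) for c in row if c > 0)


def evenness(row):
    """E = H' / ln(S); undefined (None here) unless S is at least 2."""
    s = richness(row)
    if s > 1:
        return shannon(row) / math.log(s)
    return None


print(richness([2, 2, 2, 0]), evenness([2, 2, 2, 0]), evenness([8, 1, 1]))
```
</preformat>
<p>A response using three words equally often has an evenness of 1, whereas one dominated by a single word scores lower.</p>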
<p>Beta diversity (&#x03B2;) compares the species occurrences between samples (<xref ref-type="bibr" rid="B53">Whittaker, 1967</xref>, <xref ref-type="bibr" rid="B54">1969</xref>). A low &#x03B2; value indicates that two samples are very similar in species content, while a high &#x03B2; value indicates two samples are very different. This is calculated using the following equation (PC-ORD version 7.08; <xref ref-type="bibr" rid="B28">McCune and Mefford, 2018</xref>; <xref ref-type="fig" rid="F1">Figure 1A</xref>):</p>
<disp-formula id="S2.Ex2">
<mml:math id="M2">
<mml:mrow>
<mml:mrow>
<mml:mpadded width="+3.3pt">
<mml:mi>&#x03B2;</mml:mi>
</mml:mpadded>
<mml:mo rspace="5.8pt">=</mml:mo>
<mml:mrow>
<mml:mfrac>
<mml:mi>&#x03B3;</mml:mi>
<mml:mi>&#x03B1;</mml:mi>
</mml:mfrac>
<mml:mo>-</mml:mo>
<mml:mn>1</mml:mn>
</mml:mrow>
</mml:mrow>
<mml:mo>.</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
<p>In cases where the researcher wishes to compare &#x03B2; between three or more samples, we divide &#x03B3; by the mean of &#x03B1; for all samples. The resulting value is &#x03B2; of all samples and represents how many samples there would be if &#x03B3; and &#x03B1; per sample did not change, and all the samples share no species in common.</p>
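<p>The calculation of &#x03B2; across several samples can be sketched as follows. The sketch assumes that &#x03B3; is the total number of unique words found across all samples and that &#x03B1; is the per-sample richness; the function name is hypothetical.</p>
<preformat>
```python
def beta_diversity(matrix):
    """Beta = gamma / mean(alpha) - 1 for a matrix of samples by words."""
    alphas = [sum(1 for count in row if count > 0) for row in matrix]
    # gamma: columns (words) that are non-zero in at least one sample
    gamma = sum(1 for column in zip(*matrix) if any(c > 0 for c in column))
    mean_alpha = sum(alphas) / len(alphas)
    return gamma / mean_alpha - 1


identical = [[1, 1, 0], [1, 1, 0]]
disjoint = [[1, 1, 0, 0], [0, 0, 1, 1]]
print(beta_diversity(identical), beta_diversity(disjoint))
```
</preformat>
<p>Two samples with identical word content give a value of 0, and two samples of equal richness that share no words give a value of 1.</p>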
<p>Species turnover (also called Absolute Species Turnover or half-change) represents the amount of difference between two samples. A value of one represents 50% of the species being shared and the other 50% being unique. Ecologists often use the term &#x201C;half-change&#x201D; to describe this condition. At two half-changes, 25% of species are shared between two samples. At four half-changes, the two samples are said to essentially not share any species. In contrast to &#x03B2;, there is not a simple relationship between species turnover and S. Species turnover can still be affected by S, but the relationship between the two can be either positive or negative (<xref ref-type="bibr" rid="B57">Yuan et al., 2016</xref>). Species turnover is calculated by the formula:</p>
<disp-formula id="S2.Ex3">
<mml:math id="M3">
<mml:mrow>
<mml:mrow>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mn>1</mml:mn>
</mml:msub>
<mml:mo>-</mml:mo>
<mml:mi>c</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
<mml:mo>+</mml:mo>
<mml:mrow>
<mml:mo>(</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>s</mml:mi>
<mml:mn>2</mml:mn>
</mml:msub>
<mml:mo>-</mml:mo>
<mml:mi>c</mml:mi>
</mml:mrow>
<mml:mo>)</mml:mo>
</mml:mrow>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
<p>where s<sub>1</sub> is the number of words in the first CR, s<sub>2</sub> is the number of words in the second CR, and c is the number of words shared by both CRs (PC-ORD version 7.08; <xref ref-type="bibr" rid="B28">McCune and Mefford, 2018</xref>).</p>
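<p>The turnover formula above reduces to counting the words that are unique to each response, as in this minimal sketch. The function name and the stemmed example words are hypothetical.</p>
<preformat>
```python
def species_turnover(words_a, words_b):
    """(s1 - c) + (s2 - c): the number of words found in only one response."""
    a, b = set(words_a), set(words_b)
    shared = len(a.intersection(b))
    return (len(a) - shared) + (len(b) - shared)


# Hypothetical stemmed responses sharing the single word "mass".
print(species_turnover(["mass", "went", "water", "co2"], ["mass", "turn", "energi"]))
```
</preformat>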
<p>Bray&#x2013;Curtis dissimilarity (or Sorensen dissimilarity) is a measure of percent dissimilarity. This measure ranges from 0 to 1, with 0 indicating two samples share all the same species. It is calculated using the formula:</p>
<disp-formula id="S2.Ex4">
<mml:math id="M4">
<mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo>-</mml:mo>
<mml:mfrac>
<mml:mrow>
<mml:mn>2</mml:mn>
<mml:mo>&#x2062;</mml:mo>
<mml:mi>W</mml:mi>
</mml:mrow>
<mml:mrow>
<mml:mi>A</mml:mi>
<mml:mo>+</mml:mo>
<mml:mi>B</mml:mi>
</mml:mrow>
</mml:mfrac>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
<p>where W is the sum of shared abundances and A and B are the sums of abundances in individual responses (PC-ORD version 7.08; <xref ref-type="bibr" rid="B28">McCune and Mefford, 2018</xref>).</p>
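<p>A minimal sketch of this dissimilarity for two rows of word counts follows. It assumes that the sum of shared abundances, W, is the sum over words of the smaller of the two counts, which is the usual convention; the function name is hypothetical.</p>
<preformat>
```python
def bray_curtis(row_a, row_b):
    """1 - 2W / (A + B) for two responses with at least one word in total."""
    shared = sum(min(a, b) for a, b in zip(row_a, row_b))  # W
    total = sum(row_a) + sum(row_b)  # A + B
    return 1 - 2 * shared / total


print(bray_curtis([2, 1, 0], [1, 1, 1]))
```
</preformat>
<p>Columns in which both responses hold a zero add nothing to W, A, or B, so zero-zero pairs do not influence the result.</p>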
<p>Shannon&#x2019;s diversity index (H&#x2019;) represents the uncertainty in predicting the species of a <italic>single</italic> randomly selected individual. This can be affected by both Richness (&#x03B1;) and Evenness. For example, if a sample contains only one species, the uncertainty of selecting that species is 0. This uncertainty can increase in two ways: as more species are added (<xref ref-type="fig" rid="F1">Figure 1A</xref>; sample A vs. C) or as evenness increases (sample A vs. B). If a community is dominated by a single species (low Evenness), it becomes more certain that the dominant species will be selected, thereby decreasing H&#x2019;. It is therefore important when interpreting this measure that both richness and evenness be considered. Generally, this measure is more affected by richness than evenness (<xref ref-type="bibr" rid="B58">Zelen&#x00FD;, 2021</xref>). While not depicted in the figure, H&#x2019; would be calculated individually for Responses A, B, and C and then averaged to obtain a value for a category of responses or the corpus as a whole (<xref ref-type="bibr" rid="B20">Jurasinski et al., 2009</xref>). H&#x2019; is calculated using the formula:</p>
<disp-formula id="S2.Ex5">
<mml:math id="M5">
<mml:mrow>
<mml:mrow>
<mml:mo>-</mml:mo>
<mml:mrow>
<mml:mo largeop="true" movablelimits="false" symmetric="true">&#x2211;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo rspace="5.8pt">&#x00D7;</mml:mo>
<mml:mi>ln</mml:mi>
<mml:mo>&#x2061;</mml:mo>
<mml:mrow>
<mml:mo stretchy="false">(</mml:mo>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo stretchy="false">)</mml:mo>
</mml:mrow>
</mml:mrow>
</mml:mrow>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
<p>where P<sub>i</sub> is the proportion of the i-th word in the entire data set (<xref ref-type="bibr" rid="B40">Shannon, 1948</xref>).</p>
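<p>The per-response calculation and group averaging described above can be sketched as follows. The sketch assumes that proportions are taken within each response before the values are averaged across a group; the function names and example counts are hypothetical.</p>
<preformat>
```python
import math


def shannon_index(row):
    """H' = -sum(P_i * ln(P_i)) over the words present in one response."""
    total = sum(row)
    return -sum((c / total) * math.log(c / total) for c in row if c > 0)


def mean_shannon(rows):
    """Average of the per-response H' values for a group of responses."""
    return sum(shannon_index(row) for row in rows) / len(rows)


even_two = [1, 1]          # two words, equal use
even_four = [1, 1, 1, 1]   # higher richness, equal use
uneven_two = [3, 1]        # two words, one dominant
print(shannon_index(even_two), shannon_index(even_four), shannon_index(uneven_two))
print(mean_shannon([even_two, even_four, uneven_two]))
```
</preformat>
<p>Adding words raises H&#x2019;, while dominance by one word lowers it, matching the two effects described above.</p>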
<p>Simpson&#x2019;s diversity index (D) is the probability that <italic>two</italic> randomly selected individuals will be the same species. The probability of this decreases as richness increases and increases as evenness decreases (<xref ref-type="bibr" rid="B58">Zelen&#x00FD;, 2021</xref>). As with H&#x2019;, D would be calculated individually for Responses A, B, and C and then averaged to obtain a value for a group of CRs (<xref ref-type="bibr" rid="B20">Jurasinski et al., 2009</xref>). In comparison to H&#x2019;, D is more influenced by evenness than richness. This is calculated using the formula:</p>
<disp-formula id="S2.Ex6">
<mml:math id="M6">
<mml:mrow>
<mml:mrow>
<mml:mn>1</mml:mn>
<mml:mo rspace="7.5pt">-</mml:mo>
<mml:mrow>
<mml:mo largeop="true" movablelimits="false" symmetric="true">&#x2211;</mml:mo>
<mml:mrow>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
<mml:mo rspace="5.8pt">&#x00D7;</mml:mo>
<mml:msub>
<mml:mi>P</mml:mi>
<mml:mi>i</mml:mi>
</mml:msub>
</mml:mrow>
</mml:mrow>
</mml:mrow>
<mml:mo>,</mml:mo>
</mml:mrow>
</mml:math>
</disp-formula>
<p>where P<sub>i</sub> is the proportion of the i-th word in the entire data set (<xref ref-type="bibr" rid="B43">Simpson, 1949</xref>). The value of Simpson&#x2019;s D ranges from 0 to 1, with 0 representing maximum diversity and 1 denoting none. As a larger value represents a lower diversity, this is often presented as the inverse Simpson Index, which is calculated by dividing 1 by D. These values are provided in <xref ref-type="supplementary-material" rid="DS1">Supplementary Table 1</xref>.</p>
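<p>The quantities related to Simpson&#x2019;s index can be sketched for a single response as below. The sketch returns three values separately: the summed squared proportions (the probability that two randomly drawn words are the same), one minus that sum (the expression as printed above), and the reciprocal of the sum. It assumes proportions are taken within a response; the function name and dictionary keys are hypothetical.</p>
<preformat>
```python
def simpson_quantities(row):
    """Return Simpson-related values for one response of word counts."""
    total = sum(row)
    sum_squares = sum((c / total) ** 2 for c in row if c > 0)
    return {
        "sum_squares": sum_squares,        # chance two draws are the same word
        "one_minus_sum": 1 - sum_squares,  # the expression as printed above
        "reciprocal": 1 / sum_squares,     # 1 divided by the summed squares
    }


print(simpson_quantities([1, 1, 1, 1]))
print(simpson_quantities([4]))
```
</preformat>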
</sec>
<sec id="S2.SS4">
<title>2.4. Ordination techniques</title>
<p>Ordinations were performed using a curated word matrix, created by applying a custom word exclusion list (containing articles, conjunctions, and prepositions; <xref ref-type="table" rid="T2">Table 2</xref>) to the raw matrix described above to reduce the number of uninformative but frequent words. We chose to exclude these words to focus the ordination analysis on informative language, pertinent to the science ideas, in the responses. We also excluded any words that did not occur in at least three responses, as patterns cannot be detected at a lower frequency and these words likely represent very infrequent ideas or ways students use ideas in our corpus. The resulting final data matrix, or term-document matrix, for ordination contained a total of 254 words (columns) and 418 responses (rows). We performed DCA and PCoA in PC-ORD (version 7.08; <xref ref-type="bibr" rid="B28">McCune and Mefford, 2018</xref>). Depending on the data set, some ecologists transform the raw data so that it can be used with certain methods. As we selected methods designed to work with our data set, we did not perform any transformations. The ordination calculations are performed within the software package, in which several settings need to be selected. First, ordinations are calculated using a seed number, which can be randomly selected or entered. Each seed number results in similar patterns but slightly different numbers; therefore, we fixed the seed number at 999 so that the exact ordination calculations can be repeated. For DCA, we elected to down-weight rare words due to the large size of the data set. This focuses the ordination on overarching patterns in the data. For PCoA, a distance measure has to be selected. Similar to ordination itself, each measure has positive and negative attributes. 
We selected Bray&#x2013;Curtis distance as it is optimal for non-normal data (<xref ref-type="bibr" rid="B13">Goodrich et al., 2014</xref>). Scores were calculated for words using weighted averaging. We examined the significance of each axis using 999 randomizations. The percent inertia (or variance explained) for each axis is provided in the outputs of the PC-ORD file and included in our results. We compiled categorical data (Type, Timing, and Thinking) associated with the CRs into a separate secondary matrix for ordination and used this secondary matrix with PC-ORD software to visually distinguish data points of different categories to help further reveal patterns of (dis)similarity in the data. DCA ordinations were then visualized using the R software package &#x201C;phyloseq&#x201D; (<xref ref-type="bibr" rid="B29">McMurdie and Holmes, 2013</xref>). Ellipses marking the 95% multivariate t-distribution confidence intervals were added to increase readability. PCoA ordinations were visualized in PC-ORD.</p>
<table-wrap position="float" id="T2">
<label>TABLE 2</label>
<caption><p>Words removed for ordination analysis.</p></caption>
<table cellspacing="5" cellpadding="5" frame="box" rules="all">
<tbody>
<tr>
<td valign="top" align="left">Articles</td>
<td valign="top" align="center">a</td>
<td valign="top" align="center">an</td>
<td valign="top" align="center">the</td>
<td/>
<td/>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">Conjunctions</td>
<td valign="top" align="center">as</td>
<td valign="top" align="center">and</td>
<td valign="top" align="center">but</td>
<td valign="top" align="center">like</td>
<td valign="top" align="center">or</td>
<td/>
<td/>
</tr>
<tr>
<td valign="top" align="left">Prepositions</td>
<td valign="top" align="center">aboard</td>
<td valign="top" align="center">about</td>
<td valign="top" align="center">above</td>
<td valign="top" align="center">across</td>
<td valign="top" align="center">after</td>
<td valign="top" align="center">against</td>
<td valign="top" align="center">along</td>
</tr>
<tr>
<td/>
<td valign="top" align="center">amid</td>
<td valign="top" align="center">among</td>
<td valign="top" align="center">around</td>
<td valign="top" align="center">at</td>
<td valign="top" align="center">before</td>
<td valign="top" align="center">behind</td>
<td valign="top" align="center">below</td>
</tr>
<tr>
<td/>
<td valign="top" align="center">beneath</td>
<td valign="top" align="center">beside</td>
<td valign="top" align="center">besides</td>
<td valign="top" align="center">between</td>
<td valign="top" align="center">beyond</td>
<td valign="top" align="center">by</td>
<td valign="top" align="center">concerning</td>
</tr>
<tr>
<td/>
<td valign="top" align="center">considering</td>
<td valign="top" align="center">despite</td>
<td valign="top" align="center">down</td>
<td valign="top" align="center">during</td>
<td valign="top" align="center">except</td>
<td valign="top" align="center">excepting</td>
<td valign="top" align="center">excluding</td>
</tr>
<tr>
<td/>
<td valign="top" align="center">following</td>
<td valign="top" align="center">for</td>
<td valign="top" align="center">from</td>
<td valign="top" align="center">in</td>
<td valign="top" align="center">inside</td>
<td valign="top" align="center">into</td>
<td valign="top" align="center">minus</td>
</tr>
<tr>
<td/>
<td valign="top" align="center">near</td>
<td valign="top" align="center">of</td>
<td valign="top" align="center">off</td>
<td valign="top" align="center">on</td>
<td valign="top" align="center">onto</td>
<td valign="top" align="center">opposite</td>
<td valign="top" align="center">outside</td>
</tr>
<tr>
<td/>
<td valign="top" align="center">over</td>
<td valign="top" align="center">past</td>
<td valign="top" align="center">per</td>
<td valign="top" align="center">plus</td>
<td valign="top" align="center">regarding</td>
<td valign="top" align="center">round</td>
<td valign="top" align="center">since</td>
</tr>
<tr>
<td/>
<td valign="top" align="center">than</td>
<td valign="top" align="center">through</td>
<td valign="top" align="center">to</td>
<td valign="top" align="center">toward</td>
<td valign="top" align="center">towards</td>
<td valign="top" align="center">under</td>
<td valign="top" align="center">underneath</td>
</tr>
<tr>
<td/>
<td valign="top" align="center">unlike</td>
<td valign="top" align="center">until</td>
<td valign="top" align="center">up</td>
<td valign="top" align="center">upon</td>
<td valign="top" align="center">versus</td>
<td valign="top" align="center">via</td>
<td valign="top" align="center">with</td>
</tr>
<tr>
<td/>
<td valign="top" align="center">within</td>
<td valign="top" align="center">without</td>
<td/>
<td/>
<td/>
<td/>
<td/>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn><p>These words were retained when calculating the diversity measures.</p></fn>
</table-wrap-foot>
</table-wrap>
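<p>The two filtering steps used to curate the ordination matrix (the exclusion list and the three-response minimum) can be sketched as below. The exclusion set shown is only a small subset of Table 2, and the function name, vocabulary, and counts are hypothetical.</p>
<preformat>
```python
# A small subset of the Table 2 exclusion list, for illustration only.
EXCLUDED = {"a", "an", "the", "and", "of", "to", "in"}


def curate_matrix(vocabulary, matrix, excluded_words, min_responses=3):
    """Drop excluded words and words used in fewer than min_responses rows."""
    keep = []
    for j, word in enumerate(vocabulary):
        n_responses = sum(1 for row in matrix if row[j] > 0)
        if word not in excluded_words and n_responses >= min_responses:
            keep.append(j)
    kept_vocabulary = [vocabulary[j] for j in keep]
    kept_matrix = [[row[j] for j in keep] for row in matrix]
    return kept_vocabulary, kept_matrix


vocabulary = ["the", "mass", "co2", "energi", "rare"]
matrix = [
    [2, 1, 1, 0, 1],
    [1, 1, 0, 1, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
]
print(curate_matrix(vocabulary, matrix, EXCLUDED))
```
</preformat>
<p>The threshold counts the number of responses containing a word, not its total frequency, so a word repeated many times in one response is still dropped.</p>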
</sec>
<sec id="S2.SS5">
<title>2.5. Testing of other text processing protocols for ED metrics and ordination</title>
<p>For the ED metrics and ordinations, we also generated raw matrices using lemmatization (in place of stemming) and with misspellings in the CRs corrected, as these approaches are also common in the field of lexical analysis. We supply results from these other trials in <xref ref-type="supplementary-material" rid="DS1">Supplementary Table 2</xref>. Overall, these other text processing methods produced patterns in the ED metrics similar to those obtained with stemming and no misspelling correction, which are described further in the Results. For ordination, we also tested multiple word exclusion lists and frequency thresholds. Our trials included using the Default Exclusion list from WordStat, removing only &#x201C;a, and, in, the,&#x201D; and using the custom exclusion list provided in <xref ref-type="table" rid="T2">Table 2</xref>. We also tested frequency thresholds of 3 (minimum needed for pattern), 5 (present in 1% of responses), 22 (present in 5% of responses), and 50 (present in 10% of responses). Finally, we also tested using the raw matrix without any text processing. Each of these combinations resulted in a different number of words within the matrix, ranging from only 20 to 898 words (data not shown). These choices affected the inertia explained by the ordination but not the patterns in the graphs (data not shown). We selected the setting used herein as it yielded an intermediate number of words (264) and seemed to be the most representative of the language in the responses. However, others may choose a different exclusion list or frequency threshold, depending on their application.</p>
</sec>
<sec id="S2.SS6">
<title>2.6. Statistical analysis</title>
<p>PERMANOVAs (PERmutational Multivariate ANalysis Of VAriance) were calculated in PC-ORD (version 7.08; <xref ref-type="bibr" rid="B28">McCune and Mefford, 2018</xref>). PERMANOVA is a statistical <italic>F</italic>-test on the differences in the mean within-group distances among all the tested groups (<xref ref-type="bibr" rid="B2">Anderson, 2017</xref>); that is, it tests the relatedness of groups of data points in all dimensions. PERMANOVAs require that each group being tested has an equal number of samples. Because the categorical data are not balanced, we performed bootstrap or batched PERMANOVAs, meaning we created 1,000 different random samples of each group and performed a PERMANOVA on each random sample. The number of responses in each test was limited by the lowest <italic>n</italic> of each category within the grouping (Thinking = 88; Timing = 205; Type = 137). Interpretation of the resulting <italic>p</italic>-values is fundamentally the same as it would be for other statistical tests. ANOVAs were performed with Tukey HSD and a cutoff of 0.05 in SPSS (<xref ref-type="bibr" rid="B16">IBM Corp, 2020</xref>).</p>
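<p>As an illustration only, the balanced resampling step described above could be sketched as follows. The sketch covers the subsampling alone, not the PERMANOVA itself, which was run in PC-ORD. The function name <monospace>balanced_subsamples</monospace>, sampling without replacement, the fixed seed, and the toy labels are assumptions made for this example and are not details of the original analysis.</p>
<preformat>
```python
import random
from collections import defaultdict


def balanced_subsamples(labels, n_draws=1000, seed=0):
    """Yield lists of response indices with equal counts per group.

    labels: one group label per response (for example "Pre" or "Post").
    Each draw samples, without replacement (an assumption), as many
    responses from every group as the smallest group contains.
    """
    by_group = defaultdict(list)
    for index, label in enumerate(labels):
        by_group[label].append(index)
    n_per_group = min(len(members) for members in by_group.values())
    rng = random.Random(seed)
    for _ in range(n_draws):
        draw = []
        for label in sorted(by_group):
            draw.extend(rng.sample(by_group[label], n_per_group))
        yield draw


# Hypothetical toy labels; the groups in the study are far larger.
toy_labels = ["Pre"] * 5 + ["Post"] * 3
draws = list(balanced_subsamples(toy_labels, n_draws=4))
```
</preformat>
<p>Each yielded list of indices could then be used to subset the response-by-word matrix before a single PERMANOVA run. Under the without-replacement assumption, the smallest group contributes all of its responses to every draw, and only the larger groups vary between draws.</p>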
</sec>
</sec>
<sec id="S3" sec-type="results">
<title>3. Results</title>
<sec id="S3.SS1">
<title>3.1. Comparison of categorical groupings and text analysis</title>
<p>We expected the language in student CRs to reflect their ideas; therefore, we began by examining the distribution of ideas across the sub-groups within each of the Thinking, Timing, and Type categories. To support these claims, we also performed traditional methods of text analysis to examine word usage within the different categories. These analyses provide a point of comparison for findings of the ED methods, in addition to conclusions from previously published efforts.</p>
<sec id="S3.SS1.SSS1">
<title>3.1.1. Distribution of ideas</title>
<p>There is no overlap in singular ideas between Developing and Scientific thinking responses. We therefore expect the difference in language between Developing and Scientific responses to be the greatest in the data set. In contrast, Mixed thinking responses share some ideas with both Developing and Scientific thinking, so we expect Mixed responses to be intermediate between Scientific and Developing CRs, using some text common to both. While four of the seven ideas are considered Developing in our coding scheme, there is a higher total number of Scientific ideas (267) within the Mixed Thinking responses than Developing ideas (212). We therefore expect that there will be more similarities between Mixed and Scientific responses than between Mixed and Developing responses. We also expect student language to change based on Timing of collection. This expectation is supported by a study of a larger data set, which found that student explanations after an online tutorial included more Scientific ideas and fewer Developing ideas (<xref ref-type="bibr" rid="B49">Uhl et al., 2021</xref>). <xref ref-type="bibr" rid="B49">Uhl et al. (2021)</xref> found that six of the seven ideas were each significantly different based on whether they were collected pre- or post-tutorial. As this data set is a subset of that data, we expect this pattern to hold, resulting in language differences based on Timing. Finally, <xref ref-type="bibr" rid="B41">Shiroda et al. (2021)</xref> examined the idea distribution in this data set by Institutional Type. Only three of the seven ideas were statistically different (<italic>p</italic> &#x003C; 0.05) among the Institutional Types; therefore, we expect the least variability based on Institutional Type in comparison to Timing or Thinking.</p>
</sec>
<sec id="S3.SS1.SSS2">
<title>3.1.2. Text analysis</title>
<p>Using quantitative text analysis, we found that 25 words were significantly different among the Thinking groupings (<italic>p</italic> &#x003C; 0.05). <italic>H<sub>2</sub>O, water, releas-, cellular, respir</italic>- and <italic>form</italic> were more common in Scientific responses. <italic>CO<sub>2</sub>, carbon, respir-, convert</italic>, and <italic>dioxid</italic>- were more common in both Mixed and Scientific responses. Mixed thinking responses were also more likely to have <italic>exhal-, glucos-, sweat, urin-, breath</italic>-, and <italic>broken</italic>. Finally, <italic>energi, weight, burn, bodi, diet, cell, fat</italic> and <italic>store</italic> were more frequent in Developing responses. The words <italic>lost</italic> and <italic>mass</italic> were more frequent in both Developing and Mixed responses. We performed similar quantitative text analysis for the Timing groups and found 13 words significantly different between responses that were collected Pre- or Post-tutorial (<italic>p</italic> &#x003C; 0.05). Post-tutorial responses more frequently contained <italic>CO<sub>2</sub>, glucos-, water, cellular, H<sub>2</sub>O, respir-, breath, sweat, dioxide, convert</italic>, and <italic>ATP</italic>, while pre-tutorial responses contained <italic>fat, weight, energi</italic>, <italic>bodi</italic>, and <italic>diet</italic> more frequently. Finally, we found the fewest significantly different words (5) among Types. TYCs more frequently contained the words <italic>turn, urin</italic>-, and <italic>sweat.</italic> TYCs and PUIs also more frequently contained the words <italic>exhale</italic> and <italic>weight</italic> in comparison to RICUs. In summary, by comparing the number of predictive words across the three possible groupings (Thinking, Timing, and Type), we found the most difference in text based on Thinking, followed by Timing and Type, respectively. 
The results from the quantitative text analysis agree with our expectations based on idea distribution and previous studies.</p>
</sec>
</sec>
<sec id="S3.SS2">
<title>3.2. Quantitative measures of ED quantify student language differences</title>
<p>Richness (S) is the number of unique non-zero elements in a response and is the same as alpha diversity. As S varies heavily for the responses, we provide a box plot of the data in <xref ref-type="supplementary-material" rid="DS1">Supplementary Figure 1</xref>. The mean richness of all CRs is 18.5 (<xref ref-type="table" rid="T3">Table 3</xref>). The average response length is 22.5 words, indicating that students do not heavily repeat words in their responses. The S values of responses grouped by Institutional Type are comparable (range: 16.7&#x2013;18.4) to the overall data set and each other. We did not find any statistical difference among these groupings (<italic>p</italic> = 0.41, ANOVA). The S of Pre- and Post-tutorial responses is 18.3 and 16.8, respectively. This difference was statistically supported (<italic>p</italic> = 0.045; ANOVA). The greatest difference in S is observed among Thinking groups. Responses classified as Scientific have lower S (11.9) than Developing (18.1) or Mixed responses (21.7). This difference was statistically supported for the groupings overall (<italic>p</italic> &#x003C; 0.00001) and between the individual pairings (<italic>p</italic> &#x003C; 0.02; Tukey HSD). This suggests that Scientific responses use relatively few unique words. This fits with our prediction, as Scientific responses include scientific ideas, which are often expressed with fewer possible terms. As richness is used to calculate some of the following metrics, these differences in S should be considered when interpreting those results.</p>
<table-wrap position="float" id="T3">
<label>TABLE 3</label>
<caption><p>Ecological diversity metrics.</p></caption>
<table cellspacing="5" cellpadding="5" frame="box" rules="all">
<thead>
<tr>
<td valign="top" align="left" style="color:#ffffff;background-color: #7f8080;">Measure</td>
<td valign="top" align="center" style="color:#ffffff;background-color: #7f8080;">All</td>
<td valign="top" align="center" colspan="3" style="color:#ffffff;background-color: #7f8080;">Type</td>
<td valign="top" align="center" colspan="2" style="color:#ffffff;background-color: #7f8080;">Timing</td>
<td valign="top" align="center" colspan="3" style="color:#ffffff;background-color: #7f8080;">Thinking</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left" style="color:#ffffff;background-color: #7f8080;"></td>
<td valign="top" align="left" style="color:#ffffff;background-color: #7f8080;"></td>
<td valign="top" align="center" style="color:#ffffff;background-color: #7f8080;"><bold>TYC</bold></td>
<td valign="top" align="center" style="color:#ffffff;background-color: #7f8080;"><bold>PUI</bold></td>
<td valign="top" align="center" style="color:#ffffff;background-color: #7f8080;"><bold>RICU</bold></td>
<td valign="top" align="center" style="color:#ffffff;background-color: #7f8080;"><bold>Pre</bold></td>
<td valign="top" align="center" style="color:#ffffff;background-color: #7f8080;"><bold>Post</bold></td>
<td valign="top" align="center" style="color:#ffffff;background-color: #7f8080;"><bold>Dev</bold></td>
<td valign="top" align="center" style="color:#ffffff;background-color: #7f8080;"><bold>Mix</bold></td>
<td valign="top" align="center" style="color:#ffffff;background-color: #7f8080;"><bold>Sci</bold></td>
</tr>
<tr>
<td valign="top" align="left">Richness (S, &#x03B1;)</td>
<td valign="top" align="center">18.1</td>
<td valign="top" align="center">17.2</td>
<td valign="top" align="center">17.9</td>
<td valign="top" align="center">19</td>
<td valign="top" align="center">19.2</td>
<td valign="top" align="center">17</td>
<td valign="top" align="center">18.1</td>
<td valign="top" align="center">21.7</td>
<td valign="top" align="center">11.9</td>
</tr>
<tr>
<td valign="top" align="left">Evenness (E)</td>
<td valign="top" align="center">0.984</td>
<td valign="top" align="center">0.984</td>
<td valign="top" align="center">0.984</td>
<td valign="top" align="center">0.983</td>
<td valign="top" align="center">0.982</td>
<td valign="top" align="center">0.985</td>
<td valign="top" align="center">0.984</td>
<td valign="top" align="center">0.979</td>
<td valign="top" align="center">0.992</td>
</tr>
<tr>
<td valign="top" align="left">Shannon diversity (H&#x2019;)</td>
<td valign="top" align="center">2.65</td>
<td valign="top" align="center">2.63</td>
<td valign="top" align="center">2.6</td>
<td valign="top" align="center">2.71</td>
<td valign="top" align="center">2.7</td>
<td valign="top" align="center">2.59</td>
<td valign="top" align="center">2.64</td>
<td valign="top" align="center">2.88</td>
<td valign="top" align="center">2.27</td>
</tr>
<tr>
<td valign="top" align="left">Simpson&#x2019;s diversity (D)</td>
<td valign="top" align="center">0.906</td>
<td valign="top" align="center">0.907</td>
<td valign="top" align="center">0.896</td>
<td valign="top" align="center">0.917</td>
<td valign="top" align="center">0.919</td>
<td valign="top" align="center">0.903</td>
<td valign="top" align="center">0.901</td>
<td valign="top" align="center">0.932</td>
<td valign="top" align="center">0.873</td>
</tr>
<tr>
<td valign="top" align="left">Whittaker&#x2019;s &#x03B2; diversity</td>
<td valign="top" align="center">37.4</td>
<td valign="top" align="center">39.3</td>
<td valign="top" align="center">37.7</td>
<td valign="top" align="center">35.5</td>
<td valign="top" align="center">35.2</td>
<td valign="top" align="center">39.9</td>
<td valign="top" align="center">37.4</td>
<td valign="top" align="center">31</td>
<td valign="top" align="center">57.3</td>
</tr>
<tr>
<td valign="top" align="left">Bray&#x2013;Curtis dissimilarity</td>
<td valign="top" align="center">80.4</td>
<td valign="top" align="center">80.6</td>
<td valign="top" align="center">81.6</td>
<td valign="top" align="center">78.5</td>
<td valign="top" align="center">81</td>
<td valign="top" align="center">78.5</td>
<td valign="top" align="center">80.2</td>
<td valign="top" align="center">75</td>
<td valign="top" align="center">75</td>
</tr>
<tr>
<td valign="top" align="left">Species turnover</td>
<td valign="top" align="center">2.3</td>
<td valign="top" align="center">2.4</td>
<td valign="top" align="center">2.4</td>
<td valign="top" align="center">2.2</td>
<td valign="top" align="center">2.4</td>
<td valign="top" align="center">2.2</td>
<td valign="top" align="center">2.3</td>
<td valign="top" align="center">2</td>
<td valign="top" align="center">2</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn><p>Calculated using stemming without correcting spelling errors. The values represent averages calculated from the individual responses (Richness, Evenness, Shannon, and Simpson) or every possible pairing (Whittaker, Bray&#x2013;Curtis, Turnover).</p></fn>
</table-wrap-foot>
</table-wrap>
<p>Evenness (E) is the comparative frequency of words in a response. At an E of one, all words in a CR occur in equal frequencies, while low values mean that students heavily use certain words. The entire data set has a value of 0.98, indicating most words occur at the same frequency within an individual CR. This is expected, as the CRs are relatively short, meaning most words are likely used once. Similar values for evenness are observed for each category within Type (range: 0.98&#x2013;0.99; <italic>p</italic> = 0.98, ANOVA) and Timing (range: 0.98&#x2013;0.99; <italic>p</italic> = 0.06, ANOVA). Differences in E are greatest within Thinking groups. Mixed and Developing responses have lower values of 0.979 and 0.984, respectively, while Scientific Thinking responses have a higher value of 0.99 (<italic>p</italic> &#x003C; 0.00001), with each pairing being significantly different (<italic>p</italic> &#x003C; 0.05; Tukey HSD). As S is the denominator in the E formula, this change in E is likely due to the observed differences in S.</p>
<p>The Simpson&#x2019;s index of diversity (D) is calculated using a single CR and averaged for a group. Higher numbers represent higher diversity. The corpus has a value of 0.91, indicating the CRs have high diversity and are not repetitive. Type (range: 0.90&#x2013;0.92; <italic>p</italic> = 0.14, ANOVA) and Timing (range: 0.90&#x2013;0.92; <italic>p</italic> = 0.42, ANOVA) have similar values. In contrast, within Thinking, Scientific responses have the lowest value of 0.87, while Mixed and Developing Thinking have values of 0.93 and 0.90, respectively. This difference is significant between all pairings within Thinking (<italic>p</italic> &#x003C; 0.05; Tukey HSD). This result means there is a higher probability that two random words are the same within a Scientific CR in comparison to the other individual CRs in the Thinking categories and the corpus overall. This could, in part, be due to the Scientific category having the lowest S of the categories.</p>
<p>Shannon Diversity (H&#x2019;) can be interpreted as the uncertainty in predicting a random word in a CR. If a single word is very frequent in a dataset, then there is a higher likelihood a prediction will be correct (low H&#x2019;). The H&#x2019; of the whole data set is 2.65. Type (range: 2.60&#x2013;2.71; <italic>p</italic> = 0.34, ANOVA) and Timing (range: 2.59&#x2013;2.70; <italic>p</italic> = 0.68) have similar H&#x2019; values among categories and in comparison to the corpus as a whole. In contrast, Thinking groups have more varied H&#x2019; values of 2.88, 2.64 and 2.27 for Mixed, Developing and Scientific, respectively (<italic>p</italic> &#x003C; 0.00001, ANOVA). Each pairing is significantly different within Thinking (<italic>p</italic> &#x003C; 0.005, Tukey HSD). These results indicate that Scientific responses are more repetitive in comparison to other CRs. These results agree with findings using D, indicating the words in a Scientific response are more predictable. Again, this could be due to the large differences in S within Thinking.</p>
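<p>To make the four per-response measures above concrete, the following minimal sketch computes them for a single tokenized response. It assumes the standard textbook forms (natural-log Shannon H&#x2019;, Simpson&#x2019;s D as one minus the sum of squared proportions, and evenness as H&#x2019; divided by ln S); the exact formulas used by the software in this study are given in the Methods and may differ in detail. The function name and the example tokens are hypothetical.</p>
<preformat>
```python
import math
from collections import Counter


def diversity_metrics(tokens):
    """Return richness, evenness, Shannon H' and Simpson D for one response."""
    counts = Counter(tokens)
    total = sum(counts.values())
    proportions = [count / total for count in counts.values()]
    richness = len(counts)
    # Shannon diversity with natural logarithm (assumed form).
    shannon = -sum(p * math.log(p) for p in proportions)
    # Simpson's index of diversity: 1 minus the chance two random words match.
    simpson = 1.0 - sum(p * p for p in proportions)
    # Evenness as H' / ln(S); undefined when S is 0 or 1 because ln(1) = 0.
    if richness in (0, 1):
        evenness = float("nan")
    else:
        evenness = shannon / math.log(richness)
    return {"S": richness, "E": evenness, "H": shannon, "D": simpson}


# Hypothetical stemmed response, not taken from the corpus.
example = diversity_metrics(["carbon", "dioxid", "water", "carbon"])
```
</preformat>
<p>Group-level values such as those in Table 3 would then be averages of these per-response results. Because S enters the evenness calculation directly and bounds H&#x2019; and D, short responses with few unique words constrain all three measures, which is the interpretive caution raised above.</p>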
<p>Whittaker&#x2019;s beta (&#x03B2;) diversity compares the shared words between two responses. Low values represent less diversity with many shared words between the responses, while high values indicate high diversity with fewer words being shared. Our entire dataset has a &#x03B2; diversity of 38.6, meaning diversity within categories is much lower than diversity across all responses. When we examined &#x03B2; diversity within the different Types, we found slightly varied &#x03B2; diversities, with RICUs, PUIs, and TYCs having values of 36.7, 38.7, and 40.6, respectively. The relative similarity between the groups and the overall &#x03B2; diversity of the entire data set suggests there is little difference in student CRs based on Type. We found a similar result with Timing, as responses collected Pre- and Post-tutorial have &#x03B2; diversities of 37.0 and 40.4, respectively. As with the previous ED metrics, we found there is a more distinct difference in &#x03B2; diversity based on the groupings within Thinking. While &#x03B2; diversities of Developing and Mixed CRs are similar at 37.4 and 31.0, respectively, responses in the Scientific category have a much higher &#x03B2; diversity of 57.3. This measure supports our prediction that the largest difference would be within Thinking. These results suggest that Scientific CRs share the fewest words with each other, while Mixed CRs share the most words. We had expected that Scientific responses would share more words between responses than any other category in Thinking, as the ideas and thereby language would be the most restricted. The increased value may be due to the lower &#x03B1; (or S) of the Scientific CRs (11.9) in comparison to Mixed (21.7) and Developing (18.1) Thinking, as it is the denominator in the calculation of &#x03B2;.</p>
<p>Species turnover, or half changes, is calculated based on shared words between paired responses. As the number of half changes increases, responses share fewer and fewer words. We calculated species turnover for the entire data set and found the corpus has a mean of 2.3 half changes, meaning that, on average, two CRs in the corpus share less than 25% of words. We also calculated species turnover based on groupings in the categorical data. We found categories within Type, Timing, and Thinking all have similar half change ranges: Type: 2.2&#x2013;2.4 (about 21.5&#x2013;19% words shared); Timing: 2.2&#x2013;2.4 (about 21.5&#x2013;19%); and Thinking: 2.0&#x2013;2.3 (25% to about 20% words shared). Mixed and Scientific responses are the categories with the lowest values of 2.0 average half changes. These results also support our prediction that the greatest difference in text would be within Thinking. In contrast to findings using the &#x03B2; metric, Mixed and Scientific responses have more similar species turnover values than Developing CRs. This result agrees with our stated predictions.</p>
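<p>The percentages quoted above follow from reading each half change as a halving of shared composition. A short sketch of that conversion is given here; the function name <monospace>shared_fraction</monospace> is hypothetical and the conversion is a simplification intended for interpretation, not the calculation performed by the analysis software.</p>
<preformat>
```python
def shared_fraction(half_changes):
    """Fraction of words two responses share after a number of half changes."""
    return 0.5 ** half_changes


# Half-change values reported for the corpus and its categories.
for h in (2.0, 2.2, 2.3, 2.4):
    print(f"{h:.1f} half changes: about {shared_fraction(h):.1%} shared")
```
</preformat>
<p>Under this reading, 2.0, 2.2, 2.3, and 2.4 half changes correspond to roughly 25, 22, 20, and 19% shared words, respectively.</p>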
<p>A third way to examine variation is to calculate the compositional dissimilarity using a distance measure. The Bray&#x2013;Curtis dissimilarity has a value of 0% when two responses are exactly the same and 100% when no words are shared between responses. We calculated this measure for each pairing in the entire corpus and found the data set has a dissimilarity of 80.36%, indicating that the text used in the entire response set is more dissimilar than similar. This indicates any CR is on average 80% different from any other, which is similar to findings from species turnover above. We also calculated the Bray&#x2013;Curtis dissimilarity for the categorical groupings. Within Types, there are similar dissimilarities of 80.62, 81.57, and 78.49% for TYCs, PUIs, and RICUs, respectively. These values are also very similar to the overall data set, suggesting that each category shows similar patterns to the overall data set. For Timing, the dissimilarities are 80.94 and 78.54% for Pre- and Post-tutorial responses, respectively, suggesting there is little change in language based on Timing. In contrast, the Bray&#x2013;Curtis dissimilarity of Developing responses (80.19%) is higher than that of Mixed (74.98%) or Scientific (74.94%) responses. As with species turnover, Mixed and Scientific responses have more similar values in comparison to Developing CRs.</p>
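<p>For readers unfamiliar with the measure, a minimal sketch of the Bray&#x2013;Curtis dissimilarity between two tokenized responses is given below. It assumes raw word counts with no relativization or transformation, which may not match every setting used in this study; the function name and the example tokens are hypothetical.</p>
<preformat>
```python
from collections import Counter


def bray_curtis(tokens_a, tokens_b):
    """Percent Bray-Curtis dissimilarity between two tokenized responses."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    # Shared abundance: for each word, the smaller of the two counts.
    shared = sum(min(counts_a[word], counts_b[word]) for word in counts_a)
    total = sum(counts_a.values()) + sum(counts_b.values())
    if total == 0:
        raise ValueError("both responses are empty")
    return 100.0 * (1.0 - 2.0 * shared / total)


# Hypothetical stemmed responses, not taken from the corpus.
pair_value = bray_curtis(
    ["carbon", "dioxid", "water"],
    ["carbon", "dioxid", "exhal"],
)
```
</preformat>
<p>Identical responses give 0% and responses with no words in common give 100%, matching the bounds described above. A group-level value would be the mean over every pairing of responses within the group.</p>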
</sec>
<sec id="S3.SS3">
<title>3.3. Ordination techniques aid in visualization and reveal patterns in the corpus</title>
<p>Each of the measures described above describes diversity <italic>within groups</italic> or <italic>group averages</italic> of single CRs; however, we are also interested in examining and measuring potential differences <italic>between groups</italic> of CRs. Using DCA (<xref ref-type="fig" rid="F3">Figure 3A</xref>) and PCoA (<xref ref-type="supplementary-material" rid="DS1">Supplementary Figure 2</xref>), we created two-dimensional plots of the corpus, wherein each data point is an individual CR. Points that are close to each other are more similar based on word choice and frequencies in the CR. Each axis, beginning with the <italic>x</italic>-axis, explains a descending amount of variation in the data in an additive manner and likely has multiple aspects of the data contributing to it.</p>
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption><p>Detrended Correspondence Analysis (DCA). DCA was performed without any data transformation. The graphs represent 416 responses after the removal of responses 35 and 78. <bold>(A)</bold> The ordination was graphed with select responses numbered for discussion in the Results. Grouping variables including <bold>(B)</bold> Thinking, <bold>(C)</bold> Timing, and <bold>(D)</bold> Type were overlaid to compare between groups. Centroids of a given grouping variable are represented as filled circles. Ellipses are the 95% multivariate t-distribution confidence of each categorical group.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="feduc-08-989836-g003.tif"/>
</fig>
<sec id="S3.SS3.SSS1">
<title>3.3.1. Detrended correspondence analysis (DCA)</title>
<p>DCA is uniquely suited to our purpose as the <italic>x</italic>-axis is defined exclusively as species turnover, meaning points (responses) that are the furthest away from each other on the <italic>x</italic>-axis have the highest difference in words. Additionally, every 100 units on the <italic>x</italic>-axis of the DCA graphs represents one half-change of words, allowing direct comparison of data by the species turnover measure. The DCA of the entire data set results in two responses, 35 and 78, far removed from other data points. CR35 is located at (190, 5012) and reads, &#x201C;Excretion.&#x201D; CR78 is located at (1186, 179) and reads, &#x201C;<underline>Into the</underline> air <italic><underline>via</underline></italic> C0<sub>2</sub>.&#x201D; (Underlined words are removed during the matrix generation process; see &#x201C;Materials and methods.&#x201D;) These responses are highly distinct from the other responses in the corpus (maximum axis 1 value: 449; maximum axis 2 value: 344) and render the rest of the graph uninterpretable (<xref ref-type="supplementary-material" rid="DS1">Supplementary Figure 3</xref>). These responses were therefore removed as outliers (<xref ref-type="bibr" rid="B28">McCune and Mefford, 2018</xref>) from the data set used for DCA, to better examine the remaining data. The results from the DCA explained 8.7% of the total inertia (variability) of the resulting matrix (<xref ref-type="fig" rid="F3">Figure 3A</xref>). The first axis explains 4.9% of the total variability and the second axis explains 3.8%. For large data matrices, it is expected that two axes will not explain large portions of the data (<xref ref-type="bibr" rid="B13">Goodrich et al., 2014</xref>). To ensure the patterns are still meaningful, we used randomization tests to determine whether the axes are significant in comparison to randomized orders of the data. We found that both axes significantly explained the data (999 randomizations; <italic>p</italic> &#x003C; 0.003). 
Data points range from 0 to 434.5 on the <italic>x</italic>-axis (<xref ref-type="fig" rid="F3">Figure 3A</xref>), demonstrating that responses at the extremes of this corpus share essentially no words, as points separated by 4 half changes are interpreted as essentially unique.</p>
</sec>
<sec id="S3.SS3.SSS2">
<title>3.3.2. Principal coordinate analysis (PCoA)</title>
<p>In contrast to DCA, PCoA does not have a specified, singular component or variable that is explained by any axis. As with DCA, close proximity of points means that they are more similar based on the component. We visualized our entire corpus using this ordination technique and did not observe any outlier responses that obscured the remaining data; therefore, no CRs were removed (see <xref ref-type="supplementary-material" rid="DS1">Supplementary Figure 2</xref>). We found six significant axes using this technique (1000 randomizations; <italic>p</italic> &#x003C; 0.03). Combined, these six axes explain 36.8% of the total variance. The first axis explained 9.4% of the data, while the second explained 7.6%. We found DCA and PCoA provided similar results and will therefore only describe DCA results due to the usefulness of the first axis in calculating half-changes between responses.</p>
</sec>
<sec id="S3.SS3.SSS3">
<title>3.3.3. Ordination techniques allow easy examination of corpora of short texts</title>
<p>Using the ordination graph from DCA (<xref ref-type="fig" rid="F3">Figure 3A</xref>), we can easily identify CRs that are very similar or different without reading the responses. Responses 9 and 10, marked in <xref ref-type="fig" rid="F3">Figure 3A</xref>, are immediately next to each other and both say, &#x201C;Carbon dioxide and water.&#x201D; Response 40 is nearby and reads &#x201C;Expelled through gas like carbon dioxide.&#x201D; In contrast, data points that are on the two extreme sides of the graph share no words. Response 100, marked in <xref ref-type="fig" rid="F3">Figure 3A</xref>, says &#x201C;Probably the energy stored in the weight was used up by cells due to the decrease in calorie intake during the diet.&#x201D; During an initial examination of the data, it could be useful to quickly identify CRs that are very similar or very different, especially with very large data sets that would require large amounts of time to examine individually.</p>
</sec>
<sec id="S3.SS3.SSS4">
<title>3.3.4. Categorical data can be overlaid to reveal relationships among CRs</title>
<p>Categorical data (Thinking, Timing, and Type) associated with the CRs can be overlaid on the ordination graphs without affecting the placement of the data points, potentially illustrating patterns within the data set (<xref ref-type="fig" rid="F3">Figures 3B&#x2013;D</xref>). Centroids are the average coordinate value for the categorical group and are represented in the graphs by filled circles. One way to examine differences between groups is to calculate distances between group centroids. We found the largest change in position for centroids based on Thinking groups, with the total distance between the centroids being 134.2 units. Developing thinking is left-most on the <italic>x</italic>-axis at 149.3, Mixed thinking is in the middle at 241.0, and Scientific thinking is right-most at 283.5. While centroids represent the average of the group, PERMANOVAs test the relatedness of groups of data points in all dimensions using the matrix used to create the ordination graph. Within Thinking, the differences in relative distance are significant (<xref ref-type="fig" rid="F3">Figure 3B</xref>; PERMANOVA; <italic>p</italic> = 0.0002; <italic>n</italic> = 88). For Timing (<xref ref-type="fig" rid="F3">Figure 3C</xref>), there is slight separation of the data, with post-tutorial responses as a group lying further to the right of the graph. The two group centroids are separated by 45 units (Pre: 186.8; Post: 231.8), less than the separation observed for Thinking (134.2 units). Using PERMANOVA, these Timing groups are also significantly different (<italic>p</italic> = 0.0002; <italic>n</italic> = 205). Finally, there appears to be minimal difference based on Institutional Type (<xref ref-type="fig" rid="F3">Figure 3D</xref>). The centroids are separated by at most 8.4 units on the <italic>x</italic>-axis (TYC: 206.4; PUI: 214.8; RICU: 207.9), and there is no apparent distinct clustering of the CRs. 
PERMANOVA reveals low statistical support for differences based on Type (<italic>p</italic> = 0.084, <italic>n</italic> = 137). While we did observe separation among groupings for Timing and Thinking, we also note the spread of responses within these individual groups is similar, which is consistent with the very similar number of half changes observed using ecological measures (<xref ref-type="table" rid="T3">Table 3</xref>).</p>
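<p>Because 100 units on the first DCA axis correspond to one half change, the centroid separations reported above can also be read on the species turnover scale. The sketch below does this arithmetic for the <italic>x</italic>-axis coordinates given in the text; it considers the first axis only, and the function name <monospace>axis_separation</monospace> is hypothetical.</p>
<preformat>
```python
# Axis-1 (x-axis) centroid coordinates as reported in the text above.
centroids = {
    "Thinking": {"Developing": 149.3, "Mixed": 241.0, "Scientific": 283.5},
    "Timing": {"Pre": 186.8, "Post": 231.8},
    "Type": {"TYC": 206.4, "PUI": 214.8, "RICU": 207.9},
}


def axis_separation(group_centroids):
    """Largest axis-1 gap between centroids, in DCA units and half changes."""
    values = list(group_centroids.values())
    units = max(values) - min(values)
    return units, units / 100.0  # 100 DCA units = one half change


for grouping, coords in centroids.items():
    units, half_changes = axis_separation(coords)
    print(f"{grouping}: {units:.1f} units ({half_changes:.2f} half changes)")
```
</preformat>
<p>On this scale, the Thinking centroids span about 1.34 half changes, the Timing centroids about 0.45, and the Type centroids about 0.08, which mirrors the ordering of the PERMANOVA results.</p>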
</sec>
</sec>
</sec>
<sec id="S4" sec-type="discussion">
<title>4. Discussion</title>
<p>The aim of this paper was to explore the novel application of established ecological diversity measures and methods for analyzing short, explanatory texts. CR assessment offers insight into student thinking or performance through student language, but quantitative evaluation of the language diversity in CRs is limited. For this data set, we previously identified and explored patterns of ideas present in student explanations (<xref ref-type="bibr" rid="B41">Shiroda et al., 2021</xref>) but were dissatisfied with the available methods to quantify and represent holistic differences in language between responses and/or groups. This limitation, and previous work by <xref ref-type="bibr" rid="B17">Jarvis (2013)</xref> comparing ecological and lexical approaches to diversity, motivated us to examine ED approaches for text analysis. Herein, ED metrics and ordination allowed us to examine student language in a different way than other methods. We were able to quantify holistic differences in language that we had observed when comparing student responses based on Thinking, Timing, and Type. The current work is meant to be confirmatory in nature, in that we have already explored this CR corpus and had expected results based on this previous qualitative work. Namely, we expected the greatest difference in language to be among Thinking, some difference based on Timing, and little difference based on Type. Using these predictions, we could examine whether the outcomes from the ED metrics and ordination techniques corresponded to construct-relevant differences in student CRs.</p>
<p>Overall, we applied seven ED measures to this data set. Richness or alpha diversity, while helpful in other calculations, does not reveal anything uniquely useful, as this can be easily calculated with other forms of text analysis. Similarly, evenness was not particularly useful in itself given how short most responses were, as students are unlikely to heavily repeat a given word in only one to three sentences. However, this information is important for interpretation of the other metrics and could be more useful in longer texts than ones used here. Shannon and Simpson diversity metrics are similar to existing lexical diversity measures in that they examine diversity of individual responses. One advantage of these ecological measures in comparison to those in lexical diversity is that they have no established lower limit on length. In spite of this, Shannon and Simpson are still influenced by evenness and richness. While this may not be problematic for all CR corpora, our data set had differences in richness based on Thinking and Timing, making the Shannon and Simpson measures more difficult to interpret for those categories of CRs.</p>
<p>We found comparing pairs of responses using Whittaker&#x2019;s &#x03B2;, Bray&#x2013;Curtis Dissimilarity, and Species Turnover to be the most interesting expansion of current text analysis approaches for our applications. These three measures each quantify differences between responses in slightly different ways. Additionally, each identified similar patterns in the categorical data, which correspond well to our previous, qualitative analysis of the corpus. Namely, grouping responses by Thinking category has the largest effect on all three measures, suggesting that differences in student texts exist between sub-groups. Additionally, all three measures found that Developing CRs are very similar to the entire corpus. For each measure, Developing and Scientific responses are consistently most different from each other; however, Mixed responses are more similar to Developing responses with Whittaker&#x2019;s &#x03B2;, but more similar to Scientific responses when measured by Bray&#x2013;Curtis Dissimilarity and Species Turnover. This result could be due to the difference in Richness (alpha) based on Thinking. Bray&#x2013;Curtis Dissimilarity and Species Turnover also more closely agreed with our prediction that Mixed Thinking CRs would be more similar to Scientific CRs than Developing ones. We also identified a general pattern in the corpus that Scientific responses are more similar to themselves than the corpus overall. This is the only category within Type, Thinking, or Timing that consistently had a unique value. This supports observations from rubric development and human coding during qualitative analysis, in that there are generally fewer ways to write correctly about a scientific idea than ways to write about incorrect or other, non-scientific ideas (<xref ref-type="bibr" rid="B44">Sripathi et al., 2019</xref>; <xref ref-type="bibr" rid="B41">Shiroda et al., 2021</xref>). 
We are excited these quantitative measures support these qualitative observations and consider these metrics promising for critically testing student language. As Whittaker&#x2019;s &#x03B2; shows a different pattern than Bray&#x2013;Curtis Dissimilarity and Species Turnover, we considered which measures best suit our purposes. Bray&#x2013;Curtis Dissimilarity and Species Turnover are less sensitive to differences in richness, which we prioritize because this difference is already apparent in the richness measure itself. Additionally, Whittaker&#x2019;s &#x03B2; is generally considered to be a very simple representation of diversity, which also contributes to our preference for Bray&#x2013;Curtis Dissimilarity and Species Turnover.</p>
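<p>To make the contrast between the pairwise measures concrete, the following Python sketch computes Bray&#x2013;Curtis Dissimilarity from word counts and a pairwise Whittaker&#x2019;s &#x03B2; from word presence or absence. It assumes the form &#x03B2; = &#x03B3; / mean(&#x03B1;) minus 1 for a pair of responses, which may not be the exact variant used in this study. The function names and example responses are hypothetical.</p>

```python
from collections import Counter


def bray_curtis(tokens_a, tokens_b):
    """Abundance-based dissimilarity: sum |a_i - b_i| / sum (a_i + b_i)."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    vocabulary = set(a) | set(b)
    difference = sum(abs(a[w] - b[w]) for w in vocabulary)
    return difference / (sum(a.values()) + sum(b.values()))


def whittaker_beta(tokens_a, tokens_b):
    """Presence/absence form for a pair: gamma / mean(alpha) - 1 (assumed variant)."""
    types_a, types_b = set(tokens_a), set(tokens_b)
    gamma = len(types_a | types_b)                   # word types in the pooled pair
    mean_alpha = (len(types_a) + len(types_b)) / 2   # mean richness of the two responses
    return gamma / mean_alpha - 1


# Hypothetical tokenized responses.
r1 = "carbon dioxide is exhaled".split()
r2 = "carbon dioxide and water are exhaled".split()
print(bray_curtis(r1, r2), whittaker_beta(r1, r2))

# Same word types, different repetition: only the abundance-based measure responds.
r3 = "water water water leaves".split()
r4 = "water leaves".split()
print(bray_curtis(r3, r4), whittaker_beta(r3, r4))
```

<p>In the second pair, the two responses contain the same word types, so this form of Whittaker&#x2019;s &#x03B2; is zero while Bray&#x2013;Curtis Dissimilarity is not, because only the latter uses word counts.</p>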
<p>Ordination offers a unique visualization of the CR corpus and greatly assists our comparison of language among different groupings of the CR corpus. While we can and did qualitatively examine the responses previously during human thematic coding (<xref ref-type="bibr" rid="B44">Sripathi et al., 2019</xref>; <xref ref-type="bibr" rid="B41">Shiroda et al., 2021</xref>), these processes take time. We imagine these techniques could be helpful as an exploratory phase of CR analysis, similar to LSA, to look for unique responses or determine if there are potential language differences among groups. Here, we used ordination in a confirmatory fashion. We expected Thinking to most affect student language because that is how the rubric and coding were designed. Similarly, we were expecting there to be differences based on Timing since changes in Thinking are associated with Timing (<xref ref-type="bibr" rid="B49">Uhl et al., 2021</xref>). In contrast, <xref ref-type="bibr" rid="B41">Shiroda et al. (2021)</xref> found fewer apparent differences based on the institutional Type. These expectations are further supported by text analysis through having a decreasing number of predictive words. Indeed, ordination analysis reflected these expectations (<xref ref-type="fig" rid="F3">Figures 3B&#x2013;D</xref>), both in the more distinct clustering of responses using the categorical data and in the distance between group centroids. These overall clustering patterns could be observed in both DCA (<xref ref-type="fig" rid="F3">Figures 3B&#x2013;D</xref>) and in PCoA (<xref ref-type="supplementary-material" rid="DS1">Supplementary Figures 2B&#x2013;D</xref>). While observing these patterns and calculating the half changes in the DCA are useful, PERMANOVA tests are a promising method to quantitatively compare groups of responses. 
Using this test, we confirm that the largest differences in student text are among the groups within Thinking and between the Timing groups, while there is limited support for differences in text among the institutional Type groups. This allows us to conclude that student word choice differs for sub-groups in both Thinking and Timing, while word choice for CRs to this question is not related to institutional Type. Differences between Thinking groups are heavily supported by the rubric, but the lack of differences in language among the institutional Types was only qualitatively supported in <xref ref-type="bibr" rid="B41">Shiroda et al. (2021)</xref>. In contrast, these PERMANOVA tests provide direct statistical rigor to the observations that is not possible with other analyses. These methods could be particularly useful in comparing differential language between groups to better understand the different ways students convey understanding. For example, when originally working with this data set, we were attempting to examine performance differences for a computerized text classification model between this data set and the one that was used to create the model (<xref ref-type="bibr" rid="B41">Shiroda et al., 2021</xref>). Using these ordination techniques, one would be able to quickly and visually compare the original and new data sets to determine if student language was different between the sets. We have since successfully applied ordination techniques to understand other computer scoring model performance (Shiroda et al., in review<sup><xref ref-type="fn" rid="footnote1">1</xref></sup>). In comparison, similar text analysis approaches such as LSA may be helpful in exploratory analyses to find prevalent themes in responses but would be less helpful for this goal as they do not reveal differences in specific words and instead condense the meaning of the language. 
As such, our novel application of ecological diversity measures may be used in a complementary fashion with other text analysis methods, depending on the research study.</p>
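<p>The logic of a PERMANOVA test can be sketched in a few lines of Python. The sketch below is a simplified one-factor version written for illustration only, not the implementation used for the analyses reported here: it computes a pseudo-F ratio from a pairwise distance matrix and estimates a p-value by shuffling group labels. The function names, the six example responses, and their group labels are invented.</p>

```python
import random
from collections import Counter
from itertools import combinations


def bray_curtis(tokens_a, tokens_b):
    a, b = Counter(tokens_a), Counter(tokens_b)
    vocabulary = set(a) | set(b)
    return sum(abs(a[w] - b[w]) for w in vocabulary) / (sum(a.values()) + sum(b.values()))


def pseudo_f(distance, labels):
    """Pseudo-F from a square distance matrix and one group label per response."""
    n = len(labels)
    groups = sorted(set(labels))
    ss_total = sum(distance[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for g in groups:
        members = [i for i, lab in enumerate(labels) if lab == g]
        ss_within += sum(distance[i][j] ** 2 for i, j in combinations(members, 2)) / len(members)
    ss_among = ss_total - ss_within
    return (ss_among / (len(groups) - 1)) / (ss_within / (n - len(groups)))


def permanova(distance, labels, permutations=999, seed=0):
    """Return (observed pseudo-F, permutation p-value) for one grouping factor."""
    rng = random.Random(seed)
    observed = pseudo_f(distance, labels)
    shuffled = list(labels)
    at_least_as_large = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)  # break any association between labels and text
        if pseudo_f(distance, shuffled) >= observed:
            at_least_as_large += 1
    return observed, (at_least_as_large + 1) / (permutations + 1)


# Invented responses and labels, for illustration only.
responses = [
    "carbon dioxide and water are exhaled",
    "fat becomes carbon dioxide and water",
    "carbon dioxide is exhaled and water released",
    "fat is burned for energy",
    "fat turns into energy",
    "the fat is converted to heat",
]
labels = ["Scientific"] * 3 + ["Developing"] * 3
tokens = [r.split() for r in responses]
distance = [[bray_curtis(a, b) for b in tokens] for a in tokens]
print(permanova(distance, labels))
```

<p>With only six responses, the number of distinct label arrangements is small, so the smallest attainable p-value is coarse; the sketch is meant to show the mechanics rather than to support inference.</p>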
<p>We performed quantitative text analysis to support our expectations for the differences in CRs among the categorical data. Indeed, we found that these differences in ED measures correspond to differences in words identified by text analysis and can be further linked to differences observed in human-assigned ideas (i.e., student thinking). This helps validate the ED metrics by identifying words and phrases which differ significantly in their usage between sub-groups. However, the ED methods and text analysis provide different pieces of information. While ED methods help compare individual CRs to each other, text analysis helps us understand differences in the actual text identified using the ED methods. For example, the words that are differentially used in responses categorized by coders as Scientific ideas include <italic>H<sub>2</sub>O, water, releas-, cellular, respir</italic>- and <italic>form</italic>. Most of these words are closely linked to the Scientific ideas identified in the coding rubric categories of Correct Products and Exhalation. The words <italic>CO<sub>2</sub>, carbon, respir-, convert</italic>, and <italic>dioxid</italic>- were more common in both Mixed and Scientific responses, indicating considerable overlap in how students describe how carbon leaves the system. As water was only frequently used in Scientific thinking, this analysis suggests students with Mixed thinking still struggle with how water leaves the body during weight loss. This information would not be clear using only the ecological methods we describe here. We therefore suggest that ecological methods be used in conjunction with text analysis to examine CR corpora.</p>
<p>In summary, we found that ED measures can be usefully applied to text analysis of students&#x2019; short text explanations. In particular, methods that analyze between-response variation (Whittaker&#x2019;s &#x03B2;, Bray&#x2013;Curtis Dissimilarity, Species Turnover, and ordination) were most useful for our interests in understanding CRs based on categorical data. For other research interests, Simpson or Shannon diversity measures may be more informative. Similarly, richness and evenness do not seem to provide much additional insight into text diversity with this data set but are needed to better interpret the other ED measures and could be more informative for longer texts.</p>
<sec id="S4.SS1">
<title>4.1. Future directions and considerations for additional applications</title>
<p>These techniques help reveal differences in diversity within student language and different categories of the corpus; however, further analysis is needed to understand these results. With the exception of the first axis of DCA, it is difficult to interpret ordinations for specific differences in the text, as each axis represents multiple factors in the data. Similarly, while the different metrics (E, S, D, H&#x2019;, &#x03B2;, Bray&#x2013;Curtis Dissimilarity and Species Turnover) quantify diversity and provide markers for the amount of variety in a group of responses, the metrics do not specify the nature of the differences. Determining these differences in language within the text is better achieved by text analysis, along with traditional qualitative techniques, such as coding of the responses. Therefore, we recommend that ED and ordination analysis be done to supplement text analysis and qualitative methods. For example, we performed text analysis as a proxy for differences in word choice, but examining the predictive words reveals an important difference in language. <italic>Water</italic> is only increased in Scientific CRs while <italic>sweat</italic> and <italic>urine</italic> are increased in Mixed thinking. This indicates that students with Mixed thinking are still having trouble articulating how water leaves the body in relation to weight loss and could serve as a target for improving student explanations. If we had only applied the ecological methods, we would know that there is a difference but not have an actionable conclusion that could promote teaching and learning.</p>
<p>We consider these analyses broadly applicable to any corpus of short texts. Our group has already successfully applied these analyses to multiple CR corpora to examine the progression of student language across physiology contexts (Shiroda et al., in review<sup><xref ref-type="fn" rid="footnote2">2</xref></sup>) and explore the effect of overlapping language on the success of machine learning models for automated assessment [Shiroda et al., in review (see text footnote 1)]. As with any ecological study, we began this study by considering the nature of our data set and recommend this as a critical first step before applying these methods to new data sets. We note that in applying these diversity methods to our data set, we made purposeful decisions about text processing, many of which led to meaningful interpretation of the results. However, we do not consider these decisions absolute for all applications and acknowledge that other data sets and/or outcomes will most likely justify different text processing decisions. For example, we chose to stem words for the diversity metrics but not to remove any other words. We chose these settings because they most closely match the text analysis protocols used in the previous work, allowing the results to be compared more directly. While the text processing method did not affect the overall patterns we found, this may not be true for other data sets (<xref ref-type="supplementary-material" rid="DS1">Supplementary Table 2</xref>). For some CRs, the distinction between stemming and lemmatization may be important. For example, stemming is not exact in removing tense. 
It collapses words that maintain the same root but does not collapse the forms of words that change fully, such as &#x201C;to be.&#x201D; Since our question was in past tense, there were not many differences in tense; however, for other data sets, ensuring tense is collapsed may be more important to reveal patterns. Lemmatization does make these changes, but also collapses comparative words. For example, great, greater, and greatest are collapsed. Depending on the context, maintaining the levels of comparison could differentiate student thinking and be important to maintain. We strongly suggest that text processing decisions should be purposeful and tailored to the corpus.</p>
<p>Ordination requires separate, equally purposeful decisions to function correctly. We removed less meaningful words (e.g., articles, conjunctions, prepositions), as common, unmeaningful words can skew the overall pattern of the data set. However, it is important to keep the CR context in mind when choosing text processing strategies. For example, if students are explaining the process of diffusion as part of a science course, the words &#x201C;in&#x201D; and &#x201C;out&#x201D; would be critical to student meaning in that context and should not be removed. We advise others using these techniques to examine their data to determine whether certain prepositions or words may be important. While text processing steps will likely differ, DCA and PCoA are likely to be most useful to examine language diversity in most CR data sets. A key advantage of these two approaches is that they can handle data sets with high percentages of zeros, which are likely to occur in most lexical data sets (i.e., short, content-rich texts). However, other ordination methods should be considered, and explored further where warranted, during the initial phases of data analysis to make sure the approach is appropriate for the data set. For example, if a set of CRs is highly redundant, this could result in a lower percentage of zeros, opening the possibility of using ordination methods that our data excluded. We recommend that researchers who wish to apply these methods, but do not have an ecology background, seek out helpful texts including <xref ref-type="bibr" rid="B36">Peck (2010)</xref> and <xref ref-type="bibr" rid="B35">Palmer (2019)</xref>, and a website maintained by Oklahoma State University: <ext-link ext-link-type="uri" xlink:href="http://ordination.okstate.edu/key.htm">http://ordination.okstate.edu/key.htm</ext-link>. We view the versatility and the ability to make purposeful choices for each data set as a strength of the methodology.</p>
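<p>The context-dependent word removal described above can be sketched as follows. This is a minimal illustration in Python, assuming a deliberately tiny, hypothetical stop list rather than the list used in this study; the function name and the example sentence are likewise invented.</p>

```python
# Hypothetical, deliberately small stop list; a real analysis would use a fuller one.
STOP_WORDS = {"a", "an", "the", "and", "or", "but", "of", "in", "out", "to", "as"}


def remove_stop_words(tokens, keep=()):
    """Drop stop words, except those listed in `keep` as meaningful for this item."""
    blocked = STOP_WORDS - set(keep)
    return [t for t in tokens if t.lower() not in blocked]


response = "water moves out of the cell".split()

# Default processing removes the direction word along with the other stop words.
print(remove_stop_words(response))

# For a diffusion item, "in" and "out" carry meaning and are retained.
print(remove_stop_words(response, keep=("in", "out")))
```

<p>Passing the retained words as an argument keeps the decision explicit and specific to each CR item, rather than editing a shared stop list.</p>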
<p>While this study was confirmatory and the current paper is intended to describe the approach, we believe these techniques can also be used in an exploratory fashion. We were originally motivated to perform this work because we were excited by the potential to expand quantitative approaches to language diversity in CRs (or short blocks of text). The data visualization, various metrics, and statistical computations of our ED methods offer a rich and wide range of results that bring statistical and quantitative methods to a field that typically relies on qualitative methods. Overall, these ED techniques provide quantitative methods that will allow researchers to examine short texts in a novel way in comparison to current text analysis methods. Within STEM education research, these techniques can assist in the examination of differences in student writing and ideas over time, effects of a pedagogical intervention, differences in explanations across contexts for cross-cutting concepts, and many other forms of categorical data.</p>
</sec>
</sec>
<sec id="S5" sec-type="data-availability">
<title>Data availability statement</title>
<p>The raw word matrix, curated matrix used for ordination, and associated categorical data are available on GitHub. Researchers who are interested in the responses may contact KH, <email>haudekke@msu.edu</email>.</p>
</sec>
<sec id="S6" sec-type="ethics-statement">
<title>Ethics statement</title>
<p>The studies involving human participants were reviewed and approved by Michigan State University (x10&#x2013;577). Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.</p>
</sec>
<sec id="S7" sec-type="author-contributions">
<title>Author contributions</title>
<p>MS performed the data analysis and primarily drafted the manuscript. MF assisted with the data analysis and drafted the manuscript. KH provided feedback on the data analysis and manuscript. All authors were involved in project design, execution, and editing of the manuscript.</p>
</sec>
</body>
<back>
<sec id="S8" sec-type="funding-information">
<title>Funding</title>
<p>This material was based upon work supported by the National Science Foundation under Grant Nos. 1323162 and 1660643.</p>
</sec>
<ack><p>We would like to thank Marisol Mercado Santiago for her work on reviewing lexical diversity and Ryan Terrill for his expertise in R. We would also like to thank Juli Uhl, Leonora Kaldaras, and Jennifer Kaplan for helpful edits and Brian Nohomovich for discussions throughout the project.</p>
</ack>
<sec id="S9" sec-type="COI-statement">
<title>Conflict of interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec id="S10" sec-type="disclaimer">
<title>Publisher&#x2019;s note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
<sec id="S11" sec-type="supplementary-material">
<title>Supplementary material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/feduc.2023.989836/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/feduc.2023.989836/full#supplementary-material</ext-link></p>
<supplementary-material xlink:href="Data_Sheet_1.docx" id="DS1" mimetype="application/vnd.openxmlformats-officedocument.wordprocessingml.document" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<fn-group>
<fn id="footnote1">
<label>1</label>
<p>Shiroda, M., Doherty, J. H., and Haudek, K. C. (in review). <italic>Exploring attributes of successful machine learning assessments for scoring of undergraduate constructed responses assessment items. Uses of artificial intelligence in STEM education</italic>. Oxford: Oxford University Press.</p></fn>
<fn id="footnote2">
<label>2</label>
<p>Shiroda, M., Doherty, J. H., Scott, E. E., and Haudek, K. C. (in review). Covariational reasoning and item context affect language in undergraduate mass balance written explanations. <italic>Adv. Physiol. Educ.</italic></p></fn>
</fn-group>
<ref-list>
<title>References</title>
<ref id="B1"><citation citation-type="journal"><collab>American Association for the Advancement of Science</collab> (<year>2011</year>). <source><italic>Vision and change in undergraduate biology education: a view for the 21st century.</italic></source> Available online at: <ext-link ext-link-type="uri" xlink:href="https://live-visionandchange.pantheonsite.io/wp-content/uploads/2011/03/Revised-Vision-and-Change-Final-Report.pdf">https://live-visionandchange.pantheonsite.io/wp-content/uploads/2011/03/Revised-Vision-and-Change-Final-Report.pdf</ext-link> [<comment>accessed August 18, 2021</comment>].</citation></ref>
<ref id="B2"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Anderson</surname> <given-names>M. J.</given-names></name></person-group> (<year>2017</year>). &#x201C;<article-title>Permutational multivariate analysis of variance (PERMANOVA)</article-title>,&#x201D; in <source><italic>Wiley StatsRef: statistics reference online</italic></source>, <role>eds</role> <person-group person-group-type="editor"><name><surname>Balakrishnan</surname> <given-names>N.</given-names></name> <name><surname>Colton</surname> <given-names>T.</given-names></name> <name><surname>Everitt</surname> <given-names>B.</given-names></name> <name><surname>Piegorsch</surname> <given-names>W.</given-names></name> <name><surname>Ruggeri</surname> <given-names>F.</given-names></name> <name><surname>Teugels</surname> <given-names>J. L.</given-names></name></person-group> (<publisher-loc>Hoboken, NJ</publisher-loc>: <publisher-name>Wiley</publisher-name>), <pub-id pub-id-type="doi">10.1002/9781118445112.stat07841</pub-id></citation></ref>
<ref id="B3"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Birenbaum</surname> <given-names>M.</given-names></name> <name><surname>Tatsuoka</surname> <given-names>K. K.</given-names></name> <name><surname>Gutvirtz</surname> <given-names>Y.</given-names></name></person-group> (<year>1992</year>). <article-title>Effects of response format on diagnostic assessment of scholastic achievement.</article-title> <source><italic>Appl. Psychol. Meas.</italic></source> <volume>16</volume> <fpage>353</fpage>&#x2013;<lpage>363</lpage>. <pub-id pub-id-type="doi">10.1177/014662169201600406</pub-id></citation></ref>
<ref id="B4"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Borges</surname> <given-names>V. R. P.</given-names></name> <name><surname>Esteves</surname> <given-names>S.</given-names></name> <name><surname>de Nardi Araujo</surname> <given-names>P.</given-names></name> <name><surname>de Oliveira</surname> <given-names>L. C.</given-names></name> <name><surname>Holanda</surname> <given-names>M.</given-names></name></person-group> (<year>2018</year>). &#x201C;<article-title>Using principal component analysis to support students&#x2019; performance prediction and data analysis</article-title>,&#x201D; in <source><italic>Proceedings of the Brazilian symposium on computers in education</italic></source>, <volume>Vol. 29</volume> <issue>1383</issue>. <pub-id pub-id-type="doi">10.5753/cbie.sbie.2018.1383</pub-id></citation></ref>
<ref id="B5"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Boumans</surname> <given-names>J. W.</given-names></name> <name><surname>Trilling</surname> <given-names>D.</given-names></name></person-group> (<year>2016</year>). <article-title>Taking stock of the toolkit: An overview of relevant automated content analysis approaches and techniques for digital journalism scholars.</article-title> <source><italic>Digital J.</italic></source> <volume>4</volume> <fpage>8</fpage>&#x2013;<lpage>23</lpage>. <pub-id pub-id-type="doi">10.1080/21670811.2015.1096598</pub-id></citation></ref>
<ref id="B6"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bray</surname> <given-names>J. R.</given-names></name> <name><surname>Curtis</surname> <given-names>J. T.</given-names></name></person-group> (<year>1957</year>). <article-title>An ordination of upland forest communities of southern Wisconsin.</article-title> <source><italic>Ecol. Monogr.</italic></source> <volume>27</volume> <fpage>325</fpage>&#x2013;<lpage>349</lpage>. <pub-id pub-id-type="doi">10.2307/1942268</pub-id></citation></ref>
<ref id="B7"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Choi</surname> <given-names>W.</given-names></name> <name><surname>Jeong</surname> <given-names>H.</given-names></name></person-group> (<year>2016</year>). <article-title>Finding an appropriate lexical diversity measurement for a small-sized corpus and its application to a comparative study of L2 learners&#x2019; writings.</article-title> <source><italic>Multimed. Tools Appl.</italic></source> <volume>75</volume> <fpage>13015</fpage>&#x2013;<lpage>13022</lpage>. <pub-id pub-id-type="doi">10.1007/s11042-015-2529-1</pub-id></citation></ref>
<ref id="B8"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Covington</surname> <given-names>M. A.</given-names></name> <name><surname>McFall</surname> <given-names>J. D.</given-names></name></person-group> (<year>2010</year>). <article-title>Cutting the Gordian knot: the moving-average type&#x2013;token ratio (MATTR).</article-title> <source><italic>J. Quant. Linguist.</italic></source> <volume>17</volume> <fpage>94</fpage>&#x2013;<lpage>100</lpage>. <pub-id pub-id-type="doi">10.1080/09296171003643098</pub-id></citation></ref>
<ref id="B9"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Deerwester</surname> <given-names>S. C.</given-names></name> <name><surname>Dumais</surname> <given-names>S. T.</given-names></name> <name><surname>Landauer</surname> <given-names>T. K.</given-names></name> <name><surname>Furnas</surname> <given-names>G. W.</given-names></name> <name><surname>Harshman</surname> <given-names>R. A.</given-names></name></person-group> (<year>1990</year>). <article-title>Indexing by latent semantic analysis.</article-title> <source><italic>J. Am. Soc. Inf. Sci.</italic></source> <volume>41</volume> <fpage>391</fpage>&#x2013;<lpage>407</lpage>. <pub-id pub-id-type="doi">10.1002/(SICI)1097-4571(199009)41:6&#x003C;391::AID-ASI1&#x003E;3.0.CO;2-9</pub-id></citation></ref>
<ref id="B10"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gauch</surname> <given-names>H. G.</given-names></name></person-group> (<year>1982</year>). <article-title>Noise reduction by eigenvector ordinations.</article-title> <source><italic>Ecology</italic></source> <volume>63</volume> <fpage>1643</fpage>&#x2013;<lpage>1649</lpage>. <pub-id pub-id-type="doi">10.2307/1940105</pub-id></citation></ref>
<ref id="B11"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gerard</surname> <given-names>L. F.</given-names></name> <name><surname>Linn</surname> <given-names>M. C.</given-names></name></person-group> (<year>2016</year>). <article-title>Using automated scores of student essays to support teacher guidance in classroom inquiry.</article-title> <source><italic>J. Sci. Teacher Educ.</italic></source> <volume>27</volume> <fpage>111</fpage>&#x2013;<lpage>129</lpage>. <pub-id pub-id-type="doi">10.1007/s10972-016-9455-6</pub-id></citation></ref>
<ref id="B12"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gibbs</surname> <given-names>G. R.</given-names></name></person-group> (<year>2007</year>). <source><italic>Thematic coding and categorizing. Analyzing qualitative data.</italic></source> <publisher-loc>London</publisher-loc>: <publisher-name>SAGE Publications, Ltd</publisher-name>.</citation></ref>
<ref id="B13"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Goodrich</surname> <given-names>J. K.</given-names></name> <name><surname>Di Rienzi</surname> <given-names>S. C.</given-names></name> <name><surname>Poole</surname> <given-names>A. C.</given-names></name> <name><surname>Koren</surname> <given-names>O.</given-names></name> <name><surname>Walters</surname> <given-names>W. A.</given-names></name> <name><surname>Caporaso</surname> <given-names>J. G.</given-names></name><etal/></person-group> (<year>2014</year>). <article-title>Conducting a microbiome study.</article-title> <source><italic>Cell</italic></source> <volume>158</volume> <fpage>250</fpage>&#x2013;<lpage>262</lpage>. <pub-id pub-id-type="doi">10.1016/j.cell.2014.06.037</pub-id> <pub-id pub-id-type="pmid">25036628</pub-id></citation></ref>
<ref id="B14"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Graesser</surname> <given-names>A. C.</given-names></name> <name><surname>McNamara</surname> <given-names>D. S.</given-names></name> <name><surname>Kulikowich</surname> <given-names>J. M.</given-names></name></person-group> (<year>2011</year>). <article-title>Coh-Metrix: Providing multilevel analyses of text characteristics.</article-title> <source><italic>Educ. Res.</italic></source> <volume>40</volume> <fpage>223</fpage>&#x2013;<lpage>234</lpage>. <pub-id pub-id-type="doi">10.3102/0013189X11413260</pub-id></citation></ref>
<ref id="B15"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Haudek</surname> <given-names>K. C.</given-names></name> <name><surname>Prevost</surname> <given-names>L. B.</given-names></name> <name><surname>Moscarella</surname> <given-names>R. A.</given-names></name> <name><surname>Merrill</surname> <given-names>J.</given-names></name> <name><surname>Urban-Lurain</surname> <given-names>M.</given-names></name></person-group> (<year>2012</year>). <article-title>What are they thinking? Automated analysis of student writing about acid&#x2013;base chemistry in introductory biology.</article-title> <source><italic>CBE Life Sci. Educ.</italic></source> <volume>11</volume> <fpage>283</fpage>&#x2013;<lpage>293</lpage>. <pub-id pub-id-type="doi">10.1187/cbe.11-08-0084</pub-id> <pub-id pub-id-type="pmid">22949425</pub-id></citation></ref>
<ref id="B16"><citation citation-type="journal"><collab>IBM Corp</collab> (<year>2020</year>). <source><italic>IBM SPSS statistics for windows (Version 27.0) [computer software].</italic></source> <publisher-loc>Armonk, NY</publisher-loc>: <publisher-name>IBM Corp</publisher-name>.</citation></ref>
<ref id="B17"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jarvis</surname> <given-names>S.</given-names></name></person-group> (<year>2013</year>). <article-title>Capturing the diversity in lexical diversity.</article-title> <source><italic>Lang. Learn.</italic></source> <volume>63</volume> <fpage>83</fpage>&#x2013;<lpage>106</lpage>. <pub-id pub-id-type="doi">10.1111/j.1467-9922.2012.00739.x</pub-id></citation></ref>
<ref id="B18"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jescovitch</surname> <given-names>L. N.</given-names></name> <name><surname>Scott</surname> <given-names>E. E.</given-names></name> <name><surname>Cerchiara</surname> <given-names>J. A.</given-names></name> <name><surname>Merrill</surname> <given-names>J.</given-names></name> <name><surname>Urban-Lurain</surname> <given-names>M.</given-names></name> <name><surname>Doherty</surname> <given-names>J. H.</given-names></name><etal/></person-group> (<year>2021</year>). <article-title>Comparison of machine learning performance using analytic and holistic coding approaches across constructed response assessments aligned to a science learning progression.</article-title> <source><italic>J. Sci. Educ. Technol.</italic></source> <volume>30</volume> <fpage>150</fpage>&#x2013;<lpage>167</lpage>. <pub-id pub-id-type="doi">10.1007/s10956-020-09858-0</pub-id></citation></ref>
<ref id="B19"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jost</surname> <given-names>L.</given-names></name></person-group> (<year>2006</year>). <article-title>Entropy and diversity.</article-title> <source><italic>Oikos</italic></source> <volume>113</volume> <fpage>363</fpage>&#x2013;<lpage>375</lpage>. <pub-id pub-id-type="doi">10.1111/j.2006.0030-1299.14714.x</pub-id></citation></ref>
<ref id="B20"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jurasinski</surname> <given-names>G.</given-names></name> <name><surname>Retzer</surname> <given-names>V.</given-names></name> <name><surname>Beierkuhnlein</surname> <given-names>C.</given-names></name></person-group> (<year>2009</year>). <article-title>Inventory, differentiation, and proportional diversity: a consistent terminology for quantifying species diversity.</article-title> <source><italic>Oecologia</italic></source> <volume>159</volume> <fpage>15</fpage>&#x2013;<lpage>26</lpage>. <pub-id pub-id-type="doi">10.1007/s00442-008-1190-z</pub-id> <pub-id pub-id-type="pmid">18953572</pub-id></citation></ref>
<ref id="B21"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kaplan</surname> <given-names>J. J.</given-names></name> <name><surname>Haudek</surname> <given-names>K. C.</given-names></name> <name><surname>Ha</surname> <given-names>M.</given-names></name> <name><surname>Rogness</surname> <given-names>N.</given-names></name> <name><surname>Fisher</surname> <given-names>D. G.</given-names></name></person-group> (<year>2014</year>). <article-title>Using lexical analysis software to assess student writing in statistics.</article-title> <source><italic>Technol. Innov. Stat. Educ.</italic></source> <volume>8</volume>. <pub-id pub-id-type="doi">10.5070/T581020235</pub-id></citation></ref>
<ref id="B22"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Koizumi</surname> <given-names>R.</given-names></name></person-group> (<year>2012</year>). <article-title>Relationships between text length and lexical diversity measures: Can we use short texts of less than 100 tokens?</article-title> <source><italic>Vocab. Learn. Instr.</italic></source> <volume>1</volume> <fpage>60</fpage>&#x2013;<lpage>69</lpage>. <pub-id pub-id-type="doi">10.7820/vli.v01.1.koizumi</pub-id></citation></ref>
<ref id="B23"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Krajcik</surname> <given-names>J. S.</given-names></name></person-group> (<year>2021</year>). <article-title>Commentary&#x2014;Applying machine learning in science assessment: Opportunity and challenges.</article-title> <source><italic>J. Sci. Educ. Technol.</italic></source> <volume>30</volume> <fpage>313</fpage>&#x2013;<lpage>318</lpage>.</citation></ref>
<ref id="B24"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Landauer</surname> <given-names>T. K.</given-names></name> <name><surname>Psotka</surname> <given-names>J.</given-names></name></person-group> (<year>2000</year>). <article-title>Simulating text understanding for educational applications with latent semantic analysis: introduction to LSA.</article-title> <source><italic>Interact. Learn. Environ.</italic></source> <volume>8</volume> <fpage>73</fpage>&#x2013;<lpage>86</lpage>. <pub-id pub-id-type="doi">10.1076/1049-4820(200008)8:2;1-B;FT073</pub-id></citation></ref>
<ref id="B25"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lande</surname> <given-names>R.</given-names></name></person-group> (<year>1996</year>). <article-title>Statistics and partitioning of species diversity, and similarity among multiple communities.</article-title> <source><italic>Oikos</italic></source> <volume>76</volume> <fpage>5</fpage>&#x2013;<lpage>13</lpage>. <pub-id pub-id-type="doi">10.2307/3545743</pub-id></citation></ref>
<ref id="B26"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>LaVoie</surname> <given-names>N.</given-names></name> <name><surname>Parker</surname> <given-names>J.</given-names></name> <name><surname>Legree</surname> <given-names>P. J.</given-names></name> <name><surname>Ardison</surname> <given-names>S.</given-names></name> <name><surname>Kilcullen</surname> <given-names>R. N.</given-names></name></person-group> (<year>2020</year>). <article-title>Using latent semantic analysis to score short answer constructed responses: automated scoring of the consequences test.</article-title> <source><italic>Educ. Psychol. Meas.</italic></source> <volume>80</volume> <fpage>399</fpage>&#x2013;<lpage>414</lpage>. <pub-id pub-id-type="doi">10.1177/0013164419860575</pub-id> <pub-id pub-id-type="pmid">32158028</pub-id></citation></ref>
<ref id="B27"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Malvern</surname> <given-names>D.</given-names></name> <name><surname>Richards</surname> <given-names>B.</given-names></name> <name><surname>Chipere</surname> <given-names>N.</given-names></name> <name><surname>Dur&#x00E1;n</surname> <given-names>P.</given-names></name></person-group> (<year>2004</year>). <source><italic>Lexical diversity and language development.</italic></source> <publisher-loc>Berlin</publisher-loc>: <publisher-name>Springer</publisher-name>. <pub-id pub-id-type="doi">10.1057/9780230511804</pub-id></citation></ref>
<ref id="B28"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>McCune</surname> <given-names>B.</given-names></name> <name><surname>Mefford</surname> <given-names>M. J.</given-names></name></person-group> (<year>2018</year>). <source><italic>PC-ORD. Multivariate analysis of ecological data. Version 7.08.</italic></source></citation></ref>
<ref id="B29"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>McMurdie</surname> <given-names>P. J.</given-names></name> <name><surname>Holmes</surname> <given-names>S.</given-names></name></person-group> (<year>2013</year>). <article-title>phyloseq: an R package for reproducible interactive analysis and graphics of microbiome census data.</article-title> <source><italic>PLoS One</italic></source> <volume>8</volume>:<issue>e61217</issue>. <pub-id pub-id-type="doi">10.1371/journal.pone.0061217</pub-id> <pub-id pub-id-type="pmid">23630581</pub-id></citation></ref>
<ref id="B30"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nehm</surname> <given-names>R. H.</given-names></name> <name><surname>Reilly</surname> <given-names>L.</given-names></name></person-group> (<year>2007</year>). <article-title>Biology majors&#x2019; knowledge and misconceptions of natural selection.</article-title> <source><italic>BioScience</italic></source> <volume>57</volume> <fpage>263</fpage>&#x2013;<lpage>272</lpage>. <pub-id pub-id-type="doi">10.1641/B570311</pub-id></citation></ref>
<ref id="B31"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nehm</surname> <given-names>R. H.</given-names></name> <name><surname>Schonfeld</surname> <given-names>I. S.</given-names></name></person-group> (<year>2008</year>). <article-title>Measuring knowledge of natural selection: a comparison of the CINS, an open response instrument, and an oral interview.</article-title> <source><italic>J. Res. Sci. Teach.</italic></source> <volume>45</volume> <fpage>1131</fpage>&#x2013;<lpage>1160</lpage>. <pub-id pub-id-type="doi">10.1002/tea.20251</pub-id></citation></ref>
<ref id="B32"><citation citation-type="journal"><collab>NGSS Lead States</collab> (<year>2013</year>). <source><italic>Next generation science standards: for states, by states.</italic></source> Available online at: <ext-link ext-link-type="uri" xlink:href="https://www.nextgenscience.org/">https://www.nextgenscience.org/</ext-link> <comment>(accessed January 5, 2022)</comment>.</citation></ref>
<ref id="B33"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Noyes</surname> <given-names>K.</given-names></name> <name><surname>McKay</surname> <given-names>R. L.</given-names></name> <name><surname>Neumann</surname> <given-names>M.</given-names></name> <name><surname>Haudek</surname> <given-names>K. C.</given-names></name> <name><surname>Cooper</surname> <given-names>M. M.</given-names></name></person-group> (<year>2020</year>). <article-title>Developing computer resources to automate analysis of students&#x2019; explanations of London dispersion forces.</article-title> <source><italic>J. Chem. Educ.</italic></source> <volume>97</volume> <fpage>3923</fpage>&#x2013;<lpage>3936</lpage>. <pub-id pub-id-type="doi">10.1021/acs.jchemed.0c00445</pub-id></citation></ref>
<ref id="B34"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Palmer</surname> <given-names>M.</given-names></name></person-group> (<year>n.d.</year>). <source><italic>Ordination methods for ecologists. The ordination web page.</italic></source> Available online at: <ext-link ext-link-type="uri" xlink:href="http://ordination.okstate.edu/">http://ordination.okstate.edu/</ext-link></citation></ref>
<ref id="B35"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Palmer</surname> <given-names>M. W.</given-names></name></person-group> (<year>2019</year>). &#x201C;<article-title>Gradient analysis of ecological communities (ordination)</article-title>,&#x201D; in <source><italic>Handbook of environmental and ecological statistics</italic></source>, <role>eds</role> <person-group person-group-type="editor"><name><surname>Gelfand</surname> <given-names>A.</given-names></name> <name><surname>Fuentes</surname> <given-names>M.</given-names></name> <name><surname>Hoeting</surname> <given-names>J. A.</given-names></name> <name><surname>Smith</surname> <given-names>R. L.</given-names></name></person-group> (<publisher-loc>Boca Raton</publisher-loc>: <publisher-name>CRC Press</publisher-name>), <fpage>241</fpage>&#x2013;<lpage>274</lpage>. <pub-id pub-id-type="doi">10.1201/9781315152509-12</pub-id></citation></ref>
<ref id="B36"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Peck</surname> <given-names>J. E.</given-names></name></person-group> (<year>2010</year>). <source><italic>Multivariate analysis for community ecologists: step-by-step using PC-ORD.</italic></source> <publisher-loc>Gleneden Beach, OR</publisher-loc>: <publisher-name>MjM Software Design</publisher-name>, <fpage>162</fpage>.</citation></ref>
<ref id="B37"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rajaraman</surname> <given-names>A.</given-names></name> <name><surname>Ullman</surname> <given-names>J.</given-names></name></person-group> (<year>2011</year>). <source><italic>Mining of massive datasets.</italic></source> <publisher-loc>Cambridge</publisher-loc>: <publisher-name>Cambridge University Press</publisher-name>. <pub-id pub-id-type="doi">10.1017/CBO9781139058452</pub-id></citation></ref>
<ref id="B38"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Roswell</surname> <given-names>M.</given-names></name> <name><surname>Dushoff</surname> <given-names>J.</given-names></name> <name><surname>Winfree</surname> <given-names>R.</given-names></name></person-group> (<year>2021</year>). <article-title>A conceptual guide to measuring species diversity.</article-title> <source><italic>Oikos</italic></source> <volume>130</volume> <fpage>321</fpage>&#x2013;<lpage>338</lpage>. <pub-id pub-id-type="doi">10.1111/oik.07202</pub-id></citation></ref>
<ref id="B39"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Scott</surname> <given-names>E. E.</given-names></name> <name><surname>Cerchiara</surname> <given-names>J.</given-names></name> <name><surname>McFarland</surname> <given-names>J. L.</given-names></name> <name><surname>Wenderoth</surname> <given-names>M. P.</given-names></name> <name><surname>Doherty</surname> <given-names>J. H.</given-names></name></person-group> (<year>2022</year>). <article-title>How students reason about matter flows and accumulations in complex biological phenomena: an emerging learning progression for mass balance.</article-title> <source><italic>J. Res. Sci. Teach.</italic></source> <volume>60</volume> <fpage>63</fpage>&#x2013;<lpage>99</lpage>. <pub-id pub-id-type="doi">10.1002/tea.21791</pub-id></citation></ref>
<ref id="B40"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shannon</surname> <given-names>C. E.</given-names></name></person-group> (<year>1948</year>). <article-title>A mathematical theory of communication.</article-title> <source><italic>Bell Syst. Tech. J.</italic></source> <volume>27</volume> <fpage>623</fpage>&#x2013;<lpage>656</lpage>. <pub-id pub-id-type="doi">10.1002/j.1538-7305.1948.tb00917.x</pub-id></citation></ref>
<ref id="B41"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shiroda</surname> <given-names>M.</given-names></name> <name><surname>Uhl</surname> <given-names>J. D.</given-names></name> <name><surname>Urban-Lurain</surname> <given-names>M.</given-names></name> <name><surname>Haudek</surname> <given-names>K. C.</given-names></name></person-group> (<year>2021</year>). <article-title>Comparison of computer scoring model performance for short text responses across undergraduate institutional types.</article-title> <source><italic>J. Sci. Educ. Technol.</italic></source> <volume>31</volume> <fpage>117</fpage>&#x2013;<lpage>128</lpage>. <pub-id pub-id-type="doi">10.1007/s10956-021-09935-y</pub-id></citation></ref>
<ref id="B42"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Si</surname> <given-names>F. J.</given-names></name></person-group> (<year>2006</year>). <article-title>The application of principal component analysis in teaching evaluation.</article-title> <source><italic>Intelligence</italic></source> <volume>26</volume> <fpage>78</fpage>&#x2013;<lpage>79</lpage>.</citation></ref>
<ref id="B43"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Simpson</surname> <given-names>E. H.</given-names></name></person-group> (<year>1949</year>). <article-title>Measurement of diversity.</article-title> <source><italic>Nature</italic></source> <volume>163</volume>:<issue>688</issue>. <pub-id pub-id-type="doi">10.1038/163688a0</pub-id></citation></ref>
<ref id="B44"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sripathi</surname> <given-names>K. N.</given-names></name> <name><surname>Moscarella</surname> <given-names>R. A.</given-names></name> <name><surname>Yoho</surname> <given-names>R.</given-names></name> <name><surname>You</surname> <given-names>H. S.</given-names></name> <name><surname>Urban-Lurain</surname> <given-names>M.</given-names></name> <name><surname>Merrill</surname> <given-names>J.</given-names></name><etal/></person-group> (<year>2019</year>). <article-title>Mixed student ideas about mechanisms of human weight loss.</article-title> <source><italic>CBE Life Sci. Educ.</italic></source> <volume>18</volume>:<issue>ar37</issue>. <pub-id pub-id-type="doi">10.1187/cbe.18-11-0227</pub-id> <pub-id pub-id-type="pmid">31418653</pub-id></citation></ref>
<ref id="B45"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Syms</surname> <given-names>C.</given-names></name></person-group> (<year>2008</year>). &#x201C;<article-title>Ordination</article-title>,&#x201D; in <source><italic>Encyclopedia of ecology</italic></source>, <role>eds</role> <person-group person-group-type="editor"><name><surname>J&#x00F8;rgensen</surname> <given-names>S. E.</given-names></name> <name><surname>Fath</surname> <given-names>B. D.</given-names></name></person-group> (<publisher-loc>Amsterdam</publisher-loc>: <publisher-name>Elsevier</publisher-name>), <fpage>2572</fpage>&#x2013;<lpage>2581</lpage>. <pub-id pub-id-type="doi">10.1016/B978-008045405-4.00524-3</pub-id></citation></ref>
<ref id="B46"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tuomisto</surname> <given-names>H.</given-names></name></person-group> (<year>2010</year>). <article-title>A diversity of beta diversities: straightening up a concept gone awry. Part 2. Quantifying beta diversity and related phenomena.</article-title> <source><italic>Ecography</italic></source> <volume>33</volume> <fpage>23</fpage>&#x2013;<lpage>45</lpage>. <pub-id pub-id-type="doi">10.1111/j.1600-0587.2009.06148.x</pub-id></citation></ref>
<ref id="B47"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tweedie</surname> <given-names>F. J.</given-names></name> <name><surname>Baayen</surname> <given-names>R. H.</given-names></name></person-group> (<year>1998</year>). <article-title>How variable may a constant be? Measures of lexical richness in perspective.</article-title> <source><italic>Comput. Hum.</italic></source> <volume>32</volume> <fpage>323</fpage>&#x2013;<lpage>352</lpage>. <pub-id pub-id-type="doi">10.1023/A:1001749303137</pub-id></citation></ref>
<ref id="B48"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Uhl</surname> <given-names>J. D.</given-names></name> <name><surname>Shiroda</surname> <given-names>M.</given-names></name> <name><surname>Haudek</surname> <given-names>K. C.</given-names></name></person-group> (<year>2022</year>). <article-title>Developing assessments to elicit and characterize undergraduate mechanistic explanations about information flow in biology.</article-title> <source><italic>J. Biol. Educ.</italic></source> <fpage>1</fpage>&#x2013;<lpage>20</lpage>. <pub-id pub-id-type="doi">10.1080/00219266.2022.2041460</pub-id></citation></ref>
<ref id="B49"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Uhl</surname> <given-names>J. D.</given-names></name> <name><surname>Sripathi</surname> <given-names>K. N.</given-names></name> <name><surname>Meir</surname> <given-names>E.</given-names></name> <name><surname>Merrill</surname> <given-names>J.</given-names></name> <name><surname>Urban-Lurain</surname> <given-names>M.</given-names></name> <name><surname>Haudek</surname> <given-names>K. C.</given-names></name></person-group> (<year>2021</year>). <article-title>Automated writing assessments measure undergraduate learning after completion of a computer-based cellular respiration tutorial.</article-title> <source><italic>CBE Life Sci. Educ.</italic></source> <volume>20</volume>:<issue>ar33</issue>. <pub-id pub-id-type="doi">10.1187/cbe.20-06-0122</pub-id> <pub-id pub-id-type="pmid">34100647</pub-id></citation></ref>
<ref id="B50"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Vellend</surname> <given-names>M.</given-names></name></person-group> (<year>2001</year>). <article-title>Do commonly used indices of &#x03B2;-diversity measure species turnover?</article-title> <source><italic>J. Veg. Sci.</italic></source> <volume>12</volume> <fpage>545</fpage>&#x2013;<lpage>552</lpage>. <pub-id pub-id-type="doi">10.2307/3237006</pub-id></citation></ref>
<ref id="B51"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Voleti</surname> <given-names>R.</given-names></name> <name><surname>Liss</surname> <given-names>J. M.</given-names></name> <name><surname>Berisha</surname> <given-names>V.</given-names></name></person-group> (<year>2020</year>). <article-title>A review of automated speech and language features for assessment of cognitive and thought disorders.</article-title> <source><italic>IEEE J. Sel. Top. Signal Process.</italic></source> <volume>14</volume> <fpage>282</fpage>&#x2013;<lpage>298</lpage>. <pub-id pub-id-type="doi">10.1109/JSTSP.2019.2952087</pub-id> <pub-id pub-id-type="pmid">33907590</pub-id></citation></ref>
<ref id="B52"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Welbers</surname> <given-names>K.</given-names></name> <name><surname>Van Atteveldt</surname> <given-names>W.</given-names></name> <name><surname>Benoit</surname> <given-names>K.</given-names></name></person-group> (<year>2017</year>). <article-title>Text analysis in R.</article-title> <source><italic>Commun. Methods Meas.</italic></source> <volume>11</volume> <fpage>245</fpage>&#x2013;<lpage>265</lpage>. <pub-id pub-id-type="doi">10.1080/19312458.2017.1387238</pub-id></citation></ref>
<ref id="B53"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Whittaker</surname> <given-names>R. H.</given-names></name></person-group> (<year>1967</year>). <article-title>Gradient analysis of vegetation.</article-title> <source><italic>Biol. Rev.</italic></source> <volume>42</volume> <fpage>207</fpage>&#x2013;<lpage>264</lpage>. <pub-id pub-id-type="doi">10.1111/j.1469-185X.1967.tb01419.x</pub-id> <pub-id pub-id-type="pmid">4859903</pub-id></citation></ref>
<ref id="B54"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Whittaker</surname> <given-names>R. H.</given-names></name></person-group> (<year>1969</year>). <article-title>Evolution of diversity in plant communities.</article-title> <source><italic>Brookhaven Symp. Biol.</italic></source> <volume>22</volume> <fpage>178</fpage>&#x2013;<lpage>195</lpage>.</citation></ref>
<ref id="B55"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Whittaker</surname> <given-names>R. H.</given-names></name></person-group> (<year>1972</year>). <article-title>Evolution and measurement of species diversity.</article-title> <source><italic>Taxon</italic></source> <volume>21</volume> <fpage>213</fpage>&#x2013;<lpage>251</lpage>. <pub-id pub-id-type="doi">10.2307/1218190</pub-id></citation></ref>
<ref id="B56"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Xian</surname> <given-names>S.</given-names></name> <name><surname>Xia</surname> <given-names>H.</given-names></name> <name><surname>Yin</surname> <given-names>Y.</given-names></name> <name><surname>Zhai</surname> <given-names>Z.</given-names></name> <name><surname>Shang</surname> <given-names>Y.</given-names></name></person-group> (<year>2016</year>). <article-title>Principal component clustering approach to teaching quality discriminant analysis.</article-title> <source><italic>Cogent Educ.</italic></source> <volume>3</volume>:<issue>1194553</issue>. <pub-id pub-id-type="doi">10.1080/2331186X.2016.1194553</pub-id></citation></ref>
<ref id="B57"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yuan</surname> <given-names>Y.</given-names></name> <name><surname>Buckland</surname> <given-names>S. T.</given-names></name> <name><surname>Harrison</surname> <given-names>P. J.</given-names></name> <name><surname>Foss</surname> <given-names>S.</given-names></name> <name><surname>Johnston</surname> <given-names>A.</given-names></name></person-group> (<year>2016</year>). <article-title>Using species proportions to quantify turnover in biodiversity.</article-title> <source><italic>J. Agric. Biol. Environ. Stat.</italic></source> <volume>21</volume> <fpage>363</fpage>&#x2013;<lpage>381</lpage>. <pub-id pub-id-type="doi">10.1007/s13253-015-0243-0</pub-id></citation></ref>
<ref id="B58"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zelen&#x00FD;</surname> <given-names>D.</given-names></name></person-group> (<year>2021</year>). <source><italic>Analysis of community ecology data in R.</italic></source> Available online at: <ext-link ext-link-type="uri" xlink:href="https://www.davidzeleny.net/anadat-r/doku.php/en:div-ind">https://www.davidzeleny.net/anadat-r/doku.php/en:div-ind</ext-link> <comment>(accessed June 26, 2022)</comment>.</citation></ref>
<ref id="B59"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zenker</surname> <given-names>F.</given-names></name> <name><surname>Kyle</surname> <given-names>K.</given-names></name></person-group> (<year>2021</year>). <article-title>Investigating minimum text lengths for lexical diversity indices.</article-title> <source><italic>Assess. Writ.</italic></source> <volume>47</volume>:<issue>100505</issue>. <pub-id pub-id-type="doi">10.1016/j.asw.2020.100505</pub-id></citation></ref>
<ref id="B60"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhai</surname> <given-names>X.</given-names></name> <name><surname>Haudek</surname> <given-names>K. C.</given-names></name> <name><surname>Ma</surname> <given-names>W.</given-names></name></person-group> (<year>2022</year>). <article-title>Assessing argumentation using machine learning and cognitive diagnostic modeling.</article-title> <source><italic>Res. Sci. Educ.</italic></source> <pub-id pub-id-type="doi">10.1007/s11165-022-10062-w</pub-id></citation></ref>
</ref-list>
</back>
</article>
