<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Psychol.</journal-id>
<journal-title>Frontiers in Psychology</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Psychol.</abbrev-journal-title>
<issn pub-type="epub">1664-1078</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fpsyg.2019.01011</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Psychology</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Characterizing Interactive Communications in Computer-Supported Collaborative Problem-Solving Tasks: A Conditional Transition Profile Approach</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Hao</surname> <given-names>Jiangang</given-names></name>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/516778/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Mislevy</surname> <given-names>Robert J.</given-names></name>
</contrib>
</contrib-group>
<aff><institution>Educational Testing Service</institution>, <addr-line>Princeton, NJ</addr-line>, <country>United States</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Jason C. Immekus, University of Louisville, United States</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Mark Billinghurst, University of South Australia, Australia; Bernard Veldkamp, University of Twente, Netherlands</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Jiangang Hao <email>jhao&#x00040;ets.org</email></corresp>
<fn fn-type="other" id="fn001"><p>This article was submitted to Quantitative Psychology and Measurement, a section of the journal Frontiers in Psychology</p></fn></author-notes>
<pub-date pub-type="epub">
<day>08</day>
<month>05</month>
<year>2019</year>
</pub-date>
<pub-date pub-type="collection">
<year>2019</year>
</pub-date>
<volume>10</volume>
<elocation-id>1011</elocation-id>
<history>
<date date-type="received">
<day>11</day>
<month>09</month>
<year>2018</year>
</date>
<date date-type="accepted">
<day>16</day>
<month>04</month>
<year>2019</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2019 Hao and Mislevy.</copyright-statement>
<copyright-year>2019</copyright-year>
<copyright-holder>Hao and Mislevy</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract><p>Communication in a collaborative problem-solving activity plays a pivotal role in the success of the collaboration in both academia and the workplace. Computer-supported collaboration makes it possible to collect large-scale communication data to investigate the process at a finer granularity. In this paper, we introduce a conditional transition profile (CTP) to characterize aspects of each team member&#x00027;s communication. Based on the data from a large-scale empirical study, we found that participants in the same team tended to show more similar CTPs than participants from different teams. We also found that team members who showed more &#x0201C;negotiation&#x0201D; after the partner &#x0201C;shared&#x0201D; information tended to improve more after the collaboration, whereas those who continued sharing ideas while their partners were negotiating tended to improve less.</p></abstract>
<kwd-group>
<kwd>collaborative problem solving</kwd>
<kwd>communication</kwd>
<kwd>transition matrix</kwd>
<kwd>stochastic process</kwd>
<kwd>assessment</kwd>
</kwd-group>
<counts>
<fig-count count="7"/>
<table-count count="2"/>
<equation-count count="7"/>
<ref-count count="29"/>
<page-count count="9"/>
<word-count count="5715"/>
</counts>
</article-meta>
</front>
<body>
<sec sec-type="intro" id="s1">
<title>1. Introduction</title>
<p>Technology advancement allows computer-supported collaboration to be widely adopted in both academia and the workplace. Compared to face-to-face collaboration, online collaboration significantly reduces the effort and cost of organizing joint work, making it ideal for a wide range of collaborative activities (Stahl et al., <xref ref-type="bibr" rid="B26">2006</xref>). The communication data in computer-supported collaboration contain rich information regarding the collaboration process. Understanding the communication process will help to identify pathways to more successful collaboration outcomes. Such knowledge can further inform the development of real-time facilitation or intervention mechanisms to scaffold the collaboration.</p>
<p>The analysis of communication data (or discourse analysis as it is often called in the computer-supported collaborative learning (CSCL) community) usually starts with the coding or labeling of each turn (or several turns that constitute large speech units) of communications based on a framework (rubrics) being developed to address specific research questions. For example, a number of coding frameworks have been developed to analyze different aspects of the communications among team members, such as the coding framework for collaborative problem solving (CPS) skills (Liu et al., <xref ref-type="bibr" rid="B17">2015</xref>), for the interactive patterns in collaboration (Andrews et al., <xref ref-type="bibr" rid="B2">2017</xref>), for cohesion and language (Graesser et al., <xref ref-type="bibr" rid="B8">2004</xref>; Dowell et al., <xref ref-type="bibr" rid="B6">2016</xref>), and for dialog acts (Allen and Core, <xref ref-type="bibr" rid="B1">1997</xref>). Based on human-coded discourse, natural language processing (NLP) techniques can be employed to automate the annotation to an accuracy level that is close to human coding (Ros&#x000E9; et al., <xref ref-type="bibr" rid="B19">2008</xref>; Rus et al., <xref ref-type="bibr" rid="B21">2015</xref>; Flor et al., <xref ref-type="bibr" rid="B7">2016</xref>; Hao et al., <xref ref-type="bibr" rid="B11">2017a</xref>).</p>
<p>The codings of discourses are numerical representations of the communication data and can be used as input variables for developing higher level feature representations of the communication process, or for developing statistical models of the process. Given that the communication data and codings often involve multiple interacting team members, it is of interest to develop feature variables that characterize both team performance and individual performance. Traditional discourse analysis usually uses the frequency of different codings (e.g., Dowell et al., <xref ref-type="bibr" rid="B6">2016</xref>) or sequence of codings (e.g., Hao et al., <xref ref-type="bibr" rid="B13">2016</xref>) as the high-level representations of the communication. However, such representations fail to capture how a specific member responds to different types of utterances from others throughout the communication process. To address this issue, we introduce in this paper a conditional transition profile (CTP) approach to form representations of each team member&#x00027;s responses to different types of utterances (based on a given coding framework) from other members. In collaborative work, what one member says is important, but how a member responds to the others&#x00027; utterances may contain more information about the member&#x00027;s skills in collaboration. The CTP approach provides a quantitative measure of how a team member responds to other team members. To illustrate the method, we apply the CTP to data collected through a large-scale online collaborative task from the ETS collaborative science assessment prototype (ECSAP) project and show an example of how the team members&#x00027; CTPs were related to their performance improvements after the collaboration.</p>
</sec>
<sec id="s2">
<title>2. Conditional Transition Profile</title>
<p>Suppose we have a coding framework with <italic>k</italic> categories. The <italic>t</italic>-th turn of the communication can then be characterized by a <italic>k</italic>-dimensional state vector <bold>X</bold><sub><italic>t</italic></sub>, with elements either 0 or 1, indicating whether a given category is assigned to this turn of discourse<xref ref-type="fn" rid="fn0001"><sup>1</sup></xref>. For coding frameworks that require mutually exclusive codings, the state vector will have only one element as 1 and all others as 0. The states in a communication process can be considered at both the team level and the individual level. At each level, the most straightforward measure is the cumulative counts of the different states. A CPS profile based on the counts of states at the team level has been introduced to characterize the overall collaboration process of the team (Hao et al., <xref ref-type="bibr" rid="B13">2016</xref>). In this CPS profile, we considered the counts of different states (unigram) and consecutive state pairs (bigram), though the approach can be extended to include the counts of <italic>n</italic> sequential states (<italic>n</italic>-gram). It has been shown that different CPS profiles are related to different collaboration outcomes of the team (Hao et al., <xref ref-type="bibr" rid="B13">2016</xref>).</p>
<p>In the current paper, we further generalize the CPS profile from characterizing the whole team process to characterizing each team member&#x00027;s communication process. The most straightforward way to generalize the CPS profile is to count the different states from each team member separately instead of from all the team members together. However, in a communication, what one member (the target team member) says depends heavily on the other members&#x00027; preceding discourses. Counting the states of a target team member conditional on the partners&#x00027; preceding discourse states should therefore encode more information about the individual&#x00027;s communicative moves in context than merely counting all the states together. For this reason, we introduce a conditional transition profile for each team member as follows.</p>
<p>For a sequence of coded discourses<xref ref-type="fn" rid="fn0002"><sup>2</sup></xref>, we can represent the states of communication as in <xref ref-type="table" rid="T1">Table 1</xref>, where the columns indicate the states of the discourse from the targeted team member and the rows indicate the states of the immediately preceding discourse from other team members. The numbers in the cells are the counts of the occurrences of the states specified by the corresponding row and column names. It is worth noting that we consider only the immediately preceding turn of discourse and ignore longer-range dependency, though the extension to longer-range dependency is straightforward. The reason is that the majority of short online conversations do not display long-range dependency (some empirical evidence of this can be found in Hao et al., <xref ref-type="bibr" rid="B11">2017a</xref>). The elements of a CTP are defined as follows,</p>
<disp-formula id="E1"><label>(1)</label><mml:math id="M1"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>C</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>D</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">|</mml:mo><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>D</mml:mi></mml:mrow><mml:mo>&#x00304;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>D</italic><sub><italic>i</italic></sub> denotes the state (coding category) <italic>i</italic> of the discourse from the targeted team member and <inline-formula><mml:math id="M2"><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>D</mml:mi></mml:mrow><mml:mo>&#x00304;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> denotes the state <italic>j</italic> of the immediately preceding discourse from other team members. Here <italic>i</italic> indexes the columns and <italic>j</italic> indexes the rows. <italic>N</italic><sub><italic>ij</italic></sub> is the count of occurrences of the state in the corresponding cell. Note that this matrix is very similar to the (weighted) adjacency matrix widely used in graph theory, except that the latter is traceless for graphs without self-loops (Biggs, <xref ref-type="bibr" rid="B5">1993</xref>).</p>

<table-wrap position="float" id="T1">
<label>Table 1</label>
<caption><p>Conditional transition profile of the communication.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th/>
<th valign="top" align="center"><bold>State 1</bold></th>
<th valign="top" align="center"><bold>State 2</bold></th>
<th valign="top" align="center"><bold>State 3</bold></th>
<th valign="top" align="center"><bold>&#x022EF;</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">State 1</td>
<td valign="top" align="center"><italic>N</italic><sub>11</sub></td>
<td valign="top" align="center"><italic>N</italic><sub>21</sub></td>
<td valign="top" align="center"><italic>N</italic><sub>31</sub></td>
<td valign="top" align="center">&#x000B7;&#x000B7;&#x000B7;</td>
</tr>
<tr>
<td valign="top" align="left">State 2</td>
<td valign="top" align="center"><italic>N</italic><sub>12</sub></td>
<td valign="top" align="center"><italic>N</italic><sub>22</sub></td>
<td valign="top" align="center"><italic>N</italic><sub>32</sub></td>
<td valign="top" align="center">&#x000B7;&#x000B7;&#x000B7;</td>
</tr>
<tr>
<td valign="top" align="left">State 3</td>
<td valign="top" align="center"><italic>N</italic><sub>13</sub></td>
<td valign="top" align="center"><italic>N</italic><sub>23</sub></td>
<td valign="top" align="center"><italic>N</italic><sub>33</sub></td>
<td valign="top" align="center">&#x000B7;&#x000B7;&#x000B7;</td>
</tr>
<tr>
<td valign="top" align="left">&#x000B7;&#x000B7;&#x000B7;</td>
<td valign="top" align="center">&#x000B7;&#x000B7;&#x000B7;</td>
<td valign="top" align="center">&#x000B7;&#x000B7;&#x000B7;</td>
<td valign="top" align="center">&#x000B7;&#x000B7;&#x000B7;</td>
<td valign="top" align="center">&#x000B7;&#x000B7;&#x000B7;</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>The columns correspond to the states of the discourse from the targeted team member and the rows correspond to the states of preceding discourses from the other team members</italic>.</p>
</table-wrap-foot>
</table-wrap>
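<p>As a minimal sketch of how the counts in Equation (1) could be tabulated, the Python snippet below builds the count table for one target member from a list of (speaker, state) turns. The function name <monospace>conditional_transition_counts</monospace>, the integer state codes, and the toy dialogue are illustrative assumptions and not part of the study&#x00027;s actual processing. The sketch also assumes that a target turn is counted only when the turn immediately before it came from a different team member, which is one possible reading of the definition.</p>
<preformat>
```python
from typing import List, Sequence, Tuple


def conditional_transition_counts(
    turns: Sequence[Tuple[str, int]], target: str, k: int
) -> List[List[int]]:
    """Count how often `target` produces state i right after a partner's state j.

    The result is laid out like Table 1: rows index the partner's preceding
    state j, columns index the target member's state i.
    """
    counts = [[0] * k for _ in range(k)]
    for (prev_speaker, prev_state), (speaker, state) in zip(turns, turns[1:]):
        # Assumption: only count a target turn whose immediately preceding
        # turn was produced by a different team member.
        if speaker == target and prev_speaker != target:
            counts[prev_state][state] += 1
    return counts


# Hypothetical state codes for illustration only.
toy_turns = [("A", 0), ("B", 1), ("A", 0), ("B", 1), ("A", 2), ("A", 0), ("B", 0)]
ctp_b = conditional_transition_counts(toy_turns, target="B", k=4)
print(ctp_b)
```
</preformat>
<p>In this toy dialogue, member B responds three times to a partner turn coded 0, so all of B&#x00027;s counts fall in the first row of the table.</p>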

<p>In many practical applications, the relative ratios of the categories are often considered important. A representation of the ratios can be obtained by normalizing each cell of the table by the sum of its row.</p>
<disp-formula id="E2"><label>(2)</label><mml:math id="M3"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>T</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>D</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">|</mml:mo><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>D</mml:mi></mml:mrow><mml:mo>&#x00304;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>/</mml:mo><mml:mrow><mml:mo stretchy="true">(</mml:mo><mml:mstyle displaystyle="true"><mml:munder class="msub"><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:munder></mml:mstyle><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="true">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>We call this the normalized CTP. In practice, some elements could be zero because of a small sample size, so smoothing techniques, such as Laplace smoothing (Sch&#x000FC;tze et al., <xref ref-type="bibr" rid="B24">2008</xref>), can be used to estimate the elements of the normalized CTP as follows,</p>
<disp-formula id="E3"><label>(3)</label><mml:math id="M4"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mi>T</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>D</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">|</mml:mo><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>D</mml:mi></mml:mrow><mml:mo>&#x00304;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>&#x0002B;</mml:mo><mml:mi>&#x003B1;</mml:mi></mml:mrow><mml:mrow><mml:mstyle displaystyle="true"><mml:msub><mml:mrow><mml:mo>&#x02211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mstyle><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>&#x0002B;</mml:mo><mml:mi>&#x003B1;</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where &#x003B1; &#x0003E; 0 is a smoothing parameter and <italic>k</italic> is the number of coding categories. We call <inline-formula><mml:math id="M5"><mml:mi>C</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>D</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>D</mml:mi></mml:mrow><mml:mo>&#x00304;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> the conditional transition profile and <inline-formula><mml:math id="M6"><mml:mi>T</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>D</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>D</mml:mi></mml:mrow><mml:mo>&#x00304;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> the normalized conditional transition profile. 
Generally speaking, the <inline-formula><mml:math id="M7"><mml:mi>C</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>D</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>D</mml:mi></mml:mrow><mml:mo>&#x00304;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> contains more information than <inline-formula><mml:math id="M8"><mml:mi>T</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>D</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>D</mml:mi></mml:mrow><mml:mo>&#x00304;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> as the latter can be derived from the former but not the other way around. 
<inline-formula><mml:math id="M9"><mml:mi>T</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>D</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>D</mml:mi></mml:mrow><mml:mo>&#x00304;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> characterizes the probability of the transition among states and could be more generalizable than <inline-formula><mml:math id="M10"><mml:mi>C</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>D</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>D</mml:mi></mml:mrow><mml:mo>&#x00304;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> under some circumstances. 
A reliable estimate of the elements in <inline-formula><mml:math id="M11"><mml:mi>T</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>D</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>D</mml:mi></mml:mrow><mml:mo>&#x00304;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> requires a sufficiently large number of occurrences in each cell, which suggests that one may want to use <inline-formula><mml:math id="M12"><mml:mi>C</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>D</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>D</mml:mi></mml:mrow><mml:mo>&#x00304;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> instead of <inline-formula><mml:math id="M13"><mml:mi>T</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>D</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>|</mml:mo><mml:msub><mml:mrow><mml:mover accent="true"><mml:mrow><mml:mi>D</mml:mi></mml:mrow><mml:mo>&#x00304;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula> if the counts are low. In the above definition of the CTP, we consider the counts by conditioning only on the immediately preceding turn by others. One can extend this to higher-order associations for situations where long-range dependency prevails in the communication.</p>
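<p>The row normalization in Equation (2) and the smoothed estimate in Equation (3) can be sketched as follows. The function name <monospace>normalized_ctp</monospace>, the example counts, and the choice of returning NaN for an unobserved row when no smoothing is applied are assumptions made for illustration; the rows of the input are taken to index the partner&#x00027;s preceding state, as in Table 1.</p>
<preformat>
```python
from typing import List


def normalized_ctp(counts: List[List[float]], alpha: float = 0.0) -> List[List[float]]:
    """Row-normalize a CTP count table, optionally with Laplace smoothing.

    Each row j (the partner's preceding state) is divided by its row total,
    following Equation (2); with alpha > 0 this becomes Equation (3).
    """
    k = len(counts[0])  # number of coding categories (columns)
    result = []
    for row in counts:
        total = sum(row) + alpha * k
        if total == 0:
            # Without smoothing, an unobserved preceding state has no defined ratios.
            result.append([float("nan")] * k)
        else:
            result.append([(n + alpha) / total for n in row])
    return result


# Hypothetical counts; the second row was never observed.
counts = [[1, 2, 0, 0], [0, 0, 0, 0], [3, 1, 0, 0], [0, 0, 0, 4]]
smoothed = normalized_ctp(counts, alpha=1.0)
print(smoothed[0])  # first row: (1+1)/7, (2+1)/7, (0+1)/7, (0+1)/7
print(smoothed[1])  # unobserved row becomes uniform: 1/4 in every cell
```
</preformat>
<p>With smoothing, every row sums to one and an unobserved row falls back to a uniform distribution over the <italic>k</italic> categories, which illustrates why the smoothed values for sparse rows should be interpreted with caution.</p>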
<p>It is worth noting that the normalized CTP resembles the stochastic matrix (also known as Markov matrix) if the underlying communication process is a discrete time Markov process that meets the following condition (Van Kampen, <xref ref-type="bibr" rid="B27">1992</xref>; Grimmett and Stirzaker, <xref ref-type="bibr" rid="B10">2001</xref>).</p>
<disp-formula id="E4"><label>(4)</label><mml:math id="M14"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mtext class="textrm" mathvariant="normal">P</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>X</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">|</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>X</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x022EF;</mml:mo><mml:mspace width="0.3em" class="thinspace"/><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>X</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mtext class="textrm" mathvariant="normal">P</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>X</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">|</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>X</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>where <italic>t</italic> denotes the <italic>t</italic><sup><italic>th</italic></sup> step of the process. A transition matrix (or stochastic matrix) <bold>P</bold> with elements</p>
<disp-formula id="E5"><label>(5)</label><mml:math id="M15"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mtext class="textrm" mathvariant="normal">P</mml:mtext></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mtext class="textrm" mathvariant="normal">P</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>X</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">|</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>X</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>will characterize the transition structure of the Markov process. If a Markov process is stationary (homogeneous), i.e., the following equation holds for all <italic>t</italic>, <italic>i</italic>, and <italic>j</italic>:</p>
<disp-formula id="E6"><label>(6)</label><mml:math id="M16"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:mtext class="textrm" mathvariant="normal">P</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>X</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">|</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>X</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mtext class="textrm" mathvariant="normal">P</mml:mtext><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>X</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">|</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>X</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>x</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>then we can readily predict the probabilities of the different states for the <italic>t</italic><sup><italic>th</italic></sup> turn from either the preceding turn or the initial turn through the following equation,</p>
<disp-formula id="E7"><label>(7)</label><mml:math id="M17"><mml:mtable class="eqnarray" columnalign="left"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>X</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>X</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mstyle mathvariant="bold"><mml:mtext>P</mml:mtext></mml:mstyle><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>X</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:msup><mml:mrow><mml:mstyle mathvariant="bold"><mml:mtext>P</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msup></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<p>One notable difference between the normalized CTP and the stochastic matrix of a Markov process is that the former is not defined on a closed set of states, as one team member&#x00027;s states depend on other team members&#x00027; states instead of her own. As such, the (normalized) CTP introduced above is better viewed as a numerical representation of one aspect of each team member&#x00027;s coded communication process than as an object with the mathematical properties of the stochastic matrix of a Markov process, though some methods based on the stochastic matrix may still be borrowed to analyze the normalized CTP.</p>
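<p>For a process that does satisfy the Markov and homogeneity conditions in Equations (4) and (6), the propagation in Equation (7) can be sketched as below. The two-state transition matrix and the function name <monospace>propagate</monospace> are hypothetical and chosen only to keep the arithmetic easy to follow; they are not estimates from the study data.</p>
<preformat>
```python
from typing import List

Matrix = List[List[float]]


def propagate(x0: List[float], p: Matrix, t: int) -> List[float]:
    """Return x0 P^t by applying the row-vector update x P a total of t times."""
    x = list(x0)
    for _ in range(t):
        x = [sum(x[i] * p[i][j] for i in range(len(x))) for j in range(len(p[0]))]
    return x


# Hypothetical two-state transition matrix; each row sums to 1.
P = [[0.9, 0.1],
     [0.5, 0.5]]
x0 = [1.0, 0.0]  # the process starts in the first state with certainty
print(propagate(x0, P, 1))  # one step reproduces the first row of P
print(propagate(x0, P, 2))  # approximately [0.86, 0.14]
```
</preformat>
<p>Because a normalized CTP conditions on the partner&#x00027;s states rather than on the member&#x00027;s own, applying this update to a CTP would not carry the same interpretation.</p>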
<p>In the next section, we will show how the CTP approach can be used to characterize empirical communication data.</p>
</sec>
<sec id="s3">
<title>3. Empirical Study</title>
<sec>
<title>3.1. Task and Data</title>
<p>We carried out the ECSAP project to explore the assessment of communications in large-scale online CPS activities. The goal was to investigate what CPS skills can be detected in the communications and how these skills are related to collaboration outcomes. The details of ECSAP are beyond the scope of this paper, and we refer readers to Hao et al. (<xref ref-type="bibr" rid="B14">2017b</xref>) for a description of the study. The core part of the ECSAP is a simulation-based task that allows two human participants to collaborate through a chat window to complete a set of questions and tasks about volcano science (Hao et al., <xref ref-type="bibr" rid="B12">2015</xref>). <xref ref-type="fig" rid="F1">Figure 1</xref> shows two screenshots of the simulation-based collaborative task. In the simulation task, the participants were first shown some tutorials about the factors related to volcanic eruptions. They were then asked to answer about fifteen questions, during which they needed to carry out some small experiments, such as deploying seismometers around a virtual volcano to collect data, to help them answer the questions. The first seven questions were selected-response items, which allowed us to impose a set of structured system prompts to maximize the information elicited. For each of the seven questions, the system prompted each team member to respond individually at first and then prompted the team members to discuss their answers with each other via a chat window. After the collaboration, each member was given a chance to revise her initial answer. By checking the difference in the scores on the initial and revised answers, we can calculate each person&#x00027;s gain/loss from the collaboration. The remaining eight questions required manipulation of the tools in the simulation, which made it more difficult to impose the initial-discuss-revise procedure. They are not addressed in the current analysis. 
In addition to this simulation-based collaborative task, we also administered a general science knowledge test (Rundgren et al., <xref ref-type="bibr" rid="B20">2012</xref>) to each participant to measure her content-relevant knowledge.</p>
<fig id="F1" position="float">
<label>Figure 1</label>
<caption><p>Two screenshots of the simulation-based collaborative task used in the ECSAP.</p></caption>
<graphic xlink:href="fpsyg-10-01011-g0001.tif"/>
</fig>
<p>We collected data through a crowdsourcing data collection platform, Amazon Mechanical Turk (Kittur et al., <xref ref-type="bibr" rid="B16">2008</xref>). We recruited 1,000 participants located in the United States with at least one year of college education and randomly assigned them to 500 dyads to complete the simulation-based collaborative task. Seventy-eight percent of the participants were White, 7% were Black or African American, 5% were Asian, 5% were Hispanic or Latino, and 5% were multiracial. Half of the participants were male and half were female, and their ages ranged from 25 to 54. Most of the participants had prior experience with online communication, though not necessarily with collaborative problem solving. After removing the teams that did not complete the task successfully, we were left with 474 dyads. Each team produced about 80 turns of chat in total and about 30 turns around the first seven questions. We noticed that many teams did not precisely follow the initial-discuss-revise procedure we set forth and started some non-prompted discussions when they were asked to answer alone. In our analysis, we consider only the teams that had no more than two non-prompted discussions. After this cut, we were left with 237 of the 474 dyads. The analyses in this paper are based on this subset unless otherwise stated.</p>
<p>The data from each collaborative session include both the responses to the questions in the simulation and the text-chat communication between the team members around each question. The responses to the questions were scored based on the rubrics shown in Zapata-Rivera et al. (<xref ref-type="bibr" rid="B29">2014</xref>). We developed a framework for coding the communication data in CPS (Liu et al., <xref ref-type="bibr" rid="B17">2015</xref>) based on the CSCL literature and the assessment frameworks from PISA 2015 (Organization for Economic Co-operation and Development, <xref ref-type="bibr" rid="B18">2013</xref>) and ATC21S (Griffin et al., <xref ref-type="bibr" rid="B9">2012</xref>). This framework considers four skills, namely sharing ideas, negotiating ideas, regulating problem-solving, and maintaining communication, which have been identified as highly relevant to the CPS activity we are targeting. Each turn of the chat communications was coded into one of the four skill categories based on our CPS framework. <xref ref-type="table" rid="T2">Table 2</xref> shows some example chats and their coded states. Two human raters were trained on the CPS framework, and they double-coded a subset of the discourse data (15% of the data). The unit of coding is each turn of a conversation or each conversational utterance. The inter-rater agreement in terms of unweighted kappa was 0.67.</p>

<table-wrap position="float" id="T2">
<label>Table 2</label>
<caption><p>Example of part of the annotated chat data from one team.</p></caption>
<table frame="hsides" rules="groups">
<thead><tr>
<th valign="top" align="left"><bold>Topic</bold></th>
<th valign="top" align="center"><bold>Member</bold></th>
<th valign="top" align="left"><bold>Chat</bold></th>
<th valign="top" align="left"><bold>State</bold></th>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">IntroduceYourselves</td>
<td valign="top" align="center">A</td>
<td valign="top" align="left">Hi</td>
<td valign="top" align="left">Maintaining</td>
</tr>
<tr>
<td valign="top" align="left">IntroduceYourselves</td>
<td valign="top" align="center">B</td>
<td valign="top" align="left">Hi, I&#x00027;m Jennifer</td>
<td valign="top" align="left">Maintaining</td>
</tr>
<tr>
<td valign="top" align="left">Question1A</td>
<td valign="top" align="center">A</td>
<td valign="top" align="left">chose b, cause its rocks cracking that cause the high frequency events</td>
<td valign="top" align="left">Sharing</td>
</tr>
<tr>
<td valign="top" align="left">Question1A</td>
<td valign="top" align="center">B</td>
<td valign="top" align="left">yes, same here</td>
<td valign="top" align="left">Negotiating</td>
</tr>
<tr>
<td valign="top" align="left">Question1B</td>
<td valign="top" align="center">A</td>
<td valign="top" align="left">d sound right to you?</td>
<td valign="top" align="left">Regulating</td>
</tr>
<tr>
<td valign="top" align="left">Question1B</td>
<td valign="top" align="center">B</td>
<td valign="top" align="left">I couldn&#x00027;t remember, I thought it was C</td>
<td valign="top" align="left">Regulating</td>
</tr>
<tr>
<td valign="top" align="left">Question1B</td>
<td valign="top" align="center">A</td>
<td valign="top" align="left">you are right</td>
<td valign="top" align="left">Negotiating</td>
</tr>
<tr>
<td valign="top" align="left">QuestionsP2</td>
<td valign="top" align="center">B</td>
<td valign="top" align="left">A and B?</td>
<td valign="top" align="left">Regulating</td>
</tr>
<tr>
<td valign="top" align="left">QuestionsP2</td>
<td valign="top" align="center">A</td>
<td valign="top" align="left">yes, that&#x00027;s what i got</td>
<td valign="top" align="left">Negotiating</td>
</tr>
<tr>
<td valign="top" align="left">QuestionsP3</td>
<td valign="top" align="center">A</td>
<td valign="top" align="left">52431?</td>
<td valign="top" align="left">Regulating</td>
</tr>
<tr>
<td valign="top" align="left">QuestionsP3</td>
<td valign="top" align="center">B</td>
<td valign="top" align="left">I was only sure about 5 and 1 being first and last</td>
<td valign="top" align="left">Sharing</td>
</tr>
<tr>
<td valign="top" align="left">QuestionsP3</td>
<td valign="top" align="center">B</td>
<td valign="top" align="left">4 is probably second to last</td>
<td valign="top" align="left">Sharing</td>
</tr>
<tr>
<td valign="top" align="left">ExampleSeisQuestion1</td>
<td valign="top" align="center">B</td>
<td valign="top" align="left">A?</td>
<td valign="top" align="left">Regulating</td>
</tr>
<tr>
<td valign="top" align="left">ExampleSeisQuestion1</td>
<td valign="top" align="center">A</td>
<td valign="top" align="left">picked a</td>
<td valign="top" align="left">Sharing</td>
</tr>
<tr>
<td valign="top" align="left">ExampleSeisQuestion2</td>
<td valign="top" align="center">A</td>
<td valign="top" align="left">thoughts?</td>
<td valign="top" align="left">Regulating</td>
</tr>
<tr>
<td valign="top" align="left">ExampleSeisQuestion2</td>
<td valign="top" align="center">B</td>
<td valign="top" align="left">b?</td>
<td valign="top" align="left">Regulating</td>
</tr>
<tr>
<td valign="top" align="left">ExampleSeisQuestion2</td>
<td valign="top" align="center">A</td>
<td valign="top" align="left">same</td>
<td valign="top" align="left">Negotiating</td>
</tr>
<tr>
<td valign="top" align="left">ExampleSeisQuestion3</td>
<td valign="top" align="center">A</td>
<td valign="top" align="left">obviously c</td>
<td valign="top" align="left">Sharing</td>
</tr>
<tr>
<td valign="top" align="left">ExampleSeisQuestion3</td>
<td valign="top" align="center">B</td>
<td valign="top" align="left">c</td>
<td valign="top" align="left">Sharing</td>
</tr>
<tr>
<td valign="top" align="left">&#x000B7;&#x000B7;&#x000B7;</td>
<td valign="top" align="center">&#x000B7;&#x000B7;&#x000B7;</td>
<td valign="top" align="left">&#x000B7;&#x000B7;&#x000B7;</td>
<td valign="top" align="left">&#x000B7;&#x000B7;&#x000B7;</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<p><italic>The topic column indicates the specific question around which the conversations happened. The member column indicates which member in the dyadic team produced the discourse</italic>.</p>
</table-wrap-foot>
</table-wrap>

</sec>
<sec>
<title>3.2. Methods</title>
<p>Given that there are about 30 turns of conversation in each team and four coding categories, the expected count in each cell of the four-by-four matrix is relatively low, about two (30 turns spread over 16 cells). Therefore, we chose to use the CTP instead of the normalized version in this paper. The central research question we want to address is the usefulness of the CTP representation of each participant&#x00027;s communication process. As one aspect of this question, we investigated whether such a representation of the communication process is related to the participant&#x00027;s gain or loss, measured as the total score change between the initial and revised responses. The hypothesis is that if the CTP is an effective method for characterizing the collaboration process, it should have implications for the collaboration outcomes. We tried the following two approaches to gain in-depth knowledge of the relationship between a team member&#x00027;s communication process and her outcome from the collaboration.</p>
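<p>As a minimal sketch of how such a count matrix could be tallied from coded chat turns, the following Python snippet counts, for one team member, how often each of her states immediately follows each of the partner&#x00027;s states. The function name, the state ordering, and the choice to count consecutive turns across topic boundaries are assumptions made for illustration and are not necessarily the exact procedure used in this study; the example turns are the first rows of <xref ref-type="table" rid="T2">Table 2</xref>.</p>
<preformat><![CDATA[
```python
# Hypothetical ordering of the four coded states (rows and columns of the matrix).
STATES = ["Sharing", "Negotiating", "Regulating", "Maintaining"]


def conditional_transition_profile(turns, member):
    """Count how often `member` produced state j right after the partner produced state i.

    `turns` is a list of (speaker, state) tuples in chat order.
    Returns a 4x4 list of lists indexed as [partner_state][member_state].
    Assumption: consecutive turns are paired regardless of topic.
    """
    index = {state: k for k, state in enumerate(STATES)}
    ctp = [[0] * len(STATES) for _ in STATES]
    for (prev_speaker, prev_state), (speaker, state) in zip(turns, turns[1:]):
        # Only count turns by `member` that directly follow a partner turn.
        if speaker == member and prev_speaker != member:
            ctp[index[prev_state]][index[state]] += 1
    return ctp


# First nine rows of Table 2 as (member, state) pairs.
turns = [
    ("A", "Maintaining"),
    ("B", "Maintaining"),
    ("A", "Sharing"),
    ("B", "Negotiating"),
    ("A", "Regulating"),
    ("B", "Regulating"),
    ("A", "Negotiating"),
    ("B", "Regulating"),
    ("A", "Negotiating"),
]

ctp_a = conditional_transition_profile(turns, "A")
ctp_b = conditional_transition_profile(turns, "B")
```
]]></preformat>
<p>With so few turns, most of the 16 cells stay at zero, which illustrates why the raw counts are sparse for short conversations.</p>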
<p>In the first approach, we started with the total score changes and examined how the CTPs differ between groups. Specifically, we divided the participants into two groups, labeled effective gain and ineffective gain. Each participant in the effective gain group has a positive total score change, while each participant in the ineffective gain group has a negative or zero total score change. Such a grouping may systematically penalize people with higher content-relevant knowledge: they are more likely to answer a given item correctly in their initial response, which leaves no room for further improvement. To ensure that we are considering people with comparable content-relevant knowledge, we removed the participants who correctly answered more than five of the seven questions in their initial response. After applying this control, we have 151 and 101 participants in the effective gain and ineffective gain groups, respectively. We verified that they have comparable content-relevant knowledge by comparing their performance in the general science knowledge test, as shown in <xref ref-type="fig" rid="F2">Figure 2</xref>. The findings from this approach may be useful in informing teaching or training about which features of the communication process lead to more effective collaboration outcomes.</p>
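<p>The grouping rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record fields (<monospace>id</monospace>, <monospace>initial_correct</monospace>, <monospace>initial_score</monospace>, <monospace>revised_score</monospace>) and the toy records are hypothetical and do not come from the study data.</p>
<preformat><![CDATA[
```python
def split_by_gain(participants, max_initial_correct=5):
    """Split participants into effective-gain and ineffective-gain groups.

    Participants with more than `max_initial_correct` correct initial answers
    are dropped, since they have little room to improve.
    Field names are hypothetical placeholders.
    """
    effective, ineffective = [], []
    for person in participants:
        if person["initial_correct"] > max_initial_correct:
            continue  # near-ceiling initial performance: excluded from both groups
        change = person["revised_score"] - person["initial_score"]
        if change > 0:
            effective.append(person["id"])
        else:
            ineffective.append(person["id"])  # zero or negative change
    return effective, ineffective


# Toy records, invented for illustration only.
participants = [
    {"id": "p1", "initial_correct": 3, "initial_score": 3, "revised_score": 5},
    {"id": "p2", "initial_correct": 4, "initial_score": 4, "revised_score": 4},
    {"id": "p3", "initial_correct": 6, "initial_score": 6, "revised_score": 7},
    {"id": "p4", "initial_correct": 2, "initial_score": 2, "revised_score": 1},
]

effective, ineffective = split_by_gain(participants)
```
]]></preformat>
<p>In this toy example, the third record is excluded because more than five initial answers were correct, even though its score increased.</p>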
<fig id="F2" position="float">
<label>Figure 2</label>
<caption><p>Comparison of the total scores from participants who gained effectively and ineffectively from the collaboration. A <italic>t</italic>-test shows that the two groups have similar content-relevant science knowledge.</p></caption>
<graphic xlink:href="fpsyg-10-01011-g0002.tif"/>
</fig>
<p>In the second approach, we started with the communication process by clustering the participants based on their CTPs, then examined the total score changes in each of the clusters. To perform the cluster analysis, we flattened each CTP into a 16-dimensional vector by appending rows one after another, then calculated the Euclidean distances between the vectors of each pair of participants as a measure of how (dis)similar their communication processes are. Based on these distances, we first performed a hierarchical clustering analysis using Ward linkage (Ward, <xref ref-type="bibr" rid="B28">1963</xref>) to cluster the participants and then examined how the total score change differs across clusters. The findings from this approach can help to uncover similar patterns in the communication process that are associated with similar or different collaboration outcomes, which may also lead to meaningful feedback for better teaching or training strategies for improving collaboration.</p>
<p>Both approaches may thus lead to actionable procedures in practice to diagnose issues in a computer-supported collaboration and provide feedback to better scaffold the collaboration. For example, if we find after an online collaboration that students who tend to respond to partners in a particular way often show poor collaboration outcomes, we can design a coaching or training program to help them change their ways of communicating to ways that are more likely to lead to successful collaboration. Consistent findings from the two approaches would substantiate the efficacy of the CTP method for characterizing the communication process in a collaborative activity; whether these characterizations support effective feedback is beyond the scope of the present article.</p>
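<p>The flattening and distance steps can be sketched as follows. This is a minimal standard-library illustration, not the code used in the study; the function names and the two toy matrices are assumptions introduced here.</p>
<preformat><![CDATA[
```python
import math


def flatten(ctp):
    """Append the rows of a 4x4 CTP one after another into a 16-element vector."""
    return [count for row in ctp for count in row]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_matrix(ctps):
    """Pairwise Euclidean distances between the flattened CTPs of all participants."""
    vectors = [flatten(ctp) for ctp in ctps]
    return [[euclidean(u, v) for v in vectors] for u in vectors]


# Two toy CTPs, invented for illustration only.
ctp_x = [[0, 0, 0, 0] for _ in range(4)]
ctp_y = [[0, 3, 0, 0], [0, 0, 0, 0], [0, 0, 4, 0], [0, 0, 0, 0]]

distances = distance_matrix([ctp_x, ctp_y])
```
]]></preformat>
<p>The flattened vectors could then be passed to a Ward-linkage routine, for example <monospace>scipy.cluster.hierarchy.linkage(vectors, method="ward")</monospace>, if SciPy is available; that step is omitted here to keep the sketch dependency-free.</p>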
</sec>
</sec>
<sec sec-type="results" id="s4">
<title>4. Results</title>
<p>Before we present the results corresponding to the two approaches described above, we first check whether the CTPs of team members are more similar to each other than those of random pairs of participants. Given the interdependent nature of dyadic communication, we might expect the CTPs of team members to be more closely related than those of random pairs of participants, which can serve as a check of the plausibility of the CTP approach. We carried out this analysis on the full dataset, i.e., without excluding the teams with more than two non-prompted discussions, and show the results in <xref ref-type="fig" rid="F3">Figure 3</xref>, where we compare the Euclidean distances between the CTPs of team members and of random pairs. The result confirms our hypothesis of the interdependence of the communication between team members, which also lends support to the effectiveness of the CTP approach for characterizing a team member&#x00027;s communication process.</p>
<fig id="F3" position="float">
<label>Figure 3</label>
<caption><p>Distance distributions of team pairs and random pairs. A <italic>t</italic>-test shows that the means of the two distributions are significantly different.</p></caption>
<graphic xlink:href="fpsyg-10-01011-g0003.tif"/>
</fig>
<p>The results from our first approach are shown in <xref ref-type="fig" rid="F4">Figure 4</xref>, where we compare each element of the CTPs corresponding to the effective and ineffective gain groups via independent <italic>t</italic>-tests (2-tailed)<xref ref-type="fn" rid="fn0003"><sup>3</sup></xref>. The results show that the effective gain group has significantly more &#x0201C;negotiate&#x0201D; following the partner&#x00027;s &#x0201C;share&#x0201D; and &#x0201C;negotiate,&#x0201D; while the ineffective gain group shows significantly more &#x0201C;share&#x0201D; following the partner&#x00027;s &#x0201C;negotiate&#x0201D; and &#x0201C;maintain.&#x0201D; This finding suggests that a person is more likely to demonstrate improved performance if she shows more &#x0201C;negotiate&#x0201D; following her partner&#x00027;s &#x0201C;share&#x0201D; and &#x0201C;negotiate.&#x0201D; However, a person is less likely to arrive at an improved response if she shows more &#x0201C;share&#x0201D; following her partner&#x00027;s &#x0201C;negotiate&#x0201D; and &#x0201C;maintain.&#x0201D; This suggests that negotiation is essential for gaining more from a collaboration, while excessive sharing of information contributes negatively, which is consistent with our earlier findings at the team level (Hao et al., <xref ref-type="bibr" rid="B13">2016</xref>).</p>
<fig id="F4" position="float">
<label>Figure 4</label>
<caption><p>Means and standard errors of the CTPs corresponding to the effective and ineffective gain groups. The p-values of pairwise <italic>t</italic>-tests for different CTP components are also presented.</p></caption>
<graphic xlink:href="fpsyg-10-01011-g0004.tif"/>
</fig>
<p>For the second approach, we show the dendrogram of the hierarchical clustering analysis in <xref ref-type="fig" rid="F5">Figure 5</xref>. By examining the distances among the clusters at different levels, we noted that cutting the dendrogram at the elbow point of the inter-cluster distances leads to four clusters. Each cluster is colored differently in <xref ref-type="fig" rid="F5">Figure 5</xref>, and the number of members in each cluster is shown in the legend. To gain more insight into the differences among the four clusters, we compared the CTPs of each cluster against the CTPs of all participants by looking at the effect size in terms of Cohen&#x00027;s d. A positive value implies that the people in that cluster show more conditional actions corresponding to that cell than the overall population, while a negative value implies the opposite. The results are shown in <xref ref-type="fig" rid="F6">Figure 6</xref>. A general guideline (Sawilowsky, <xref ref-type="bibr" rid="B22">2009</xref>) for interpreting the effect size is that a Cohen&#x00027;s d equal to or greater than 0.8 is considered a large effect. In each panel of <xref ref-type="fig" rid="F6">Figure 6</xref>, readers can thus identify how the corresponding cluster differs from the overall participants, which gives a general sense of the major differences between the clusters. <xref ref-type="fig" rid="F7">Figure 7</xref> further shows the total score changes in each cluster. The participants in cluster 2 show significantly more positive gain compared to people in other clusters. Connecting back to <xref ref-type="fig" rid="F6">Figure 6</xref>, one can immediately identify the main feature of cluster 3, e.g., participants show more &#x0201C;negotiate&#x0201D; actions when partners &#x0201C;share&#x0201D; information, which is consistent with the results from the first approach.</p>
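<p>For readers who want to reproduce this kind of comparison, the effect size for a single CTP cell could be computed as in the sketch below. It assumes the pooled-standard-deviation form of Cohen&#x00027;s d, which is one common convention; the text does not state which variant was used, and the function name and toy numbers are illustrative only.</p>
<preformat><![CDATA[
```python
import math
import statistics


def cohens_d(group, reference):
    """Cohen's d between two samples using the pooled standard deviation.

    Assumption: pooled-SD variant; returns NaN if both samples have zero variance.
    """
    n1, n2 = len(group), len(reference)
    var1 = statistics.variance(group)      # sample variance (n - 1 denominator)
    var2 = statistics.variance(reference)
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    if pooled_sd == 0:
        return float("nan")
    return (statistics.mean(group) - statistics.mean(reference)) / pooled_sd


# Toy counts for one CTP cell, invented for illustration only.
cluster_counts = [2, 4, 6]
overall_counts = [1, 3, 5]

d = cohens_d(cluster_counts, overall_counts)
```
]]></preformat>
<p>Applying the same function to each of the 16 cells, with the cluster members as the first sample and all participants as the second, would give one panel of effect sizes per cluster.</p>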
<fig id="F5" position="float">
<label>Figure 5</label>
<caption><p>Dendrogram of the hierarchical clustering based on the Euclidean distances calculated from the CTPs. The horizontal dashed line is the distance cut corresponding to the elbow point of the inter-cluster distances. The numbers in brackets in the legend show how many participants are in each of the clusters.</p></caption>
<graphic xlink:href="fpsyg-10-01011-g0005.tif"/>
</fig>
<fig id="F6" position="float">
<label>Figure 6</label>
<caption><p>The effect size in terms of Cohen&#x00027;s d between the CTPs of participants from each cluster and from all participants.</p></caption>
<graphic xlink:href="fpsyg-10-01011-g0006.tif"/>
</fig>
<fig id="F7" position="float">
<label>Figure 7</label>
<caption><p>The means and standard errors of the total score changes from each cluster.</p></caption>
<graphic xlink:href="fpsyg-10-01011-g0007.tif"/>
</fig>
</sec>
<sec id="s5">
<title>5. Conclusion and Future Work</title>
<p>In this paper, we introduced a CTP approach to characterize an individual team member&#x00027;s communication process in computer-supported collaborations. Based on a large-scale empirical study and using two different approaches, one starting from the collaboration outcome and the other from the communication process, we showed that the CTP approach can effectively characterize aspects of one&#x00027;s communication process.</p>
<p>The purpose of the current study was to demonstrate the use of the CTP matrix rather than to examine collaboration patterns in a controlled experiment. However, the results of applying CTP to the empirical study suggest that one might try to negotiate while one&#x00027;s partner is sharing and negotiating ideas if one wants to gain more from the collaboration. Just sharing ideas seems less likely to help one gain more from collaboration, and may even lead to worse outcomes when done while the partner is negotiating. This finding is consistent with our previous findings at the team level (Hao et al., <xref ref-type="bibr" rid="B13">2016</xref>) and findings in the CSCL literature (Scardamalia and Bereiter, <xref ref-type="bibr" rid="B23">1994</xref>; Stahl, <xref ref-type="bibr" rid="B25">2006</xref>). Moreover, such findings can be incorporated into the teaching of collaborative problem solving skills, and can also be built into real-time feedback mechanisms for scaffolding collaboration.</p>
<p>Despite the effectiveness of CTP, the approach has several known limitations. The first is that it does not capture timing, which could contain useful information concerning, for example, the participation and engagement of the team members in their communication and collaboration. Timing is often strongly dependent on the specific task design, however, and its relationship with the other aspects of a collaboration can vary significantly from task to task. As such, a time-dependent version of the CTP with proper inclusion of timing data may provide a better characterization of the process in a given task situation, but at the cost of reduced generalizability.</p>
<p>The second is that the CTP does not address possible random errors in the states, such as those introduced during the coding process. A future line of work in this direction may be to introduce hidden states, together with emission probabilities that connect the hidden states to the observed states, to accommodate the random errors, as in hidden Markov models (Baum and Petrie, <xref ref-type="bibr" rid="B3">1966</xref>).</p>
<p>The third is that the CTP may become very sparse if there are many coding categories and multiple participants. The average count of each element in the CTP scales down as 1/(<italic>nk</italic><sup>2</sup>), with <italic>n</italic> as the number of team members and <italic>k</italic> as the number of coding categories. Users need to make sensible decisions regarding whether to use this method if the communication sequence is very short. A future line of work to address this limitation could consider latent variable modeling, such as factor analysis, through which one can identify a small set of factors to deal with the sparsity.</p>
<p>Finally, the communication sequences used in this paper are relatively short, only about thirty turns on average when considering the first seven questions. Though some statistically significant effects have been detected at the subgroup level (thanks to the large number of participants), such short sequences do not allow us to reveal more details of each team member&#x00027;s process. In ongoing work, we have collected new data using a task hosted on the ETS Platform for Collaborative Assessment and Learning (Hao et al., <xref ref-type="bibr" rid="B15">2017c</xref>). The new task elicits over 120 turns of communication in each team. We will report the findings based on the new data set in future work.</p>
</sec>
<sec id="s6">
<title>Ethics Statement</title>
<p>The data collection was approved by ETS&#x00027;s IRB.</p>
</sec>
<sec id="s7">
<title>Author Contributions</title>
<p>JH contributed to task development, data collection and analysis, research idea and method development, and presentation. RM contributed to research idea and method development, presentation and interpretation.</p>
<sec>
<title>Conflict of Interest Statement</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
</sec>
</body>
<back>
<ref-list>
<title>References</title>
<ref id="B1">
<citation citation-type="web"><person-group person-group-type="author"><name><surname>Allen</surname> <given-names>J.</given-names></name> <name><surname>Core</surname> <given-names>M.</given-names></name></person-group> (<year>1997</year>). <source>Draft of Damsl: Dialog Act Markup in Several Layers</source>. Available online at: <ext-link ext-link-type="uri" xlink:href="http://www.cs.rochester.edu/research/cisd/resources/damsl/RevisedManual/">http://www.cs.rochester.edu/research/cisd/resources/damsl/RevisedManual/</ext-link></citation></ref>
<ref id="B2">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Andrews</surname> <given-names>J. J.</given-names></name> <name><surname>Kerr</surname> <given-names>D.</given-names></name> <name><surname>Mislevy</surname> <given-names>R. J.</given-names></name> <name><surname>von Davier</surname> <given-names>A.</given-names></name> <name><surname>Hao</surname> <given-names>J.</given-names></name> <name><surname>Liu</surname> <given-names>L.</given-names></name></person-group> (<year>2017</year>). <article-title>Modeling collaborative interaction patterns in a simulation-based task</article-title>. <source>J. Educ. Measure.</source> <volume>54</volume>, <fpage>54</fpage>&#x02013;<lpage>69</lpage>. <pub-id pub-id-type="doi">10.1111/jedm.12132</pub-id></citation></ref>
<ref id="B3">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Baum</surname> <given-names>L. E.</given-names></name> <name><surname>Petrie</surname> <given-names>T.</given-names></name></person-group> (<year>1966</year>). <article-title>Statistical inference for probabilistic functions of finite state Markov chains</article-title>. <source>Ann. Math. Stat.</source> <volume>37</volume>, <fpage>1554</fpage>&#x02013;<lpage>1563</lpage>. <pub-id pub-id-type="doi">10.1214/aoms/1177699147</pub-id></citation></ref>
<ref id="B4">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Benjamini</surname> <given-names>Y.</given-names></name> <name><surname>Hochberg</surname> <given-names>Y.</given-names></name></person-group> (<year>1995</year>). <article-title>Controlling the false discovery rate: a practical and powerful approach to multiple testing</article-title>. <source>J. R. Stat. Soc. Ser. B</source> <volume>57</volume>, <fpage>289</fpage>&#x02013;<lpage>300</lpage>.</citation></ref>
<ref id="B5">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Biggs</surname> <given-names>N.</given-names></name></person-group> (<year>1993</year>). <source>Algebraic Graph Theory, 2nd edn</source>. <publisher-loc>Cambridge</publisher-loc>: <publisher-name>Cambridge University Press; Cambridge Mathematical Library</publisher-name>.</citation></ref>
<ref id="B6">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dowell</surname> <given-names>N. M.</given-names></name> <name><surname>Graesser</surname> <given-names>A. C.</given-names></name> <name><surname>Cai</surname> <given-names>Z.</given-names></name></person-group> (<year>2016</year>). <article-title>Language and discourse analysis with Coh-Metrix: applications from educational material to learning environments at scale</article-title>. <source>J. Learn. Anal.</source> <volume>3</volume>, <fpage>72</fpage>&#x02013;<lpage>95</lpage>. <pub-id pub-id-type="doi">10.18608/jla.2016.33.5</pub-id></citation></ref>
<ref id="B7">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Flor</surname> <given-names>M.</given-names></name> <name><surname>Yoon</surname> <given-names>S.-Y.</given-names></name> <name><surname>Hao</surname> <given-names>J.</given-names></name> <name><surname>Liu</surname> <given-names>L.</given-names></name> <name><surname>von Davier</surname> <given-names>A.</given-names></name></person-group> (<year>2016</year>). <article-title>Automated classification of collaborative problem solving interactions in simulated science tasks</article-title>, in <source>Proceedings of 11th Workshop on Innovative Use of NLP for Building Educational Applications</source> (<publisher-loc>San Diego, CA</publisher-loc>). <pub-id pub-id-type="doi">10.18653/v1/W16-0504</pub-id></citation></ref>
<ref id="B8">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Graesser</surname> <given-names>A. C.</given-names></name> <name><surname>McNamara</surname> <given-names>D. S.</given-names></name> <name><surname>Louwerse</surname> <given-names>M. M.</given-names></name> <name><surname>Cai</surname> <given-names>Z.</given-names></name></person-group> (<year>2004</year>). <article-title>Coh-Metrix: analysis of text on cohesion and language</article-title>. <source>Behav. Res. Methods Instrum. Comput.</source> <volume>36</volume>, <fpage>193</fpage>&#x02013;<lpage>202</lpage>. <pub-id pub-id-type="doi">10.3758/BF03195564</pub-id><pub-id pub-id-type="pmid">15354684</pub-id></citation></ref>
<ref id="B9">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Griffin</surname> <given-names>P.</given-names></name> <name><surname>McGaw</surname> <given-names>B.</given-names></name> <name><surname>Care</surname> <given-names>E.</given-names></name></person-group> (<year>2012</year>). <source>Assessment and Teaching of 21st Century Skills</source>. <publisher-name>Springer</publisher-name>. <pub-id pub-id-type="doi">10.1007/978-94-007-2324-5</pub-id></citation></ref>
<ref id="B10">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Grimmett</surname> <given-names>G.</given-names></name> <name><surname>Stirzaker</surname> <given-names>D.</given-names></name></person-group> (<year>2001</year>). <source>Probability and Random Processes</source>. <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Oxford University Press</publisher-name>.</citation></ref>
<ref id="B11">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hao</surname> <given-names>J.</given-names></name> <name><surname>Chen</surname> <given-names>L.</given-names></name> <name><surname>Flor</surname> <given-names>M.</given-names></name> <name><surname>Liu</surname> <given-names>L.</given-names></name> <name><surname>von Davier</surname> <given-names>A. A.</given-names></name></person-group> (<year>2017a</year>). <article-title>CPS-rater: automated sequential annotation for conversations in collaborative problem-solving activities</article-title>. <source>ETS Res. Report Ser.</source> <volume>2017</volume>, <fpage>1</fpage>&#x02013;<lpage>9</lpage>. <pub-id pub-id-type="doi">10.1002/ets2.12184</pub-id></citation></ref>
<ref id="B12">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Hao</surname> <given-names>J.</given-names></name> <name><surname>Liu</surname> <given-names>L.</given-names></name> <name><surname>von Davier</surname> <given-names>A.</given-names></name> <name><surname>Kyllonen</surname> <given-names>P.</given-names></name></person-group> (<year>2015</year>). <source>Assessing Collaborative Problem Solving With Simulation Based Tasks.</source> <publisher-loc>Gothenburg</publisher-loc>: <publisher-name>International Society of the Learning Sciences, Inc. [ISLS]</publisher-name>.</citation></ref>
<ref id="B13">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Hao</surname> <given-names>J.</given-names></name> <name><surname>Liu</surname> <given-names>L.</given-names></name> <name><surname>von Davier</surname> <given-names>A.</given-names></name> <name><surname>Kyllonen</surname> <given-names>P.</given-names></name> <name><surname>Kitchen</surname> <given-names>C.</given-names></name></person-group> (<year>2016</year>). <article-title>Collaborative problem solving skills versus collaboration outcomes: Findings from statistical analysis and data mining</article-title>, in <source>EDM</source>, eds <person-group person-group-type="editor"><name><surname>Barnes</surname> <given-names>T.</given-names></name> <name><surname>Chi</surname> <given-names>M.</given-names></name> <name><surname>Feng</surname> <given-names>M.</given-names></name></person-group> (<publisher-loc>Raleigh, NC</publisher-loc>: <publisher-name>International Conference on Educational Data Mining</publisher-name>), <fpage>382</fpage>&#x02013;<lpage>387</lpage>.</citation></ref>
<ref id="B14">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hao</surname> <given-names>J.</given-names></name> <name><surname>Liu</surname> <given-names>L.</given-names></name> <name><surname>von Davier</surname> <given-names>A. A.</given-names></name> <name><surname>Kyllonen</surname> <given-names>P. C.</given-names></name></person-group> (<year>2017b</year>). <article-title>Initial steps towards a standardized assessment for collaborative problem solving (CPS): practical challenges and strategies</article-title>, in <source>Innovative Assessment of Collaboration</source>, eds <person-group person-group-type="editor"><name><surname>von Davier</surname> <given-names>A. A.</given-names></name> <name><surname>Zhu</surname> <given-names>M.</given-names></name> <name><surname>Kyllonen</surname> <given-names>P. C.</given-names></name></person-group> (Springer), <fpage>135</fpage>&#x02013;<lpage>156</lpage>. <pub-id pub-id-type="doi">10.1007/978-3-319-33261-1_9</pub-id></citation></ref>
<ref id="B15">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hao</surname> <given-names>J.</given-names></name> <name><surname>Liu</surname> <given-names>L.</given-names></name> <name><surname>von Davier</surname> <given-names>A. A.</given-names></name> <name><surname>Lederer</surname> <given-names>N.</given-names></name> <name><surname>Zapata-Rivera</surname> <given-names>D.</given-names></name> <name><surname>Jakl</surname> <given-names>P.</given-names></name> <etal/></person-group>. (<year>2017c</year>). <article-title>EPCAL: ETS platform for collaborative assessment and learning</article-title>. <source>ETS Res. Report Ser.</source> <volume>2017</volume>, <fpage>1</fpage>&#x02013;<lpage>14</lpage>. <pub-id pub-id-type="doi">10.1002/ets2.12181</pub-id></citation></ref>
<ref id="B16">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Kittur</surname> <given-names>A.</given-names></name> <name><surname>Chi</surname> <given-names>E. H.</given-names></name> <name><surname>Suh</surname> <given-names>B.</given-names></name></person-group> (<year>2008</year>). <article-title>Crowdsourcing user studies with Mechanical Turk</article-title>, in <source>Proceedings of the SIGCHI Conference on Human Factors in Computing Systems</source> (<publisher-loc>Florence</publisher-loc>: <publisher-name>ACM</publisher-name>), <fpage>453</fpage>&#x02013;<lpage>456</lpage>. <pub-id pub-id-type="doi">10.1145/1357054.1357127</pub-id></citation></ref>
<ref id="B17">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Liu</surname> <given-names>L.</given-names></name> <name><surname>Hao</surname> <given-names>J.</given-names></name> <name><surname>von Davier</surname> <given-names>A. A.</given-names></name> <name><surname>Kyllonen</surname> <given-names>P.</given-names></name> <name><surname>Zapata-Rivera</surname> <given-names>D.</given-names></name></person-group> (<year>2015</year>). <article-title>A tough nut to crack: Measuring collaborative problem solving</article-title>, in <source>Handbook of Research on Technology Tools for Real-World Skill Development</source>, eds <person-group person-group-type="editor"><name><surname>Rosen</surname> <given-names>Y.</given-names></name> <name><surname>Ferrara</surname> <given-names>S.</given-names></name> <name><surname>Mosharraf</surname> <given-names>M.</given-names></name></person-group> (<publisher-loc>Hershey, PA</publisher-loc>: <publisher-name>IGI Global</publisher-name>), <fpage>344</fpage>. <pub-id pub-id-type="doi">10.4018/978-1-4666-9441-5.ch013</pub-id></citation></ref>
<ref id="B18">
<citation citation-type="book"><person-group person-group-type="author"><collab>Organization for Economic Co-operation and Development</collab></person-group> (<year>2013</year>). <source>PISA 2015 Draft Collaborative Problem Solving Assessment Framework</source>. <publisher-loc>Paris</publisher-loc>: <publisher-name>OECD Publishing</publisher-name>.</citation></ref>
<ref id="B19">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ros&#x000E9;</surname> <given-names>C.</given-names></name> <name><surname>Wang</surname> <given-names>Y.-C.</given-names></name> <name><surname>Cui</surname> <given-names>Y.</given-names></name> <name><surname>Arguello</surname> <given-names>J.</given-names></name> <name><surname>Stegmann</surname> <given-names>K.</given-names></name> <name><surname>Weinberger</surname> <given-names>A.</given-names></name> <etal/></person-group>. (<year>2008</year>). <article-title>Analyzing collaborative learning processes automatically: exploiting the advances of computational linguistics in computer-supported collaborative learning</article-title>. <source>Int. J. Comput. Supp. Collabor. Learn.</source> <volume>3</volume>, <fpage>237</fpage>&#x02013;<lpage>271</lpage>. <pub-id pub-id-type="doi">10.1007/s11412-007-9034-0</pub-id></citation></ref>
<ref id="B20">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rundgren</surname> <given-names>C.-J.</given-names></name> <name><surname>Rundgren</surname> <given-names>S.-N. C.</given-names></name> <name><surname>Tseng</surname> <given-names>Y.-H.</given-names></name> <name><surname>Lin</surname> <given-names>P.-L.</given-names></name> <name><surname>Chang</surname> <given-names>C.-Y.</given-names></name></person-group> (<year>2012</year>). <article-title>Are you SLiM? Developing an instrument for civic scientific literacy measurement (SLiM) based on media coverage</article-title>. <source>Public Understand. Sci.</source> <volume>21</volume>, <fpage>759</fpage>&#x02013;<lpage>773</lpage>. <pub-id pub-id-type="doi">10.1177/0963662510377562</pub-id><pub-id pub-id-type="pmid">23832159</pub-id></citation></ref>
<ref id="B21">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Rus</surname> <given-names>V.</given-names></name> <name><surname>Niraula</surname> <given-names>N. B.</given-names></name> <name><surname>Maharjan</surname> <given-names>N.</given-names></name> <name><surname>Banjade</surname> <given-names>R.</given-names></name></person-group> (<year>2015</year>). <article-title>Automated labelling of dialogue modes in tutorial dialogues</article-title>, in <source>FLAIRS Conference</source> (<publisher-loc>Hollywood, FL</publisher-loc>), <fpage>205</fpage>&#x02013;<lpage>210</lpage>.</citation></ref>
<ref id="B22">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sawilowsky</surname> <given-names>S. S.</given-names></name></person-group> (<year>2009</year>). <article-title>New effect size rules of thumb</article-title>. <source>J. Modern Appl. Statist. Methods</source> <volume>8</volume>, <fpage>467</fpage>&#x02013;<lpage>474</lpage>. <pub-id pub-id-type="doi">10.22237/jmasm/1257035100</pub-id></citation></ref>
<ref id="B23">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Scardamalia</surname> <given-names>M.</given-names></name> <name><surname>Bereiter</surname> <given-names>C.</given-names></name></person-group> (<year>1994</year>). <article-title>Computer support for knowledge-building communities</article-title>. <source>J. Learn. Sci.</source> <volume>3</volume>, <fpage>265</fpage>&#x02013;<lpage>283</lpage>. <pub-id pub-id-type="doi">10.1207/s15327809jls0303_3</pub-id></citation></ref>
<ref id="B24">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Sch&#x000FC;tze</surname> <given-names>H.</given-names></name> <name><surname>Manning</surname> <given-names>C. D.</given-names></name> <name><surname>Raghavan</surname> <given-names>P.</given-names></name></person-group> (<year>2008</year>). <source>Introduction to Information Retrieval</source>, Vol <volume>39</volume>. <publisher-loc>Cambridge, UK</publisher-loc>: <publisher-name>Cambridge University Press</publisher-name>.</citation></ref>
<ref id="B25">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stahl</surname> <given-names>G.</given-names></name></person-group> (<year>2006</year>). <source>Group Cognition: Computer Support for Building Collaborative Knowledge (Acting With Technology)</source>. <publisher-name>The MIT Press</publisher-name>.</citation></ref>
<ref id="B26">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stahl</surname> <given-names>G.</given-names></name> <name><surname>Koschmann</surname> <given-names>T.</given-names></name> <name><surname>Suthers</surname> <given-names>D.</given-names></name></person-group> (<year>2006</year>). <article-title>Computer-supported collaborative learning: an historical perspective</article-title>. <source>Cambridge Handbook Learn Sci.</source> <volume>2006</volume>, <fpage>409</fpage>&#x02013;<lpage>426</lpage>. <pub-id pub-id-type="doi">10.1017/CBO9780511816833.025</pub-id></citation></ref>
<ref id="B27">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Van Kampen</surname> <given-names>N. G.</given-names></name></person-group> (<year>1992</year>). <source>Stochastic Processes in Physics and Chemistry</source>, Vol <volume>1</volume>. <publisher-loc>Amsterdam</publisher-loc>: <publisher-name>Elsevier</publisher-name>.</citation></ref>
<ref id="B28">
<citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ward</surname> <given-names>J. H.</given-names> <suffix>Jr</suffix></name></person-group>. (<year>1963</year>). <article-title>Hierarchical grouping to optimize an objective function</article-title>. <source>J. Am. Statist. Assoc.</source> <volume>58</volume>, <fpage>236</fpage>&#x02013;<lpage>244</lpage>. <pub-id pub-id-type="doi">10.1080/01621459.1963.10500845</pub-id></citation></ref>
<ref id="B29">
<citation citation-type="book"><person-group person-group-type="author"><name><surname>Zapata-Rivera</surname> <given-names>D.</given-names></name> <name><surname>Jackson</surname> <given-names>T.</given-names></name> <name><surname>Liu</surname> <given-names>L.</given-names></name> <name><surname>Bertling</surname> <given-names>M.</given-names></name> <name><surname>Vezzu</surname> <given-names>M.</given-names></name> <name><surname>Katz</surname> <given-names>I. R.</given-names></name></person-group> (<year>2014</year>). <article-title>Assessing science inquiry skills using trialogues</article-title>, in <source>Intelligent Tutoring Systems</source>, eds <person-group person-group-type="editor"><name><surname>Trausan-Matu</surname> <given-names>S.</given-names></name> <name><surname>Boyer</surname> <given-names>K. E.</given-names></name> <name><surname>Crosby</surname> <given-names>M. E.</given-names></name> <name><surname>Panourgia</surname> <given-names>K.</given-names></name></person-group> (<publisher-loc>Cham</publisher-loc>: <publisher-name>Springer</publisher-name>), <fpage>625</fpage>&#x02013;<lpage>626</lpage>.</citation></ref>
</ref-list>
<fn-group>
<fn id="fn0001"><p><sup>1</sup>In practice, the categories or states are assigned either by human coders or automated coding algorithms.</p></fn>
<fn id="fn0002"><p><sup>2</sup><xref ref-type="table" rid="T2">Table 2</xref> shows an empirical example of coded discourses.</p></fn>
<fn id="fn0003"><p><sup>3</sup>Note that multiple comparisons are involved in this case. Because the Bonferroni correction is well known to be too stringent for discovery-oriented studies, we adopted the False Discovery Rate (FDR; Benjamini and Hochberg, <xref ref-type="bibr" rid="B4">1995</xref>) approach and set the FDR level to 0.2, which means we tolerate 20% of the discoveries being false. At this FDR level, the adjusted p-value threshold for significance is still 0.05 (which is a coincidence).</p></fn>
</fn-group>
<fn-group>
<fn fn-type="financial-disclosure"><p><bold>Funding.</bold> The work is funded through the Research Allocation Funding at Educational Testing Service.</p>
</fn>
</fn-group>
</back>
</article>