<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Psychol.</journal-id>
<journal-title>Frontiers in Psychology</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Psychol.</abbrev-journal-title>
<issn pub-type="epub">1664-1078</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fpsyg.2022.800983</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Psychology</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Lexical Profile of Newspapers Revisited: A Corpus-Based Analysis</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Ha</surname> <given-names>Hung Tan</given-names></name>
<xref ref-type="corresp" rid="c001"><sup>&#x002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1529496/overview"/>
</contrib>
</contrib-group>
<aff><institution>School of Foreign Languages, University of Economics Ho Chi Minh City (UEH)</institution>, <addr-line>Ho Chi Minh City</addr-line>, <country>Vietnam</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: Sascha Schroeder, University of G&#x00F6;ttingen, Germany</p></fn>
<fn fn-type="edited-by"><p>Reviewed by: Kay-Michael W&#x00FC;rzner, Technical University Dresden, Germany; Jutta Trautwein, University of Paderborn, Germany</p></fn>
<corresp id="c001">&#x002A;Correspondence: Hung Tan Ha, <email>hatanhung1991@gmail.com</email>; <ext-link ext-link-type="uri" xlink:href="https://orcid.org/0000-0002-5901-7718">orcid.org/0000-0002-5901-7718</ext-link></corresp>
<fn fn-type="other" id="fn004"><p>This article was submitted to Language Sciences, a section of the journal Frontiers in Psychology</p></fn>
</author-notes>
<pub-date pub-type="epub">
<day>24</day>
<month>02</month>
<year>2022</year>
</pub-date>
<pub-date pub-type="collection">
<year>2022</year>
</pub-date>
<volume>13</volume>
<elocation-id>800983</elocation-id>
<history>
<date date-type="received">
<day>24</day>
<month>10</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>02</day>
<month>02</month>
<year>2022</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x00A9; 2022 Ha.</copyright-statement>
<copyright-year>2022</copyright-year>
<copyright-holder>Ha</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract>
<p>The present study analyzed the vocabulary profile of the News on the Web (NOW) corpus, which contains 12 billion words from online newspapers and magazines in 20 countries, to determine the vocabulary knowledge needed to reasonably understand online newspaper and magazine articles. The results showed that, in general, knowledge of the most frequent 4,000 word families in the British National Corpus/Corpus of Contemporary American English (BNC/COCA) wordlist plus proper nouns, marginal words, transparent compounds, and acronyms was necessary to reach 95% coverage of the NOW corpus. However, at the 98% coverage threshold, online newspaper and magazine articles from different countries had relatively distinct lexical demands. In-depth analyses of this variation were carried out, and implications for teaching and learning are discussed.</p>
</abstract>
<kwd-group>
<kwd>lexical coverage</kwd>
<kwd>vocabulary profile</kwd>
<kwd>BNC</kwd>
<kwd>COCA</kwd>
<kwd>News on the Web</kwd>
</kwd-group>
<counts>
<fig-count count="1"/>
<table-count count="1"/>
<equation-count count="0"/>
<ref-count count="72"/>
<page-count count="10"/>
<word-count count="8131"/>
</counts>
</article-meta>
</front>
<body>
<sec id="S1" sec-type="intro">
<title>Introduction</title>
<p>Newspapers have been a crucial part of people&#x2019;s lives, with 52&#x2013;85% of people in different countries reading news more than once a day (<xref ref-type="bibr" rid="B4">Cabrera, 2020</xref>). With the development of technology, most aspects of our lives have been digitized, and the way we receive our daily news is no exception. Research has shown that people are abandoning traditional, paper-based news in favor of its digital, online counterparts (<xref ref-type="bibr" rid="B4">Cabrera, 2020</xref>; <xref ref-type="bibr" rid="B50">Pew Research Center, 2021a</xref>,<xref ref-type="bibr" rid="B51">b</xref>). In a recent study of American news habits by <xref ref-type="bibr" rid="B56">Shearer (2021)</xref>, 86% of United States adults got their news from a digital device (e.g., smartphone, computer, or tablet). This figure was considerably higher than those for television (68%), radio (50%), and print publications (32%) (<xref ref-type="bibr" rid="B56">Shearer, 2021</xref>). The main reason for this shift is likely convenience. Compared to print newspapers or magazines, online publications are more easily accessible, can be read from anywhere, and are supported by state-of-the-art technology that offers a better reading experience. On top of that, many online newspapers are both eco- and reader-friendly: they are free to read and require no paper.</p>
<p>Given the popularity of online newspapers, the ability to read and comprehend this type of publication has been viewed as a critical goal of second language learning, and the use of newspapers has long been emphasized in language teaching (<xref ref-type="bibr" rid="B43">Nation, 2013</xref>; <xref ref-type="bibr" rid="B46">Nation and Macalister, 2021</xref>). In fact, many language proficiency tests, such as IELTS, have long incorporated newspaper and magazine articles in their reading components (<xref ref-type="bibr" rid="B39">Moore et al., 2011</xref>, <xref ref-type="bibr" rid="B40">2015</xref>). As a result, it is important for English teachers and learners to know how much vocabulary is needed to comprehend an online newspaper article; such knowledge would inform decisions in goal setting, lesson planning, and course book design. The aim of this study is to answer that question, that is, to determine the number of words required to understand online newspaper and magazine articles.</p>
</sec>
<sec id="S2">
<title>Literature Review</title>
<sec id="S2.SS1">
<title>Vocabulary Demand and Comprehension</title>
<p>Vocabulary is arguably the most important aspect of language and plays a fundamental role in most, if not all, language skills (<xref ref-type="bibr" rid="B32">Laufer and Ravenhorst-Kalovski, 2010</xref>; <xref ref-type="bibr" rid="B55">Schmitt et al., 2011</xref>; <xref ref-type="bibr" rid="B60">van Zeeland and Schmitt, 2013</xref>; <xref ref-type="bibr" rid="B6">Cheng and Matthews, 2018</xref>; <xref ref-type="bibr" rid="B28">Lange and Matthews, 2020</xref>; <xref ref-type="bibr" rid="B52">Qian and Lin, 2020</xref>; <xref ref-type="bibr" rid="B19">Ha, 2021b</xref>). In fact, learners&#x2019; lexical resources have been shown to matter even more to their comprehension than their knowledge of grammatical structures or subject matter (<xref ref-type="bibr" rid="B35">Lewis, 2002</xref>; <xref ref-type="bibr" rid="B2">Barcroft, 2007</xref>; <xref ref-type="bibr" rid="B18">Guo and Roehrig, 2011</xref>; <xref ref-type="bibr" rid="B71">Zhang, 2012</xref>; <xref ref-type="bibr" rid="B72">Zhang and Koda, 2013</xref>). As <xref ref-type="bibr" rid="B69">Wilkins (1972)</xref> put it, &#x201C;[&#x2026;] while without grammar very little can be conveyed, without vocabulary nothing can be conveyed&#x201D; (pp. 111&#x2013;112).</p>
<p>The term &#x201C;lexical demand&#x201D; is now familiar in applied linguistics and vocabulary research. The idea behind it is that a reader needs to know a certain proportion of the words in a text to comprehend it reasonably well (<xref ref-type="bibr" rid="B43">Nation, 2013</xref>; <xref ref-type="bibr" rid="B64">Webb and Nation, 2013</xref>). In general, it is widely accepted that readers or listeners of different text genres need to know at least 95%, and preferably 98%, of the running words in a text to gain adequate comprehension (<xref ref-type="bibr" rid="B32">Laufer and Ravenhorst-Kalovski, 2010</xref>; <xref ref-type="bibr" rid="B55">Schmitt et al., 2011</xref>; <xref ref-type="bibr" rid="B30">Laufer, 2013</xref>; <xref ref-type="bibr" rid="B60">van Zeeland and Schmitt, 2013</xref>). Although the gap between the two thresholds is only 3 percentage points, the difference is far more significant than it may appear. With 95% coverage, readers encounter one unknown word in every 20 words, whereas with 98% coverage that ratio falls to one in every 50 (<xref ref-type="bibr" rid="B25">Hu and Nation, 2000</xref>). In other words, moving from 95 to 98% coverage more than halves the number of unknown words a reader meets in a text.
In support of that claim, studies on the lexical demands of various text genres have shown that learners whose vocabulary knowledge meets the 95% threshold would have to double or even triple their lexical resources to reach 98% coverage of the same genres (<xref ref-type="bibr" rid="B41">Nation, 2006</xref>; <xref ref-type="bibr" rid="B9">Coxhead and Walls, 2012</xref>; <xref ref-type="bibr" rid="B63">Webb and Macalister, 2013</xref>; <xref ref-type="bibr" rid="B12">Dang and Webb, 2014</xref>; <xref ref-type="bibr" rid="B48">Nurmukhamedov, 2017</xref>; <xref ref-type="bibr" rid="B59">Tegge, 2017</xref>). <xref ref-type="bibr" rid="B25">Hu and Nation (2000)</xref> also stated that 98% was the desirable threshold for adequate comprehension, while 95% was only an acceptable threshold for minimal comprehension, at which some readers may gain adequate comprehension but most may not.</p>
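<p>The 1-in-20 versus 1-in-50 comparison follows directly from the coverage figures: a reader with coverage c meets, on average, one unknown word per 1/(1 - c) running words. The short Python sketch below checks this arithmetic; the 1,000-word text length is an illustrative assumption, not a figure from the cited studies.</p>

```python
def words_per_unknown(coverage: float) -> float:
    """Average number of running words per unknown word at a given coverage (0-1)."""
    return 1 / (1 - coverage)


def unknown_words(coverage: float, text_length: int) -> float:
    """Expected number of unknown tokens in a text of `text_length` running words."""
    return (1 - coverage) * text_length


# 1,000 words is an illustrative text length, not a figure from the studies cited.
for cov in (0.95, 0.98):
    print(f"{cov:.0%} coverage: 1 unknown word in every {words_per_unknown(cov):.0f}; "
          f"about {unknown_words(cov, 1000):.0f} unknown words in a 1,000-word text")
```

<p>The expected number of unknown words in such a text falls from 50 to 20, a reduction of 60%, which is why the 98% threshold is described as removing more than half of the unknown words.</p>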
<p>Research has also shown a close relationship between lexical coverage and language teaching, especially when selecting materials for reading-related activities. According to <xref ref-type="bibr" rid="B42">Nation&#x2019;s (2007)</xref> principle of the four strands, it has been suggested that for language-focused or form-focused instruction, learners should know no less than 85% of the words in their reading texts (<xref ref-type="bibr" rid="B55">Schmitt et al., 2011</xref>; <xref ref-type="bibr" rid="B58">Stoeckel et al., 2020</xref>). For supported reading comprehension, 95% coverage is required (<xref ref-type="bibr" rid="B29">Laufer, 1989</xref>; <xref ref-type="bibr" rid="B55">Schmitt et al., 2011</xref>). For meaning-focused or extensive reading, learners need to know 98% of the tokens in their reading materials (<xref ref-type="bibr" rid="B42">Nation, 2007</xref>; <xref ref-type="bibr" rid="B65">Webb and Nation, 2017</xref>). For fluency development, 100% coverage is necessary (<xref ref-type="bibr" rid="B42">Nation, 2007</xref>).</p>
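<p>Treating each of these figures as a minimum, the thresholds can be encoded as a simple lookup that lists the purposes a text suits, given the proportion of its running words a learner knows. This is a minimal sketch of the thresholds as summarized in this section, not a published tool; the helper name suitable_uses is hypothetical.</p>

```python
# Minimum coverage for each purpose in Nation's four strands, as cited in the text.
# Listed from the strictest to the most lenient threshold.
COVERAGE_THRESHOLDS = [
    (1.00, "fluency development"),
    (0.98, "meaning-focused (extensive) reading"),
    (0.95, "supported reading comprehension"),
    (0.85, "language-focused (form-focused) instruction"),
]


def suitable_uses(coverage: float) -> list[str]:
    """Return every purpose whose minimum coverage is met (hypothetical helper)."""
    return [purpose for minimum, purpose in COVERAGE_THRESHOLDS if coverage >= minimum]


# A learner who knows 96% of a text's running words:
print(suitable_uses(0.96))
```

<p>Because the thresholds are minimums, a text that meets a stricter threshold also qualifies for every more lenient purpose.</p>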
</sec>
<sec id="S2.SS2">
<title>Word-Frequency Lists</title>
<p>One reason why the findings of lexical profiling research are of so much interest to linguists is that they are based on word-frequency lists. These lists classify English words into 1,000-word levels according to how frequently they appear in authentic texts, which offers teachers and learners of English a clear and fast route to their learning goals (<xref ref-type="bibr" rid="B43">Nation, 2013</xref>). The British National Corpus (BNC) lists, which contain fourteen 1,000-word levels (<xref ref-type="bibr" rid="B41">Nation, 2006</xref>), and the British National Corpus/Corpus of Contemporary American English (BNC/COCA) lists (<xref ref-type="bibr" rid="B44">Nation, 2017</xref>), which consist of twenty-five 1,000-word levels, are typical examples of these wordlists.</p>
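<p>The banding idea behind these lists can be illustrated with a toy function that ranks word families by frequency and assigns each to a 1,000-word level. This is a simplified sketch that assumes rank alone decides the level; the published BNC and BNC/COCA lists are not necessarily a pure frequency ranking, so it should not be read as Nation&#x2019;s procedure.</p>

```python
def band_by_frequency(family_counts: dict[str, int], band_size: int = 1000) -> dict[str, int]:
    """Map each word family to the upper bound of its frequency band.

    Families are ranked by descending count; ranks 1..band_size fall in band
    `band_size` (e.g., the 1,000 level), the next band_size in 2 * band_size, etc.
    """
    ranked = sorted(family_counts, key=lambda fam: family_counts[fam], reverse=True)
    return {fam: (i // band_size + 1) * band_size for i, fam in enumerate(ranked)}


# Toy counts (hypothetical) with a band size of 2 so the banding is easy to see.
print(band_by_frequency({"time": 900, "say": 850, "budget": 40, "tariff": 5}, band_size=2))
```

<p>With the default band size, the 1,000 most frequent families fall into the 1,000 level, the next 1,000 into the 2,000 level, and so on.</p>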
<p>Most of these wordlists were built on a counting unit called the &#x201C;word family,&#x201D; which comprises a headword and all of its inflected and derived forms up to Level 6 of the affix criteria (also known as WF6) (<xref ref-type="bibr" rid="B3">Bauer and Nation, 1993</xref>; <xref ref-type="bibr" rid="B45">Nation, 2020</xref>). The rationale for using WF6 rests on an assumption about learning burden: a learner who knows one family member can understand or recognize the rest of the family with little or no effort (<xref ref-type="bibr" rid="B43">Nation, 2013</xref>; <xref ref-type="bibr" rid="B31">Laufer and Cobb, 2020</xref>; <xref ref-type="bibr" rid="B34">Laufer, 2021</xref>; <xref ref-type="bibr" rid="B33">Laufer et al., 2021</xref>). It is worth noting that WF6 has served as the basis for most areas of vocabulary research, including assessment (<xref ref-type="bibr" rid="B37">McLean and Kramer, 2015</xref>; <xref ref-type="bibr" rid="B38">McLean et al., 2015</xref>; <xref ref-type="bibr" rid="B68">Webb et al., 2017</xref>; <xref ref-type="bibr" rid="B20">Ha, 2021a</xref>) and other psycholinguistic areas (<xref ref-type="bibr" rid="B32">Laufer and Ravenhorst-Kalovski, 2010</xref>; <xref ref-type="bibr" rid="B28">Lange and Matthews, 2020</xref>; <xref ref-type="bibr" rid="B52">Qian and Lin, 2020</xref>; <xref ref-type="bibr" rid="B19">Ha, 2021b</xref>).</p>
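<p>The practical effect of counting by word family is that different forms are credited to one headword. The sketch below uses a hypothetical two-family list (not taken from the BNC/COCA files, and simplified relative to the full Level 6 criteria) to show how tokens such as <italic>taught</italic> and <italic>teacher</italic> are both counted under <italic>teach</italic>.</p>

```python
from collections import Counter

# Hypothetical two-family list (headword -> member forms); not taken from BNC/COCA files.
FAMILIES = {
    "happy": {"happy", "happier", "happiest", "happily", "happiness", "unhappy"},
    "teach": {"teach", "teaches", "taught", "teaching", "teacher", "teachers"},
}

# Invert the list so any member form can be looked up to find its headword.
HEADWORD = {form: head for head, forms in FAMILIES.items() for form in forms}


def count_families(tokens: list[str]) -> Counter:
    """Count tokens per word family; forms outside the list go to NOT_IN_LISTS."""
    return Counter(HEADWORD.get(tok.lower(), "NOT_IN_LISTS") for tok in tokens)


print(count_families("The teacher taught happily and was happier".split()))
```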
</sec>
<sec id="S2.SS3">
<title>Lexical Demands of Written and Spoken Texts</title>
<p>Over the decades, researchers in vocabulary studies have documented the lexical profiles of various spoken text genres. For example, <xref ref-type="bibr" rid="B66">Webb and Rodgers (2009a</xref>,<xref ref-type="bibr" rid="B67">b)</xref> reported that learners would need to know the most frequent 3,000 and 7,000 word families in the BNC list to understand 95 and 98% of the words in movies and TV programs, respectively. These figures closely match what is needed to comprehend daily conversation (<xref ref-type="bibr" rid="B41">Nation, 2006</xref>), suggesting that the language used on screen does not differ much from everyday talk. Songs, soap operas, sitcoms, and podcasts are relatively less demanding, requiring only approximately 2,000&#x2013;3,000 word families for 95% coverage and 5,000&#x2013;7,000 for 98% coverage (<xref ref-type="bibr" rid="B1">Al-Surmi, 2014</xref>; <xref ref-type="bibr" rid="B59">Tegge, 2017</xref>; <xref ref-type="bibr" rid="B49">Nurmukhamedov and Sharakhimov, 2021</xref>). Spoken discourse in academic contexts is more demanding. For instance, to understand 95 and 98% of the words in academic lectures and seminars, listeners would need a vocabulary equivalent to 4,000 and 8,000 word families in the BNC word list, respectively (<xref ref-type="bibr" rid="B12">Dang and Webb, 2014</xref>). Interestingly, TED talks appear to share the same lexical demands (<xref ref-type="bibr" rid="B9">Coxhead and Walls, 2012</xref>; <xref ref-type="bibr" rid="B48">Nurmukhamedov, 2017</xref>).</p>
<p>Compared to spoken discourse, the lexical profile of written texts has received relatively little attention. <xref ref-type="bibr" rid="B23">Hsu (2011)</xref> pointed out that the 5,000 and 8,000 most frequent word families in <xref ref-type="bibr" rid="B41">Nation&#x2019;s (2006)</xref> BNC word list would account for 95 and 98%, respectively, of the words in business textbooks and business research articles. Seven years later, <xref ref-type="bibr" rid="B24">Hsu (2018)</xref> examined the lexical coverage of English-written Chinese medicine textbooks and found that the 10,000 most frequent word families in <xref ref-type="bibr" rid="B44">Nation&#x2019;s (2017)</xref> BNC/COCA word list would provide 98% coverage of the corpus. <xref ref-type="bibr" rid="B63">Webb and Macalister (2013)</xref> examined the difference in lexical demands between written literature for native English speakers and for learners of English as a second language. Their results showed that, at the 95% coverage threshold, only the 3,000 and 2,000 most frequent word families in the BNC list were required for L1 and L2 literature, respectively, a difference of only 1,000 word families. However, at the 98% threshold, while L2 literature required vocabulary knowledge at only the 3,000 level, texts for L1 readers required knowledge at the 10,000 level, more than three times as much. In attempts to update the vocabulary profile of textbooks for English as a foreign language (EFL) learners, researchers have found that knowledge of the 3,000&#x2013;4,000 most frequent word families in <xref ref-type="bibr" rid="B44">Nation&#x2019;s (2017)</xref> BNC/COCA list was sufficient to provide 95% coverage, whereas knowledge at the 5,000&#x2013;6,000 levels was required for learners to understand 98% of the words in those books (<xref ref-type="bibr" rid="B70">Yang and Coxhead, 2020</xref>; <xref ref-type="bibr" rid="B53">Rahmat and Coxhead, 2021</xref>).</p>
<p>The most influential study of the lexical demands of written English is undoubtedly <xref ref-type="bibr" rid="B41">Nation&#x2019;s (2006)</xref>. In that study, <xref ref-type="bibr" rid="B41">Nation (2006)</xref> found that learners would need about the 4,000 most frequent word families in the BNC list plus proper nouns to reach 95% coverage of newspapers and novels, and approximately 8,000&#x2013;9,000 word families plus proper nouns to reach 98% coverage. Despite the study&#x2019;s impact, these figures need to be revisited for two reasons. First, the research was carried out approximately 15 years ago using a relatively small corpus (only 440,000 words), and therefore its findings &#x201C;now need to be checked with larger, more comprehensive corpora&#x201D; (<xref ref-type="bibr" rid="B54">Schmitt et al., 2017</xref>, p. 217). The second reason concerns the method <xref ref-type="bibr" rid="B41">Nation (2006)</xref> used to estimate vocabulary size. In his study, <xref ref-type="bibr" rid="B41">Nation (2006)</xref> utilized the BNC wordlist, which is based entirely on British English and &#x201C;may be due for updating and revision&#x201D; (<xref ref-type="bibr" rid="B54">Schmitt et al., 2017</xref>, p. 218). Nation subsequently revised his wordlists, which resulted in the BNC/COCA lists, introduced in 2012 and updated in 2017 (<xref ref-type="bibr" rid="B44">Nation, 2017</xref>). The BNC/COCA lists cover both British and American English and have been shown to outperform other wordlists (<xref ref-type="bibr" rid="B10">Dang and Webb, 2016</xref>; <xref ref-type="bibr" rid="B11">Dang et al., 2020</xref>). As <xref ref-type="bibr" rid="B54">Schmitt et al. (2017)</xref> wrote, &#x201C;Assuming the new combined BNC-COCA lists are a better indication of word frequency, then everything that has been done using the original BNC-based lists is ripe for replication using these new lists&#x201D; (p. 218).</p>
</sec>
</sec>
<sec id="S3">
<title>The Present Study</title>
<p>The present study sets out to revisit <xref ref-type="bibr" rid="B41">Nation&#x2019;s (2006)</xref> figures following the two major suggestions put forward by <xref ref-type="bibr" rid="B54">Schmitt et al. (2017)</xref>: increasing sample size and employing up-to-date research methodology.</p>
<p>To re-examine the lexical profile of newspapers with a larger sample, the present study analyzed <xref ref-type="bibr" rid="B14">Davies&#x2019;s (2016/2021)</xref> News on the Web (NOW) corpus, the largest corpus of English newspapers available. Besides this very large sample, the current study also employed the most comprehensive and up-to-date BNC/COCA wordlist (<xref ref-type="bibr" rid="B44">Nation, 2017</xref>). The list contains twenty-five 1,000-word levels, which reflect current English. In addition, the BNC/COCA is accompanied by four supplementary lists of proper nouns (<italic>Aaron, Greece, Grecian, Greenberry</italic>, and <italic>Waterloo&#x2026;</italic>), marginal words (<italic>hm, huh, er, ah</italic>, and <italic>phew&#x2026;</italic>), transparent compounds (<italic>aftershock, afterword, airbag</italic>, and <italic>powerboat&#x2026;</italic>), and acronyms (<italic>PHD, UFO</italic>, and <italic>UDA&#x2026;</italic>) (<xref ref-type="bibr" rid="B45">Nation, 2020</xref>). These supplementary lists allow more detailed analyses than the BNC list, which is accompanied by only two: proper nouns and marginal words.</p>
<p>Moreover, the use of the BNC/COCA lists in lexical profiling research also contributes to a methodological shift in vocabulary studies. Because the most widely used vocabulary tests have drawn their items from the BNC/COCA lists (<xref ref-type="bibr" rid="B37">McLean and Kramer, 2015</xref>; <xref ref-type="bibr" rid="B38">McLean et al., 2015</xref>; <xref ref-type="bibr" rid="B68">Webb et al., 2017</xref>), research on most other aspects of vocabulary knowledge and lexical development is likely to follow. It would therefore be methodologically inconsistent to relate a text&#x2019;s lexical profile based on the BNC lists to a study that measured learners&#x2019; knowledge of the BNC/COCA lists.</p>
<p>The study also responds to <xref ref-type="bibr" rid="B62">Webb&#x2019;s (2021)</xref> call for more attention to variation in lexical demands. In a recent paper, <xref ref-type="bibr" rid="B62">Webb (2021)</xref> expressed concern that lexical profiling studies only &#x201C;reflect the mean number of word families needed to reach a certain lexical coverage figure&#x201D; and often ignore the fact that &#x201C;each corpus is made up of a large number of texts, and there is likely to be a great deal of variation in the vocabulary of each text&#x201D; (p. 286). Indeed, we should not assume that the same coverage figures apply reliably to different texts simply because they belong to the same genre, and this is especially true of newspapers. Research has shown that the use of grammar and vocabulary can vary greatly across regions and generations (<xref ref-type="bibr" rid="B5">Chambers, 2000</xref>; <xref ref-type="bibr" rid="B15">Davies and Fuchs, 2015</xref>; <xref ref-type="bibr" rid="B13">Davies, 2021</xref>; <xref ref-type="bibr" rid="B61">Wan and Cowie, 2021</xref>). It is therefore reasonable to hypothesize that the lexical demands of newspaper and magazine articles from different countries and periods differ to some degree. The NOW corpus comprises data from online newspapers and magazines collected from twenty countries over a period of 11 years. Analyzing such a corpus not only provides reliable figures on the lexical demands of newspapers, but also offers insights into how the vocabulary knowledge required to comprehend English newspapers varies across countries and years.</p>
<p>In particular, the study seeks to answer the following questions:</p>
<list list-type="simple">
<list-item>
<label>(1)</label>
<p><italic>How many words do English learners need to gain 95 and 98% coverage of online newspapers?</italic></p>
</list-item>
<list-item>
<label>(2)</label>
<p><italic>Does the lexical profile of online newspapers and magazines vary over time and across countries?</italic></p>
</list-item>
</list>
</sec>
<sec id="S4">
<title>Methodology</title>
<sec id="S4.SS1">
<title>Data Collection</title>
<p>The present study analyzed data from the NOW corpus (<xref ref-type="bibr" rid="B14">Davies, 2016/2021</xref>), which is available for purchase on Mark Davies&#x2019;s website.<sup><xref ref-type="fn" rid="footnote1">1</xref></sup> The NOW corpus contains articles from web-based newspapers and magazines in twenty different countries. The corpus has been updated continuously since 2010 and grows by approximately 200 million words per month, the equivalent of 300,000&#x2013;400,000 articles. At the time of data collection (May 2021), the NOW corpus contained approximately 12.5 billion words. The full-text data of the corpus were purchased by the researcher, and a license for academic use was obtained.</p>
</sec>
<sec id="S4.SS2">
<title>Data Preparation</title>
<p>A preliminary analysis of the NOW corpus was carried out with lexical profiling software (<xref ref-type="bibr" rid="B22">Heatley et al., 2002</xref>). Two major adjustments were then made to the corpus. First, words that had been wrongly classified as &#x201C;Not in the lists&#x201D; because of spelling errors or typos were corrected and returned to their frequency levels. Second, because the lexical profiling software cannot process hyphenated words (e.g., <italic>full-time, second-hand, money-driven, customer-focus</italic>), hyphens in the texts were replaced by spaces so that the component words of hyphenated items could be classified according to their frequency (e.g., <italic>second, hand, money, customer, focus</italic>). These changes were made with the multi-file replace function (Ctrl + Shift + F) of Notepad++.</p>
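<p>For readers who prefer a scripted alternative to manual find-and-replace, the two adjustments can be sketched in Python. This is not the procedure used in the study, which relied on Notepad++, and the correction table is hypothetical because the actual list of corrected typos is not reported here.</p>

```python
import re

# Hypothetical correction table; the study does not list the typos it corrected.
CORRECTIONS = {"goverment": "government", "recieve": "receive"}


def prepare(text: str) -> str:
    """Correct known misspellings, then replace every hyphen with a space."""
    for wrong, right in CORRECTIONS.items():
        # Word-boundary anchors restrict the fix to whole words (case-sensitive).
        text = re.sub(rf"\b{re.escape(wrong)}\b", right, text)
    return text.replace("-", " ")


print(prepare("The goverment hired full-time, customer-focus staff"))
```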
</sec>
<sec id="S4.SS3">
<title>Data Analysis</title>
<p>The RANGE program (<xref ref-type="bibr" rid="B22">Heatley et al., 2002</xref>) was used for data analysis. RANGE assigns each word in a text to a frequency level and counts how many times it occurs. The frequency levels RANGE uses depend on the wordlist it is run with. In other words, RANGE reports exactly how many words fall into each level of a wordlist, which supports further conclusions and predictions. Currently, three wordlists can be used with RANGE: the General Service List/Academic Word List (GSL/AWL), which includes 2,570 word families; the BNC wordlist, which consists of fourteen 1,000-word levels plus two lists of proper nouns and marginal words; and the BNC/COCA wordlist, which contains twenty-five lists of word families from the 1,000 to the 25,000 level plus four additional lists of proper nouns, marginal words, transparent compounds, and acronyms.</p>
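<p>The core of this kind of lexical profiling can be sketched as a lookup from word forms to levels followed by a token count per level. The miniature list below is hypothetical and its level assignments are for illustration only; RANGE reads its own list files and reports more detail (e.g., types and families) than this sketch.</p>

```python
from collections import Counter

# Hypothetical miniature list: word form -> level label (assignments are illustrative).
LEVELS = {
    "the": "1,000", "news": "1,000", "report": "1,000",
    "inflation": "4,000",
    "london": "proper nouns", "um": "marginal words",
}


def profile(tokens: list[str]) -> dict[str, tuple[int, float]]:
    """Return (token count, % coverage) per level; unmatched forms go to 'not in lists'."""
    counts = Counter(LEVELS.get(tok.lower(), "not in lists") for tok in tokens)
    total = sum(counts.values())
    return {level: (n, 100 * n / total) for level, n in counts.items()}


print(profile("The news report on inflation in London".split()))
```

<p>Coverage for a level is its token count divided by the total number of tokens, so summing the percentages of the levels a learner knows gives that learner&#x2019;s expected coverage of the text.</p>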
<p>The current study used the BNC/COCA word list (<xref ref-type="bibr" rid="B44">Nation, 2017</xref>). RANGE is available at: <ext-link ext-link-type="uri" xlink:href="https://www.wgtn.ac.nz/lals/resources/paul-nations-resources/vocabulary-analysis-programs">https://www.wgtn.ac.nz/lals/resources/paul-nations-resources/vocabulary-analysis-programs</ext-link>. RANGE automatically reads and recognizes contractions (can&#x2019;t, don&#x2019;t&#x2026;) and connected speech (wanna, gonna, and kinda&#x2026;). For instance, RANGE counts <italic>don&#x2019;t</italic> as two separate words, <italic>do</italic> and <italic>not</italic>, and <italic>wanna</italic> as a member of the <italic>want</italic> family.</p>
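<p>The contraction and connected-speech handling described above can be approximated with a small normalization step before lookup. The tables below contain only the examples given in the text plus <italic>can&#x2019;t</italic>; they are assumptions for illustration, and RANGE&#x2019;s built-in handling may differ.</p>

```python
# Hypothetical tables built from the examples in the text; RANGE's own handling may differ.
CONTRACTIONS = {"don't": ["do", "not"], "can't": ["can", "not"]}
CONNECTED_SPEECH = {"wanna": "want"}  # counted as a member of the want family


def normalize(tokens: list[str]) -> list[str]:
    """Lower-case tokens, split contractions, and map connected-speech forms."""
    out: list[str] = []
    for tok in tokens:
        low = tok.lower().replace("\u2019", "'")  # treat curly apostrophes as straight
        if low in CONTRACTIONS:
            out.extend(CONTRACTIONS[low])
        else:
            out.append(CONNECTED_SPEECH.get(low, low))
    return out


print(normalize(["I", "Don\u2019t", "wanna", "go"]))
```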
</sec>
</sec>
<sec id="S5" sec-type="results">
<title>Results</title>
<p>The second and third columns of <xref ref-type="table" rid="T1">Table 1</xref> present the number of tokens and the coverage at each word level for the NOW corpus. The most frequent 1,000 word families in the BNC/COCA wordlist accounted for the greatest proportion of tokens, 72.48%. Coverage then dropped sharply to 10.27% at the second 1,000-word level. Beyond the 2,000 level, the number of tokens and the coverage they provided decreased gradually as word frequency went down. From the 5,000 level onward, each level accounted for less than 1% of the running words in the corpus, highlighting the importance of high-frequency words for reading comprehension.</p>
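<p>The cumulative columns of Table 1 are running totals of the per-level coverage. The sketch below rebuilds them from the rounded per-level values and adds the combined share of proper nouns, marginal words, transparent compounds, and acronyms, taken here as the gap between the two cumulative columns at the 1,000 level (77.87 - 72.48 = 5.39). Because the inputs are rounded, running totals may differ from the table by about 0.01 (e.g., 92.36 rather than 92.35 at the 6,000 level without the supplementary lists).</p>

```python
from typing import Optional

# Per-level coverage (%) for the first eight BNC/COCA 1,000-word levels, copied from Table 1.
LEVEL_COVERAGE = [
    (1000, 72.48), (2000, 10.27), (3000, 6.22), (4000, 1.74),
    (5000, 0.98), (6000, 0.67), (7000, 0.44), (8000, 0.39),
]

# Combined share of the four supplementary lists, inferred from Table 1
# as the gap between the two cumulative columns at the 1,000 level.
SUPPLEMENTARY = 77.87 - 72.48


def level_needed(target: float, with_supplementary: bool = True) -> Optional[int]:
    """Return the smallest level whose cumulative coverage reaches `target` percent.

    Returns None if the target is not reached within the levels listed above.
    """
    cumulative = SUPPLEMENTARY if with_supplementary else 0.0
    for level, share in LEVEL_COVERAGE:
        cumulative += share
        if cumulative >= target:
            return level
    return None


print(level_needed(95), level_needed(98))
print(level_needed(95, with_supplementary=False))
```

<p>With the supplementary lists included, the 95 and 98% thresholds are first reached at the 4,000 and 7,000 levels, matching the marked cells in Table 1; without them, neither threshold is reached within the first 8,000 word families.</p>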
<table-wrap position="float" id="T1">
<label>TABLE 1</label>
<caption><p>The proportion of tokens at each word level and the cumulative coverage with and without proper nouns, marginal words, transparent compounds, and acronyms for the News on the Web (NOW) corpus.</p></caption>
<table cellspacing="5" cellpadding="5" frame="hsides" rules="groups">
<thead>
<tr>
<td valign="top" align="left">Word list</td>
<td valign="top" align="center">Tokens</td>
<td valign="top" align="center">Coverage at each level (%)</td>
<td valign="top" align="center">Cumulative coverage without PN, MW, TC, and acronyms (%)</td>
<td valign="top" align="center">Cumulative coverage with PN, MW, TC, and acronyms (%)</td>
</tr>
</thead>
<tbody>
<tr>
<td valign="top" align="left">1,000</td>
<td valign="top" align="center">8,616,263,239</td>
<td valign="top" align="center">72.48</td>
<td valign="top" align="center">72.48</td>
<td valign="top" align="center">77.87</td>
</tr>
<tr>
<td valign="top" align="left">2,000</td>
<td valign="top" align="center">1,221,457,304</td>
<td valign="top" align="center">10.27</td>
<td valign="top" align="center">82.75</td>
<td valign="top" align="center">88.15</td>
</tr>
<tr>
<td valign="top" align="left">3,000</td>
<td valign="top" align="center">739,364,222</td>
<td valign="top" align="center">6.22</td>
<td valign="top" align="center">88.97</td>
<td valign="top" align="center">94.37</td>
</tr>
<tr>
<td valign="top" align="left">4,000</td>
<td valign="top" align="center">206,631,853</td>
<td valign="top" align="center">1.74</td>
<td valign="top" align="center">90.71</td>
<td valign="top" align="center">96.11<xref ref-type="table-fn" rid="t1fns1"><sup>a</sup></xref></td>
</tr>
<tr>
<td valign="top" align="left">5,000</td>
<td valign="top" align="center">115,906,531</td>
<td valign="top" align="center">0.98</td>
<td valign="top" align="center">91.69</td>
<td valign="top" align="center">97.08</td>
</tr>
<tr>
<td valign="top" align="left">6,000</td>
<td valign="top" align="center">79,362,050</td>
<td valign="top" align="center">0.67</td>
<td valign="top" align="center">92.35</td>
<td valign="top" align="center">97.75</td>
</tr>
<tr>
<td valign="top" align="left">7,000</td>
<td valign="top" align="center">52,221,650</td>
<td valign="top" align="center">0.44</td>
<td valign="top" align="center">92.79</td>
<td valign="top" align="center">98.19<xref ref-type="table-fn" rid="t1fns1"><sup>b</sup></xref></td>
</tr>
<tr>
<td valign="top" align="left">8,000</td>
<td valign="top" align="center">46,147,921</td>
<td valign="top" align="center">0.39</td>
<td valign="top" align="center">93.18</td>
<td valign="top" align="center">98.58</td>
</tr>
<tr>
<td valign="top" align="left">9,000</td>
<td valign="top" align="center">29,320,140</td>
<td valign="top" align="center">0.25</td>
<td valign="top" align="center">93.43</td>
<td valign="top" align="center">98.82</td>
</tr>
<tr>
<td valign="top" align="left">10,000</td>
<td valign="top" align="center">19,053,027</td>
<td valign="top" align="center">0.16</td>
<td valign="top" align="center">93.59</td>
<td valign="top" align="center">98.98</td>
</tr>
<tr>
<td valign="top" align="left">11,000</td>
<td valign="top" align="center">16,225,949</td>
<td valign="top" align="center">0.14</td>
<td valign="top" align="center">93.73</td>
<td valign="top" align="center">99.12</td>
</tr>
<tr>
<td valign="top" align="left">12,000</td>
<td valign="top" align="center">12,531,144</td>
<td valign="top" align="center">0.11</td>
<td valign="top" align="center">93.83</td>
<td valign="top" align="center">99.23</td>
</tr>
<tr>
<td valign="top" align="left">13,000</td>
<td valign="top" align="center">8,788,706</td>
<td valign="top" align="center">0.07</td>
<td valign="top" align="center">93.91</td>
<td valign="top" align="center">99.30</td>
</tr>
<tr>
<td valign="top" align="left">14,000</td>
<td valign="top" align="center">8,608,025</td>
<td valign="top" align="center">0.07</td>
<td valign="top" align="center">93.98</td>
<td valign="top" align="center">99.37</td>
</tr>
<tr>
<td valign="top" align="left">15,000</td>
<td valign="top" align="center">4,628,179</td>
<td valign="top" align="center">0.04</td>
<td valign="top" align="center">94.02</td>
<td valign="top" align="center">99.41</td>
</tr>
<tr>
<td valign="top" align="left">16,000</td>
<td valign="top" align="center">4,451,764</td>
<td valign="top" align="center">0.04</td>
<td valign="top" align="center">94.05</td>
<td valign="top" align="center">99.45</td>
</tr>
<tr>
<td valign="top" align="left">17,000</td>
<td valign="top" align="center">3,129,096</td>
<td valign="top" align="center">0.03</td>
<td valign="top" align="center">94.08</td>
<td valign="top" align="center">99.47</td>
</tr>
<tr>
<td valign="top" align="left">18,000</td>
<td valign="top" align="center">3,778,722</td>
<td valign="top" align="center">0.03</td>
<td valign="top" align="center">94.11</td>
<td valign="top" align="center">99.51</td>
</tr>
<tr>
<td valign="top" align="left">19,000</td>
<td valign="top" align="center">2,329,532</td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center">94.13</td>
<td valign="top" align="center">99.53</td>
</tr>
<tr>
<td valign="top" align="left">20,000</td>
<td valign="top" align="center">2,432,275</td>
<td valign="top" align="center">0.02</td>
<td valign="top" align="center">94.15</td>
<td valign="top" align="center">99.55</td>
</tr>
<tr>
<td valign="top" align="left">21,000</td>
<td valign="top" align="center">1,674,960</td>
<td valign="top" align="center">0.01</td>
<td valign="top" align="center">94.17</td>
<td valign="top" align="center">99.56</td>
</tr>
<tr>
<td valign="top" align="left">22,000</td>
<td valign="top" align="center">1,248,712</td>
<td valign="top" align="center">0.01</td>
<td valign="top" align="center">94.18</td>
<td valign="top" align="center">99.57</td>
</tr>
<tr>
<td valign="top" align="left">23,000</td>
<td valign="top" align="center">3,314,878</td>
<td valign="top" align="center">0.03</td>
<td valign="top" align="center">94.20</td>
<td valign="top" align="center">99.60</td>
</tr>
<tr>
<td valign="top" align="left">24,000</td>
<td valign="top" align="center">1,613,907</td>
<td valign="top" align="center">0.01</td>
<td valign="top" align="center">94.22</td>
<td valign="top" align="center">99.61</td>
</tr>
<tr>
<td valign="top" align="left">25,000</td>
<td valign="top" align="center">1,660,694</td>
<td valign="top" align="center">0.01</td>
<td valign="top" align="center">94.23</td>
<td valign="top" align="center">99.63</td>
</tr>
<tr>
<td valign="top" align="left">Proper nouns</td>
<td valign="top" align="center">465,033,862</td>
<td valign="top" align="center">3.91</td>
<td valign="top" align="center">3.91</td>
<td valign="top" align="center">&#x2013;</td>
</tr>
<tr>
<td valign="top" align="left">Marginal words</td>
<td valign="top" align="center">59,592,391</td>
<td valign="top" align="center">0.50</td>
<td valign="top" align="center">0.50</td>
<td valign="top" align="center">&#x2013;</td>
</tr>
<tr>
<td valign="top" align="left">Transparent compounds</td>
<td valign="top" align="center">69,863,500</td>
<td valign="top" align="center">0.59</td>
<td valign="top" align="center">0.59</td>
<td valign="top" align="center">&#x2013;</td>
</tr>
<tr>
<td valign="top" align="left">Acronyms</td>
<td valign="top" align="center">46,806,237</td>
<td valign="top" align="center">0.39</td>
<td valign="top" align="center">0.39</td>
<td valign="top" align="center">&#x2013;</td>
</tr>
<tr>
<td valign="top" align="left">Not in the lists</td>
<td valign="top" align="center">44,385,236</td>
<td valign="top" align="center">0.37</td>
<td valign="top" align="center">0.37</td>
<td valign="top" align="center">100</td>
</tr>
<tr>
<td valign="top" align="left">Total</td>
<td valign="top" align="center" colspan="4">11,887,825,705</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="t1fns1"><p><italic><sup>a</sup>Reaching 95% coverage; <sup>b</sup>Reaching 98% coverage.</italic></p></fn>
</table-wrap-foot>
</table-wrap>
<p>Another noteworthy detail was the proportion of proper nouns, marginal words, transparent compounds, and acronyms in the corpus. Proper nouns alone made up 3.91% of the tokens in the NOW corpus. Together, PN, MW, TC, and acronyms made up 5.39% of the running words, which was close to the 6.22% coverage provided by the third 1,000 word families in the BNC/COCA word lists. These figures show how important it is to be able to recognize and understand proper nouns, marginal words, transparent compounds, and acronyms.</p>
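<p>The percentages above follow directly from the token counts in <xref ref-type="table" rid="T1">Table 1</xref>: each coverage figure is a category&#x2019;s tokens divided by the 11,887,825,705 running words of the corpus. The short Python sketch below recomputes them as a check; the variable and function names are illustrative assumptions and are not part of RANGE or any other profiling tool.</p>

```python
# Token counts copied from Table 1 (whole NOW corpus); names are illustrative.
TOTAL_TOKENS = 11_887_825_705

SUPPLEMENTARY_TOKENS = {
    "proper nouns": 465_033_862,
    "marginal words": 59_592_391,
    "transparent compounds": 69_863_500,
    "acronyms": 46_806_237,
}
THIRD_1000_TOKENS = 739_364_222  # tokens at the 3,000 level of the BNC/COCA lists


def coverage_pct(tokens: int, total: int = TOTAL_TOKENS) -> float:
    """Percentage of running words accounted for by `tokens`."""
    return 100 * tokens / total


for name, tokens in SUPPLEMENTARY_TOKENS.items():
    print(f"{name}: {coverage_pct(tokens):.2f}%")

combined = coverage_pct(sum(SUPPLEMENTARY_TOKENS.values()))
third_1000 = coverage_pct(THIRD_1000_TOKENS)
print(f"four lists combined: {combined:.2f}%")
print(f"third 1,000 level: {third_1000:.2f}%")
```

<p>Run as is, the sketch reports 5.39% for the four lists combined against 6.22% for the third 1,000 level. The combined figure also accounts for the gap of roughly 5.4 points between the last two columns of <xref ref-type="table" rid="T1">Table 1</xref> (for example, 88.15 versus 82.75 at the 2,000 level); the small difference from 5.39 comes from rounding.</p>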
<p>The last two columns of <xref ref-type="table" rid="T1">Table 1</xref> show the vocabulary knowledge needed to reach 95 and 98% coverage of online newspapers under two hypothetical scenarios: one assumes that all proper nouns, marginal words, transparent compounds, and acronyms were easily recognized and understood, and the other assumes that they were not. Since the four supplementary lists accounted for more than 5% of the tokens, reaching 95% coverage of online newspapers with knowledge of the 25,000 word families in <xref ref-type="bibr" rid="B44">Nation&#x2019;s (2017)</xref> BNC/COCA lists alone was impossible; even all 25 levels together covered only 94.23% of the running words. However, if proper nouns, marginal words, transparent compounds, and acronyms were assumed to be known, then knowledge of the 4,000 and 7,000 most frequent word families was needed to achieve 95 and 98% coverage, respectively.</p>
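<p>Reading a coverage threshold off <xref ref-type="table" rid="T1">Table 1</xref> amounts to finding the first 1,000-word level whose cumulative coverage reaches 95 or 98%. A minimal Python sketch of that lookup follows, using a subset of the cumulative figures from the last two columns of the table; level_for_threshold is a hypothetical helper written only for illustration.</p>

```python
from typing import Dict, Optional

# Cumulative coverage (%) at selected BNC/COCA levels, copied from Table 1.
# The 25,000 level is the last level in the lists, so a threshold not met
# there is not met at any level.
BNC_COCA_ONLY: Dict[int, float] = {
    2000: 82.75, 3000: 88.97, 4000: 90.71, 5000: 91.69,
    6000: 92.35, 7000: 92.79, 25000: 94.23,
}
WITH_SUPPLEMENTARY: Dict[int, float] = {  # plus PN, MW, TC, and acronyms
    2000: 88.15, 3000: 94.37, 4000: 96.11, 5000: 97.08,
    6000: 97.75, 7000: 98.19,
}


def level_for_threshold(cumulative: Dict[int, float], threshold: float) -> Optional[int]:
    """Return the smallest level whose cumulative coverage reaches threshold, else None."""
    for level in sorted(cumulative):
        if cumulative[level] >= threshold:
            return level
    return None


for label, table in (("BNC/COCA lists only", BNC_COCA_ONLY),
                     ("with supplementary lists", WITH_SUPPLEMENTARY)):
    print(label, {t: level_for_threshold(table, t) for t in (95, 98)})
```

<p>Because cumulative coverage can only grow from one level to the next, skipping the intermediate levels does not change the answer: the sketch returns no level for either threshold with the BNC/COCA lists alone, and the 4,000 and 7,000 levels once the supplementary lists are included.</p>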
<p>However, the NOW corpus was made up of newspaper and magazine articles from twenty different countries over a period of 11 years. It may therefore not be appropriate to judge the corpus&#x2019;s lexical demand by its nearly twelve billion tokens taken together; the variation within the corpus deserves closer investigation. The <xref ref-type="supplementary-material" rid="TS1">Supplementary Appendix</xref> offers data for such an analysis.</p>
<p>The <xref ref-type="supplementary-material" rid="TS1">Supplementary Appendix</xref> provides data on the cumulative coverage of each sub-corpus, including proper nouns, marginal words, transparent compounds, and acronyms. The analyses of the sub-corpora yielded interesting findings. <xref ref-type="fig" rid="F1">Figure 1</xref> is a graphic representation of the <xref ref-type="supplementary-material" rid="TS1">Supplementary Appendix</xref> and shows the number of word families needed to achieve 85, 95, and 98% coverage of online newspapers in different countries. Due to space limitations, <xref ref-type="fig" rid="F1">Figure 1</xref> offers only a rough summary of the vocabulary demands of newspaper and magazine articles in different countries. Several &#x201C;middle&#x201D; values, such as 3,500 or 6,500, appear in <xref ref-type="fig" rid="F1">Figure 1</xref>; these do not mean that exactly 3,500 or 6,500 word families were needed to understand 95 or 98% of the words in newspapers in certain countries. Rather, they signal that a vocabulary knowledge of 3,000&#x2013;4,000 or 6,000&#x2013;7,000 word families was required for these coverage thresholds, because vocabulary demands varied between years or periods of time. Take Hong Kong as an example: although figures from 2012 to 2021 suggested that the 4,000 most frequent word families in the BNC/COCA lists were needed for 95% coverage, data from 2010 to 2011 showed that only 3,000 word families were needed to reach the same threshold.</p>
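<p>One way to read a &#x201C;middle&#x201D; value is as a summary of yearly thresholds that span two adjacent levels. The Python sketch below applies this reading to the Hong Kong figures reported above (3,000 word families for 2010&#x2013;2011 and 4,000 for 2012&#x2013;2021). It assumes, rather than documents, that such a value is the midpoint of the yearly range; summarize_levels is a hypothetical helper.</p>

```python
from typing import Dict, Tuple

# Level needed for 95% coverage in the Hong Kong sub-corpus, per year, as
# reported in the text: 3,000 for 2010-2011 and 4,000 for 2012-2021.
hong_kong_95: Dict[int, int] = {
    year: 3000 if year in (2010, 2011) else 4000 for year in range(2010, 2022)
}


def summarize_levels(per_year: Dict[int, int]) -> Tuple[int, int, float]:
    """Return the lowest and highest yearly level and their midpoint.

    Assumption: a "middle" value in Figure 1 is read as the midpoint of
    the yearly range; the figure's actual construction is not documented.
    """
    low, high = min(per_year.values()), max(per_year.values())
    return low, high, (low + high) / 2


low, high, mid = summarize_levels(hong_kong_95)
print(f"{low:,}-{high:,} word families, summarized as {mid:,.0f}")
```

<p>Under this reading, the sketch prints a range of 3,000&#x2013;4,000 word families summarized as 3,500, which is why such a value should be interpreted as a range rather than as a literal vocabulary size.</p>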
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption><p>The number of BNC/COCA word families needed to achieve 85, 95, and 98% coverage of online newspapers in different countries.</p></caption>
<graphic mimetype="image" mime-subtype="tiff" xlink:href="fpsyg-13-800983-g001.tif"/>
</fig>
<p>The 2,000 most frequent word families in the BNC/COCA word lists covered 85&#x2013;90% of the running words in online newspaper and magazine articles from all 20 countries. This highlights the feasibility of using web-based newspapers as reading materials in English classes, as well as the value of the first 2,000 BNC/COCA word families to English learners. Although no single threshold for 95 and 98% coverage fits every country, the data suggested that knowledge of the most frequent 3,000&#x2013;4,000 word families in the BNC/COCA word lists was necessary to gain 95% coverage. The picture was more complicated for 98% coverage. For some countries, such as Jamaica, New Zealand, and Canada, vocabulary knowledge at the 6,000 level was enough to provide 98% coverage of web-based newspapers and magazines. Online newspaper and magazine articles from Australia, Hong Kong, Ireland, Tanzania, and the United Kingdom seemed to be slightly more demanding, requiring 6,000&#x2013;7,000 word families for optimal comprehension. Online newspapers and magazines in the United States, South Africa, Singapore, Pakistan, Nigeria, Kenya, and Ghana required knowledge of the most frequent 7,000 word families for comfortable comprehension. For Malaysia and Sri Lanka, the vocabulary knowledge required for the 98% threshold was at the 7,000&#x2013;8,000 levels. Online newspapers and magazines published in India and the Philippines required readers to have vocabulary knowledge at the 8,000 level for optimal comprehension.</p>
<p>Bangladesh was the most distinctive case, requiring 8,000 to 10,000 word families for 98% coverage. It was also worth noting that online newspapers and magazines in Bangladesh needed knowledge of only the 4,000 level for 95% coverage. The gap between the lexical demands for 95 and 98% coverage of Bangladesh&#x2019;s newspapers might lie in the substantial proportion of words at the 13,000 level. For most country sub-corpora in the NOW corpus, coverage at the 10,000 level and beyond normally dropped below 0.2% or even 0.1%. The same held for the Bangladesh sub-corpus at the tenth, eleventh, and twelfth 1,000-word levels, each of which covered less than 0.2% of the tokens. However, the 13,000 level covered 0.45&#x2013;0.68% of the tokens across the Bangladesh sub-corpora, a relatively high figure for such a low-frequency level.</p>
</sec>
<sec id="S6" sec-type="discussion">
<title>Discussion</title>
<p>In answer to the first research question, knowledge of the most frequent 4,000 word families in the BNC/COCA lists plus proper nouns, marginal words, transparent compounds, and acronyms would reliably provide 95% coverage of the NOW corpus. If 95% coverage is assumed to be sufficient for reasonable comprehension, then learning the most frequent 4,000 word families would be the ultimate learning goal for English learners in ESL and EFL contexts. <xref ref-type="bibr" rid="B32">Laufer and Ravenhorst-Kalovski (2010)</xref> and <xref ref-type="bibr" rid="B55">Schmitt et al. (2011)</xref> suggested that 95% coverage of a text results in acceptable comprehension; however, they also pointed out that the degree of comprehension at the 95% threshold is not fully reliable and that 98% is the threshold for unsupported comprehension. If 98% is taken to be the necessary threshold for text comprehension, then a one-size-fits-all answer is nearly impossible to give. Although <xref ref-type="table" rid="T1">Table 1</xref> showed that the 7,000 most frequent word families in the BNC/COCA lists would be sufficient to provide 98% coverage of online newspapers and magazines as a whole, <xref ref-type="fig" rid="F1">Figure 1</xref> demonstrated that this was not the case for every country: articles from Malaysia, Sri Lanka, India, the Philippines, and Bangladesh could require 8,000 word families or more.</p>
<p>Still, if we take a broad view of the NOW country sub-corpora and treat Bangladesh as a special case, we can generally conclude that the 4,000 and 8,000 most frequent word families in the BNC/COCA lists would reliably provide 95 and 98% coverage of articles from online newspapers and magazines, which could serve as a rough answer to research question 1. These findings align closely with those of <xref ref-type="bibr" rid="B41">Nation (2006)</xref>, which were based on the BNC word lists. However, it is also worth noting that the 3,000-word level was in some cases sufficient for 95% coverage of online newspapers and magazines. More importantly, a considerable proportion of the sub-corpus data showed that the most frequent 6,000 word families can be a feasible learning goal. For some countries, such as New Zealand, readers&#x2019; vocabulary knowledge needed to be only at the 6,000 level to read online news and magazines with unsupported comprehension.</p>
<p>It is also interesting to compare these results with those of other studies that employed the BNC/COCA lists. Specifically, online newspapers and magazines were found to have lexical demands relatively similar to those of English textbooks for EFL learners (<xref ref-type="bibr" rid="B70">Yang and Coxhead, 2020</xref>; <xref ref-type="bibr" rid="B53">Rahmat and Coxhead, 2021</xref>) and of reading passages in international tests of English proficiency such as TOEFL, IELTS, and TOEIC (<xref ref-type="bibr" rid="B27">Kaneko, 2020</xref>). This suggests that online newspapers and magazines could be a valuable source for English learners who are preparing for IELTS or TOEFL. Compared with academic books written in English (<xref ref-type="bibr" rid="B24">Hsu, 2018</xref>; <xref ref-type="bibr" rid="B36">Lu and Coxhead, 2020</xref>), however, newspapers were less lexically demanding, which is expected given the differences between general and academic English.</p>
<p>Data from <xref ref-type="fig" rid="F1">Figure 1</xref> and the <xref ref-type="supplementary-material" rid="TS1">Supplementary Appendix</xref> suggest a &#x201C;yes&#x201D; answer to the second research question. Notably, the most lexically demanding newspapers and magazines came from countries where English is a second or even a foreign language, such as Malaysia, Sri Lanka, the Philippines, India, and Bangladesh. In contrast, in countries where people speak English as a first language or have native-like English proficiency, such as Canada, Australia, New Zealand, Ireland, the United States, the United Kingdom, Singapore, and Hong Kong, online newspapers and magazines written in English seemed easier to read and understand. Several explanations could be offered for this phenomenon, one of which is the composition of the BNC/COCA word lists. As the name suggests, the corpora used to create <xref ref-type="bibr" rid="B44">Nation&#x2019;s (2017)</xref> BNC/COCA frequency lists contain spoken and written texts collected primarily from American and British contexts. As a result, the BNC/COCA lists may align better with written texts from countries that have been heavily influenced by American and British English. In other words, newspaper and magazine articles with wording patterns similar to those of American or British written English showed better lexical coverage than articles from countries with different wording patterns.</p>
<p>The findings become even more interesting if word frequency is considered an indicator of text difficulty. As <xref ref-type="bibr" rid="B21">Hashimoto (2021)</xref> and <xref ref-type="bibr" rid="B57">Stewart et al. (2021)</xref> discussed, there is a strong relationship between word frequency rank and word difficulty. It could therefore be said that English newspapers from certain countries may pose greater or lesser challenges to English learners. International students and immigrants who have been studying British or American English may find these findings useful, since understanding local news can help establish a sense of connection and belonging to the local communities and networks of a country (<xref ref-type="bibr" rid="B26">Juang et al., 2018</xref>).</p>
<p>The study&#x2019;s results may also help English teachers around the world, especially those considering using English newspapers as teaching materials in their own countries. Articles from online newspapers and magazines can be a valuable source for English language teaching, as they provide up-to-date and interesting information while offering rich linguistic resources. Using newspapers as reading material can readily trigger learners&#x2019; interest and facilitate discussion, especially when the articles concern hot issues in the country or around the world. Teachers and course book writers have different criteria when selecting teaching materials, but generally speaking, input selected for language learning should be lexically less challenging than authentic real-world texts. For example, <xref ref-type="bibr" rid="B59">Tegge (2017)</xref> pointed out that songs selected by teachers were 1,000&#x2013;2,000 word families less demanding than other songs on the Billboard charts. <xref ref-type="bibr" rid="B7">Collins&#x2019;s (2017)</xref> study also indicated that reading passages in EFL textbooks were significantly easier to read than those in standardized tests of English proficiency. If the lexical demands of input texts are the primary concern in lesson and material design, then the most obvious, and perhaps best, practice would be to choose articles from English-speaking countries such as Canada, New Zealand, Australia, and the United Kingdom. Newspapers and magazines from ESL contexts such as Hong Kong, Ireland, Jamaica, and Tanzania could also be considered when choosing reading materials, since their articles showed relatively low lexical demands.</p>
<p>It is also noteworthy that even the lowest lexical demands suggested that knowledge at the 3,000 level was needed to read online newspaper and magazine articles without relying heavily on dictionaries. This suggests that using authentic articles from online newspapers and magazines in language courses may be feasible only for upper-intermediate or advanced learners. Language teachers should also make sure that their learners know the 2,000 most frequent word families in the BNC/COCA word lists, the level at which more than 85% coverage can be expected. This could be checked with vocabulary tests that draw their items from the BNC/COCA lists, such as the New Vocabulary Levels Test (<xref ref-type="bibr" rid="B37">McLean and Kramer, 2015</xref>), the Listening Vocabulary Levels Test (<xref ref-type="bibr" rid="B38">McLean et al., 2015</xref>; <xref ref-type="bibr" rid="B20">Ha, 2021a</xref>), and <xref ref-type="bibr" rid="B68">Webb et al.&#x2019;s (2017)</xref> Updated Vocabulary Levels Test. Vocabulary knowledge at the 2,000 level generally ensures that learners can at least work with the material, with support from teachers and/or more capable peers.</p>
<p>English teachers of advanced classes may use up-to-date newspaper articles for in-class reading activities in which learners read an interesting article together and then discuss it. Language instructors can also ask learners to choose articles of interest that reflect current events around the world, read them extensively, and then discuss what they have read with their peers in class. Such practices might be especially suitable for English teachers of immigrants and refugees, who urgently need both the language and up-to-date information about the countries where they are currently based. However, it might be somewhat unrealistic to expect most immigrants and refugees to know the most frequent 2,000 word families in the BNC/COCA word lists.</p>
</sec>
<sec id="S7">
<title>Limitations</title>
<p>The present study has certain limitations that need to be addressed. As the study employed a lexical profiling program together with purpose-built word lists as its primary methodology, it was unavoidably affected by the limitations of that approach (<xref ref-type="bibr" rid="B47">Nation and Webb, 2011</xref>).</p>
<p>The first limitation was the inability of lexical profiling programs such as RANGE to distinguish homographs [e.g., <italic>proceeds</italic> (meaning <italic>continues</italic>) and <italic>proceeds</italic> (meaning <italic>profits</italic>)]. This constraint may also have affected the classification of proper nouns, since certain proper nouns share their spelling with other words (<italic>Gates, Walkers, Bush</italic>, etc.). The result was a dilemma: manually adding these words to the list of proper nouns would cause severe conflicts in processing, while leaving them alone would result in their being ranked as high-frequency words.</p>
<p>Second, lexical profiling programs such as RANGE cannot count multiword items as single items. In my opinion, this is one of the most serious flaws affecting most lexical profiling research that uses the same methodology. As the classification performed by RANGE and other programs such as AntWordProfiler is guided by word lists that contain primarily single words, these programs treat phrasal verbs and idioms such as <italic>out of the blue, out of the box, sleep on it, come across, sit up&#x2026;</italic> as separate words and rank each component word according to its frequency level. Although most of the component words of phrasal verbs and idioms are high-frequency words, knowing every single word in such phrases does not guarantee comprehension of the phrase as a whole (<xref ref-type="bibr" rid="B8">Cornell, 1985</xref>; <xref ref-type="bibr" rid="B16">Gardner and Davies, 2007</xref>; <xref ref-type="bibr" rid="B17">Garnier and Schmitt, 2015</xref>).</p>
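<p>Both limitations follow from how a single-word list profiler assigns levels: each token is looked up on its own, by spelling. The toy Python sketch below mimics that behavior. It is not the actual code of RANGE or AntWordProfiler, this version simply lowercases every token, and its mini word list and level numbers are invented for illustration rather than taken from the BNC/COCA lists.</p>

```python
import re
from typing import Dict, List, Tuple, Union

# Hypothetical mini word list: word form -> 1,000-word-family level.
# The levels are invented for illustration, NOT actual BNC/COCA values.
TOY_LEVELS: Dict[str, int] = {
    "bush": 3, "came": 1, "across": 1, "the": 1,
    "idea": 1, "out": 1, "of": 1, "blue": 1,
}


def assign_levels(text: str) -> List[Tuple[str, Union[int, str]]]:
    """Look up each lowercased token on its own, one word at a time.

    Capitalization is discarded, so the surname "Bush" and the noun "bush"
    get the same level, and "came across" or "out of the blue" are split
    into separately ranked words.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return [(tok, TOY_LEVELS.get(tok, "not in lists")) for tok in tokens]


for token, level in assign_levels("Bush came across the idea out of the blue."):
    print(token, level)
```

<p>In this toy run, the proper noun is ranked like an ordinary word and the phrasal verb and idiom are reported as a string of first-level words, even though knowing those words does not guarantee that a reader understands <italic>came across</italic> or <italic>out of the blue</italic>.</p>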
<p>The third point that deserves attention concerns how transparent and hyphenated compounds were treated. The present study adopted two assumptions that have been widely applied in vocabulary profiling research (<xref ref-type="bibr" rid="B12">Dang and Webb, 2014</xref>; <xref ref-type="bibr" rid="B48">Nurmukhamedov, 2017</xref>; <xref ref-type="bibr" rid="B59">Tegge, 2017</xref>; <xref ref-type="bibr" rid="B70">Yang and Coxhead, 2020</xref>; <xref ref-type="bibr" rid="B49">Nurmukhamedov and Sharakhimov, 2021</xref>; <xref ref-type="bibr" rid="B53">Rahmat and Coxhead, 2021</xref>): that transparent compounds and hyphenated compounds can easily be understood from the meanings of their component words. These assumptions can be problematic. For example, it is far from certain that transparent compounds such as <italic>aftershock</italic>, <italic>afterglow</italic>, or <italic>absentminded</italic> can be understood from knowledge of <italic>after, shock, glow, absent</italic>, and <italic>mind</italic> alone. Similarly, assuming that a learner could understand <italic>sale-driven</italic> based on his or her knowledge of <italic>driven</italic> in the sentence &#x201C;<italic>The car is driven by Jack.</italic>&#x201D; is also questionable.</p>
<p>Besides the limitations concerning research methodology, there is another area that the present study could not address. Although the study showed strong variation in lexical demands between countries, it provided only statistical arguments and did not address the issue from a sociolinguistic perspective. Future studies are therefore encouraged to explore the link between countries&#x2019; cultural and socio-economic factors and the lexical profiles of their publications.</p>
</sec>
<sec id="S8" sec-type="conclusion">
<title>Conclusion</title>
<p>The present study offers insights into the vocabulary load of one of the most popular sources of written English that people read every day. Its findings indicate that knowledge of the most frequent 3,000&#x2013;4,000 word families plus proper nouns, marginal words, transparent compounds, and acronyms could provide 95% coverage of articles in online newspapers and magazines, a level of coverage that may allow adequate comprehension and incidental vocabulary learning.</p>
<p>The study confirms <xref ref-type="bibr" rid="B41">Nation&#x2019;s (2006)</xref> findings but emphasizes that the coverage of newspaper and magazine articles varies greatly between countries, and that articles from English-speaking countries are less lexically demanding than those from ESL and EFL contexts. The results also suggest that web-based newspapers and magazines could be good resources for language teaching and learning. However, teachers and learners are advised to be selective when choosing what to read, as articles from certain countries were shown to be relatively more difficult to understand than others.</p>
</sec>
<sec id="S9" sec-type="data-availability">
<title>Data Availability Statement</title>
<p>The data analyzed in this study is subject to the following licenses/restrictions: the corpora that support the findings of this study are available from Mark Davies. Restrictions apply to the availability of these corpora, which were used under academic license for this study. Data are available from <ext-link ext-link-type="uri" xlink:href="https://www.english-corpora.org/">https://www.english-corpora.org/</ext-link> with the permission of Mark Davies. Requests to access these datasets should be directed to <email>mark.davies@corpusdata.org</email>.</p>
</sec>
<sec id="S10">
<title>Author Contributions</title>
<p>The author confirms sole responsibility for study conception and design, data collection, analysis and interpretation of results, and manuscript preparation.</p>
</sec>
<sec id="conf1" sec-type="COI-statement">
<title>Conflict of Interest</title>
<p>The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec id="pudiscl1" sec-type="disclaimer">
<title>Publisher&#x2019;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
</body>
<back>
<sec id="S11" sec-type="supplementary-material">
<title>Supplementary Material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/fpsyg.2022.800983/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/fpsyg.2022.800983/full#supplementary-material</ext-link></p>
<supplementary-material xlink:href="Table_1.docx" id="TS1" mimetype="application/vnd.openxmlformats-officedocument.wordprocessingml.document" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="B1"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Al-Surmi</surname> <given-names>M.</given-names></name></person-group> (<year>2014</year>). &#x201C;<article-title>TV shows, word coverage, and incidental vocabulary learning</article-title>,&#x201D; in <source><italic>Teaching and Learning English in the Arabic-Speaking World</italic></source>, <role>eds</role> <person-group person-group-type="editor"><name><surname>Bailey</surname> <given-names>K.</given-names></name> <name><surname>Damerow</surname> <given-names>R.</given-names></name></person-group> (<publisher-loc>London</publisher-loc>: <publisher-name>Routledge</publisher-name>), <fpage>132</fpage>&#x2013;<lpage>147</lpage>.</citation></ref>
<ref id="B2"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Barcroft</surname> <given-names>J.</given-names></name></person-group> (<year>2007</year>). <article-title>When knowing grammar depends on knowing vocabulary: native speaker grammaticality judgements of sentences with real and unreal words.</article-title> <source><italic>Can. Modern Lang. Rev.</italic></source> <volume>63</volume> <fpage>313</fpage>&#x2013;<lpage>343</lpage>. <pub-id pub-id-type="doi">10.3138/R601-H212-5582-0737</pub-id></citation></ref>
<ref id="B3"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bauer</surname> <given-names>L.</given-names></name> <name><surname>Nation</surname> <given-names>P.</given-names></name></person-group> (<year>1993</year>). <article-title>Word families.</article-title> <source><italic>Int. J. Lexicogr.</italic></source> <volume>6</volume> <fpage>253</fpage>&#x2013;<lpage>279</lpage>. <pub-id pub-id-type="doi">10.1093/ijl/6.4.253</pub-id></citation></ref>
<ref id="B4"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cabrera</surname> <given-names>I.</given-names></name></person-group> (<year>2020</year>). <source><italic>World Reading Habits in 2020 [Infographic]. GlobalEnglishEditing.</italic></source> Available online at: <ext-link ext-link-type="uri" xlink:href="https://geediting.com/world-reading-habits-2020/">https://geediting.com/world-reading-habits-2020/</ext-link> <comment>(accessed November 6, 2020)</comment>.</citation></ref>
<ref id="B5"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Chambers</surname> <given-names>J. K.</given-names></name></person-group> (<year>2000</year>). <article-title>Region and language variation.</article-title> <source><italic>Engl. World Wide</italic></source> <volume>21</volume> <fpage>169</fpage>&#x2013;<lpage>199</lpage>. <pub-id pub-id-type="doi">10.1075/eww.21.2.02cha</pub-id> <pub-id pub-id-type="pmid">33486653</pub-id></citation></ref>
<ref id="B6"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cheng</surname> <given-names>J.</given-names></name> <name><surname>Matthews</surname> <given-names>J.</given-names></name></person-group> (<year>2018</year>). <article-title>The relationship between three measures of L2 vocabulary knowledge and L2 listening and reading.</article-title> <source><italic>Lang. Test.</italic></source> <volume>35</volume> <fpage>3</fpage>&#x2013;<lpage>25</lpage>. <pub-id pub-id-type="doi">10.1177/0265532216676851</pub-id></citation></ref>
<ref id="B7"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Collins</surname> <given-names>J.</given-names></name></person-group> (<year>2017</year>). <article-title>Applying the lexical coverage hypothesis to establish the suitability of EFL reading materials: a case study of the TOEFL (ITP).</article-title> <source><italic>APU J. Lang. Res.</italic></source> <volume>3</volume> <fpage>29</fpage>&#x2013;<lpage>39</lpage>.</citation></ref>
<ref id="B8"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cornell</surname> <given-names>A.</given-names></name></person-group> (<year>1985</year>). <article-title>Realistic goals in teaching and learning phrasal verbs.</article-title> <source><italic>Int. Rev. Appl. Linguist. Lang. Teach.</italic></source> <volume>23</volume> <fpage>269</fpage>&#x2013;<lpage>280</lpage>.</citation></ref>
<ref id="B9"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Coxhead</surname> <given-names>A.</given-names></name> <name><surname>Walls</surname> <given-names>R.</given-names></name></person-group> (<year>2012</year>). <article-title>TED Talks, vocabulary, and listening for EAP.</article-title> <source><italic>TESOL ANZ J.</italic></source> <volume>20</volume> <fpage>55</fpage>&#x2013;<lpage>65</lpage>.</citation></ref>
<ref id="B10"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dang</surname> <given-names>T. N. Y.</given-names></name> <name><surname>Webb</surname> <given-names>S.</given-names></name></person-group> (<year>2016</year>). <article-title>Evaluating lists of high-frequency words.</article-title> <source><italic>ITL Int. J. Appl. Linguist.</italic></source> <volume>167</volume> <fpage>132</fpage>&#x2013;<lpage>158</lpage>.</citation></ref>
<ref id="B11"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dang</surname> <given-names>T. N. Y.</given-names></name> <name><surname>Webb</surname> <given-names>S.</given-names></name> <name><surname>Coxhead</surname> <given-names>A.</given-names></name></person-group> (<year>2020</year>). <article-title>Evaluating lists of high-frequency words: teachers&#x2019; and learners&#x2019; perspectives.</article-title> <source><italic>Lang. Teach. Res.</italic></source> <fpage>1</fpage>&#x2013;<lpage>25</lpage>. <pub-id pub-id-type="doi">10.1177/1362168820911189</pub-id></citation></ref>
<ref id="B12"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dang</surname> <given-names>T.</given-names></name> <name><surname>Webb</surname> <given-names>S.</given-names></name></person-group> (<year>2014</year>). <article-title>The lexical profile of academic spoken English.</article-title> <source><italic>Engl. Specif. Purp.</italic></source> <volume>33</volume> <fpage>66</fpage>&#x2013;<lpage>76</lpage>. <pub-id pub-id-type="doi">10.1016/j.esp.2013.08.001</pub-id></citation></ref>
<ref id="B13"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Davies</surname> <given-names>M.</given-names></name></person-group> (<year>2021</year>). <article-title>The TV and Movies corpora: design, construction, and use.</article-title> <source><italic>Int. J. Corpus Linguist.</italic></source> <volume>26</volume> <fpage>10</fpage>&#x2013;<lpage>37</lpage>. <pub-id pub-id-type="doi">10.1075/ijcl.00035.dav</pub-id></citation></ref>
<ref id="B14"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Davies</surname> <given-names>M.</given-names></name></person-group> (<year>2016/2021</year>). <source><italic>Corpus of News on the Web (NOW).</italic></source> Available online at: <ext-link ext-link-type="uri" xlink:href="https://www.english-corpora.org/now/">https://www.english-corpora.org/now/</ext-link> (accessed January 26, 2022).</citation></ref>
<ref id="B15"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Davies</surname> <given-names>M.</given-names></name> <name><surname>Fuchs</surname> <given-names>R.</given-names></name></person-group> (<year>2015</year>). <article-title>Expanding horizons in the study of World Englishes with the 1.9 billion word Global Web-based English Corpus (GloWbE).</article-title> <source><italic>Engl. World Wide</italic></source> <volume>36</volume> <fpage>1</fpage>&#x2013;<lpage>28</lpage>. <pub-id pub-id-type="doi">10.1075/eww.36.1.01dav</pub-id></citation></ref>
<ref id="B16"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gardner</surname> <given-names>D.</given-names></name> <name><surname>Davies</surname> <given-names>M.</given-names></name></person-group> (<year>2007</year>). <article-title>Pointing out frequent phrasal verbs: a corpus-based analysis.</article-title> <source><italic>TESOL Q.</italic></source> <volume>41</volume> <fpage>339</fpage>&#x2013;<lpage>359</lpage>. <pub-id pub-id-type="doi">10.1002/j.1545-7249.2007.tb00062.x</pub-id></citation></ref>
<ref id="B17"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Garnier</surname> <given-names>M.</given-names></name> <name><surname>Schmitt</surname> <given-names>N.</given-names></name></person-group> (<year>2015</year>). <article-title>The PHaVE list: a pedagogical list of phrasal verbs and their most frequent meaning senses.</article-title> <source><italic>Lang. Teach. Res.</italic></source> <volume>19</volume> <fpage>645</fpage>&#x2013;<lpage>666</lpage>. <pub-id pub-id-type="doi">10.1177/1362168814559798</pub-id></citation></ref>
<ref id="B18"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Guo</surname> <given-names>Y.</given-names></name> <name><surname>Roehrig</surname> <given-names>A. D.</given-names></name></person-group> (<year>2011</year>). <article-title>Roles of general versus second language (L2) knowledge in L2 reading comprehension.</article-title> <source><italic>Read. Foreign Lang.</italic></source> <volume>23</volume> <fpage>42</fpage>&#x2013;<lpage>64</lpage>.</citation></ref>
<ref id="B19"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ha</surname> <given-names>H. T.</given-names></name></person-group> (<year>2021b</year>). <article-title>Exploring the relationships between various dimensions of receptive vocabulary knowledge and L2 listening and reading comprehension.</article-title> <source><italic>Lang. Test. Asia</italic></source> <volume>11</volume>:<fpage>20</fpage>. <pub-id pub-id-type="doi">10.1186/s40468-021-00131-8</pub-id></citation></ref>
<ref id="B20"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ha</surname> <given-names>H. T.</given-names></name></person-group> (<year>2021a</year>). <article-title>A Rasch-based validation of the Vietnamese version of the listening vocabulary levels test.</article-title> <source><italic>Lang. Test. Asia</italic></source> <volume>11</volume>:<fpage>16</fpage>. <pub-id pub-id-type="doi">10.1186/s40468-021-00132-7</pub-id></citation></ref>
<ref id="B21"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hashimoto</surname> <given-names>B. J.</given-names></name></person-group> (<year>2021</year>). <article-title>Is frequency enough?: The frequency model in vocabulary size testing.</article-title> <source><italic>Lang. Assess. Q.</italic></source> <volume>18</volume> <fpage>171</fpage>&#x2013;<lpage>187</lpage>. <pub-id pub-id-type="doi">10.1080/15434303.2020.1860058</pub-id></citation></ref>
<ref id="B22"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Heatley</surname> <given-names>A.</given-names></name> <name><surname>Nation</surname> <given-names>I. S. P.</given-names></name> <name><surname>Coxhead</surname> <given-names>A.</given-names></name></person-group> (<year>2002</year>). <source><italic>Range: A Program for the Analysis of Vocabulary in Texts.</italic></source> Available online at: <ext-link ext-link-type="uri" xlink:href="http://www.victoria.ac.nz/lals/about/staff/paul-nation">http://www.victoria.ac.nz/lals/about/staff/paul-nation</ext-link> (accessed January 26, 2022).</citation></ref>
<ref id="B23"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hsu</surname> <given-names>W.</given-names></name></person-group> (<year>2011</year>). <article-title>The vocabulary thresholds of business textbooks and business research articles for EFL learners.</article-title> <source><italic>Engl. Specif. Purp.</italic></source> <volume>30</volume> <fpage>247</fpage>&#x2013;<lpage>257</lpage>. <pub-id pub-id-type="doi">10.1016/j.esp.2011.04.005</pub-id></citation></ref>
<ref id="B24"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hsu</surname> <given-names>W.</given-names></name></person-group> (<year>2018</year>). <article-title>The most frequent BNC/COCA mid- and low-frequency word families in English-medium traditional Chinese medicine (TCM) textbooks.</article-title> <source><italic>Engl. Specif. Purp.</italic></source> <volume>51</volume> <fpage>98</fpage>&#x2013;<lpage>110</lpage>. <pub-id pub-id-type="doi">10.1016/j.esp.2018.04.001</pub-id></citation></ref>
<ref id="B25"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Hu</surname> <given-names>M.</given-names></name> <name><surname>Nation</surname> <given-names>I. S. P.</given-names></name></person-group> (<year>2000</year>). <article-title>Unknown vocabulary density and reading comprehension.</article-title> <source><italic>Read. Foreign Lang.</italic></source> <volume>13</volume> <fpage>403</fpage>&#x2013;<lpage>430</lpage>.</citation></ref>
<ref id="B26"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Juang</surname> <given-names>L. P.</given-names></name> <name><surname>Simpson</surname> <given-names>J. A.</given-names></name> <name><surname>Lee</surname> <given-names>R. M.</given-names></name> <name><surname>Rothman</surname> <given-names>A. J.</given-names></name> <name><surname>Titzmann</surname> <given-names>P. F.</given-names></name> <name><surname>Schachner</surname> <given-names>M. K.</given-names></name><etal/></person-group> (<year>2018</year>). <article-title>Using attachment and relational perspectives to understand adaptation and resilience among immigrant and refugee youth.</article-title> <source><italic>Am. Psychol.</italic></source> <volume>73</volume> <fpage>797</fpage>&#x2013;<lpage>811</lpage>. <pub-id pub-id-type="doi">10.1037/amp0000286</pub-id> <pub-id pub-id-type="pmid">30188167</pub-id></citation></ref>
<ref id="B27"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kaneko</surname> <given-names>M.</given-names></name></person-group> (<year>2020</year>). <article-title>Lexical frequency profiling of high-stakes English tests: text coverage of Cambridge First, EIKEN, GTEC, IELTS, TEAP, TOEFL, and TOEIC.</article-title> <source><italic>JACET J.</italic></source> <volume>64</volume> <fpage>79</fpage>&#x2013;<lpage>93</lpage>.</citation></ref>
<ref id="B28"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lange</surname> <given-names>K.</given-names></name> <name><surname>Matthews</surname> <given-names>J.</given-names></name></person-group> (<year>2020</year>). <article-title>Exploring the relationships between L2 vocabulary knowledge, lexical segmentation, and L2 listening comprehension.</article-title> <source><italic>Stud. Second Lang. Learn. Teach.</italic></source> <volume>10</volume> <fpage>723</fpage>&#x2013;<lpage>749</lpage>. <pub-id pub-id-type="doi">10.14746/ssllt.2020.10.4.4</pub-id></citation></ref>
<ref id="B29"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Laufer</surname> <given-names>B.</given-names></name></person-group> (<year>1989</year>). &#x201C;<article-title>What percentage of text-lexis is essential for comprehension?</article-title>,&#x201D; in <source><italic>Special Language: From Humans Thinking to Thinking Machines</italic></source>, <role>eds</role> <person-group person-group-type="editor"><name><surname>Lauren</surname> <given-names>C.</given-names></name> <name><surname>Nordman</surname> <given-names>M.</given-names></name></person-group> (<publisher-loc>Clevedon</publisher-loc>: <publisher-name>Multilingual Matters</publisher-name>), <fpage>316</fpage>&#x2013;<lpage>323</lpage>.</citation></ref>
<ref id="B30"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Laufer</surname> <given-names>B.</given-names></name></person-group> (<year>2013</year>). <article-title>Lexical thresholds for reading comprehension: what they are and how they can be used for teaching purposes.</article-title> <source><italic>TESOL Q.</italic></source> <volume>47</volume> <fpage>867</fpage>&#x2013;<lpage>872</lpage>. <pub-id pub-id-type="doi">10.1002/tesq.140</pub-id></citation></ref>
<ref id="B31"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Laufer</surname> <given-names>B.</given-names></name> <name><surname>Cobb</surname> <given-names>T.</given-names></name></person-group> (<year>2020</year>). <article-title>How much knowledge of derived words is needed for reading?</article-title> <source><italic>Appl. Linguist.</italic></source> <volume>41</volume> <fpage>971</fpage>&#x2013;<lpage>998</lpage>. <pub-id pub-id-type="doi">10.1093/applin/amz051</pub-id></citation></ref>
<ref id="B32"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Laufer</surname> <given-names>B.</given-names></name> <name><surname>Ravenhorst-Kalovski</surname> <given-names>G. C.</given-names></name></person-group> (<year>2010</year>). <article-title>Lexical threshold revisited: lexical text coverage, learners&#x2019; vocabulary size and reading comprehension.</article-title> <source><italic>Read. Foreign Lang.</italic></source> <volume>22</volume> <fpage>15</fpage>&#x2013;<lpage>30</lpage>.</citation></ref>
<ref id="B33"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Laufer</surname> <given-names>B.</given-names></name> <name><surname>Webb</surname> <given-names>S.</given-names></name> <name><surname>Kim</surname> <given-names>S. K.</given-names></name> <name><surname>Yohanan</surname> <given-names>B.</given-names></name></person-group> (<year>2021</year>). <article-title>How well do learners know derived words in a second language? The effect of proficiency, word frequency and type of affix.</article-title> <source><italic>ITL Int. J. Appl. Linguist.</italic></source> <volume>172</volume> <fpage>229</fpage>&#x2013;<lpage>258</lpage>. <pub-id pub-id-type="doi">10.1075/itl.20020.lau</pub-id></citation></ref>
<ref id="B34"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Laufer</surname> <given-names>B.</given-names></name></person-group> (<year>2021</year>). <article-title>Lexical thresholds and alleged threats to validity: a storm in a teacup?</article-title> <source><italic>Read. Foreign Lang.</italic></source> <volume>33</volume> <fpage>238</fpage>&#x2013;<lpage>246</lpage>.</citation></ref>
<ref id="B35"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lewis</surname> <given-names>M.</given-names></name></person-group> (<year>2002</year>). <source><italic>Implementing the Lexical Approach: Putting Theory Into Practice.</italic></source> <publisher-loc>Boston, MA</publisher-loc>: <publisher-name>Thomson Heinle</publisher-name>.</citation></ref>
<ref id="B36"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lu</surname> <given-names>C.</given-names></name> <name><surname>Coxhead</surname> <given-names>A.</given-names></name></person-group> (<year>2020</year>). <article-title>Vocabulary in traditional Chinese medicine: insights from corpora.</article-title> <source><italic>ITL Int. J. Appl. Linguist.</italic></source> <volume>171</volume> <fpage>34</fpage>&#x2013;<lpage>61</lpage>. <pub-id pub-id-type="doi">10.1075/itl.18020.lu</pub-id></citation></ref>
<ref id="B37"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>McLean</surname> <given-names>S.</given-names></name> <name><surname>Kramer</surname> <given-names>B.</given-names></name></person-group> (<year>2015</year>). <article-title>The creation of a new vocabulary levels test.</article-title> <source><italic>Shiken</italic></source> <volume>19</volume> <fpage>1</fpage>&#x2013;<lpage>11</lpage>.</citation></ref>
<ref id="B38"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>McLean</surname> <given-names>S.</given-names></name> <name><surname>Kramer</surname> <given-names>B.</given-names></name> <name><surname>Beglar</surname> <given-names>D.</given-names></name></person-group> (<year>2015</year>). <article-title>The creation and validation of a listening vocabulary levels test.</article-title> <source><italic>Lang. Teach. Res.</italic></source> <volume>19</volume> <fpage>741</fpage>&#x2013;<lpage>760</lpage>. <pub-id pub-id-type="doi">10.1177/1362168814567889</pub-id></citation></ref>
<ref id="B39"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Moore</surname> <given-names>T.</given-names></name> <name><surname>Morton</surname> <given-names>J.</given-names></name> <name><surname>Price</surname> <given-names>S.</given-names></name></person-group> (<year>2011</year>). <article-title>Construct validity in the IELTS academic reading test: a comparison of reading requirements in IELTS test items and in university study.</article-title> <source><italic>IELTS Coll. Pap.</italic></source> <volume>2</volume> <fpage>120</fpage>&#x2013;<lpage>211</lpage>.</citation></ref>
<ref id="B40"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Moore</surname> <given-names>T.</given-names></name> <name><surname>Morton</surname> <given-names>J.</given-names></name> <name><surname>Hall</surname> <given-names>D.</given-names></name> <name><surname>Wallis</surname> <given-names>C.</given-names></name></person-group> (<year>2015</year>). <article-title>Literacy practices in the professional workplace: implications for the IELTS reading and writing tests.</article-title> <source><italic>IELTS Res. Rep. Online Ser.</italic></source> <volume>1</volume> <fpage>1</fpage>&#x2013;<lpage>46</lpage>.</citation></ref>
<ref id="B41"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nation</surname> <given-names>I. S. P.</given-names></name></person-group> (<year>2006</year>). <article-title>How large a vocabulary is needed for reading and listening?</article-title> <source><italic>Can. Modern Lang. Rev.</italic></source> <volume>63</volume> <fpage>59</fpage>&#x2013;<lpage>82</lpage>. <pub-id pub-id-type="doi">10.3138/cmlr.63.1.59</pub-id></citation></ref>
<ref id="B42"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nation</surname> <given-names>I. S. P.</given-names></name></person-group> (<year>2007</year>). <article-title>The four strands.</article-title> <source><italic>Int. J. Innov. Lang. Learn. Teach.</italic></source> <volume>1</volume> <fpage>2</fpage>&#x2013;<lpage>13</lpage>. <pub-id pub-id-type="doi">10.2167/illt039.0</pub-id></citation></ref>
<ref id="B43"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nation</surname> <given-names>I. S. P.</given-names></name></person-group> (<year>2013</year>). <source><italic>Learning Vocabulary in Another Language</italic></source>, <edition>2nd Edn</edition>. <publisher-loc>Cambridge</publisher-loc>: <publisher-name>Cambridge University Press</publisher-name>.</citation></ref>
<ref id="B44"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nation</surname> <given-names>I. S. P.</given-names></name></person-group> (<year>2017</year>). <source><italic>The BNC/COCA Level 6 word family lists (Version 1.0.0) [Data file].</italic></source> Available online at: <ext-link ext-link-type="uri" xlink:href="http://www.victoria.ac.nz/lals/staff/paul-nation.aspx">http://www.victoria.ac.nz/lals/staff/paul-nation.aspx</ext-link> (accessed January 26, 2022).</citation></ref>
<ref id="B45"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nation</surname> <given-names>I. S. P.</given-names></name></person-group> (<year>2020</year>). <source><italic>About the BNC/COCA Headword Lists.</italic></source> Available online at: <ext-link ext-link-type="uri" xlink:href="https://www.wgtn.ac.nz/lals/resources/paul-nations-resources/vocabulary-lists">https://www.wgtn.ac.nz/lals/resources/paul-nations-resources/vocabulary-lists</ext-link> (accessed January 26, 2022).</citation></ref>
<ref id="B46"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nation</surname> <given-names>I. S. P.</given-names></name> <name><surname>Macalister</surname> <given-names>J.</given-names></name></person-group> (<year>2021</year>). <source><italic>Teaching ESL/EFL Reading and Writing</italic></source>, <edition>2nd Edn</edition>. <publisher-loc>New York, NY</publisher-loc>: <publisher-name>Routledge</publisher-name>.</citation></ref>
<ref id="B47"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nation</surname> <given-names>I. S. P.</given-names></name> <name><surname>Webb</surname> <given-names>S.</given-names></name></person-group> (<year>2011</year>). <source><italic>Researching and Analyzing Vocabulary.</italic></source> <publisher-loc>Boston, MA</publisher-loc>: <publisher-name>Heinle Cengage Learning</publisher-name>.</citation></ref>
<ref id="B48"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nurmukhamedov</surname> <given-names>U.</given-names></name></person-group> (<year>2017</year>). <article-title>Lexical coverage of TED talks: implications for vocabulary instruction.</article-title> <source><italic>TESOL J.</italic></source> <volume>8</volume> <fpage>768</fpage>&#x2013;<lpage>790</lpage>. <pub-id pub-id-type="doi">10.1002/tesj.323</pub-id></citation></ref>
<ref id="B49"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Nurmukhamedov</surname> <given-names>U.</given-names></name> <name><surname>Sharakhimov</surname> <given-names>S.</given-names></name></person-group> (<year>2021</year>). <article-title>Corpus-based vocabulary analysis of English podcasts.</article-title> <source><italic>RELC J.</italic></source> <pub-id pub-id-type="doi">10.1177/0033688220979315</pub-id></citation></ref>
<ref id="B50"><citation citation-type="journal"><collab>Pew Research Center</collab> (<year>2021a</year>). <source><italic>Newspapers Fact Sheet.</italic></source> <publisher-loc>Washington, DC</publisher-loc>: <publisher-name>Pew Research Center</publisher-name>.</citation></ref>
<ref id="B51"><citation citation-type="journal"><collab>Pew Research Center</collab> (<year>2021b</year>). <source><italic>Digital News Fact Sheet.</italic></source> <publisher-loc>Washington, DC</publisher-loc>: <publisher-name>Pew Research Center</publisher-name>.</citation></ref>
<ref id="B52"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Qian</surname> <given-names>D. D.</given-names></name> <name><surname>Lin</surname> <given-names>L. H. F.</given-names></name></person-group> (<year>2020</year>). &#x201C;<article-title>The relationship between vocabulary knowledge and language proficiency</article-title>,&#x201D; in <source><italic>The Routledge Handbook of Vocabulary Studies</italic></source>, <role>ed.</role> <person-group person-group-type="editor"><name><surname>Webb</surname> <given-names>S.</given-names></name></person-group> (<publisher-loc>London</publisher-loc>: <publisher-name>Routledge</publisher-name>), <fpage>66</fpage>&#x2013;<lpage>80</lpage>. <pub-id pub-id-type="doi">10.4324/9780429291586-5</pub-id></citation></ref>
<ref id="B53"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rahmat</surname> <given-names>Y. N.</given-names></name> <name><surname>Coxhead</surname> <given-names>A.</given-names></name></person-group> (<year>2021</year>). <article-title>Investigating vocabulary coverage and load in an Indonesian EFL textbook series.</article-title> <source><italic>Indones. J. Appl. Linguist.</italic></source> <volume>10</volume> <fpage>804</fpage>&#x2013;<lpage>814</lpage>. <pub-id pub-id-type="doi">10.17509/IJAL.V10I3.31768</pub-id></citation></ref>
<ref id="B54"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schmitt</surname> <given-names>N.</given-names></name> <name><surname>Cobb</surname> <given-names>T.</given-names></name> <name><surname>Horst</surname> <given-names>M.</given-names></name> <name><surname>Schmitt</surname> <given-names>D.</given-names></name></person-group> (<year>2017</year>). <article-title>How much vocabulary is needed to use English? Replication of van Zeeland &#x0026; Schmitt (2012), Nation (2006) and Cobb (2007).</article-title> <source><italic>Lang. Teach.</italic></source> <volume>50</volume> <fpage>212</fpage>&#x2013;<lpage>226</lpage>. <pub-id pub-id-type="doi">10.1017/s0261444815000075</pub-id></citation></ref>
<ref id="B55"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schmitt</surname> <given-names>N.</given-names></name> <name><surname>Jiang</surname> <given-names>X.</given-names></name> <name><surname>Grabe</surname> <given-names>W.</given-names></name></person-group> (<year>2011</year>). <article-title>The percentage of words known in a text and reading comprehension.</article-title> <source><italic>Modern Lang. J.</italic></source> <volume>95</volume> <fpage>26</fpage>&#x2013;<lpage>43</lpage>. <pub-id pub-id-type="doi">10.1111/j.1540-4781.2011.01146.x</pub-id></citation></ref>
<ref id="B56"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shearer</surname> <given-names>E.</given-names></name></person-group> (<year>2021</year>). <source><italic>More Than Eight-in-ten Americans Get News from Digital Devices.</italic></source> <publisher-loc>Washington, DC</publisher-loc>: <publisher-name>Pew Research Center</publisher-name>.</citation></ref>
<ref id="B57"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stewart</surname> <given-names>J.</given-names></name> <name><surname>Vitta</surname> <given-names>J. P.</given-names></name> <name><surname>Nicklin</surname> <given-names>C.</given-names></name> <name><surname>McLean</surname> <given-names>S.</given-names></name> <name><surname>Pinchbeck</surname> <given-names>G. G.</given-names></name><etal/></person-group> (<year>2021</year>). <article-title>The relationship between word difficulty and frequency: a response to Hashimoto (2021).</article-title> <source><italic>Lang. Assess. Q.</italic></source> <pub-id pub-id-type="doi">10.1080/15434303.2021.1992629</pub-id></citation></ref>
<ref id="B58"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Stoeckel</surname> <given-names>T.</given-names></name> <name><surname>McLean</surname> <given-names>S.</given-names></name> <name><surname>Nation</surname> <given-names>P.</given-names></name></person-group> (<year>2020</year>). <article-title>Limitations of size and levels tests of written receptive vocabulary knowledge.</article-title> <source><italic>Stud. Second Lang. Acquis.</italic></source> <volume>43</volume> <fpage>181</fpage>&#x2013;<lpage>203</lpage>. <pub-id pub-id-type="doi">10.1017/S027226312000025X</pub-id></citation></ref>
<ref id="B59"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tegge</surname> <given-names>F.</given-names></name></person-group> (<year>2017</year>). <article-title>The lexical coverage of popular songs in English language teaching.</article-title> <source><italic>System</italic></source> <volume>67</volume> <fpage>87</fpage>&#x2013;<lpage>98</lpage>. <pub-id pub-id-type="doi">10.1016/j.system.2017.04.016</pub-id></citation></ref>
<ref id="B60"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>van Zeeland</surname> <given-names>H.</given-names></name> <name><surname>Schmitt</surname> <given-names>N.</given-names></name></person-group> (<year>2013</year>). <article-title>Lexical coverage in L1 and L2 listening comprehension: the same or different from reading comprehension?</article-title> <source><italic>Appl. Linguist.</italic></source> <volume>34</volume> <fpage>457</fpage>&#x2013;<lpage>479</lpage>. <pub-id pub-id-type="doi">10.1093/applin/ams074</pub-id></citation></ref>
<ref id="B61"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wan</surname> <given-names>T. L. A.</given-names></name> <name><surname>Cowie</surname> <given-names>C.</given-names></name></person-group> (<year>2021</year>). <article-title>Conflicts between world Englishes &#x2013; online metalinguistic discourse about Singapore Colloquial English.</article-title> <source><italic>Engl. World Wide</italic></source> <volume>42</volume> <fpage>85</fpage>&#x2013;<lpage>110</lpage>. <pub-id pub-id-type="doi">10.1075/eww.00061.wan</pub-id></citation></ref>
<ref id="B62"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Webb</surname> <given-names>S.</given-names></name></person-group> (<year>2021</year>). <article-title>Research investigating lexical coverage and lexical profiling: what we know, what we don&#x2019;t know, and what needs to be examined.</article-title> <source><italic>Read. Foreign Lang.</italic></source> <volume>33</volume> <fpage>278</fpage>&#x2013;<lpage>293</lpage>.</citation></ref>
<ref id="B63"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Webb</surname> <given-names>S.</given-names></name> <name><surname>Macalister</surname> <given-names>J.</given-names></name></person-group> (<year>2013</year>). <article-title>Is text written for children useful for L2 extensive reading?</article-title> <source><italic>TESOL Q.</italic></source> <volume>47</volume> <fpage>300</fpage>&#x2013;<lpage>322</lpage>. <pub-id pub-id-type="doi">10.1002/tesq.70</pub-id></citation></ref>
<ref id="B64"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Webb</surname> <given-names>S.</given-names></name> <name><surname>Nation</surname> <given-names>I. S. P.</given-names></name></person-group> (<year>2013</year>). &#x201C;<article-title>Computer-assisted vocabulary load analysis</article-title>,&#x201D; in <source><italic>The Encyclopaedia of Applied Linguistics</italic></source>, <role>ed.</role> <person-group person-group-type="editor"><name><surname>Chapelle</surname> <given-names>C.</given-names></name></person-group> (<publisher-loc>London</publisher-loc>: <publisher-name>Wiley-Blackwell</publisher-name>), <fpage>1</fpage>&#x2013;<lpage>10</lpage>.</citation></ref>
<ref id="B65"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Webb</surname> <given-names>S.</given-names></name> <name><surname>Nation</surname> <given-names>I. S. P.</given-names></name></person-group> (<year>2017</year>). <source><italic>How Vocabulary is Learned.</italic></source> <publisher-loc>Oxford</publisher-loc>: <publisher-name>Oxford University Press</publisher-name>.</citation></ref>
<ref id="B66"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Webb</surname> <given-names>S.</given-names></name> <name><surname>Rodgers</surname> <given-names>M. P. H.</given-names></name></person-group> (<year>2009a</year>). <article-title>The lexical coverage of movies.</article-title> <source><italic>Appl. Linguist.</italic></source> <volume>30</volume> <fpage>407</fpage>&#x2013;<lpage>427</lpage>. <pub-id pub-id-type="doi">10.1093/applin/amp010</pub-id></citation></ref>
<ref id="B67"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Webb</surname> <given-names>S.</given-names></name> <name><surname>Rodgers</surname> <given-names>M. P. H.</given-names></name></person-group> (<year>2009b</year>). <article-title>Vocabulary demands of television programs.</article-title> <source><italic>Lang. Learn.</italic></source> <volume>59</volume> <fpage>335</fpage>&#x2013;<lpage>366</lpage>.</citation></ref>
<ref id="B68"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Webb</surname> <given-names>S.</given-names></name> <name><surname>Sasao</surname> <given-names>Y.</given-names></name> <name><surname>Ballance</surname> <given-names>O.</given-names></name></person-group> (<year>2017</year>). <article-title>The updated vocabulary levels test.</article-title> <source><italic>ITL Int. J. Appl. Linguist.</italic></source> <volume>168</volume> <fpage>33</fpage>&#x2013;<lpage>69</lpage>. <pub-id pub-id-type="doi">10.1075/itl.168.1.02web</pub-id></citation></ref>
<ref id="B69"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Wilkins</surname> <given-names>D. A.</given-names></name></person-group> (<year>1972</year>). <source><italic>Linguistics in Language Teaching.</italic></source> <publisher-loc>Cambridge, MA</publisher-loc>: <publisher-name>MIT Press</publisher-name>.</citation></ref>
<ref id="B70"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Yang</surname> <given-names>L.</given-names></name> <name><surname>Coxhead</surname> <given-names>A.</given-names></name></person-group> (<year>2020</year>). <article-title>A corpus-based study of vocabulary in the new concept English textbook series.</article-title> <source><italic>RELC J.</italic></source> <pub-id pub-id-type="doi">10.1177/0033688220964162</pub-id></citation></ref>
<ref id="B71"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>D.</given-names></name></person-group> (<year>2012</year>). <article-title>Vocabulary and grammar knowledge in second language reading comprehension: a structural equation modeling study.</article-title> <source><italic>Modern Lang. J.</italic></source> <volume>96</volume> <fpage>558</fpage>&#x2013;<lpage>575</lpage>. <pub-id pub-id-type="doi">10.1111/j.1540-4781.2012.01398.x</pub-id></citation></ref>
<ref id="B72"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Zhang</surname> <given-names>D.</given-names></name> <name><surname>Koda</surname> <given-names>K.</given-names></name></person-group> (<year>2013</year>). <article-title>Morphological awareness and reading comprehension in a foreign language: a study of young Chinese EFL learners.</article-title> <source><italic>System</italic></source> <volume>41</volume> <fpage>901</fpage>&#x2013;<lpage>913</lpage>. <pub-id pub-id-type="doi">10.1016/j.system.2013.09.009</pub-id></citation></ref>
</ref-list>
<glossary>
<title>Abbreviations</title>
<def-list id="DL1">
<def-item><term>BNC</term><def><p>British National Corpus</p></def></def-item>
<def-item><term>COCA</term><def><p>Corpus of Contemporary American English</p></def></def-item>
<def-item><term>ESL</term><def><p>English as a second language</p></def></def-item>
<def-item><term>EFL</term><def><p>English as a foreign language</p></def></def-item>
<def-item><term>MW</term><def><p>marginal words</p></def></def-item>
<def-item><term>NOW</term><def><p>News on the Web</p></def></def-item>
<def-item><term>PN</term><def><p>proper nouns</p></def></def-item>
<def-item><term>TC</term><def><p>transparent compounds</p></def></def-item>
<def-item><term>US</term><def><p>United States</p></def></def-item>
<def-item><term>UK</term><def><p>United Kingdom</p></def></def-item>
</def-list>
</glossary>
<fn-group>
<fn id="footnote1">
<label>1</label>
<p><ext-link ext-link-type="uri" xlink:href="https://www.english-corpora.org/">https://www.english-corpora.org/</ext-link></p></fn>
</fn-group>
</back>
</article>
