<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xml:lang="EN" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="editorial">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Big Data</journal-id>
<journal-title>Frontiers in Big Data</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Big Data</abbrev-journal-title>
<issn pub-type="epub">2624-909X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fdata.2022.898643</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Big Data</subject>
<subj-group>
<subject>Editorial</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Editorial: Statistical Learning for Predicting Air Quality</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Rybarczyk</surname> <given-names>Yves Philippe</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="corresp" rid="c001"><sup>&#x0002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/42861/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Zalakeviciute</surname> <given-names>Rasa</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/1208949/overview"/>
</contrib>
</contrib-group>
<aff id="aff1"><sup>1</sup><institution>School of Information and Engineering, Dalarna University</institution>, <addr-line>Falun</addr-line>, <country>Sweden</country></aff>
<aff id="aff2"><sup>2</sup><institution>Grupo de Biodiversidad Medio Ambiente y Salud, Universidad de Las Am&#x000E9;ricas</institution>, <addr-line>Quito</addr-line>, <country>Ecuador</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited and Reviewed by: John S. Kimball, University of Montana, United States</p></fn>
<corresp id="c001">&#x0002A;Correspondence: Yves Philippe Rybarczyk <email>rybar63&#x00040;gmail.com</email></corresp>
<fn fn-type="other" id="fn001"><p>This article was submitted to Data-driven Climate Sciences, a section of the journal Frontiers in Big Data</p></fn></author-notes>
<pub-date pub-type="epub">
<day>05</day>
<month>05</month>
<year>2022</year>
</pub-date>
<pub-date pub-type="collection">
<year>2022</year>
</pub-date>
<volume>5</volume>
<elocation-id>898643</elocation-id>
<history>
<date date-type="received">
<day>17</day>
<month>03</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>31</day>
<month>03</month>
<year>2022</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x000A9; 2022 Rybarczyk and Zalakeviciute.</copyright-statement>
<copyright-year>2022</copyright-year>
<copyright-holder>Rybarczyk and Zalakeviciute</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<related-article id="RA1" related-article-type="commentary-article" xlink:href="https://www.frontiersin.org/research-topics/19136/statistical-learning-for-predicting-air-quality" ext-link-type="uri">Editorial on Research Topic <article-title>Statistical Learning for Predicting Air Quality</article-title>
</related-article>
<kwd-group>
<kwd>machine learning</kwd>
<kwd>urban pollution</kwd>
<kwd>deep learning</kwd>
<kwd>chemical transport model (CTM)</kwd>
<kwd>forecast</kwd>
</kwd-group>
<counts>
<fig-count count="0"/>
<table-count count="0"/>
<equation-count count="0"/>
<ref-count count="0"/>
<page-count count="2"/>
<word-count count="1095"/>
</counts>
</article-meta>
</front>
<body>
<p>The concentration of air pollutants is traditionally explained by complex physical and chemical processes of dispersion and advection. For this reason, air quality prediction is usually addressed through deterministic models, such as Chemical Transport Models (CTMs).</p>
<p>However, CTMs have several limitations. Their performance depends on an up-to-date emission inventory of the urban area, which is often compromised in developing countries. They also struggle to forecast air pollution accurately in regions of complex terrain. Moreover, they require high computational power to run time-consuming simulations.</p>
<p>More recently, statistical models based on Machine Learning (ML) algorithms have emerged as a valuable alternative that addresses many of the disadvantages of CTMs. They seem particularly well suited to providing fine resolution at the urban scale, where the estimation of air contamination is of the utmost importance for health. In that sense, ML could become the new paradigm for pollution forecasting.</p>
<p>The main goal of this Research Topic is to understand whether ML can become the new standard for air quality prediction. Among the many ML methods, we aim to identify the algorithms most suitable for atmospheric pollution forecasting. Such an investigation considers all the dimensions of prediction performance, including both the accuracy and the interpretability of the models. For example, non-linear models (e.g., ensemble learning or artificial neural networks) tend to be more accurate but less interpretable than a linear regression.</p>
<p>The first paper highlights the fact that a data-driven method such as ML can consider a very large number of factors affecting air quality, which can drastically improve the prediction. <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3389/fdata.2022.822573">Saheer et al.</ext-link> explain that ML can take into account several heterogeneous factors, such as urban traffic, aerial imagery of terrain and vegetation, and weather conditions, for a more reliable prediction of air quality. The authors propose a cost-effective framework composed of different machine learning methods, from statistical to deep learning algorithms.</p>
<p>The benefit of ML over the CTM approach is demonstrated in the second article. <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3389/fdata.2022.781309">Fan et al.</ext-link> compare the performance of a chemical transport model (AIRPACT) and two ML models in forecasting O<sub>3</sub> in Kennewick (WA, USA). The first ML model (ML1) combines a random forest (RF) classifier with multiple linear regression (MLR) models, and the second model (ML2) uses a two-phase RF regression model. ML1 and ML2 are the best models for predicting high and low O<sub>3</sub> pollution events, respectively. In addition, the ML models require far fewer computational resources than AIRPACT, which suggests that ML is a better solution than CTMs for forecasting O<sub>3</sub>.</p>
<p>The third study shows that ML is suitable not only for predicting O<sub>3</sub> but also for predicting a range of other pollutants. <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3389/fdata.2022.826517">Mendes et al.</ext-link> use Classification and Regression Tree (CART) and multiple regression (MR) to forecast PM<sub>10</sub>, PM<sub>2.5</sub>, NO<sub>2</sub>, and O<sub>3</sub> concentrations in Portugal (Lisbon and Madeira) and Macao. The proposed models are able to predict next-day pollutant concentrations with good accuracy.</p>
<p>Finally, the last manuscript addresses the effect of the COVID-19 lockdown on air quality. <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.3389/fdata.2022.842455">Chau et al.</ext-link> propose a new approach based on Weather Normalized Modeling to obtain a more reliable estimate of pollutant concentrations under a business-as-usual assumption. Several Deep Learning (DL) algorithms and a Gradient Boosted Machine (GBM) are tested to quantify the impact of reduced human mobility on the concentration of the criteria pollutants (CO, NO<sub>2</sub>, PM<sub>2.5</sub>, SO<sub>2</sub>, and O<sub>3</sub>) in Quito, Ecuador. The results show that Long Short-Term Memory (LSTM) and Bidirectional Recurrent Neural Network (BiRNN) outperform the other algorithms. The concentrations of all pollutants decreased significantly, except for O<sub>3</sub>, which increased owing to the titration effect. Besides revealing the better accuracy of DL over the other methods, this work identifies the most important factors for predicting air pollution.</p>
<p>Overall, the studies in this Research Topic tend to demonstrate that statistical or machine learning is a powerful alternative to the traditional CTM approach across the aspects of pollution forecasting considered. ML is a fast and affordable technique that requires less computational power while achieving an accuracy that can exceed that of CTMs. Also, recent progress in ML algorithms allows models that were until now considered black boxes to be opened up. Resolving the model interpretation issue could establish the ML approach as the best method for predicting air quality.</p>
<sec id="s1">
<title>Author Contributions</title>
<p>YR wrote the article. RZ revised and edited the text. All authors contributed to the article and approved the submitted version.</p>
</sec>
<sec sec-type="COI-statement" id="conf1">
<title>Conflict of Interest</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
<sec sec-type="disclaimer" id="s2">
<title>Publisher&#x00027;s Note</title>
<p>All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.</p>
</sec>
</body>
</article>