This article was submitted to Stellar and Solar Physics, a section of the journal Frontiers in Astronomy and Space Sciences
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Coronal mass ejections (CMEs), the most important drivers of space weather, are continuously studied for their geomagnetic impact. We present an update of a logistic regression model that attempts to forecast whether a CME will arrive at the Earth and be associated with a geomagnetic storm, defined by a minimum Dst value smaller than −30 nT. The model is run for a selection of CMEs listed in the LASCO catalogue during solar cycle 24. It is trained on three fourths of these events and validated on the remaining one fourth. Based on five CME properties (the speed at 20 solar radii, the angular width, the acceleration, the measured position angle and the source position, the last being a binary variable), the model successfully predicted 98% of the events from the training set and 98% of the events from the validation set.
Forecasting whether a coronal mass ejection (CME) is geoeffective (i.e., capable of causing a geomagnetic disturbance) has been a subject of increasing interest over the last decade, because of the high impact these eruptive events may have on technological systems in orbit or on Earth. Every model relies on some approximations and, thus, no model can currently predict the impact of a CME with 100% accuracy.
It is known that the CMEs reaching Earth’s magnetosphere can produce large perturbations in the geomagnetic field known as geomagnetic storms. The first indication of a geomagnetic storm is shown by a decrease of the Dst index, with storms being classified as small if −50 nT
The geoeffective CMEs predominantly originate from sources near the central meridian, mostly from the western hemisphere (
The geoeffectiveness of the CME will also depend on its particular evolution, which is related to both internal CME properties (kinematic, geometric and magnetic), and (external) solar wind plasma properties (see e.g., the review by
It was shown that interacting CMEs in the heliosphere amplify the geomagnetic response (
Correlations between CMEs, ICMEs and the sunspot number have been intensively studied (
Comparing the last two solar cycles, the 23rd one has been more geoeffective than the 24th one (
The present study covers solar cycle 24 and explores whether a sufficiently simple model, based only on CME parameters derived close to the Sun, could predict that a CME will reach the Earth and produce a geomagnetic storm.
The model is based on an updated logistic regression method (
The model takes into consideration the full chain of events (CMEs, ICMEs, geomagnetic storms) and outputs the probability that a CME is geoeffective.
The paper is structured as follows:
In order to select our events we looked in the LASCO CME catalogue (
In this period there were approximately 17,000 CMEs detected. We excluded CMEs catalogued as “poor events” and “very poor events,” which amounted to more than 12,500 CMEs, i.e., about 73% of the total CMEs observed by LASCO in the studied period. The classification of events is linked to the quality index (0–5) for the tracking feature (leading edge) of each CME: very poor, poor, fair, typical, good, and excellent. A very poor event (quality index 0) is a CME with an ill-defined leading edge, and a poor event (quality index 1) is a CME whose leading edge is not clear and sharp enough to be accurately tracked in different frames (see e.g.,
We further excluded the CMEs that have an angular width smaller than 60°, leaving us with 2,796 CMEs to study. This second selection criterion is justified since, in order for a CME to arrive at the Earth and produce a geomagnetic storm, it should have a large angular extent (e.g.,
In general, full halo (apparent angular width of 360°) and partial halo (apparent angular width larger than 120°) CMEs in LASCO images are considered as potential candidates to impact the Earth (if their source region is on the Earth-facing solar disk). A normal CME, seen above the limb with an angular width of around 60°, will appear as a halo CME or partial halo CME when oriented along the Sun-Earth line (either towards or away from Earth) or some 40° off that line, respectively (e.g.,
However, it was also demonstrated that narrow CMEs (AW
The association of our events with the interplanetary disturbances was extracted from the ICME Catalogue (
Out of these 49 ICMEs, 16 did not produce any geomagnetic disturbances (i.e., Dst
The Dst
The location of the CME on the solar disk was derived by checking each event individually. We looked for signatures such as dimmings, waves and eruptive prominences. We looked at the combined EUV (SOHO/EIT or SDO/AIA) and white-light (LASCO) movies as given in the catalogue. If nothing was seen in running difference images, we checked EUV normal movies (e.g., sdoa193_c2rdf.html in Java Movie) to better see the dimmings and the waves. For dimmings we also checked the Solar Demon catalogue (
Predictive models are used in almost every scientific field. Given a set of independent variables, such a model computes the probability that the dependent variable will show a certain behavior when the independent variables take the “right” combination of values.
The logistic regression is a class of regression that needs an independent variable or a set of independent variables to predict a dependent one. Therefore, besides the five independent variables (CME speed at 20 solar radii, its angular width, measured position angle, the acceleration and a binary variable for position), the model needs a dependent one. For this we have chosen a binary variable defined by 0 if the Dst
The solar wind sometimes completes accelerating before 20 solar radii (
The model used in this study is a modified version of
The equation used in the model is:
The initial
In this study we use a similar approach to
The software used was selected from the IMSL package of the Interactive Data Language (IDL). IMSL_nonlinregress is a function that fits a nonlinear regression model using least squares. All the details about its programming notes, usage and output can be found at
A.I. flow.
We studied all the properties listed in the LASCO catalogue: linear speed, second order speed at final height, second order speed at 20 solar radii, the central and measured position angle, the angular width, the acceleration, the mass and energy of the CME.
We eliminated variables that were correlated with one another. The full correlation tables of all CME parameters can be found in
Hence, in this study the new set of independent variables consisted of: the speed of the CME at 20 solar radii, its angular width, measured position angle, acceleration and the location of the source region. The binary location variable was set to 0 if the source was on the backside of the Sun, and to 1 if the source was on the frontside, disregarding its exact latitude and longitude. Thus our dataset of 2,796 CMEs consists of 1,647 frontside CMEs and 1,149 backside ones.
The non-binary variables (speed at 20 solar radii, measured position angle and angular width) were normalized to unity in order to minimize possible numerical errors or discrepancies due to the different variable ranges.
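The event selection and the encoding of the binary location variable can be sketched as follows. This is only an illustration in Python (the study itself used IDL); the record layout `CmeRecord` and its field names are hypothetical, and the thresholds simply restate the criteria given in the text (quality index of at least 2, i.e., "fair" or better, and angular width of at least 60°). A CME whose source could not be identified is given the value 0, as the paper does for stealth CMEs.

```python
from dataclasses import dataclass
from typing import Optional

MIN_QUALITY_INDEX = 2       # quality 0 (very poor) and 1 (poor) are excluded
MIN_ANGULAR_WIDTH = 60.0    # degrees; narrower CMEs are excluded


@dataclass
class CmeRecord:
    """Hypothetical record layout, not the LASCO catalogue format."""
    quality_index: int                    # 0 (very poor) ... 5 (excellent)
    angular_width: float                  # apparent angular width in degrees
    source_on_frontside: Optional[bool]   # None if no source was identified


def is_selected(cme: CmeRecord) -> bool:
    """Apply the two selection criteria described in the text."""
    return (cme.quality_index >= MIN_QUALITY_INDEX
            and cme.angular_width >= MIN_ANGULAR_WIDTH)


def location_flag(cme: CmeRecord) -> int:
    """1 for a frontside source, 0 for a backside or unidentified source."""
    return 1 if cme.source_on_frontside else 0


# Made-up example records, for illustration only.
catalogue = [
    CmeRecord(0, 200.0, True),    # very poor event: excluded
    CmeRecord(3, 45.0, True),     # too narrow: excluded
    CmeRecord(4, 360.0, True),    # kept, frontside
    CmeRecord(2, 90.0, False),    # kept, backside
    CmeRecord(5, 120.0, None),    # kept, source not identified
]
selected = [cme for cme in catalogue if is_selected(cme)]
flags = [location_flag(cme) for cme in selected]
```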
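A minimal sketch of two common ways to prepare such variables is given below. It assumes that "normalized to unity" means dividing each variable by its largest absolute value, which the text does not state explicitly; the second function shows the usual standardization by mean and standard deviation. The function names and the sample speeds are hypothetical.

```python
from statistics import mean, pstdev


def normalize_to_unity(values):
    """Scale values so that the largest absolute value becomes 1 (assumed scheme)."""
    peak = max(abs(v) for v in values)
    if peak == 0:
        return [0.0 for _ in values]
    return [v / peak for v in values]


def standardize(values):
    """Remove the mean and divide by the (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Made-up speeds in km/s, for illustration only.
speeds = [400.0, 800.0, 1200.0, 1600.0]
normalized_speeds = normalize_to_unity(speeds)
standardized_speeds = standardize(speeds)
```

Both transformations are linear, so they change the scale of the fitted coefficients but not the information carried by each variable.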
We also used a set of standardized data computed by removing the mean and dividing by the standard deviation (e.g.,
The resulting logistic regression coefficients following the non-linear logistic regression model for normalized values (first row) and for standardized values (second row).
Independent variable/coefficient | V20R | AW | MPA | Acc | Pos | b0 |
---|---|---|---|---|---|---|
Regression coefficient | −1.8616 | 34.4414 | −0.3007 | 21.4110 | 9.1932 | −44.7622 |
Regression coefficient *ST | −0.1706 | 8.3054 | −0.0862 | 0.8432 | 9.2338 | −31.9990 |
The outputs of the non-linear logistic regression model are the six coefficients, b0 … b5 (see
Choosing the normalization method of data preparation suggests that the CME angular width is the most important predictor, while with the standardization method the most important one is the CME source position.
The other predictors have the same importance in both methods. The residual sum of squares for both methods has the same value.
The presented set of independent variables was selected because it had the smallest residual sum of squares value. The residual sum of squares was calculated by IDL and stored into the SSE variable.
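To make the role of the coefficients and of the residual sum of squares concrete, the sketch below evaluates a logistic model and its SSE. It assumes the standard logistic form, P = 1 / (1 + exp(−(b0 + b1·x1 + … + bn·xn))), which may differ in detail from the equation used in the paper, and it is not the IMSL routine. The function names, the coefficient ordering and the numerical values are hypothetical.

```python
import math


def logistic_probability(coefficients, predictors):
    """Return P = 1 / (1 + exp(-z)) with z = b0 + sum(b_i * x_i).

    `coefficients` is [b0, b1, ..., bn]; `predictors` is [x1, ..., xn].
    """
    b0, *slopes = coefficients
    if len(slopes) != len(predictors):
        raise ValueError("one slope coefficient is needed per predictor")
    z = b0 + sum(b * x for b, x in zip(slopes, predictors))
    # Numerically stable evaluation for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def residual_sum_of_squares(observed, predicted):
    """Sum of squared differences between 0/1 outcomes and probabilities."""
    return sum((y - p) ** 2 for y, p in zip(observed, predicted))


# Made-up coefficients and predictors, for illustration only.
example_probability = logistic_probability([-2.0, 1.0, 0.5], [1.0, 2.0])
example_sse = residual_sum_of_squares([1, 0], [0.5, 0.5])
```

Comparing candidate predictor sets then amounts to fitting each set and keeping the one with the smallest SSE, which is the criterion described above.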
Other sets that we have tried were: [
As already mentioned, we divided the events into the two categories needed for running the model, training and validation, three fourths for the training one, and the remaining one fourth for the validation one.
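A three-fourths/one-fourth split of this kind can be sketched as below. The text does not say how events were assigned to each set, so the seeded random shuffle used here is an assumption, and `split_events` is a hypothetical helper.

```python
import random


def split_events(events, train_fraction=0.75, seed=0):
    """Shuffle the events reproducibly and split them into two sets."""
    shuffled = list(events)
    random.Random(seed).shuffle(shuffled)   # assumed: random assignment
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Event identifiers stand in for the 2,796 selected CMEs.
events = list(range(2796))
training, validation = split_events(events)
```

With so few positive events, a purely random split can leave one of the sets with very few storms; a split that keeps the proportion of positive events equal in both sets is a common alternative.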
Thus, the training set contained 2,097 events, with 27 positive events included. By positive event we mean a CME that reached the Earth and that was associated with a geomagnetic storm (i.e., a minimum Dst value
Using the coefficients displayed in
The validation set contained 699 events, six of them positive. For this set the success rate was 0.989 for both the normalized and the standardized data. However, the model did not correctly forecast any of the six validation CMEs that were associated with geomagnetic storms.
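The coexistence of a high success rate with no correctly forecast storms follows from the strong imbalance between positive and negative events, as the sketch below illustrates. The confusion counts are illustrative: they use the 699 events and six positives quoted above, but assume zero false alarms, which the text does not report, so the resulting rate need not match the quoted 0.989 exactly. The function names are hypothetical.

```python
def success_rate(tp, tn, fp, fn):
    """Fraction of all events that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def hit_rate(tp, fn):
    """Fraction of positive events (storm-associated CMEs) that were forecast."""
    positives = tp + fn
    return tp / positives if positives else 0.0


# Illustrative counts: 699 events, 6 positives, none forecast,
# and an assumed absence of false alarms.
tp, fn, fp = 0, 6, 0
tn = 699 - tp - fn - fp
overall = success_rate(tp, tn, fp, fn)
storms_found = hit_rate(tp, fn)
```

A model that labels every event as non-geoeffective would score the same overall rate here, which is why the hit rate on positive events is the more informative figure for such a dataset.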
In order to study the geoeffectiveness of our 2,796 CMEs during SC24, we have attempted a statistical analysis of the CME evolution with the solar cycle.
Every aspect of the solar activity varies during the 11-year solar cycle. Taking the sunspot number as the most significant indicator of the cycle’s activity, this would mean that coronal mass ejections will also vary with the sunspot number, either in correlation or anticorrelation.
Solar cycle 24 began on January 1, 2009 with its ascending phase lasting fifteen months until April 1, 2010. The cycle’s maximum phase started on July 1, 2011 and ended on March 31, 2015. It had two maxima, in October 2013 and February 2014. The descending phase ended on July 31, 2017. The maximum number of detected CMEs coincides with the year of the maximum monthly smoothed sunspot number. In another study, no significant correlation between the phases of solar cycle and yearly occurrence of intense and great storms has been found (
Generally, the yearly number of detected CMEs follows the yearly smoothed sunspot number as seen in
We have observed that 68% of the CMEs were detected during the maximum phase of the solar cycle and that the descending phase had the fewest events, only 10%. Similarly, high-speed CMEs (speed at 20 solar radii exceeding 1,000 km/s) were considerably more numerous during the maximum phase of the cycle (129), while the descending phase had the smallest number (20).
Considering CMEs from the point of view of the MPA, there are more CMEs measured in the northern hemisphere–with
The right panel of
The slight preference for the northern hemisphere is not reproduced for the CMEs that were detected near Earth. There were 15 CMEs coming from regions near the poles (
Nine out of these 15 ICMEs were detected during the maximum phase of SC24, which is contrary to the fact that most of the ICMEs were detected during the descending phase (29 ICMEs out of the 49 included in our set). 21 were followed by geomagnetic storms. Halo CMEs are most geoeffective between the maximum and descending phases of SC23 (
A better understanding of the linkage between CMEs and solar activity cycle should improve our understanding about their geoeffectiveness. Some studies (e.g.,
A classification of CMEs by their linear speed into three categories (v
Our study has 1,352 CMEs coming from the western hemisphere and 1,444 from the eastern one. Out of these, there were 23 ICMEs and 26, respectively. Cycle 24 lacks in events driving extreme geomagnetic storms compared to past solar cycles. Out of the 49 ICMEs included in our study, 33 have been followed by geomagnetic storms.
For solar cycle 24
Only around 50% of the ICMEs were generating GSs during the years 1996–2017. Out of these, around 23% generated intense GSs (with Dst
In our dataset containing 49 ICMEs there are nine intense geomagnetic storms associated with them, and only one severe storm (Dst
Using a Spearman rank correlation coefficient between Dst index and CME speed for 33 halo CMEs from the beginning of the past solar cycle (2009–2013).
In our study, out of the 2,796 CMEs, there were 276 halo ones, out of which 24 were associated with geomagnetic storms, having velocities ranging from 143 to 3,163 km/s. This resulted in a 0.08 Spearman coefficient between the linear speed and the Dst index.
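For reference, a Spearman rank correlation can be computed as the Pearson correlation of the ranks, with tied values sharing their average rank. The sketch below is a plain standard-library implementation for illustration, not the code used in the study; the function names and the sample speeds and Dst values are made up.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1   # mean of positions i+1 ... j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up CME speeds (km/s) and minimum Dst values (nT).
speeds = [300.0, 700.0, 1500.0, 2500.0]
dst_minima = [-35.0, -60.0, -120.0, -90.0]
rho = spearman(speeds, dst_minima)
```

A coefficient close to zero, such as the 0.08 reported above, indicates essentially no monotonic relation between the two quantities.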
In a propagation through the interplanetary space analysis of 53 fast Earth-directed halo CMEs observed by the LASCO instrument during the period January 2009–September 2015
No other statistics of the measured CME properties have shown a noticeable dependence on the solar cycle evolution.
We have applied a non-linear logistic regression model to a selected set of CMEs detected by LASCO in order to evaluate their geoeffectiveness, as defined by their association with a geomagnetic storm. The selected CMEs excluded “poor” and “very poor” events and CMEs with angular width less than 60
Besides CME-CME interaction there is now an increasing concern that stealth CMEs are also important from the space weather perspective (e.g.,
As the stealth CMEs are lacking low-coronal signatures, their source regions could not be identified. In consequence, these CMEs were considered as originating from the backside of the Sun (location variable was set to 0). This implies that our model will not forecast that stealth CMEs will have any impact on the Earth.
During their journey from the Sun to the Earth, CMEs can accelerate/decelerate, deflect, rotate and deform (see e.g.,
The paragraphs above reveal the limitation of our model by using only the CME parameters as input for the model. A possible improvement might be the addition of some weighting coefficients to increase the significance of the positive events in the training process. For a more robust analysis one also needs to take into consideration the interaction between the CMEs and the ambient solar wind during their journey to the Earth. Throughout their propagation, the CME parameters like speed, shape, etc. change considerably and this has a big impact on their geoeffective response.
However, we consider this model to be a sustainable one for the purpose of predicting the association of a geomagnetic storm to a CME which arrived at Earth, based solely on the measurements of the CME’s properties.
Another improvement of this model could be the addition of the tilt angle of the CME to the dataset, in order to better estimate the direction of the CME propagation, even though it will not take into consideration the interplanetary interactions.
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.
DB-I contributed to construction of database, running the model, main writer of the text. MM contributed to construction of database, writing and reviewing text and software.
Part of DB-I’s work was supported by a grant of the Romanian Ministry of Research and Innovation, CCCDI - UEFISCDI, project number PN-III-P1-1.2-PCCDI-2017-0226/16PCCDI/2018, within PNCDI III.
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The CME LASCO catalog is generated and maintained at the CDAW Data Center by NASA and The Catholic University of America in cooperation with the Naval Research Laboratory. SOHO is a project of international cooperation between ESA and NASA.