Global Magnetohydrodynamic Simulations: Performance Quantification of Magnetopause Distances and Convection Potential Predictions

The performance of three global magnetohydrodynamic (MHD) models in estimating the Earth's magnetopause location and ionospheric cross polar cap potential (CPCP) has been presented. Using the Community Coordinated Modeling Center's Run-on-Request system and extensive database of results of various magnetospheric scenarios simulated for a variety of solar weather patterns, the aforementioned model predictions have been compared with magnetopause standoff distance estimations obtained from six empirical models, and with cross polar cap potential estimations obtained from the Assimilative Mapping of Ionospheric Electrodynamics (AMIE) Model and the Super Dual Auroral Radar Network (SuperDARN) observations. We have considered a range of events spanning different levels of space weather activity to analyze the performance of these models. Using a fit performance metric analysis for each event, the models' reproducibility of magnetopause standoff distances and CPCP against empirically-predicted observations was quantified, and salient features that govern the performance characteristics of the modeled magnetospheric and ionospheric quantities were identified. Results indicate mixed outcomes for different models during different events, with almost all models underperforming during the most extreme events. The quantification also indicates a tendency to underpredict magnetopause distances in the absence of an inner magnetospheric model, and an inclination toward overpredicting CPCP values under general conditions.


INTRODUCTION
The global state of the terrestrial magnetosphere may be broadly characterized by two categories of physical identifiers: (a) geomagnetic indices which indicate variations in the near-Earth space environment due to activity (e.g., Dst, Sym-H, Kp, AE; Pulkkinen et al., 2011; Glocer et al., 2016; Liemohn et al., 2018), and (b) physical quantities that help describe the morphology and energy balance in the magnetosphere (ground magnetic perturbations dB/dt and B, field aligned currents, polar cap potential; Rastätter et al., 2011; Honkonen et al., 2013; Pulkkinen et al., 2013; Anderson et al., 2017; Welling et al., 2017). In the latter set, the cross polar cap potential (CPCP) and magnetopause standoff distances (MPSD) are two widely used physical quantities that simultaneously help define the structure and state of the magnetospheric system. The MPSD, defined as the nearest subsolar point of the magnetopause to the Earth's surface (e.g., Fairfield, 1971; Elsen and Winglee, 1997; Gombosi, 1998), has been a predominant measure in studying compression of the Earth's dayside magnetosphere (e.g., Welling et al., 2021), while providing an instantaneous value of the energy imparted on the terrestrial magnetic system by the solar wind (e.g., Lin et al., 2010). The CPCP, on the other hand, acts as an instantaneous indicator of the amount of energy flowing into the Earth's magnetosphere-ionosphere system from the solar wind (e.g., Boyle et al., 1997; Burke et al., 1999; Russell et al., 2001; Liemohn and Ridley, 2002; Ridley and Liemohn, 2002; Ridley, 2005; Ridley et al., 2010), and is frequently used in conjunction with field aligned currents (FACs) to describe ionospheric electrodynamics (e.g., Reiff et al., 1981; Siscoe et al., 2002a,b; Khachikjan et al., 2008; Mukhopadhyay et al., 2020).
Observationally, these two quantities are difficult to measure globally: MPSD estimates depend largely on satellite crossings of the magnetopause distributed over a period of time (e.g., Shue et al., 1997), and CPCP estimates depend on incomplete coverage of the hemisphere by ground-based observations and/or in-situ measurements from space (e.g., Gao, 2012). These quantities are, therefore, estimated using physics-driven empirical (e.g., Petrinec and Russell, 1993; Boyle et al., 1997; Shue et al., 1997) or assimilative techniques (e.g., Kihn and Ridley, 2005). Since most of these techniques were created for different initial conditions (e.g., Lin et al., 2010; Gao, 2012), comparison of multiple such models against first-principles-based global models or each other is a daunting task. This task is made especially precarious when studying extreme events, as most of these techniques were not designed to simulate extreme conditions (e.g., Welling et al., 2017; Mukhopadhyay et al., 2020).
Several empirical models have been developed to estimate the MPSD. Physically, the size and shape of the magnetopause can be estimated based on the dynamic and static pressure of the solar wind (e.g., Kivelson and Russell, 1995) along with sufficient knowledge of the interplanetary magnetic field. This is the primary basis of these models, which estimate MPSD by assuming a general shape of the magnetopause. The most commonly used magnetopause models, such as the Shue et al. (1997, 1998) models or the Petrinec and Russell (1993, 1996) model, use trigonometric functions and solar wind parameters to describe the MPSD. Later models such as the Liu et al. (2015) model have attempted to include additional pressure and magnetic field components of the solar wind, using predicted values from first-principles-based models in addition to satellite crossing data, in order to improve on these empirical models. A performance analysis of many such models was presented by Lin et al. (2010), who compared their model against a range of empirical models dating back to 1993. More recently, Staples et al. (2020) conducted a thorough analysis of MPSD model performance, especially during extreme driving.
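As an illustration of how such an empirical model maps solar wind conditions to a standoff distance, the following is a minimal sketch of the subsolar distance in the functional form commonly quoted for Shue et al. (1998), which uses a hyperbolic tangent of the IMF Bz and a power law in dynamic pressure. The coefficients are written from memory of that commonly quoted form and should be checked against the original paper before any quantitative use; the function names and the proton-only dynamic pressure are assumptions made for this sketch, not the setup used in this study.

```python
import math


def dynamic_pressure_npa(n_cm3: float, v_kms: float) -> float:
    """Proton-only solar wind dynamic pressure in nPa.

    n_cm3: proton number density in cm^-3, v_kms: bulk speed in km/s.
    The factor 1.6726e-6 converts m_p * n * v^2 to nPa for these units.
    """
    return 1.6726e-6 * n_cm3 * v_kms ** 2


def shue1998_standoff_re(bz_nt: float, dp_npa: float) -> float:
    """Subsolar standoff distance in Earth radii.

    Assumed form (commonly quoted for Shue et al., 1998):
    r0 = (10.22 + 1.29 * tanh(0.184 * (Bz + 8.14))) * Dp^(-1/6.6)
    """
    bz_term = 10.22 + 1.29 * math.tanh(0.184 * (bz_nt + 8.14))
    return bz_term * dp_npa ** (-1.0 / 6.6)


if __name__ == "__main__":
    dp = dynamic_pressure_npa(5.0, 400.0)  # illustrative quiet-time values
    print(f"Dp = {dp:.2f} nPa, r0 = {shue1998_standoff_re(0.0, dp):.2f} R_E")
```

In this form, both a higher dynamic pressure and a more strongly southward Bz move the subsolar magnetopause earthward, which is the compression behavior the empirical models are meant to capture.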
In contrast to MPSD models, the CPCP, which is defined as the difference between the maximum and minimum of the ionospheric potential (e.g., Boyle et al., 1997), is largely derived from instantaneous observations of ionospheric and/or ground-based quantities. The four most commonly used techniques to estimate the ionospheric CPCP are: (1) polar observations by the Defense Meteorological Satellite Program (e.g., Hairston and Heelis, 1996), (2) the polar cap index (e.g., Troshichev et al., 1996), (3) measurements by the Super Dual Auroral Radar Network (SuperDARN; e.g., Khachikjan et al., 2008), and (4) the Assimilative Mapping of Ionospheric Electrodynamics (AMIE) technique (e.g., Ridley and Kihn, 2004). An extensive comparison of the general features, advantages, and limitations of these datasets can be found in the work by Gao (2012).
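The definition of CPCP as the difference between the extrema of the ionospheric potential can be sketched in a few lines. The gridded input layout (rows of potential values in kV over one hemisphere) and the function name are assumptions for illustration only; the techniques listed above each derive the potential pattern differently before this step.

```python
def cross_polar_cap_potential(potential_kv):
    """Return CPCP in kV as max(potential) - min(potential).

    potential_kv: hypothetical 2D grid (list of rows) of electrostatic
    potential values in kV over a single hemisphere.
    """
    values = [v for row in potential_kv for v in row]
    if not values:
        raise ValueError("potential grid is empty")
    return max(values) - min(values)


if __name__ == "__main__":
    # Toy two-cell pattern: a negative dusk cell and a positive dawn cell.
    grid = [[-30.0, 0.0], [10.0, 45.0]]
    print(cross_polar_cap_potential(grid))
```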
With the advent of physics-driven space weather prediction over the last couple of decades, validation of global first-principles-based models has become a common exercise in the space science community to identify and improve on our physical understanding of the near-Earth system (e.g., Pulkkinen et al., 2011, 2013; Rastätter et al., 2011). Compared to other space weather indices and/or space-based plasma quantities, fewer studies have compared the performance of MPSD and CPCP values from global models until recently (Mukhopadhyay et al., 2018; Burleigh et al., 2019; Collado-Vega et al., 2019). This is partly because, contrary to space weather indices (e.g., Glocer et al., 2013) and most other space weather quantities like FACs (e.g., Anderson et al., 2017) or B (e.g., Welling et al., 2017), both MPSD and CPCP are measured by multiple methods and datasets. This means that a metric analysis of these quantities modeled after the GEM Challenges, which compared globally-modeled results against singular observational datasets, will not yield meaningful results.
In this study, an attempt to quantitatively compare globally-simulated MPSD and CPCP against multiple observationally-derived datasets has been undertaken. Three global magnetospheric models, namely the Space Weather Modeling Framework (SWMF), the Lyon-Fedder-Mobarry (LFM) model, and the Open General Geospace Circulation Model (OpenGGCM), have been run through the NASA Community Coordinated Modeling Center (CCMC) website for seven space weather events. The global results are compared against six empirical MPSD models and two CPCP datasets. The performance analyses conducted in Pulkkinen et al. (2011), Rastätter et al. (2011), and Honkonen et al. (2013), the last being one of the few validation studies to have compared MPSD and CPCP against the Lin et al. (2010) model and SuperDARN respectively, were used as a basis to select events and construct a metric performance analysis. However, to better serve the primary aim of the study, a new metric, the Exclusion Parameter, has been used in addition to modified versions of the Root-Mean-Square Error and Maximum Amplitude Ratio to isolate physics-driven deficiencies in each model that impact the prediction of MPSD and CPCP. Results indicate that the global models overpredict CPCP while reasonably estimating MPSD values.

Global Models and Event Selection
Three global models have been compared in this study: (1) SWMF, (2) the LFM model, and (3) OpenGGCM. The SWMF is a true framework containing a number of physics-based models (Tóth et al., 2005, 2012) and is operationally used in space weather prediction (e.g., Cash et al., 2018). It employs the BATS-R-US model (Powell et al., 1999) to simulate the global magnetospheric domain using conservative MHD equations. BATS-R-US is dynamically coupled to an inner magnetospheric model such as the Rice Convection Model (RCM; Wolf et al., 1982), which provides realistic ring current pressure and density (Glocer et al., 2016; Welling et al., 2018). The global and inner magnetospheric components are connected to the Ridley Ionosphere Model (RIM), which solves for the ionospheric electrodynamics using a prescribed empirical conductance model (Mukhopadhyay et al., 2020).
The LFM model (Merkine et al., 2003; Lyon et al., 2004; Merkin et al., 2005a,b) is another global model that is actively used throughout the space science community. The MHD component employs a 3D stretched spherical grid to solve for semi-conservative MHD equations in the magnetospheric domain, which is then coupled with a magnetosphere-ionosphere coupler/solver (MIX). MIX solves for the ionospheric electric potential using a semi-empirical auroral conductance module that is driven using MHD inputs (Fedder et al., 1995; Wiltberger et al., 2001). Although the model is capable of additional coupling to an inner magnetospheric module (Pembroke et al., 2012), this coupling is not yet fully available on the CCMC website, and, therefore, was not utilized in the simulations conducted for this study.
OpenGGCM (Raeder et al., 2001, 2008) employs a nonuniform static Cartesian grid to solve the semi-conservative resistive MHD equations in the GSE coordinate system. It is coupled with the Coupled Thermosphere-Ionosphere Model (CTIM; e.g., Connor et al., 2016) to solve for the ionospheric potential using both first-principles-based and empirical methods. OpenGGCM provides auroral precipitation and ionospheric FACs to CTIM, and receives the potential as an inner boundary condition. In spite of its capability (Cramer et al., 2017), like LFM, there is no coupled inner magnetospheric model for OpenGGCM available through the CCMC website, and therefore only OpenGGCM with coupled CTIM was used in this study.
Seven geospace events, listed in Table 1, were chosen for the study. The selected events vary in strength and magnetospheric structure, as indicated by the minimum Dst and maximum AE reached during the course of each event. Each event has been studied at least once in previous work (Miyoshi et al., 2006; Yermolaev et al., 2008; Pulkkinen et al., 2011; Honkonen et al., 2013). All global models have been executed through the CCMC website (http://ccmc.gsfc.nasa.gov/) and receive as input the solar wind values at L1. The ionospheric CPCP of the MHD models, made available as DPhi on the CCMC website, was used. The features and settings of the global models were kept as similar to each other as possible. All models were run with solar wind parameters provided by ACE and/or WIND, depending on availability. The simulation results have been listed in the dataset provided with this manuscript, and have been made available through the CCMC website using the CCMC-assigned run names.

Magnetopause Standoff Distance Models
All magnetopause models used in this study have been listed in Table 2 along with a summary of their fitting details with the solar wind. A total of six empirical MPSD models were chosen for validation, and driven using the same solar wind inputs used to drive the global models. The results of Lin et al. (2010) were primarily used to select the list of empirical models. In order to better evaluate MPSD models, Lin et al. (2010) used the standard deviation σ(d) to compare their model's performance with existing models against 246 satellite crossings of the magnetopause with 5 min average solar wind parameters (see Table 10 in Lin et al., 2010). The present study has included only those empirical models that predicted with a standard deviation less than ∼1. In addition to the above, a later model developed by Liu et al. (2015) has also been used.

Cross Polar Cap Potential Models
Observations from SuperDARN and assimilated results from AMIE have been used to derive CPCP for this study. SuperDARN is a ground-based network of radars that measures line-of-sight ionospheric convection velocities and then infers functional forms of the electrostatic potential as a function of colatitude and longitude (Ruohoniemi and Baker, 1998). For more detail on SuperDARN's estimation technique of the CPCP, please refer to Khachikjan et al. (2008). AMIE assimilates many types of data from both ground-based and space-based instruments and produces estimates of several ionospheric parameters, including the potential in the polar cap (Richmond and Kamide, 1988). In the version used in this study (Kihn and Ridley, 2005), only ground magnetometer data have been used to predict the potential.

Performance Metrics
To undertake this comparative analysis, we have used the following three performance metrics: (1) Root-Mean-Square Error (RMSE), (2) Maximum Amplitude Ratio (MAR), and (3) Exclusion Parameter (EP). RMSE and MAR have been defined similarly to the metrics in Pulkkinen et al. (2011) and Honkonen et al. (2013), in order to quantify the error in the simulated results. The metric EP has been introduced specifically for this study in order to better quantify model-model comparisons. In the following, results from the empirical magnetopause models and ionospheric results from AMIE and SuperDARN have been interchangeably termed predicted observations or simply observations, to distinguish them from results from the global models.
RMSE is a popular fit metric used to quantify the difference between predictions and observations, with a value of 0 indicating perfect performance. RMSE is defined as
$$\mathrm{RMSE} = \sqrt{\left\langle \left( x_{obs} - x_{mod} \right)_i^2 \right\rangle} \tag{1}$$
where $x_{obs}$ and $x_{mod}$ are the observed and the modeled results, respectively, and $\langle \ldots \rangle$ indicates the arithmetic mean taken over $i$ ranging over $N$ time steps. Throughout this work $i$ corresponds to the time series over individual events, with $N$ indicating the total number of time steps in a given event(s). Because RMSE is built from squared differences, its value cannot be negative.
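As a concrete reading of the RMSE description above (the square root of the mean squared difference between observed and modeled values over the N time steps of an event), a minimal sketch follows. The function name and the plain-list inputs are assumptions for illustration and are not the analysis code used in this study.

```python
import math


def rmse(x_obs, x_mod):
    """Root-mean-square error between two equally sampled time series."""
    if len(x_obs) != len(x_mod) or len(x_obs) == 0:
        raise ValueError("series must be non-empty and of equal length")
    squared = [(o - m) ** 2 for o, m in zip(x_obs, x_mod)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    observed = [10.0, 9.5, 8.0, 9.0]   # e.g., standoff distance in R_E
    modeled = [10.5, 9.0, 7.0, 9.0]
    print(f"RMSE = {rmse(observed, modeled):.3f}")
```

The result carries the units of the compared quantity (Earth radii for MPSD, kV for CPCP), so RMSE values for the two quantities are not directly comparable with each other.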
The second metric, MAR, is defined as the ratio of the maximum amplitudes:
$$\mathrm{MAR} = \frac{\max\left( \left| x_{mod} \right|_i \right)}{\max\left( \left| x_{obs} \right|_i \right)} \tag{2}$$
where $i$, $x_{obs}$, and $x_{mod}$ stand for the same variables as in Equation (1). MAR = 1 indicates perfect model performance, while MAR > 1 and MAR < 1 indicate over- and underestimation, respectively. This is especially useful in analyzing quantities like MPSD, where it is critical to understand whether the peak value of globally-modeled MPSD is overpredicted or underpredicted when compared against empirically-modeled MPSD. Such a comparison provides useful insight regarding the physical morphology of the magnetosphere, especially during storm-time magnetospheric compression.
The third metric, EP, has been used to quantify times when simulated results lay outside the range of observationally-derived estimates (including their standard deviations), and whether during such times the simulated results overestimated or underestimated the values. This is an important aspect to study, as this investigation compares modeled results against multiple observational and modeled datasets, and it is highly unlikely that the observationally-derived estimates will match each other. Any and every prediction of the MHD-modeled data that is "excluded" from the observational range (outside the range of observed values) has been characterized as an incorrect prediction, and therefore counted as an exclusion. Mathematically, this could be defined as
$$\mathrm{EP}_i = \begin{cases} 0, & \min(x_{obs})_i - \sigma_{obs} \le x_{mod,i} \le \max(x_{obs})_i + \sigma_{obs} \\ 1, & \text{otherwise} \end{cases} \tag{3}$$
Here, $i$, $x_{obs}$, and $x_{mod}$ are the same as in the previous equations, while $\sigma_{obs}$ is the standard deviation of the observed data, and (max, min) signify the maxima and minima of observed values at time step $i$.
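The two metrics described above can be sketched as follows. This is one possible reading of the text rather than the code used in the study: the exclusion band is taken as the minimum and maximum of the observationally-derived values at each time step, widened by their standard deviation, and the excluded time steps are split into under- and overprediction as fractions of all time steps. Computing the standard deviation across datasets at each time step (population form), as well as all function and key names, are assumptions made for this sketch.

```python
from statistics import pstdev


def max_amplitude_ratio(x_obs, x_mod):
    """Ratio of the maximum modeled amplitude to the maximum observed one."""
    return max(abs(m) for m in x_mod) / max(abs(o) for o in x_obs)


def exclusion_parameter(obs_by_source, x_mod):
    """Fractions of time steps where the model lies outside the observed band.

    obs_by_source: list of time series, one per observational dataset or
    empirical model, each the same length as x_mod.
    """
    n = len(x_mod)
    under = over = 0
    for i in range(n):
        values = [series[i] for series in obs_by_source]
        sigma = pstdev(values)            # assumed: spread across datasets
        lower = min(values) - sigma
        upper = max(values) + sigma
        if x_mod[i] < lower:
            under += 1                    # excluded, model too low
        elif x_mod[i] > upper:
            over += 1                     # excluded, model too high
    return {"ep": (under + over) / n, "under": under / n, "over": over / n}


if __name__ == "__main__":
    obs = [[10.0, 10.0, 10.0, 10.0], [12.0, 12.0, 12.0, 12.0]]
    model = [8.0, 11.0, 14.0, 15.0]
    print(max_amplitude_ratio(obs[0], model), exclusion_parameter(obs, model))
```

Because the under- and overprediction fractions are both normalized by the total number of time steps, they add up to the total EP, which is the bookkeeping used when interpreting the colored EP panels.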
Using the above relation, EP identifies the number of times when the model is outside the set limits of the observed values, and measures whether the exclusion is due to underprediction or overprediction of values at each time step using the following relation:
$$\mathrm{EP}_{under,i} = \begin{cases} 1, & x_{mod,i} < \min(x_{obs})_i - \sigma_{obs} \\ 0, & \text{otherwise} \end{cases} \qquad \mathrm{EP}_{over,i} = \begin{cases} 1, & x_{mod,i} > \max(x_{obs})_i + \sigma_{obs} \\ 0, & \text{otherwise} \end{cases} \tag{4}$$
At the end of calculations, the total number of "excluded" time steps as a fraction of the total number of time steps defines the total EP underprediction and overprediction as a percentage value, such that the addition of the total underpredicted and overpredicted fractions results in the EP:
$$\mathrm{EP}_{event} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{EP}_{under,i} + \frac{1}{N} \sum_{i=1}^{N} \mathrm{EP}_{over,i} \tag{5}$$
where $\mathrm{EP}_{event}$ is the total EP as a fraction of the total number of time steps, $N$. Note that the under- and over-prediction percentages are expressed as a fraction of the total event time and not of the total wrongly predicted times. For example, a model with an EP value of 50% and a total under-prediction percentage of 10% for a given event lies outside the observation thresholds 50% of all times during the event but under-predicts only 10% of the total time, which means that the model results are over-predicted during 40% of the total time. This parameter was specifically introduced to understand variations in both the MP standoff distance and CPCP values, as the observations/empirically-derived quantities themselves vary at a given time step. Further discussion about this parameter's usage is given in sections 3 and 4.

RESULTS
Figure 1 displays a composite image of the performance quantification of model-predicted magnetopause standoff distances against predicted observations using the empirical models. In part (a), a time series comparison of the magnetopause distance for the August 31, 2001 event has been shown. Results from the global models, displayed using the solid lines, are plotted against a gray band of values encompassing the individual time series of all 6 empirical magnetopause models.
The black solid line passing through the middle of the gray band is the median value of the empirically modeled results. In part (b), the aggregate RMSE (top subplot i), MAR (middle subplot ii), and EP (bottom subplot iii) have been computed for each event. In order to compute each metric, the time series data simulated by the global models were compared against the median value of the observationally-derived estimates. LFM magnetopause distances exhibit the lowest RMSE for each event, with 6 out of 7 events having an RMSE value [...] 1 $R_E$. OpenGGCM has the highest RMSE values, with 5 out of the 7 events having RMSE values greater than [...]
The EP values are re-plotted model-wise in part (c) of the figure, but the area under the curve is colored by the proportion of underprediction and overprediction. Since underprediction and overprediction of the EP are calculated as fractions of the total time series, the total EP for any given model could be defined as the sum of the underpredicted fraction and the overpredicted fraction. As shown in part (c-i), SWMF mostly overpredicts the magnetopause distance during all events except Event 7. It also has a significant underprediction fraction during Events 4, 5, and 6, which along with Event 7 correspond to some of the strongest events being studied in this report. In contrast to SWMF results, both LFM and OpenGGCM predominantly underpredict during almost all events when outside of the empirically-predicted range of values. The only exception to this is OpenGGCM's EP values during the Halloween Storm of 2003, where the overprediction fraction is greater than the underprediction fraction.
Figure 2 describes the comparison of CPCP values estimated by global models against AMIE and SuperDARN measurements. A similar format to Figure 1 is followed for consistency.
In part (a), a time series comparison of the CPCP for the December 14, 2006 event has been shown, comparing MHD-modeled results against the band of values observed by SuperDARN and predicted by AMIE. In part (b-i), while the aggregate RMSE values for each model are within 100 kV, event-wise performance varies: SWMF exhibits the lowest median RMSE value of 24 kV, with the RMSE value being <50 kV for all events except Event 4. LFM follows a similar pattern as SWMF, but displays comparatively higher RMSE values for Events 6 and 7. OpenGGCM exhibits RMSE values greater than 100 kV for Events 2, 4, and 6. The simulations of the Halloween Storm of 2003 (Event 4) lead to the highest errors for CPCP. In part (b-ii), the MAR values of all models are much higher when compared to magnetopause MAR values. All three models follow a similar trend for all events, except OpenGGCM during Events 2 and 6, when it exhibits a MAR value greater than 4 times the observed median values for those events. LFM exhibits a median MAR value of 2.05, while SWMF has the MAR value closest to unity, at 0.995. In part (b-iii), all models exhibit an EP value >50% for all events except Events 4 and 7. OpenGGCM has the highest median EP value at 98.7%, with 4 out of 7 events being 100% out of range. LFM shows a median EP of 78.6%, while SWMF exhibits the lowest median EP value of 72%. The EP values replotted in part (c) show that LFM (part ii) and OpenGGCM (part iii) largely overpredict the CPCP when outside the range of observed values. While SWMF largely underpredicts the CPCP during Events 1, 2, 3, and 4, CPCP during the remaining events was mainly overpredicted.

DISCUSSION
Because modeled MPSD and CPCP were compared against multiple datasets, using error metrics like RMSE alone is not enough to meaningfully rank model performance (Liemohn et al., 2021), as has often been done before (e.g., Pulkkinen et al., 2011). Because there is no single right answer, a significant aim of this study has been to develop innovative metrics to better quantify the performance of global models against multiple, divergent observationally-derived estimates. For example, CPCP values from SuperDARN and AMIE are at significant odds with each other during stronger events, as evidenced by Figure 2A. To counter this problem, MAR and EP are used, which allow us to identify whether a global model overpredicts or underpredicts; this does not give us a quantitative error value, but is able to create a blanket range of values within which a modeled result could be considered reasonable. While the usage of better metrics (e.g., Haiducek et al., 2017; Morley et al., 2018) would be strongly considered for future studies involving CPCP and MPSD validation, the rudimentary metric analysis in this study has been used to understand the differences in each model's performance and discuss future directions toward improvements.
In the performance analysis of MPSD, the metrics indicate reasonable performance during weaker events. For instance, some of the lowest EP values are exhibited by all three models during Events 3 and 5, which have the lowest AE. LFM and OpenGGCM tend to underpredict the MP standoff distance, as indicated in part (c) of Figure 1. This is probably due to the absence of an inner magnetospheric module to provide realistic ring current pressure values. SWMF, which uses RCM to provide a much stronger ring current input, tends to overpredict the MP standoff distance. This is in agreement with the study by Samsonov et al. (2016), which found that accounting for a realistic ring current in global MHD brings values closer to the empirical MP models. However, as shown in Staples et al. (2020), the validity of MP standoff distances as estimated by the empirical models during extreme events is questionable. Since the study does not employ direct comparisons with satellite crossings, a future extension of this work would compare modeled results directly against in-situ measurements from satellites like Cluster, THEMIS, MMS, or Geotail (e.g., Angelopoulos et al., 2009; Lin et al., 2010; Burch and Phan, 2016; Collado-Vega et al., 2019).
The CPCP metric analysis indicates that the ionospheric potential predicted by the global models is greater than the expected value, sometimes by more than a factor of 8. This tendency of global models to overpredict the CPCP could be driven by field aligned current generation in the global MHD domains and/or the ionospheric conductance value, as all models use a similar numerical framework to apply Ohm's Law (Goodman, 1995). Since FAC strength and pattern depend on MHD grid resolution (Ridley et al., 2010; Wiltberger et al., 2017; Welling et al., 2019; Mukhopadhyay et al., 2020), incorrect estimation of the ionospheric conductance, especially in the polar (auroral) region, should play a significant role in the overprediction of CPCP. Since each global model uses different techniques to estimate this quantity (SWMF uses an empirical conductance model, while LFM and OpenGGCM use a semi-empirical physics-driven conductance system), it is challenging to suggest a universal solution. In addition, the dependence of techniques like AMIE on empirical relationships (e.g., Ahn et al., 1998) to derive ionospheric electrodynamics results in an independent challenge of establishing a global truth value for the ionospheric conductance. Recent advancements in addressing these issues through the ongoing Ionospheric Conductance Challenge were reported by Öztürk et al. (2020). Furthermore, significant deviations between AMIE and SuperDARN values, especially during the Halloween Storm (Event 4) and the December 2006 event (Event 7), indicate that a performance evaluation of CPCP measurement during extreme driving is necessary. Binning of CPCP metrics by geomagnetic indices like AE and Sym-H would be a future focus of this study, which could provide a quantitative measure of performance across activity thresholds, similar to Welling et al. (2017).
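The link between conductance and CPCP invoked above can be made explicit with a schematic form of the ionospheric current continuity relation. This is a generic textbook-style sketch and not the specific formulation of any of the three models; the sign convention, the symbol $I$ for the magnetic inclination, and the uniform scalar conductance $\Sigma_0$ in the second line are assumptions made for illustration.

```latex
% Height-integrated Ohm's law combined with current continuity
% (sign convention assumed; it differs between models and hemispheres):
\nabla_\perp \cdot \left( \boldsymbol{\Sigma} \cdot \nabla_\perp \Phi \right) = -\, j_\parallel \sin I

% Illustrative special case: uniform scalar conductance \Sigma_0
\nabla_\perp^2 \Phi = -\, \frac{j_\parallel \sin I}{\Sigma_0}
\quad \Longrightarrow \quad
\Phi \propto \frac{j_\parallel}{\Sigma_0}
```

In this simplified case the potential, and hence the CPCP, scales inversely with the conductance for a fixed FAC input, so a conductance that is too low by a factor of two would roughly double the CPCP. With realistic, spatially varying tensor conductance the scaling is not this simple, but the direction of the effect is the reason conductance is singled out as a likely driver of overprediction.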
Gao (2012) has discussed the disadvantages of using SuperDARN, which under-predicts, and AMIE, which over-predicts, leading to sharp deviation in CPCP predictions. Future studies should consider using a tertiary source of data (like DMSP or PC Index) or a different quantity (e.g., hemispheric power index) to evaluate ionospheric performance. Furthermore, conducting metric validation on ionospheric drivers of CPCP, like electric fields and ion drift velocities, that are available from instruments like DMSP SSIES (Kihn et al., 2006), should be considered.

CONCLUSIONS
The present study aimed at evaluating global models' prediction of MPSD and CPCP against multiple robust observationally-derived datasets. The study used well-documented space weather events simulated using three different global MHD models through the CCMC Run-on-Request feature. The MPSD values from these model results were compared against empirical magnetopause models, while the predicted ionospheric polar cap potential values were compared against those obtained from SuperDARN and AMIE. Three performance metrics (RMSE, MAR, and EP) were used to quantify the predictions. While the models performed reasonably well during times of relatively weak geomagnetic activity, it was found that extreme events lead to increased errors and a tendency to overpredict the ionospheric potential. While inclusion of a ring current model in a global simulation leads to less underprediction of the MPSD during extreme driving, the study does not find that such an approach necessarily leads to reduced errors. Furthermore, the use of empirical models to predict MPSD, and statistics-based datasets to predict CPCP, may lead to incorrect evaluations during extreme events. Future studies should consider applying improved metrics to further evaluate these parameters.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found at: DeepBlue Repository: Mukhopadhyay (2020).

AUTHOR CONTRIBUTIONS
We use the CRediT (Contributor Roles Taxonomy) categories (Brand et al., 2015) for providing the following contribution description. AM led the conceptualization, designed the methodology, conducted the investigation, performed data visualization and formal analysis, and wrote the original draft. XJ provided resources and supervised the initial conceptualization and methodology design. DW and ML assisted in conceptualization and formal analysis, provided the resources, funding acquisition, supervision, and aided in project administration. All authors have contributed toward the revision and editing of the manuscript.

FUNDING
This research was funded by NASA grants: NNX12AQ40G, 80NSSC18K1120, 80NSSC17K0015, NNX17AB87G, and NSF grant 1663770 -AWD004525. Partial funding for travel was also received by AM through the CCMC Student Research Contest held in 2017.

ACKNOWLEDGMENTS
All model result data, input files and observation data are available via [DeepBlue Link] and through the Community Coordinated Modeling Center website (http://ccmc.gsfc.nasa.gov/), and the Virtual Model Repository website (http://vmr.engin.umich.edu/). This study would not have been possible without the support of the staff at the Community Coordinated Modeling Center, which is funded by the National Science Foundation, National Aeronautics and Space Administration, the Air Force Office of Scientific Research, and others. AM would also like to thank the organizers of the CCMC Student Research Contest 2017 for their generous funding and support. The authors would also like to thank Mr. Shibaji Chakrabarty and Ms. Garima Malhotra, who kindly provided us data from the SuperDARN system and University of Michigan's AMIE data repository. VMR is maintained by Dr. Aaron Ridley at the University of Michigan. The authors would also like to thank Mr. Brain Swiger for his valuable comments on the draft manuscript, and support with the manuscript submission.