Pulsating Aurora Database and Automated Algorithm for Detecting them from Ground-Based Optical Imaging

Pandya, Megha; Hui, James; Michell, Robert; Samara, Marilia; Halford, Alexa (She/Her/Hers)  Jean; Lee, Sang-Yun; Le, Guan; Mirizio, Emma; Fok, Mei-Ching

doi:10.3389/fspas.2026.1788935

DATA REPORT article

Front. Astron. Space Sci.

Sec. Space Physics

This article is part of the Research Topic: Predicting Near-Earth Space Environment: New Perspective and Capabilities in the AI Age


Provisionally accepted

Megha Pandya¹,²*

James Hui³

Robert Michell¹

Marilia Samara¹

Alexa (She/Her/Hers) Jean Halford¹

Sang-Yun Lee¹,²

Guan Le¹

Emma Mirizio¹,⁴

Mei-Ching Fok¹

¹Heliophysics Science Division, NASA Goddard Space Flight Center, Greenbelt, United States
²The Catholic University of America, Washington, United States
³River Hill High, Clarksville, United States
⁴University of Maryland, College Park, United States

The final, formatted version of the article will be published soon.

Pulsating aurora (PA) is a distinct class of diffuse auroral emission observed in the polar ionosphere, appearing as quasi-periodic intensity modulations or intermittently brightening patches. These structures typically occur at 100 km altitude, have horizontal scale sizes ranging from 10 to 200 km (Hosokawa & Ogawa, 2015; McEwen et al., 1981; Yukitoshi et al., 2020), and exhibit repetition periods ranging from a few seconds to several tens of seconds (Yamamoto, 1988). PAs are most commonly found from the post-midnight sector to the dawn sector (Grono & Donovan, 2020; Jones et al., 2011; Partamies et al., 2022; Royrvik & Davis, 1977), usually near the equatorward edge of the auroral oval, and are most often associated with the recovery phase of substorms (Bland et al., 2020; Tesema et al., 2020; Tsuchiya et al., 2018). PAs are spatially patchy, often drift slowly in the ionosphere, and exhibit complex temporal modulation patterns including on-off pulsations, drifting patches, and internal substructures (Grono & Donovan, 2020; Royrvik & Davis, 1977; Yukitoshi et al., 2020). PAs arise from quasi-periodic precipitation of magnetospheric electrons with characteristic energies of tens of keV. These electrons are modulated before entering the upper atmosphere (Johnstone, 1978; Samara et al., 2010; Sandahl et al., 1980). PAs are widely interpreted as the ionospheric manifestation of pitch-angle scattering of tens-of-keV electrons in the magnetosphere by very-low-frequency (VLF) and whistler-mode chorus waves (e.g., Jaynes et al., 2015; Kasahara et al., 2018; Nishimura et al., 2010).

In this study, a pulsating aurora is defined as a spatially localized auroral emission that exhibits quasi-periodic, discrete intensity enhancements in time, identifiable as significant peaks relative to the local background intensity.
From an algorithmic perspective, a pulsating aurora event is characterized by (i) a clear intensity pulse exceeding a prescribed prominence threshold, (ii) a finite pulse duration consistent with reported pulsation timescales (seconds to tens of seconds), and (iii) temporal separation from adjacent pulses that distinguishes individual pulsations from steady or slowly varying diffuse aurora.

Historically, the identification and classification of PAs have relied heavily on manual inspection of (i) ground-based all-sky imager data (e.g., Shiokawa et al., 2010; Yang et al., 2015), (ii) satellite optical data (e.g., Klimov et al., 2022; Siren, 1975), and (iii) energetic particle measurements (e.g., Miyoshi et al., 2015; Kasahara et al., 2018). Manual identification of pulsating aurora is often accurate, but it is inherently subjective and does not scale to modern data volumes. Auroral observation networks such as the Time History of Events and Macroscale Interactions during Substorms (THEMIS) ground-based All-Sky Imager (ASI) array, the Magnetometers, Ionospheric Radars, All-sky Cameras Large Experiment (MIRACLE), and the Red-line Emission Geospace Observatory (REGO) now generate terabyte-scale datasets spanning multiple years and both hemispheres. For example, the THEMIS all-sky imager network alone has accumulated tens to hundreds of millions of images, making manual inspection impractical. As a result, manual identification is time-consuming, observer-dependent, and susceptible to bias arising from individual experience and selection criteria.
These limitations have driven the development of automated and semi-automated detection techniques, including image thresholding, frequency-domain analyses, machine-learning-based classifiers, and morphological feature-tracking methods (e.g., Clausen & Nickisch, 2018; Grono et al., 2017; Kaeppler et al., 2023; Kvammen et al., 2020; Nanjo et al., 2022; Rao et al., 2014; Syrjäsuo & Donovan, 2002; Tesema et al., 2020; Yamauchi & Brändström, 2023; Zhong et al., 2020). Despite these advances, no single detection framework has yet emerged as a broadly applicable standard across different datasets or observation platforms. Many traditional threshold-based methods are highly sensitive to imager-specific characteristics such as camera gain, exposure, and background illumination conditions, which limits their transferability across imaging systems. Machine-learning-based approaches, while powerful, typically require large, carefully labeled training datasets and often behave as black-box classifiers, making it difficult to interpret failure modes or adapt the models to a new imager without retraining. Several methods also rely on fixed spatial or temporal scales, which can lead to missed detections or false positives when auroral dynamics deviate from the assumed scales. Moreover, a new approach is needed for the Alaska imager used in this study, for which a dedicated pulsating aurora detection framework has not yet been established.

In this study, we therefore develop and present an open-source Python software package for the automated, time-resolved detection of PAs in ground-based optical imager observations. The scripts automatically load raw imager TIFF files and apply spatial segmentation to the auroral images to produce standardized, time-resolved pulsation products that are well suited to large-scale statistical analysis.
Rather than focusing on individual events, the framework is optimized for survey-style analysis, enabling users to move from raw image sequences to curated lists of pulsation intervals. The paper is organized as follows. Section 2 gives an overview of the dataset. Section 3 describes the datasets, the pre-processing methods, and the automated detection algorithm. Section 4 presents a visual overview of example events, statistical results, and the value of the dataset. Section 5 provides a summary.

Dataset Overview

The dataset analyzed in this study consists of ground-based auroral optical observations acquired using an Electron Multiplying Charge-Coupled Device (EMCCD) imager operated at Poker Flat, AK (geographic: 65.1°N, 147.4°W; geomagnetic: 65.7°N, 96.6°W; L = 5.9). The observations cover the two-month interval from 1 December 2013 to 31 January 2014, a period characterized by frequent auroral activity during the northern winter season. The EMCCD detector provides high quantum efficiency and low read noise through on-chip electron multiplication, enabling the capture of faint auroral emissions under low-light conditions. The imager recorded continuous 16-bit grayscale image sequences at a fixed exposure time of ~16 ms, operating at 56 frames per second with a narrow 4° field of view and no optical filter. More details on EMCCDs are available in Michell et al. (2014) and Michell & Samara (2015). The raw data are stored as 16-bit TIFF (*.tif) files, with each file containing a sequence of auroral images acquired at a cadence determined by the camera exposure settings. For each hour of observations, the data are segmented into four consecutive TIFF files, each approximately 20 minutes in duration. Each frame in a TIFF file is a two-dimensional array with dimensions (y, x) of 128 × 128 pixels after binning. Accompanying log files are used to establish the absolute timing of each image frame and to associate image sequences with their corresponding observation intervals.
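As an illustration of how absolute timing can be attached to frames, the following minimal sketch converts between frame index and UTC under the assumption of a constant cadence. The function names are hypothetical, and because the log-file format is not described here, the start time of a sequence is passed in directly rather than parsed from a log. The default cadence is the FPS = 56.7 value set in the driver script.

```python
from datetime import datetime, timedelta, timezone

FPS = 56.7  # frames per second; assumed constant over a sequence


def frame_to_utc(frame_index: int, start_time: datetime, fps: float = FPS) -> datetime:
    """Return the UTC time of a frame counted from the start of a sequence."""
    if frame_index < 0:
        raise ValueError("frame_index must be non-negative")
    return start_time + timedelta(seconds=frame_index / fps)


def utc_to_frame(time: datetime, start_time: datetime, fps: float = FPS) -> int:
    """Return the index of the frame nearest to a given UTC time."""
    return round((time - start_time).total_seconds() * fps)


def duration_seconds(start_frame: int, end_frame: int, fps: float = FPS) -> float:
    """Return the elapsed time in seconds between two frame indices."""
    return (end_frame - start_frame) / fps


if __name__ == "__main__":
    # Hypothetical start time for one image sequence.
    start = datetime(2013, 12, 25, 10, 0, 0, tzinfo=timezone.utc)
    print(frame_to_utc(567, start).isoformat())
```

If the log files record a timestamp for every frame, a direct lookup would be preferable to this constant-cadence approximation.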
Timing information from the log files enables conversion between frame indices and physical time, which is essential for identifying pulsation durations and temporal spacing.

All analyses were performed using Python 3.x. The pipeline relies on widely used, open-source scientific libraries, including NumPy for numerical operations, SciPy for signal processing, specifically Savitzky-Golay filtering (Schafer, 2011) and peak detection, OpenCV for image handling, and Matplotlib for visualization. Parallel processing was implemented using Python's standard multiprocessing utilities to accelerate grid-wise time-series extraction. File system operations and metadata handling were performed using standard Python libraries (os, sys, datetime). All code is provided as .py files on GitHub.

The overall execution of the analysis is controlled by the script master_analysis.py, which serves as the main driver of the framework. This script defines the temporal range of the analysis, manages the input and output directories, and iterates over individual days and hours of observations. By editing master_analysis.py, users specify key settings such as the frame rate (FPS = 56.7), pixel grid size (PIXEL_SIZE = 6), grid dimensions (6 × 6), prominence thresholds, maximum pulsation duration, and duplicate-handling logic, along with the location of the EMCCD TIFF and log files and the desired output paths. It also controls whether individual pulsation plots or time-series plots are generated, and it aggregates hourly results into Pulsating_Aurora_event_list.txt, which is archived on Zenodo. For each confirmed event, the standardized metadata include the event start and end times (green dashed lines in Figure 1b, d, f), pulsation duration (purple dashed lines), peak intensity value (red cross), contributing spatial regions, and associated source file and frame identifiers. The core detection algorithm is implemented in detection.py, which is called by the main driver script (i.e.
master_analysis.py) for each observation interval. This script implements the primary signal-processing steps. Peak detection was based on a Savitzky-Golay-filtered signal, with the primary identification criterion being a minimum peak prominence relative to the local background. A normal prominence threshold of 80 digital counts was used for standard conditions, while a reduced prominence threshold of 30 digital counts was applied in cases where the background intensity was elevated. To avoid false detections due to diffuse auroral emissions, peaks with baseline intensities exceeding a high-base threshold of 900 digital counts were excluded. The maximum allowed pulsation duration was constrained to 90 s, and a minimum ratio of full width at half-maximum (FWHM) to prominence of 0.15 was required to ensure temporal coherence consistent with pulsating aurora behavior. The script also contains the de-duplication algorithm that merges spatially and temporally overlapping detections.

For each observation hour, this module sequentially loads the four corresponding TIFF files and concatenates them in time to create a continuous image sequence. Frames are indexed using the timing information extracted from the accompanying log file, enabling accurate conversion between frame number and UTC time. Intensity time series for each grid cell are constructed in parallel to improve computational efficiency. No image projection or background subtraction is applied at this stage. The module then applies spatial segmentation to the imager field of view and constructs region-specific intensity time series that form the basis for pulsation detection.

Supporting analytical routines are provided by analyzer.py, which contains utility functions for spatial grid generation, intensity extraction, and time-series handling. To capture the spatially localized nature of pulsating aurora, each image frame is divided into a 6 × 6 grid of small, evenly distributed regions across the field of view.
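The segmentation step can be sketched as follows. This is a minimal illustration rather than the released analyzer.py: the function names are hypothetical, the placement of the box centres (evenly spaced across a 128 × 128 frame) is an assumption, and outlier rejection and parallel execution are omitted. Regions are returned in row-major order starting at the top-left corner.

```python
import numpy as np

PIXEL_SIZE = 6  # side length of each square region in pixels
GRID_DIM = 6    # 6 x 6 regions, giving 36 regions in total


def region_slices(frame_shape=(128, 128), grid_dim=GRID_DIM, box=PIXEL_SIZE):
    """Return (row_slice, col_slice) pairs in row-major order from the top-left.

    Box centres are spaced evenly across the frame. This placement is an
    assumption and may differ from the layout used in the released package.
    """
    ny, nx = frame_shape
    y_centres = ((np.arange(grid_dim) + 0.5) * ny / grid_dim).astype(int)
    x_centres = ((np.arange(grid_dim) + 0.5) * nx / grid_dim).astype(int)
    half = box // 2
    slices = []
    for yc in y_centres:
        for xc in x_centres:
            y0, x0 = int(yc) - half, int(xc) - half
            slices.append((slice(y0, y0 + box), slice(x0, x0 + box)))
    return slices


def region_time_series(frames, slices):
    """Return mean intensity per frame and region.

    frames has shape (n_frames, ny, nx); the result has shape
    (n_frames, n_regions).
    """
    frames = np.asarray(frames, dtype=np.float64)
    series = np.empty((frames.shape[0], len(slices)))
    for k, (rows, cols) in enumerate(slices):
        series[:, k] = frames[:, rows, cols].mean(axis=(1, 2))
    return series
```

Each column of the returned array is one region's intensity time series, which is the input expected by a time-domain pulse detector.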
The regions are indexed from 0 to 35, numbered horizontally from left to right, starting at the top-left corner, as shown in Figure S1 in the Supplementary Material. Figure S1 also describes the procedure used to reject outlier pixels prior to computing the mean intensity for each frame. These outliers include background stars, transient moving objects such as meteors or satellites, and sporadic instrumental artifacts that do not represent true auroral emissions.

Candidate pulsating aurora intervals are identified through time-domain analysis routines implemented in detection.py. The detection approach evaluates temporal characteristics such as intensity enhancement, pulse duration, repetition, and temporal separation, without imposing strict frequency-domain constraints. To account for detections occurring across multiple spatial regions or overlapping time intervals, the framework applies an event consolidation procedure within the detection module. Redundant detections are merged into unique pulsation events based on temporal and spatial proximity.

Diagnostic visualization and manual validation are supported by detection_viewer.py, which generates time-series plots and annotated image sequences corresponding to detected pulsation intervals. These diagnostic products are used to visually verify automated detections and assess detection quality during both development and application of the framework. A schematic flowchart detailing the steps of the algorithm is included as Figure S2 in the Supplementary Material.

The detection framework relies on a limited set of empirically selected parameters, including the peak-prominence threshold used for pulse identification, the high-base-intensity cutoff applied to exclude diffuse aurora, the maximum allowed pulsation duration, the minimum ratio of full width at half-maximum (FWHM) to peak prominence, and the spatial grid resolution used for time-series extraction.
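To make the role of these parameters concrete, the sketch below applies them to a single region's intensity time series using SciPy. It is an illustration under stated assumptions, not the released detection.py: the function name, the smoothing window and polynomial order, the use of the base-level peak width as the pulse duration, and the choice to measure FWHM in frames when forming the FWHM-to-prominence ratio are all hypothetical. The logic for switching to the reduced prominence threshold and the event consolidation step are omitted.

```python
import numpy as np
from scipy.signal import find_peaks, peak_widths, savgol_filter

FPS = 56.7              # frames per second
PROMINENCE = 80.0       # normal prominence threshold, digital counts
HIGH_BASE = 900.0       # baseline cutoff used to reject diffuse aurora, counts
MAX_DURATION_S = 90.0   # maximum allowed pulsation duration, seconds
MIN_WIDTH_RATIO = 0.15  # minimum FWHM / prominence (FWHM in frames: ASSUMPTION)


def detect_pulses(series, fps=FPS, prominence=PROMINENCE, high_base=HIGH_BASE,
                  max_duration_s=MAX_DURATION_S, min_width_ratio=MIN_WIDTH_RATIO,
                  window=57, polyorder=3):
    """Return candidate pulses in one region's intensity time series.

    window and polyorder are hypothetical smoothing settings (about 1 s of
    data at 56.7 fps); the series must contain at least `window` samples.
    """
    series = np.asarray(series, dtype=float)
    smooth = savgol_filter(series, window_length=window, polyorder=polyorder)

    # Keep peaks whose prominence above the local background meets the threshold.
    peaks, props = find_peaks(smooth, prominence=prominence)
    if peaks.size == 0:
        return []

    prom_data = (props["prominences"], props["left_bases"], props["right_bases"])
    # Width at half prominence (FWHM) and at the base level, both in frames.
    fwhm = peak_widths(smooth, peaks, rel_height=0.5, prominence_data=prom_data)[0]
    _, _, left, right = peak_widths(smooth, peaks, rel_height=1.0,
                                    prominence_data=prom_data)

    events = []
    for i, p in enumerate(peaks):
        prom = props["prominences"][i]
        base = smooth[p] - prom  # intensity level from which the pulse rises
        duration_s = (right[i] - left[i]) / fps
        if base > high_base:
            continue  # bright background: treated as diffuse aurora
        if duration_s > max_duration_s:
            continue  # too long to count as a single pulsation
        if fwhm[i] / prom < min_width_ratio:
            continue  # too narrow relative to its height
        events.append({
            "peak_frame": int(p),
            "peak_time_s": p / fps,
            "peak_intensity": float(smooth[p]),
            "prominence": float(prom),
            "duration_s": float(duration_s),
        })
    return events
```

Passing `prominence=30.0` would correspond to the reduced threshold quoted for elevated backgrounds; how the pipeline decides when to apply it is not specified here.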
To assess the robustness of the method, we performed sensitivity tests by varying each parameter independently within physically reasonable ranges while holding the others fixed. We find that the overall occurrence statistics, spatial distribution, and characteristic pulsation timescales remain stable under moderate parameter variations, with changes primarily affecting the total number of detected pulses rather than their qualitative properties. Among these parameters, the absolute intensity-based thresholds (e.g., peak prominence and base-intensity cutoff) are instrument- and dataset-dependent, as they are influenced by camera sensitivity, background noise characteristics, and observing conditions. In contrast, the temporal criteria, such as the allowable pulsation duration and the FWHM-to-prominence ratio, are tied to the intrinsic temporal morphology of pulsating aurora and are therefore expected to be broadly applicable across different all-sky imaging systems. Similarly, the spatial grid resolution may be adjusted to match the native resolution of a given imager without altering the underlying detection logic. These considerations indicate that while certain threshold values must be tuned for individual instruments, the core detection methodology is robust and transferable.

Figure 1 shows three examples of pulsating aurora events detected by the automated algorithm. Panel (a) shows a two-dimensional all-sky imager frame at the indicated UT time. The overlaid boxes represent the predefined analysis regions used to compute average intensities. Red boxes denote regions without pulsating aurora or without peak pulsation activity, while the white box highlights the region in which the algorithm identifies the maximum pulsating auroral intensity across the entire imager field of view. In this example, a bright red patch corresponding to pulsating aurora is detected, with its maximum intensity occurring in region 15 (white box).
Panel (b) shows the corresponding grid-averaged intensity-versus-time profile for this region and illustrates the key pulsation parameters identified by the algorithm. Panel (c) presents a second example of pulsating aurora, characterized by a streak-like feature across the imager field of view. This feature is detected in region 9 (white box), and the corresponding intensity-versus-time plot in panel (d) shows a sharp, short-duration pulse. Panel (e) shows a third and particularly interesting example. Although a bright auroral patch appears near the center of the imager frame and represents the brightest feature in the scene, it is relatively steady and does not exhibit pulsations. In contrast, a weaker pulsating aurora is detected in region 8 (white box). The algorithm successfully identifies this lower-intensity pulsation despite the presence of a brighter, non-pulsating feature in the same frame, demonstrating the robustness of the region-based detection approach. The corresponding intensity-versus-time profile for this event is shown in panel (f). A video of these pulsating aurora events is available on Zenodo.

Figure 1: Examples of detected pulsation events and corresponding time-intensity signatures. Panels (a), (c), and (e) show two-dimensional all-sky image frames at the indicated UT times, when the PA was at its peak. Red boxes mark regions without pulsating aurora or peak pulsation activity, while the white box denotes the region where the algorithm identifies the peak pulsating aurora across the entire imager field of view. Panels (b), (d), and (f) show the corresponding average intensity time series for the white boxes (Regions 15, 9, and 8, respectively), where pulsating aurora is detected.

Figure 2 presents the histogram of the daily number of detected pulsating aurora events over the analysis interval.
In total, 10,567 pulsating aurora events were identified by our automated algorithm. The missing dates in the histogram correspond to days when no pulsations were observed. The event counts vary by more than three orders of magnitude, with particularly high occurrence rates on 25 December 2013 and several days in early January 2014. To display both high- and low-activity days clearly, the y-axis is shown on a logarithmic scale. Days with low event counts, such as 30-31 December 2013 and 13 January 2014, are still clearly resolved, highlighting the strong day-to-day variability in pulsating aurora occurrence during this period. A detailed scientific analysis of these events will be presented in a forthcoming paper. In the present study, we release the complete list of detected events.

Figure 2: Histogram of the daily count of detected pulsating aurora events. Missing dates indicate days on which no pulsations were observed.

In this work, we identify PAs from relative intensity variations rather than from the absolute count values. Even within the narrow field-of-view system, optical distortions arising from lens curvature introduce spatially dependent shifts in intensity and localization, affecting the precise identification of peak auroral emissions across the image, including within the central approximately 4-degree field of view. These distortions become increasingly pronounced toward the edges of the imager and influence both the measured intensity and the inferred spatial structure of PAs. For this reason, the current detection framework applies region-dependent thresholds when dividing the imager observations into multiple spatial grids, accounting for center-to-edge variations in sensitivity and distortion. In future work, we will apply flat-field calibration procedures to correct for center-to-edge sensitivity variations in the auroral imager.
This calibration will enable more accurate spatial mapping and improve the robustness of pulsation event characterization across the entire field of view. At present, identical auroral intensities can appear weaker toward the edges of the imager (e.g., an intensity of 1.0 at the center may appear as 0.5 at the edge despite representing the same physical brightness). Flat-field correction will normalize these variations by appropriately scaling edge intensities, thereby producing a uniform response across the full field of view. This uniformity will allow us to apply a single, consistent prominence threshold across the entire imager, rather than adjusting detection parameters spatially, as was required in the current implementation. These corrections will enhance the reliability of the automated detection framework and support its application to long-duration datasets and future auroral imaging campaigns. Adding flat-field corrections will also make these algorithms more readily applicable to data from larger-FOV imaging systems, such as all-sky cameras, and to filtered imager data, where absolute photometry measurements are possible.

The pulsating aurora event database presented in this study enables a broad range of investigations spanning the magnetosphere, ionosphere, and upper atmosphere. Progress in understanding PAs has historically been constrained by the absence of large, objectively identified event catalogs. This database addresses that limitation by providing automatically detected, time-resolved PA intervals derived from ground-based all-sky imager observations. Key scientific applications of this database include:

• Statistical studies of PA occurrence rates, spatial distribution, and temporal characteristics as functions of magnetic local time, geomagnetic activity, background auroral conditions, and solar wind driving (e.g., Lessard, 2012; Jones et al., 2011; Grono & Donovan, 2019).
• Coordinated studies combining ground-based optical observations with in situ measurements from missions such as the Van Allen Probes, THEMIS, MMS, and low-Earth-orbiting satellites to investigate the role of whistler-mode chorus waves in modulating electron precipitation and producing quasi-periodic auroral luminosity variations, as well as to examine the energy-dependent response and spatial localization of precipitating electrons (e.g., Nishimura et al., 2010; Miyoshi et al., 2015, 2020, 2021; Kasahara et al., 2018; Hosokawa et al., 2020; Tesema et al., 2020).
• Investigations of the ionospheric and thermospheric consequences of PA-related particle precipitation, including enhancements in E-region ionization, changes in ionospheric conductivity, thermospheric heating, and nitric oxide production (e.g., Miyoshi et al., 2015; Bland et al., 2021; Ivarsen et al., 2025).
• A statistically robust observational benchmark for testing and validating numerical models of electron precipitation, wave-particle interactions, and ionospheric responses. Model outputs can be quantitatively compared with large populations of observed PA events, enabling improved constraints on physical mechanisms and parameterizations used in simulations (e.g., Kasahara et al., 2018; Miyoshi et al., 2020; Kong et al., 2025).

The database presented in this study is derived from a single ground-based optical imager located in Alaska and covers a two-month interval during the northern winter. Consequently, the dataset cannot provide a statistically comprehensive representation of pulsating aurora occurrence across seasons, across the varying geomagnetic conditions associated with solar cycle variability, or over extended spatial scales.
Overall, continued application and expansion of this PA database will facilitate systematic exploitation of growing auroral imaging archives and significantly advance understanding of the physical processes governing pulsating aurora and energetic electron precipitation.

MP is supported by the MMS project at NASA Goddard Space Flight Center through cooperative agreement 80NSSC21M0180, Partnership for Heliophysics and Space Environment Research (PHaSER). JH is supported by the Goddard Space Club Scholars Program.

Keywords: Automated algorithm, Ground Based Imaging, pulsating aurora, Python-based, Space weather

Received: 15 Jan 2026; Accepted: 13 Feb 2026.

Copyright: © 2026 Pandya, Hui, Michell, Samara, Halford, Lee, Le, Mirizio and Fok. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Megha Pandya

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.