AUTHOR=Liu Siwei , Wang Jingjing , Li Ming , Cui Yanmei , Guo Juan , Shi Yurong , Luo Bingxian , Liu Siqing TITLE=A selective up-sampling method applied upon unbalanced data for flare prediction: potential to improve model performance JOURNAL=Frontiers in Astronomy and Space Sciences VOLUME=Volume 10 - 2023 YEAR=2023 URL=https://www.frontiersin.org/journals/astronomy-and-space-sciences/articles/10.3389/fspas.2023.1082694 DOI=10.3389/fspas.2023.1082694 ISSN=2296-987X ABSTRACT=The SHARP parameters have been widely used to develop flare prediction models. The relatively small number of strong-flare events leads to an unbalanced dataset; prediction models can be sensitive to this imbalance, which may introduce bias and limit performance. In this study, we adopted the logistic regression algorithm to develop a flare prediction model for the next 48 hours based on the SHARP parameters. The model was trained with five different inputs. The first is the original unbalanced dataset; the second and third were obtained from the original dataset by two widely used sampling methods; the fourth is the original dataset paired with a weighted classifier. Based on the distribution properties of strong-flare occurrence related to the SHARP parameters, we established a new selective up-sampling method and applied it to the mixed-up region (the region of confusing distribution that contains both strong-flare and non-strong-flare events): the method picks up the flare-related samples there and adds small random values to them, creating a large number of flare-related samples that are very close to the ground truth. We thus obtained the fifth, balanced dataset, aiming to 1) promote the forecast capability in the mixed-up region and 2) increase the robustness of the model. 
We compared the performance of the resulting models and found that the selective up-sampling method has the potential to improve strong-flare prediction: its F1 score reaches $0.5501\pm0.1200$, which is approximately $22\%$ to $33\%$ higher than that of the other imbalance mitigation schemes.
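
The selective up-sampling idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not define the mixed-up region precisely, so the sketch assumes it is the axis-aligned box in feature space where the value ranges of the two classes overlap. The function name `selective_upsample`, the `noise_scale` parameter, the Gaussian form of the "small random values", and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np


def selective_upsample(X, y, noise_scale=0.01, seed=0):
    """Balance a binary dataset by jittering positives from the overlap region.

    Assumption (not from the paper): the "mixed-up region" is the box where
    the per-feature ranges of the positive and negative classes overlap.
    """
    rng = np.random.default_rng(seed)
    pos, neg = X[y == 1], X[y == 0]

    # Per-feature bounds of the assumed mixed-up region.
    lower = np.maximum(pos.min(axis=0), neg.min(axis=0))
    upper = np.minimum(pos.max(axis=0), neg.max(axis=0))

    # Positive (flare-related) samples that fall inside the region.
    in_region = np.all((pos >= lower) & (pos <= upper), axis=1)
    candidates = pos[in_region]

    # Number of synthetic positives needed to match the negative count.
    n_new = len(neg) - len(pos)
    if n_new <= 0 or len(candidates) == 0:
        return X.copy(), y.copy()

    # Draw candidates with replacement and add small random values,
    # scaled by each feature's standard deviation (an assumed choice).
    picks = candidates[rng.integers(0, len(candidates), size=n_new)]
    jitter = rng.normal(0.0, noise_scale * X.std(axis=0), size=picks.shape)

    X_bal = np.vstack([X, picks + jitter])
    y_bal = np.concatenate([y, np.ones(n_new, dtype=y.dtype)])
    return X_bal, y_bal


# Synthetic stand-in for SHARP-like features: 200 negatives, 20 positives.
data_rng = np.random.default_rng(1)
X = np.vstack([data_rng.normal(0.0, 1.0, size=(200, 3)),
               data_rng.normal(1.0, 1.0, size=(20, 3))])
y = np.concatenate([np.zeros(200, dtype=int), np.ones(20, dtype=int)])

X_bal, y_bal = selective_upsample(X, y)
```

The original samples are kept unchanged and only synthetic positives are appended, so the balanced set can be passed to any classifier, such as the logistic regression model mentioned in the abstract, without altering the evaluation data.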