<?xml version="1.0" encoding="utf-8"?>
    <rss version="2.0">
      <channel xmlns:content="http://purl.org/rss/1.0/modules/content/">
        <title>Frontiers in Big Data | Data Mining and Management section | New and Recent Articles</title>
        <link>https://www.frontiersin.org/journals/big-data/sections/data-mining-and-management</link>
        <description>RSS Feed for Data Mining and Management section in the Frontiers in Big Data journal | New and Recent Articles</description>
        <language>en-us</language>
        <generator>Frontiers Feed Generator,version:1</generator>
        <pubDate>02 May 2026 22:15:07 +0000</pubDate>
        <ttl>60</ttl>
        <item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fdata.2026.1811110</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fdata.2026.1811110</link>
        <title><![CDATA[A reinforcement learning-guided interpretable method for postoperative sepsis prediction with Hilbert-Schmidt Independence Criterion]]></title>
        <pubDate>07 Apr 2026 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Kunhua Zhong</author><author>Han Chen</author><author>Qilong Sun</author><author>Peng Wang</author><author>Zhenbei Liu</author><author>Yuwen Chen</author>
        <description><![CDATA[Background: Sepsis is a major cause of postoperative morbidity and mortality, and early risk stratification from perioperative electronic health records (EHR) is a representative large-scale, high-dimensional data processing problem that requires models to be accurate, efficient, and clinically interpretable. However, many existing sepsis prediction methods operate as black boxes and rely on extensive temporal monitoring streams, which increases feature dimensionality and computation while limiting transparency. Methods: We propose a reinforcement learning-guided, interpretable feature engineering framework for postoperative sepsis prediction that targets scalable learning on heterogeneous perioperative data. Within an Actor-Critic formulation, feature selection is treated as an action: an Actor network produces a stochastic feature mask over preoperative static variables and intraoperative statistical summaries, while a Critic network performs downstream prediction using a self-attention-based classifier. To benchmark and stabilize learning, we introduce an auxiliary baseline model that incorporates intraoperative temporal signals extracted by a temporal convolutional network (TCN) and regularized using the Hilbert-Schmidt Independence Criterion (HSIC) to encourage non-redundant representations between statistical and temporal feature views. The Actor is optimized to achieve predictive performance comparable to the baseline while using a reduced feature set, improving computational efficiency and supporting instance-level interpretability. Results: Experiments on a real-world surgical cohort from Southwest Hospital (2014-2018) demonstrate that the proposed framework attains performance comparable to or better than competitive machine learning baselines while selecting fewer input features. On this dataset, our method achieved perfect scores of 1.00 for F1-score, sensitivity, and specificity. Conclusion: The proposed method accurately predicts the occurrence of postoperative sepsis and provides effective instance-level post hoc explanations. These findings offer a novel perspective for postoperative sepsis prediction.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fdata.2026.1814157</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fdata.2026.1814157</link>
        <title><![CDATA[A disease potential-driven graph attention model for comorbidity risk prediction of hypertension]]></title>
        <pubDate>02 Apr 2026 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Leming Zhou</author><author>Hanshu Qin</author><author>Yanmei Yang</author><author>Gang Huang</author><author>Zhigang Liu</author>
        <description><![CDATA[Hypertension is associated with an increased risk of serious complications. However, current methods for predicting comorbidity risks face the challenge that purely data-driven prediction may lead to clinically implausible associations and reduce model interpretability. In addition, capturing fused patient features and identifying differences among patients to facilitate risk prediction remains an open problem. To overcome these challenges, we propose a Disease Potential-Driven Graph Attention (DP-GA) model for comorbidity risk prediction of hypertension, built on three ideas: (a) constructing a fusion mechanism that correlates patients' disease features with structural information, thus integrating feature attention and structural attention effectively; (b) introducing a similarity-difference balance mechanism to further identify the relationships among patients; and (c) designing a disease potential-driven attention mechanism to calculate disease potential and construct masks, thus preserving the effective associations from high-risk patients to low-risk patients. Experimental results demonstrate that the proposed DP-GA model achieves a significant improvement in comorbidity risk prediction for patients with hypertension across three comorbidity datasets collected by the research group, compared with both baseline and state-of-the-art peer methods. We also analyze the comorbidity network to predict the risk of hypertension comorbidity, thereby improving interpretability and early prediction of such comorbidities.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fdata.2026.1779935</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fdata.2026.1779935</link>
        <title><![CDATA[GFTrans: an on-the-fly static analysis framework for code performance profiling]]></title>
        <pubDate>27 Feb 2026 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Jie Li</author><author>Yunbao Wen</author><author>Jingxin Liu</author><author>Biqing Zeng</author><author>Seyedali Mirjalili</author>
        <description><![CDATA[Improving software efficiency is crucial for maintenance, but pinpointing runtime bottlenecks becomes increasingly difficult as systems expand. Traditional dynamic profiling tools require full build-execution cycles, creating significant latency that impedes agile development. To address this, we introduce GFTrans, a static analysis framework that predicts C program performance without execution. GFTrans utilizes a Transformer architecture with a novel “anchor-based embedding” technique to integrate control flow and data dependencies into a unified sequence. Additionally, a dynamic gating mechanism fuses these semantic representations with 16 handcrafted statistical features to comprehensively capture code complexity. Evaluated on a dataset of real-world GitHub C functions with high-precision runtime labels, GFTrans outperforms baseline models such as Random Forest and Code2Vec, achieving 78.64% accuracy. The system identifies potential bottlenecks in milliseconds, enabling developers to perform optimization effectively during the coding phase.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fdata.2026.1782461</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fdata.2026.1782461</link>
        <title><![CDATA[A genetic algorithm-based framework for online sparse feature selection in data streams]]></title>
        <pubDate>09 Feb 2026 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Guanyu Liu</author><author>Jinhang Liu</author><author>Guifan He</author><author>Yifan Liu</author><author>Huabo Bai</author><author>Min Zhou</author>
        <description><![CDATA[High-dimensional streaming data implementations commonly utilize online streaming feature selection (OSFS) techniques. In practice, however, incomplete data due to equipment failures and technical constraints often poses a significant challenge. Online Sparse Streaming Feature Selection (OS2FS) tackles this issue by performing missing data imputation via latent factor analysis. Nevertheless, existing OS2FS approaches exhibit considerable limitations in feature evaluation, resulting in degraded performance. To address these shortcomings, this paper introduces a novel genetic algorithm-based online sparse streaming feature selection (GA-OS2FS) framework for data streams, which integrates two key innovations: (1) imputation of missing values using a latent factor analysis model, and (2) application of a genetic algorithm to assess feature importance. Comprehensive experiments conducted on six real-world datasets show that GA-OS2FS surpasses state-of-the-art OSFS and OS2FS methods, consistently attaining higher accuracy through the selection of optimal feature subsets.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fdata.2026.1775728</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fdata.2026.1775728</link>
        <title><![CDATA[Adaptive core-enhanced latent factor model for highly accurate QoS prediction]]></title>
        <pubDate>02 Feb 2026 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Siqi Ai</author><author>Peixin Li</author><author>Hao Fang</author><author>Yonghui Xia</author>
        <description><![CDATA[Accurate prediction of Quality of Service (QoS) plays a crucial role in service recommendation and selection across large-scale distributed environments. Latent factor (LF) models have become a mainstream solution for QoS prediction owing to their simplicity and scalability, yet typical formulations struggle to capture complex latent interactions and usually rely on manually tuned regularization, which often limits prediction accuracy. To address these challenges, we propose an Adaptive Core-Enhanced Latent Factor (ACELF) model that integrates a learnable core interaction mechanism with an incremental Proportional-Integral-Derivative (PID)-driven adaptive regularization strategy. Specifically, a learnable core interaction matrix is introduced to model interactions between latent user and service factors, enabling richer representation learning beyond standard bilinear assumptions. To further enhance robustness, we design an incremental PID controller that dynamically adjusts the regularization coefficient of the core interaction matrix according to the training dynamics, allowing the optimization process to automatically balance model expressiveness and overfitting. Extensive experiments on real-world QoS datasets demonstrate that ACELF consistently outperforms several state-of-the-art methods in terms of prediction accuracy.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fdata.2025.1753871</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fdata.2025.1753871</link>
        <title><![CDATA[Deep learning-enabled hybrid systems for accurate recognition of text in seal images]]></title>
        <pubDate>14 Jan 2026 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Keke Zhang</author><author>Mingyu Guan</author><author>Chao Wu</author><author>Yutong Li</author><author>Qingguo Lü</author><author>Yi Liu</author><author>Yi Wang</author><author>Wei Wang</author><author>Wei Zhang</author>
        <description><![CDATA[Chinese seals are widely used in various fields within Chinese society as a tool for certifying legal documents. However, recognizing text on these seals presents challenges due to background text, high noise levels, and minimalistic image features. This paper introduces a hybrid model to address these difficulties in Chinese seal text recognition. Our model integrates preprocessing techniques tailored for real seals, a deep learning-based position correction model, a circular text unwrapping model, and OCR text recognition. First, we apply a color-based method to effectively remove the black background text on seals, eliminating redundant information while retaining crucial features for further analysis. Next, we introduce an innovative image denoising algorithm to significantly improve the system's robustness in processing noisy seal images. Additionally, we develop a deep learning-based angle prediction network and create synthetic datasets that mimic real seal scenes, enabling optimal seal image positioning for enhanced text flattening and recognition, thus boosting overall system performance. Finally, polar coordinate transformation is employed to convert the circular seal into a rectangular image for more efficient text recognition. Experimental results indicate that our proposed methods effectively enhance the accuracy of seal text recognition.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fdata.2025.1745751</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fdata.2025.1745751</link>
        <title><![CDATA[Time series forecasting for bug resolution using machine learning and deep learning models]]></title>
        <pubDate>19 Dec 2025 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Lerina Aversano</author><author>Martina Iammarino</author><author>Antonella Madau</author><author>Fabiano Pecorelli</author>
        <description><![CDATA[Predicting bug fix times is a key objective for improving software maintenance and supporting planning in open source projects. In this study, we evaluate the effectiveness of different time series forecasting models applied to real-world data from multiple repositories, comparing local (one model per project) and global (a single model trained across multiple projects) approaches. We considered classical models (Naive, Linear Regression, Random Forest) and neural networks (MLP, LSTM, GRU), with global extensions including Random Forest and LSTM with project embeddings. The results highlight that, at the local level, Random Forest achieves lower errors and better classification metrics than deep learning models in several cases. However, global models show greater robustness and generalizability: in particular, the global Random Forest significantly reduces the mean error and maintains high performance in terms of accuracy and F1 score, while the global LSTM captures temporal dependencies and provides additional insights into cross-project dynamics. The explainable AI techniques adopted (permutation importance, saliency maps, and embedding analysis) allow us to interpret the main drivers of forecasts, confirming the role of process variables and temporal characteristics. Overall, the study demonstrates that an integrated approach, combining classical models and deep learning in a global perspective, offers more reliable and interpretable forecasts to support software maintenance.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fdata.2025.1704189</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fdata.2025.1704189</link>
        <title><![CDATA[M-PSGP: a momentum-based proximal scaled gradient projection algorithm for nonsmooth optimization with application to image deblurring]]></title>
        <pubDate>24 Nov 2025 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Kexin Ning</author><author>Qingguo Lü</author><author>Xiaofeng Liao</author>
        <description><![CDATA[In this study, we focus on investigating a nonsmooth convex optimization problem involving the l1-norm under a non-negative constraint, with the goal of developing an inverse-problem solver for image deblurring. Research focused on solving this problem has garnered extensive attention and has had a significant impact on the field of image processing. However, existing optimization algorithms often suffer from overfitting and slow convergence, particularly when working with ill-conditioned data or noise. To address these challenges, we propose a momentum-based proximal scaled gradient projection (M-PSGP) algorithm. The M-PSGP algorithm, which is based on the proximal operator and scaled gradient projection (SGP) algorithm, integrates an improved Barzilai-Borwein-like step-size selection rule and a unified momentum acceleration framework to achieve a balance between performance optimization and convergence rate. Numerical experiments demonstrate the superiority of the M-PSGP algorithm over several seminal algorithms in image deblurring tasks, highlighting the significance of our improved step-size strategy and momentum-acceleration framework in enhancing convergence properties.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fdata.2025.1680669</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fdata.2025.1680669</link>
        <title><![CDATA[Research on optimization of personalized recommendation method based on RFMQ model— taking outdoor sports products in cross-border e-commerce as an example]]></title>
        <pubDate>14 Oct 2025 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Qianlan Chen</author><author>Chupeng Chen</author><author>Zubai Jiang</author><author>Chaoling Li</author><author>Yangxizi Tan</author><author>Niannian Li</author><author>Bolin Zhou</author><author>Bingxian Yang</author>
        <description><![CDATA[With the rapid development of the global digital economy, cross-border e-commerce has emerged and developed rapidly, becoming a crucial bridge connecting global markets. This research focuses on the cross-border e-commerce sector of outdoor sports products. In response to common problems in this field, such as “information overload” and “insufficient recommendation accuracy,” a personalized recommendation optimization framework integrating customer value segmentation and collaborative filtering is proposed. Based on the classic RFM model, a purchase quantity indicator (Quantity) is introduced to construct the RFMQ model, thereby more comprehensively characterizing user behavior. Customer value stratification is then achieved using the indicator segmentation method and the K-means clustering algorithm, and a differentiated collaborative filtering recommendation mechanism is designed for the segmented groups. A five-fold cross-validation experiment shows that the proposed method significantly outperforms the traditional collaborative filtering model in the Top-N recommendation task. Specifically, when the number of recommended products is between 3 and 7, the RFMQ recommendation model based on indicator segmentation performs best in terms of F1 score (for example, when Top-N = 5, the F1 score increases from 0.1709 to 0.3093), and the method based on K-means clustering also shows a stable improvement (with an F1 score of 0.267 at the same setting). The results indicate that the indicator segmentation method has a significant advantage in scenarios with smaller recommendation quantities. This study verifies the effectiveness of the RFMQ model in customer segmentation and recommendation performance optimization, providing an operational solution for e-commerce platforms to implement precision marketing and enhance user stickiness and commercial competitiveness, particularly in low-cost, high-efficiency personalized recommendation scenarios for small and medium-sized enterprises.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fdata.2025.1609124</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fdata.2025.1609124</link>
        <title><![CDATA[OCT-SelfNet: a self-supervised framework with multi-source datasets for generalized retinal disease detection]]></title>
        <pubDate>29 Jul 2025 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Fatema-E Jannat</author><author>Sina Gholami</author><author>Minhaj Nur Alam</author><author>Hamed Tabkhi</author>
        <description><![CDATA[Introduction: In the medical AI field, there is a significant gap between advances in AI technology and the challenge of applying locally trained models to diverse patient populations. This is mainly due to the limited availability of labeled medical image data, driven by privacy concerns. To address this, we have developed a self-supervised machine learning framework for detecting eye diseases from optical coherence tomography (OCT) images, aiming to achieve generalized learning while minimizing the need for large labeled datasets. Methods: Our framework, OCT-SelfNet, effectively addresses the challenge of data scarcity by integrating diverse datasets from multiple sources, ensuring a comprehensive representation of eye diseases. Employing a robust two-phase training strategy (self-supervised pre-training with unlabeled data followed by a supervised training stage), we utilized the power of a masked autoencoder built on the SwinV2 backbone. Results: Extensive experiments were conducted across three datasets with varying encoder backbones, assessing scenarios including the absence of self-supervised pre-training, the absence of data fusion, low data availability, and unseen data to evaluate the efficacy of our methodology. OCT-SelfNet outperformed the baseline models (ResNet-50, ViT) in most cases. Additionally, when tested for cross-dataset generalization, OCT-SelfNet surpassed the performance of the baseline models, further demonstrating its strong generalization ability. An ablation study revealed significant improvements attributable to self-supervised pre-training and data fusion. Discussion: Our findings suggest that the OCT-SelfNet framework is highly promising for real-world clinical deployment in detecting eye diseases from OCT images, demonstrating the effectiveness of our two-phase training approach and the use of a masked autoencoder based on the SwinV2 backbone. Our work bridges the gap between basic research and clinical application, significantly enhancing the framework's domain adaptation and generalization capabilities in detecting eye diseases.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fdata.2025.1624507</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fdata.2025.1624507</link>
        <title><![CDATA[Navigating the microarray landscape: a comprehensive review of feature selection techniques and their applications]]></title>
        <pubDate>10 Jul 2025 00:00:00 +0000</pubDate>
        <category>Review</category>
        <author>Fangling Wang</author><author>Azlan Mohd Zain</author><author>Yanjie Ren</author><author>Mahadi Bahari</author><author>Azurah A. Samah</author><author>Zuraini Binti Ali Shah</author><author>Norfadzlan Bin Yusup</author><author>Rozita Abdul Jalil</author><author>Azizah Mohamad</author><author>Nurulhuda Firdaus Mohd Azmi</author>
        <description><![CDATA[This review systematically summarizes recent advances in microarray feature selection techniques and their applications in biomedical research. It addresses the challenges posed by the high dimensionality and noise of microarray data, aiming to integrate the strengths and limitations of various methods while exploring their applicability across different scenarios. By identifying gaps in current research, highlighting underexplored areas, and proposing clear directions for future studies, this review seeks to inspire academics to develop novel techniques and applications. Furthermore, it provides a comprehensive evaluation of feature selection methods, offering both a theoretical foundation and practical guidance to help researchers select the most suitable approaches for their specific research questions. Emphasizing the importance of interdisciplinary collaboration, the study underscores the potential of feature selection in transformative applications such as personalized medicine, cancer diagnosis, and drug discovery. In this way, the review provides not only in-depth theoretical support for the academic community but also practical guidance for practitioners, contributing significantly to the overall improvement of microarray data analysis.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fdata.2025.1603106</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fdata.2025.1603106</link>
        <title><![CDATA[Conceptual design of a decision knowledge service model integrating a multi-agent supply relationship diagram for electric power emergency equipment]]></title>
        <pubDate>06 Jun 2025 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Jiandong Si</author><author>Chang Liu</author><author>Jingxian Ye</author><author>Jianfeng Wu</author><author>Jianguo Wang</author><author>Kairui Hu</author><author>Chunhua Ju</author><author>Qianwen Cao</author>
        <description><![CDATA[Introduction: Decisions regarding the supply of emergency equipment for power emergencies require timeliness, efficiency, and accuracy. The multi-agent supply relationship graph, based on complex data fusion, enables the comprehensive exploration of interconnections among key entities in power emergency supplies. Methods: This approach enhances decision-making efficiency and quality by uncovering multiple relationships among the entities involved. The present study focuses on the decision-making process for power emergency equipment supply and aims to enhance its professionalization. To achieve this goal, multi-modal data on power emergency equipment supply is collected from both internal and external power enterprises. Subsequently, a decision support knowledge base is established, along with a four-dimensional relationship graph that integrates events, time, equipment, and suppliers based on the knowledge graph. This enables the mining of multidimensional relationships among the entities involved. Finally, supported by the graph, the platform can offer intelligent assistance in decision-making, supplier recommendation, and optimization of emergency equipment scheduling for electric power supply, providing effective information and guidance for decision-making in electric power emergency equipment supply. Results: A comparative analysis shows that the knowledge graph-based decision support system proposed in this study demonstrates superior effectiveness and precision. By integrating the four-dimensional relationship graph with data mining algorithms, precise decision support can be provided for power emergency response. In case-study verification, the model was used to recommend suppliers of power emergency equipment, and the recommendations aligned closely with actual procurement outcomes. Conclusion and recommendation: The system proposed in this study delivers multidimensional knowledge guidance and optimized decision pathways for emergency supply management.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fdata.2025.1600267</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fdata.2025.1600267</link>
        <title><![CDATA[Sliding window based rare partial periodic pattern mining algorithms over temporal data streams]]></title>
        <pubDate>04 Jun 2025 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>K. Jyothi Upadhya</author><author>Ronan Lobo</author><author>Mini Shail Chhabra</author><author>Aman Paleja</author><author>B. Dinesh Rao</author><author>Geetha M.</author><author>Prachi Sisodia</author><author>Bolusani Akshita Reddy</author>
        <description><![CDATA[Periodic pattern mining, a branch of data mining, is expanding to provide insight into the occurrence behavior of large volumes of data. Recently, a variety of industries, including fraud detection, telecommunications, retail marketing, research, and medicine, have found applications for rare association rule mining, which uncovers unusual or unexpected combinations. Only a limited amount of literature has demonstrated how periodicity is essential in mining low-support rare patterns. In addition, attention must be paid to temporal datasets, which carry crucial information about the timing of pattern occurrences, and to stream datasets, which must handle high-speed streaming data. Several algorithms have been developed that effectively track the cyclic behavior of patterns and identify patterns that display complete or partial periodic behavior in temporal datasets, and numerous frameworks have been created to examine the periodic behavior of streaming data. Nevertheless, no method has yet been proposed that focuses on the temporal information in the data stream and extracts rare partial periodic patterns. With a focus on identifying rare partial periodic patterns from temporal data streams, this paper proposes two novel sliding window-based single-scan approaches called R3PStreamSW-Growth and R3PStreamSW-BitVectorMiner. The findings showed that on the dense Accidents dataset, across different threshold variations, R3PStreamSW-BitVectorMiner outperformed R3PStreamSW-Growth by about 93%. Similarly, on the sparse T10I4D100K dataset, R3PStreamSW-BitVectorMiner exhibits a 90% boost in performance. This demonstrates that on a range of synthetic, real-world, sparse, and dense datasets at different thresholds, R3PStreamSW-BitVectorMiner is significantly faster than R3PStreamSW-Growth.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fdata.2025.1582619</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fdata.2025.1582619</link>
        <title><![CDATA[Erratum: Edge-level multi-constraint graph pattern matching with lung cancer knowledge graph]]></title>
        <pubDate>04 Mar 2025 00:00:00 +0000</pubDate>
        <category>Erratum</category>
        <author>Frontiers Production Office</author>
        <description></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fdata.2025.1508087</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fdata.2025.1508087</link>
        <title><![CDATA[Cloud computing convergence: integrating computer applications and information management for enhanced efficiency]]></title>
        <pubDate>19 Feb 2025 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Guo Zhang</author>
        <description><![CDATA[This study examines the transformative impact of cloud computing on the integration of computer applications and information management systems to improve operational efficiency. Grounded in a robust methodological framework, the research employs experimental testing and comparative data analysis to assess the performance of an information management system within a cloud computing environment. Data was meticulously collected and analyzed, highlighting a threshold where user demand surpasses 400, leading to a stabilization in CPU utilization at an optimal level and maintaining subsystem response times consistently below 5 s. This comprehensive evaluation underscores the significant advantages of cloud computing, demonstrating its capacity to optimize the synergy between computer applications and information management. The findings not only contribute to theoretical advancements in the field but also offer actionable insights for organizations seeking to enhance efficiency through effective cloud-based solutions.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fdata.2025.1546850</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fdata.2025.1546850</link>
        <title><![CDATA[Edge-level multi-constraint graph pattern matching with lung cancer knowledge graph]]></title>
        <pubdate>2025-02-10T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Houdie Tu</author><author>Lei Li</author><author>Zhenchao Tao</author><author>Zan Zhang</author>
        <description><![CDATA[Introduction: Traditional Graph Pattern Matching (GPM) research focuses mainly on improving the accuracy and efficiency of complex network analysis and fast subgraph retrieval. Although these methods can return subgraphs quickly and accurately, their applicability to medical data remains limited. Methods: To overcome this limitation, building on existing GPM research with the lung cancer knowledge graph, this paper introduces the Monte Carlo method and proposes TEM, an edge-level multi-constraint graph pattern matching algorithm for the lung cancer knowledge graph. Furthermore, we apply the Monte Carlo method to both nodes and edges and propose THM, a multi-constraint hologram pattern matching algorithm for the lung cancer knowledge graph. Results: Experiments verify the effectiveness and efficiency of the TEM algorithm. Discussion: The method effectively addresses the uncertainty inherent in the lung cancer knowledge graph and is significantly more efficient than existing algorithms.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fdata.2025.1563730</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fdata.2025.1563730</link>
        <title><![CDATA[Editorial: Visualizing big culture and history data]]></title>
        <pubdate>2025-02-04T00:00:00Z</pubdate>
        <category>Editorial</category>
        <author>Florian Windhager</author><author>Steffen Koch</author><author>Sander Münster</author><author>Eva Mayr</author>
        <description></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fdata.2024.1437580</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fdata.2024.1437580</link>
        <title><![CDATA[TSPDB: a curated resource of tailspike proteins with potential applications in phage research]]></title>
        <pubdate>2024-11-27T00:00:00Z</pubdate>
        <category>Data Report</category>
        <author>Opeyemi U. Lawal</author><author>Lawrence Goodridge</author>
        <description></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fdata.2024.1444634</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fdata.2024.1444634</link>
        <title><![CDATA[Efficient out-of-distribution detection via layer-adaptive scoring and early stopping]]></title>
        <pubdate>2024-11-20T00:00:00Z</pubdate>
        <category>Methods</category>
        <author>Haoliang Wang</author><author>Chen Zhao</author><author>Feng Chen</author>
        <description><![CDATA[Introduction: Multi-layer aggregation is key to the success of out-of-distribution (OOD) detection in deep neural networks. Moreover, in real-time systems, the efficiency of OOD detection is as important as its effectiveness. Methods: We propose a novel early-stopping OOD detection framework for deep neural networks. By attaching multiple OOD detectors to intermediate layers, the framework can detect OOD inputs early, saving computational cost. Through a layer-adaptive scoring function, it adaptively selects the optimal layer for each OOD input based on its complexity, thereby improving detection accuracy. Results: Extensive experiments demonstrate that the proposed framework is robust against OOD inputs of varying complexity. Adopting the early-stopping strategy increases OOD detection efficiency by up to 99.1% while maintaining superior accuracy. Discussion: OOD inputs of varying complexity are best detected at different layers, and leveraging the intrinsic characteristics of inputs encoded in the intermediate latent space is important for achieving high OOD detection accuracy. The proposed framework, incorporating early stopping, significantly enhances OOD detection efficiency without compromising accuracy, making it practical for real-time applications.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fdata.2024.1427104</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fdata.2024.1427104</link>
        <title><![CDATA[ActiveReach: an active learning framework for approximate reachability query answering in large-scale graphs]]></title>
        <pubdate>2024-11-19T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Zohreh Raghebi</author><author>Farnoush Banaei-Kashani</author>
        <description><![CDATA[With a graph reachability query, one can determine whether a path exists between two query vertices in a given graph. Existing reachability query processing solutions use traditional reachability index structures and compute only exact answers, which can take a long time to resolve in large graphs. In contrast, an approximate reachability query offers a compromise by enabling users to strike a trade-off between query time and result accuracy. In this study, we propose ActiveReach, a framework for learning index structures to answer approximate reachability queries. ActiveReach is a two-phase framework that embeds nodes in a reachability space. In the first phase, we leverage node attributes and positional information to create reachability-aware embeddings for each node. These embeddings are then used as node attributes in the second phase, where we incorporate the new attributes and include reachability information as labels in the training data to generate embeddings in a reachability space. Because computing reachability for all training data may not be practical, selecting a subset of data for which to compute reachability, so as to enhance reachability prediction performance, is challenging. ActiveReach addresses this challenge by employing an active learning approach in the second phase to selectively compute reachability for a subset of node pairs, thus learning approximate reachability for the entire graph. An extensive experimental study on various real-world attributed large-scale graphs demonstrates the effectiveness of each component of the framework.]]></description>
      </item>
      </channel>
    </rss>