<?xml version="1.0" encoding="utf-8"?>
    <rss version="2.0">
      <channel xmlns:content="http://purl.org/rss/1.0/modules/content/">
        <title>Frontiers in Big Data | New and Recent Articles</title>
        <link>https://www.frontiersin.org/journals/big-data</link>
        <description>RSS Feed for Frontiers in Big Data | New and Recent Articles</description>
        <language>en-us</language>
        <generator>Frontiers Feed Generator,version:1</generator>
        <pubDate>2026-06-21T12:23:19.764+00:00</pubDate>
        <ttl>60</ttl>
        <item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fdata.2026.1687969</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fdata.2026.1687969</link>
        <title><![CDATA[Deep learning model to predict COPD hospital admissions based on meteorological data: a medical meteorological forecast]]></title>
        <pubdate>2026-06-18T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Lei Zhang</author><author>Mingjie Zhang</author><author>Jinghong Zhang</author><author>Yajie Zhang</author><author>Tian Xie</author><author>Yipeng Ding</author><author>Shuyuan Chu</author><author>Haihong Wu</author>
        <description><![CDATA[Background: Chronic obstructive pulmonary disease (COPD) has placed a substantial health burden on the world. Meteorological conditions are associated with hospital admissions for COPD. In this study, we aim to develop a model of medical meteorological forecasting for COPD hospital admissions. Methods: A predictive model was developed using a Long Short-Term Memory (LSTM) algorithm applied to time series data on COPD hospital admissions and meteorological conditions. Data were collected daily from 25 September 2016 to 26 December 2020. Performance of the model was assessed using the mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), and R². The association between the risk of COPD hospital admissions and meteorological conditions was assessed using a conditional logistic regression analysis and a conditional Poisson regression analysis in a time-stratified case-crossover design. Results: A total of 17,555 hospital admissions for COPD from 1 January 2017 to 31 December 2019 were included in the final LSTM model. Regarding the performance of the LSTM model, the MSE was 0.028, the RMSE was 0.167, the MAE was 0.134, and R² was 0.416. Regression analysis revealed that the maximum temperature was positively associated with COPD hospital admissions. Conclusion: The LSTM model offers potential for medical meteorological forecasting to predict COPD hospital admissions among the general population according to the local climate. Higher maximum temperature may be a risk factor for COPD hospital admissions.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fdata.2026.1857064</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fdata.2026.1857064</link>
        <title><![CDATA[Where diverse populations gather: transit accessibility and the spatial structure of social mixing]]></title>
        <pubdate>2026-06-18T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Yuan Liao</author>
        <description><![CDATA[Urban venues serve as arenas for social mixing, yet less is known about how public transit infrastructure shapes the geography of mixing at specific locations. This study examines how transit catchment diversity—the socioeconomic heterogeneity of populations reachable by public transit—associates with visitor diversity at points of interest (POIs) in nine Swedish and three US cities. Using mobile phone GPS traces and aggregated foot traffic data from 2024, we compute visitor diversity indices based on visitors' home-neighborhood birth-background composition and employ spatial regression models and geographically weighted regression (GWR). Transit catchment diversity positively predicts visitor diversity across nearly all cities, but this association is robust only in the largest metropolitan areas; in smaller cities, the coefficient attenuates to insignificance once geographic catchment composition, centrality, and venue density are controlled. Spatial spillovers in visitor diversity follow general geographic proximity rather than shared transit-stop connectivity, suggesting that the association operates through catchment population composition rather than station-level linkages. Transit–diversity hotspots occur not in already-diverse venues, but in lower-diversity POIs with lower commercial density, greater distance from transit in US cities, and greater centrality in Sweden. These patterns are consistent with transit-accessible population composition being associated with visitor diversity, particularly where alternative pathways to diverse co-presence are limited.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fdata.2026.1826953</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fdata.2026.1826953</link>
        <title><![CDATA[Inner layer security reinforcement for instant payment systems: a dual layer encryption-steganography evaluation in Brunei's digital payment context]]></title>
        <pubdate>2026-06-17T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Ampuan Shazani bin Ampuan Haji Sadikin</author><author>Heru Susanto</author>
        <description><![CDATA[Introduction: The rapid adoption of real-time digital payment systems introduces cybersecurity risks that extend beyond technical vulnerabilities to include significant human and organizational factors. Methods: This study evaluates a dual-layer data protection mechanism combining AES-128 encryption with spread spectrum audio steganography within Brunei's digital payment context. Stakeholder interviews with Cyber Security Brunei (CSB), National Digital Payments Network (NDPx), and a local bank were conducted alongside experimental testing using Stripe Sandbox. Results: Human factor issues accounted for 47% of identified cybersecurity concerns. Experimental results demonstrated that spread spectrum steganography achieved an 87.5% robustness rate across eight attack scenarios while maintaining near real-time performance with an average processing time of 568.82 ms and acceptable audio quality (PSNR 26.30 dB). Sandbox validation confirmed feasibility within realistic payment workflows. Discussion: The findings support data-centric security as a compensating control in human-dominated threat environments and demonstrate the viability of combining encryption and steganography to reinforce instant payment security.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fdata.2026.1838191</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fdata.2026.1838191</link>
        <title><![CDATA[Measuring the impact of virtualization and containerization on the environment when using GPUs for processing the AI models]]></title>
        <pubdate>2026-06-16T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Safaa Hriez</author><author>Mohammad Haikal</author>
        <description><![CDATA[Introduction: The rapid growth of artificial intelligence (AI) has significantly increased computing demand, intensifying the operational strain on the computing environment. While virtualization and containerization are established technologies for resource optimization, their comparative energy efficiency and environmental impact, particularly under GPU-accelerated AI workloads, are not well quantified. Methods: This study evaluates the energy consumption and environmental impact of virtualization and containerization technologies when using Graphics Processing Units (GPUs) for AI model execution. Employing a computer vision benchmark, the performance, GPU resource utilization, and power consumption were measured. The experiment involved training a DenseNet-121 model on the MNIST dataset within a VirtualBox virtual machine and a Docker container environment. Results: The analysis indicates that containerization consistently surpasses virtualization in energy efficiency. Specifically, the Docker container configuration demonstrated an approximately 21.6% reduction in total energy consumption, and a corresponding reduction in carbon dioxide (CO2) emissions, compared to a VirtualBox virtual machine. Furthermore, containerization exhibited lower average and peak GPU utilization and power consumption. Discussion: These findings demonstrate that containerization offers a more energy-efficient and environmentally sustainable approach than VirtualBox virtualization for the specific GPU-enabled AI workload evaluated in this study. Statistical significance testing indicates that the observed performance differentials are significant, supporting the validity of the results within the experimental scope of this work. Conclusion: Implementing containerization in this experimental setup may reduce energy consumption and environmental impact without compromising computational performance.
Future studies should extend these analyses to larger neural network models, diverse AI workloads, and heterogeneous GPU platforms to enhance the generalizability of these findings beyond the current single-system experimental configuration.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fdata.2026.1835663</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fdata.2026.1835663</link>
        <title><![CDATA[Using artificial intelligence to improve governance and public services in Africa]]></title>
        <pubdate>2026-06-15T00:00:00Z</pubdate>
        <category>Mini Review</category>
        <author>David Mhlanga</author>
        <description><![CDATA[African governments continue to face persistent challenges in delivering efficient, transparent, and inclusive public services due to institutional constraints, rapid population growth, and fragmented administrative systems. At the same time, accelerating digital transformation and expanding data ecosystems create new opportunities for governance reform. This study examines the role of Artificial Intelligence (AI) in enhancing public sector performance across welfare targeting, healthcare delivery, tax administration, and urban governance. Drawing on a structured narrative literature review, the paper develops a conceptual framework that conceptualizes governance outcomes as a function of data availability, AI capability, institutional capacity, and human oversight. The findings suggest that AI can improve service delivery by enabling predictive decision-making, reducing administrative inefficiencies, and enhancing targeting accuracy. However, these benefits depend on the alignment between technological adoption and institutional readiness, as weak governance systems may amplify risks such as bias, exclusion, and accountability gaps. The study concludes that AI must be embedded in inclusive, context-sensitive governance strategies to support sustainable development outcomes in Africa.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fdata.2026.1736939</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fdata.2026.1736939</link>
        <title><![CDATA[Case count metric for comparative analysis of entity resolution results]]></title>
        <pubdate>2026-06-11T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>John R. Talburt</author><author>Muzakkiruddin Ahmed Mohammed</author><author>Mert Can Cakmak</author><author>Onais Khan Mohammed</author><author>Mahboob Khan Mohammed</author><author>Khizer Syed</author><author>Leon Claassens</author>
        <description><![CDATA[Introduction: Entity resolution (ER) systems often produce different clustering outcomes when applied to the same dataset, especially when parameters, algorithms, or system configurations change. However, in many real-world settings, the true linking structure is unknown, making traditional accuracy-based evaluation difficult. Methods: This paper presents the Case Count Metric System (CCMS), a process and software system for comparing two cluster ER outcomes without requiring a truth set. CCMS classifies how each cluster from the first ER process is transformed by the second process into four mutually exclusive cases: unchanged, merged, partitioned, or overlapping. Results: CCMS produces aggregate case counts, singleton summaries, and per-cluster transformation details to support diagnostic analysis. Example applications using synthetic demographic data and an industrial materials dataset show that CCMS can identify how clustering outcomes change under parameter adjustments and alternative ER systems. Discussion: CCMS provides a practical and interpretable method for comparing ER clustering results when labeled ground truth is unavailable. By distinguishing between over-linking, under-linking, and more complex cluster reorganizations, CCMS offers more actionable insight than single-value similarity measures and supports both research analysis and operational ER evaluation.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fdata.2026.1752468</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fdata.2026.1752468</link>
        <title><![CDATA[Data field theory: a geometric framework for learning on Riemannian manifolds with synthetic validation and limitation analysis]]></title>
        <pubdate>2026-06-11T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Mohammadreza Nehzati</author>
        <description><![CDATA[Introduction: Conventional machine learning treats learning as parameter optimization, lacking a first-principles framework for phenomena like criticality, generalization, and causal structure. We introduce Data Field Theory (DFT), a mathematical framework modelling learning as the evolution of a data field governed by stochastic partial differential equations on Riemannian manifolds. This work aims to validate DFT's core predictions in settings where its geometric assumptions hold, while honestly assessing its empirical limitations. Methods: We formulate learning as a field φ: M × ℝ_≥0 → ℝ^k evolving on a spherical manifold. To test DFT, we implement a hierarchical classification task using synthetic data drawn from von Mises-Fisher distributions, ensuring a match with the manifold geometry. We derive four key predictions: (1) critical exponents near concept formation, (2) a spectral robustness law linking eigengaps to out-of-distribution (OOD) error, (3) finite-speed causal propagation from hyperbolic regularization, and (4) approximate rotational equivariance via a Ward identity. We also conduct a preliminary real-data experiment projecting MNIST digits onto the sphere. Results: Synthetic experiments validate all four predictions: (1) Correlation length diverges as ξ(t) ~ |t − t_c|^(−ν) with ν = 0.63 ± 0.04, accompanied by 1/f fluctuations; (2) OOD generalization error scales as ϵ_OOD ∝ m_gap^(−2) (ρ = −0.78, p < 10^(−6)); (3) Causal propagation speed c_eff = 0.98 ± 0.03 (theory maximum c_max = 1.0) under hyperbolic regularization; and (4) Ward identity residual R = 0.0032 ± 0.0008 converging as R ∝ h^1.02. However, on real-world MNIST-sphere data, DFT achieves only 15.7% accuracy versus 51.7% for k-NN, revealing critical limitations. Discussion: DFT successfully predicts emergent phenomena (criticality, spectral robustness, bounded causality, and approximate equivariance) under ideal geometric conditions, supporting its theoretical validity.
The poor real-data performance highlights key gaps: the current framework lacks adaptive metric learning, noise robustness, and hierarchical feature extraction present in real images. These results establish DFT as a principled mathematical foundation for learning as field dynamics while clearly delineating necessary extensions for practical applicability.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fdata.2026.1883246</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fdata.2026.1883246</link>
        <title><![CDATA[Correction: Explainable gradient convolutional vector fuzzy pattern analysis based on ensemble model for facial expression recognition]]></title>
        <pubdate>2026-06-10T00:00:00Z</pubdate>
        <category>Correction</category>
        <author>Lakshmi Sarvani Videla</author><author>Babu Reddy Mukamalla</author>
        <description></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fdata.2026.1825213</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fdata.2026.1825213</link>
        <title><![CDATA[When uncertainty guides learning: a highly effective approach to kidney disease classification in CT imaging]]></title>
        <pubdate>2026-06-09T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Muslima Akter</author><author>Fahmid Al Farid</author><author>Md Yousuf Ahmad</author><author>Md Azad Hossain Raju</author><author>Sowad Rahman</author><author>Jia Uddin</author><author>Hezerul Bin Abdul Karim</author>
        <description><![CDATA[The high cost of expert annotations significantly hinders the advancement of deep learning models for clinical medical imaging. This work introduces an efficient entropy-based active learning framework that achieves outstanding classification performance for renal abnormalities (Normal, Cyst, Stone, Tumor) in CT scans while requiring only a minimal amount of labeled data. The dataset comprises 12,446 CT slices split 70/15/15 into training (8,716), validation (1,865), and test (1,865) partitions via stratified sampling. Starting with only 200 randomly selected images and employing predictive entropy for uncertainty sampling on a pretrained ResNet-50 backbone, the proposed method attains 99.71% ± 0.25% mean test accuracy (95% CI: [99.30, 99.94]) across five independent runs after just six query cycles on the standard 12,446-image CT kidney dataset. Our method uses only 2,000 labeled training images, representing 22.9% of the 8,716-image training partition (a 77.1% reduction in required annotations relative to full supervision of the training set). This performance matches or exceeds prior fully supervised methods trained on the complete labeled training partition while demonstrating substantially improved sample efficiency, particularly in early annotation cycles where entropy-guided selection converges significantly faster than random sampling. Statistical testing across five repeated runs confirms that results are stable (Shapiro-Wilk p = 0.148). The framework exhibits exceptional sample efficiency as described by an empirically fitted power-law curve with a fitted exponent of 1.2, and empirically observed uncertainty decay with a rate of 0.92. These results offer both practical insights into annotation efficiency and substantial application value in the medical imaging domain.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fdata.2026.1785710</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fdata.2026.1785710</link>
        <title><![CDATA[Democratizing cloud data lake analytics: natural language access to Apache Iceberg via LLM agents]]></title>
        <pubdate>2026-06-08T00:00:00Z</pubdate>
        <category>Technology and Code</category>
        <author>Vipin Kataria</author><author>Nitin Kumar</author>
        <description><![CDATA[Business analysts and non-technical users need insights from enterprise data lakes but lack SQL expertise to query them directly. While large language models (LLMs) can translate natural language to SQL, existing text-to-SQL approaches face critical limitations: severe SQL injection vulnerabilities, inability to leverage data-lake-specific features like time-travel queries, and inconsistent metric definitions across organizations. We present the LangChain Iceberg Toolkit, enabling users to query Apache Iceberg data lakes through natural language conversations with LLM agents, no SQL knowledge required. Users ask questions in plain English (e.g., “What was revenue last quarter?”), and the system automatically: (1) interprets intent using LLMs, (2) selects appropriate tools from a YAML-based semantic layer mapping business terms to data structures, (3) executes queries through a hybrid architecture combining PyIceberg's type-safe API (for security) with DuckDB's SQL engine (for complex analytics), and (4) returns formatted answers with business context. Our evaluation demonstrates 100% success across 100 systematically designed queries leveraging semantic layer integration for consistent metric definitions. Critically, in direct comparison against a schema-aware text-to-SQL baseline on the same query set, our system achieves a 33 percentage-point accuracy improvement (100% vs. 67%) while reducing SQL injection attack success rate from the 99% reported in prior text-to-SQL research to 0% across both execution paths. End-to-end query latency averages 2.6 seconds on 15.1M records, with partition pruning eliminating 90%+ of scanned data files. The hybrid execution architecture prevents SQL injection vulnerabilities through type-safe query construction for simple queries and controlled, pre-validated SQL execution for complex analytics. 
Users receive data insights through conversational interfaces without writing SQL, understanding schemas, or knowing technical implementation details. We provide a production-ready, open-source implementation demonstrating practical viability for democratizing enterprise data access.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fdata.2026.1796969</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fdata.2026.1796969</link>
        <title><![CDATA[TCMB: cross-model multi-level cross-attention network with Taylor-based loss for multimodal fake news detection]]></title>
        <pubdate>2026-06-05T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Santosh Kumar Banbhrani</author>
        <description><![CDATA[Introduction: The rapid spread of misinformation across social media platforms, websites, and online communication channels has made fake news detection a critical task in the digital era. Although various computational approaches have been developed to identify fake news, many existing methods suffer from limitations such as biased training datasets and high rates of false positives and false negatives. To address these challenges, this study proposes a Multimodal Cross Attention Network with Taylor-based Cross Entropy Mean Bias (MMCN_TCMB) model for detecting multimodal fake news. Methods: The proposed approach utilizes multimodal inputs consisting of textual and visual content obtained from fake news datasets. The textual information in news posts is first tokenized using Bidirectional Encoder Representations from Transformers (BERT). Feature extraction is then performed using Word2Vec and Term Frequency–Inverse Gravity Moment (TF-IGM). Simultaneously, images associated with news posts undergo preprocessing through Contrast Limited Adaptive Histogram Equalization and Histogram Equalization (CLAHE-HE), followed by feature extraction using ResNet. The extracted textual and visual features are combined and processed through the MMCN framework. The learning mechanism of the network is enhanced using the Taylor-based Cross Entropy Mean Bias (TCMB) loss function to improve classification performance. Results: Experimental results demonstrate that the proposed MMCN_TCMB model achieves superior performance in multimodal fake news detection. The model attains a recall of 97.988%, precision of 96.223%, F1-score of 97.098%, and overall accuracy of 97.436%, outperforming existing methods. Discussion: The findings indicate that integrating multimodal feature extraction with cross-attention mechanisms and the TCMB loss function significantly enhances the reliability and accuracy of fake news detection.
The proposed framework effectively captures both textual and visual inconsistencies, making it a promising approach for combating misinformation in modern digital platforms. The code is available at: https://github.com/banbhrani84/MMCN_TCMB-Fake-News-.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fdata.2026.1813265</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fdata.2026.1813265</link>
        <title><![CDATA[Interpretable intrusion detection for IoT: a CNN-BiLSTM permutation importance framework for deep feature selection]]></title>
        <pubdate>2026-05-22T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Ibrahim Al-Shibly</author><author>Llorenç Burgas</author><author>Joaquim Massana</author>
        <description><![CDATA[Industrial intrusion detection systems (IDS) in Industrial Internet of Things (IIoT) environments have to address the problem of handling multi-feature temporally correlated network traffic and dynamic changes in attack patterns. Traditional filter-based feature selection methods, like Mutual Information (MI), only consider individual feature performance and may not be effective in dealing with non-linear feature dependencies. This may degrade detection performance, especially in class-imbalanced problems. To mitigate such challenges, this paper proposes a deep feature selection (DFS) framework that utilizes a hybrid Convolutional Neural Network (CNN) and Bidirectional Long Short-Term Memory (BiLSTM) model. The proposed framework assesses the importance of native features using permutation importance. In the proposed framework, the CNN model detects local features in the data, whereas the BiLSTM model detects bidirectional temporal features in the data. The importance of features is computed by assessing the performance degradation of the model using time-aware perturbations on individual features. These identified features that are most relevant are then used to train lightweight traditional machine learning models like decision tree, K-nearest neighbor (KNN), logistic regression, naïve Bayes, and random forest. This makes it easy to deploy in resource-constrained IIoT environments. The approach is tested on the CIC IIoT 2025 dataset. From the experimental results, it is clear that the CNN-BiLSTM DFS framework improves recall and F1-score compared to other feature selection approaches like MI. This is especially true in imbalanced settings. The decoupling of feature selection from offline and edge-side inference provides a balance between detection accuracy, robustness, and deployability in real-world IIoT settings.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fdata.2026.1829960</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fdata.2026.1829960</link>
        <title><![CDATA[KATENA: a verifiable governance architecture for encrypted cloud storage systems]]></title>
        <pubdate>2026-05-15T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Jesús F. Rodríguez-Aragón</author><author>Carolina Zato</author><author>Francisco Pinto-Santos</author><author>Lorena Sánchez-Pravos</author>
        <description><![CDATA[Modern data-intensive infrastructures increasingly rely on cloud storage and client-controlled encryption to protect the confidentiality of outsourced information. However, while encryption prevents providers from accessing plaintext data, governance operations such as sharing, revocation, and policy updates typically remain opaque to users and auditors. This creates a structural gap between strong data confidentiality and verifiable governance in cloud environments that manage large volumes of sensitive information. This paper introduces KATENA (Key Architecture for Trustworthy Encrypted Networked Archives), an architectural model that enables client-verifiable governance in encrypted cloud storage systems. The proposed approach combines hierarchical key orchestration, transparency-based governance logging, and cryptographically verifiable governance artifacts so that clients can independently validate governance events without relying on provider-side trust. By integrating accountability mechanisms directly into encrypted storage architectures, the work provides a governance-by-design framework that bridges client-controlled encryption with verifiable data governance in modern data-intensive cloud systems.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fdata.2026.1799073</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fdata.2026.1799073</link>
        <title><![CDATA[The role of statistical methods and artificial intelligence in inventory management for manufacturing industries: a systematic literature review]]></title>
        <pubdate>2026-05-15T00:00:00Z</pubdate>
        <category>Systematic Review</category>
        <author>Arvia Dwi Royani</author><author>Mahfud Sholihin</author><author>Dewi Dewi</author><author>Novika Novika</author><author>Annisa Sorayya</author><author>Wahyu Nur Hanifah</author><author>Rizki Ramadhani Arif Trilana</author><author>Paolina Buton</author>
        <description><![CDATA[Inventory management is a critical business process that affects the operational efficiency and competitiveness of manufacturing companies. Inaccurate inventory decisions can result in significant financial losses for companies. Demand variability poses a challenge in determining inventory levels, requiring more sophisticated, flexible forecasting methods. This study was conducted to examine the roles of statistical methods and Artificial Intelligence (AI) in inventory decision-making in the manufacturing industry, analyze the conditions under which each method is suitable, and evaluate the potential of a hybrid approach integrating statistical methods and AI. This study uses the Systematic Literature Review method with the PRISMA 2020 framework to ensure research transparency and accuracy. This study identifies articles from reputable databases indexed in Scopus. The findings show a significant shift in inventory management research. In the last decade, AI technology has dominated the literature at 62.5%, while statistical methods account for 25%, and hybrid methods have begun to emerge but remain limited to 12.5%. Based on the review of selected papers, statistical methods have proven to remain effective for consistent historical data and stable demand patterns. Conversely, in dynamic operational environments with large-scale data and complex nonlinear patterns, AI technology is superior. This study also found that the hybrid approach has great potential to balance accuracy, interpretability, and decision support, although the relevant literature remains limited. The implementation of technology in the manufacturing industry faces several obstacles, including limited data quality, a skills gap in technology, and the black-box nature of complex AI. 
This review provides a systematic and critical synthesis of methodological patterns and operational fit in the use of statistical, AI, and hybrid methods for manufacturing inventory management. Future research is recommended to focus on the development of interpretable AI, modular hybrid frameworks, and the use of real industry data to ensure that academic innovations can be applied in the manufacturing industry.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fdata.2026.1769948</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fdata.2026.1769948</link>
        <title><![CDATA[Quantifying energy and accuracy trade-offs of federated learning on wearable health devices]]></title>
        <pubdate>2026-05-15T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Rupaak S.</author><author>Ganesh Khekare</author><author>Yash Kumar</author><author>Gaurav Soni</author>
        <description><![CDATA[The rapid development of wearable health tools has made it possible to continuously monitor physiological conditions for preventive care. However, stringent privacy laws, including HIPAA and GDPR, require decentralized methods such as federated learning (FL) to safeguard personal patient information. Empirical profiling in this paper finds that typical FL implementations suffer from a serious performance trilemma: a naive federated model attains 35.3% energy savings (3.84 vs. 5.93 kJ for the centralized model), but at the cost of an accuracy penalty of 13.87 percentage points (84.94% vs. 98.81% for the centralized model). This shortfall is largely due to the on-device computational load of 4.24 MFLOPs per training sample, which creates a “straggler” bottleneck that increases the total training duration to 1,066.26 s, almost 70 times longer than centralized training. To address this, the hybrid hierarchical federated split learning (H-FedSL) architecture strategically splits the neural network at a cut layer to divide the workload between wearables and nearby edge servers. The framework offloads the heavy, deep-layer computations to the edge server, leaves shallow feature extraction at the point of operation, and sends only privacy-sensitive abstractions of the smashed data, rather than raw signals. The integration of asynchronous protocols will help manage device heterogeneity and resource-aware client selection, with the aim that H-FedSL restores the gold-standard accuracy of 98.81% while retaining the 35.3% energy savings of the federated model. Thus, a technically and economically feasible pathway will be provided for deploying medical-grade AI on resource-constrained Internet of Medical Things (IoMT) devices.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fdata.2026.1762571</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fdata.2026.1762571</link>
        <title><![CDATA[Definitional ambiguity in cognitive warfare: a critical and systematic conceptual review through ideal-type analysis]]></title>
        <pubdate>2026-05-15T00:00:00Z</pubdate>
        <category>Systematic Review</category>
        <author>Per-Erik Nilsson</author><author>Andreas Haga</author><author>Kristina Hellström</author>
        <description><![CDATA[Cognitive warfare is a relatively new concept in both military and academic discourse. The article's purpose is to advance conceptual clarity regarding cognitive warfare and to support future policy-oriented and academic research that strengthens the field's conceptual and methodological foundations, understood here as the broader domain of communication and defense studies concerned with informational and cognitive forms of contestation. This article examines how the notion is conceptualized within the emerging body of research, drawing on a systematic literature review. With support from LLM-assisted analysis, the study employs an exploratory methodology to identify both conceptual commonalities and points of divergence. The review indicates that cognitive warfare remains an underdeveloped research field, characterized by broad assumptions and limited scientific rigor. While the concept may represent a reframing of long-standing practices, it may also serve a political function by drawing renewed attention to forms of influence and conflict that have been overshadowed in recent decades. The article concludes by outlining avenues for future interdisciplinary research, emphasizing the need for conceptual clarity, empirical operationalization, and a more nuanced understanding of how adversaries themselves articulate and employ cognitive warfare.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fdata.2026.1817120</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fdata.2026.1817120</link>
        <title><![CDATA[Cheatomaly: weakly supervised video anomaly ranking for exam cheating detection using vision transformers]]></title>
        <pubdate>2026-05-12T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>El Mehdi Alaoui Mrani</author><author>Anas Bouayad</author><author>Khalid Fardousse</author>
        <description><![CDATA[Detecting cheating in classroom examinations is challenging because suspicious behaviors are often subtle, temporally sparse, and context-dependent. To address the lack of dedicated benchmarks for this setting, we introduce Cheatomaly, a curated video dataset assembled from publicly available classroom examination material and annotated to support weakly supervised anomaly detection. We formulate cheating detection as a weakly supervised video anomaly ranking task using Multiple Instance Learning (MIL) with Vision Transformer features. Videos are divided into temporal segments, and segment-level representations are built using mean pooling and a Mean, Standard Deviation, and Temporal Difference (MSD) formulation. A margin-based ranking objective is used to prioritize anomalous videos and suspicious temporal segments using only video-level labels during training. Experimental results on Cheatomaly show strong video-level discrimination and meaningful frame-level localization across repeated runs. Ablation, baseline, statistical, and sensitivity analyses indicate that temporal aggregation affects the trade-off between ranking and localization but does not produce consistent statistically significant gains. Overall, Cheatomaly provides a realistic benchmark for studying subtle cheating-related anomalies in classroom examinations, and the results highlight that the main challenge lies in modeling context-dependent temporal behavior rather than feature aggregation alone.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fdata.2026.1768571</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fdata.2026.1768571</link>
        <title><![CDATA[Optimizing large-scale graph database ingestion through edge value ranking: a proposed framework]]></title>
        <pubdate>2026-05-11T00:00:00Z</pubdate>
        <category>Hypothesis and Theory</category>
        <author>Phanindra Reddy Madduru</author><author>Bijo Thomas</author>
        <description><![CDATA[This paper proposes a preprocessing framework for optimizing large-scale graph database ingestion through intelligent edge filtering based on value ranking. We combine adapted PageRank algorithms with business-specific metrics and edge type importance to evaluate and rank edges, enabling selective retention of high-value relationships. The framework introduces three PageRank variants (maximum weight normalization, weighted average, and log-based normalization) with type-specific business value normalization to handle heterogeneous graphs. Current graph database ingestion approaches struggle with scale: loading 6.2TB of data (38 billion objects) requires over 3 weeks, forcing organizations to limit historical data retention. Our approach addresses this through preprocessing-stage filtering before database ingestion. While requiring experimental validation, preliminary analysis suggests potential for 40%–80% data volume reduction depending on graph characteristics, with corresponding improvements in loading efficiency and storage costs. The paper details the theoretical framework, computational complexity analysis, formal property preservation guarantees, and comprehensive validation methodology. This work represents a novel direction in graph database optimization: value-based preprocessing rather than runtime query optimization.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fdata.2026.1764468</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fdata.2026.1764468</link>
        <title><![CDATA[Fairness across domains: a unified fairness-aware framework for domain generalization and unsupervised adaptation]]></title>
        <pubdate>2026-05-11T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Kai Jiang</author><author>Chen Zhao</author><author>Haoliang Wang</author><author>Xintao Wu</author><author>Latifur Khan</author><author>Christan Grant</author><author>Feng Chen</author>
        <description><![CDATA[Fairness in machine learning remains a critical challenge, particularly in the presence of domain shift. We propose a unified fairness-aware framework for both domain generalization (DG) and unsupervised domain adaptation (UDA), which jointly addresses domain shift and sensitive-attribute bias through disentangled representation learning. The framework disentangles content, style, and sensitive factors, and uses them to generate augmented samples that reduce bias while maintaining predictive reliability. Extensive experiments on four datasets demonstrate that the proposed method achieves state-of-the-art performance in both DG and UDA settings. Moreover, it yields a stronger balance between classification accuracy and fairness across diverse domains and sensitive subgroups. By incorporating unlabeled target-domain data, our framework extends prior fairness-aware approaches that were limited to DG and provides new insight into fairness-aware learning under unsupervised adaptation. Overall, this work offers a practical step toward scalable and robust fairness-aware learning in multi-domain environments.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/fdata.2026.1821612</guid>
        <link>https://www.frontiersin.org/articles/10.3389/fdata.2026.1821612</link>
        <title><![CDATA[A longitudinal multimodal big data infrastructure for precision poultry monitoring]]></title>
        <pubdate>2026-05-11T00:00:00Z</pubdate>
        <category>Methods</category>
        <author>Daniel Essien</author><author>Yashan Dhaliwal</author><author>Suresh Neethirajan</author>
        <description><![CDATA[Livestock systems are increasingly instrumented with heterogeneous sensors, yet the resulting data remain fragmented, short-lived, and rarely documented as integrated infrastructures. This gap limits the development of robust multimodal artificial intelligence under real production conditions. Here we present a longitudinal multimodal data infrastructure for poultry monitoring, spanning 22 consecutive weeks across five commercial-style barns. The dataset combines continuous RGB video (1080p, 30 fps), continuous audio (48 kHz), periodic radiometric thermal imaging, and twice-daily environmental measurements, yielding 10.2 terabytes of temporally heterogeneous data. Rather than focusing on a specific predictive task, the study addresses the underlying data-engineering challenge: how to acquire, synchronize, store, and preprocess multimodal streams at production scale. We detail a reproducible system architecture for distributed sensing, local buffering, secure transfer, and cloud-based organization, together with standardized preprocessing pipelines for illumination correction, acoustic denoising, and radiometric temperature extraction. Temporal alignment is achieved through timestamp-based normalization across asynchronous modalities, with explicit characterization of alignment granularity and missing data under real-world constraints. This work positions multimodal livestock sensing as a data-systems problem. The resulting dataset supports longitudinal analysis, cross-modal querying, and the development and evaluation of machine learning and multimodal fusion approaches at appropriate temporal scales. By releasing both data and workflows, we provide a transparent and extensible foundation for building and evaluating AI systems in precision agriculture.]]></description>
      </item>
      </channel>
    </rss>