<?xml version="1.0" encoding="utf-8"?>
    <rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
      <channel>
        <title>Frontiers in Artificial Intelligence | Pattern Recognition section | New and Recent Articles</title>
        <link>https://www.frontiersin.org/journals/artificial-intelligence/sections/pattern-recognition</link>
        <description>RSS Feed for Pattern Recognition section in the Frontiers in Artificial Intelligence journal | New and Recent Articles</description>
        <language>en-us</language>
        <generator>Frontiers Feed Generator, version 1</generator>
        <pubDate>Thu, 30 Apr 2026 00:58:20 +0000</pubDate>
        <ttl>60</ttl>
        <item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2026.1770342</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2026.1770342</link>
        <title><![CDATA[FSD-Net: underwater object detection based on frequency and spatial domain feature enhancement]]></title>
        <pubDate>Fri, 17 Apr 2026 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Chao Zhang</author><author>Shuang Wu</author><author>Baohua Huang</author><author>Binchen Zhao</author><author>Fengqi Cui</author><author>Xingkun Li</author>
        <description><![CDATA[Background: Complex underwater visual conditions cause severe missed and false detections in conventional object detection models, hindering reliable autonomous underwater exploration. This work addresses these key performance limitations. Methods: We propose FSD-Net, a novel underwater detection model with two core enhancement modules. The Frequency Attention Convolution Module reduces missed detections via frequency-domain spatial feature preservation, and the Multi-dimensional Feature Enhancement Module suppresses false detections via enhanced semantic fusion. Experiments include ablation studies and state-of-the-art method comparisons on the UTDAC2020 and Brackish datasets. Results: FSD-Net achieves state-of-the-art performance on both tested datasets. On the UTDAC2020 dataset, it reaches 85.7% AP50 and 82.5% F1-score, with a 3.8% AP50 improvement over the baseline model. On the Brackish dataset, it achieves 98.1% AP50 and 97.0% F1-score, with a 3.9% AP50 improvement over the baseline. The model outperforms all compared mainstream detection algorithms, and ablation studies validate the effectiveness of both proposed modules. Conclusion: FSD-Net's joint frequency-spatial enhancement strategy effectively mitigates underwater image degradation challenges, providing a robust detection solution for autonomous underwater exploration. The proposed dual-module design offers a practical reference for detection model optimization in complex visual environments, with future work focused on lightweight model optimization.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2026.1760137</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2026.1760137</link>
        <title><![CDATA[Machine learning based approach to intrusion detection in internet of things environments]]></title>
        <pubDate>Fri, 17 Apr 2026 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Oluwatoyin Esther Akinbowale</author><author>Adebola Tajudeen Adesina</author><author>Mulatu Fekadu Zerihun</author><author>Polly Mashigo</author>
        <description><![CDATA[The growing security requirements of Internet of Things (IoT) networks, where heterogeneous architectures and resource-constrained devices present an exponentially increased attack surface, were addressed using a machine learning based intrusion detection system. Open-source secondary quantitative IoT intrusion traffic data was obtained and used to train machine learning models. The dataset comprises over one million labeled flow records covering 34 kinds of attacks and benign traffic. First, extensive preprocessing, including handling of missing values, feature encoding, scaling, and redundancy removal, was carried out, followed by the training of three supervised machine learning (ML) classifiers, namely Decision Tree (DT), Random Forest (RF), and Support Vector Machine (SVM), to differentiate the different types of intrusions. The performance of the ML models was evaluated in terms of accuracy, precision, recall, and F1-score. The Decision Tree model was the most outstanding, with the highest overall accuracy (99.36%) and respectable performance on prevalent attack classes; it was closely followed by Random Forest (99.27%), while SVM lagged behind with an accuracy of 80.08% due to computational constraints in handling large-scale data. Feature-importance analysis identified inter-arrival time and total packet size as the most significant discriminators of malicious behavior. Conclusively, tree-based models, and specifically Decision Trees, offer a highly effective and interpretable solution for real-time IoT intrusion detection; future avenues include handling class imbalance and examining lightweight, ensemble, and deep-learning approaches for robust detection of rare and unknown threats. This study contributes to cybersecurity through the identification and classification of intrusions in IoT devices for proper mitigation.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2026.1794923</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2026.1794923</link>
        <title><![CDATA[Automatic recognition of dynamic signs of Mexican sign language using deep learning]]></title>
        <pubDate>Thu, 16 Apr 2026 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Jesús Antonio Navarrete-López</author><author>Michelle Sainos-Vizuett</author><author>Irvin Hussein Lopez-Nava</author>
        <description><![CDATA[Introduction: Over four million individuals in Mexico face communication barriers due to hearing impairments. Sign language serves as an essential communication tool within the deaf community; however, automatic translation between sign and oral languages remains a significant challenge. This study proposes an approach for recognizing dynamic gestures from Mexican Sign Language (LSM) to support the development of assistive communication technologies. Methods: In collaboration with expert interpreters, an LSM corpus comprising 121 signs was developed, including a specialized lexicon focused on medical emergencies and accident scenarios. A standardized video acquisition protocol was implemented with both expert and non-expert participants. The proposed methodology consists of skeletal keypoint extraction using MediaPipe, data augmentation through frame sampling, and dataset normalization. Multiple deep learning architectures were evaluated, including ResNet, Simple RNN, LSTM, Bidirectional LSTM (BiLSTM), Gated Recurrent Units (GRU), a Transformer encoder, and a hybrid ResNet–Transformer model. Results: Among the evaluated models, the ResNet architecture achieved the best performance, obtaining an F1-score of 0.948 under subject-independent evaluation, with an average inference time of 0.468 seconds. Hyperparameter optimization analysis indicated that performance improvements were primarily driven by training dynamics and regularization strategies rather than increases in architectural depth. Discussion: The results demonstrate the effectiveness of deep learning–based approaches for dynamic LSM gesture recognition and highlight the importance of optimization strategies for robust generalization. This work contributes toward LSM-to-Spanish translation systems and provides a foundation for advancing data-driven sign language recognition technologies.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2026.1811469</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2026.1811469</link>
        <title><![CDATA[Editorial: AI-enabled breakthroughs in computational imaging and computer vision]]></title>
        <pubDate>Tue, 07 Apr 2026 00:00:00 +0000</pubDate>
        <category>Editorial</category>
        <author>Liping Zhang</author><author>Xiaobo Li</author>
        <description></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2026.1796099</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2026.1796099</link>
        <title><![CDATA[Dynamic-focus transformer for point cloud segmentation]]></title>
        <pubDate>Thu, 02 Apr 2026 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Ziwen Wang</author><author>Xiaoting Fan</author><author>Mei Yu</author><author>Jianlu Liu</author><author>Shuai Wang</author><author>Yonghua Wang</author><author>Chuanfu Wu</author>
        <description><![CDATA[Transformer-based methods have significantly advanced 3D point cloud segmentation by effectively capturing long-range dependencies. However, the global or fixed-window self-attention mechanisms they often employ suffer from computational redundancy and overfitting due to processing excessive, potentially irrelevant key-value pairs for each query. To address this, we propose the Dynamic-Focus Transformer, a novel architecture that introduces a data-dependent adaptive attention mechanism. Through learned soft point masks, we selectively sparsify keys and values to focus on semantically critical regions. Our method enables flexible, input-adaptive receptive fields without the heavy memory overhead associated with per-point offset learning in deformable designs. Furthermore, when integrated into a U-Net-style encoder-decoder, our method attains a highly efficient balance between modeling capability and computational cost. Extensive experiments on S3DIS and ScanNetv2 benchmarks demonstrate that our method achieves state-of-the-art performance with notably improved efficiency, validating its effectiveness for large-scale point cloud understanding.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2026.1818182</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2026.1818182</link>
        <title><![CDATA[Correction: Painting authentication using CNNs and sliding window feature extraction]]></title>
        <pubDate>Fri, 13 Mar 2026 00:00:00 +0000</pubDate>
        <category>Correction</category>
        <author>Juan Ruiz de Miras</author><author>José Luis Vílchez</author><author>María José Gacto</author><author>Domingo Martín</author>
        <description></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2026.1721866</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2026.1721866</link>
        <title><![CDATA[InfoMSD: an information-maximization self-distillation framework for parameter-efficient fine-tuning on artwork images]]></title>
        <pubDate>Wed, 04 Mar 2026 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Feng Guan</author><author>Hao Hong</author><author>Yong Wang</author>
        <description><![CDATA[In recent years, despite the remarkable performance of large-scale vision-language models across various visual classification tasks, their substantial parameter counts and high fine-tuning costs have hindered deployment in resource-constrained cultural and artwork settings. This work specifically addresses the task of object recognition in artwork—that is, identifying semantic objects (e.g., animals, people, everyday items) depicted within paintings, sketches, and other artistic renditions, rather than classifying artistic styles or genres. To address this issue, we propose InfoMSD, an unsupervised, Information-Maximization Self-Distillation framework designed for parameter-efficient fine-tuning on unlabeled artwork imagery while preserving robust performance. Specifically, InfoMSD incorporates a teacher-student architecture in the self-distillation phase, where the teacher model generates pseudo-labels for artworks, and the student model learns from the teacher through cross-entropy. By aligning the student's predictions with the discriminative signals from the teacher's pseudo-labels and simultaneously applying entropy-based regularization to sharpen the probability distribution and balance class coverage, the framework improves both the quality of the pseudo-labels and the discriminative capacity of the model. To enable parameter-efficient fine-tuning, only the layer norm parameters and visual prompts in the student model are updated, while the remaining parameters are frozen, significantly reducing computational overhead. Extensive experimental results on artwork datasets show that InfoMSD achieves accuracy improvements of +6.43% and +3.02% over CLIP zero-shot baselines, while adjusting less than 1% of the model parameters. Compared to existing lightweight distillation methods, InfoMSD achieves average accuracy gains of 1.35% and 0.96%, respectively. Overall, InfoMSD offers a novel, information-theoretic paradigm for unsupervised and efficient fine-tuning in object recognition within artistic imagery, balancing performance and efficiency.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2026.1766828</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2026.1766828</link>
        <title><![CDATA[Real-time grading method of tunnel surrounding rock based on image recognition]]></title>
        <pubDate>Thu, 05 Feb 2026 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Yihuan Xiao</author><author>Hao Yuan</author><author>Qingye Shi</author><author>Zemin Qiu</author><author>Liao Tang</author><author>Yihua Yu</author><author>Yabin Li</author><author>Yin Pan</author><author>Qinghua Xiao</author>
        <description><![CDATA[To enable rapid, accurate grading of tunnel surrounding rock during construction, we propose a real-time grading method that integrates image processing with lightweight deep learning. We developed an automated pipeline that combines image-processing techniques and machine-learning algorithms to extract and classify characteristic parameters of tunnel surrounding rock, enabling real-time monitoring and classification at the tunnel face. The study demonstrates that: (1) Following the proposed image-acquisition standards for rock surfaces at the tunnel face, images are converted to grayscale, denoised, enhanced, and normalized, which facilitates efficient and accurate extraction of structural features and improves the precision of classification parameters; (2) An optimized lithology identification and classification model was built, and a rock-hardness, strength, and integrity sensing approach based on the ShuffleNetV2 convolutional neural network was introduced to achieve real-time surrounding-rock grading. On an engineering site, the method attains 85% accuracy for lithology classification, 75% for rock-mass integrity, and 80% for overall surrounding-rock grade, confirming its feasibility and practical value. These results offer theoretical insight and engineering utility for the scientific evaluation of tunnel surrounding-rock grade.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2025.1714882</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2025.1714882</link>
        <title><![CDATA[CBAM-DenseNet with multi-feature quality filtering: advancing accuracy in small-sample iris recognition]]></title>
        <pubDate>Tue, 27 Jan 2026 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Yongheng Pang</author><author>Zishen Wang</author><author>Nan Jiang</author><author>Jia Qin</author><author>Suyuan Li</author>
        <description><![CDATA[In the context of the information age, traditional password and key-based authentication mechanisms are no longer sufficient to meet the growing demands for information security. Iris recognition technology has garnered attention due to its high security and uniqueness. Current iris recognition methods based on single feature extraction are prone to loss of feature information, which affects recognition rates. To address this, this paper proposes a multi-feature fusion-based iris recognition method. The method employs a comprehensive quality evaluation scheme to filter iris images, ensuring the quality of the input images. An improved CAN network is used to effectively remove image noise, and a DenseNet-based iris feature extraction method is combined with the Convolutional Block Attention Module (CBAM), which fuses spatial and channel attention, to enhance the expressiveness of features. Through experiments with small sample sizes and testing on various public iris databases, the proposed method has been validated, demonstrating significant improvements in recognition accuracy and robustness.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2025.1675834</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2025.1675834</link>
        <title><![CDATA[An improved YOLOv10-based framework for knee MRI lesion detection with enhanced small object recognition and low contrast feature extraction]]></title>
        <pubDate>Tue, 20 Jan 2026 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Hongwei Yang</author><author>Wenqu Song</author><author>Tiankai Jiang</author><author>Chuanhao Wang</author><author>Luping Zhang</author><author>Zhian Cai</author><author>Yuhan Sun</author><author>Qing Zhao</author><author>Yuyu Sun</author>
        <description><![CDATA[Rationale and objectives: To address the challenges in detecting anterior cruciate ligament (ACL) lesions in knee MRI examinations, including difficulties in identifying tiny lesions, insufficient extraction of low-contrast features, and poor modeling of irregular lesion morphologies, and to provide a precise and efficient auxiliary diagnostic tool for clinical practice. Materials and methods: An enhanced framework based on YOLOv10 is constructed. The backbone network is optimized using the C2f-SimAM module to enhance multi-scale feature extraction and spatial attention; an Adaptive Spatial Fusion (ASF) module is introduced in the neck to better fuse multi-scale spatial features; and a novel hybrid loss function combining Focal-EIoU and KPT Loss is employed. To ensure rigorous statistical evaluation, we utilized a five-fold cross-validation strategy on a dataset of 917 cases. Results: Evaluation on the KneeMRI dataset demonstrates that the proposed model achieves statistically significant improvements over standard YOLOv10, Faster R-CNN, and Transformer-based detectors (RT-DETR). Specifically, mAP@0.5 is increased by 1.3% (p < 0.05) compared to the standard YOLOv10, and mAP@0.5:0.95 is improved by 2.5%. Qualitative analysis further confirms the model's ability to reduce false negatives in small, low-contrast tears. Conclusion: This framework effectively connects general object detection models with the specific requirements of medical imaging, providing a precise and efficient solution for diagnosing ACL injuries in routine clinical workflows.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2025.1732820</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2025.1732820</link>
        <title><![CDATA[Laplace-guided fusion network for camouflage object detection]]></title>
        <pubDate>Wed, 14 Jan 2026 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Jiangxiao Zhang</author><author>Feng Gao</author><author>Shengmei He</author><author>Bin Zhang</author>
        <description><![CDATA[Camouflaged object detection (COD) aims to identify objects that are visually indistinguishable from their surrounding background, making it challenging to precisely distinguish the boundaries between objects and backgrounds in camouflaged environments. In recent years, numerous studies have leveraged frequency-domain methods to aid in camouflage target detection by utilizing frequency-domain information. However, current methods based on the frequency domain cannot effectively capture the boundary information between disguised objects and the background. To address this limitation, we propose a Laplace transform-guided camouflage object detection network called the Self-Correlation Cross Relation Network (SeCoCR). In this framework, the Laplace-transformed camouflage target is treated as high-frequency information, while the original image serves as low-frequency information. These are then separately input into our proposed Self-Relation Attention module to extract both local and global features. Within the Self-Relation Attention module, key semantic information is retained in the low-frequency data, and crucial boundary information is preserved in the high-frequency data. Furthermore, we design a multi-scale attention mechanism for low- and high-frequency information, Low-High Mix Fusion, to effectively integrate essential information from both frequencies for camouflage object detection. Comprehensive experiments on three COD benchmark datasets demonstrate that our approach significantly surpasses existing state-of-the-art frequency-domain-assisted methods.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2025.1738444</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2025.1738444</link>
        <title><![CDATA[Painting authentication using CNNs and sliding window feature extraction]]></title>
        <pubDate>Tue, 13 Jan 2026 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Juan Ruiz de Miras</author><author>José Luis Vílchez</author><author>María José Gacto</author><author>Domingo Martín</author>
        <description><![CDATA[Painting authentication is an inherently complex task, often relying on a combination of connoisseurship and technical analysis. This study focuses on the authentication of a single painting attributed to Paolo Veronese, using a convolutional neural network approach tailored to severe data scarcity. To ensure that stylistic comparisons were based on artistic execution rather than iconographic differences, the dataset was restricted to paintings depicting the Holy Family, the same subject as the work under authentication. A custom shallow convolutional network was developed to process multichannel inputs (RGB, grayscale, and edge maps) extracted from overlapping patches via a sliding-window strategy. This patch-based design expanded the dataset from a small number of paintings to thousands of localized patches, enabling the model to learn microtextural and brushstroke features. Regularization techniques were employed to enhance generalization, while a painting-level cross-validation strategy was used to prevent data leakage. The model achieved high classification performance (accuracy of 94.51%, Area under the Curve 0.99) and generated probability heatmaps that revealed stylistic coherence in authentic Veronese works and fragmentation in non-Veronese paintings. The work under examination yielded an intermediate global mean Veronese probability (61%) with extensive high-probability regions over stylistically salient passages, suggesting partial stylistic affinity. The results support the use of patch-based models for stylistic analysis in art authentication, especially under domain-specific data constraints. While the network provides strong probabilistic evidence of stylistic affinity, definitive attribution requires further integration with historical, technical, and provenance-based analyses.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2025.1669512</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2025.1669512</link>
        <title><![CDATA[IndiaScene365: a transfer learning dataset for Indian scene understanding in diverse weather condition]]></title>
        <pubDate>Fri, 09 Jan 2026 00:00:00 +0000</pubDate>
        <category>Data Report</category>
        <author>Deepa Mane</author><author>Sandhya Arora</author><author>Sachin Shelke</author>
        <description></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2025.1660388</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2025.1660388</link>
        <title><![CDATA[Demographic identification of Greater Caribbean manatees via acoustic feature learning]]></title>
        <pubDate>Thu, 08 Jan 2026 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Fernando Merchan</author><author>Kenji Contreras</author><author>Héctor Poveda</author><author>Rocío M. Estévez</author><author>Hector M. Guzman</author><author>Javier E. Sanchez-Galan</author>
        <description><![CDATA[Demographic inference from vocalizations is essential for monitoring endangered Greater Caribbean manatees (Trichechus manatus manatus) in tropical environments where direct observation is limited. While passive acoustic monitoring has proven effective for manatee detection and individual identification, the ability to classify sex and age from vocalizations remains unexplored, limiting ecological insights into population structure and reproductive dynamics. We investigated whether machine learning can accurately classify sex and age from manatee acoustic signals using 1,285 vocalizations from 20 wild individuals captured in the Changuinola River, Panama. Acoustic features including spectral envelope descriptors (MFCCs), harmonic content (chroma), and temporal-frequency parameters were extracted and analyzed using two feature sets: SET1 (30 spectral-cepstral features) and SET2 (38 features augmented with explicit pitch and temporal descriptors). Four classification algorithms (Random Forest, XGBoost, SVM, LDA) were trained under Leave-One-Group-Out cross-validation with SMOTE oversampling to address class imbalance. Sex classification achieved 85%–87% accuracy (75%–78% macro-F1) with balanced performance across both classes (female: 86%, male: 79%), validating operational feasibility for passive monitoring applications. However, subject-level bootstrap analysis revealed substantial individual heterogeneity (female: 95% CI: 68.7%–96.4%, male: 75.1%–83.6%), indicating that approximately 10%–15% of individuals exhibit systematic misclassification due to atypical acoustic signatures. Spectral envelope characteristics (MFCCs, spectral skewness) rather than fundamental frequency were most discriminative, suggesting sex-related variation manifests in vocal tract resonance patterns. 
Age classification achieved 73%–85% global accuracy but exhibited severe juvenile under-detection (14%–26% recall), with bootstrap confidence intervals spanning 9.3%–86.3% for juveniles vs. 60.7%–84.7% for adults. Dimensionality reduction (PCA, t-SNE) revealed substantial overlap between juvenile and adult acoustic feature distributions, with clearer age structure visible primarily within female clusters, contributing to systematic misclassification of male juveniles. Threshold optimization improved juvenile recall to 63% but increased false positives to 37%, presenting trade-offs for conservation surveillance. Acoustic body size regression demonstrated promising continuous estimation (MAE = 0.208 m, R² = 0.33), offering an alternative to categorical age classification by enabling coarse demographic profiling when integrated with sex inference. These findings establish the operational viability of acoustic sex classification for manatee conservation while highlighting fundamental challenges in categorical age inference due to continuous ontogenetic variation and limited juvenile samples. However, acoustic body size regression offers a promising complementary approach, enabling continuous demographic profiling across size classes rather than discrete age categories. Integration with established individual identification frameworks would enable comprehensive acoustic mark-recapture, simultaneously estimating abundance, sex ratios, size distributions, and demographic structure from long-term hydrophone deployments without requiring visual confirmation of body dimensions.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2025.1558358</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2025.1558358</link>
        <title><![CDATA[Anatomical study and early diagnosis of dome galls in Cordia Dichotoma using DeepSVM model]]></title>
        <pubDate>Mon, 05 Jan 2026 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Said Khalid Shah</author><author>Mazliham Bin Mohd Su’ud</author><author>Aurangzeb Khan</author><author>Muhammad Mansoor Alam</author><author>Muhammad Ayaz</author>
        <description><![CDATA[Introduction: Artificial intelligence (AI), particularly deep learning (DL), offers automated solutions for early detection of plant diseases to improve crop yield. However, training accurate models on real-field data remains challenging due to overfitting and limited generalization. As observed in prior studies, traditional CNNs often struggle with real-environment variability, and transfer learning can lead to instability in training on domain-specific leaf datasets. This study focuses on detecting dome galls, a disease in Cordia dichotoma, by formulating a binary classification task (healthy vs. diseased leaves) using a custom dataset of 3,900 leaf images collected from real field environments. Methods: Initially, both custom CNNs and transfer learning models were trained and compared. Among them, a modified ResNet-50 architecture showed promising results but suffered from overfitting and unstable convergence. To address this, the final sigmoid activation layer was replaced with a Support Vector Machine (SVM), and L2 regularization was applied to reduce overfitting. This hybrid DeepSVM architecture stabilized training and improved model robustness. Image preprocessing and augmentation techniques were applied to increase variability and prevent overfitting. Results: The final model was evaluated on a separate test set of 400 images, and the results remained stable across repeated runs. DeepSVM achieved an accuracy of 94.50% and an F1-score of 94.47%, outperforming other well-known models like VGG-16, InceptionResNetv2, and MobileNet-V2. Conclusion: These results indicate that the proposed DeepSVM approach offers better generalization and training stability than conventional CNN classifiers, potentially aiding in automated disease monitoring for precision agriculture.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2025.1703135</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2025.1703135</link>
        <title><![CDATA[Hypergraph-based contrastive learning for enhanced fraud detection]]></title>
        <pubDate>Wed, 26 Nov 2025 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Qinhong Wang</author><author>Yiming Shen</author><author>Husheng Dong</author>
        <description><![CDATA[The proliferation of digital platforms has enabled fraudsters to deploy sophisticated camouflage techniques, such as multi-hop collaborative attacks, to evade detection. Traditional Graph Neural Networks (GNNs) often fail to capture these complex high-order patterns due to limitations including homophily assumption failures, severe label imbalance, and noise amplification during deep aggregation. To address these challenges, we propose the Hypergraph-based Contrastive Learning Network (HCLNet), a novel framework integrating three synergistic innovations. Firstly, multi-relational hypergraph fusion encodes heterogeneous associations into hyperedges, explicitly modeling group-wise fraud syndicates beyond pairwise connections. Secondly, a multi-head gated hypergraph aggregation mechanism employs parallel attention heads to capture diverse fraud patterns, dynamically balances original and high-order features via gating, and stabilizes training through residual connections with layer normalization. Thirdly, hierarchical dual-view contrastive learning jointly applies feature masking and topology dropout at both node and hyperedge levels, constructing augmented views to optimize self-supervised discrimination under label scarcity. Extensive experiments on two real-world datasets demonstrate HCLNet's superior performance, achieving significant improvements over the baselines across key evaluation metrics. The model's ability to reveal distinctive separation patterns between fraudulent and benign entities underscores its practical value in combating evolving camouflaged fraud tactics in digital ecosystems.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2025.1649452</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2025.1649452</link>
        <title><![CDATA[BLoss-DDNet: bending loss and dual-task decoding network for overlapping cell nucleus segmentation of cervical clinical LBC images]]></title>
        <pubdate>2025-11-26T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Guihua Yang</author><author>Ziran Chen</author><author>Peng Guo</author><author>Junchi Ma</author><author>Jinjie Huang</author><author>Cong Jin</author><author>Xiaona Yang</author><author>Kai Zhao</author><author>Yibo Wang</author><author>Qi Gao</author><author>Chengcheng Liu</author><author>Tianqi Wu</author><author>Yong Li</author><author>Yingwei Guo</author><author>Jie Zheng</author><author>Xiangran Cai</author><author>Yingjian Yang</author>
        <description><![CDATA[IntroductionCervical cancer is one of the most threatening malignant tumors to women's health worldwide. Liquid-based cytology (LBC) examination has become the most common screening method for the early detection and prevention of cervical cancer. Nuclear segmentation of cervical clinical LBC images based on convolutional neural networks has become a vital means of assisting in the diagnosis of cervical cancer. However, existing nuclear segmentation techniques fail to segment severely overlapping nuclei in highly aggregated cell clusters, which inevitably leads to the misdiagnosis of cervical cancer pathology.MethodsTherefore, a novel bending loss and dual-task decoding network (BLoss-DDNet) is proposed for overlapping cell nucleus segmentation of cervical clinical LBC images. First, a network architecture search method is introduced to search and optimize the architecture of the decoding module in the dual-task branch, determining the mask and boundary decoding modules (dual-task decoding modules) of BLoss-DDNet. Second, two feature maps, separately generated by dual-task decoding branches composed of a shared encoder module and dual-task decoder modules, are fused to enhance sensitivity to cell nucleus boundaries. Third, a bending loss is added to the loss function to focus on the curvature variation at the intersections of overlapping cell nucleus boundaries, thereby constraining the training of the dual-task decoding branch and strengthening the constraint on the cell nucleus boundary.ResultsThe proposed BLoss-DDNet achieved the best performance on all evaluation metrics on public datasets.DiscussionBLoss-DDNet can therefore effectively address the segmentation of overlapping cell clusters and nuclei in clinical LBC images, providing strong support for subsequent clinical computer-aided diagnosis of cervical cancer.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2025.1681277</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2025.1681277</link>
        <title><![CDATA[Resource-efficient fine-tuning of large vision-language models for multimodal perception in autonomous excavators]]></title>
        <pubdate>2025-11-18T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Hung Viet Nguyen</author><author>Hyojin Park</author><author>Namhyun Yoo</author><author>Jinhong Yang</author>
        <description><![CDATA[Recent advances in large vision-language models (LVLMs) have transformed visual recognition research by enabling multimodal integration of images, text, and videos. This fusion supports a deeper and more context-aware understanding of visual environments. However, the application of LVLMs to multitask visual recognition in real-world construction scenarios remains underexplored. In this study, we present a resource-efficient framework for fine-tuning LVLMs tailored to autonomous excavator operations, with a focus on robust detection of humans and obstacles, as well as classification of weather conditions, on consumer-grade hardware. By leveraging Quantized Low-Rank Adaptation (QLoRA) in conjunction with the Unsloth framework, our method substantially reduces memory consumption and accelerates fine-tuning compared with conventional approaches. We comprehensively evaluate a domain-specific excavator-vision dataset using five open-source LVLMs: Llama-3.2-Vision, Qwen2-VL, Qwen2.5-VL, LLaVA-1.6, and Gemma 3. Each model is fine-tuned on 1,000 annotated frames and tested on 2,000 images. Experimental results demonstrate significant improvements in both object detection and weather classification, with Qwen2-VL-7B achieving an mAP@50 of 88.03%, mAP@[0.50:0.95] of 74.20%, accuracy of 84.54%, and F1 score of 78.83%. Our fine-tuned Qwen2-VL-7B model not only detects humans and obstacles robustly but also classifies weather accurately. These results illustrate the feasibility of deploying LVLM-based multimodal AI agents for safety monitoring, pose estimation, activity tracking, and strategic planning in autonomous excavator operations.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2025.1671099</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2025.1671099</link>
        <title><![CDATA[Shape modeling of longitudinal medical images: from diffeomorphic metric mapping to deep learning]]></title>
        <pubdate>2025-10-30T00:00:00Z</pubdate>
        <category>Review</category>
        <author>Edwin Tay</author><author>Nazli Tümer</author><author>Amir A. Zadpoor</author>
        <description><![CDATA[Living biological tissue is a complex system, constantly growing and changing in response to external and internal stimuli. These processes lead to remarkable and intricate changes in shape. Modeling and understanding both natural and pathological (or abnormal) changes in the shape of anatomical structures is highly relevant, with applications in diagnostic, prognostic, and therapeutic healthcare. Nevertheless, modeling the longitudinal shape change of biological tissue is a non-trivial task due to its inherently nonlinear nature. In this review, we highlight several existing methodologies and tools for modeling longitudinal shape change (i.e., spatiotemporal shape modeling). These methods range from diffeomorphic metric mapping to deep learning-based approaches (e.g., autoencoders, generative networks, and recurrent neural networks). We discuss synergistic combinations of existing technologies and potential directions for future research, underscoring key deficiencies in the current research landscape.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2025.1575427</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2025.1575427</link>
        <title><![CDATA[Oral squamous cell carcinoma grading classification using deep transformer encoder assisted dilated convolution with global attention]]></title>
        <pubdate>2025-10-17T00:00:00Z</pubdate>
        <category>Brief Research Report</category>
        <author>Singaraju Ramya</author><author>R. I. Minu</author>
        <description><![CDATA[In recent years, Oral Squamous Cell Carcinoma (OSCC) has been a common tumor in the orofacial region, affecting areas such as the teeth, jaw, and temporomandibular joint. OSCC is classified into three grades: well-differentiated, moderately differentiated, and poorly differentiated, with high morbidity and mortality rates among patients. Several existing methods, such as AlexNet, CNN, U-Net, and V-Net, have been used for OSCC classification. However, these methods face limitations, including low accuracy (ACC), poor comparability, insufficient data collection, and prolonged training times. To address these limitations, we introduce a novel Deep Transformer Encoder-Assisted Dilated Convolution with Global Attention (DeTr-DiGAtt) model for OSCC classification. To enhance the dataset and mitigate overfitting, a GAN model is employed for data augmentation. Additionally, an Adaptive Bilateral Filter (Ad-BF) is used to improve image quality and remove undesirable noise. For accurate identification of the affected region, an Improved Multi-Encoder Residual Squeeze U-Net (Imp-MuRs-Unet) model is utilized for segmentation. The DeTr-DiGAtt model is then applied to classify different OSCC grading levels. Furthermore, an Adaptive Grey Lag Goose Optimization Algorithm (Ad-GreLop) is used for hyperparameter tuning. The proposed method achieves an ACC of 98.59%, a Dice score of 97.97%, and an Intersection over Union (IoU) of 98.08%.]]></description>
      </item>
      </channel>
    </rss>