<?xml version="1.0" encoding="utf-8"?>
    <rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
      <channel>
        <title>Frontiers in Artificial Intelligence | Pattern Recognition section | New and Recent Articles</title>
        <link>https://www.frontiersin.org/journals/artificial-intelligence/sections/pattern-recognition</link>
        <description>RSS Feed for Pattern Recognition section in the Frontiers in Artificial Intelligence journal | New and Recent Articles</description>
        <language>en-us</language>
        <generator>Frontiers Feed Generator, version 1</generator>
        <pubDate>Tue, 07 Apr 2026 23:57:20 +0000</pubDate>
        <ttl>60</ttl>
        <item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2026.1811469</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2026.1811469</link>
        <title><![CDATA[Editorial: AI-enabled breakthroughs in computational imaging and computer vision]]></title>
        <pubDate>Tue, 07 Apr 2026 00:00:00 +0000</pubDate>
        <category>Editorial</category>
        <author>Liping Zhang</author><author>Xiaobo Li</author>
        <description></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2026.1796099</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2026.1796099</link>
        <title><![CDATA[Dynamic-focus transformer for point cloud segmentation]]></title>
        <pubDate>Thu, 02 Apr 2026 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Ziwen Wang</author><author>Xiaoting Fan</author><author>Mei Yu</author><author>Jianlu Liu</author><author>Shuai Wang</author><author>Yonghua Wang</author><author>Chuanfu Wu</author>
        <description><![CDATA[Transformer-based methods have significantly advanced 3D point cloud segmentation by effectively capturing long-range dependencies. However, the global or fixed-window self-attention mechanisms they often employ suffer from computational redundancy and overfitting due to processing excessive, potentially irrelevant key-value pairs for each query. To address this, we propose the Dynamic-Focus Transformer, a novel architecture that introduces a data-dependent adaptive attention mechanism. Through learned soft point masks, we selectively sparsify keys and values to focus on semantically critical regions. Our method enables flexible, input-adaptive receptive fields without the heavy memory overhead associated with per-point offset learning in deformable designs. Furthermore, when integrated into a U-Net-style encoder-decoder, our method attains a highly efficient balance between modeling capability and computational cost. Extensive experiments on S3DIS and ScanNetv2 benchmarks demonstrate that our method achieves state-of-the-art performance with notably improved efficiency, validating its effectiveness for large-scale point cloud understanding.]]></description>
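        <!--
        A minimal sketch of the soft key/value masking idea described in the abstract, assuming a standard multi-head self-attention layer; the module name, mask predictor, and shapes are illustrative assumptions, not the authors' code.

        import torch
        import torch.nn as nn

        class SoftMaskedAttention(nn.Module):
            def __init__(self, dim, num_heads=4):
                super().__init__()
                self.num_heads = num_heads
                self.scale = (dim // num_heads) ** -0.5
                self.qkv = nn.Linear(dim, dim * 3)
                self.mask_head = nn.Linear(dim, 1)  # learned per-point relevance score
                self.proj = nn.Linear(dim, dim)

            def forward(self, x):  # x: (B, N, C) point features
                B, N, C = x.shape
                q, k, v = self.qkv(x).chunk(3, dim=-1)
                # Soft point mask in [0, 1]; near-zero entries suppress their keys/values,
                # giving each query an input-adaptive, sparsified receptive field.
                m = torch.sigmoid(self.mask_head(x))  # (B, N, 1)
                k, v = k * m, v * m
                def heads(t):
                    return t.view(B, N, self.num_heads, -1).transpose(1, 2)
                q, k, v = heads(q), heads(k), heads(v)
                attn = torch.softmax((q @ k.transpose(-2, -1)) * self.scale, dim=-1)
                out = (attn @ v).transpose(1, 2).reshape(B, N, C)
                return self.proj(out)
        -->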
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2026.1818182</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2026.1818182</link>
        <title><![CDATA[Correction: Painting authentication using CNNs and sliding window feature extraction]]></title>
        <pubDate>Fri, 13 Mar 2026 00:00:00 +0000</pubDate>
        <category>Correction</category>
        <author>Juan Ruiz de Miras</author><author>José Luis Vílchez</author><author>María José Gacto</author><author>Domingo Martín</author>
        <description></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2026.1721866</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2026.1721866</link>
        <title><![CDATA[InfoMSD: an information-maximization self-distillation framework for parameter-efficient fine-tuning on artwork images]]></title>
        <pubDate>Wed, 04 Mar 2026 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Feng Guan</author><author>Hao Hong</author><author>Yong Wang</author>
        <description><![CDATA[In recent years, despite the remarkable performance of large-scale vision-language models across various visual classification tasks, their substantial parameter counts and high fine-tuning costs have hindered deployment in resource-constrained cultural-heritage and artwork settings. This work specifically addresses object recognition in artwork, that is, identifying semantic objects (e.g., animals, people, everyday items) depicted within paintings, sketches, and other artistic renditions, rather than classifying artistic styles or genres. To address this issue, we propose InfoMSD, an unsupervised Information-Maximization Self-Distillation framework designed for parameter-efficient fine-tuning on unlabeled artwork imagery while preserving robust performance. Specifically, InfoMSD incorporates a teacher-student architecture in the self-distillation phase, where the teacher model generates pseudo-labels for artworks and the student model learns from the teacher through cross-entropy. By aligning the student's predictions with the discriminative signals from the teacher's pseudo-labels and simultaneously applying entropy-based regularization to sharpen the probability distribution and balance class coverage, the framework improves both the quality of the pseudo-labels and the discriminative capacity of the model. To enable parameter-efficient fine-tuning, only the layer-norm parameters and visual prompts in the student model are updated, while the remaining parameters are frozen, significantly reducing computational overhead. Extensive experimental results on artwork datasets show that InfoMSD achieves accuracy improvements of +6.43% and +3.02% over CLIP zero-shot baselines while updating less than 1% of the model parameters. Compared to existing lightweight distillation methods, it achieves average accuracy gains of 1.35% and 0.96% on the same two settings. Overall, InfoMSD offers a novel, information-theoretic paradigm for unsupervised and efficient fine-tuning in object recognition within artistic imagery, balancing performance and efficiency.]]></description>
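        <!--
        A minimal sketch of an information-maximization self-distillation objective as the abstract describes it: cross-entropy against teacher pseudo-labels plus entropy terms that sharpen per-sample predictions and balance class coverage. The weights and function name are assumptions, not the paper's formulation.

        import torch
        import torch.nn.functional as F

        def info_msd_loss(student_logits, teacher_logits, lam_sharp=1.0, lam_bal=1.0):
            pseudo = teacher_logits.argmax(dim=-1)                 # teacher pseudo-labels
            ce = F.cross_entropy(student_logits, pseudo)           # distillation term
            p = student_logits.softmax(dim=-1)
            # Minimizing per-sample entropy sharpens each prediction.
            sharp = -(p * p.clamp_min(1e-8).log()).sum(-1).mean()
            # Minimizing the negative entropy of the batch marginal balances class coverage.
            p_bar = p.mean(dim=0)
            balance = (p_bar * p_bar.clamp_min(1e-8).log()).sum()
            return ce + lam_sharp * sharp + lam_bal * balance
        -->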
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2026.1766828</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2026.1766828</link>
        <title><![CDATA[Real-time grading method of tunnel surrounding rock based on image recognition]]></title>
        <pubDate>Thu, 05 Feb 2026 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Yihuan Xiao</author><author>Hao Yuan</author><author>Qingye Shi</author><author>Zemin Qiu</author><author>Liao Tang</author><author>Yihua Yu</author><author>Yabin Li</author><author>Yin Pan</author><author>Qinghua Xiao</author>
        <description><![CDATA[To enable rapid, accurate grading of tunnel surrounding rock during construction, we propose a real-time grading method that integrates image processing with lightweight deep learning. We developed an automated pipeline that combines image-processing techniques and machine-learning algorithms to extract and classify characteristic parameters of tunnel surrounding rock, enabling real-time monitoring and classification at the tunnel face. The study demonstrates that: (1) following the proposed image-acquisition standards for rock and tunnel-face surfaces, images are converted to grayscale, denoised, enhanced, and normalized, which facilitates efficient and accurate extraction of structural features and improves the precision of classification parameters; (2) an optimized lithology identification and classification model was built, and a rock hardness, strength, and integrity sensing approach based on the ShuffleNetV2 convolutional neural network was introduced to achieve real-time surrounding-rock grading. On an engineering site, the method attains 85% accuracy for lithology classification, 75% for rock-mass integrity, and 80% for overall surrounding-rock grade, confirming its feasibility and practical value. These results offer theoretical insight and engineering utility for the scientific evaluation of tunnel surrounding-rock grade.]]></description>
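        <!--
        A minimal sketch of the grayscale/denoise/enhance/normalize steps named in the abstract, using common OpenCV operations; the paper's exact filters and parameters are not given here, so these choices are assumptions.

        import cv2
        import numpy as np

        def preprocess_tunnel_face(bgr):
            gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)      # grayscale conversion
            denoised = cv2.fastNlMeansDenoising(gray, h=10)   # noise removal
            clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
            enhanced = clahe.apply(denoised)                  # local contrast enhancement
            return enhanced.astype(np.float32) / 255.0        # normalize to [0, 1]
        -->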
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2025.1714882</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2025.1714882</link>
        <title><![CDATA[CBAM-DenseNet with multi-feature quality filtering: advancing accuracy in small-sample iris recognition]]></title>
        <pubDate>Tue, 27 Jan 2026 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Yongheng Pang</author><author>Zishen Wang</author><author>Nan Jiang</author><author>Jia Qin</author><author>Suyuan Li</author>
        <description><![CDATA[In the information age, traditional password- and key-based authentication mechanisms are no longer sufficient to meet growing information-security demands. Iris recognition has garnered attention due to its high security and uniqueness, but current methods based on single-feature extraction are prone to losing feature information, which lowers recognition rates. To address this, this paper proposes a multi-feature fusion-based iris recognition method. The method employs a comprehensive quality-evaluation scheme to filter iris images, ensuring the quality of the input images. An improved CAN network is used to effectively remove image noise, and a DenseNet-based iris feature extraction method is combined with a Convolutional Block Attention Module (CBAM), which fuses channel and spatial attention, to enhance the expressiveness of features. Experiments with small sample sizes on several public iris databases validate that the proposed method significantly improves recognition accuracy and robustness.]]></description>
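        <!--
        A minimal sketch of a Convolutional Block Attention Module (CBAM) as commonly defined in the literature (channel attention followed by spatial attention); the reduction ratio and kernel size are conventional defaults, and this is not the paper's exact implementation.

        import torch
        import torch.nn as nn

        class CBAM(nn.Module):
            def __init__(self, channels, reduction=16):
                super().__init__()
                self.mlp = nn.Sequential(
                    nn.Linear(channels, channels // reduction), nn.ReLU(),
                    nn.Linear(channels // reduction, channels))
                self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

            def forward(self, x):  # x: (B, C, H, W)
                b, c, _, _ = x.shape
                avg = self.mlp(x.mean(dim=(2, 3)))   # channel attention, average-pooled
                mx = self.mlp(x.amax(dim=(2, 3)))    # channel attention, max-pooled
                x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
                s = torch.cat([x.mean(dim=1, keepdim=True),
                               x.amax(dim=1, keepdim=True)], dim=1)
                return x * torch.sigmoid(self.spatial(s))  # spatial attention
        -->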
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2025.1675834</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2025.1675834</link>
        <title><![CDATA[An improved YOLOv10-based framework for knee MRI lesion detection with enhanced small object recognition and low contrast feature extraction]]></title>
        <pubDate>Tue, 20 Jan 2026 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Hongwei Yang</author><author>Wenqu Song</author><author>Tiankai Jiang</author><author>Chuanhao Wang</author><author>Luping Zhang</author><author>Zhian Cai</author><author>Yuhan Sun</author><author>Qing Zhao</author><author>Yuyu Sun</author>
        <description><![CDATA[Rationale and objectives: To address the challenges in detecting anterior cruciate ligament (ACL) lesions in knee MRI examinations, including difficulties in identifying tiny lesions, insufficient extraction of low-contrast features, and poor modeling of irregular lesion morphologies, and to provide a precise and efficient auxiliary diagnostic tool for clinical practice. Materials and methods: An enhanced framework based on YOLOv10 is constructed. The backbone network is optimized using the C2f-SimAM module to enhance multi-scale feature extraction and spatial attention; an Adaptive Spatial Fusion (ASF) module is introduced in the neck to better fuse multi-scale spatial features; and a novel hybrid loss function combining Focal-EIoU and KPT Loss is employed. To ensure rigorous statistical evaluation, we utilized a five-fold cross-validation strategy on a dataset of 917 cases. Results: Evaluation on the KneeMRI dataset demonstrates that the proposed model achieves statistically significant improvements over standard YOLOv10, Faster R-CNN, and Transformer-based detectors (RT-DETR). Specifically, mAP@0.5 is increased by 1.3% (p < 0.05) compared to the standard YOLOv10, and mAP@0.5:0.95 is improved by 2.5%. Qualitative analysis further confirms the model's ability to reduce false negatives in small, low-contrast tears. Conclusion: This framework effectively connects general object detection models with the specific requirements of medical imaging, providing a precise and efficient solution for diagnosing ACL injuries in routine clinical workflows.]]></description>
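        <!--
        A minimal sketch of SimAM, the parameter-free attention underlying the C2f-SimAM module named above, following the published SimAM energy formulation; the lambda value is a conventional default, and the surrounding C2f integration is not shown.

        import torch

        def simam(x, lam=1e-4):
            # x: (B, C, H, W); weights each activation by an energy-based score
            n = x.shape[2] * x.shape[3] - 1
            d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)
            v = d.sum(dim=(2, 3), keepdim=True) / n          # per-channel variance
            e_inv = d / (4 * (v + lam)) + 0.5                # inverse energy per neuron
            return x * torch.sigmoid(e_inv)
        -->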
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2025.1732820</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2025.1732820</link>
        <title><![CDATA[Laplace-guided fusion network for camouflage object detection]]></title>
        <pubDate>Wed, 14 Jan 2026 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Jiangxiao Zhang</author><author>Feng Gao</author><author>Shengmei He</author><author>Bin Zhang</author>
        <description><![CDATA[Camouflaged object detection (COD) aims to identify objects that are visually indistinguishable from their surrounding background, making it challenging to precisely delineate the boundaries between objects and background in camouflaged scenes. In recent years, numerous studies have leveraged frequency-domain information to aid camouflaged object detection. However, current frequency-domain methods cannot effectively capture the boundary information between camouflaged objects and the background. To address this limitation, we propose a Laplace-guided camouflaged object detection network called the Self-Correlation Cross Relation Network (SeCoCR). In this framework, the Laplace-transformed image is treated as high-frequency information, while the original image serves as low-frequency information. These are separately input into our proposed Self-Relation Attention module to extract both local and global features. Within this module, key semantic information is retained in the low-frequency data and crucial boundary information is preserved in the high-frequency data. Furthermore, we design a multi-scale attention mechanism for low- and high-frequency information, Low-High Mix Fusion, to effectively integrate essential information from both for camouflaged object detection. Comprehensive experiments on three COD benchmark datasets demonstrate that our approach significantly surpasses existing state-of-the-art frequency-domain-assisted methods.]]></description>
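        <!--
        A minimal sketch of deriving a high-frequency (boundary) view with a Laplacian filter while keeping the original image as the low-frequency view, mirroring the two-stream input described above; the authors' exact transform and normalization may differ.

        import cv2
        import numpy as np

        def split_frequency_views(bgr):
            gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
            high = cv2.Laplacian(gray, cv2.CV_32F, ksize=3)    # boundary/edge response
            high = np.abs(high) / (np.abs(high).max() + 1e-8)  # scale to [0, 1]
            low = bgr.astype(np.float32) / 255.0               # original image as low-frequency view
            return low, high
        -->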
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2025.1738444</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2025.1738444</link>
        <title><![CDATA[Painting authentication using CNNs and sliding window feature extraction]]></title>
        <pubDate>Tue, 13 Jan 2026 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Juan Ruiz de Miras</author><author>José Luis Vílchez</author><author>María José Gacto</author><author>Domingo Martín</author>
        <description><![CDATA[Painting authentication is an inherently complex task, often relying on a combination of connoisseurship and technical analysis. This study focuses on the authentication of a single painting attributed to Paolo Veronese, using a convolutional neural network approach tailored to severe data scarcity. To ensure that stylistic comparisons were based on artistic execution rather than iconographic differences, the dataset was restricted to paintings depicting the Holy Family, the same subject as the work under authentication. A custom shallow convolutional network was developed to process multichannel inputs (RGB, grayscale, and edge maps) extracted from overlapping patches via a sliding-window strategy. This patch-based design expanded the dataset from a small number of paintings to thousands of localized patches, enabling the model to learn microtextural and brushstroke features. Regularization techniques were employed to enhance generalization, while a painting-level cross-validation strategy was used to prevent data leakage. The model achieved high classification performance (accuracy of 94.51%, area under the curve of 0.99) and generated probability heatmaps that revealed stylistic coherence in authentic Veronese works and fragmentation in non-Veronese paintings. The work under examination yielded an intermediate global mean Veronese probability (61%) with extensive high-probability regions over stylistically salient passages, suggesting partial stylistic affinity. The results support the use of patch-based models for stylistic analysis in art authentication, especially under domain-specific data constraints. While the network provides strong probabilistic evidence of stylistic affinity, definitive attribution requires further integration with historical, technical, and provenance-based analyses.]]></description>
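        <!--
        A minimal sketch of the overlapping sliding-window patch extraction that turns a handful of paintings into thousands of training samples; the patch size and stride are assumptions, not the paper's settings.

        import numpy as np

        def extract_patches(img, patch=224, stride=112):
            # stride < patch gives overlapping windows, multiplying the sample count
            h, w = img.shape[:2]
            patches = []
            for y in range(0, h - patch + 1, stride):
                for x in range(0, w - patch + 1, stride):
                    patches.append(img[y:y + patch, x:x + patch])
            return np.stack(patches)  # (num_patches, patch, patch, channels)
        -->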
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2025.1669512</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2025.1669512</link>
        <title><![CDATA[IndiaScene365: a transfer learning dataset for Indian scene understanding in diverse weather conditions]]></title>
        <pubDate>Fri, 09 Jan 2026 00:00:00 +0000</pubDate>
        <category>Data Report</category>
        <author>Deepa Mane</author><author>Sandhya Arora</author><author>Sachin Shelke</author>
        <description></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2025.1660388</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2025.1660388</link>
        <title><![CDATA[Demographic identification of Greater Caribbean manatees via acoustic feature learning]]></title>
        <pubDate>Thu, 08 Jan 2026 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Fernando Merchan</author><author>Kenji Contreras</author><author>Héctor Poveda</author><author>Rocío M. Estévez</author><author>Hector M. Guzman</author><author>Javier E. Sanchez-Galan</author>
        <description><![CDATA[Demographic inference from vocalizations is essential for monitoring endangered Greater Caribbean manatees (Trichechus manatus manatus) in tropical environments where direct observation is limited. While passive acoustic monitoring has proven effective for manatee detection and individual identification, the ability to classify sex and age from vocalizations remains unexplored, limiting ecological insights into population structure and reproductive dynamics. We investigated whether machine learning can accurately classify sex and age from manatee acoustic signals using 1,285 vocalizations from 20 wild individuals captured in the Changuinola River, Panama. Acoustic features including spectral envelope descriptors (MFCCs), harmonic content (chroma), and temporal-frequency parameters were extracted and analyzed using two feature sets: SET1 (30 spectral-cepstral features) and SET2 (38 features augmented with explicit pitch and temporal descriptors). Four classification algorithms (Random Forest, XGBoost, SVM, LDA) were trained under Leave-One-Group-Out cross-validation with SMOTE oversampling to address class imbalance. Sex classification achieved 85%–87% accuracy (75%–78% macro-F1) with balanced performance across both classes (female: 86%, male: 79%), validating operational feasibility for passive monitoring applications. However, subject-level bootstrap analysis revealed substantial individual heterogeneity (female: 95% CI: 68.7%–96.4%, male: 75.1%–83.6%), indicating that approximately 10%–15% of individuals exhibit systematic misclassification due to atypical acoustic signatures. Spectral envelope characteristics (MFCCs, spectral skewness) rather than fundamental frequency were most discriminative, suggesting sex-related variation manifests in vocal tract resonance patterns. Age classification achieved 73%–85% global accuracy but exhibited severe juvenile under-detection (14%–26% recall), with bootstrap confidence intervals spanning 9.3%–86.3% for juveniles vs. 60.7%–84.7% for adults. Dimensionality reduction (PCA, t-SNE) revealed substantial overlap between juvenile and adult acoustic feature distributions, with clearer age structure visible primarily within female clusters, contributing to systematic misclassification of male juveniles. Threshold optimization improved juvenile recall to 63% but increased false positives to 37%, presenting trade-offs for conservation surveillance. Acoustic body size regression demonstrated promising continuous estimation (MAE = 0.208 m, R² = 0.33), offering an alternative to categorical age classification by enabling coarse demographic profiling when integrated with sex inference. These findings establish the operational viability of acoustic sex classification for manatee conservation while highlighting fundamental challenges in categorical age inference due to continuous ontogenetic variation and limited juvenile samples. However, acoustic body size regression offers a promising complementary approach, enabling continuous demographic profiling across size classes rather than discrete age categories. Integration with established individual identification frameworks would enable comprehensive acoustic mark-recapture, simultaneously estimating abundance, sex ratios, size distributions, and demographic structure from long-term hydrophone deployments without requiring visual confirmation of body dimensions.]]></description>
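        <!--
        A minimal sketch of MFCC feature extraction and Leave-One-Group-Out evaluation, where each manatee is a group so no individual appears in both training and test folds; the feature summary, file paths, and label arrays are placeholders, not the study's full SET1/SET2 pipeline.

        import librosa
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

        def mfcc_features(path, n_mfcc=13):
            y, sr = librosa.load(path, sr=None)
            mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
            return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

        # X: feature matrix, y: sex labels, groups: individual ID per vocalization
        # scores = cross_val_score(RandomForestClassifier(), X, y,
        #                          groups=groups, cv=LeaveOneGroupOut())
        -->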
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2025.1558358</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2025.1558358</link>
        <title><![CDATA[Anatomical study and early diagnosis of dome galls in Cordia dichotoma using DeepSVM model]]></title>
        <pubDate>Mon, 05 Jan 2026 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Said Khalid Shah</author><author>Mazliham Bin Mohd Su’ud</author><author>Aurangzeb Khan</author><author>Muhammad Mansoor Alam</author><author>Muhammad Ayaz</author>
        <description><![CDATA[Introduction: Artificial intelligence (AI), particularly deep learning (DL), offers automated solutions for early detection of plant diseases to improve crop yield. However, training accurate models on real-field data remains challenging due to overfitting and limited generalization. As observed in prior studies, traditional CNNs often struggle with real-environment variability, and transfer learning can lead to instability when training on domain-specific leaf datasets. This study focuses on detecting dome galls, a disease in Cordia dichotoma, by formulating a binary classification task (healthy vs. diseased leaves) using a custom dataset of 3,900 leaf images collected from real field environments. Methods: Initially, both custom CNNs and transfer learning models were trained and compared. Among them, a modified ResNet-50 architecture showed promising results but suffered from overfitting and unstable convergence. To address this, the final sigmoid activation layer was replaced with a Support Vector Machine (SVM), and L2 regularization was applied to reduce overfitting. This hybrid DeepSVM architecture stabilized training and improved model robustness. Image preprocessing and augmentation techniques were applied to increase variability and prevent overfitting. Results: The final model was evaluated on a separate test set of 400 images, and the results remained stable across repeated runs. DeepSVM achieved an accuracy of 94.50% and an F1-score of 94.47%, outperforming well-known models such as VGG-16, InceptionResNetV2, and MobileNet-V2. Conclusion: These results indicate that the proposed DeepSVM approach offers better generalization and training stability than conventional CNN classifiers, potentially aiding automated disease monitoring in precision agriculture.]]></description>
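        <!--
        A minimal sketch of the DeepSVM idea as the abstract describes it: a CNN feature extractor whose final sigmoid classifier is replaced by a linear SVM head trained with a hinge loss plus L2 regularization; the backbone is a placeholder, not the paper's modified ResNet-50.

        import torch
        import torch.nn as nn

        class DeepSVM(nn.Module):
            def __init__(self, backbone, feat_dim):
                super().__init__()
                self.backbone = backbone           # any CNN producing (B, feat_dim) features
                self.svm = nn.Linear(feat_dim, 1)  # linear SVM decision function

            def forward(self, x):
                return self.svm(self.backbone(x)).squeeze(-1)  # raw margin scores

        def hinge_loss(scores, y, model, l2=1e-4):
            # y in {-1, +1}; the L2 term acts as the SVM margin regularizer
            margin = torch.clamp(1 - y * scores, min=0).mean()
            return margin + l2 * model.svm.weight.pow(2).sum()
        -->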
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2025.1649452</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2025.1649452</link>
        <title><![CDATA[BLoss-DDNet: bending loss and dual-task decoding network for overlapping cell nucleus segmentation of cervical clinical LBC images]]></title>
        <pubDate>Wed, 26 Nov 2025 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Guihua Yang</author><author>Ziran Chen</author><author>Peng Guo</author><author>Junchi Ma</author><author>Jinjie Huang</author><author>Cong Jin</author><author>Xiaona Yang</author><author>Kai Zhao</author><author>Yibo Wang</author><author>Qi Gao</author><author>Chengcheng Liu</author><author>Tianqi Wu</author><author>Yong Li</author><author>Yingwei Guo</author><author>Jie Zheng</author><author>Xiangran Cai</author><author>Yingjian Yang</author>
        <description><![CDATA[Introduction: Cervical cancer has become one of the most malignant tumors threatening women's health worldwide. Liquid-based cytology (LBC) examination has become the most common screening method for detecting cervical cancer early and preventing it. Currently, nuclear segmentation technology for cervical clinical LBC images based on convolutional neural networks has become a vital means of assisting in the diagnosis of cervical cancer. However, existing nuclear segmentation techniques fail to separate severely overlapping nuclei in highly aggregated cell clusters, which can lead to the misdiagnosis of cervical cancer pathology. Methods: Therefore, a novel bending loss and dual-task decoding network (BLoss-DDNet) is proposed for overlapping cell nucleus segmentation of cervical clinical LBC images. First, a network architecture search method is introduced to search for and optimize the architecture of the decoding module in the dual-task branch, determining the mask and boundary decoding modules (dual-task decoding modules) of BLoss-DDNet. Second, two feature maps, separately generated from dual-task decoding branches composed of a shared encoder module and the dual-task decoder modules, are fused to enhance sensitivity to cell nucleus boundaries. Third, a bending loss is added to the loss function to focus on the curvature variation at the intersections of overlapping cell nucleus boundaries, thereby constraining the training of the dual-task decoding branch and strengthening the constraint on the cell nucleus boundary. Results: The results show that all evaluation metrics of the proposed BLoss-DDNet achieved the best performance on public datasets. Discussion: Therefore, the proposed BLoss-DDNet can effectively address the segmentation of overlapping cell clusters and nuclei in clinical LBC images, providing strong support for subsequent clinical auxiliary diagnosis of cervical cancer.]]></description>
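        <!--
        A simplified sketch of a curvature ("bending") penalty on a sampled boundary polygon: discrete curvature grows where a contour turns sharply, as at the intersections of overlapping nuclei. This illustrates the idea only and is not the paper's exact bending-loss formulation.

        import torch

        def bending_penalty(pts):
            # pts: (N, 2) ordered boundary points of one predicted nucleus contour
            prev, nxt = pts.roll(1, dims=0), pts.roll(-1, dims=0)
            v1, v2 = pts - prev, nxt - pts
            cos = (v1 * v2).sum(-1) / (v1.norm(dim=-1) * v2.norm(dim=-1) + 1e-8)
            return (1 - cos).mean()  # grows as adjacent boundary segments bend sharply
        -->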
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2025.1703135</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2025.1703135</link>
        <title><![CDATA[Hypergraph-based contrastive learning for enhanced fraud detection]]></title>
        <pubDate>Wed, 26 Nov 2025 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Qinhong Wang</author><author>Yiming Shen</author><author>Husheng Dong</author>
        <description><![CDATA[The proliferation of digital platforms has enabled fraudsters to deploy sophisticated camouflage techniques, such as multi-hop collaborative attacks, to evade detection. Traditional Graph Neural Networks (GNNs) often fail to capture these complex high-order patterns due to limitations including homophily assumption failures, severe label imbalance, and noise amplification during deep aggregation. To address these challenges, we propose the Hypergraph-based Contrastive Learning Network (HCLNet), a novel framework integrating three synergistic innovations. Firstly, multi-relational hypergraph fusion encodes heterogeneous associations into hyperedges, explicitly modeling group-wise fraud syndicates beyond pairwise connections. Secondly, a multi-head gated hypergraph aggregation mechanism employs parallel attention heads to capture diverse fraud patterns, dynamically balances original and high-order features via gating, and stabilizes training through residual connections with layer normalization. Thirdly, hierarchical dual-view contrastive learning jointly applies feature masking and topology dropout at both node and hyperedge levels, constructing augmented views to optimize self-supervised discrimination under label scarcity. Extensive experiments on two real-world datasets demonstrate HCLNet's superior performance, achieving significant improvements over the baselines across key evaluation metrics. The model's ability to reveal distinctive separation patterns between fraudulent and benign entities underscores its practical value in combating evolving camouflaged fraud tactics in digital ecosystems.]]></description>
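        <!--
        A minimal sketch of dual-view contrastive learning with feature masking: two randomly masked views of the same node embeddings are pulled together with an InfoNCE loss. The hypergraph encoder, topology dropout, and hyperedge-level term are omitted; the temperature and mask rate are assumptions.

        import torch
        import torch.nn.functional as F

        def feature_mask(x, rate=0.2):
            keep = (torch.rand(1, x.shape[1], device=x.device) > rate).float()
            return x * keep  # zero out a random subset of feature dimensions

        def info_nce(z1, z2, tau=0.5):
            z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
            logits = z1 @ z2.t() / tau  # (N, N) similarities; diagonal entries are positives
            targets = torch.arange(z1.shape[0], device=z1.device)
            return F.cross_entropy(logits, targets)

        # view1, view2 = encoder(feature_mask(X)), encoder(feature_mask(X))
        # loss = info_nce(view1, view2)
        -->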
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2025.1681277</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2025.1681277</link>
        <title><![CDATA[Resource-efficient fine-tuning of large vision-language models for multimodal perception in autonomous excavators]]></title>
        <pubDate>Tue, 18 Nov 2025 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Hung Viet Nguyen</author><author>Hyojin Park</author><author>Namhyun Yoo</author><author>Jinhong Yang</author>
        <description><![CDATA[Recent advances in large vision-language models (LVLMs) have transformed visual recognition research by enabling multimodal integration of images, text, and videos. This fusion supports a deeper and more context-aware understanding of visual environments. However, the application of LVLMs to multitask visual recognition in real-world construction scenarios remains underexplored. In this study, we present a resource-efficient framework for fine-tuning LVLMs tailored to autonomous excavator operations, with a focus on robust detection of humans and obstacles, as well as classification of weather conditions, on consumer-grade hardware. By leveraging Quantized Low-Rank Adaptation (QLoRA) in conjunction with the Unsloth framework, our method substantially reduces memory consumption and accelerates fine-tuning compared with conventional approaches. We comprehensively evaluate five open-source LVLMs (Llama-3.2-Vision, Qwen2-VL, Qwen2.5-VL, LLaVA-1.6, and Gemma 3) on a domain-specific excavator-vision dataset. Each model is fine-tuned on 1,000 annotated frames and tested on 2,000 images. Experimental results demonstrate significant improvements in both object detection and weather classification, with Qwen2-VL-7B achieving an mAP@50 of 88.03%, mAP@[0.50:0.95] of 74.20%, accuracy of 84.54%, and F1 score of 78.83%. Our fine-tuned Qwen2-VL-7B model not only detects humans and obstacles robustly but also classifies weather accurately. These results illustrate the feasibility of deploying LVLM-based multimodal AI agents for safety monitoring, pose estimation, activity tracking, and strategic planning in autonomous excavator operations.]]></description>
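        <!--
        A minimal sketch of QLoRA-style fine-tuning with the Hugging Face transformers/peft/bitsandbytes stack; the model name, target modules, and ranks are assumptions, and the paper additionally uses the Unsloth framework, which is not shown here.

        import torch
        from transformers import AutoModelForVision2Seq, BitsAndBytesConfig
        from peft import LoraConfig, get_peft_model

        bnb = BitsAndBytesConfig(load_in_4bit=True,
                                 bnb_4bit_quant_type="nf4",
                                 bnb_4bit_compute_dtype=torch.bfloat16)
        model = AutoModelForVision2Seq.from_pretrained(
            "Qwen/Qwen2-VL-7B-Instruct", quantization_config=bnb)
        lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                          target_modules=["q_proj", "v_proj"])
        model = get_peft_model(model, lora)  # only the LoRA adapters are trainable
        model.print_trainable_parameters()
        -->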
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2025.1671099</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2025.1671099</link>
        <title><![CDATA[Shape modeling of longitudinal medical images: from diffeomorphic metric mapping to deep learning]]></title>
        <pubDate>Thu, 30 Oct 2025 00:00:00 +0000</pubDate>
        <category>Review</category>
        <author>Edwin Tay</author><author>Nazli Tümer</author><author>Amir A. Zadpoor</author>
        <description><![CDATA[Living biological tissue is a complex system, constantly growing and changing in response to external and internal stimuli. These processes lead to remarkable and intricate changes in shape. Modeling and understanding both natural and pathological (or abnormal) changes in the shape of anatomical structures is highly relevant, with applications in diagnostic, prognostic, and therapeutic healthcare. Nevertheless, modeling the longitudinal shape change of biological tissue is a non-trivial task due to its inherently nonlinear nature. In this review, we highlight several existing methodologies and tools for modeling longitudinal shape change (i.e., spatiotemporal shape modeling). These methods range from diffeomorphic metric mapping to deep-learning-based approaches (e.g., autoencoders, generative networks, and recurrent neural networks). We discuss synergistic combinations of existing technologies and potential directions for future research, underscoring key deficiencies in the current research landscape.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2025.1575427</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2025.1575427</link>
        <title><![CDATA[Oral squamous cell carcinoma grading classification using deep transformer encoder assisted dilated convolution with global attention]]></title>
        <pubDate>Fri, 17 Oct 2025 00:00:00 +0000</pubDate>
        <category>Brief Research Report</category>
        <author>Singaraju Ramya</author><author>R. I. Minu</author>
        <description><![CDATA[In recent years, oral squamous cell carcinoma (OSCC) has been a common tumor of the orofacial region, affecting areas such as the teeth, jaw, and temporomandibular joint, and carries high morbidity and mortality. OSCC is classified into three grades: well-differentiated, moderately differentiated, and poorly differentiated. Several existing methods, such as AlexNet, CNN, U-Net, and V-Net, have been used for OSCC classification. However, these methods face limitations, including low accuracy (ACC), poor comparability, insufficient data collection, and prolonged training times. To address these limitations, we introduce a novel Deep Transformer Encoder-Assisted Dilated Convolution with Global Attention (DeTr-DiGAtt) model for OSCC classification. To enhance the dataset and mitigate overfitting, a GAN model is employed for data augmentation. Additionally, an Adaptive Bilateral Filter (Ad-BF) is used to improve image quality and remove undesirable noise. For accurate identification of the affected region, an Improved Multi-Encoder Residual Squeeze U-Net (Imp-MuRs-Unet) model is utilized for segmentation. The DeTr-DiGAtt model is then applied to classify the OSCC grading levels. Furthermore, an Adaptive Grey Lag Goose Optimization Algorithm (Ad-GreLop) is used for hyperparameter tuning. The proposed method achieves an ACC of 98.59%, a Dice score of 97.97%, and an Intersection over Union (IoU) of 98.08%.]]></description>
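        <!--
        A minimal sketch pairing multi-rate dilated convolutions with a global-attention pooling step, echoing the "dilated convolution with global attention" idea named above; the DeTr-DiGAtt architecture itself is not reproduced, and dilation rates are assumptions.

        import torch
        import torch.nn as nn

        class DilatedGlobalAttention(nn.Module):
            def __init__(self, cin, cout):
                super().__init__()
                self.branches = nn.ModuleList(
                    [nn.Conv2d(cin, cout, 3, padding=d, dilation=d) for d in (1, 2, 4)])
                self.attn = nn.Conv2d(cout, 1, kernel_size=1)  # per-pixel attention logits

            def forward(self, x):
                f = torch.stack([b(x) for b in self.branches]).sum(0)  # multi-rate context
                w = torch.softmax(self.attn(f).flatten(2), dim=-1)     # global spatial weights
                pooled = (f.flatten(2) * w).sum(-1)                    # (B, cout) attended summary
                return f, pooled
        -->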
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2025.1647074</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2025.1647074</link>
        <title><![CDATA[A hybrid framework for enhanced segmentation and classification of colorectal cancer histopathology]]></title>
        <pubDate>Tue, 14 Oct 2025 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Aaseegha M. D.</author><author>Venkataramana B.</author>
        <description><![CDATA[Introduction: Colorectal cancer (CRC) remains one of the leading causes of cancer-related deaths globally. Early detection and precise diagnosis are crucial in improving patient outcomes. Traditional histological evaluation through manual inspection of stained tissue slides is time-consuming, prone to observer variability, and susceptible to inconsistent diagnoses. Methods: To address these challenges, we propose a hybrid deep learning system combining Swin Transformer, EfficientNet, and ResUNet-A. This model integrates self-attention, compound scaling, and residual learning to enhance feature extraction, global context modeling, and spatial categorization. The model was trained and evaluated using a histopathological dataset that included serrated adenoma, polyps, adenocarcinoma, high-grade and low-grade intraepithelial neoplasia, and normal tissues. Results: Our hybrid model achieved impressive results, with 93% accuracy, 92% precision, 93% recall, and 93% F1-score. It outperformed individual architectures in both segmentation and classification tasks. Expert annotations and segmentation masks closely matched, demonstrating the model's reliability. Discussion: The proposed hybrid design proves to be a robust tool for the automated analysis of histopathological features in CRC, showing significant promise for improving diagnostic accuracy and efficiency in clinical settings.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2025.1646743</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2025.1646743</link>
        <title><![CDATA[Enhancing COVID-19 classification of X-ray images with hybrid deep transfer learning models]]></title>
        <pubDate>Mon, 13 Oct 2025 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Maliki Moustapha</author><author>Murat Tasyurek</author><author>Celal Ozturk</author>
        <description><![CDATA[Deep learning, a subset of artificial intelligence, has made remarkable strides in computer vision, particularly in addressing challenges related to medical images. Deep transfer learning (DTL), one of the techniques of deep learning, has emerged as a pivotal technique in medical image analysis, including studies on COVID-19 detection and classification. In this context, our paper proposes an alternative DTL framework for classifying COVID-19 X-ray images. Unlike prior studies, our approach integrates three distinct experimentation processes using pre-trained models: AlexNet, EfficientNetB1, ResNet18, and VGG16. Furthermore, we explore the application of YOLOv4, traditionally used in object detection tasks, to COVID-19 feature detection. Our methodology involves three experiments: manual hyperparameter selection, k-fold retraining based on performance metrics, and genetic-algorithm-based hyperparameter optimization. The first trains the models with manually selected hyperparameter sets (learning rate, batch size, and epochs). The second employs k-fold cross-validation to retrain the models on the best-performing hyperparameter set. The third employs a genetic algorithm (GA) to automatically determine optimal hyperparameter values, selecting the model with the best performance on our dataset. We tested on a Kaggle dataset with more than 5,000 samples and found ResNet18 to be the best model under GA-based hyperparameter selection. We also tested the proposed framework on another, separate public dataset and simulated adversarial attacks to verify its robustness and dependability. The study outcomes reached an accuracy of 99.57%, an F1-score of 99.50%, a precision of 99.44%, and an average AUC of 99.89% per class. This study underscores the effectiveness of our proposed model, positioning it as a cutting-edge solution in COVID-19 X-ray image classification. Furthermore, the proposed approach has the potential to provide automatic predictions from input images in a simulated web app, supplying an essential supplement for imaging diagnosis in remote areas with scarce medical resources and helping train junior doctors in imaging diagnosis.]]></description>
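        <!--
        A minimal sketch of a genetic algorithm over (learning rate, batch size, epochs) with truncation selection, uniform crossover, and random mutation; train_and_score is a placeholder that trains a model and returns validation accuracy, and the search space is an assumption.

        import random

        SPACE = {"lr": [1e-4, 3e-4, 1e-3, 3e-3],
                 "batch": [16, 32, 64],
                 "epochs": [10, 20, 30]}

        def random_individual():
            return {k: random.choice(v) for k, v in SPACE.items()}

        def crossover(a, b):
            return {k: random.choice([a[k], b[k]]) for k in SPACE}

        def mutate(ind, p=0.2):
            return {k: (random.choice(SPACE[k]) if random.random() < p else v)
                    for k, v in ind.items()}

        def evolve(train_and_score, pop_size=8, generations=5):
            pop = [random_individual() for _ in range(pop_size)]
            for _ in range(generations):
                ranked = sorted(pop, key=train_and_score, reverse=True)
                elite = ranked[: pop_size // 2]  # truncation selection
                children = [mutate(crossover(random.choice(elite), random.choice(elite)))
                            for _ in range(pop_size - len(elite))]
                pop = elite + children
            return max(pop, key=train_and_score)
        -->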
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2025.1675154</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2025.1675154</link>
        <title><![CDATA[Optimizing surface defect detection with YOLOv9: the role of advanced backbone models]]></title>
        <pubDate>Fri, 10 Oct 2025 00:00:00 +0000</pubDate>
        <category>Original Research</category>
        <author>Zhonglin Zeng</author><author>Hongyang Wang</author><author>Chi Yao</author><author>Zile Dong</author><author>Shimin Cai</author>
        <description><![CDATA[Introduction: YOLO models are widely utilized for detecting surface defects, offering a robust and efficient approach to identifying various flaws and imperfections on material surfaces. Methods: In this study, we explore the integration of six distinct backbone networks within the YOLOv9 framework to optimize surface defect detection in steel strips. Specifically, we improve the YOLOv9 framework by integrating six representative backbones (ResNet50, GhostNet, MobileNetV4, FasterNet, StarNet, and RepViT) and conduct a systematic evaluation on the NEU-DET and GC10-DET datasets. Using YOLOv9-C as the baseline, we compare these backbones in terms of detection accuracy, computational complexity, and model efficiency. Results: RepViT achieves the best overall performance with an mAP50 of 68.8%, an F1-score of 0.65, and a balanced precision-recall profile, while GhostNet offers superior computational efficiency with only 41.2 M parameters and 190.2 GFLOPs. Further validation on YOLOv5-m confirms the consistency of the results. Discussion: The study offers practical guidance for backbone selection in surface defect detection tasks, highlighting the advantages of lightweight architectures for real-time industrial applications.]]></description>
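        <!--
        A minimal sketch of screening candidate backbones by parameter count and feature channels before wiring them into a detector; the timm model names are assumptions (not every backbone above ships with every timm version), and the YOLOv9 integration itself, done via model config files, is not shown.

        import timm

        for name in ["resnet50", "ghostnet_100", "mobilenetv4_conv_small", "repvit_m1"]:
            try:
                m = timm.create_model(name, features_only=True)
                n_params = sum(p.numel() for p in m.parameters()) / 1e6
                print(f"{name}: {n_params:.1f}M params, "
                      f"feature channels {m.feature_info.channels()}")
            except RuntimeError:
                print(f"{name}: not available in this timm version")
        -->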
      </item>
      </channel>
    </rss>