<?xml version="1.0" encoding="utf-8"?>
    <rss version="2.0">
      <channel xmlns:content="http://purl.org/rss/1.0/modules/content/">
        <title>Frontiers in Artificial Intelligence | Pattern Recognition section | New and Recent Articles</title>
        <link>https://www.frontiersin.org/journals/artificial-intelligence/sections/pattern-recognition</link>
        <description>RSS Feed for Pattern Recognition section in the Frontiers in Artificial Intelligence journal | New and Recent Articles</description>
        <language>en-us</language>
        <generator>Frontiers Feed Generator,version:1</generator>
        <pubDate>2026-06-25T03:16:26.197+00:00</pubDate>
        <ttl>60</ttl>
        <item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2026.1835651</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2026.1835651</link>
        <title><![CDATA[Evaluating the real-world robustness of face-swap detection models under compression and noise]]></title>
        <pubdate>2026-06-23T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Li Baocai</author><author>Fazli Bin Azzali</author><author>Nik Fatinah binti N. Mohd Farid</author>
        <description><![CDATA[Introduction: Recent advances in generative adversarial networks (GANs) and autoencoding techniques have significantly improved the realism of face-swap and deepfake media, creating substantial challenges for digital media authentication. Although existing deepfake detection models achieve high accuracy on benchmark datasets, their robustness under real-world media degradations remains insufficiently explored. Methods: This study systematically evaluates the resilience of five leading face-swap detection models (XceptionNet, MesoNet, FSD-GAN, FakeTracer, and a Hybrid + Landmark approach) under four common distortions: JPEG compression (quality levels 20–90), Gaussian noise (σ = 0.01–0.05), motion blur (kernel size 3–15), and video encoding artefacts (bitrate 50–500 kbps). Experiments were conducted using the FaceForensics++ dataset (1,000 videos: 720 training, 140 validation, and 140 testing) and Celeb-DF v2 (590 videos: 400 training, 90 validation, and 100 testing). Performance was assessed using accuracy, F1-score, area under the curve (AUC), and degradation rate (Δ) between clean and distorted conditions. Results: The results demonstrate a substantial reduction in detection performance under degraded conditions. Average accuracy declined from 94.7% on clean data to 67.8% on distorted data, corresponding to an overall degradation rate of −26.9%. JPEG compression and motion blur caused the most significant performance drops, with reductions of up to 35%, particularly for lightweight CNN-based detectors. In contrast, FSD-GAN and FakeTracer exhibited greater robustness, maintaining degradation rates of no more than −15% due to their latent fingerprinting and trace embedding mechanisms. Discussion: The findings highlight the limitations of current deepfake detection systems when deployed in real-world environments where media distortions are prevalent.
The study emphasizes the need for distortion-aware training strategies, cross-condition benchmarking, and deployment-oriented evaluation protocols. Furthermore, a dual-branch framework integrating a Vision Transformer (ViT) for spatial artefact detection with a Recurrent Neural Network (RNN) or Temporal Convolutional Network (TCN) for temporal coherence modelling is proposed as a promising direction for improving the robustness and reliability of future deepfake detection systems.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2026.1833234</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2026.1833234</link>
        <title><![CDATA[Violation detection in power operation sites based on multi-scale detection and few-shot learning]]></title>
        <pubdate>2026-06-16T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Yaokuan Wen</author><author>Jun Wang</author><author>Qiming Liu</author><author>Mo Zhou</author><author>Minzhe Tian</author>
        <description><![CDATA[Introduction: Safety supervision at power operation sites is critical for ensuring worker safety and maintaining a reliable electricity supply. However, existing safety violation detection methods are constrained by limited labeled data, poor performance on small-object detection tasks, and interference from complex backgrounds. Methods: To overcome these challenges, this study proposes a framework that integrates multi-scale object detection with few-shot learning. A multi-scale feature extraction module is designed based on a feature pyramid network and channel attention mechanisms to enhance the perception of small objects. In addition, a few-shot learning framework incorporating a meta-learning strategy is introduced to address the scarcity of labeled safety violation samples and improve the model's adaptability to new tasks with limited training data. Results: Experimental results demonstrate that the proposed method consistently outperforms existing approaches across multiple evaluation metrics. The framework achieves notable improvements in small-object detection accuracy and few-shot learning performance, resulting in enhanced detection accuracy, robustness, and generalization capability. Discussion: The integration of multi-scale feature extraction and few-shot learning effectively addresses the challenges of safety violation detection in power operation environments. The proposed framework provides a practical and reliable solution for intelligent safety monitoring and has significant potential for real-world deployment in power operation sites.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2026.1834763</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2026.1834763</link>
        <title><![CDATA[Adaptive quadtree-based segmentation of nucleus and cytoplasm in pap-smear images: a lightweight and interpretable approach for automated cytology]]></title>
        <pubdate>2026-06-12T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Wasswa William</author><author>Andrew Ware</author>
        <description><![CDATA[Background: Automated analysis of Pap-smear images plays an important role in cervical cancer screening, particularly in low-resource settings where manual cytology remains labour-intensive, subjective, and prone to inter-observer variability. Accurate segmentation of the nucleus and cytoplasm is a fundamental step in computer-aided diagnosis systems because it enables quantitative morphometric analysis and computation of clinically important biomarkers such as the nucleus-to-cytoplasm ratio. However, robust cervical cell segmentation remains challenging due to staining variability, inhomogeneity, irregular morphology, weak cytoplasmic boundaries, and overlapping cellular structures. This study presents an adaptive quadtree-based segmentation framework for automated nucleus and cytoplasm delineation in Pap-smear images. Methods: The proposed method employs hierarchical split–merge decomposition guided by a dynamic adaptive statistical homogeneity analysis using mean intensity, variance, and entropy measures. Preprocessing is performed using large-kernel median filtering for background normalisation, followed by local Otsu thresholding, adaptive region merging, overlap refinement, and morphological post-processing. The framework was evaluated on both the Herlev cervical cytology dataset and the ISBI 2015 cervical cytology segmentation challenge dataset containing overlapping and clustered cervical cells. Comparative benchmarking was additionally performed against U-Net and Attention U-Net. Results: On the Herlev dataset, the proposed framework achieved nucleus Dice coefficients exceeding 0.94 and Zijdenbos Similarity Index (ZSI) values greater than 0.9034 across all diagnostic classes, with competitive cytoplasm segmentation performance.
On the ISBI 2015 dataset, the framework maintained acceptable segmentation performance under overlapping-cell conditions, achieving nucleus Dice and ZSI values of 0.912 ± 0.048 and 0.918 ± 0.044, respectively. Morphometric feature comparisons demonstrated strong agreement with ground-truth annotations and low average percentage errors for area and diameter measurements. Although deep learning models achieved superior performance under highly complex overlap conditions, the proposed framework remained competitive while requiring substantially lower computational resources and no iterative model training. Conclusion: The proposed Adaptive Quadtree-Based Segmentation framework provides a lightweight, interpretable, and computationally efficient approach for automated cervical cytology segmentation. Its training-free design, transparent statistical decision rules, and reduced hardware requirements make it particularly suitable for deployment in resource-constrained and embedded cervical cancer screening systems. The framework provides a practical segmentation backbone for automated cytology analysis and downstream computer-aided diagnosis applications.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2026.1841848</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2026.1841848</link>
        <title><![CDATA[An efficient hybrid CNN–transformer framework for real-time weapon detection and face recognition]]></title>
        <pubdate>2026-06-10T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>P. Shanthi</author><author>V. Manjula</author>
        <description><![CDATA[The growing demand for smart surveillance systems necessitates accurate, real-time weapon detection and face recognition that are robust to occlusion, illumination changes, and complex backgrounds. Existing techniques based on standalone CNN or transformer architectures are less effective at capturing both local fine-grained features and long-range dependencies. This paper presents ConViDeTR, a hybrid deep learning framework that integrates CNN, Vision Transformer (ViT), and Detection Transformer (DETR) architectures into a unified framework. The key contribution of the proposed framework is the deep feature fusion layer, which integrates local spatial features, global context features, and object query features in one shared feature space. This enables weapon detection and face recognition to run simultaneously within one efficient framework. Experiments on existing benchmark datasets validate the performance, with 98.9% accuracy in weapon detection and 97.34% accuracy in face recognition, outperforming existing techniques in both tasks. The framework also demonstrates real-time performance at 25–30 FPS with low latency. These results support the framework's effectiveness, robustness, and scalability for the development of next-generation intelligent surveillance systems.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2026.1784359</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2026.1784359</link>
        <title><![CDATA[Deep learning and multi-statistical features: an intra-frame forgery detection video method]]></title>
        <pubdate>2026-05-29T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Diaa Uliyan</author><author>Manal Eid Alazmi</author><author>Mohammad Alsaffar</author><author>Moham’d Al-Dlalah</author><author>Meshari Alazmi</author>
        <description><![CDATA[Introduction: In this paper, we propose a technique for detecting spliced video forgeries using statistical clues extracted from the spatial and compression domains of a suspicious video. The proposed technique employs a multi-feature architecture to train statistical features for every domain using the VGG-16 model. Examining the fused features in specific image regions in great detail exposes instances of manipulation. Our research examines both the compression effects of JPEG and the visual distortions that result from image modification. Techniques: In this study, we propose an alternative to the standard methods for detecting spliced forgeries in spatial domain textural analysis, such as entropy-based edges (MER), median filter residual (MFR), gray level regional maxima (FGM), morphological open images (MOI), and morphological erosion images (MEI). Image derivative filters that can extract statistical information about manipulation traces from compressed data were chosen for detecting double JPEG compression artifacts. For example, the compression domain analysis makes good use of the 2D block Discrete Cosine Transform (DCT). The proposed method comprises three stages: (1) Video preprocessing splits videos into frames and converts each frame’s image to two domain formats: spatial and DCT compression. (2) Several statistical features are extracted from each domain. (3) VGG-16 is trained on a dataset to identify the spatially manipulated area in a video. Results: The proposed method has been validated and tested on the HTVD and GRIP datasets. Splicing forgery detection accuracy is 93.50% on the GRIP dataset and 92.40% on HTVD.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2026.1824634</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2026.1824634</link>
        <title><![CDATA[Redefining lightweight vision models for healthcare AI]]></title>
        <pubdate>2026-05-29T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Linus Lee</author><author>Zhibin Feng</author><author>Jen Hong Tan</author><author>Chiaw-Ling Chng</author>
        <description><![CDATA[Background: Vision models for medical imaging often require tens of millions of parameters, raising questions about whether architectural efficiency can be achieved without sacrificing classification performance. We introduce MedLiT-seed (2.1 million parameters) and MedLiT-nano (0.75 million parameters), two ultra-lightweight vision transformers designed for efficient and scalable medical image analysis. Methods: MedLiT employs a streamlined Mixture-of-Experts (MoE) architecture with SwiGLU feedforward networks, grouped query attention, and depth-wise scaling. Models were pre-trained using masked autoencoding on ImageNet and MedMNIST, followed by fine-tuning on 12 MedMNIST 2D subsets. We evaluated performance across multiple configurations and compared against benchmark models including ResNet, MedViT, and AutoML systems. Results: MedLiT-seed achieved the highest Area Under Curve (AUC) on 4 subsets and second-highest on 2 others, outperforming models with 10–20× more parameters. MedLiT-nano achieved results comparable to, and even exceeding, ResNet-18 and AutoML baselines in several subsets. Transfer learning from ImageNet significantly improved convergence and generalization. Increasing embedding size yielded greater performance gains than increasing expert count. Conclusion: MedLiT demonstrates that MoE-based token routing represents a viable architectural pathway for achieving competitive accuracy relative to floating-point operations (FLOPs) across diverse medical imaging modalities at a scale on the order of 2M parameters. These results suggest that selectively routing computation through specialised experts, rather than scaling model size, can serve as an effective design principle for more compact medical vision models.
Such an architecture can be utilised for low-resource clinical environments and scalable fine-tuning across diverse healthcare tasks, though limitations on multi-label tasks highlight clear directions for future architectural refinement.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2026.1809586</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2026.1809586</link>
        <title><![CDATA[HybridWeaveNet: deep cultural pattern recognition for Indian handloom heritage fabrics]]></title>
        <pubdate>2026-05-28T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>S. A. Anuraag Shankar</author><author>A. Sasithradevi</author><author>G. Krishnaraj</author><author>S. Kanimozhi</author>
        <description><![CDATA[Introduction: India’s handloom sector represents the diversity of local culture, which is reflected in the traditional handwoven products from India. Regional fabric varieties such as Bandhani, Banarasi, Kancheepuram, Patola and Tussar are characterized by unique pattern structures, in terms of yarns as well as color arrangements. Accurately distinguishing between them is crucial for tasks such as digital archiving, automatic categorization of fabric and e-commerce. Manually categorizing these designs would be tedious and subjective. Methods: Existing conventional deep learning and computer vision approaches encounter significant challenges when processing complex motifs, fine-grained textures and substantial intra-class variability. We therefore propose HybridWeaveNet, a custom deep learning architecture specifically developed to mitigate these shortcomings. The model integrates a Dual Attention mechanism and a pretrained EfficientNetV2 backbone to enhance pattern recognition and variate feature learning. The proposed model is trained using balanced sampling across the five classes in the dataset. Further, various augmentation methods, including Mixup, Cutmix, GridDropout, Cutout, Blur, and ColorJitter, have been employed for effective generalization. Results: The proposed HybridWeaveNet model was evaluated using various metrics such as accuracy, macro F1-score, Cohen’s kappa, Matthews correlation coefficient (MCC), Jaccard index and log loss, and exhibited 91% performance. Discussion: Grad-CAM visualizations validated that the model concentrates on culturally related motifs, thereby highlighting its potential for extensive fabric classification.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2026.1761903</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2026.1761903</link>
        <title><![CDATA[In-context adaptation of VLMs for few-shot cell detection in optical microscopy]]></title>
        <pubdate>2026-05-15T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Shreyan Ganguly</author><author>Angona Biswas</author><author>Jaydeep Rade</author><author>Md Hasibul Hasan Hasib</author><author>Nabila Masud</author><author>Nitish Singla</author><author>Abhipsa Dash</author><author>Ushashi Bhattacharjee</author><author>Soumik Sarkar</author><author>Aditya Balu</author><author>Anwesha Sarkar</author><author>Adarsh Krishnamurthy</author>
        <description><![CDATA[Foundation vision-language models (VLMs) excel on natural images, but their utility for biomedical microscopy remains underexplored. In this paper, we investigate how in-context learning enables state-of-the-art VLMs to perform few-shot object detection when large annotated datasets are unavailable, as is often the case with microscopic images. We introduce the Micro-OD benchmark, a collection of 252 images curated specifically for in-context learning, with bounding-box annotations spanning 11 cell types across four sources, including two in-lab expert-annotated sets. We systematically evaluate eight VLMs under few-shot conditions and compare variants with and without implicit test-time reasoning tokens. We further implement a hybrid Few-Shot Object Detection (FSOD) pipeline that combines a detection head with a VLM-based few-shot classifier, which enhances the few-shot performance of recent VLMs on our benchmark. Across datasets, we observe that zero-shot performance is weak due to the domain gap; however, few-shot support consistently improves detection, with only marginal gains beyond six shots. We observe that some reasoning variant models show task-specific gains, but the effect varies across models and settings. Our results highlight in-context adaptation as a promising research direction requiring further development for microscopy, and our benchmark provides a reproducible testbed for advancing open-vocabulary detection in biomedical imaging.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2026.1759740</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2026.1759740</link>
        <title><![CDATA[A multi-class defect detection method for substations based on the improved YOLOv10n]]></title>
        <pubdate>2026-05-13T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Long Huang</author><author>Kangning Li</author><author>Tianren Fu</author><author>Jinlei Zhu</author><author>Yunhao Zhong</author><author>Xingfei Wang</author>
        <description><![CDATA[Introduction: Ensuring the stable operation of power substations is critical for maintaining the reliability of the electrical grid. However, automated inspection of substation equipment remains challenging because multi-class defects are often small, visually blurred, and located in complex backgrounds. Methods: To improve the localization accuracy of small defects with fuzzy features, this paper proposes YOLO-SMALLNET, an improved defect detection algorithm based on YOLOv10n. First, a Detail Information Extraction Convolution module is used to replace the strided convolution modules in the baseline network to preserve fine-grained information during downsampling. Second, a low-level feature fusion detection layer is introduced to reduce small-target feature loss. Third, a Weighted Hybrid Fusion Pyramid Network is adopted to optimize multi-scale feature integration. Finally, a Content-Guided Attention mechanism is integrated to enhance critical defect information while suppressing background noise. Results: Experimental results show that, compared with the baseline model, YOLO-SMALLNET improves Precision, Recall, mAP@0.5, and mAP@0.5:0.95 by 7.3, 8.2, 3.9, and 3.3%, respectively. Discussion: The proposed method effectively reduces false detections and missed detections of small defect regions and is suitable for real-time automated inspection of substation equipment.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2026.1810912</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2026.1810912</link>
        <title><![CDATA[DeepForgeryNet: a hybrid CNN–LSTM and transfer learning framework for robust image forgery and deepfake detection]]></title>
        <pubdate>2026-05-08T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Aarti Sardhara</author><author>Vipul Vekariya</author><author>Ajeet Ram Pathak</author><author>Sital Dash</author>
        <description><![CDATA[Introduction: The increasingly lifelike nature of digitally manipulated images, as well as those generated by AI, presents significant problems for both media authenticity and digital trust. Many detection methods depend mostly on visual content and may therefore miss subtle forensic traces. This paper focuses on developing a powerful detection tool that leverages both artifact-level and contextual inconsistencies. Methods: In this work, we introduce DeepForgeryNet, an artifact-aware deep learning model that combines Error Level Analysis-based preprocessing with a hybrid CNN–LSTM network. The preprocessing step focuses on exposing changes in compression and edges, whereas the hybrid CNN–LSTM model captures the spatial and contextual information. The model is trained end-to-end on publicly available benchmark datasets with stratified data splits. Results: The proposed method achieved an accuracy of 95.1%, precision of 94.6%, recall of 94.2%, F1-score of 94.4%, and AUC of 0.98. It surpassed a baseline CNN and transformer models, especially in recall. Cross-dataset experiments showed stable generalization, with accuracy above 92%. Discussion: These results indicate that integrating artifact-aware preprocessing with spatial-contextual feature learning can further enhance the reliability of forgery detection. Although detecting very small manipulations and heavily compressed content remains problematic, this work lays the groundwork for a new generation of trustworthy digital media verification.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2026.1760137</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2026.1760137</link>
        <title><![CDATA[Machine learning based approach to intrusion detection in internet of things environments]]></title>
        <pubdate>2026-04-17T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Oluwatoyin Esther Akinbowale</author><author>Adebola Tajudeen Adesina</author><author>Mulatu Fekadu Zerihun</author><author>Polly Mashigo</author>
        <description><![CDATA[The growing security requirements of Internet of Things (IoT) networks, where heterogeneous networks and resource-constrained devices present an exponentially larger attack surface, were addressed using a machine learning based intrusion detection system. Open-source secondary quantitative IoT intrusion traffic data were obtained and used to train machine learning models. The dataset comprises over one million labeled flow records covering 34 kinds of attacks and benign traffic. First, extensive preprocessing, including handling of missing values, feature encoding, scaling, and redundancy removal, was carried out, followed by the training of three supervised machine learning (ML) classifiers, namely Decision Tree (DT), Random Forest (RF), and Support Vector Machine (SVM), to differentiate the types of intrusions. The ML models were evaluated using accuracy, precision, recall, and F1-score. The Decision Tree model achieved the highest overall accuracy (99.36%) and respectable performance on prevalent attack classes, closely followed by Random Forest (99.27%), while SVM lagged behind with an accuracy of 80.08% due to computational constraints in handling massive amounts of data. Feature-importance analysis identified inter-arrival time and total packet size as the significant discriminators of malicious behavior. In conclusion, tree-based models, and specifically Decision Trees, offer an extremely effective and interpretable solution for real-time IoT intrusion detection; future avenues include handling class imbalance and examining lightweight, ensemble, and deep-learning approaches for robust detection of rare and unknown threats. This study contributes to cybersecurity via the identification and classification of intrusions in IoT devices for proper mitigation.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2026.1770342</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2026.1770342</link>
        <title><![CDATA[FSD-Net: underwater object detection based on frequency and spatial domain feature enhancement]]></title>
        <pubdate>2026-04-17T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Chao Zhang</author><author>Shuang Wu</author><author>Baohua Huang</author><author>Binchen Zhao</author><author>Fengqi Cui</author><author>Xingkun Li</author>
        <description><![CDATA[Background: Complex underwater visual conditions cause severe missed and false detections in conventional object detection models, hindering reliable autonomous underwater exploration. This work addresses these key performance limitations. Methods: We propose FSD-Net, a novel underwater detection model with two core enhancement modules. The Frequency Attention Convolution Module reduces missed detections via frequency-domain spatial feature preservation, and the Multi-dimensional Feature Enhancement Module suppresses false detections via enhanced semantic fusion. Experiments include ablation studies and state-of-the-art method comparisons on the UTDAC2020 and Brackish datasets. Results: FSD-Net achieves state-of-the-art performance on both tested datasets. On the UTDAC2020 dataset, it reaches 85.7% AP50 and 82.5% F1-score, with a 3.8% AP50 improvement over the baseline model. On the Brackish dataset, it achieves 98.1% AP50 and 97.0% F1-score, with a 3.9% AP50 improvement over the baseline. The model outperforms all compared mainstream detection algorithms, and ablation studies validate the effectiveness of both proposed modules. Conclusion: FSD-Net's joint frequency-spatial enhancement strategy effectively mitigates underwater image degradation challenges, providing a robust detection solution for autonomous underwater exploration. The proposed dual-module design offers a practical reference for detection model optimization in complex visual environments, with future work focused on lightweight model optimization.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2026.1794923</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2026.1794923</link>
        <title><![CDATA[Automatic recognition of dynamic signs of Mexican sign language using deep learning]]></title>
        <pubdate>2026-04-16T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Jesús Antonio Navarrete-López</author><author>Michelle Sainos-Vizuett</author><author>Irvin Hussein Lopez-Nava</author>
        <description><![CDATA[Introduction: Over four million individuals in Mexico face communication barriers due to hearing impairments. Sign language serves as an essential communication tool within the deaf community; however, automatic translation between sign and oral languages remains a significant challenge. This study proposes an approach for recognizing dynamic gestures from Mexican Sign Language (LSM) to support the development of assistive communication technologies. Methods: In collaboration with expert interpreters, an LSM corpus comprising 121 signs was developed, including a specialized lexicon focused on medical emergencies and accident scenarios. A standardized video acquisition protocol was implemented with both expert and non-expert participants. The proposed methodology consists of skeletal keypoint extraction using MediaPipe, data augmentation through frame sampling, and dataset normalization. Multiple deep learning architectures were evaluated, including ResNet, Simple RNN, LSTM, Bidirectional LSTM (BiLSTM), Gated Recurrent Units (GRU), a Transformer encoder, and a hybrid ResNet–Transformer model. Results: Among the evaluated models, the ResNet architecture achieved the best performance, obtaining an F1-score of 0.948 under subject-independent evaluation, with an average inference time of 0.468 seconds. Hyperparameter optimization analysis indicated that performance improvements were primarily driven by training dynamics and regularization strategies rather than increases in architectural depth. Discussion: The results demonstrate the effectiveness of deep learning–based approaches for dynamic LSM gesture recognition and highlight the importance of optimization strategies for robust generalization. This work contributes toward LSM-to-Spanish translation systems and provides a foundation for advancing data-driven sign language recognition technologies.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2026.1811469</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2026.1811469</link>
        <title><![CDATA[Editorial: AI-enabled breakthroughs in computational imaging and computer vision]]></title>
        <pubdate>2026-04-07T00:00:00Z</pubdate>
        <category>Editorial</category>
        <author>Liping Zhang</author><author>Xiaobo Li</author>
        <description></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2026.1796099</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2026.1796099</link>
        <title><![CDATA[Dynamic-focus transformer for point cloud segmentation]]></title>
        <pubdate>2026-04-02T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Ziwen Wang</author><author>Xiaoting Fan</author><author>Mei Yu</author><author>Jianlu Liu</author><author>Shuai Wang</author><author>Yonghua Wang</author><author>Chuanfu Wu</author>
        <description><![CDATA[Transformer-based methods have significantly advanced 3D point cloud segmentation by effectively capturing long-range dependencies. However, the global or fixed-window self-attention mechanisms they often employ suffer from computational redundancy and overfitting due to processing excessive, potentially irrelevant key-value pairs for each query. To address this, we propose the Dynamic-Focus Transformer, a novel architecture that introduces a data-dependent adaptive attention mechanism. Through learned soft point masks, we selectively sparsify keys and values to focus on semantically critical regions. Our method enables flexible, input-adaptive receptive fields without the heavy memory overhead associated with per-point offset learning in deformable designs. Furthermore, when integrated into a U-Net-style encoder-decoder, our method attains a highly efficient balance between modeling capability and computational cost. Extensive experiments on S3DIS and ScanNetv2 benchmarks demonstrate that our method achieves state-of-the-art performance with notably improved efficiency, validating its effectiveness for large-scale point cloud understanding.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2026.1818182</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2026.1818182</link>
        <title><![CDATA[Correction: Painting authentication using CNNs and sliding window feature extraction]]></title>
        <pubdate>2026-03-13T00:00:00Z</pubdate>
        <category>Correction</category>
        <author>Juan Ruiz de Miras</author><author>José Luis Vílchez</author><author>María José Gacto</author><author>Domingo Martín</author>
        <description></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2026.1721866</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2026.1721866</link>
        <title><![CDATA[InfoMSD: an information-maximization self-distillation framework for parameter-efficient fine-tuning on artwork images]]></title>
        <pubdate>2026-03-04T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Feng Guan</author><author>Hao Hong</author><author>Yong Wang</author>
        <description><![CDATA[In recent years, despite the remarkable performance of large-scale vision language models across various visual classification tasks, their substantial parameter counts and high fine-tuning costs have hindered deployment in resource-constrained cultural and artwork settings. This work specifically addresses the task of object recognition in artwork, that is, identifying semantic objects (e.g., animals, people, everyday items) depicted within paintings, sketches, and other artistic renditions, rather than classifying artistic styles or genres. To address this issue, we propose InfoMSD, an unsupervised, Information-Maximization Self-Distillation framework designed for parameter-efficient fine-tuning on unlabeled artwork imagery while preserving robust performance. Specifically, InfoMSD incorporates a teacher-student architecture in the self-distillation phase, where the teacher model generates pseudo-labels for artworks, and the student model learns from the teacher through cross-entropy. By aligning the student's predictions with the discriminative signals from the teacher's pseudo-labels and simultaneously applying entropy-based regularization to sharpen the probability distribution and balance class coverage, the framework improves both the quality of the pseudo-labels and the discriminative capacity of the model. To enable parameter-efficient fine-tuning, only the layer norm parameters and visual prompts in the student model are updated, while the remaining parameters are frozen, significantly reducing computational overhead. Extensive experimental results on artwork datasets show that InfoMSD achieves accuracy improvements of +6.43% and +3.02% over CLIP zero-shot baselines, while adjusting less than 1% of the model parameters. Compared to existing lightweight distillation methods, InfoMSD achieves average accuracy gains of 1.35% and 0.96%, respectively. Overall, InfoMSD offers a novel, information-theoretic paradigm for unsupervised and efficient fine-tuning in object recognition within artistic imagery, balancing performance and efficiency.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2026.1766828</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2026.1766828</link>
        <title><![CDATA[Real-time grading method of tunnel surrounding rock based on image recognition]]></title>
        <pubdate>2026-02-05T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Yihuan Xiao</author><author>Hao Yuan</author><author>Qingye Shi</author><author>Zemin Qiu</author><author>Liao Tang</author><author>Yihua Yu</author><author>Yabin Li</author><author>Yin Pan</author><author>Qinghua Xiao</author>
        <description><![CDATA[To enable rapid, accurate grading of tunnel surrounding rock during construction, we propose a real-time grading method that integrates image processing with lightweight deep learning. We developed an automated pipeline that combines image-processing techniques and machine-learning algorithms to extract and classify characteristic parameters of tunnel surrounding rock, enabling real-time monitoring and classification at the tunnel face. The study demonstrates that: (1) Following the proposed image-acquisition standards for rock and tunnel faces, images are converted to grayscale, denoised, enhanced, and normalized, which facilitates efficient and accurate extraction of structural features and improves the precision of classification parameters; (2) An optimized lithology identification and classification model was built, and a rock-hardness, strength, and integrity sensing approach based on the ShuffleNetV2 convolutional neural network was introduced to achieve real-time surrounding-rock grading. On an engineering site, the method attains 85% accuracy for lithology classification, 75% for rock-mass integrity, and 80% for overall surrounding-rock grade, confirming its feasibility and practical value. These results offer theoretical insight and engineering utility for the scientific evaluation of tunnel surrounding-rock grade.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2025.1714882</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2025.1714882</link>
        <title><![CDATA[CBAM-DenseNet with multi-feature quality filtering: advancing accuracy in small-sample iris recognition]]></title>
        <pubdate>2026-01-27T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Yongheng Pang</author><author>Zishen Wang</author><author>Nan Jiang</author><author>Jia Qin</author><author>Suyuan Li</author>
        <description><![CDATA[In the information age, traditional password and key-based authentication mechanisms are no longer sufficient to meet the growing demands for information security. Iris recognition technology has garnered attention due to its high security and uniqueness. Current iris recognition methods based on single feature extraction are prone to loss of feature information, which lowers recognition rates. To address this, this paper proposes a multi-feature fusion-based iris recognition method. The method employs a comprehensive quality evaluation scheme to filter iris images, ensuring the quality of the input images. An improved CAN network is used to remove image noise, and a DenseNet-based iris feature extraction method is combined with the Convolutional Block Attention Module (CBAM), a channel and spatial attention mechanism, to enhance the expressiveness of features. Experiments with small sample sizes and tests on various public iris databases show that the proposed method significantly improves recognition accuracy and robustness.]]></description>
      </item><item>
        <guid isPermaLink="true">https://www.frontiersin.org/articles/10.3389/frai.2025.1675834</guid>
        <link>https://www.frontiersin.org/articles/10.3389/frai.2025.1675834</link>
        <title><![CDATA[An improved YOLOv10-based framework for knee MRI lesion detection with enhanced small object recognition and low contrast feature extraction]]></title>
        <pubdate>2026-01-20T00:00:00Z</pubdate>
        <category>Original Research</category>
        <author>Hongwei Yang</author><author>Wenqu Song</author><author>Tiankai Jiang</author><author>Chuanhao Wang</author><author>Luping Zhang</author><author>Zhian Cai</author><author>Yuhan Sun</author><author>Qing Zhao</author><author>Yuyu Sun</author>
        <description><![CDATA[Rationale and objectives: To address the challenges in detecting anterior cruciate ligament (ACL) lesions in knee MRI examinations, including difficulties in identifying tiny lesions, insufficient extraction of low-contrast features, and poor modeling of irregular lesion morphologies, and to provide a precise and efficient auxiliary diagnostic tool for clinical practice. Materials and methods: An enhanced framework based on YOLOv10 is constructed. The backbone network is optimized using the C2f-SimAM module to enhance multi-scale feature extraction and spatial attention; an Adaptive Spatial Fusion (ASF) module is introduced in the neck to better fuse multi-scale spatial features; and a novel hybrid loss function combining Focal-EIoU and KPT Loss is employed. To ensure rigorous statistical evaluation, we utilized a five-fold cross-validation strategy on a dataset of 917 cases. Results: Evaluation on the KneeMRI dataset demonstrates that the proposed model achieves statistically significant improvements over standard YOLOv10, Faster R-CNN, and Transformer-based detectors (RT-DETR). Specifically, mAP@0.5 is increased by 1.3% (p < 0.05) compared to the standard YOLOv10, and mAP@0.5:0.95 is improved by 2.5%. Qualitative analysis further confirms the model's ability to reduce false negatives in small, low-contrast tears. Conclusion: This framework effectively connects general object detection models with the specific requirements of medical imaging, providing a precise and efficient solution for diagnosing ACL injuries in routine clinical workflows.]]></description>
      </item>
      </channel>
    </rss>