40.3K
views
60
authors
12
articles
Editors
5
Original Research
05 October 2023

Image captioning aims to convert the visual features that computers extract from images into meaningful semantic information, so that computers can generate text descriptions resembling human perception and support tasks such as image classification, retrieval, and analysis. In recent years, captioning performance has improved significantly with the adoption of the encoder-decoder architecture from machine translation and the use of deep neural networks. However, several challenges persist. This paper proposes a novel method to address the loss of visual information and the lack of dynamic adjustment to the input image during decoding. We introduce a guided decoding network that connects the encoder and the decoder. Through this connection, encoding information guides the decoding process and allows the decoding information to be adjusted automatically. In addition, a Dense Convolutional Network (DenseNet) and Multiple Instance Learning (MIL) are adopted in the image encoder, and a Nested Long Short-Term Memory (NLSTM) network is used as the decoder, to strengthen the extraction and parsing of image information during encoding and decoding. To further improve performance, the model incorporates an attention mechanism to focus on details and a double-layer decoding structure, which together yield more detailed descriptions and richer semantic information. Furthermore, Deep Reinforcement Learning (DRL) is used to train the model by directly optimizing the same metrics used for evaluation, which resolves the inconsistency between training and evaluation criteria. Finally, the model is trained and tested on the MS COCO and Flickr30k datasets, and the results show improvements over commonly used models on evaluation metrics such as BLEU, METEOR, and CIDEr.
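The idea of training directly on an evaluation metric can be illustrated with a small policy-gradient sketch. This is not the authors' implementation: the reward below is a clipped unigram precision used as a simplified stand-in for BLEU-1 (no brevity penalty), the baseline is the score of a hypothetical greedy-decoded caption (a common choice that the abstract does not specify), and the token log-probabilities are made-up values.

```python
import math
from collections import Counter


def unigram_precision(candidate, reference):
    """Clipped unigram precision: a simplified stand-in for BLEU-1 (assumption)."""
    if not candidate:
        return 0.0
    cand_counts = Counter(candidate)
    ref_counts = Counter(reference)
    # Each candidate token is credited at most as often as it occurs in the reference.
    overlap = sum(min(n, ref_counts[tok]) for tok, n in cand_counts.items())
    return overlap / len(candidate)


def policy_gradient_loss(token_log_probs, reward, baseline):
    """REINFORCE-style surrogate loss: -(reward - baseline) * sum of token log-probs."""
    return -(reward - baseline) * sum(token_log_probs)


reference = "a dog runs on the grass".split()
sampled = "a dog runs on grass".split()    # caption sampled from the model (hypothetical)
greedy = "a cat sits on the mat".split()   # greedy caption used as baseline (assumption)

reward = unigram_precision(sampled, reference)
baseline = unigram_precision(greedy, reference)

# Hypothetical per-token log-probabilities of the sampled caption under the model.
log_probs = [math.log(0.5)] * len(sampled)
loss = policy_gradient_loss(log_probs, reward, baseline)
```

Because the sampled caption scores higher than the baseline, minimizing this loss raises the probability of the sampled tokens. A caption scoring below the baseline would have its probability pushed down instead.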

3,239 views
9 citations
Original Research
02 June 2023

Wafer defect recognition is an important step in chip manufacturing. Because different process flows can produce different defect types, correctly identifying defect patterns is essential for recognizing manufacturing problems and fixing them promptly. To achieve high-precision identification of wafer defects and improve wafer quality and production yield, this paper proposes a Multi-Feature Fusion Perceptual Network (MFFP-Net) inspired by human visual perception mechanisms. MFFP-Net processes information at several scales and then aggregates it, so that the next stage can abstract features from the different scales simultaneously. The proposed feature fusion module obtains finer-grained and richer features, capturing key texture details and avoiding the loss of important information. Experiments show that MFFP-Net generalizes well and achieves state-of-the-art results on the real-world WM-811K dataset, with an accuracy of 96.71%. This provides an effective way for the chip manufacturing industry to improve yield.
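The "process at several scales, then aggregate" idea can be sketched in a few lines. This is only an illustration, not MFFP-Net itself: fixed average-pooling windows stand in for learned convolution branches, and the function names, window sizes, and toy wafer map are all assumptions.

```python
def avg_pool_same(grid, k):
    """Average over a k x k window centred on each cell, clipped at the borders."""
    h, w = len(grid), len(grid[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = [
                grid[y][x]
                for y in range(max(0, i - r), min(h, i + r + 1))
                for x in range(max(0, j - r), min(w, j + r + 1))
            ]
            out[i][j] = sum(vals) / len(vals)
    return out


def fuse_scales(grid, kernel_sizes=(1, 3, 5)):
    """Run one branch per scale and concatenate the results per cell."""
    branches = [avg_pool_same(grid, k) for k in kernel_sizes]
    h, w = len(grid), len(grid[0])
    return [[[b[i][j] for b in branches] for j in range(w)] for i in range(h)]


# Toy 5 x 5 wafer map with a single defective die in the centre (hypothetical data).
wafer = [[0.0] * 5 for _ in range(5)]
wafer[2][2] = 1.0

fused = fuse_scales(wafer)
centre_features = fused[2][2]   # one value per scale at the defect location
corner_features = fused[0][0]   # only the widest window reaches the defect
```

Each cell ends up with one feature per scale, so a later stage sees both the sharp local response and the wider context at the same position.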

4,246 views
9 citations
6,361 views
10 citations
Recommended Research Topics

Frontiers in Neuroscience

Neuroscience-Inspired Visual Sensing and Understanding
Edited by Qingbo Wu, King Ngi Ngan, Lei Bai, Weisi Lin
17K
views
33
authors
7
articles

Frontiers in Neuroscience

Deep Facial Attribute Analysis
Edited by Ke Zhang, Li Zhang, Yinghui Kong
14.2K
views
19
authors
5
articles

Frontiers in Neuroscience

Multimodal Perceiving Technologies in Neuroscience and Vision Applications
Edited by Xiaojiang Peng, Linlin Shen, Avid ROMAN-GONZALEZ, Bingding Huang, Junsong Wang, Xiaomao Fan, Chuang Lin
11.3K
views
23
authors
5
articles

Frontiers in Neuroscience

Artificial Systems Based on Human and Animal Natural Vision in Image and Video
Edited by HIROKI TAMURA, Gang Yang, Yuki Todo, Zheng Tang
22K
views
44
authors
8
articles

Frontiers in Neuroscience

Advances in Computer Vision: From Deep Learning Models to Practical Applications
Edited by Hancheng Zhu, Rui Yao, Yanqiu Huang, Lu Tang
22.6K
views
52
authors
12
articles