AUTHOR=Gan Yao, Fu Yanyun, Wang Deyong, Li Yongming
TITLE=A novel approach to attention mechanism using kernel functions: Kerformer
JOURNAL=Frontiers in Neurorobotics
VOLUME=17
YEAR=2023
URL=https://www.frontiersin.org/journals/neurorobotics/articles/10.3389/fnbot.2023.1214203
DOI=10.3389/fnbot.2023.1214203
ISSN=1662-5218
ABSTRACT=Artificial Intelligence (AI) is a technological science that researches and develops theories, methods, technologies, and application systems for simulating, extending, and expanding human intelligence. With the rise of AI, the Transformer has been highly successful in various natural language processing (NLP) tasks. However, its attention mechanism incurs a quadratic computational cost with respect to the input sequence length, which limits its efficiency and scalability on long-sequence tasks. To address this challenge, we propose a linear Transformer based on the kernel approach, named Kerformer. Our method simplifies the attention operation by leveraging a nonlinear re-weighting mechanism that transforms the traditional softmax attention into dot-product attention based on feature mapping. The Kerformer algorithm focuses on two key properties of the softmax computation: non-negativity and non-linear re-weighting. To satisfy these properties, we apply a non-negativity operation separately to the Query (Q) and Key (K) matrices and make their computations separable. In addition, we incorporate an SE block to re-weight the non-negativity-processed K matrices and improve the performance of the model. Our approach reduces the time complexity of the attention matrix from O(N^2) to O(N), where N is the sequence length, resulting in significantly improved efficiency and scalability on long-sequence tasks. In our simulation experiments, Kerformer outperformed other methods with lower time and memory consumption. On NLP and vision tasks, Kerformer achieved higher average accuracy (83.39%) and performed better on long-sequence tasks (average accuracy of 58.94%). It also demonstrated superior efficiency and convergence speed on visual tasks compared with other models.
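
The abstract describes the general recipe of kernel-based linear attention: apply a non-negative feature map to Q and K, re-weight K with a squeeze-and-excitation (SE) gate, and exploit associativity so that phi(K)^T V is computed first, giving O(N) cost in sequence length. Below is a minimal PyTorch sketch of that recipe. It is not the paper's implementation: the ELU(x)+1 feature map (a common choice in linear-attention work), the SE gate layout, and the names LinearKernelAttention and SEBlock are all assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SEBlock(nn.Module):
    """Squeeze-and-Excitation gate over the feature dimension (a sketch;
    the paper's exact SE placement and reduction ratio may differ)."""
    def __init__(self, dim, reduction=4):
        super().__init__()
        self.fc1 = nn.Linear(dim, dim // reduction)
        self.fc2 = nn.Linear(dim // reduction, dim)

    def forward(self, x):                 # x: (batch, seq, dim)
        s = x.mean(dim=1)                 # squeeze: pool over the sequence
        s = torch.sigmoid(self.fc2(F.relu(self.fc1(s))))
        return x * s.unsqueeze(1)         # excite: per-channel re-weighting

class LinearKernelAttention(nn.Module):
    """Kernel-style linear attention: phi(Q) (phi(K)^T V), normalized,
    computed in O(N) rather than O(N^2)."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.se = SEBlock(dim)
        self.eps = eps                    # avoids division by zero

    def feature_map(self, x):
        # Non-negativity: ELU(x) + 1 > 0 everywhere (an assumed choice).
        return F.elu(x) + 1.0

    def forward(self, q, k, v):           # each: (batch, seq, dim)
        q = self.feature_map(q)
        k = self.se(self.feature_map(k))  # SE re-weighting of non-negative K
        # Associativity trick: build the (dim x dim) summary phi(K)^T V once,
        # so cost is O(N d^2) instead of the O(N^2 d) of full attention.
        kv = torch.einsum('bnd,bne->bde', k, v)
        # Row-wise normalizer: phi(q_i) . sum_n phi(k_n).
        z = 1.0 / (torch.einsum('bnd,bd->bn', q, k.sum(dim=1)) + self.eps)
        return torch.einsum('bnd,bde,bn->bne', q, kv, z)

# Usage: output shape matches standard attention, but memory and time
# grow linearly with sequence length N.
attn = LinearKernelAttention(dim=64)
q = torch.randn(2, 1024, 64)
k = torch.randn(2, 1024, 64)
v = torch.randn(2, 1024, 64)
out = attn(q, k, v)                       # (2, 1024, 64)
```

The key design point is the order of multiplication: because the feature map makes the attention kernel separable, (phi(Q) phi(K)^T) V can be regrouped as phi(Q) (phi(K)^T V), replacing the N x N attention matrix with a d x d summary and yielding the O(N) complexity the abstract claims.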