Identifying Membrane Protein Types Based on Lifelong Learning With Dynamically Scalable Networks

Membrane proteins are essential to the body's ability to maintain normal life activities. They are involved in nearly every area of life science research, and studying them further will help advance research on cells and the development of drugs. Current methods for predicting membrane protein types are usually based on machine learning, but their effectiveness and accuracy still need to be improved. In this paper, we propose a dynamic deep network architecture based on lifelong learning to classify membrane proteins more effectively by computer. The model extends the application area of lifelong learning and provides new ideas for multi-class classification problems in bioinformatics. To demonstrate its performance, we conducted experiments on two datasets and compared the results with those of other classification methods. Our model achieves high accuracy (95.3 and 93.5%) on the benchmark datasets and is more effective than the other methods.


INTRODUCTION
The daily activities of a biological cell are associated with membranes, without which living structures could not form. Lipids and proteins are the main components of membranes. Current biological research recognizes eight types of membrane proteins: (1) single-span 1; (2) single-span 2; (3) single-span 3; (4) single-span 4; (5) multi-span; (6) lipid-anchor; (7) GPI-anchor; and (8) peripheral (Cedano et al., 1997).
Bioinformatics is present in all areas of the biological sciences, and classifying proteins efficiently and accurately by computer has long been an active research problem in bioinformatics and computer science. Although traditional physicochemical and biological experiments achieve good predictive accuracy, they are cumbersome and require substantial human and material resources. To save time and cost, and to better understand the structure and function of membrane proteins, a number of computational methods have been developed to discriminate efficiently between protein types (Feng and Zhang, 2000; Cai et al., 2004; Zou et al., 2013; Wei et al., 2017; Zhou et al., 2021; Zou et al., 2020; Qian et al., 2021; Zou et al., 2021). Most existing methods are improvements on Chou's algorithm (Chou and Elrod, 1999). Song et al. (Lu et al., 2020) used Chou's 5-step method to extract evolutionary information and fed it into a support vector machine for protein prediction. Cao and Lu avoided the loss of information caused by truncation by introducing a flag vector, and used a variable-length dynamic bidirectional gated recurrent unit model to predict proteins. Yang (Wu et al., 2019) designed a reward function to model the protein input under full-state reinforcement learning. Wu, Huang et al. (Wu et al., 2019) built their model with random forests and used binary reordering to make their predictions more efficient. To avoid overfitting, Lu, Tang et al. (Lu et al., 2019) used an energy filter so that the model adapts to the sequence length.
Most machine learning methods predict different kinds of proteins with a fixed model, which generally causes two problems: first, such models do not support incremental learning; second, they do not consider connections between tasks. Lifelong learning approaches aim to address both issues. Lifelong machine learning was first proposed by Thrun and Mitchell (Thrun and Mitchell, 1995), who treated each task as a binary classification problem. A number of memory-based and neural-network-based approaches to lifelong learning were subsequently proposed and refined by Silver et al. (Silver, 1996; Silver and Mercer, 2002; Silver et al., 2015). The Efficient Lifelong Learning Algorithm (ELLA) greatly extended the multi-task learning (MTL) algorithm proposed by Kumar et al. (Kumar and Daume, 2012). Ruvolo and Eaton viewed lifelong learning as a real-time task selection process; Chen (Chen et al., 2015) proposed a lifelong learning algorithm based on naive Bayes classification; and Shu et al. (Shu et al., 2017) advanced lifelong learning by improving the conditional random field model. Mazumder (Chen and Liu, 2014) studied human-machine conversational systems and enabled chatbots to learn new knowledge while chatting with humans. Chen, Liu and Wang (Wang et al., 2016) proposed several lifelong topic modeling methods that mine topics from historical tasks and apply them to new topic discovery. Shu et al. proposed a relaxation labeling approach to lifelong unsupervised learning tasks. Chen and Liu (Chen and Liu, 2019) give a more detailed account of lifelong machine learning in their book.
After extensive research and careful selection, we adopted a DSN model based on the sequence information of membrane proteins and on lifelong learning. First, we processed the membrane protein sequence dataset with BLAST (Altschul et al., 1997) to obtain the position-specific scoring matrix (PSSM) (Dehzangi et al., 2017; Sharma et al., 2018; Yosvany et al., 2018; Chandra et al., 2019). Then, we extracted features from the PSSM with the averaging block method (AvBlock) (Shen et al., 2019), the discrete wavelet transform method (DWT) (Wang et al., 2017), the discrete cosine transform method (DCT) (Ahmed et al., 1974), the histogram of oriented gradients method (HOG) (Qian et al., 2021), and the Pse-PSSM method. The features extracted by these five methods are concatenated end to end and fed into our model for prediction. Finally, performance is evaluated with random validation tests and independent tests. The results show that our model performs well in predicting membrane protein types. Figure 1 shows a sketch of the main research in this paper.

MATERIALS AND METHODS
In this experiment, the procedure for discriminating membrane protein types can be broadly divided into three steps: building the model, training and testing it, and making predictions and analyzing the results. First, features are extracted from the processed dataset. Next, the features are fed into a lifelong learning model for prediction. Finally, the sequence information of the membrane proteins, transformed into this numerical feature representation, is analyzed and predicted by the model. Figure 2 shows the research framework of this approach.

Analysis of Data Sets
To test the performance of our lifelong learning model, we selected two membrane protein datasets, Data 1 and Data 2, whose details are shown in Table 1. Both datasets cover the eight membrane protein types. Data 1 is from the work of Chou et al. (Chou and Shen, 2007). Its training and test sets were randomly drawn from Swiss-Prot (Boeckmann et al., 2003) by percentage assignment, which kept the composition of the two sets consistent. Data 2 is from the work of Chen et al. (Chen and Li, 2013), who used CD-HIT (Li and Godzik, 2006) to remove redundant sequences from Data 1 so that no two sequences share more than 40% sequence identity.
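The redundancy-reduction step can be illustrated with a greedy filter in the spirit of CD-HIT's incremental clustering: sequences are visited from longest to shortest, and a sequence is kept only if its identity to every sequence kept so far is at most the threshold (0.4, matching the 40% cutoff). This is a minimal sketch, not CD-HIT itself: the helper names are hypothetical, and identity is crudely approximated with Python's `difflib.SequenceMatcher` matching blocks instead of a real sequence alignment.

```python
from difflib import SequenceMatcher


def crude_identity(a: str, b: str) -> float:
    """Fraction of the shorter sequence covered by matching blocks.

    A rough stand-in for alignment-based identity; NOT what CD-HIT computes.
    """
    if not a or not b:
        return 0.0
    matched = sum(block.size for block in SequenceMatcher(None, a, b).get_matching_blocks())
    return matched / min(len(a), len(b))


def greedy_redundancy_filter(seqs: list[str], max_identity: float = 0.4) -> list[str]:
    """Keep a sequence only if it is at most `max_identity` identical to every kept one.

    Longest sequences are considered first, mirroring CD-HIT's greedy ordering.
    """
    representatives: list[str] = []
    for seq in sorted(seqs, key=len, reverse=True):
        if all(crude_identity(seq, rep) <= max_identity for rep in representatives):
            representatives.append(seq)
    return representatives
```

For example, `greedy_redundancy_filter(["AAAAAAAAAA", "AAAAAAAAAB", "CDEFGHIKLM"])` drops the second sequence, which is 90% identical to the first under this proxy.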

Extracting Evolutionary Conservation Information
The PSSM used in this experiment is the position-specific scoring matrix, which stores the evolutionary information of a membrane protein sequence. We use the PSSM (Ding et al., 2017; Shen et al., 2019) for membrane protein prediction because it reflects the evolutionary information of membrane proteins very well. For a membrane protein sequence Q, the PSSM is derived by PSI-BLAST (Altschul et al., 1997) over several iterations: PSI-BLAST builds a PSSM from the results of the first search, uses it to run a second search, builds an updated PSSM from the second search, and repeats this process until the best result is obtained. Because prediction performance was best after three iterations, we set the number of iterations to three, with an E-value threshold of 0.001. Let the sequence be $Q = q_1 q_2 q_3 \cdots q_L$, of length L. The PSSM containing its evolutionary information is stored in an L × 20 matrix:

$$P_{\text{original}} = \begin{bmatrix} p_{1,1} & p_{1,2} & \cdots & p_{1,20} \\ p_{2,1} & p_{2,2} & \cdots & p_{2,20} \\ \vdots & \vdots & \ddots & \vdots \\ p_{L,1} & p_{L,2} & \cdots & p_{L,20} \end{bmatrix}$$

Each element of this matrix is computed as

$$P_{\text{original}}(i, j) = \sum_{k=1}^{20} \omega(i, k) \times D(k, j), \quad i = 1, \dots, L;\; j = 1, \dots, 20$$

where ω(i, k) is the frequency of the k-th amino acid type at the i-th position, and D(k, j) is the substitution-matrix value for a mutation from the k-th amino acid to the j-th amino acid. A larger value indicates a more conserved position; a smaller value indicates a less conserved one.

Frontiers in Genetics | www.frontiersin.org March 2022 | Volume 12 | Article 834488
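The PSSM equation is a matrix product of the L × 20 frequency matrix ω and the 20 × 20 substitution matrix D. The NumPy sketch below makes that explicit; in practice PSI-BLAST reports the PSSM directly, and the function name is an illustrative assumption.

```python
import numpy as np


def pssm_from_profile(omega: np.ndarray, D: np.ndarray) -> np.ndarray:
    """P(i, j) = sum_k omega(i, k) * D(k, j), i.e. the matrix product omega @ D.

    omega: (L, 20) per-position amino acid frequencies.
    D:     (20, 20) substitution-matrix values from amino acid k to amino acid j.
    """
    if omega.ndim != 2 or omega.shape[1] != 20 or D.shape != (20, 20):
        raise ValueError("expected omega of shape (L, 20) and D of shape (20, 20)")
    return omega @ D
```

With D set to the identity matrix, the result is just ω, which is a quick sanity check of the indexing.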

Pse-PSSM
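Pse-PSSM (pseudo-PSSM) is a feature extraction method often used in membrane protein prediction (Chou and Shen, 2007). Because PSSMs are widely used to characterize membrane proteins, this method aims to preserve the biological information in the PSSM by borrowing the concept of pseudo amino acid composition. Each raw score is first normalized:

$$f_{i,j} = \frac{p_{i,j} - \frac{1}{20}\sum_{k=1}^{20} p_{i,k}}{\sqrt{\frac{1}{20}\sum_{u=1}^{20}\left(p_{i,u} - \frac{1}{20}\sum_{k=1}^{20} p_{i,k}\right)^{2}}}, \quad i = 1, \dots, L;\; j = 1, \dots, 20$$

where $f_{i,j}$ is the normalized PSSM score, which has a mean of 0 over the 20 amino acids, and $p_{i,j}$ is the raw score. A positive score indicates that the corresponding substitution occurs in the multiple alignment more often than expected by chance; a negative score indicates that it occurs less often than expected by chance. The Pse-PSSM features then combine the column means of the normalized matrix with sequence-order correlation factors:

$$\bar{f}_j = \frac{1}{L}\sum_{i=1}^{L} f_{i,j}, \qquad \theta_j^{\xi} = \frac{1}{L-\xi}\sum_{i=1}^{L-\xi}\left(f_{i,j} - f_{i+\xi,j}\right)^{2}, \quad j = 1, \dots, 20;\; \xi < L$$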
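The two Pse-PSSM steps can be sketched in NumPy as follows. The text does not state the largest lag ξ used, so `max_lag` is a hypothetical parameter; with `max_lag = ξ` the vector has 20 + 20ξ entries.

```python
import numpy as np


def normalize_pssm(P: np.ndarray) -> np.ndarray:
    """Row-wise standardization: each residue's 20 scores get mean 0 and unit variance."""
    P = P.astype(float)
    mean = P.mean(axis=1, keepdims=True)
    std = P.std(axis=1, keepdims=True)       # population std, i.e. the 1/20 factor
    std[std == 0] = 1.0                      # guard rows whose 20 scores are identical
    return (P - mean) / std


def pse_pssm(P: np.ndarray, max_lag: int = 1) -> np.ndarray:
    """20 column means of the normalized PSSM plus 20 correlation factors per lag."""
    f = normalize_pssm(P)
    L = f.shape[0]
    if L <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    features = [f.mean(axis=0)]              # \bar{f}_j
    for lag in range(1, max_lag + 1):
        diff = f[:-lag] - f[lag:]            # f_{i,j} - f_{i+lag,j} for i = 1..L-lag
        features.append((diff ** 2).mean(axis=0))  # theta_j^lag
    return np.concatenate(features)
```

For a 30-residue protein, `pse_pssm(P, max_lag=1)` returns 40 values and `max_lag=3` returns 80.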

Average Blocks
The AB method was first proposed by Huang et al.; its full name is the averaging block method (AvBlock) (Jeong et al., 2011). When features are extracted from a PSSM, the extracted values vary because protein lengths differ and the abundance of amino acids varies across individual membrane proteins. To address this problem, we average local features of the PSSM. Each averaged block covers 5% of the membrane protein sequence, so the AB method does not need to take the sequence length into account. The PSSM is split by rows into 20 blocks of L/20 rows each, and averaging each block column by column yields 20 features per block, i.e., 20 × 20 = 400 features in total. The AB formulation is:

$$AB(k) = \frac{20}{N} \sum_{p=1}^{N/20} M_t\left(p + (i-1) \times \frac{N}{20},\, j\right), \quad i = 1, \dots, 20;\; j = 1, \dots, 20;\; k = j + 20 \times (i-1)$$

where N (= L) is the sequence length, N/20 is the number of rows in each block, and $M_t\left(p + (i-1) \times \frac{N}{20}, j\right)$ is the PSSM entry in column j of the p-th row of the i-th block.
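The AvBlock formula amounts to averaging consecutive row blocks of the PSSM column by column. This NumPy sketch assumes that when L is not a multiple of 20 the blocks may differ in length by one row (via `np.array_split`); the text does not say how such lengths are handled, and the function name is hypothetical.

```python
import numpy as np


def average_blocks(P: np.ndarray, n_blocks: int = 20) -> np.ndarray:
    """AvBlock sketch: split the L x 20 PSSM into row blocks and average each block."""
    if P.shape[0] < n_blocks:
        raise ValueError("sequence shorter than the number of blocks")
    blocks = np.array_split(P.astype(float), n_blocks, axis=0)
    # Row i of the stacked result is the column-wise mean of block i; flattening
    # row by row gives feature index k = j + 20 * (i - 1) in 1-based notation.
    return np.vstack([block.mean(axis=0) for block in blocks]).reshape(-1)
```

For a 40-residue protein each block has two rows, and the output always has 400 entries.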

Discrete Wavelet Transform
The discrete wavelet transform (DWT) feature extraction method uses both frequency and position information (Nanni et al., 2012). Because the PSSM of a membrane protein sequence can be regarded as an image, coefficient information can be extracted from this matrix by DWT. The method was first suggested by Nanni et al.
The wavelet transform (WT) is the projection of a signal f(t) onto a wavelet function:

$$T(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} f(t)\, \psi\left(\frac{t-b}{a}\right) dt$$

where a is the scale variable, b is the translation variable, ψ((t − b)/a) is the analyzing wavelet function, and T(a, b) is the transform coefficient of the signal at a given scale and position. Nanni et al. further proposed an efficient DWT algorithm in which the discrete signal f(t) is denoted x[n] and the DWT coefficients are

$$y_{j,\text{low}}[n] = \sum_{k=1}^{N} x[k]\, g[2n-k], \qquad y_{j,\text{high}}[n] = \sum_{k=1}^{N} x[k]\, h[2n-k]$$

where N is the length of the discrete signal, and g and h are the low-pass and high-pass filters, respectively. $y_{j,\text{low}}[n]$ are the approximation coefficients (the low-frequency part of the signal), and $y_{j,\text{high}}[n]$ are the detail coefficients (the high-frequency part). In our study, the PSSM is treated as 20 discrete signals, one per column, and each is decomposed by a four-level DWT; the mean, standard deviation, maximum, and minimum of the coefficients are then computed at each level. The four-level DWT structure is shown in Figure 3.
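A four-level DWT feature extractor can be sketched as below. The wavelet family is not stated in the text, so this sketch assumes the Haar filters (g = [1/√2, 1/√2], h = [1/√2, −1/√2]) and records the four statistics for both the approximation and the detail coefficients at every level; the function names and that exact statistic set are assumptions.

```python
import numpy as np


def haar_step(x: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """One Haar DWT level: low-pass (approximation) and high-pass (detail) halves."""
    if len(x) % 2:                       # pad odd-length signals by repeating the last value
        x = np.append(x, x[-1])
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)
    return approx, detail


def dwt_features(P: np.ndarray, levels: int = 4) -> np.ndarray:
    """Mean, std, max and min of approximation and detail coefficients at each
    level, for each of the 20 PSSM columns (20 * levels * 2 * 4 values)."""
    feats = []
    for column in P.astype(float).T:     # each column is one discrete signal
        signal = column
        for _ in range(levels):
            signal, detail = haar_step(signal)   # recurse on the approximation
            for coeffs in (signal, detail):
                feats.extend([coeffs.mean(), coeffs.std(), coeffs.max(), coeffs.min()])
    return np.array(feats)
```

With four levels this yields 640 values per protein; a different wavelet would only change `haar_step`.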

Discrete Cosine Transform
The discrete cosine transform (DCT) (Ahmed et al., 1974) converts a signal into its frequency components through a separable linear transform, and it has been widely used in image compression. In this experiment, we compressed the PSSM of each membrane protein with a two-dimensional DCT (2D-DCT), defined for an M × N matrix (here M = L and N = 20) as

$$F(u, v) = \alpha_u \alpha_v \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} \text{PSSM}(m, n) \cos\frac{\pi(2m+1)u}{2M} \cos\frac{\pi(2n+1)v}{2N}$$

$$\alpha_u = \begin{cases} \sqrt{1/M}, & u = 0 \\ \sqrt{2/M}, & 1 \le u \le M-1 \end{cases} \qquad \alpha_v = \begin{cases} \sqrt{1/N}, & v = 0 \\ \sqrt{2/N}, & 1 \le v \le N-1 \end{cases}$$

The DCT converts an evenly distributed information density into an uneven one: after the transform, most of the important information in the PSSM is concentrated in the low-frequency coefficients in the top-left corner of the output.
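Because the 2D-DCT is separable, it can be computed as $C_M \,\text{PSSM}\, C_N^{\top}$ with orthonormal DCT-II basis matrices, which the NumPy sketch below builds explicitly. The number of low-frequency rows retained (`keep_rows`) and the zero-padding of short proteins are hypothetical choices; the text does not say how many coefficients were kept.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis: C[u, m] = alpha_u * cos(pi * (2m + 1) * u / (2n))."""
    u = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    C = np.cos(np.pi * (2 * m + 1) * u / (2 * n))
    C[0, :] *= np.sqrt(1.0 / n)
    C[1:, :] *= np.sqrt(2.0 / n)
    return C


def dct2(P: np.ndarray) -> np.ndarray:
    """2D-DCT of an M x N matrix, applied separably along rows and columns."""
    M, N = P.shape
    return dct_matrix(M) @ P.astype(float) @ dct_matrix(N).T


def dct_features(P: np.ndarray, keep_rows: int = 20) -> np.ndarray:
    """Keep the low-frequency top-left block (keep_rows x N), zero-padded if M < keep_rows."""
    F = dct2(P)
    out = np.zeros((keep_rows, F.shape[1]))
    rows = min(keep_rows, F.shape[0])
    out[:rows] = F[:rows]
    return out.reshape(-1)
```

A constant matrix transforms into a single nonzero coefficient at F[0, 0], which illustrates how the DCT packs smooth content into the top-left corner.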

Histogram of Oriented Gradient
The histogram of oriented gradients (HOG) is a feature descriptor used mainly in computer vision. To process a PSSM with HOG, we first treat it as an image matrix. In the first step, the horizontal and vertical gradients of the PSSM are used to derive the gradient direction and magnitude matrices. In the second step, these matrices are divided into 25 sub-matrices. In the third step, each sub-matrix is converted into a histogram with 10 orientation channels, giving 25 × 10 = 250 features.
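The three HOG steps can be sketched as follows. Assumptions not stated in the text: gradients come from `np.gradient`, orientations are unsigned angles in [0°, 180°), the 25 sub-matrices form a 5 × 5 grid, and each histogram bin accumulates gradient magnitude. The function name is hypothetical.

```python
import numpy as np


def hog_features(P: np.ndarray, grid: int = 5, n_bins: int = 10) -> np.ndarray:
    """HOG sketch for a PSSM treated as a grayscale image (needs L >= 2)."""
    img = P.astype(float)
    gy, gx = np.gradient(img)                        # step 1: vertical and horizontal gradients
    magnitude = np.hypot(gx, gy)
    orientation = np.rad2deg(np.arctan2(gy, gx)) % 180.0   # unsigned angles in [0, 180)
    feats = []
    # Step 2: split both matrices into a grid x grid set of sub-matrices.
    for mag_rows, ori_rows in zip(np.array_split(magnitude, grid, axis=0),
                                  np.array_split(orientation, grid, axis=0)):
        for mag, ori in zip(np.array_split(mag_rows, grid, axis=1),
                            np.array_split(ori_rows, grid, axis=1)):
            # Step 3: magnitude-weighted orientation histogram with n_bins channels.
            hist, _ = np.histogram(ori, bins=n_bins, range=(0.0, 180.0), weights=mag)
            feats.extend(hist)
    return np.array(feats)
```

A PSSM whose scores increase steadily along each row has purely horizontal gradients, so every sub-matrix puts all its weight in the first orientation bin.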

Lifelong Learning
Lifelong learning (Thrun and Mitchell, 1995), like machine learning in general, can be divided into lifelong supervised learning, lifelong unsupervised learning, and lifelong reinforcement learning. This part of our study focuses on whether a model can be extended when new categories are added. Suppose the current model classifies n categories; when data from a new class arrive, the model should adaptively expand to classify n + 1 categories. If all the original data are available, multi-task learning models and lifelong machine learning can easily be converted into each other: whenever a new category needs to be classified, one can add it and retrain on all the training data. The obvious disadvantage of this strategy is that it wastes a great deal of computation on each new class, and if too many new classes are added, the multi-task learning model architecture may need to change. Our model therefore uses a dynamically scalable network to perform incremental learning of added tasks more effectively. Assume that the current model already classifies Class 1, Class 2, ..., Class n. When a new data class, Class-new, is added, the model does not need to retrain on all the data from scratch; it only needs to expand by adding n new binary classification models. The simple flow of the lifelong learning model is shown in Figure 4.
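The expansion rule (n new binary models when class n + 1 arrives) can be illustrated with a hypothetical one-vs-one classifier. To keep the sketch short, each binary model is a nearest-centroid rule, so only one centroid per earlier class is stored; this is an illustrative stand-in, not the binary models of our DSN.

```python
import numpy as np


class IncrementalPairwiseClassifier:
    """Hypothetical one-vs-one classifier that grows as classes arrive.

    Each binary model is a nearest-centroid rule for one pair of classes, so
    only one centroid per earlier class is kept, not its raw training data.
    """

    def __init__(self) -> None:
        self.centroids: dict[str, np.ndarray] = {}
        self.binary_models: dict[tuple[str, str], tuple[np.ndarray, np.ndarray]] = {}

    def add_class(self, label: str, X: np.ndarray) -> int:
        """Add class n + 1: build n new binary models; existing models are untouched."""
        centroid = X.mean(axis=0)
        for old_label, old_centroid in self.centroids.items():
            self.binary_models[(old_label, label)] = (old_centroid, centroid)
        self.centroids[label] = centroid
        return len(self.binary_models)

    def predict(self, x: np.ndarray) -> str:
        """Majority vote over all pairwise models."""
        votes = {label: 0 for label in self.centroids}
        for (a, b), (ca, cb) in self.binary_models.items():
            winner = a if np.linalg.norm(x - ca) <= np.linalg.norm(x - cb) else b
            votes[winner] += 1
        return max(votes, key=votes.get)
```

Adding three classes one at a time leaves 0, then 1, then 3 binary models in total, and no earlier model is retrained.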

Dynamically Scalable Networks
Dynamically scalable networks perform incremental training of deep neural networks for lifelong learning, in which an unknown number of tasks with unknown data distributions are fed to the model in sequence. Formally, consider a sequence of tasks t = 1, ..., t, ..., T, where T is unbounded and the task at time point t comes with training data $\mathcal{D}_t = \{(x_i, y_i)\}_{i=1}^{N_t}$. Each task may be a single task or a group of tasks. For simplicity, and although our approach applies to any kind of task, we consider only binary classification, that is, input features $x \in \mathbb{R}^d$ with labels $y \in \{0, 1\}$. One challenge of lifelong learning is that at the current time t, none of the previous training datasets are available; only the previous model parameters are. At each time t, the lifelong learning agent learns the model parameters $W^t$ by solving

$$\underset{W^t}{\text{minimize}}\; \mathcal{L}\left(W^t; W^{t-1}, \mathcal{D}_t\right) + \lambda\, \Omega\left(W^t\right), \quad t = 1, \dots$$

where $\mathcal{L}$ is the task-specific loss function, $W^t$ is the parameter set for task t, $\Omega(W^t)$ is a regularization term (e.g., the element-wise ℓ2 norm) that constrains $W^t$ appropriately, and λ controls its strength. For a neural network, which is our primary interest, $W^t = \{W_l\}_{l=1}^{L}$ is the set of weight tensors of its L layers.
To counter these problems, we reuse the knowledge acquired in previous tasks as much as possible, while allowing the network to expand its capacity dynamically when the accumulated knowledge cannot adequately explain a new task. Figure 5 and Algorithm 1 illustrate our progressive learning process.
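The idea of reusing earlier knowledge and expanding only when it falls short can be illustrated with a heavily simplified NumPy sketch. It is not Algorithm 1: the class, its hyperparameters (`tau`, `k`, `lam`, `lr`, `steps`), and the single shared tanh layer are assumptions, and mechanisms such as selective retraining or network splitting are omitted. Existing hidden units are frozen, a new logistic head is fitted for task t by minimizing a task loss plus an ℓ2 regularizer, and if that regularized loss stays above `tau`, `k` new units are added and trained for task t only.

```python
import numpy as np


def _sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


class ExpandableNet:
    """Hypothetical sketch: shared tanh hidden layer, one logistic head per task."""

    def __init__(self, d: int, hidden: int = 8, seed: int = 0) -> None:
        self.rng = np.random.default_rng(seed)
        self.W = self.rng.normal(0.0, 0.5, (d, hidden))    # shared hidden units
        self.heads: dict[int, tuple[np.ndarray, int]] = {}  # task -> (head weights, units used)

    @staticmethod
    def _loss(H, w, y, lam):
        """Binary cross-entropy plus an l2 penalty on the head weights."""
        p = _sigmoid(H @ w)
        eps = 1e-12
        bce = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
        return bce + lam * np.sum(w ** 2)

    def learn_task(self, t, X, y, tau=0.3, k=4, lam=1e-3, lr=0.5, steps=300):
        H_old = np.tanh(X @ self.W)             # existing units are frozen (never updated)
        w = np.zeros(H_old.shape[1])
        for _ in range(steps):                  # 1) fit a new head on the frozen features
            p = _sigmoid(H_old @ w)
            w -= lr * (H_old.T @ (p - y) / len(y) + 2 * lam * w)
        if self._loss(H_old, w, y, lam) > tau:  # 2) old knowledge is not enough: expand
            W_new = self.rng.normal(0.0, 0.5, (X.shape[1], k))
            w = np.concatenate([w, np.zeros(k)])
            for _ in range(steps):              # train only the new units and the head
                H_new = np.tanh(X @ W_new)
                H = np.hstack([H_old, H_new])
                g = (_sigmoid(H @ w) - y) / len(y)              # d(mean BCE)/d(logit)
                dA = np.outer(g, w[-k:]) * (1.0 - H_new ** 2)   # back through tanh
                grad_W_new = X.T @ dA
                w -= lr * (H.T @ g + 2 * lam * w)
                W_new -= lr * grad_W_new
            self.W = np.hstack([self.W, W_new])  # append new units after the old ones
        self.heads[t] = (w, w.size)
        return w.size

    def predict_proba(self, t, X):
        w, n_units = self.heads[t]
        return _sigmoid(np.tanh(X @ self.W[:, :n_units]) @ w)
```

Because old units are never modified and new units are appended at the end, predictions for earlier tasks stay exactly the same after a later task triggers an expansion.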

EXPERIMENT RESULTS
In this section, we analyze the performance of our model and compare it with other available methods on two separate datasets.

Assessment Measurements
To evaluate our lifelong learning model, we used several metrics: overall prediction accuracy (ACC), specificity (SP), sensitivity (SN), and the Matthews correlation coefficient (MCC). These metrics are widely used in the analysis of biological sequences:

$$SN = \frac{TP}{TP + FN}, \quad SP = \frac{TN}{TN + FP}, \quad ACC = \frac{TP + TN}{TP + TN + FP + FN}, \quad MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$

where true positives (TP) is the number of positive samples correctly predicted; false positives (FP) is the number of negative samples incorrectly predicted as positive; true negatives (TN) is the number of negative samples correctly predicted; and false negatives (FN) is the number of positive samples incorrectly predicted as negative (Chou et al., 2011a; Chou et al., 2011b; Chou, 2013).
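The four metrics can be computed directly from the confusion counts. For the eight-class problem, a common convention (assumed here, not stated in the text) is to compute them per membrane protein type in a one-vs-rest fashion; the helper names below are hypothetical.

```python
import math


def one_vs_rest_counts(y_true, y_pred, positive):
    """(TP, FP, TN, FN) when `positive` is treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, fp, tn, fn


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """SN, SP, ACC and MCC; ratios with a zero denominator are reported as 0.0."""
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tn / (tn + fp) if tn + fp else 0.0
    acc = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SN": sn, "SP": sp, "ACC": acc, "MCC": mcc}
```

For instance, `binary_metrics(8, 1, 9, 2)` gives SN = 0.8, SP = 0.9, ACC = 0.85, and MCC = 70/√9900 ≈ 0.704.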

Situational Analysis of Two Data Sets
The sequence-length distributions of the datasets used in our experiments are shown in Figure 6. Most membrane proteins in Dataset 1 and Dataset 2 have similar length distributions owing to their specific types. To better demonstrate the advantages of lifelong learning for membrane protein classification, we calculated the amino acid frequencies for all protein types in the experiment, as shown in Figure 7. The PSSM contains important evolutionary information required for protein prediction. Many factors in biological evolution, such as the stability of the three-dimensional structure and protein aggregation, affect how sequences are conserved and altered. These factors suggest that the PSSM captures important information, such as information on ligand binding, which supports the validity of PSSM-based features.
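Amino acid frequencies like those summarized in Figure 7 can be computed per protein type by pooling that type's sequences. A minimal sketch, assuming the 20 standard one-letter codes and ignoring ambiguous letters such as X; the function name is hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_frequencies(sequences: list[str]) -> dict[str, float]:
    """Relative frequency of each standard amino acid over a set of sequences.

    Non-standard letters (e.g. X, B, Z) are ignored.
    """
    counts = Counter(ch for seq in sequences for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    return {aa: (counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS}
```

Calling it once per membrane protein type gives one 20-value profile per type, which can then be plotted side by side.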
We compare our model with other existing methods in terms of prediction accuracy on Dataset 1. The methods in the comparison are MemType-2L (Chou and Shen, 2007) and predMPT (Chen and Li, 2013). Details are given in Table 2: our model achieves an overall ACC of 95.3%, which is 3.7 percentage points higher than MemType-2L (91.6%), 2.7 percentage points higher than predMPT (92.6%), and 2.6 percentage points higher than Average weights (92.7%). On the independent test set, our method was superior for membrane protein type 2 (88.4%), type 4 (83.3%), type 5 (96.3%), and type 7 (100%).

The Forecasting Results for Dataset 2
Because the annotations in Data 1 may be out of date, Chen and Li (Chen and Li, 2013) updated Data Set 1 using Swiss-Prot annotations, producing a new membrane protein dataset (Data Set 2). The comparison results on Dataset 2 are presented in Table 3. The overall accuracy of our model (93.5%) was 3.2 percentage points higher than that of predMPT (90.3%). Although predMPT incorporates additional features such as 3D structure information, our model performed clearly better. Our model outperformed predMPT by 2.6 percentage points for type 1 (94.1 vs. 91.5%) and by 1.3 percentage points for type 5 (94.1 vs. 92.8%).

CONCLUSION AND DISCUSSION
In previous work, investigators have often used the PseAAC (Chou, 2001) approach to identify membrane protein types, and this approach has indeed performed well in protein classification. Inspired by the PSSM-based feature extraction of Chou and Shen (Chou and Shen, 2007), we used five methods, namely Pse-PSSM, DCT, AvBlock, HOG, and DWT, to extract features. To overcome the limited accuracy of any single feature extraction method, we integrated the features from all five methods and fed the combined features into our DSN model.
Our lifelong learning dynamic network model achieved superior results on both datasets (95.3 and 93.5%). However, its predictions for some types with few samples were not as accurate as we had anticipated. To improve the model, in future research we will consider refining our features, combining them with other feature extraction methods, and tuning the model's parameters.

DATA AVAILABILITY STATEMENT
Publicly available datasets were analyzed in this study. This data can be found here: https://github.com/sjw-cmd/DATA.git.

AUTHOR CONTRIBUTIONS
WL proposed the original idea. JS and HW designed the framework and the experiments. WL, JS and YZ performed the experiments and the primary data analysis. WL and JS wrote the manuscript. YQ, QF and XC revised the code and the manuscript. All authors contributed to the manuscript.