ORIGINAL RESEARCH article

Front. Chem., 13 April 2022

Sec. Analytical Chemistry

Volume 10 - 2022 | https://doi.org/10.3389/fchem.2022.887960

Extreme Point Sort Transformation Combined With a Long Short-Term Memory Network Algorithm for the Raman-Based Identification of Therapeutic Monoclonal Antibodies

  • JL

    Jin Ling 1

  • LZ

    Luxia Zheng 1

  • MX

    Mingming Xu 1

  • GC

    Gang Chen 1

  • XW

    Xiao Wang 2

  • DM

    Danzhuo Mao 2

  • HS

    Hong Shao 1*

  • 1. NMPA Key Laboratory for Quality Control of Therapeutic Monoclonal Antibodies, Shanghai Institute for Food and Drug Control, Shanghai, China

  • 2. NMPA Key Laboratory for Quality Analysis of Chemical Drug Preparations, Shanghai Institute for Food and Drug Control, Shanghai, China

Abstract

Therapeutic monoclonal antibodies (mAbs) are a new generation of protein-based medicines that are usually expensive and thus represent a target for counterfeiters. In the present study, a method based on Raman spectroscopy that combined extreme point sort transformation with a long short-term memory (LSTM) network algorithm was presented for the identification of therapeutic mAbs. A total of 15 therapeutic mAbs were used in this study. An in-house Raman spectrum dataset for model training was created with 1,350 spectra. The characteristic region of the Raman spectrum was reduced in dimension and then transformed through an extreme point sort transformation into a sequence array, which was fitted for the LSTM network. The characteristic array was extracted from the sequence array using a well-trained LSTM network and then compared with standard spectra for identification. To demonstrate whether the present algorithm was better, ThermoFisher OMNIC 8.3 software (Thermo Fisher Scientific Inc., U.S.) with two matching modes was selected for comparison. Finally, the present method was successfully applied to identify 30 samples, including 15 therapeutic mAbs and 15 other injections. The characteristic region was selected from 100 to 1800 cm−1 of the full spectrum. The optimized dimensional values were set from 35 to 53, and the threshold value range was from 0.97 to 0.99 for 15 therapeutic mAbs. The results of the robustness test indicated that the present method had good robustness against spectral peak drift, random noise and fluorescence interference from the measurement. The areas under the curve (AUC) values of the present method that were analysed on the full spectrum and analysed on the characteristic region by the OMNIC 8.3 software’s built-in method were 1.000, 0.678, and 0.613, respectively. The similarity scores for 15 therapeutic mAbs using OMNIC 8.3 software in all groups compared with that of the relative present algorithm group had extremely remarkable differences (p < 0.001). The results suggested that the extreme point sort transformation combined with the LSTM network algorithm enabled the characteristic extraction of the therapeutic mAb Raman spectrum. The present method is a proposed solution to rapidly identify therapeutic mAbs.

Introduction

Biologics, especially therapeutic monoclonal antibodies (mAbs), have been progressively utilized in clinics over the last decade and have been established as a major therapeutic modality to treat severe human diseases (Singh et al., 2018; Li et al., 2020). To date, approximately 80 antibody-based biopharmaceutical products have been approved for therapeutic use worldwide in 2019, and 42 therapeutic mAbs have been marketed in China (Castelli et al., 2019). Over 100 mAbs are in development, and new therapeutic mAbs are marketed every year (Ettah and Ashton, 2018; Castelli et al., 2019). Meanwhile, the number of counterfeits is increasing. Therapeutic mAb products are commonly colourless injections or white freeze-dried powder, which are easy to make counterfeit (Loethen et al., 2015). Therapeutic mAbs are considered an attractive target for counterfeiters for another important reason: they are very expensive (Degardin et al., 2017). Therefore, strict control of therapeutic mAb products is required to fight against the counterfeit of mAbs (Frosch et al., 2019). Velpandian et al. did a protein analysis using Bradford assay and SDS-PAGE to confirm the presence of bevacizumab in 16 samples including six suspected and 10 others. The results suggested that the counterfeit bevacizumab can be preliminarily identified by simple methods (Velpandian et al., 2016). These simple methods can only differentiate the therapeutic mAbs from the protein-free counterfeits. Legrand et al. reported a strategy based on the concomitant use of capillary zone electrophoresis analysis (CZE-UV), size exclusion chromatography coupled to multi-angle light scattering (SEC-MALS) and liquid chromatography hyphenated to tandem mass spectrometry (LC-MS/MS) which enables the structural identification of mAbs in addition to comprehensive characterization and quantification in samples in potentially counterfeit samples (Legrand et al., 2022). Although the concomitant use of several analysis methods can provide an accurate result, it is complicated and takes a long time.

Various rapid analysis methods based on spectrum and chromatography can allow quick identification and quality assurance in pharmacies, medical facilities, pharmaceutical warehouses, etc., (Igne et al., 2015; Hattori et al., 2018; Tobias et al., 2021). These analysis methods usually take a few minutes to give a result, which will enhance public safety. In particular, Raman spectroscopy is fast and easy to operate and provides molecular fingerprints by detecting unique vibrations from atomic bonds within a molecular structure (McAvan et al., 2020). A Raman spectrum usually contains functional groups and an overall chemical arrangement that could be used for molecular structure analysis (Lanzarotta et al., 2020). As a result, the Raman technique has been widely employed in the pharmaceutical industry and in the random inspection of small molecule drugs (Lawson and Rodriguez, 2016). Several applications have been reported for mAb analysis. A Raman spectroscopy-based nutrient control strategy was developed to enable dual control of lactate and glucose levels for a fed-batch CHO cell culture process for mAb production, which could benefit both cell metabolism and mAb product quality (T et al., 2021). Yilmaz and his colleagues demonstrated the possibility of subclass-independent quantitative mAb prediction by Raman spectroscopy in real time (Yilmaz et al., 2020). The applicability of Raman spectroscopy to detect posttranslational modifications (PTMs) and degradation seen in mAbs in the manufacturing process has been investigated (McAvan et al., 2020). The reports highlighted Raman spectroscopy as a technique to suit the application of mAb analysis, as it is information-rich, minimally invasive, insensitive to water background and requires little to no sample preparation (Jones et al., 2019).

A long short-term memory (LSTM) network is a novel recurrent neural network (RNN) architecture in conjunction with an appropriate gradient-based learning algorithm (Yu et al., 2021). Unlike RNN networks, the LSTM network has a hidden inner state that enables applying the previous sequence input to a later calculation and avoids gradient disappearance through a gating unit system. Efficient processing of sequences of data and effective propagation of information along the sequence are commonly observed in LSTM networks. Therefore, LSTM networks have been successfully applied to process time-series and position-series data, for example, stemming from speech or video and natural language sentences. Some studies are currently attempting to use LSTM networks to analyse Raman spectra. Yu et al. presented Raman spectroscopy combined with an LSTM network to predict eight strains isolated from the marine organism Urechis unicinctus and compared them with a method using a normal convolutional neural network (CNN) (Yu et al., 2021). The proposed LSTM methods achieved average isolation-level accuracies greater than 94%, which is higher than that of the normal CNN network. Wang and his colleagues presented a rapid method to screen hepatitis B patients using serum Raman spectroscopy combined with LSTM networks. In a previous report, principal component analysis (PCA) was selected to extract key features of spectral data for dimensional reduction of the multidimensional spectrum, followed by training of the LSTM network. Meanwhile, a fully connected layer (FCL) was used for the classification of HBV (Wang et al., 2020).

Spectral characteristic extraction is very important in the signal preprocessing step (Fukuhara et al., 2019; Galeev et al., 2019; Ling et al., 2020). Original spectral data usually contain many invalid data points, which will cause a serious performance loss and low accuracy rate. Without preprocessing characteristic extraction, model training will be difficult. FCL is usually used as a classifier at the terminal of the model. However, artificial neural networks are nonlinear fitting models, and adding an FCL classifier may cause overfitting in some cases, especially in small-sample-size training. In the present study, we introduce a novel Raman spectrum detection approach with LSTM networks for therapeutic mAb identification. First, fast Fourier transformation (FFT) and inverse fast Fourier transformation (IFFT) are performed on the spectral characteristic region to remove spectral noise and reduce dimensions. Extreme point sort transformation is a critical step to convert the spectrum into a sequence array that involves the peak absolute positions and their relative intensities. The sequence array can be considered a number-labelled sentence, which fits the LSTM network. LSTM networks are well trained and then employed to extract the characteristics of the sequence array deeply into a characteristic array. The extracted characteristic array will be compared with the characteristic arrays extracted from standard spectra using the same method to obtain a similarity score. The sample will be identified as a therapeutic mAb class where the similarity score is higher than the threshold value of the therapeutic mAb class.

Methods

Therapeutic Monoclonal Antibodies

A total of 15 therapeutic mAbs from seven manufacturers, including adalimumab, bevacizumab, ranibizumab, tocilizumab, evolocumab, secukinumab, rituximab, trastuzumab, pertuzumab, denosumab, ixekizumab, ustekinumab, guselkumab, emicizumab, and etanercept, were selected for the study. Eleven effect targets and four protein formats of mAbs were involved when choosing therapeutic mAbs. Information on the 15 therapeutic mAbs is shown in Supplementary Table S1. They were labelled from 0 to 14 for artificial intelligence identification.

Raman Analysis

As shown in Figure 1A, approximate 250 μL of therapeutic mAb injections were loaded into a 96-well quartzose plate for Raman scanning. Sufficient injection fluid had to be added to make the fluid level flush with the rim of the tiny cup. ThermoFisher Scientific DXR confocal Raman microscopy (Thermo Fisher Scientific Inc., U.S.) was employed for analysis, which was equipped with a 20× objective, a 532 nm laser and a 532 nm filter. A full spectral range of 57–3,417 cm−1 was set to acquire the Raman spectrum. Data acquisition was performed using ThermoFisher OMIC 8.3 software (Thermo Fisher Scientific Inc., U.S.) with a 10 mW laser power, 10 s exposure time, and 20 scans. For each therapeutic mAb, 18 samples in three batches were randomly selected to collect spectra. Raman spectra were acquired from five random points on each sample. Finally, a total of 1,350 Raman spectra were obtained to build a Raman spectrum database for 15 therapeutic mAbs.

FIGURE 1

Dataset Preparation

Spectra were exported as comma-separated values (CSV) files using ThermoFisher OMIC 8.3 software. The CSV file contained the Raman shift values and the numeric value of the intensity for every single point of the Raman spectrum. These numeric values of both Raman shift and intensity in CSV files were read by Python v3.8.5 and then inserted into a MySQL v5.7.20 (MySQL AB, Sweden) database table to create an in-house dataset with some basic information, such as mAb name, specifications, and date of analysis. Numeric labels of data from 0 to 14 were assigned to each mAb. Before data processing, all numeric value data with labels were exported from the database into a text file to obtain high loading performance.

Data Transformation

Here, we present an extreme point sort transform for the array of Raman spectra, which transforms array-type spectrum data into a list of sequence numbers. First, the characteristic part of the mAbs, ranging from 100–1,800 cm−1 in the Raman spectrum, was captured. Then, noise removal and dimensionality reduction were performed on the characteristic region using FFT and IFFT. The FFT formula is shown as follows:where is the consecutive integers from 0 to −1, is the sequence number of the time-domain signal array, and is the total number of array elements.

After FFT processing, the low-frequency range was picked to perform IFFT to obtain a noise removed and a low-dimensional spectrum. The IFFT formula is shown as follows:where is the consecutive integers from 0 to −1, is the sequence number of the frequency-domain signal array, and is the total number of the low-frequency range array elements.

The positions of extreme points of the spectrum were extracted using the Scipy library. The first extreme point was removed and then sorted by their intensities in descending order. The extreme point array was padded up to digitals with zero to match the input dimension of the model.

Data visualization after each processing step was performed using the Matplotlib library. The data labels were converted into one-hot labels using the Keras library. The dataset containing all numeric values and labels was split randomly into a training dataset and validation dataset with a split ratio of 0.8, which means that 80% of the data were used for model training and the other 20% were used for model validation. The test dataset was created using 20 additional Raman spectra of each therapeutic mAb. These spectra were never used before as a test set.

Long Short-Term Memory Modelling

As shown in Figure 2, the training process was marked within a light blue full line box, and the predicting process was marked within a green dashed line box. The well-trained models were employed for spectral characteristic extraction to obtain a characteristic array, which was compared to the characteristic arrays of standard spectra for final identification.

FIGURE 2

All training and predictions were carried out on a SuperMicro 7049GR GPU server equipped with Dual Intel(R) Xeon(R) Gold 6,149 CPUs, 128 Gb of DDR4 RAM and Quad Nvidia RTX 3090 24 Gb graphics cards. The operating system was the 64-bit CentOS Linux system v7.5. TensorFlow v2.0.0 was employed to train and validate the LSTM network models.

As shown in Supplementary Table S2, the LSTM network model contains one embedding layer, one bidirectional LSTM layer, one FCL, one dropout layer and one output layer to form the output prediction. The dimension of the LSTM layer is 64. The numbers of nodes in the FCL and output layer were 32 and 15, respectively. The activation function of the FCL and output layer were the rectified linear unit (ReLU) function and Softmax function, respectively. The softmax function defined in Eq. 3 was applied in the last layer to produce the prediction probability over the 15 output classes (Hsieh et al., 2020).where are the scores inferred by the net for each class in .

The categorical cross-entropy was selected as the loss function, which was defined in Eq. 4. The goal of the network is to minimize .where is the score for the positive class.

Adam is selected as the optimizer. The learning rate was set as 0.001, and the number of epochs was set as one to five.

Similarity Scoring

Characteristics of the Raman spectrum were extracted as a characteristic array using extreme point sort transformation and the LSTM network model. The characteristic array of the sample spectrum was compared with that of standard spectra to obtain the cosine similarity score using the Sklearn library. The threshold value of the cosine similarity score was set as the lowest score, which was calculated by comparing each standard spectrum to the remaining standard spectra. The sample spectrum was identified when the cosine similarity score was higher than the threshold value.

Production of Simulative Spectra

Spectral peak drift, random noise and fluorescence interference were considered to test the robustness of the method. Elements in the spectral intensity array were integrally shifted to the left or right with 1–20 digitals to simulate spectral peak drift of 1–20 cm−1. The blank positions produced while shifting were filled with the adjacent value. For simulated random noise on the spectrum, every single element in the spectral intensity array was multiplied by a random percentage. The range of the random percentage is 60–100% or 100–140%. For simulated fluorescence interference, a positive half-sine part wave overlapped in the 0–1800 cm−1 range of the original spectrum. The sine wave amplitude was set as 60–140% of the maximum peak intensity. The signal frequency and sampling frequency were both set as 1. The sine wavelength was set as a random integer in the range of 3,600–7,200. The reverse arrangement of the elements in 0–1800 digital positions was picked up as a positive half-sine part to create the simulative spectrum.

Evaluation of Prediction

The loss, precision, accuracy and recall were used to evaluate the model since they are commonly used in most cases for evaluations. The loss values were calculated using the categorical cross-entropy formula described above. The precision, accuracy and recall were calculated as follows:where is true positives, is false-positives, is true negatives, and is false negatives.

A confusion matrix was established to investigate the classification performance. Each column of the matrix stands for a predicted label, while each row represents a true label. The receiver operating characteristic (ROC) curve was drawn with the true positive rate and false-positive rate.

Tests of the prediction performance for the present method and ThermoFisher OMNIC 8.3 software built-in method (analysed on full spectrum and characteristic region, respectively) were performed using a total of 300 Raman spectra (20 Raman spectra for each therapeutic mAb) test set with labels. The ThermoFisher OMNIC 8.3 software built-in method was used to match the spectrum with the user’s standard spectrum library. The match parameter of the analysis region was set as the full spectrum or a range of 100–1800 cm−1 for full spectrum analysis or characteristic region analysis, respectively.

Sample Identification

A total of 30 samples, including 15 therapeutic mAb injections, three biochemical drug and biological product injections, and 12 chemical drug injections, were selected for the identification test using the present method. A sample is identified when the similarity score is higher than the threshold value of the standard spectrum. If the similarity score was lower than every threshold value, it was identified as “not matched”.

Results

Identification Approach

As shown in Figure 1, an optimized Raman identification approach for therapeutic mAbs was proposed. Approximately 250 μL of sample was loaded on a quartzose 96-well plate, followed by Raman analysis to obtain the spectrum. The spectral data were exported into a CSV file with 3,417 data points for data analysis.

Characteristic Region Acquirement

Fifteen typical spectra of therapeutic mAbs are shown in Figure 3. By comparing each spectrum manually, a range of 100–1800 cm−1 was selected as the characteristic region, which is information-rich.

FIGURE 3

Data Transformation

The data visualizations of the extreme point sort transformation of three therapeutic antibodies are shown in Figure 4, which provides insights into the operating mode of the novel data transformation and the conversion process of spectral features. As shown in Figure 4, the original Raman spectrum had a full scan range of 57–3,417 cm−1. After interception, a characteristic region with a range of 100–1800 cm−1 was drawn as a new subspectrum (Figure 4, Line 4). The target dimensional value for each therapeutic mAb is listed in Table 1, which was used for spectral dimensionality reduction. The filtered spectrum and the extreme points demonstrated that approximately 10 extreme points in each filtered spectrum could be extracted as a sequence array. The elements of the extreme point positions were sorted by their corresponding intensity. Finally, the sorted array was padded with zero to the digitals of the target dimensional value for input of the LSTM network (Figure 4, Line 7).

FIGURE 4

TABLE 1

LabelDimensional valueEpochLossAccuracyValidation lossValidation accuracyThreshold
04030.02970.99750.01040.99750.99
14750.02390.99290.01680.99480.99
24910.35970.88390.03640.98690.99
34250.03960.98750.01850.99150.99
44830.03160.99090.00630.99830.99
54610.44350.86050.08050.96940.98
64120.07260.97740.02860.99060.99
73510.43770.86440.05850.97960.99
84230.05560.98210.02910.98880.97
95330.04510.98530.002040.99080.99
103920.09020.97290.03790.98560.99
114730.03940.9880.00960.99730.99
124710.32160.90270.03710.98770.97
134010.32250.90410.02030.99030.99
143510.41840.86760.06580.97940.99

The primary parameters and training results of the optimized model for the identification of therapeutic monoclonal antibodies.

Model Training

A total of 15 models were trained for the extraction of spectral characteristics. The primary parameters and training results of the optimized model for the identification of therapeutic mAbs are listed in Table 1. The optimized target dimensional values were set from 35 to 53. The epoch for training had a range of one to five. The accuracy and validation accuracy of the 15 models ranged from 0.8644 to 0.9975 and 0.9796 and 0.9983, respectively. The optimized threshold values ranged from 0.97 to 0.99.

Robustness Test

To test the robustness of the algorithm, a total of 36,000 simulated spectra were created manually. The average accuracy results (n = 20) are shown in Figure 5A. Fourteen of fifteen therapeutic mAb simulated spectra with an offset of wavenumbers under ±5 cm−1 could be identified using the present method to achieve an accuracy over 95%. The simulated spectra of therapeutic mAbs labelled 0, 1, 2, and 6 were correctly identified, while the offsets of the wavenumber were under ±10 cm−1. As shown in Figure 5B, all simulated spectra with a random incremental range of 10% or −10% intensity could be identified and achieved an accuracy over 95%. The simulated spectra of therapeutic mAbs labelled two could be correctly identified, while the random incremental range of intensity was in the range of ±40%. As shown in Figure 5C, all simulated spectra with a 10% or −10% increment of the maximum peak intensity could be identified and achieved an accuracy over 95%. The results indicated that the present algorithm had good robustness against spectral peak drift, random noise and fluorescence interference.

FIGURE 5

Model Evaluation

Figure 6 shows the ROC curves of the prediction results, confusion matrixes and similarity scores. As shown in Figure 6A, the areas under the curve (AUC) values of the present algorithm, OMNIC 8.3 software built-in algorithm analysed on the full spectrum and analysed on the characteristic region were 1.000, 0.678 and 0.613, respectively. In the confusion matrix, the diagonal shows the percent of correct prediction records for each therapeutic mAb, and the off-diagonals show the percent of misclassifications for each therapeutic mAb (Figure 6B). The similarity scores of classification for 15 therapeutic mAbs using the present algorithm were 100. The similarity scores for 15 therapeutic mAbs using OMNIC 8.3 (analysed on the full spectrum) were 81.40 ± 1.40, 86.13 ± 0.17, 97.27 ± 0.04, 17.85 ± 1.84, 82.70 ± 0.22, 91.07 ± 0.32, 86.59 ± 0.21, 98.59 ± 0.02, 0, 93.93 ± 0.08, 56.36 ± 2.13, 90.74 ± 0.17, 84.65 ± 0.19, 96.83 ± 0.04 and 81.41 ± 0.98. The similarity scores for 15 therapeutic mAbs using OMNIC 8.3 (analysed in the characteristic region) were 92.76 ± 0.11, 89.58 ± 0.13, 97.68 ± 0.04, 54.07 ± 2.27, 86.49 ± 0.18, 93.20 ± 0.26, 89.33 ± 0.14, 98.83 ± 0.02, 86.63 ± 0.17, 95.32 ± 0.06, 90.10 ± 0.15, 91.47 ± 0.16, 88.71 ± 0.14, 97.36 ± 0.03 and 89.37 + 0.16. All groups compared with the relative present algorithm group had extremely remarkable differences (p < 0.001). The results suggested a high classification performance for the present algorithm. These results suggested that the extreme point sort transformation combined with the LSTM network algorithm has an excellent classification performance for Raman-based therapeutic mAb identification.

FIGURE 6

Sample Identification

A total of 30 samples, including 15 therapeutic mAb injections and 15 other injections, were tested using the present method. As shown in Table 2, 15 therapeutic mAbs were correctly identified, and 15 non-antibody samples were identified as “not matched”.

TABLE 2

Sample IDCategorySamplePredictionSimilarity scoreMatched
1Therapeutic antibodyAdalimumabAdalimumab0.999Yes
2BevacizumabBevacizumab0.998Yes
3RanibizumabRanibizumab0.998Yes
4TocilizumabTocilizumab0.999Yes
5IloeuvumabIloeuvumab0.997Yes
6ScuciumabScuciumab0.990Yes
7RituximabRituximab0.992Yes
8TrastuzumabTrastuzumab0.999Yes
9PertuzumabPertuzumab0.988Yes
10DenosumabDenosumab0.996Yes
11IxekizumabIxekizumab0.999Yes
12UstekinumabUstekinumab0.997Yes
13GuselkumabGuselkumab0.986Yes
14AlmezumabAlmezumab0.998Yes
15EtanerceptEtanercept0.999Yes
16Biochemical drug and biological productHuman immunoglobulin for intravenous injectionNM
17Tetanus immunoglobulin injectionNM
18Atosiban acetate injectionNM
19Chemical drugPotassium and magnesium aspartate injectionNM
20Compound aminobarbital injectionNM
21Dopamine hydrochloride injectionNM
22Lidocaine hydrochloride injectionNM
23Tramadol hydrochloride injectionNM
24Aminotoluene acid injectionNM
25Epinephrine hydrochloride injectionNM
26Bupivacaine hydrochloride injectionNM
27Potassium chloride injectionNM
28Sodium chloride injectionNM
29Invert sugar injectionNM
30Sterile water for injectionNM

Application of the presented method to identify 30 injections, including 15 monoclonal antibodies and 15 other injections.

NM: not matched.

Discussion

Raman spectroscopy has been increasingly studied for protein sample analysis, which could indeed offer the possibility of detecting protein bands and chemicals with few preparations and little cost for a few seconds to a few minutes of measurement (Li et al., 2012; Neugebauer et al., 2015; Degardin et al., 2017; Zhang et al., 2019). The injection of therapeutic mAb usually contains antibody proteins and pharmaceutical excipients (Singh et al., 2018; Grilo and Mantalaris, 2019). They are both important components of the injection. The characteristic responses of complex components could be obtained as a spectrum using Raman spectroscopy (Wang et al., 2018). The antibody proteins and pharmaceutical excipients equally contribute to the spectrum. Therefore, it is possible to screen therapeutic mAbs utilizing Raman spectroscopy. However, most Raman peaks of therapeutic mAbs have lower intensities than those of chemical drugs. As a result, the therapeutic mAb spectra could not be identified using the previous algorithm, which is suitable for chemical drugs.

For the analysis method, LSTM networks achieve a better classification effect than other methods (Fan et al., 2019). A bidirectional long short-term memory (Bi-LSTM) network is the process of making any neural network have sequence information in both directions backwards (future to past) or forwards (past to future) (Neishabouri et al., 2021). Bidirectional input flows in two directions, making a Bi-LSTM network preserve the future and past information. In a previous study, 1,200 data points of the Raman spectrum were directly input into the LSTM network, which caused 800 epochs of training (Yu et al., 2021). Excessive training epochs will lead to overfitting of the model. Like sample pretreatment in the field of chemical analysis, preprocessing of spectral data plays an important role in data analysis. In the present study, we provide an extreme point sort transformation to convert the characteristic region of the Raman spectrum to a sequence array. The spectral peak positions and relative intensities, two critical spectral characteristics, were involved in the array. This is an efficient transformation that even the Raman analysis under the influence, and only a few effects would be observed at the identification stage.

FCLs and the softmax function are commonly used in classification models. FCLs are added at the end of the sequence model to make the model end-to-end trainable. The FCLs learn a function between the high-level features given as an output from the previous layers. The softmax function transforms any input values into values between 0 and 1 so that they can be interpreted as probabilities. However, inappropriate parameters of FLCs may lead to overfitting of the model, which will enable counterfeit drugs to be identified as genuine. The result is not accepted in the application of counterfeit drug identification. Therefore, the FLC was only employed in the training process for the fitting model. Then, the model extracted the characteristics from sample spectra followed by calculating the similarity score with standard spectra to classify samples with the threshold value. This step reduces the false-positive rate to avoid identifying the counterfeit drug as genuine.

In summary, we propose a novel extreme port sort transformation combined with an LSTM network algorithm to identify 15 therapeutic mAbs with Raman spectra. The extreme port sort transformation converted the characteristic region of the Raman spectrum into a sequence array containing the original spectral peaks and intensity profile. The characteristic array was extracted from the sequence array by the LSTM network model. The final identification was based on the comparison of the characteristic array. The algorithm was successfully applied in therapeutic mAb identification using an independent test dataset. The algorithm was compared with ThermoFisher OMNIC 8.3 build-in spectrum matching algorithms, and the results indicated that the present algorithm provides a better classification performance.

Statements

Data availability statement

The datasets generated for this study can be found here: https://github.com/ttelva2/EPSTLSTM.git.

Author contributions

HS, GC and JL contributed to the conception and design of the study. JL, LZ, MX, XW and DM performed the Raman analysis and the method tests. JL organized the database, did the algorithm research and wrote the draft of the manuscript. GC and HS contributed to the manuscript’s revision.

Funding

This research was financially supported by Shanghai Municipal Drug Administration Research Projects (Grand No. ZX-2021-07) and Shanghai Public Service Platform for Research and Development Projects (Grand No. 19DZ2294600).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary Material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/fchem.2022.887960/full#supplementary-material

References

  • 1

    CastelliM. S.McGonigleP.HornbyP. J. (2019). The Pharmacology and Therapeutic Applications of Monoclonal Antibodies. Pharmacol. Res. Perspect.7, e00535. 10.1002/prp2.535

  • 2

    DégardinK.DespondsA.RoggoY. (2017). Protein-based Medicines Analysis by Raman Spectroscopy for the Detection of Counterfeits. Forensic Sci. Int.278, 313325. 10.1016/j.forsciint.2017.07.012

  • 3

    EttahI.AshtonL. (2018). Engaging with Raman Spectroscopy to Investigate Antibody Aggregation. Antibodies7, 24. 10.3390/antib7030024

  • 4

    EysterT. W.TalwarS.FernandezJ.FosterS.HayesJ.AllenR.et al (2021). Tuning Monoclonal Antibody Galactosylation Using Raman Spectroscopy-Controlled Lactic Acid Feeding. Biotechnol. Prog.37, e3085. 10.1002/btpr.3085

  • 5

    FanX.MingW.ZengH.ZhangZ.LuH. (2019). Deep Learning-Based Component Identification for the Raman Spectra of Mixtures. Analyst144, 17891798. 10.1039/c8an02212g

  • 6

    FroschT.WyrwichE.YanD.DomesC.DomesR.PoppJ.et al (2019). Counterfeit and Substandard Test of the Antimalarial Tablet Riamet by Means of Raman Hyperspectral Multicomponent Analysis. Molecules24, 3229. 10.3390/molecules24183229

  • 7

    FukuharaM.FujiwaraK.MaruyamaY.ItohH. (2019). Feature Visualization of Raman Spectrum Analysis with Deep Convolutional Neural Network. Analytica Chim. Acta1087, 1119. 10.1016/j.aca.2019.08.064

  • 8

    GaleevR. R.SemanovD. A.GaleevaE. V.FalaleevaT. S.AryslanovI. R.SavelievA. A.et al (2019). Peak Window Correlation Method for Drug Screening Using Raman Spectroscopy. J. Pharm. Biomed. Anal.163, 916. 10.1016/j.jpba.2018.09.041

  • 9

    GriloA. L.MantalarisA. (2019). The Increasingly Human and Profitable Monoclonal Antibody Market. Trends Biotechnol.37, 916. 10.1016/j.tibtech.2018.05.014

  • 10

    HattoriY.SekoY.PeerapattanaJ.OtsukaK.SakamotoT.OtsukaM. (2017). Rapid Identification of Oral Solid Dosage Forms of Counterfeit Pharmaceuticals by Discrimination Using Near-Infrared Spectroscopy. Bme29, 114. 10.3233/bme-171708

  • 11

    HsiehC.-H.LiY.-S.HwangB.-J.HsiaoC.-H. (2020). Detection of Atrial Fibrillation Using 1D Convolutional Neural Network. Sensors20, 2136. 10.3390/s20072136

  • 12

    IgneB.TalwarS.FengH.DrennenJ. K.AndersonC. A. (2015). Near-Infrared Spatially Resolved Spectroscopy for Tablet Quality Determination. J. Pharm. Sci.104, 40744081. 10.1002/jps.24618

  • 13

    JonesR. R.HooperD. C.ZhangL.WolversonD.ValevV. K. (2019). Raman Techniques: Fundamentals and Frontiers. Nanoscale Res. Lett.14, 231. 10.1186/s11671-019-3039-2

  • 14

    LanzarottaA.KimaniM. M.ThatcherM. D.LynchJ.FulcherM.WitkowskiM. R.et al (2020). Evaluation of Suspected Counterfeit Pharmaceutical Tablets Declared to Contain Controlled Substances Using Handheld Raman Spectrometers. J. Forensic. Sci.65, 12741279. 10.1111/1556-4029.14287

  • 15

    LawsonL. S.RodriguezJ. D. (2016). Raman Barcode for Counterfeit Drug Product Detection. Anal. Chem.88, 47064713. 10.1021/acs.analchem.5b04636

  • 16

    LegrandP.DembeleO.AlamilH.LamoureuxC.MignetN.HouzéP.et al (2022). Structural Identification and Absolute Quantification of Monoclonal Antibodies in Suspected Counterfeits Using Capillary Electrophoresis and Liquid Chromatography-Tandem Mass Spectrometry. Anal. Bioanal. Chem.414, 26992712. 10.1007/s00216-022-03913-y

  • 17

    LiH.Er SawP.SongE. (2020). Challenges and Strategies for Next-Generation Bispecific Antibody-Based Antitumor Therapeutics. Cell Mol. Immunol.17, 451461. 10.1038/s41423-020-0417-8

  • 18

    LiS.-X.ChenQ.-Y.ZhangY.-J.LiuZ.-M.XiongH.-L.GuoZ.-Y.et al (2012). Detection of Nasopharyngeal Cancer Using Confocal Raman Spectroscopy and Genetic Algorithm Technique. J. Biomed. Opt.17, 125003. 10.1117/1.JBO.17.12.125003

  • 19

    LingJ.LiG.ShaoH.WangH.YinH.ZhouH.et al (2020). Helix Matrix Transformation Combined with Convolutional Neural Network Algorithm for Matrix-Assisted Laser Desorption Ionization-Time of Flight Mass Spectrometry-Based Bacterial Identification. Front. Microbiol.11, 565434. 10.3389/fmicb.2020.565434

  • 20

    LoethenY. L.KauffmanJ. F.BuhseL. F.RodriguezJ. D. (2015). Rapid Screening of Anti-infective Drug Products for Counterfeits Using Raman Spectral Library-Based Correlation Methods. Analyst140, 72257233. 10.1039/c5an01679g

  • 21

    McAvanB. S.BowsherL. A.PowellT.O’HaraJ. F.SpitaliM.GoodacreR.et al (2020). Raman Spectroscopy to Monitor Post-Translational Modifications and Degradation in Monoclonal Antibody Therapeutics. Anal. Chem.92, 1038110389. 10.1021/acs.analchem.0c00627

  • 22

    NeishabouriA.WahlN.MairaniA.KötheU.BangertM. (2021). Long Short‐term Memory Networks for Proton Dose Calculation in Highly Heterogeneous Tissues. Med. Phys.48, 18931908. 10.1002/mp.14658

  • 23

    NeugebauerU.RöschP.PoppJ. (2015). Raman Spectroscopy towards Clinical Application: Drug Monitoring and Pathogen Identification. Int. J. Antimicrob. Agents46 (Suppl. 1), S35S39. 10.1016/j.ijantimicag.2015.10.014

  • 24

    SinghS.KumarN. K.DwiwediP.CharanJ.KaurR.SidhuP.et al (2018). Monoclonal Antibodies: A Review. Ccp13, 8599. 10.2174/1574884712666170809124728

  • 25

    TobiasS.ShapiroA. M.GrantC. J.PatelP.LysyshynM.TiL. (2021). Drug Checking Identifies Counterfeit Alprazolam Tablets. Drug and Alcohol Dependence218, 108300. 10.1016/j.drugalcdep.2020.108300

  • 26

    VelpandianT.NathM.LaxmiM.HalderN. (2016). Finger Printing of Counterfeit Bevacizumab and Identifying it before Clinical Use. Natl. Med. J. India29, 326

  • 27

    WangW.-t.ZhangH.YuanY.GuoY.HeS.-x. (2018). Research Progress of Raman Spectroscopy in Drug Analysis. AAPS PharmSciTech19, 29212928. 10.1208/s12249-018-1135-8

  • 28

    WangX.TianS.YuL.LvX.ZhangZ. (2020). Rapid Screening of Hepatitis B Using Raman Spectroscopy and Long Short-Term Memory Neural Network. Lasers Med. Sci.35, 17911799. 10.1007/s10103-020-03003-4

  • 29

    YilmazD.MehdizadehH.NavarroD.ShehzadA.O'ConnorM.McCormickP. (2020). Application of Raman Spectroscopy in Monoclonal Antibody Producing Continuous Systems for Downstream Process Intensification. Biotechnol. Prog.36, e2947. 10.1002/btpr.2947

  • 30

    YuS.LiX.LuW.LiH.FuY. V.LiuF. (2021). Analysis of Raman Spectra by Using Deep Learning Methods in the Identification of Marine Pathogens. Anal. Chem.93, 1108911098. 10.1021/acs.analchem.1c00431

  • 31

    ZhangC.SpringallJ. S.WangX.BarmanI. (2019). Rapid, Quantitative Determination of Aggregation and Particle Formation for Antibody Drug Conjugate Therapeutics with Label-free Raman Spectroscopy. Analytica Chim. Acta1081, 138145. 10.1016/j.aca.2019.07.007

Summary

Keywords

Raman spectroscopy, long-short term memory network, therapeutic monoclonal antibody, extreme point sort transformation, algorithm study

Citation

Ling J, Zheng L, Xu M, Chen G, Wang X, Mao D and Shao H (2022) Extreme Point Sort Transformation Combined With a Long Short-Term Memory Network Algorithm for the Raman-Based Identification of Therapeutic Monoclonal Antibodies. Front. Chem. 10:887960. doi: 10.3389/fchem.2022.887960

Received

02 March 2022

Accepted

23 March 2022

Published

13 April 2022

Volume

10 - 2022

Edited by

Christopher Mark Smales, University of Kent, United Kingdom

Reviewed by

Zufang Huang, Fujian Normal University, China

Rajapandiyan Panneerselvam, SRM University, India

Updates

Copyright

*Correspondence: Hong Shao,

This article was submitted to Analytical Chemistry, a section of the journal Frontiers in Chemistry

Disclaimer

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.

Outline

Figures

Cite article

Copy to clipboard


Export citation file


Share article

Article metrics