Discovery of the Linear Region of Near Infrared Diffuse Reflectance Spectra Using the Kubelka-Munk Theory

Particle size is of great importance for quantitative models of NIR diffuse reflectance. In this paper, the effect of sample particle size on the measurement of harpagoside in Radix Scrophulariae powder by near infrared (NIR) diffuse reflectance spectroscopy was explored. High-performance liquid chromatography (HPLC) was employed as the reference method to construct quantitative models for each particle size. Several spectral preprocessing methods were compared, and the particle size models obtained with different preprocessing methods were used to establish partial least-squares (PLS) models of harpagoside. The particle size distribution of 125-150 µm exhibited the best prediction ability, with R²pre = 0.9513, RMSEP = 0.1029 mg·g⁻¹, and RPD = 4.78. For the mixed particle size calibration model, the 90-180 µm distribution exhibited the best prediction ability, with R²pre = 0.8919, RMSEP = 0.1632 mg·g⁻¹, and RPD = 3.09. Furthermore, the Kubelka-Munk theory was used to relate the absorption coefficient k (concentration-dependent) and the scattering coefficient s (particle size-dependent). The scattering coefficient s was calculated based on the Kubelka-Munk theory to study how s changes after mathematical preprocessing. A linear relationship was observed between k/s and absorbance A within a certain range, in which k/s was >4. According to this relationship, the model was constructed more accurately with the 90-180 µm particle size distribution, where s was kept constant or varied within a small linear region. This region provides a useful reference for the linear modeling of diffuse reflectance spectroscopy. To establish a precise linear diffuse reflectance NIR model, the particle size (scattering) conditions should therefore be accurately assessed in advance.

INTRODUCTION
The implementation of process analytical technology (PAT) in the pharmaceutical industry is intended to enhance product quality through the measurement of critical quality and performance parameters (Roggo and Ulmschneider, 2008). Near infrared spectroscopy (NIRS) is regarded as a vital tool for implementing PAT and is increasingly used in pharmaceutical research and development because of its high speed, low cost, and non-destructive nature (De Beer et al., 2011). NIR spectra of chemical species containing C-H, N-H, O-H, and S-H bonds (Sarraguça et al., 2011) can be used to predict their chemical and physical properties (Prieto et al., 2009).
NIR techniques fall into two main types: transmission spectroscopy and diffuse reflectance spectroscopy. The choice between them depends mainly on the physical state of the sample: transmission is suited to liquid samples such as herbal extracts and liquid preparations, whereas diffuse reflectance is generally used for solid samples such as pharmaceutical powders or granules. Diffuse reflectance spectroscopy measures the diffusely reflected light at different wavelengths to obtain surface information about the material.
NIR spectroscopy has been used to predict various physical, chemical, and biochemical properties of Mediterranean soils (Zornoza et al., 2008). Chen et al. built an NIR model for the total polyphenol content of green tea (Chen Q. et al., 2008). A classification accuracy of about 100% was obtained for 82 honey samples by discriminant and classification tree analyses of diffuse reflectance mid-infrared Fourier transform spectra (DRIFTS) (Bertelli et al., 2007). Borin et al. used NIR technology to quantify common adulterants (starch, whey, or sucrose) in milk powder simultaneously (Borin et al., 2006). All these investigations illustrate the trend toward using NIR technology to predict physical and chemical information.
Recently, applications of NIR to Chinese herbal medicine (CHM) have increased dramatically, including discrimination analysis and quality control of raw materials, excipients, and dosage forms. Wu et al. used NIR and different PLS models to quantify baicalin in Yinhuang oral solution based on a total error concept (Wu et al., 2013). Chen et al. used NIR with principal component analysis (PCA) and discriminant analysis algorithms to distinguish Ganoderma lucidum samples collected from different geographical origins (Chen Y. et al., 2008). On the other hand, it is well known that sample particle size affects NIR spectra. Several studies have reported the effect of particle size on the determination of drug content in mixed powder products (Norris and Williams, 1984; Aucott and Garthwaite, 1988; Bull, 1991). Franke et al. (1998) reported the particle size determination of lactose using chemometrics-based NIR spectra, but did not describe the underlying principle of the determination. Paskatan et al. (2001) reviewed the theory and practice of particle size analysis of powders by NIR spectroscopy, but did not relate basic light scattering principles to the particle size of the main constituents.
The Kubelka-Munk theory (Otsuka, 2004) is the basic quantitative theory of NIR diffuse reflectance. Sample particle size affects light scattering and therefore directly influences model construction. Accurate knowledge of particle size has been shown to be crucial in product development (Blanco and Peguero, 2008). Moreover, differences in CHM particle size can lead to different optical path lengths and multiplicative light scattering effects (Jin et al., 2012). It is therefore important to establish a rapid method for determining the particle size of CHM.
However, few NIR studies have addressed the simultaneous determination of particle size and active pharmaceutical ingredients (APIs) in CHM. Wu Z. S. et al. demonstrated that particle size affected the NIR measurement of saikosaponin A in Bupleurum chinense DC. Bittner et al. successfully applied NIR spectroscopy combined with multivariate data analysis (MVA) to the simultaneous identification and particle size determination of amoxicillin trihydrate particles (Bittner et al., 2011).
Scrophularia radix (Xuanshen), the root of Scrophularia ningpoensis Hemsl., is a typical CHM with a history of more than 1,000 years (The State Pharmacopoeia Commission of People's Republic of China, 2015). It originates from Zhejiang province and is one of the eight traditional Zhejiang herbs known as "Zhe Ba Wei." The major constituents of Scrophularia radix are iridoids, and harpagoside is one of its main bioactive components, with antioxidant, antimicrobial, and antitumor activities (Miyazawa and Okuno, 2003; Jing et al., 2011).
In this study, Scrophularia radix was taken as an example, and harpagoside was treated as its API. HPLC was used as the reference method to determine harpagoside content, and NIR was used to evaluate the predictive ability of both single particle size and mixed particle size models. To the best of our knowledge, this is the first study of particle size and harpagoside determination in Scrophularia radix by NIR diffuse reflectance spectroscopy. The differences between the single and mixed particle size models are explained from the perspective of the Kubelka-Munk theory.

Preparation of Samples
After soil dust was brushed off the surface, Scrophularia radix samples were crushed into pieces with a disintegrator. Thirty samples were then pulverized in a blender and passed through a 10-mesh sieve. The powders were divided into four parts, one of which was used for HPLC determination of harpagoside content. The remaining parts were further ground and screened through 24-, 50-, 65-, 80-, 100-, 120-, and 150-mesh sieves. For each sieved fraction, 1 g of powder was accurately weighed into a 100-mL Erlenmeyer flask and extracted with 50 mL of 50% ethanol under ultrasonication (40 kHz, 220 V) for 45 min. After cooling to room temperature, the extract was filtered through a 0.45-µm membrane filter for HPLC analysis.
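The sieve stack above produces seven size fractions. The short Python sketch below does that bookkeeping. It assumes the apertures of the Chinese Pharmacopoeia standard sieve series (24-mesh = 850 µm down to 150-mesh = 90 µm), which match the fraction limits used later in this paper; the dictionary and function names are hypothetical. Material retained on the coarsest sieve is not treated as a fraction here.

```python
# Assumed aperture (µm) for each mesh number, taken from the Chinese
# Pharmacopoeia standard sieve series; the text gives only mesh numbers.
SIEVE_APERTURE_UM = {24: 850, 50: 355, 65: 250, 80: 180, 100: 150, 120: 125, 150: 90}


def size_fractions(meshes):
    """Return fraction labels (coarse to fine) for a stack of sieves."""
    apertures = sorted((SIEVE_APERTURE_UM[m] for m in meshes), reverse=True)
    # Material passing one sieve but retained on the next finer sieve
    labels = [f"{lower}-{upper} µm" for upper, lower in zip(apertures, apertures[1:])]
    # Material passing the finest sieve is collected in the pan
    labels.append(f"<{apertures[-1]} µm")
    return labels


N_BATCHES = 30  # number of Scrophularia radix samples stated in the text
fractions = size_fractions([24, 50, 65, 80, 100, 120, 150])
print(fractions)
print(N_BATCHES * len(fractions))  # 30 samples x 7 fractions = 210
```

Thirty samples times seven fractions is consistent with the 210 samples split into calibration and validation sets in the Software section.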

NIR Equipment and Measurement
The NIR spectra were recorded with an XDS Rapid Content Analyser and VISION software (Metrohm NIR Systems, Florida, USA). Spectra were collected over 780-2,500 nm at 0.5 nm increments, and each spectrum was an average of 64 scans with air as the background. Unless stated otherwise, each sample was measured in triplicate and the mean spectrum was used in the subsequent analysis.

HPLC Method
A certain amount of harpagoside standard was accurately weighed with an XS205DU electronic balance (Mettler Toledo, Greifensee, Switzerland) and dissolved in 100 mL of methanol to give a concentration of 0.02432 mg·mL⁻¹.
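As a quick check of the standard preparation, the mass of harpagoside implied by the stated concentration and volume is (assuming a pure standard and complete dissolution):

$$m = c\,V = 0.02432\ \mathrm{mg\,mL^{-1}} \times 100\ \mathrm{mL} = 2.432\ \mathrm{mg}$$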
HPLC analysis of Scrophularia radix (according to the Chinese Pharmacopoeia, 2010 edition) was carried out on a Waters 2695 HPLC system with a Waters 2996 DAD detector and autosampler (Waters Technologies, Palo Alto, CA). Ten-microliter aliquots of the sample solutions were analyzed in gradient elution mode on an octadecylsilyl column [250 × 4.6 mm, 5 µm (Dikma, China)] with the mobile phase and gradient program given in Table 1. The column temperature was kept at 30 °C and the detection wavelength was set at 280 nm. This chromatographic method exhibited good linearity (Y = 3 × 10⁶ X − 104747, R² = 0.9998) over the concentration range 0.04864-0.02432 mg·mL⁻¹.
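Converting a chromatographic peak area into harpagoside content takes two steps: invert the linear calibration, then scale the extract concentration by the extraction volume and sample mass. The Python sketch below is a minimal illustration under stated assumptions. The units of X in the reported regression are not given, so slope and intercept are passed in rather than hard-coded; the 50 mL and 1 g defaults come from the sample preparation, and no further dilution is assumed. The function names and example numbers are hypothetical.

```python
def concentration_from_area(peak_area, slope, intercept):
    """Invert a linear calibration Y = slope * X + intercept to get X from peak area Y."""
    return (peak_area - intercept) / slope


def content_mg_per_g(conc_mg_per_ml, extract_volume_ml=50.0, sample_mass_g=1.0):
    """Convert extract concentration (mg/mL) to harpagoside content of the powder (mg/g).

    Defaults follow the sample preparation (1 g extracted with 50 mL);
    no further dilution of the filtrate is assumed.
    """
    return conc_mg_per_ml * extract_volume_ml / sample_mass_g


# Hypothetical calibration and peak area, for illustration only
conc = concentration_from_area(peak_area=31.0, slope=1000.0, intercept=1.0)
print(conc, content_mg_per_g(conc))
```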

Software
Data analysis was performed with the Unscrambler version 9.6 software package (CAMO Software AS, Oslo, Norway) and in-house routines written in MATLAB (MATLAB v7.0, MathWorks, Natick, MA). Using the Kennard-Stone algorithm, the 210 samples (30 samples × 7 particle size fractions) were divided into 140 calibration samples and 70 validation samples. The root mean square errors of calibration (RMSEC), cross-validation (RMSECV), and prediction (RMSEP), together with the corresponding R² values, were used to evaluate the PLS models.
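The Kennard-Stone split can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' MATLAB routine: it assumes the split is made on the spectra matrix using Euclidean distance, starts from the two most distant samples, and then repeatedly adds the sample whose nearest already-selected neighbour is farthest away. The function name is hypothetical.

```python
import numpy as np


def kennard_stone(X, n_calibration):
    """Split rows of X into calibration and validation indices (Kennard-Stone).

    Uses Euclidean distances; assumes 2 <= n_calibration <= len(X).
    """
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances via |a|^2 + |b|^2 - 2ab (avoids a 3-D array)
    sq = np.sum(X ** 2, axis=1)
    dist = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0))

    # Start with the two samples that are farthest apart
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]

    while len(selected) < n_calibration:
        # Distance from each remaining sample to its nearest selected sample
        nearest = dist[np.ix_(remaining, selected)].min(axis=1)
        # Add the remaining sample that is farthest from the selected set
        pick = remaining[int(np.argmax(nearest))]
        selected.append(pick)
        remaining.remove(pick)

    return np.array(selected), np.array(remaining)
```

With a 210 × p spectra matrix, `kennard_stone(spectra, 140)` would return 140 calibration and 70 validation indices, matching the split described above.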
To establish a robust harpagoside model, several preprocessing methods were compared. Multiplicative scatter correction (MSC) and standard normal variate (SNV) were used to remove scattering effects caused by particle size. First-derivative (1D) and second-derivative (2D) spectra were calculated to reduce the baseline variations observed in the original diffuse reflectance spectra and to enhance spectral features, and a nine-point Savitzky-Golay (SG) smoothing filter was applied to suppress the background noise amplified by differentiation. For the particle size model, MSC, SNV, and the second derivative were not appropriate because they remove the scattering effect that is to be modeled, so 1D + SG, normalization, and baseline subtraction were used instead. Leave-one-out cross-validation was used to validate the models, and the number of latent variables giving the lowest predicted residual sum of squares (PRESS) was taken as optimal.
Frontiers in Chemistry | www.frontiersin.org
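The preprocessing steps named above can be sketched with NumPy and SciPy as follows. This is a minimal illustration, not the Unscrambler implementation: SNV scales each spectrum with its own sample standard deviation (ddof=1 assumed), MSC regresses each spectrum on the mean spectrum, and the Savitzky-Golay helper uses the nine-point window from the text with an assumed polynomial order of 2. Spectra are rows and wavelengths are columns; the function names are hypothetical.

```python
import numpy as np
from scipy.signal import savgol_filter


def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) individually."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference (default: mean spectrum)."""
    spectra = np.asarray(spectra, dtype=float)
    ref = spectra.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(spectra)
    for i, row in enumerate(spectra):
        # Least-squares fit: row ≈ offset + slope * ref
        slope, offset = np.polyfit(ref, row, deg=1)
        corrected[i] = (row - offset) / slope
    return corrected


def sg_derivative(spectra, deriv=1, window=9, polyorder=2):
    """Savitzky-Golay smoothing (deriv=0) or derivative along the wavelength axis."""
    return savgol_filter(spectra, window_length=window, polyorder=polyorder,
                         deriv=deriv, axis=1)
```

Because MSC and SNV remove additive and multiplicative scatter, they also remove much of the particle size information, which is why they are unsuitable when particle size itself is the effect being modeled.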

Quantitative Models of NIR Diffuse Reflectance Using the Kubelka-Munk Theory
Kubelka-Munk theory is the theoretical basis for establishing quantitative models of NIR diffuse reflectance, and its function is as follows (Otsuka, 2004):
$$F(R_\infty) = \frac{(1 - R_\infty)^2}{2R_\infty} = \frac{k}{s}$$
where R∞ is the diffuse reflectance of an infinitely thick sample layer, k is the absorption coefficient, and s is the light-scattering coefficient. According to the Kubelka-Munk function, F(R∞) is inversely proportional to s, and s is inversely proportional to particle size.
FIGURE 5 | Relationship between reference and predicted values for each particle size.
Solving the Kubelka-Munk function for R∞, the absorbance of NIR diffuse reflectance, A = log(1/R∞), is expressed by the Kubelka-Munk equation:
$$A = \log\frac{1}{R_\infty} = -\log\left[1 + \frac{k}{s} - \sqrt{\left(\frac{k}{s}\right)^2 + 2\,\frac{k}{s}}\right]$$
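The two Kubelka-Munk relations can be checked numerically with a short sketch. It assumes a decimal logarithm for A and uses the algebraically identical form R∞ = 1 / (1 + x + sqrt(x² + 2x)) with x = k/s, which avoids cancellation at large k/s. The function names are hypothetical.

```python
import numpy as np


def km_function(r_inf):
    """Kubelka-Munk function F(R) = (1 - R)^2 / (2R), equal to k/s."""
    r_inf = np.asarray(r_inf, dtype=float)
    return (1.0 - r_inf) ** 2 / (2.0 * r_inf)


def reflectance_from_ks(ks):
    """Root 0 < R <= 1 of F(R) = k/s.

    1 + x - sqrt(x^2 + 2x) is rewritten as 1 / (1 + x + sqrt(x^2 + 2x)),
    which is algebraically identical but numerically stable at large x.
    """
    x = np.asarray(ks, dtype=float)
    return 1.0 / (1.0 + x + np.sqrt(x ** 2 + 2.0 * x))


def absorbance_from_ks(ks):
    """Apparent absorbance A = log10(1 / R_inf)."""
    return -np.log10(reflectance_from_ks(ks))


ks = np.array([0.1, 1.0, 4.0, 10.0])
print(np.round(absorbance_from_ks(ks), 3))
```

A round trip through `reflectance_from_ks` and `km_function` recovers k/s, which is a quick consistency check on the reconstructed equation.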

Spectral Characteristics of NIR Diffuse Reflectance Spectra of Different Particle Size Samples
Representative raw spectra of Scrophularia radix with different particle sizes are shown in Figure 1; their profiles are similar in shape. The main effect of particle size variation on the diffuse reflectance spectra was a baseline offset. The well-known phenomenon that larger particles show stronger apparent absorption illustrates that the spectral response depends strongly on particle size. Weak absorption peaks appeared in the second overtone region (SCOT, 1,000-1,400 nm) of the fundamental C-H stretching bands, while pronounced fluctuations were observed in the first combination-overtone region (FCOT, 1,400-2,040 nm) and the combination region (CR, 2,040-2,500 nm). These absorption features might be caused by diffuse reflectance from particles of different sizes.

HPLC Determination of Harpagoside Content in Scrophularia Radix
HPLC chromatograms of a representative sample and the standard are shown in Figure 2. The retention time of harpagoside in the sample extract matched that of the standard solution. Figure 3 shows the harpagoside concentrations of the 30 samples. Harpagoside concentration differed significantly between samples of different particle sizes. The largest variation was found for the 180-250 µm fraction, but the overall concentration distribution was suitable for modeling.

PLS Models for NIR Diffuse Reflectance Data Using Scrophularia Radix of Each Single Particle Size
A PLS model was constructed for each particle size using different preprocessing methods. Figure 4 shows the relationship between the number of latent variables and PRESS for the different preprocessing methods; in general, the lowest PRESS value indicates the optimal number of latent variables (Pan et al., 2015). Each model was validated for prediction with the internal validation set. The model performance values for each particle size under the different preprocessing methods are listed in Table 2. The raw spectra gave the best models for the 355-850 µm and <90 µm fractions, whereas the best preprocessing methods for the 250-355 µm, 180-250 µm, 150-180 µm, 125-150 µm, and 90-125 µm fractions were extended multiplicative signal correction (EMSC), nine-point Savitzky-Golay smoothing (SG9), SG9, SNV, and MSC, respectively.
For the 355-850 µm fraction, RMSEC, RMSECV, and RMSEP were 0.0576, 0.1642, and 0.2094 mg·g⁻¹, respectively, and RPD was 2.02; the values for the other particle sizes are summarized in Table 2. The relationship between predicted and reference values is shown in Figure 5, and the best prediction was obtained for the 125-150 µm fraction. These results show that the NIR model is influenced by particle size, so its quantitative characteristics were examined for each particle size.
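The evaluation statistics used above can be computed as in the following sketch. The text does not define RPD; this assumes the common definition, the standard deviation of the reference values (ddof=1) divided by RMSEP. RMSE is computed without a degrees-of-freedom correction, although some software applies one to RMSEC. The function names are hypothetical.

```python
import numpy as np


def rmse(y_ref, y_pred):
    """Root mean square error; RMSEC, RMSECV or RMSEP depending on which set is passed."""
    y_ref = np.asarray(y_ref, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_ref - y_pred) ** 2)))


def r_squared(y_ref, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    y_ref = np.asarray(y_ref, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_ref - y_pred) ** 2)
    ss_tot = np.sum((y_ref - y_ref.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def rpd(y_ref, y_pred):
    """Ratio of performance to deviation: SD of reference values divided by RMSEP."""
    sd = np.std(np.asarray(y_ref, dtype=float), ddof=1)
    return float(sd / rmse(y_ref, y_pred))
```

Under this definition, an RPD above about 3 (as for the 125-150 µm model) means the prediction error is less than a third of the natural spread of harpagoside content in the validation set.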

PLS Models for NIR Diffuse Reflectance Data Using Scrophularia Radix of Mix Particle Size
Comparing model performance for different types of mixed particle size (i.e., mixtures of seven, six, five, four, and three adjacent fractions) showed that the best mixed particle size models were obtained with three-fraction mixtures (Table 3). Several preprocessing methods were again compared, including MSC, SNV, EMSC, and SG9. As shown in Figure 6, the optimum preprocessing methods for the mixed particle sizes of 180-850 µm, 150-355 µm, 125-250 µm, 90-180 µm, and 0-150 µm were SG9, none (raw spectra), EMSC, EMSC, and MSC, respectively, as these gave the lowest PRESS values.
FIGURE 6 | The PRESS values of different preprocessing methods for the mixed particle size models.
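The five three-fraction mixtures listed above correspond to sliding windows over the seven single-size fractions ordered from coarse to fine. The sketch below reproduces their ranges; representing the <90 µm fraction as (0, 90) is an assumption consistent with the "0-150 µm" label, and the names are hypothetical.

```python
# Single-size fractions ordered from coarsest to finest, as (lower, upper) in µm.
FRACTIONS = [(355, 850), (250, 355), (180, 250), (150, 180), (125, 150), (90, 125), (0, 90)]


def mixed_windows(fractions, width=3):
    """Group `width` adjacent fractions and return (lower, upper, members) for each window."""
    windows = []
    for start in range(len(fractions) - width + 1):
        group = fractions[start:start + width]
        lower = min(lo for lo, _ in group)
        upper = max(hi for _, hi in group)
        windows.append((lower, upper, group))
    return windows


for lower, upper, group in mixed_windows(FRACTIONS):
    print(f"{lower}-{upper} µm:", group)
```

Changing `width` from 3 to 7 generates the other mixture types compared in Table 3.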
The best mixed particle size model was obtained for 90-180 µm, with an RPD value >3 (Table 3). The RPD values of the other mixed particle size models were about 2, indicating that they performed similarly to one another. This result further shows that particle size is vital to the performance of quantitative NIR diffuse reflectance models. To clarify this relationship, the single and mixed particle size models were compared in detail.

Comparison of the Model Performance for Single Particle Size and Mix Particle Size
Comparing the single and mixed particle size models shows that the RPD values of the former were higher than those of the latter. However, although a single particle size model predicted well within its own particle size range, its predictions were not stable. Most applications of NIR diffuse reflectance involve a relatively broad range of particle sizes, so a mixed particle size calibration model was used for prediction in the subsequent work. The mixed particle size calibration model was also used to predict the validation set of each particle size, to examine which particle sizes could be predicted most accurately and to guide subsequent sample preparation. The 90-180 µm model was selected to predict the 150-180 µm, 125-150 µm, and 90-125 µm fractions; the best preprocessing method was MSC (Table 4), and the RPD values of the three prediction models were 3.81, 5.78, and 2.81 (Table 5).
By comparison, the RPD values of the corresponding single particle size models were 3.40, 4.78, and 2.52. The mixed particle size calibration model therefore gave higher RPD values than the single particle size models, indicating more accurate prediction (Table 5). The relationship between reference and predicted values for the validation sets is shown in Figure 7. The good correlation between reference and predicted values further demonstrates that the mixed particle size model outperformed the single particle size model. Why is particle size so important for quantitative NIR diffuse reflectance models? The Kubelka-Munk theory, the critical theory of NIR diffuse reflectance, provides the explanation.

Discovery of the Linear Region of NIR Diffuse Reflectance Spectra Using the Kubelka-Munk Theory
In practice, NIR diffuse reflectance is usually used to analyze solid particles, and its quantitative basis is the Kubelka-Munk theory (Figure 8).
The Kubelka-Munk equation shows that the absorbance depends on the value of k/s, and a linear relationship was found between k/s and A within a certain range. As illustrated in Figure 9, k/s was >4 in this range, indicating that a linear region exists. This result explains and guides the modeling performance of NIR diffuse reflectance: such a linear region provides a reference for the linear modeling of diffuse reflectance spectra and is beneficial for establishing NIR diffuse reflectance models. When the scattering coefficient s does not change, k/s varies only with the absorption coefficient k, which is proportional to sample concentration. In this study, both single and mixed particle size models were constructed to overcome the limitation that sample particle size is only available within a certain range. The single particle size model was better than the mixed particle size model owing to the smaller change in the scattering coefficient s.
FIGURE 7 | The relation map of the calibration particle size models.
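The text does not state how the linearity of A against k/s was judged. One hypothetical way to quantify "linear within a certain range" is to fit a straight line of A against k/s over a chosen window and report R². The sketch below does this with the Kubelka-Munk relation (decimal logarithm assumed); the windows are illustrative, and the sketch does not reproduce Figure 9.

```python
import numpy as np


def absorbance_from_ks(ks):
    """A = -log10(R_inf) with R_inf = 1 / (1 + x + sqrt(x^2 + 2x)), x = k/s."""
    x = np.asarray(ks, dtype=float)
    return np.log10(1.0 + x + np.sqrt(x ** 2 + 2.0 * x))


def linearity_r2(ks_min, ks_max, n_points=50):
    """R^2 of a straight-line fit of A against k/s over [ks_min, ks_max]."""
    ks = np.linspace(ks_min, ks_max, n_points)
    a = absorbance_from_ks(ks)
    slope, intercept = np.polyfit(ks, a, deg=1)
    residual = a - (slope * ks + intercept)
    return float(1.0 - np.sum(residual ** 2) / np.sum((a - a.mean()) ** 2))


for window in [(0.1, 1.0), (4.0, 10.0)]:
    print(window, round(linearity_r2(*window), 4))
```

Because A grows roughly logarithmically at large k/s, the fitted R² depends on the window width as well as its position, so any such criterion should be reported together with the k/s range used.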

CONCLUSIONS
Particle size is of great importance to quantitative models of NIR diffuse reflectance. In this study, single and mixed particle size models of Radix Scrophulariae were constructed using PLS. For the single particle size models, the best prediction was obtained for the 125-150 µm particle size distribution, suggesting that a small particle size is beneficial for constructing quantitative models of harpagoside in Radix Scrophulariae.
For the mixed particle size models, the best prediction was obtained for the 90-180 µm distribution, indicating that a mixed particle size model can account for more of the variation among samples, improving its accuracy and robustness. The quantitative basis of NIR diffuse reflectance for different particle sizes is the Kubelka-Munk theory. A linear relationship was found between k/s and A within a certain range, and the data showed that a narrow range of the scattering coefficient s resulted in a better model. The value of k/s was >4, indicating that a linear region existed. This linear region helps explain and guide the modeling of NIR diffuse reflectance data and provides a methodological reference for the linear modeling of NIR diffuse reflectance spectra. Accurate assessment of the particle size (scattering) conditions is therefore needed in advance to obtain a precise linear model.
Our study also showed that quantitative analysis of CHM samples is more accurate when the scattering coefficient s remains constant or varies only slightly.

AUTHOR CONTRIBUTIONS
ZW and YQ: conceived the research; XP: performed the experiment; SD: wrote the manuscript; CD, LM, and XH: analyzed the data. All the authors prepared the manuscript and discussed the results.