Original Research ARTICLE

Front. Plant Sci., 05 January 2017

Classification and Identification of Plant Fibrous Material with Different Species Using near Infrared Technique—A New Way to Approach Determining Biomass Properties Accurately within Different Species

Wei Jiang1,2,3, Chengfeng Zhou1,3, Guangting Han1*, Brian Via3,4*, Tammy Swain5, Zhaofei Fan4 and Shaoyang Liu6
  • 1Laboratory of New Fiber Materials and Modern Textile, The Growing Base for State Key Laboratory, Qingdao University, Qingdao, China
  • 2College of Textiles, Qingdao University, Qingdao, China
  • 3Forest Products Development Center, Auburn University, Auburn, AL, USA
  • 4School of Forestry and Wildlife, Auburn University, Auburn, AL, USA
  • 5Institute for Commercial Forestry Research, Scottsville, South Africa
  • 6Department of Chemistry and Physics, Troy University, Troy, AL, USA

Plant fibrous material is a valuable resource for the textile and other industries. The several kinds of plant fibrous material that typically feed one process need to be identified and characterized in advance. Identification is easy when the materials are in raw condition, but most arrive as semi-finished products that have been ground, rotted or pre-hydrolyzed. Classifying such samples across different species with high accuracy is a major challenge. In this research, a qualitative and a quantitative analysis method, Soft Independent Modeling of Class Analogy (SIMCA) and partial least squares (PLS), were used to classify samples of six species covering softwood, hardwood, bast, and aquatic plants. The algorithm for classifying samples of different species with PLS was developed independently in this research. The six species were successfully classified by both SIMCA and PLS, and the two methods gave similar results. The identification rates of kenaf, ramie and pine were 100%, and those of lotus, eucalyptus and tallow were higher than 94%. Spectral loadings helped select the best wavenumber ranges for constructing the NIR model, the inter material distance showed how close two species are, and the scores plot helped choose the number of principal components during model construction.


Introduction

Plant fibrous material is one of the most valuable materials because of its renewability, abundance and wide range of applications (Cheng, 2009). It is used in textiles (Costa et al., 2013), paper (Hubbell and Ragauskas, 2010), food (Muangrat et al., 2010), medicine (Pomin and Mourão, 2008), composites (Messing and Oppermann, 1979), biofuel (Guazzotti et al., 2003), and other areas. In each area, use is not limited to one species: several species are normally fed into one production process to ensure sufficient supply and product yield. However, different species of biomass have different properties. Identifying plant fibrous material and determining its properties before processing is therefore of great significance for industrial utilization, because it helps ensure the quality of the final product.

It is easy to identify different plant fibrous materials in raw condition, because each has a characteristic color, shape and structure. However, most materials reach processing as semi-finished products that have been ground, rotted or pre-hydrolyzed (Zheng et al., 2001; Cheng, 2009), and in that state materials from different species can hardly be told apart. Traditionally, they are all treated as raw material, and wet chemistry methods are used to characterize their chemical composition as guidance for the subsequent procedure. However, wet chemistry is time consuming, polluting and procedurally complex, and is not encouraged for future use (Jiang et al., 2010).

Although classification and identification methods for plant fibrous materials have not been widely studied, near infrared (NIR) spectroscopy has in recent years proved to be a rapid method for quantitative determination on plant fibrous material (Kelley et al., 2004; Jiang et al., 2014; Zhou et al., 2015). However, most NIR research has focused on one species or on several similar species (Yeh et al., 2004; Cozzolino et al., 2006; Jin and Chen, 2007; Xu et al., 2015). The limited number of studies that built models across multiple species all reported high prediction errors (Table 1) (Ono et al., 2003; Kelley et al., 2004; Yeh et al., 2004; Jin and Chen, 2007; Yao et al., 2010). This indicates that NIR can rapidly evaluate biomass properties either over a broad range of species with high prediction error or over a narrow range with better accuracy. An NIR modeling method that combines a broad range of species with high prediction accuracy still needs further study.


Table 1. A comparison of NIR model prediction of lignin between different species.

Some researchers have found that NIR has the potential to classify and identify samples from different species, although this research has mostly focused on food science (Barbin et al., 2012; Chen et al., 2012; Zhang et al., 2014). High classification accuracy is believed to be much easier to achieve than accurate quantitative analysis. If a classification model can reach or approach 100% accuracy, the properties of an unknown sample can be analyzed with a two-step prediction method: first identify the species of the unknown sample, and then quantify the sample using the prediction model constructed for that species. An NIR method for classifying and identifying plant fibrous materials is therefore worth studying. It serves not only to classify unknown samples for pretreatment, but also as a prerequisite for highly precise quantitative analysis.
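The two-step prediction method described above can be sketched as follows. This is only an illustration of the control flow: `two_step_predict`, `toy_classifier` and `toy_quantifiers` are hypothetical names, and the toy models are stand-ins rather than real NIR calibrations.

```python
from typing import Callable, Dict, Sequence, Tuple

Spectrum = Sequence[float]

def two_step_predict(
    spectrum: Spectrum,
    classify: Callable[[Spectrum], str],
    quantifiers: Dict[str, Callable[[Spectrum], float]],
) -> Tuple[str, float]:
    """Step 1: identify the species. Step 2: apply that species' own model."""
    species = classify(spectrum)
    if species not in quantifiers:
        raise KeyError(f"no quantitative model for species {species!r}")
    return species, quantifiers[species](spectrum)

# Toy stand-ins (hypothetical, not real calibration models).
def toy_classifier(spectrum: Spectrum) -> str:
    return "Pine" if spectrum[0] > 0.5 else "Ramie"

toy_quantifiers = {
    "Pine": lambda s: 10.0 * s[1],
    "Ramie": lambda s: 2.0 * s[1],
}

result = two_step_predict([0.8, 3.0], toy_classifier, toy_quantifiers)
```

Keeping one quantitative model per species is what lets each model stay narrow, and therefore accurate, while the classifier handles the breadth across species.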

This research aimed to construct an accurate NIR classification model for six different species that had been pre-ground. Two methods, Soft Independent Modeling of Class Analogy (SIMCA) and partial least squares (PLS), were each used to build a model.

Materials and Methods

Sample Preparation

Six species of biomass were used in this research. Southern pine (25 samples) and Tallow (24 samples) were harvested in Alabama, USA. Eucalyptus samples (50 samples) were shipped from South Africa. Kenaf (13 samples), Ramie (10 samples) and Lotus (17 samples) were collected in China from Xinjiang Province, Hunan Province and Shandong Province, respectively. All samples were ground to 40 mesh powder and then air dried under ambient conditions. Of these, 20 Southern pine, 20 Tallow, 35 Eucalyptus, 10 Kenaf, 8 Ramie and 14 Lotus samples were used to construct the model. The remaining samples were used to verify model accuracy.
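The sample counts above imply the following split between model construction and verification. The short tally below only restates the numbers given in the text; the variable names are illustrative.

```python
# species: (samples collected, samples used for model construction)
samples = {
    "Southern pine": (25, 20),
    "Tallow": (24, 20),
    "Eucalyptus": (50, 35),
    "Kenaf": (13, 10),
    "Ramie": (10, 8),
    "Lotus": (17, 14),
}

# Samples left over for verifying the model, per species.
verification = {name: total - used for name, (total, used) in samples.items()}

total_samples = sum(total for total, _ in samples.values())
total_construction = sum(used for _, used in samples.values())
total_verification = sum(verification.values())
```

In total, 107 of the 139 samples go into model construction and 32 are held back, of which only 2 are Ramie and 3 each are Kenaf and Lotus.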

The six species fall into three broad groups: wood, bast, and aquatic plants. Among the wood species, Pine is a softwood and Eucalyptus and Tallow are hardwoods. Ramie and Kenaf are bast plants, and Lotus is an aquatic plant. These three groups cover most of the bio-based material used in the world, so classifying them successfully is of practical importance.

Near Infrared Spectra Collection

The NIR spectra were collected with a PerkinElmer Spectrum 400 FT-IR/FT-NIR spectrometer. The biomass powders were measured in reflectance mode. Each spectrum covers the range 10,000–4000 cm−1 at a spectral resolution of 4 cm−1 and is an average of 32 scans.

Classification Method

The classification models were built with two different methods: Soft Independent Modeling of Class Analogy (SIMCA) (Gemperline et al., 1989) and partial least squares (PLS). Before modeling, the spectra were pretreated with multiplicative scatter correction (MSC) coupled with first and second Savitzky-Golay derivatives. This pretreatment significantly reduces unwanted variation from sample color, uneven sample particle size and instrument noise.
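The pretreatment can be sketched in plain Python as below. This is a minimal illustration and not the instrument software's implementation: it assumes the mean spectrum as the MSC reference, and it shows only a first derivative with a 5-point quadratic Savitzky-Golay window, since the window size and polynomial order used in the study are not stated. The function names are illustrative.

```python
def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum.

    Each spectrum is regressed on the reference (spectrum = a + b * reference)
    and corrected as (spectrum - a) / b.
    """
    n = len(spectra)
    reference = [sum(column) / n for column in zip(*spectra)]
    ref_mean = sum(reference) / len(reference)
    sxx = sum((r - ref_mean) ** 2 for r in reference)
    corrected = []
    for spectrum in spectra:
        spec_mean = sum(spectrum) / len(spectrum)
        sxy = sum((r - ref_mean) * (x - spec_mean)
                  for r, x in zip(reference, spectrum))
        slope = sxy / sxx
        intercept = spec_mean - slope * ref_mean
        corrected.append([(x - intercept) / slope for x in spectrum])
    return corrected, reference

# 5-point quadratic Savitzky-Golay first-derivative weights (divide by 10).
SG5_FIRST_DERIVATIVE = (-2.0, -1.0, 0.0, 1.0, 2.0)

def savgol_first_derivative(spectrum, step=1.0):
    """First derivative at interior points; two points are lost at each end."""
    half = 2
    derivative = []
    for i in range(half, len(spectrum) - half):
        window = spectrum[i - half:i + half + 1]
        weighted = sum(c * x for c, x in zip(SG5_FIRST_DERIVATIVE, window))
        derivative.append(weighted / (10.0 * step))
    return derivative

# Toy data: the same curve measured with a different offset and scale.
base = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0, 36.0]
shifted = [2.0 * x + 3.0 for x in base]
corrected, reference = msc([base, shifted])
derivative = savgol_first_derivative(base)
```

After MSC the two toy spectra collapse onto the reference, which is the intended effect: additive and multiplicative differences between samples are removed before the derivative emphasizes band shape.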

SIMCA is a statistical method for supervised classification of data. The samples of each species are modeled by principal component (PC) analysis. The method was used to classify thermally modified wood in a previous study (Bachle et al., 2012).
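The idea behind SIMCA can be illustrated with a deliberately simplified sketch: one PC model is fitted per class, and a new sample is assigned to the class whose model leaves the smallest residual. This is an assumption-laden toy, not the procedure used in the study. It keeps a single PC per class (found by power iteration), uses invented three-variable data, and omits the statistical acceptance limits that full SIMCA applies.

```python
import math

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fit_class_model(rows, iterations=100):
    """Fit a one-component PC model (mean and first loading) to one class."""
    n = len(rows)
    mean = [sum(column) / n for column in zip(*rows)]
    centered = [[x - m for x, m in zip(row, mean)] for row in rows]
    p = len(mean)
    loading = [1.0 / math.sqrt(p)] * p
    # Power iteration for the dominant eigenvector of X^T X.
    for _ in range(iterations):
        scores = [_dot(row, loading) for row in centered]
        updated = [sum(s * row[j] for s, row in zip(scores, centered))
                   for j in range(p)]
        norm = math.sqrt(_dot(updated, updated))
        if norm == 0.0:
            break
        loading = [u / norm for u in updated]
    return {"mean": mean, "loading": loading}

def residual_distance(sample, model):
    """Distance from the sample to the class model's PC line."""
    centered = [x - m for x, m in zip(sample, model["mean"])]
    score = _dot(centered, model["loading"])
    return math.sqrt(sum((c - score * w) ** 2
                         for c, w in zip(centered, model["loading"])))

def classify(sample, models):
    """Assign the sample to the class model with the smallest residual."""
    return min(models, key=lambda name: residual_distance(sample, models[name]))

# Invented toy data: class A varies along the first axis, class B along the second.
training = {
    "A": [[1.0, 0.0, 0.0], [2.0, 0.1, 0.0], [3.0, -0.1, 0.0], [4.0, 0.0, 0.1]],
    "B": [[0.0, 1.0, 5.0], [0.1, 2.0, 5.0], [-0.1, 3.0, 5.0], [0.0, 4.0, 5.0]],
}
models = {name: fit_class_model(rows) for name, rows in training.items()}
```

Because every class has its own independent model, a sample can in principle be accepted by one class, by several, or by none, which is what the recognition and rejection rates reported later measure.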

PLS is traditionally a quantitative analysis method. In this study, we set up rules that allow PLS to be applied to classification. As described in Table 2, samples from different species were assigned different integer values (1, 2, 3, …, n), and a PLS model was constructed on these values. If the predicted value of a sample fell within ±0.5 of one of these numbers, the sample was identified as the corresponding species.


Table 2. Algorithm for classifying samples of different species using PLS.

In this research, the six species were assigned the following values: 1: Tallow, 2: Eucalyptus, 3: Pine, 4: Kenaf, 5: Ramie, 6: Lotus (ordered roughly by cellulose content, from low to high).
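The decision rule can be written in a few lines. The sketch below covers only the step that turns a PLS predicted value into a species and does not fit the PLS model itself. The function name is illustrative, and treating a value exactly 0.5 away from a code as unassigned is an assumption, since the text does not say how boundary values are handled.

```python
SPECIES_CODES = {
    1: "Tallow",
    2: "Eucalyptus",
    3: "Pine",
    4: "Kenaf",
    5: "Ramie",
    6: "Lotus",
}

def species_from_prediction(value, tolerance=0.5):
    """Return the species whose code lies within the tolerance, else None."""
    for code, name in SPECIES_CODES.items():
        # Strict inequality: boundary values are left unassigned (assumption).
        if abs(value - code) < tolerance:
            return name
    return None

examples = [species_from_prediction(v) for v in (2.3, 5.62, 0.2, 6.7)]
```

A predicted value of 2.3 maps to Eucalyptus and 5.62 to Lotus, while 0.2 and 6.7 fall outside every ±0.5 band and stay unidentified.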


Results and Discussion

NIR Spectra of All Samples

The NIR spectra in Figure 1 show that the six species separate clearly into two groups. The wood samples (Eucalyptus, Tallow and Pine) have similar spectra, while Lotus, Kenaf and Ramie share a similar pattern, especially in the wavenumber range 7500–6000 cm−1. This indicates that wood and non-wood samples can be easily separated.


Figure 1. Raw spectra (left) and first derivative spectra (right) of samples of the six species.

SIMCA Classification

An optimized classification model was successfully constructed with the SIMCA method. The model predicted Kenaf, Lotus, Ramie, Pine and Eucalyptus perfectly, with 100% recognition and rejection rates (Table 3). Tallow had a 100% recognition rate but a 94% rejection rate, which means the model may assign some samples of other species to Tallow. The identification results (Table 4) show that most samples, including Tallow, were identified as the correct species. Only one Lotus sample was misidentified. As described in the previous section, Lotus is an aquatic plant and differs from the wood and bast samples; moreover, the number of Lotus samples was small. Only 14 Lotus samples were used for model construction and three for identification, which may explain why the Lotus samples were not all identified correctly. In future work, adding more samples for model construction could help improve accuracy for Lotus.
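The two rates can be computed as sketched below. The definitions follow the reading given in the paragraph above (recognition: a class model accepts its own samples; rejection: it rejects samples of other species) and are an assumption about how the software reports them. The labels and acceptance flags are invented toy data, not results from this study.

```python
def recognition_rate(accepted, labels, species):
    """Percent of the species' own samples that its class model accepts."""
    own = [a for a, label in zip(accepted, labels) if label == species]
    return 100.0 * sum(own) / len(own)

def rejection_rate(accepted, labels, species):
    """Percent of other-species samples that the class model rejects."""
    others = [a for a, label in zip(accepted, labels) if label != species]
    return 100.0 * sum(not a for a in others) / len(others)

# Invented example: does a hypothetical Tallow model accept each sample?
labels = ["Tallow"] * 4 + ["Eucalyptus"] * 4 + ["Pine"] * 2
accepted_by_tallow_model = [True] * 4 + [True, False, False, False] + [False, False]

tallow_recognition = recognition_rate(accepted_by_tallow_model, labels, "Tallow")
tallow_rejection = rejection_rate(accepted_by_tallow_model, labels, "Tallow")
```

In the toy data every Tallow sample is accepted, so recognition is 100%, while one Eucalyptus sample is wrongly accepted, so rejection drops below 100%. This is the same pattern the text reports for Tallow.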


Table 3. Classification performance report using SIMCA method.


Table 4. Identification result of SIMCA model.

PLS Classification

Another classification model was successfully constructed with the PLS method using optimized parameters. The cross validation report (R2 = 98.49%) shows that the species correlate strongly with the numbers assigned in the previous section. The classification results were calculated with the method of Table 2. The results (Table 5 and Figure 2) agreed with those of the SIMCA model: Pine, Kenaf, Ramie and Lotus were classified excellently, while the Tallow and Eucalyptus data overlapped slightly.
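For orientation, the coefficient of determination between the assigned species codes and the cross-validated predictions can be computed as below. This assumes the usual definition, R2 = 1 − SS_res / SS_tot, which may differ in detail from what the modeling software reports. The numbers are invented toy values, not data from this study.

```python
def r_squared(reference, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    return 1.0 - ss_res / ss_tot

# Invented example: assigned codes and hypothetical cross-validated predictions.
reference = [1, 1, 2, 2, 3, 3]
predicted = [1.1, 0.9, 2.2, 1.8, 3.0, 3.1]
r2_percent = 100.0 * r_squared(reference, predicted)
```

A high R2 alone does not guarantee correct classification: what matters for the rule in Table 2 is whether each individual prediction stays within ±0.5 of its own code.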


Table 5. Classification results using PLS (cross validation).


Figure 2. Cross validation results using PLS.


Wavenumber Range Selection for Improving Classification Precision

This section explains how the optimized wavenumber ranges were chosen. Spectral loading plots are generated by the PLS method and show which spectral information contributed most to the model. Figure 3 shows the loading plots of PC1–4. Wavenumbers above 9000 cm−1 contain barely any useful information. The best wavenumber ranges were 7500–4000 cm−1 for PC1 and 7800–4000 cm−1 for PC2, PC3 and PC4. The loading plots of PC2 and PC3 suggest that 9000–7800 cm−1 may also contain helpful information. Based on these results, candidate ranges running from 4000 cm−1 up to either 7500 cm−1 or an upper limit between 7800 and 9000 cm−1 were used to construct the model. The optimized wavenumber ranges were 7500–4000 cm−1 for the SIMCA method and 8500–4000 cm−1 for the PLS method. Figures 4 and 5 support this optimization: all classification and identification performances improved significantly with the optimized wavenumber ranges.
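Restricting a spectrum to one of these ranges is a simple filtering step, sketched below. The wavenumber axis is an assumption: the text gives the range (10,000–4000 cm−1) and the resolution (4 cm−1) but not the data point spacing, so a 4 cm−1 grid is used purely for illustration, with placeholder intensity values.

```python
def select_range(wavenumbers, intensities, low, high):
    """Keep only the points whose wavenumber lies in [low, high] (cm-1)."""
    kept = [(w, y) for w, y in zip(wavenumbers, intensities) if low <= w <= high]
    return [w for w, _ in kept], [y for _, y in kept]

# Assumed axis: 10,000 down to 4000 cm-1 in 4 cm-1 steps.
wavenumbers = list(range(10000, 3999, -4))
spectrum = [0.0] * len(wavenumbers)  # placeholder intensity values

# Optimized ranges reported in the text for each method.
simca_axis, simca_spectrum = select_range(wavenumbers, spectrum, 4000, 7500)
pls_axis, pls_spectrum = select_range(wavenumbers, spectrum, 4000, 8500)
```

On this assumed grid the SIMCA range keeps 876 of the 1501 points and the PLS range keeps 1126, so both models discard the region above 9000 cm−1 that the loadings show to be uninformative.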


Figure 3. Spectra loading plots of PC1–4 using PLS.


Figure 4. Classification results using different wavenumber ranges for SIMCA (left) and PLS (right) model.


Figure 5. Identification results using different wavenumber range for SIMCA model.

Relationship between Species on Classification

The Eucalyptus and Tallow samples were not perfectly classified in the results above. This section explains why and how to separate them better.

Table 6 gives the inter material distance (IMD) between species from the SIMCA method. The IMD reflects the relationship between species: the more closely related two species are, the smaller the IMD, and the more they differ, the larger the IMD. The IMDs between the wood species (Eucalyptus, Tallow and Pine) and the bast species (Kenaf and Ramie) are all higher than 10, which means wood and bast species can be separated effortlessly. The IMDs between Lotus and the bast species and between Lotus and the wood species are 6–10, implying that Lotus samples can be easily separated from the other species. The IMD between the bast fibers (Kenaf and Ramie) is 4.69, which is lower than 6. Within the wood species the IMDs are also all lower than 6: 5.29 between Eucalyptus and Pine, 3.8 between Tallow and Pine, and 2.61 between Tallow and Eucalyptus. The last value is the lowest of all, which explains why the Eucalyptus and Tallow samples overlap slightly during classification.
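The IMD values quoted in the paragraph above can be ranked to make the argument explicit. The sketch uses only the four pairwise values stated in the text (the other entries of Table 6 are given only as ranges), and the three labels returned by `separability` are an informal paraphrase of the thresholds in the text, not terminology from the software.

```python
# Pairwise IMD values stated in the text.
reported_imd = {
    ("Kenaf", "Ramie"): 4.69,
    ("Eucalyptus", "Pine"): 5.29,
    ("Tallow", "Pine"): 3.8,
    ("Tallow", "Eucalyptus"): 2.61,
}

def separability(imd):
    """Informal reading of the thresholds used in the text."""
    if imd > 10:
        return "separated effortlessly"
    if imd >= 6:
        return "easily separated"
    return "close"

# Sort pairs from most similar (smallest IMD) to least similar.
ranked = sorted(reported_imd.items(), key=lambda item: item[1])
closest_pair, closest_imd = ranked[0]
labels = {pair: separability(imd) for pair, imd in reported_imd.items()}
```

All four within-group pairs fall in the "close" band, and Tallow and Eucalyptus come out as the most similar pair, consistent with the slight overlap seen in both models.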


Table 6. Inter material distance of SIMCA model.

Figure 6 gives the score values of all samples for PC1–4 from the PLS method. The score values show clearly how close the species are and indicate which PCs to choose to classify the species better. PC1 separates only the wood samples (Eucalyptus, Tallow, and Pine) from the non-wood samples (Kenaf, Ramie and Lotus). PC2 separates the Pine samples from Eucalyptus and Tallow, and also separates Kenaf, Ramie and Lotus well. Eucalyptus and Tallow start to separate on PC3 and are well separated on PC4, although the other samples are mixed again on PC4. On PC5 (data not shown), all the samples are mixed. These data demonstrate that combining PC1–4 is best for classifying all the samples.


Figure 6. Score values of PC1–4 using PLS.


Conclusion

The spectra of samples from six species (Tallow, Eucalyptus, Pine, Ramie, Kenaf and Lotus) were collected and analyzed using NIR classification software (SIMCA). A new algorithm was also created to classify the six species with a quantitative analysis method (PLS). The six species were successfully classified by both SIMCA and PLS, and the two methods gave similar results. The identification rate and rejection rate for all samples were above 94%. Spectral loadings, inter material distance and scores plots were helpful for constructing the model.

In future work, as more species are added to the model, the NIR model could identify most of the plant fibrous species frequently used in industry. Combined with a quantitative analysis method for each species, this would allow a widely applicable, high precision rapid prediction system to be established.

Author Contributions

GH and BV developed the research hypothesis and the experiment design. WJ, TS, and ZF performed sample preparation, spectra collection and SIMCA analysis. WJ and CZ performed the PLS analysis and drafted the manuscript. SL revised the English and the discussion. The final manuscript is the end product of joint writing efforts of all authors.


Funding

This work was supported by the Award Funds for Outstanding Middle-Aged and Young Scientists of the Shandong Province (BS2014CL044), Taishan Scholars Construction Engineering of Shandong Province, and the Program for Scientific Research Innovation Team in the Colleges and Universities of the Shandong Province.

Conflict of Interest Statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.


References

Bachle, H., Zimmer, B., and Wegener, G. (2012). Classification of thermally modified wood by FT-NIR spectroscopy and SIMCA. Wood Sci. Technol. 46, 1181–1192. doi: 10.1007/s00226-012-0481-z

Barbin, D., Elmasry, G., Sun, D.-W., and Allen, P. (2012). Near-infrared hyperspectral imaging for grading and classification of pork. Meat Sci. 90, 259–268. doi: 10.1016/j.meatsci.2011.07.011

Chen, L., Wang, J., Ye, Z., Zhao, J., Xue, X., Vander Heyden, Y., et al. (2012). Classification of Chinese honeys according to their floral origin by near infrared spectroscopy. Food Chem. 135, 338–342. doi: 10.1016/j.foodchem.2012.02.156

Cheng, J. (2009). Biomass to Renewable Energy Processes. Boca Raton, FL: CRC Press.

Costa, S. M., Mazzola, P. G., Silva, J. C. A. R., Pahl, R., Pessoa, A. Jr., and Costa, S. A. (2013). Use of sugar cane straw as a source of cellulose for textile fiber production. Ind. Crop. Prod. 42, 189–194. doi: 10.1016/j.indcrop.2012.05.028

Cozzolino, D., Fassio, A., Fernández, E., Restaino, E., and Manna, A. L. (2006). Measurement of chemical composition in wet whole maize silage by visible and near infrared reflectance spectroscopy. Anim. Feed Sci. Technol. 129, 329–336. doi: 10.1016/j.anifeedsci.2006.01.025

Gemperline, P. J., Webber, L. D., and Cox, F. O. (1989). Raw materials testing using soft independent modeling of class analogy analysis of near-infrared reflectance spectra. Anal. Chem. 61, 138–144. doi: 10.1021/ac00177a012

Guazzotti, S. A., Suess, D. T., Coffee, K. R., Quinn, P. K., Bates, T. S., Wisthaler, A., et al. (2003). Characterization of carbonaceous aerosols outflow from India and Arabia: biomass/biofuel burning and fossil fuel combustion. J. Geophys. Res. Atmosphere. 108:4485. doi: 10.1029/2002JD003277

Hubbell, C. A., and Ragauskas, A. J. (2010). Effect of acid-chlorite delignification on cellulose degree of polymerization. Bioresour. Technol. 101, 7410–7415. doi: 10.1016/j.biortech.2010.04.029

Jiang, W., Han, G., Via, B., Tu, M., Liu, W., and Fasina, O. (2014). Rapid assessment of coniferous biomass lignin–carbohydrates with near-infrared spectroscopy. Wood Sci. Technol. 48, 109–122. doi: 10.1007/s00226-013-0590-3

Jiang, W., Han, G., Zhang, Y., and Wang, M. (2010). Fast compositional analysis of ramie using near-infrared spectroscopy. Carbohydr. Polym. 81, 937–941. doi: 10.1016/j.carbpol.2010.04.009

Jin, S., and Chen, H. (2007). Near-infrared analysis of the chemical composition of rice straw. Ind. Crop. Prod. 26, 207–211. doi: 10.1016/j.indcrop.2007.03.004

Kelley, S. S., Rowell, M., Davis, M., Jurich, K., and Ibach, R. (2004). Rapid analysis of the chemical composition of agricultural fibers using near infrared spectroscopy and pyrolysis molecular beam mass spectrometry. Biomass Bioenergy 27, 77–78. doi: 10.1016/j.biombioe.2003.11.005

Messing, R. A., and Oppermann, R. A. (1979). High Surface Low Volume Biomass Composite. Washington, DC: Google Patents.

Muangrat, R., Onwudili, J. A., and Williams, P. T. (2010). Alkali-promoted hydrothermal gasification of biomass food processing waste: a parametric study. Int. J. Hydrogen Energy 35, 7405–7415. doi: 10.1016/j.ijhydene.2010.04.179

Ono, K., Hiraide, M., and Amari, M. (2003). Determination of lignin, holocellulose, and organic solvent extractives in fresh leaf, litterfall, and organic material on forest floor using near-infrared reflectance spectroscopy. J. For. Res. 8, 191–198. doi: 10.1007/s10310-003-0026-2

Pomin, V. H., and Mourão, P. A. (2008). Structure, biology, evolution, and medical importance of sulfated fucans and galactans. Glycobiology 18, 1016–1027. doi: 10.1093/glycob/cwn085

Xu, F., Zhou, L., Zhang, K., Yu, J., and Wang, D. (2015). Rapid determination of both structural polysaccharides and soluble sugars in sorghum biomass using near-infrared spectroscopy. BioEnergy Res. 8, 130–136. doi: 10.1007/s12155-014-9511-z

Yao, S., Wu, G., Xing, M., Zhou, S., and Pu, J. (2010). Determination of lignin content in Acacia spp. using near-infrared reflectance spectroscopy. BioResources 5, 556–562.

Yeh, T.-F., Chang, H.-m., and Kadla, J. F. (2004). Rapid prediction of solid wood lignin content using transmittance near-infrared spectroscopy. J. Agric. Food Chem. 52, 1435–1439. doi: 10.1021/jf034874r

Zhang, L.-G., Zhang, X., Ni, L.-J., Xue, Z.-B., Gu, X., and Huang, S.-X. (2014). Rapid identification of adulterated cow milk by non-linear pattern recognition methods based on near infrared spectroscopy. Food Chem. 145, 342–348. doi: 10.1016/j.foodchem.2013.08.064

Zheng, L., Du, Y., and Zhang, J. (2001). Degumming of ramie fibers by alkalophilic bacteria and their polysaccharide-degrading enzymes. Bioresour. Technol. 78, 89–94. doi: 10.1016/S0960-8524(00)00154-1

Zhou, C., Jiang, W., Via, B. K., Fasina, O., and Han, G. (2015). Prediction of mixed hardwood lignin and carbohydrate content using ATR-FTIR and FT-NIR. Carbohydr. Polym. 121, 336–341. doi: 10.1016/j.carbpol.2014.11.062

Keywords: accurate, classification, fibrous material, identification, quantitative analysis, near infrared

Citation: Jiang W, Zhou C, Han G, Via B, Swain T, Fan Z and Liu S (2017) Classification and Identification of Plant Fibrous Material with Different Species Using near Infrared Technique—A New Way to Approach Determining Biomass Properties Accurately within Different Species. Front. Plant Sci. 7:2000. doi: 10.3389/fpls.2016.02000

Received: 16 November 2016; Accepted: 16 December 2016;
Published: 05 January 2017.

Edited by:

Aude Tixier, UC Davis, USA

Reviewed by:

Robert Henry, University of Queensland, Australia
Chenhuan Lai, Nanjing Forestry University, China

Copyright © 2017 Jiang, Zhou, Han, Via, Swain, Fan and Liu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Guangting Han; Brian Via