Hyperspectral remote sensing for tobacco quality estimation, yield prediction, and stress detection: A review of applications and methods

Tobacco is an important economic crop and the main raw material of cigarette products. With increasing consumer demand for high-quality cigarettes, the requirements for their main raw materials are also changing. In general, tobacco quality is primarily determined by exterior quality, inherent quality, chemical compositions, and physical properties. All of these are formed during the growing season and are vulnerable to many environmental factors, such as climate, geography, irrigation, fertilization, diseases, and pests. Therefore, there is great demand for tobacco growth monitoring and near real-time quality evaluation. Hyperspectral remote sensing (HRS) is increasingly considered a cost-effective alternative to traditional destructive field sampling and laboratory trials for determining various agronomic parameters of tobacco, with the assistance of diverse hyperspectral vegetation indices and machine learning algorithms. In light of this, we conduct a comprehensive review of HRS applications in tobacco production management. We briefly sketch the principles of HRS and the commonly used data acquisition platforms. We then detail the specific applications and methodologies for tobacco quality estimation, yield prediction, and stress detection. Finally, we discuss the major challenges and future opportunities. We hope that this review can provide interested researchers, practitioners, and readers with a basic understanding of current HRS applications in tobacco production management, and offer some guidelines for practical work.


Introduction
As the primary raw material for a variety of cigarette products, tobacco is one of the most important economic crops, both in China and around the world. China grows nearly one-third of the world's tobacco crop (Hu et al., 2010). The relevant industries provide governments with substantial fiscal revenue. These enormous economic benefits are inextricably linked to the meticulous field management of countless practitioners. However, in recent years, the tobacco industry has begun to face bottlenecks in development. On the one hand, with the gradual increase in awareness of tobacco risks, people's attitudes toward tobacco consumption have changed. Tobacco products are no longer seen as ordinary commodities, but as harmful ones, and a number of consumers are seeking high-quality, less harmful products. On the other hand, as shown in Figure 1, the number of tobacco farmers declined from 1.84 million to 0.92 million between 2013 and 2019, a reduction of nearly 50%, leading to unstable yields and unsustainable development (Jia Kang, 2020). The main reasons are that tobacco planting is labor-intensive and requires the purchase of roasting facilities, so the cost-benefit ratio of tobacco is lower than that of other crops (e.g., soybean, corn, and peanuts). Moreover, current tobacco field management still largely relies on the experience of tobacco farmers. Take fertilization as an example: when to fertilize, which areas to fertilize, and how much to fertilize are all determined by farmers' observation. The advantage of this empirical method is that it is simple and fast, because it requires no instruments. However, it requires farmers or practitioners to make a rapid and accurate diagnosis of tobacco growth and quality, which is not easy in practice. This qualitative approach neither reduces field management costs nor supports an accurate assessment of tobacco growth status and quality.
Besides the empirical method, laboratory testing is another commonly used method of tobacco quality diagnosis (Deng et al., 2020). It can quantitatively measure various agronomic parameters, such as leaf nitrogen, chlorophyll, water, and nicotine content, and can accurately obtain information on the various components of tobacco leaves. However, it is time-consuming and costly and requires considerable professional knowledge, so it is rarely used in actual production (Deng et al., 2020).
In light of this, the concept of "precision tobacco agriculture (PTA)" emerged. It derives from the concept of precision agriculture (PA) or site-specific management (SSM) (Bachmaier and Gandorfer, 2009; Carrow et al., 2010). In our view, PTA is a cost-effective way to address the above bottlenecks and to achieve sustainable development (Chang et al., 2014), and its specific applications should include growth and quality estimation, yield prediction, and stress detection (e.g., diseases, pests, or heavy metals). In China, the first work on tobacco management zones was carried out in 2009. The results showed that soil nutrients were similar within the management zones, which provided an information basis for SSM in tobacco fields (Xin-Zhong et al., 2009). Field information acquisition is therefore the first step in PTA applications. However, traditional destructive field sampling and laboratory measurement are generally labor-intensive and time-consuming. Therefore, there is great demand for a method that can accurately and quickly obtain field information on tobacco growth and quality during the growing season.
Fortunately, HRS technology, with its contactless observation, high spectral resolution, and flexibility, is gradually becoming recognized as a suitable alternative to traditional field sampling methods for obtaining crop information (Park and Lu, 2015; Ma et al., 2022). In agriculture, the most important ability of HRS is that it can obtain sufficient hyperspectral reflectance data of crops non-destructively. With the assistance of various regression modeling algorithms, the relationship between reflectance data and crop agronomic traits (e.g., leaf nitrogen, chlorophyll, and water content) can then be inferred quantitatively (Weiss et al., 2020; Jiang et al., 2022). This process is known as "spectral inversion". Furthermore, with the development of UAV platforms and lightweight hyperspectral imaging sensors, inversion over large-scale or scattered farmland is becoming easier and faster (Johansen et al., 2019; Liao et al., 2020), and UAV-borne HRS has shown promising application prospects (Aasen et al., 2015; Zhong et al., 2018). HRS for PTA applications has also improved considerably (Long et al., 2019; Zhu et al., 2020). According to our survey, studies on tobacco are increasing yearly. Figure 2 shows the number of related publications from 2010 to 2022. The data are from the Web of Science (https://www.webofscience.com) website, retrieved with the topics "tobacco" and "hyperspectral". The number of publications on tobacco has increased gradually. However, a large portion of them are patents, and research articles are few. This is one of the reasons why we drafted this review. We hope that our collection and summary give interested researchers some insight into the latest advances in HRS research for PTA applications.
In this review, we comprehensively retrieved the research on HRS in PTA to give readers and researchers a clearer picture of the field. The cited references mainly come from the Web of Science, IEEE Xplore (https://ieeexplore.ieee.org), and Google Scholar (https://scholar.google.com) websites. A few are from the CNKI (https://www.cnki.net) website. The literature types include academic journals, international conferences, professional books, and dissertations. The retrieval keywords consist of "hyperspectral remote sensing", "agricultural remote sensing", "tobacco remote sensing", "UAV hyperspectral & tobacco", and combinations of them. To keep the references timely, we favor literature with relatively recent publication dates. Thus, the references cited in this review were mainly published from 2010 to 2022. Furthermore, according to our retrieval results, the existing reviews related to HRS and agricultural applications cover various aspects: UAV-borne HRS (Xiang et al., 2019), hyperspectral imaging technologies (Adão et al., 2017; Mahlein et al., 2018), precision agricultural applications (Bégué et al., 2018; Latif, 2018), leaf area index (LAI) (Ke et al., 2016), crop yield prediction and nitrogen status assessment (Chlingaryan et al., 2018; Fu et al., 2021), wheat grain protein (Ma et al., 2022), etc. The rest of this article is organized as follows: in section 2, we introduce the principles and workflow of HRS applications for PTA; in section 3, we compare three commonly used hyperspectral data acquisition platforms; the details of specific applications and methodologies are presented in section 4; issues and recommendations are discussed in section 5; the conclusion is in section 6. We hope that this presentation gives new readers and researchers a holistic view.
Principles and workflow of HRS for PTA
Minerals on earth usually have unique diagnostic spectral reflectance signatures (Vane and Goetz, 1993). Green plants and plant ecosystems are composed of a common set of compounds, which also have numerous unique diagnostic absorption features in the solar reflected spectrum from 400 to 2500 nm. To illustrate this, Figure 3 shows a typical reflectance curve of tobacco leaves containing several absorption and reflection features (400-1000 nm) caused by various biological parameters such as chlorophyll, water, and protein. This characteristic allows us to determine the physical, chemical, and biological compositions of plants with the help of remote sensing technologies, which are built on spectral radiometry theory (Borengasser et al., 2007).
As a frontier technology in remote sensing, hyperspectral imaging can obtain sufficient spectral information on ground objects from each pixel in an image of a scene. Hyperspectral reflectance data have also been shown to be more efficient in crop phenotypic trait estimation (Angel and McCabe, 2022), as well as in target classification and precision agriculture (Teke et al., 2013; Zhao et al., 2018). HRS differs from traditional multispectral remote sensing (MRS) in two main respects. First, HRS imaging sensors can obtain image data in several hundred narrow and contiguous spectral bands, while MRS sensors can only measure image data in a few wide and discrete spectral bands. In the example in Figure 4, which spans 400 to 1000 nm, the MRS data have four discrete bands (usually red, green, blue, and near-infrared), whereas the HRS data have 100 contiguous bands. Second, HRS data can be used to extract the spectral features of most natural materials, which MRS data cannot do. Because HRS images contain much more spectral information than MRS images, HRS has greater potential for detecting differences among materials on the earth's surface (Pu, 2017). Figure 5 shows the general workflow of HRS applications for tobacco quality estimation, yield prediction, and disaster level assessment. The first step is UAV-borne hyperspectral image data acquisition. A complete image is then obtained by stitching and aligning the original images. Radiometric correction and geometric calibration are also necessary to reduce noise interference and to improve reflectance precision and radiometric accuracy. These operations convert the original images into hyperspectral reflectance data for the whole tobacco field. Radiometric correction is essential for correcting systematic error and radiation distortion (Watts et al., 2012).
To account for absorption by atmospheric gases and aerosols during image collection, methods based on radiative transfer theory have been widely used for radiometric correction, such as MODTRAN (Berk et al., 2014), 6S (Hu et al., 2013), and FLAASH (Vibhute et al., 2015). Moreover, owing to the effects of hyperspectral sensors, platforms, and terrain during data acquisition, the generated image pixels are squeezed, stretched, distorted, and offset with respect to the actual position of the planting areas. Thus, geometric correction is necessary too. In practice, both radiometric and geometric corrections are well-established techniques that can be carried out directly in professional software (e.g., ENVI, ERDAS, and IDRISI).
The resulting hyperspectral image contains sufficient spectral information on tobacco leaves in each pixel. However, not all spectral bands are sensitive to the observed indicators, and the sensitivity varies from band to band. In addition, hyperspectral data have high dimensionality, and adjacent bands are highly similar. It is therefore necessary to perform dimensionality reduction and denoising. To select the most sensitive spectral bands, various data transformation or feature extraction algorithms are applied, such as principal component analysis (PCA), the successive projection algorithm (SPA), the elastic net (EN) algorithm, and fuzzy clustering (Koonsanit et al., 2012; Cohen et al., 2013; Liu and Li, 2017; Zhang et al., 2017). The details of these commonly used dimensionality reduction methods are described in the review by Sun and Du (2019).
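As a minimal illustration of screening bands for sensitivity, the sketch below ranks bands by the absolute Pearson correlation between band reflectance and a measured trait. This simple correlation ranking is a stand-in chosen for brevity; it is not PCA, SPA, or the elastic net. The function names and the toy data are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant band carries no information about the trait
    return sxy / sqrt(sxx * syy)

def rank_bands(spectra, trait, wavelengths, top_k=3):
    """Return the top_k (wavelength, r) pairs, sorted by |r| with the trait.

    spectra: list of per-sample reflectance lists, one value per band.
    """
    scores = []
    for j, wl in enumerate(wavelengths):
        band = [sample[j] for sample in spectra]
        scores.append((wl, pearson(band, trait)))
    scores.sort(key=lambda item: abs(item[1]), reverse=True)
    return scores[:top_k]

# Hypothetical toy data: 4 samples x 4 bands (nm) and a measured trait.
wavelengths = [450, 550, 670, 800]
spectra = [
    [0.05, 0.12, 0.08, 0.40],
    [0.06, 0.11, 0.06, 0.45],
    [0.05, 0.13, 0.05, 0.50],
    [0.06, 0.10, 0.03, 0.55],
]
trait = [1.0, 1.5, 2.0, 2.5]
best = rank_bands(spectra, trait, wavelengths, top_k=2)
```

With real data, a ranking of this kind only identifies individually correlated bands; it does not remove the redundancy between adjacent bands that the projection-based methods address.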
Figure 4. Comparison of MRS (left) and HRS (right) data; x and y indicate the spatial domain, and λ indicates the spectral domain.
Figure 5. The general workflow of UAV-borne HRS for PTA applications.
Zhang et al. 10.3389/fpls.2023.1073346 Frontiers in Plant Science frontiersin.org
After hyperspectral dimensionality reduction, spectral inversion is conducted to extract information from hyperspectral images for various data mining tasks. Here, inversion modeling plays an important role in quantitative estimation: it bridges the gap between hyperspectral reflectance data and the agronomic traits of tobacco. In general, most inversion methods can be divided into empirical models, mechanistic approaches, or a combination of the two (Weiss et al., 2020). Empirical models mainly rely on data collection and statistics and are also known as "regressions"; examples include partial least squares regression (PLSR) (Dong et al., 2015), support vector machine (SVM) (Mountrakis et al., 2011), random forest (RF) (Johansen et al., 2020), and neural networks (Yuan et al., 2017). Their main task is to fit the numerical relationship between the agronomic traits measured in practice and the spectral features. Mechanistic methods are mainly based on assumptions and modeling; examples include the radiative transfer model (RTM) (Erten et al., 2016), physically-based models (Verrelst et al., 2019), SVM (Rivera et al., 2015), and neural networks (Ermida et al., 2017). Furthermore, deep learning methods based on convolutional neural networks (CNN) are very effective for extracting agronomic features from HRS images (Kattenborn et al., 2021). Existing studies demonstrate that CNN can be applied to various specific problems, such as tree species classification (Sothe et al., 2020), palm tree detection (Freudenberg et al., 2019), and mapping plant communities (Wagner et al., 2019). CNN provides researchers with a more effective method of HRS image analysis, and numerous works have demonstrated that it outperforms shallow machine learning methods.
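The idea behind an empirical inversion model can be sketched with the simplest possible regression: an ordinary least-squares line relating one spectral feature to one measured trait. This is only an illustration under stated assumptions; the studies reviewed here use richer models such as PLSR, SVM, RF, or neural networks, and the function names and numbers below are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("the spectral feature must vary across samples")
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    return slope, intercept

def predict(model, x_new):
    """Apply a fitted (slope, intercept) model to a new feature value."""
    slope, intercept = model
    return slope * x_new + intercept

# Hypothetical calibration data: a vegetation index value per sample and
# the trait measured in the laboratory for the same sample.
index_values = [0.2, 0.4, 0.6, 0.8]
trait_values = [1.1, 1.9, 3.1, 3.9]

model = fit_line(index_values, trait_values)
slope, intercept = model
estimate = predict(model, 0.5)  # trait estimate for an unseen index value
```

The calibration step (fitting on field samples with laboratory measurements) and the inversion step (applying the model to new spectra) are kept separate here because that separation is what allows a model fitted on a few sampled plants to be applied to every pixel of an image.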

Available HRS data acquisition systems
Data are the most important part of HRS for the analysis of tobacco agronomic traits. In general, two kinds of data are necessary. One is the hyperspectral reflectance data of tobacco fields. The other is the corresponding agronomic parameter data. The former can be obtained by various hyperspectral data acquisition systems (e.g., handheld spectrometer, UAV-borne, and satellite). The latter are measured by specific instruments and laboratory analysis (e.g., LAI-2500, SPAD-502Plus, and AutoAnalyzer 3), which may take some time to produce results. In this section, we briefly compare the handheld spectrometer, UAV-borne, and satellite systems. An intuitive comparison can be found in Table 1.
The first is handheld spectrometers (e.g., ASD FieldSpec 4, Specim-IQ). They have high resolution and signal-to-noise ratio, good intensity and wavelength accuracy, strong resistance to external interference, and excellent instrument stability. They also come with a collection of calculation tools and can perform some complex calculations, such as derivation and deconvolution. Because they are small, lightweight, and easy to carry, field experiments can be carried out directly and the measurement results displayed within seconds, which greatly improves efficiency. Thus, they are widely adopted for monitoring crop agronomic traits (Jia et al., 2013b; Liang et al., 2018; Cao et al., 2021). Their shortcoming is that data collection takes a lot of time because of the small coverage, especially for large planting areas.
The second is UAV-borne HRS platforms (e.g., V185G, GaiaSky-mini3-VN). UAV platforms are more flexible, especially in terms of revisit frequency. They can perform observation tasks in a specific area at any time, as long as the meteorological conditions are favorable (e.g., low wind speed and a clear, cloudless sky), which increases the efficiency of hyperspectral image acquisition. UAV platforms make it possible to observe and analyze tobacco plants quickly at the canopy level (Inoue et al., 2012; Zhu et al., 2020; Liu et al., 2021). In addition, with improvements in load capacity and battery endurance, they also perform well in large-scale regional observation tasks. Compared to handheld spectrometers, UAV platforms save a lot of manual work and time; compared to satellite platforms, they are relatively accurate and convenient observation tools. The working height of UAV-borne HRS is usually 100 m, so the spatial resolution of UAV imagery is higher than that of satellite imagery but lower than that of handheld measurements. UAVs also have limitations of their own, such as flight duration, flight stability, and maximum load, all of which still need to be improved.
The third is satellite-based hyperspectral observation platforms (e.g., GF-5, EO-1 Hyperion, and MODIS). All of them have a greater swath width and larger spatial coverage, which makes them perform well in large-area observation tasks (Chaurasia et al., 2006; Wang et al., 2021). However, their spatial and temporal resolutions are relatively low. The working height is usually several hundred kilometers, and the revisit cycle often takes a few days. Because of the huge launch and maintenance costs, most satellite platforms are supported by governments or large business organizations (e.g., CNSA, NASA, and SpaceX). Nevertheless, the public can access some satellite data for free or for a fee (e.g., Landsat, Sentinel, and Gaofen). It should be noted that the quality of satellite hyperspectral images is highly susceptible to environmental factors such as cloud cover, rainy weather, and clutter reflections (Mulla, 2013). It may therefore be difficult for the public to collect high-quality satellite hyperspectral images of a specific area at a specific time.

Applications and methods
In this section, the specific studies are introduced from three aspects: quality estimation, yield prediction, and stress detection.

Quality estimation
Tobacco quality is a holistic and dynamic concept: what counts as high-quality tobacco evolves with time, geography, and consumers' desires. In general, tobacco quality mainly includes four aspects: exterior quality, inherent quality, chemical compositions, and physical properties. The existing research on tobacco quality estimation is scattered, and the studies mainly focus on chemical compositions and exterior quality, rarely involving inherent quality and physical properties, which are not the forte of HRS technology. However, the inherent quality can be inferred from the chemical compositions (Shen et al., 2017).

Chemical compositions

Nitrogen
Nitrogen is the most important nutrient for tobacco growth. Both over-application and under-application of nitrogen fertilizer limit tobacco productivity and have a negative impact on quality. Tobacco plants absorb the most nitrogen 40 days after transplanting. An excessive supply of nitrogen fertilizer makes the leaves larger than normal and delays tobacco maturity. Insufficient nitrogen also delays ripening, turns the leaves brown, and lowers quality (Li, 2006). Moreover, low leaf nitrogen content (LNC) makes tobacco taste bland, and high LNC leads to a pungent smell (Shen et al., 2017). Thus, an accurate estimation of nitrogen status is essential to determine the final quality and total yield, improve the use efficiency of nitrogen fertilizer, and reduce environmental pollution. Jia et al. (2013b) extracted the central bands that are sensitive to tobacco LNC based on the coefficient of determination (R²) of a linear regression model that used the specific ratio vegetation index (SR) and the normalized difference vegetation index (NDVI) as independent variables. The optimum band combination was R_590/R_1980 for SR and (R_1970 - R_650)/(R_1970 + R_650) for NDVI. They selected the 20 SR and 20 NDVI band combinations with the highest R² as the independent variables of stepwise multiple linear regression (SMLR) and error back propagation neural network (BPNN) models to invert tobacco LNC. The experimental results showed that the BPNN model achieved the best performance, with an R² of 0.91 and a root mean square error (RMSE) of 0.09. The R² and RMSE of the SMLR model were 0.86 and 0.60, respectively. Liang et al. (2014) investigated the relationship between the spectral features of tobacco cultivars and their nitrogen use. A 15N tracer pot experiment was conducted with four tobacco cultivars differing in nitrogen use efficiency. The authors configured two nitrogen levels, N1 (1.0 g/pot) and N2 (3.0 g/pot), and used three VIs (i.e., ratio vegetation index (RVI), difference vegetation index (DVI), and NDVI) to evaluate the nitrogen use efficiency.
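The band-pair screening described above for SR and NDVI can be sketched as an exhaustive search over band combinations, scoring each by the R² of a simple linear regression against the measured trait. This is a hedged illustration, not the cited authors' code; the function names, the three-band toy spectra, and the LNC values are hypothetical.

```python
from itertools import permutations

def sr(r_a, r_b):
    """Simple ratio index of two band reflectances."""
    return r_a / r_b

def ndvi(r_a, r_b):
    """Normalized difference index of two band reflectances."""
    return (r_a - r_b) / (r_a + r_b)

def r_squared(x, y):
    """R² of a simple linear regression of y on x (squared Pearson r)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)

def best_band_pair(spectra, trait, index_fn):
    """Return (band_a, band_b, R²) for the best-scoring ordered band pair.

    spectra: list of dicts mapping wavelength (nm) to reflectance.
    """
    bands = sorted(spectra[0])
    best = (None, None, -1.0)
    for band_a, band_b in permutations(bands, 2):
        values = [index_fn(s[band_a], s[band_b]) for s in spectra]
        score = r_squared(values, trait)
        if score > best[2]:
            best = (band_a, band_b, score)
    return best

# Hypothetical toy data: four leaf samples, three bands, measured LNC.
spectra = [
    {550: 0.12, 670: 0.08, 800: 0.40},
    {550: 0.11, 670: 0.06, 800: 0.45},
    {550: 0.13, 670: 0.05, 800: 0.50},
    {550: 0.10, 670: 0.03, 800: 0.55},
]
lnc = [2.0, 2.6, 3.0, 3.9]
best_ndvi = best_band_pair(spectra, lnc, ndvi)
best_sr = best_band_pair(spectra, lnc, sr)
```

With several hundred bands the search covers tens of thousands of pairs, which is why the best-scoring combinations are usually kept as candidate inputs for a second-stage model rather than used directly.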

Phosphorus
Phosphorus is an essential mineral element required for tobacco photosynthesis and respiration. Li et al. (2014) built a visual reporting system to monitor dynamic changes in phosphorus concentration by expressing a purple gene extracted from cauliflower. The authors selected wild-type and transgenic tobacco plants as the experimental subjects and studied the correlation between leaf phosphorus concentration and hyperspectral reflectance at 554 nm. The results showed that the R² was 0.96 for transgenic tobacco leaves but only 0.45 for wild-type leaves.

Potassium
Potassium is also an essential mineral element that can increase the intensity of photosynthesis. In general, the higher the potassium content of tobacco leaves, the higher the yield and quality. Li (2006) studied the quantitative relationship between leaf potassium concentration and 19 spectral parameters of tobacco, using exponential fitting as the modeling method. According to the fitting results, three spectral parameters performed best: the pigment-specific simple ratio (PSSRa), the optimized soil adjusted vegetation index (OSAVI), and NDVI (670, 780 nm), with R² values of 0.929, 0.928, and 0.927, respectively. Junying et al. (2020a) proposed a method to predict tobacco K₂O content based on UAV-borne hyperspectral imaging. The model expressed Y, the predicted tobacco K₂O content, as a function of R_498.6, the first derivative of the logarithm of the original reflectance at 498.6 nm. On the test set, the RMSE of this model was 0.40, and the absolute value of the mean relative error was 8.04%.
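The spectral feature used in this model, the first derivative of the logarithm of reflectance, can be approximated from sampled spectra by finite differences. The sketch below is an assumption-laden illustration: the source does not state the logarithm base or the differencing scheme, so natural logarithms and central differences are assumed, and the function name and toy spectrum are hypothetical.

```python
from math import exp, log

def log_first_derivative(wavelengths, reflectance):
    """Finite-difference derivative of ln(reflectance) w.r.t. wavelength.

    Central differences are used in the interior and one-sided
    differences at the two ends (an assumed scheme).
    """
    n = len(reflectance)
    if n < 2 or n != len(wavelengths):
        raise ValueError("need at least two matching samples")
    logs = [log(r) for r in reflectance]  # reflectance must be > 0
    derivative = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        derivative.append(
            (logs[hi] - logs[lo]) / (wavelengths[hi] - wavelengths[lo])
        )
    return derivative

# Hypothetical spectrum whose log-reflectance rises by 0.01 per nm,
# so the expected derivative is 0.01 at every band.
wavelengths = [490.0, 495.0, 500.0, 505.0]
reflectance = [0.001 * exp(0.01 * w) for w in wavelengths]
deriv = log_first_derivative(wavelengths, reflectance)
```

In practice the band nearest the wavelength of interest (here 498.6 nm) would be looked up in the sensor's band list and its derivative value passed to the fitted model.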

Chlorophyll
Chlorophyll is an important indicator of plant growth, reflecting photosynthetic rate, nutritional status, and maturity (Peng and Gitelson, 2012). Especially for tobacco, a broad-leaf crop whose leaves are harvested, leaf chlorophyll content plays an important role in growth and quality. Guo et al. (2019) investigated the relationship between leaf chlorophyll content (LCC) and various tobacco canopy hyperspectral parameters, including 9 parameters based on red edge position, 3 parameters based on red edge area, and 6 parameters based on VIs. Among them, the 7 parameters with a high significance level were taken as the independent variables of six regression functions (i.e., linear, exponential, parabolic, power, logarithmic, and cubic regression models) to build inversion models. Thus, there were 42 inversion results in total. The combination of (SD_r - SD_y)/(SD_r + SD_y) and linear regression obtained the best performance, with R² = 0.948, RMSE = 0.127 mg/g, and relative error (RE) = 9.31%. Another study conducted a spectral inversion of tobacco chlorophyll-a content under different light qualities. The leaf spectral reflectance data were collected by an ASD field spectrometer, and linear regression and BPNN models were applied to predict leaf chlorophyll-a content. The results demonstrated that BPNN had the most reliable performance, with R² = 0.86 and RMSE = 0.05. A similar study can be found in Dongyun et al. (2015), in which roughly the same parameters were used to estimate the LCC of tobacco leaves infected by the mosaic virus. The best correlation was achieved for the combination of (SD_r - SD_y)/(SD_r + SD_y) and SD_r/SD_b under the stepwise regression model (R² = 0.885).
Table 1 note: The above parameters are referenced to ASD FieldSpec 4, DJI M600-borne GaiaSky-mini3-VN, and MODIS, respectively.
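The edge-area parameter (SD_r - SD_y)/(SD_r + SD_y) that recurs in these studies can be sketched as follows. The definitions are assumptions, since the text does not give them: SD_r and SD_y are taken here as the sums of the first-derivative spectrum over the red edge (680-760 nm) and the yellow edge (560-640 nm), a convention common in the literature. The function names and the toy spectrum are hypothetical.

```python
def first_derivative(wavelengths, reflectance):
    """Finite-difference first derivative of reflectance w.r.t. wavelength."""
    n = len(reflectance)
    derivative = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        derivative.append(
            (reflectance[hi] - reflectance[lo])
            / (wavelengths[hi] - wavelengths[lo])
        )
    return derivative

def edge_area(wavelengths, derivative, start_nm, end_nm):
    """Sum of derivative values for bands inside [start_nm, end_nm]."""
    return sum(
        d for wl, d in zip(wavelengths, derivative) if start_nm <= wl <= end_nm
    )

def normalized_edge_area(wavelengths, reflectance,
                         red=(680, 760), yellow=(560, 640)):
    """(SD_r - SD_y) / (SD_r + SD_y) under the assumed edge ranges."""
    deriv = first_derivative(wavelengths, reflectance)
    sd_r = edge_area(wavelengths, deriv, *red)
    sd_y = edge_area(wavelengths, deriv, *yellow)
    if sd_r + sd_y == 0:
        raise ValueError("edge areas cancel; the index is undefined")
    return (sd_r - sd_y) / (sd_r + sd_y)

# Hypothetical coarse canopy spectrum: reflectance falls across the
# yellow edge and rises steeply across the red edge.
wavelengths = [560, 600, 640, 680, 720, 760]
reflectance = [0.12, 0.10, 0.08, 0.06, 0.30, 0.50]
index_value = normalized_edge_area(wavelengths, reflectance)
```

Real spectra are sampled every few nanometres, so the sums run over many more bands; the coarse 40 nm spacing here only keeps the example readable.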

Total sugar
Total sugar is an important biochemical indicator of the quality of tobacco leaves. It has a balancing effect on the taste of tobacco products. Junying et al. (2020b) proposed a method to predict the total sugar content based on UAV-borne hyperspectral imaging. The model was built by combining the spectral characteristics with the measured total sugar values. It expressed Y, the predicted total sugar content, as a function of R_863.59, R_414.7, and R_469.29, which denote the first derivative of the logarithm of the original spectral reflectance at 863.59, 414.7, and 469.29 nm, respectively. According to the sample test results, the RMSE of this model was 1.84, and the absolute value of the mean relative error was 8.82%. Soares et al. (2019) developed an inline simultaneous analytical method to quantify leaf sugar content using near-infrared hyperspectral imaging. The inversion model was established offline using partial least squares regression (PLSR). The R² and RMSE were 0.778 and 2.28, respectively.

Alkaloid
Nicotine is the main alkaloid in tobacco and the primary factor in its commercial value (Henry et al., 2019). Moreover, nicotine is the foremost chemical influencing tobacco quality, and leaf nicotine content is a key indicator for estimating the quality of fresh tobacco leaves (Dou et al., 2016). To quantitatively determine the relationship between leaf nicotine content and spectral reflectance, Jia et al. (2013a) explored the specific bands that can be used to detect nicotine. SMLR and BPNN were applied to establish the inversion model between hyperspectral reflectance and leaf nicotine content. The experimental results showed that BPNN performed best, with R² = 0.968 and RMSE = 0.109. Soares et al. (2019) developed an inline simultaneous analytical method to quantify nicotine content using near-infrared hyperspectral imaging. They used PLSR and achieved R² = 0.798 and RMSE = 0.447. Dou et al. (2016) evaluated the relationship between 11 spectral parameters and leaf nicotine content. The first-order derivative of the reflectance data was calculated to perform a standardized analysis, and five functions (linear, power, logarithmic, exponential, and negative exponential) were used to fit the values. The statistical analysis showed that the combination of the power function and (SD_r - SD_y)/(SD_r + SD_y) obtained the best results, with R² = 0.8112, RMSE = 0.2272, and relative error (RE) = 14.42%. Divyanth et al. (2022) applied hyperspectral imaging and four machine learning algorithms to predict tobacco nicotine content. The average spectra of the region of interest (ROI) were used to establish inversion models based on PLSR, RF, support vector regression (SVR), and PLSR with variable importance in projection (PLSR-VIP). The models were evaluated using leave-one-out cross-validation and on a test set comprising 15% of the samples. The results showed that PLSR (R² = 0.93, RMSE = 0.21%) outperformed SVR (R² = 0.89, RMSE = 0.36%), RF (R² = 0.90, RMSE = 0.35%), and PLSR-VIP (R² = 0.91, RMSE = 0.30%).
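The evaluation quantities reported throughout this section (R², RMSE) and the leave-one-out cross-validation procedure can be sketched as below. A univariate least-squares line stands in for the actual models (PLSR, SVR, RF), and R² is computed as 1 - SS_res/SS_tot, which may differ from the squared correlation some of the cited studies report. All names and numbers are hypothetical.

```python
from math import sqrt

def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def fit_line(x, y):
    """Least-squares slope and intercept (stand-in for a real model)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx

def loocv_predictions(x, y):
    """Predict each sample from a model fitted on all other samples."""
    predictions = []
    for i in range(len(x)):
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        predictions.append(slope * x[i] + intercept)
    return predictions

# Hypothetical spectral feature values and measured nicotine content (%).
feature = [0.2, 0.4, 0.6, 0.8, 1.0]
nicotine = [1.0, 2.1, 2.9, 4.2, 5.0]
cv_pred = loocv_predictions(feature, nicotine)
cv_rmse = rmse(nicotine, cv_pred)
cv_r2 = r_squared(nicotine, cv_pred)
```

Because every sample is predicted by a model that never saw it, the cross-validated RMSE is usually somewhat worse than the calibration RMSE, which is the reason for reporting it alongside an independent test set.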

Moisture
Leaf moisture content is an important index for tobacco cultivation and precision field management. Sun et al. (2016) proposed a fast and non-destructive way to evaluate the moisture content of tobacco leaves. Mahalanobis distance coupled with Monte Carlo cross-validation (MCCV) was applied to eliminate outlier samples. Savitzky-Golay smoothing (SG), roughness penalty smoothing (RPS), kernel smoothing (KS), and median smoothing (MS) were applied to preprocess the raw data. SPA and MLR were then used to select crucial bands and to build the inversion model, respectively. The results showed that the best model was MD-MCCV-MS (R² = 0.9132, RMSE = 0.1162).
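Of the preprocessing steps listed above, median smoothing is the simplest to illustrate. The sketch below applies a sliding median to a spectrum; the window size and the treatment of the spectrum ends (a truncated window) are assumptions, as the source does not specify them, and the function name and data are hypothetical.

```python
from statistics import median

def median_smooth(values, window=3):
    """Sliding-window median filter with a truncated window at the ends."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(values)
    return [
        median(values[max(0, i - half):min(n, i + half + 1)])
        for i in range(n)
    ]

# Hypothetical reflectance spectrum with a single noise spike at index 2.
raw = [0.10, 0.11, 0.90, 0.12, 0.13]
smoothed = median_smooth(raw, window=3)
```

A median filter removes isolated spikes without averaging them into neighboring bands, which is the property that distinguishes it from mean-based or polynomial smoothers.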

Leaf area index
Leaf area index (LAI) is one of the most essential exterior parameters of tobacco. It reflects the tobacco canopy structure and growth status. Because two exterior quality indicators, maximum leaf width and maximum leaf length, are needed to determine LAI, we classify LAI under exterior quality. The relevant formula is:

$$\mathrm{LAI} = k \, \varrho \, \frac{\sum_{i=1}^{m} \sum_{j=1}^{n} L_{ij} W_{ij}}{m}$$

where $k$ is a constant with a value of 0.6345, $\varrho$ is the planting density, $L_{ij}$ is the leaf length, $W_{ij}$ is the leaf width, $m$ is the number of measured plants, and $n$ is the number of leaves on each plant. Chaurasia et al. (2006) estimated the field-scale LAI of tobacco using MRS data from a satellite platform. The ground LAI data were measured by an LAI-2000 (LI-COR Inc., Nebraska) canopy analyzer. Two regression models (exponential and power functions) were fitted between the measured ground LAI and three vegetation indices (SR, NDVI, SAVI). The power model performed better than the exponential model for LAI estimation (NDVI: R² = 0.62). This work demonstrated the feasibility of satellite MRS data for field-scale LAI estimation, although the correlation was not high. ZhengYang et al. (2011) assessed and compared the capability of several hyperspectral models to predict tobacco LAI. The hyperspectral data were collected under different water and nitrogen conditions with a handheld spectrometer. Four vegetation indices were used: NDVI, RVI, the modified soil-adjusted vegetation index (MSAVI), and the modified second triangular vegetation index (MTVI2). PCA was applied for hyperspectral dimensionality reduction, and BPNN was used for LAI inversion. The R² and RMSE of the BPNN model were 0.889 and 0.195, respectively. Qiao et al. (2011) studied the relationship between NDVI and LAI. A linear regression model was built, and the R² was 0.568.
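The ground-truth LAI calculation from leaf measurements can be sketched as below. The form of the formula is an assumption reconstructed from the variable definitions in the text: the per-plant sum of length times width is averaged over the m measured plants, scaled by the constant k = 0.6345, and multiplied by the planting density. Lengths and widths are assumed to be in metres and density in plants per square metre; the function name and the measurements are hypothetical.

```python
def leaf_area_index(plants, density, k=0.6345):
    """LAI from per-leaf length and width measurements.

    plants:  one list per measured plant, each holding (length_m, width_m)
             tuples for that plant's leaves.
    density: planting density in plants per square metre (assumed unit).
    k:       leaf-shape constant quoted in the text.
    """
    m = len(plants)
    if m == 0:
        raise ValueError("at least one measured plant is required")
    total = sum(length * width for leaves in plants for length, width in leaves)
    return k * density * total / m

# Hypothetical measurements for two plants (metres).
plants = [
    [(0.60, 0.30), (0.50, 0.25)],
    [(0.70, 0.30)],
]
lai = leaf_area_index(plants, density=1.5)
```

Values computed this way serve as the reference against which spectral LAI estimates are regressed.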

Tobacco classification
Tobacco classification is an important method for evaluating the grades of tobacco leaves. The determination of tobacco grades directly involves the purchase prices, which is important for farmers, enterprises, and other parties, so the relevant study is of great significance in practice. Current research advances in tobacco classification have focused on scoring tobacco leaves for size, color, structure, chrominance, or completeness using RGB images (Bose et al., 2016;Fan et al., 2018;Lin et al., 2022). Considering that the hyperspectral images contains more spectral features than RGB images. In this case, we can establish a relationship between the chemical compositions and exterior qualities according to the hyperspectral reflectance data (Liu and Shi, 2020). Thus, the classification accuracy can be greatly improved. And the feasibility has been proven in studies on 366 the classification of tobacco leaves health grades. Zhu et al. (2017) used three machine learning algorithms to achieve early detection of tobacco mosaic virus via hyperspectral images. Herein, the SPA method was adopted to select the effective wavelengths to reduce the redundant spectral information. The RF, SVM, and BPNN were applied to guarantee the detection accuracy and obtain more valuable features. The experiment results showed that the overall accuracy of the train set and test set varied between 84.17-100.00% and 75.00-98.33%, respectively. The study in Gu et al. (2019) attested to the applicability of HRS imaging technology in the detection of tobacco tomato spotted wilt virus (TSWV) infection. The authors adopted three wavelengthAE' selection methods, SPA, boosted regression tree (BRT), and genetic algorithm (GA), and four machine learning algorithms, BRT, SVM, RF, and classification and regression tree (CART), to analyze the spectral characteristics of normal and diseased leaves in the range of 400-1000 nm. 
The results showed that the reflectance curve of healthy leaves was significantly higher than that of diseased leaves 5 days after infection. The overall classification accuracy reached 95.8% with the SPA-BRT model. Sahu and Dante (2018) investigated the potential of HRS imaging for cured tobacco classification. A multivariate calibration model was developed using end-member extraction and linear discriminant analysis (LDA). Mahalanobis distance was used to show the differences between tobacco grades. The classification accuracy reached 93%.

3D modeling
Because the complex geometry of plants and its interplay with the illumination scenario strongly affect spectral information acquisition, Behmann et al. (2015) proposed a 3D modeling method that combined hyperspectral images with 3D point clouds. The authors used tobacco leaves as an example and analyzed the effects of plant geometry on NDVI. Different elevation angles relative to the sensor resulted in different NDVI values. The low NDVI values on the horizontal parts were caused by specular reflection, which was independent of leaf chlorophyll content.
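A small numerical sketch can show why specular reflection lowers NDVI without any change in pigment content. It assumes, purely for illustration, that the specular component is spectrally flat and simply adds the same amount to the red and NIR reflectance; the reflectance values are hypothetical and the helper names are not from Behmann et al. (2015).

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized difference vegetation index from NIR and red reflectance."""
    return (nir - red) / (nir + red)


def with_specular(reflectance: float, specular: float) -> float:
    """Add a wavelength-independent specular term (assumption of this sketch)."""
    return reflectance + specular


# Hypothetical diffuse reflectance of a healthy leaf.
NIR_DIFFUSE, RED_DIFFUSE = 0.50, 0.05

for specular in (0.0, 0.05, 0.10, 0.20):
    value = ndvi(
        with_specular(NIR_DIFFUSE, specular),
        with_specular(RED_DIFFUSE, specular),
    )
    print(f"specular={specular:.2f}  NDVI={value:.3f}")
```

The numerator is unchanged by the added term while the denominator grows, so NDVI falls as the specular share increases even though the diffuse (pigment-driven) reflectance is fixed.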

Yield prediction
Tobacco yield predictions are important for stabilizing tobacco prices in the marketplace and for policy making. Producers need to monitor crop growth and development, and an accurate early production forecast is as relevant for farmers as it is for the entire tobacco industry. Svotwa et al. (2013) reviewed the applications of remote sensing in crop area assessment and yield prediction and gave some recommendations for tobacco, such as the Garvin model and feasible VIs.
Because soil nutrients and fertilizer application play a significant role in tobacco growth and yield, Chang et al. (2014) investigated the potential of NDVI for management zone delineation to guide fertilizer application in tobacco fields. A yield map was built through SMLR analysis to find the key yield-limiting factors among soil components and NDVI. The NDVI values were collected with GreenSeeker handheld spectrometers. According to the results, soil organic matter, active phosphorus, and available nitrogen were the main limiting factors for tobacco growth. The results also showed that NDVI_60 (NDVI at 60 days after transplanting) had a relatively high correlation with yield. Falcioni et al. (2022) proposed a rapid quantification method to estimate biomass production using HRS with visible, near-infrared (NIR), and shortwave infrared (SWIR) spectroscopy. PCA and PLSR algorithms were used to extract the key wavelengths and to build the prediction model of tobacco yield, respectively. The results showed that the most important wavelengths were distributed across the 400 (violet), 440 (blue), 550 (green), 670 (red), 700-750 (red edge), 1330 (NIR), 1450 (SWIR), 1940 (SWIR), and 2200 (SWIR) nm regions of the spectrum. The established model also had an excellent prediction capacity for yield, with R² = 0.85 and RMSE = 0.93.
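Since R² and RMSE are the figures of merit quoted throughout this section, a minimal sketch of how they are commonly computed may be useful. It uses the coefficient-of-determination definition of R² (one minus the ratio of residual to total sum of squares). Individual studies may instead report the squared Pearson correlation, so this is an assumption and not a statement about any cited paper. The sample values are hypothetical.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured and predicted yields (arbitrary units).
measured = [1.0, 2.0, 3.0, 4.0]
estimated = [1.1, 1.9, 3.2, 3.8]
print(f"R2 = {r_squared(measured, estimated):.3f}, RMSE = {rmse(measured, estimated):.3f}")
```

RMSE carries the units of the predicted quantity, so values from different studies are comparable only when the targets share units and range.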
Besides soil nutrients and fertilizer application, photosynthetic capacity is also a major factor affecting crop yield. Increasing photosynthetic capacity remains probably the best strategy for improving crop yields (Ort et al., 2015). The maximum Rubisco carboxylation rate (Vc,max) and the maximum electron transport rate (Jmax) are generally used as indicators of photosynthetic capacity. Hyperspectral techniques coupled with machine learning methods are effective in quantifying these parameters.
Fu et al. (2020) compared three PLSR-based methods, which took as inputs hyperspectral reflectance (400-900 nm), VIs (SR, modified normalized difference index (mND), and structure insensitive pigment index (SIPI)), and RTM-derived (PROCOSINE model) crop traits, for their ability to reveal photosynthetic differences across tobacco species. The results showed that PLSR with inputs of hyperspectral reflectance and VIs achieved an R² of ∼0.8 for predicting Vc,max and Jmax, higher than the R² of ∼0.6 obtained by PLSR with PROCOSINE-derived traits. However, the performance of the PLSR model varies significantly across species, regions, and growth environments. To alleviate this bottleneck, Fu et al. (2019) developed a novel ensemble framework that stacked six machine learning algorithms (artificial neural network (ANN), least absolute shrinkage and selection operator (LASSO), Gaussian process (GP), SVM, RF, and PLSR) to estimate Vc,max and Jmax. The ensemble framework was established from leaf reflectance spectra in the range of 400-2500 nm and six tobacco genotypes. According to the results, the mean R² and RMSE of the six regression algorithms for predicting Vc,max (Jmax) ranged from 0.60 (0.45) to 0.65 (0.56) and from 47.1 (40.1) to 54.0 (44.7) μmol m⁻² s⁻¹, respectively. The stacking regression performed better than any of the individual models, with increases in R² of 0.1 (0.08) and decreases in RMSE of 4.1 (6.6) μmol m⁻² s⁻¹.
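The idea behind stacking regression can be sketched in a few lines. The example below is not the framework of Fu et al. (2019): it replaces the six algorithms with two simple stand-ins (ridge regression and k-nearest-neighbour regression), uses synthetic data in place of leaf spectra, and all function names are hypothetical. It only illustrates the general mechanism, in which base learners produce out-of-fold predictions and a linear meta-learner is fitted on them.

```python
import numpy as np


def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge regression with an unpenalised intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return lambda Xn: (Xn - x_mean) @ w + y_mean


def fit_knn(X, y, k=5):
    """k-nearest-neighbour regression: mean target of the k closest samples."""
    def predict(Xn):
        dist = ((Xn[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        nearest = np.argsort(dist, axis=1)[:, :k]
        return y[nearest].mean(axis=1)
    return predict


BASE_LEARNERS = [fit_ridge, fit_knn]  # stand-ins for ANN, LASSO, GP, SVM, RF, PLSR


def fit_stack(X, y, n_folds=5, seed=0):
    """Fit base learners, then a linear meta-learner on out-of-fold predictions."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    oof = np.zeros((len(y), len(BASE_LEARNERS)))
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(y)), held_out)
        for j, fit in enumerate(BASE_LEARNERS):
            oof[held_out, j] = fit(X[train], y[train])(X[held_out])
    # Meta-learner: ordinary least squares with an intercept.
    design = np.column_stack([np.ones(len(y)), oof])
    meta, *_ = np.linalg.lstsq(design, y, rcond=None)
    # Refit each base learner on all training data for prediction time.
    models = [fit(X, y) for fit in BASE_LEARNERS]

    def predict(Xn):
        cols = [np.ones(len(Xn))] + [model(Xn) for model in models]
        return np.column_stack(cols) @ meta
    return predict


# Synthetic stand-in for spectra (rows) and a trait such as Vc,max.
rng = np.random.default_rng(42)
X = rng.normal(size=(120, 20))
y = 2.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=120)
stack = fit_stack(X[:100], y[:100])
pred = stack(X[100:])
stack_rmse = float(np.sqrt(np.mean((pred - y[100:]) ** 2)))
```

Fitting the meta-learner on out-of-fold rather than in-sample predictions keeps it from simply rewarding whichever base learner overfits the training data most.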

Stress detection
Stress detection aims to assess various factors that are detrimental to the survival and growth of tobacco plants, usually caused by infection and competition, such as disease, pests, weeds, heavy metal damage, etc. All of these are the main limiting factors for the final yield and quality of tobacco.

Disease and pest
Hyperspectral imaging technology has been successfully applied to plant disease detection, modeling, and classification (Moghadam et al., 2017). Wang et al. (2011, 2012) focused on tobacco plants and studied the feasibility of HRS technology for monitoring disease and pest stress under natural conditions. The raw hyperspectral data were measured with ASD handheld spectrometers and transformed by first-order differentiation. The results showed that the wavelengths of 631, 638, 696, 733, and 864 nm were sensitive to severity levels, which provided a theoretical foundation for applying HRS technology to quantify disease and pest stress levels. Yusuf and He (2011) investigated the effect of black-shank disease on the spectral characteristics and leaf water content of tobacco. The diseased tobacco plant samples were obtained via artificial inoculation. The corresponding reflectance data were collected with a hyperspectral imaging system in the laboratory. PCA and minimum noise fraction (MNF) methods were used to extract pivotal information and remove noise. The plant senescence reflectance index (PSRI) and water band index (WBI) were used to determine the disease level and leaf water content. The results demonstrated that the wavelengths of 730 and 790 nm were the most useful for discriminating black-shank disease severity levels, with an overall accuracy of 90 to 94%. Krezhova et al. (2014) applied HRS technology to detect TSWV infection in young tobacco plants. The hyperspectral reflectance data were obtained with a handheld spectrometer on the 14th and 20th days after inoculation. The leaf viral concentration was determined by a serological method, i.e., double antibody sandwich enzyme-linked immunosorbent assay (DAS-ELISA). 
According to the results, on the 14th day after inoculation there were no visible changes, but the mean spectral reflectance differed significantly between healthy and infected plants in four spectral ranges (green, red, red edge, and NIR regions). By the 20th day, the infection had progressed and the red edge position had shifted. The results were consistent with the serological analysis.
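The two indices named in the black-shank study, PSRI and WBI, are simple band ratios. The sketch below uses commonly cited definitions, PSRI = (R678 - R500) / R750 and WBI = R900 / R970; some authors use slightly different bands or the inverse WBI ratio, and the exact formulation in Yusuf and He (2011) is not stated in this review. The spectrum values and the `nearest_band` helper are hypothetical.

```python
from typing import Mapping


def nearest_band(spectrum: Mapping[float, float], target_nm: float) -> float:
    """Reflectance at the measured wavelength closest to target_nm."""
    wavelength = min(spectrum, key=lambda wl: abs(wl - target_nm))
    return spectrum[wavelength]


def psri(spectrum: Mapping[float, float]) -> float:
    """Plant senescence reflectance index, assumed as (R678 - R500) / R750."""
    return (nearest_band(spectrum, 678) - nearest_band(spectrum, 500)) / nearest_band(spectrum, 750)


def wbi(spectrum: Mapping[float, float]) -> float:
    """Water band index, assumed as R900 / R970."""
    return nearest_band(spectrum, 900) / nearest_band(spectrum, 970)


# Hypothetical leaf reflectance keyed by wavelength in nm.
leaf = {500: 0.04, 678: 0.05, 750: 0.45, 900: 0.50, 970: 0.47}
print(f"PSRI = {psri(leaf):.4f}, WBI = {wbi(leaf):.4f}")
```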
Hayes and Reed (2021) conducted a field study using UAV-borne hyperspectral imaging to detect tobacco black-shank disease. The authors proposed two hyperspectral indices (a broad-band index and a narrow-band index) to capture the differences in mean spectral reflectance between symptomatic and asymptomatic tobacco plants. The subspace LDA algorithm was adopted to test the identification ability and obtained an overall accuracy of 85.7%. Hong-Bo et al. (2007) investigated the spectral features of tobacco leaves infested by aphids and compared different damage levels. The reflectance curve and its first-order derivative curve were selected as the observation indices, and a linear regression model was applied to analyze the leaf chlorophyll content under different aphid damage levels (healthy, light, moderate, and severe). The results showed that the reflectance values decreased with increasing damage level. The rates of decrease were 12%, 27%, and 52%, respectively. For the first-order derivative curve, the maximum values also decreased as the damage level increased. The maximum values of the derivative were 0.031, 0.022, 0.026, and 0.019, respectively.
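First-order derivative spectra and the red edge position recur in the studies above, so a minimal sketch of one way to compute them follows. It uses central finite differences and takes the red edge position as the wavelength of maximum derivative within 680-760 nm. Both choices are assumptions of this sketch, since the cited studies may use smoothing or other estimators. The two logistic-shaped spectra are synthetic, and the direction of the shift between them is arbitrary.

```python
import math
from typing import List, Sequence, Tuple


def first_derivative(wavelengths: Sequence[float], reflectance: Sequence[float]) -> List[float]:
    """Central finite differences, one-sided at the two ends."""
    n = len(wavelengths)
    derivative = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        derivative.append(
            (reflectance[hi] - reflectance[lo]) / (wavelengths[hi] - wavelengths[lo])
        )
    return derivative


def red_edge_position(
    wavelengths: Sequence[float],
    reflectance: Sequence[float],
    window: Tuple[float, float] = (680.0, 760.0),
) -> float:
    """Wavelength of the maximum first derivative inside the window."""
    derivative = first_derivative(wavelengths, reflectance)
    candidates = [
        (d, wl) for wl, d in zip(wavelengths, derivative) if window[0] <= wl <= window[1]
    ]
    return max(candidates)[1]


def synthetic_spectrum(wavelengths: Sequence[float], inflection_nm: float) -> List[float]:
    """Logistic red-to-NIR transition centred on inflection_nm (synthetic data)."""
    return [0.05 + 0.45 / (1.0 + math.exp(-(wl - inflection_nm) / 10.0)) for wl in wavelengths]


bands = list(range(650, 801, 5))
spectrum_a = synthetic_spectrum(bands, 720)
spectrum_b = synthetic_spectrum(bands, 705)
rep_a = red_edge_position(bands, spectrum_a)
rep_b = red_edge_position(bands, spectrum_b)
print(f"red edge position: {rep_a} nm vs {rep_b} nm")
```

On measured spectra the derivative amplifies noise, so some smoothing is usually applied before the maximum is located.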

Heavy metal
Excess heavy metals in crops depress normal plant growth, and the harvested product is harmful if they enter the food chain. Copper is an indispensable element for plant growth, but excessive concentrations can also impair normal plant growth. Qu and Jiao (2018) estimated the copper ion content of tobacco leaves under copper-stressed conditions from hyperspectral data by inverting a modified RTM (PROSPECT cu ). According to the experimental analysis, reflectance in the range of 1896-1973 nm was highly sensitive to copper ion content. The results showed that the values of R² and RMSE were 0.87 and 0.087, respectively. Yu et al. (2021) aimed to identify the tobacco canopy features that respond to stress from different concentrations of mercury (Hg). PCA and the competitive adaptive reweighted sampling (CARS) algorithm were used to reduce the hyperspectral data dimensionality and select effective wavelengths. Partial least squares discriminant analysis and least-squares SVM (LS-SVM) algorithms were used to assess the stress levels of tobacco plants. The combination of CARS and LS-SVM achieved an accuracy of 100%. Henry et al. (2023) investigated the spectral differences of tobacco leaves under macronutrient deficiencies. Information entropy and spectral derivative methods were adopted to identify the effective wavelengths. PCA and LDA algorithms were used to reduce data dimensionality and classify the symptoms. The results showed that the overall accuracy on young, intermediate, and mature plants was 92%, 82%, and 75%, respectively. The results also showed that deficiencies of nitrogen, sulfur, and magnesium affected the classification accuracy to a large extent, whereas phosphorus and potassium deficiencies had little effect.

Issues and recommendations
HRS is a non-destructive technology for acquiring information about objects from a distance. This characteristic makes it well suited to crop quality estimation, yield prediction, and stress detection. Existing research shows that HRS technology has enormous potential for various agricultural applications. In our view, HRS will be indispensable for digital agriculture and agricultural informatization in the future. Nevertheless, some problems remain to be solved, both in the technology itself and in its specific applications.

Issues
First of all, the cost of hyperspectral data acquisition is relatively high in both financial and labor terms. Taking a UAV-borne HRS system as an example, the price of a UAV varies by type and function and ranges from thousands to tens of thousands of RMB. The hyperspectral camera it carries costs approximately half a million RMB or more (Feng et al., 2021). Ordinary farmers and research groups rarely own UAV-borne HRS devices because they lack financial and technical support. They usually rent a suite of equipment from data service providers at about 50,000 RMB per rental. A ground-based handheld spectrometer provides the most accurate reflectance data with the least interference, but it requires operators to traverse the entire field and select suitable samples to collect spectral data. This method is laborious and time-consuming, especially over large areas, and the instrument costs about 150,000-300,000 RMB. Thus, the popularization of agricultural UAV-borne HRS still faces obstacles. Secondly, external factors such as measuring time, light intensity, and solar altitude angle affect image quality during data collection. Because of its limited endurance, the UAV must complete its mission within a limited time (about 30 minutes). To guarantee image quality, the UAV should fly at a suitable height (about 50-100 m). The resulting hyperspectral images contain the spectral information of all ground objects in the field of view, such as crops, soil, roads, and weeds, which introduces noise that makes the targets harder to distinguish. How to balance image quality against flying height, flight time, spatial resolution, and coverage area still needs further investigation.
Thirdly, tobacco leaves at different positions on the plant ripen and are harvested at different times (usually 20 days apart), in the order bottom, middle, and top. Some observation tools (e.g., UAVs) can only obtain canopy reflectance data. The canopy spectra are therefore not fully representative of the bottom and middle leaves. In practical research, this problem may lead to large differences between model predictions and actual values. Besides, the growth status of tobacco in each period from transplanting to harvesting may also affect the final quality. However, many studies collected plant samples at only one stage (e.g., returning seedling stage, root elongation stage, flourishing stage, or maturity stage). Whether canopy spectral data from a single stage can predict the final tobacco quality therefore remains to be verified.
Fourthly, the relevant research on tobacco is fragmented. Most studies focus on one agronomic parameter and establish an inversion model from the corresponding hyperspectral reflectance data, while few have explored the deeper underlying mechanisms. According to Li (2006), potassium has significant effects on leaf nitrogen and chlorophyll content. The relationships between various biophysical indices are thus an important basis for spectral inversion. How to exploit these relationships to monitor agronomic parameters of tobacco that have no sensitive spectral response is also worth studying.
Finally, there is the question of the localization and universality of the various models. Because of differences in species, regions, and growing environments, the established inversion models may carry location-specific characteristics. We call this phenomenon "model localization". Some researchers, however, prefer universal models. For example, Feng et al. (2021) hope to construct a universal crop monitoring model based on UAV-borne HRS. The support of existing technologies such as ensemble learning and transfer learning (Wan et al., 2022) makes it possible to build universal models.
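The trade-off raised in the second issue above, between flying height, spatial resolution, and coverage within one flight, can be made concrete with a back-of-the-envelope sketch. It uses the standard pinhole relation GSD = height × pixel pitch / focal length and ignores image overlap, turns, and wind. All sensor and flight parameters below are hypothetical placeholders rather than the specifications of any system in the reviewed studies; only the 50-100 m height range and the 30-minute endurance come from the text.

```python
def ground_sampling_distance(height_m: float, pixel_pitch_m: float, focal_length_m: float) -> float:
    """Ground size of one pixel (m) for a nadir-looking camera."""
    return height_m * pixel_pitch_m / focal_length_m


def coverage_hectares(
    height_m: float,
    *,
    pixel_pitch_m: float,
    focal_length_m: float,
    pixels_across: int,
    speed_m_s: float,
    flight_time_s: float,
) -> float:
    """Area swept in one flight, ignoring overlap and turning time."""
    gsd = ground_sampling_distance(height_m, pixel_pitch_m, focal_length_m)
    swath_m = gsd * pixels_across
    return swath_m * speed_m_s * flight_time_s / 10_000.0


# Hypothetical push-broom sensor and flight settings.
SENSOR = dict(pixel_pitch_m=7.4e-6, focal_length_m=0.017, pixels_across=640)
FLIGHT = dict(speed_m_s=5.0, flight_time_s=30 * 60)  # about 30 minutes of endurance

for height in (50.0, 75.0, 100.0):
    gsd_cm = 100.0 * ground_sampling_distance(
        height, SENSOR["pixel_pitch_m"], SENSOR["focal_length_m"]
    )
    area = coverage_hectares(height, **SENSOR, **FLIGHT)
    print(f"height={height:.0f} m  GSD={gsd_cm:.1f} cm  coverage={area:.1f} ha")
```

Under these assumptions both pixel size and covered area scale linearly with height, so doubling the height doubles the coverage per flight at the cost of halving the spatial resolution.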

Recommendations
The studies reviewed here illustrate that HRS technology is effective for various precision tobacco agriculture scenarios (e.g., quality estimation, yield prediction, and stress detection). However, many challenges remain before these studies can guide practical production. Here are some recommendations for future studies.
The first is hyperspectral data collection. Recent research has demonstrated that UAV-borne HRS is a game-changer in precision agriculture, offering unprecedented spectral, spatial, and temporal resolution (Maes and Steppe, 2019). However, the accuracy of UAV-based data is lower than that of handheld spectrometers, so more work on near-ground HRS calibration is needed to strengthen UAV-borne HRS for precision tobacco agriculture applications. Meanwhile, low-cost, high-performance UAVs should be developed to make them affordable to more people and to improve the performance of UAV platforms in terms of flight stability, duration, and load. In our view, data quality is important and relies on high-performance sensors, but only a tool that is economical enough will be widely used.
The second is data processing and modeling. Hyperspectral sensors are very sensitive optical components that are highly susceptible to environmental interference, and the quality of the obtained data has a significant impact on modeling. Hyperspectral data must also be coupled with field sampling data in spectral inversion, yet many studies did not screen their field samples for outliers. They usually employed one or more algorithms to build simple inversion models and selected the one with the highest R² and lowest RMSE. Researchers' main effort thus goes into optimizing the models and improving their accuracy, which we think is detrimental to the development of the remote sensing community. In future studies, more universal models should be introduced based on novel technologies such as ensemble learning and transfer learning. Methods to reduce the noise caused by environmental factors should also be proposed.
The third is to pay more attention to multi-parameter and multi-stage models for quality estimation, yield prediction, and stress detection. The existing literature mainly performs inversion or predictive modeling based on a single parameter or growth stage. It is necessary to investigate the dynamic development of tobacco phenotypic traits at different growth stages. Furthermore, the canopy reflectance spectrum is a comprehensive indicator that results from all internal and external factors acting together. Inversion models based on multiple parameters are therefore a meaningful way to improve overall accuracy and can serve as a new research direction.
The fourth is the relationship between the inherent quality and chemical compositions of tobacco. Currently, inherent quality estimation depends mainly on the subjective impressions of people who smoke the samples. This method requires evaluators to smoke frequently, which is very harmful to their health. We therefore suggest a two-step approach: first, quantitatively invert the chemical compositions using HRS; second, establish quantitative or qualitative relationships between chemical compositions and inherent qualities. The objective is to find the optimal range of each chemical composition corresponding to high-quality tobacco that meets consumers' demands.
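The second recommendation above notes that many studies do not screen field samples for outliers before modeling. As one generic possibility, and not a method taken from the reviewed literature, the sketch below flags samples that fall outside Tukey's fences (1.5 times the interquartile range beyond the quartiles). The sample values, the function name, and the fence multiplier are illustrative assumptions.

```python
import statistics
from typing import List, Sequence


def iqr_outliers(values: Sequence[float], multiplier: float = 1.5) -> List[int]:
    """Indices of values outside [Q1 - m*IQR, Q3 + m*IQR] (Tukey's fences)."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - multiplier * iqr, q3 + multiplier * iqr
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical laboratory measurements of one leaf trait (arbitrary units).
field_samples = [1.8, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 9.0]
flagged = iqr_outliers(field_samples)
print("flagged sample indices:", flagged)
```

Flagged samples are candidates for re-measurement or inspection, not automatic deletion, since an extreme value may reflect real stress in the field.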

Conclusions
In this paper, we focused on the application of HRS in precision tobacco agriculture and presented a comprehensive review of related applications and methodologies in terms of quality estimation, yield prediction, and stress detection. Compared to traditional destructive field sampling, laboratory testing, and MRS methods, HRS can provide unprecedented spectral, spatial, and temporal resolution. We compared three commonly used HRS system platforms: handheld, UAV, and satellite. Each of them has its own benefits, shortcomings, and suitable scenarios. We also depicted a detailed technology roadmap of UAV-borne HRS for precision tobacco agriculture. We summarized the specific applications in three parts: quality estimation, yield prediction, and stress detection. The relevant modeling methods and their performances were also analyzed. In summary, the key issue is how to establish quantitative inversion models between spectral features and the corresponding observation indices. The commonly used methods for hyperspectral data dimensionality reduction are PCA, SPA, GA, clustering analysis, etc. Inversion models are usually driven by PLSR, PCR, MLR, BPNN, SVM, RF, etc., and several studies used stacking regression. The independent variables of these algorithms are usually the full-band spectrum, key-band spectrum, first-order derivative spectrum, and various VIs. How to improve the accuracy and universality of the relevant models remains an open challenge.

Publisher's note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.