A convolutional neural network with image and numerical data to improve farming of edible crickets as a source of food—A decision support system

Crickets (Gryllus bimaculatus) produce sounds as a natural means to communicate and convey various behaviors and activities, including mating, feeding, aggression, distress, and more. These vocalizations are intricately linked to prevailing environmental conditions such as temperature and humidity. By accurately monitoring, identifying, and appropriately addressing these behaviors and activities, the farming and production of crickets can be enhanced. This research implemented a decision support system that leverages machine learning (ML) algorithms to decode and classify cricket songs, along with their associated key weather variables (temperature and humidity). Videos capturing cricket behavior and weather variables were recorded. From these videos, sound signals were extracted and classified as calling, aggression, or courtship songs. Numerical and image features were extracted from the sound signals and combined with the weather variables. The extracted numerical features, i.e., Mel-Frequency Cepstral Coefficients (MFCC), Linear Frequency Cepstral Coefficients (LFCC), and chroma, were used to train shallow ML algorithms (support vector machine, k-nearest neighbors, and random forest (RF)), while the image features, i.e., spectrograms, were used to train different state-of-the-art deep ML models, i.e., convolutional neural network architectures (ResNet152V2, VGG16, and EfficientNetB4). In the deep ML category, ResNet152V2 had the best accuracy of 99.42%. In the shallow ML category, the RF algorithm had the best accuracy of 95.63% when trained with a combination of MFCC+chroma and after feature selection. In descending order of importance, the top 6 ranked features in the RF algorithm were humidity, temperature, C#, mfcc11, mfcc10, and D. From the selected features, it is notable that temperature and humidity are necessary for growth and metabolic activities in insects. Moreover, the songs produced by certain cricket species naturally align to musical tones such as C# and D, as ranked by the algorithm. Using this knowledge, a decision support system was built to guide farmers about the optimal temperature and humidity ranges and to interpret the songs (calling, aggression, and courtship) in relation to weather variables. With this information, farmers can put in place suitable measures such as temperature regulation, humidity control, addressing aggressors, and other relevant interventions to minimize or eliminate losses and enhance cricket production.


Introduction
In recent years, edible insects have gained global attention as an underutilized source of food with great potential to contribute to future food and feed needs. Insect farming also causes minimal damage to the environment owing to its low greenhouse gas emissions (Lange and Nakamura, 2021). Crickets are among the edible insects that can be farmed for food. The nutritional content of edible crickets, including vitamins, minerals, proteins, fats, essential amino acids, and flavonoids, is comparable or even superior to that of common animal protein sources like fish, poultry, and cow meat. Owing to their nutritional value and contribution to the livelihoods of many communities globally, over 2,000 species of insects have been consumed by hundreds of millions of people for millennia in more than 110 countries worldwide (van Huis, 2013; Kelemu et al., 2015; Verner et al., 2021). Unfortunately, most edible insects are currently harvested seasonally from the wild, and the harvests are declining due to the degradation of their breeding habitats. Farming of edible insects is in its infancy for several reasons, including limited or unavailable rearing protocols and poor adoption of technology among farmers. Generally, the production of edible insects as a source of food and feed is low compared to the market needs (van Huis, 2013; Kelemu et al., 2015; Magara et al., 2021; Tanga et al., 2021; Verner et al., 2021).
Manual identification and understanding of insects is tedious, time-consuming, and subject to human error (Alonso et al., 2017; Potamitis et al., 2017; Kawakita and Ichikawa, 2019; Noda et al., 2019; Zhang et al., 2021). Automatic sound/audio signal processing can be improved using machine learning (Noda et al., 2016, 2019; Phung et al., 2017; Kawakita and Ichikawa, 2019). Machine learning has successfully been deployed in the identification and classification of insects based on their species (Phung et al., 2017; Zamanian and Pourghassem, 2017), acoustics (Amlathe, 2018; Kiskin et al., 2020; Zhang et al., 2021), wingbeats (Arpitha et al., 2021; Kim et al., 2021), etc. For instance, Kawakita and Ichikawa (2019) explored the classification of bees and hornets based on their flight sounds using the support vector machine (SVM) algorithm combined with Mel-frequency cepstral coefficient (MFCC) features. The model achieved high recall and precision scores but faced challenges in classifying species with subtle differences in sound features. Zamanian and Pourghassem (2017) used multi-layered perceptron (MLP) and genetic algorithms to classify cicada species based on their sounds. Dong et al. (2018) employed convolutional neural networks (CNN) with enhanced spectrograms for insect recognition, while Tey et al. (2022) used spectrogram images and deep learning algorithms for cicada species recognition. These approaches achieved accuracy rates ranging from 77.78% to 99.13%. Kim et al. (2021) and Zhang et al. (2021) used CNN models with MFCC to classify insect sounds, achieving accuracy rates of 92.56% and 85.72% respectively.
Generally, crickets have certain characteristics and behaviors which, when known, can inform the farmer about what to adjust to improve the health of the insects and increase production. Such characteristics include the sounds that indicate the behavior and health of the crickets. For instance, crickets produce sounds to signify certain behaviors or activities (e.g., courtship, calling, aggression, etc.) (Alexander, 1961; Miyashita et al., 2016; Lin and Hedwig, 2021). The loud calling songs are meant to attract distant females, soft courtship songs initiate sexual behavior with nearby females, and aggression songs are produced when fighting for mates and territories. Keeping other factors (e.g., feeding rate, etc.) constant, these activities are mainly influenced by temperature and humidity (Ulagaraj, 1976; Srygley, 2014; Niemelä et al., 2019). Therefore, using state-of-the-art machine learning algorithms, this study developed a novel insect sound synthesis decision support system to enable farmers to understand the health and status of their cricket farms and make meaningful decisions as they farm edible crickets as an alternative source of proteins and food. The novel approach can assist farmers in improving the production of edible crickets as a sustainable source of food for humans compared to livestock farming and contribute to alleviating food insecurity and malnutrition challenges.
The following are the key contributions of the work reported in this paper:
• We studied the performance of shallow machine learning algorithms trained on numerical features extracted from cricket songs, together with added weather variables. Moreover, the different numerical features were combined and used to train the shallow learning algorithms, and their performances were evaluated.
• We investigated the important features of the extracted chroma frequencies and corresponding weather variables. Thereafter, the important chroma features were validated and mapped on the chromatic scale. The important weather variables were also discussed.
• We extracted image (i.e., spectrogram) features and trained several deep-learning CNN architectures. Weather variables were injected into the respective CNN architectures and merged with the image features in the deep learning architecture.
• We selected the best-performing machine learning model and deployed its pre-trained model on a decision support system (with a dashboard and notification system integrated) that can help farmers manage cricket farms.

FIGURE 1
This illustration depicts the experimental setup used to study cricket behavior and song production under varying temperature and relative humidity conditions. Temperature sensors or data loggers were placed within the setup to continuously monitor and record temperature changes throughout the study, helping to understand the impact of temperature on cricket behavior and song production. Additionally, the video and sound recorder continuously recorded audio and video data, allowing for observation and documentation of the crickets' songs and behavior throughout the experiment, and the data were transferred to a hard disk. All data collected during the experiment were transferred to the cloud for storage and analysis.
This paper is organized as follows: Section 2 describes the experimental setup, which includes data collection, data preprocessing, feature extraction, and machine learning. Sections 3 and 4 outline the key findings and their interpretation respectively. Lastly, Section 5 concludes the paper.

Materials and methods
2.1 Experimental setup and data collection
Video, sound, and associated weather data of the edible cricket, Gryllus bimaculatus, were collected in a laboratory (where environmental conditions were not controlled), as shown in Figure 1. The video was recorded using a Nikon Z6 II camera, while temperature and humidity were recorded using an Internet of Things (IoT) sensor installed in the laboratory. The temperature and humidity readings were transmitted to the International Centre of Insect Physiology and Ecology (icipe) virtual cloud at hourly intervals. During the data processing stage, the video footage was used to label the dataset, as it provided visual cues regarding the cricket behavior associated with the various songs they produce.
Crickets at the nymph stage were not recorded since they do not chirp, i.e., the wings that enable crickets to produce sound are not yet fully developed. Female crickets were also not recorded as they do not produce any chirps (Jonsson et al., 2021; Lin and Hedwig, 2021). Therefore, this research processed sound signals from male crickets at the adult and mature stages. Data were collected for single males, male-male pairs, and male-female pairs to observe whether the crickets behave differently under different experimental setups; the specific days when the crickets were paired are shown in Table 1. Data were collected for over 24 hours for each pairing within the different stages, continuously over a period of 13 days. The data collected consisted of 465 video recordings, each 30 minutes long.

2.2 Data preprocessing
Figure 2 illustrates the flow of the data cleaning process. At the end of the process, appropriate metadata were documented by filling a comma-separated values (CSV) template with the variables outlined in Table 2. The preprocessing steps are described below:
Step 1: Recorded videos. The 465 recorded videos were categorized into two groups: those containing cricket sounds and those without any cricket sounds.
Step 2: Separating videos. Information from 202 video recordings, each spanning 30 minutes and containing cricket sounds, was captured in a CSV metadata file. Following this, a human expert reviewed the videos, noting the timestamps of the start and end of each chirp. These timestamps were then appended to the CSV metadata file. A Python script (icipe, 2023a), integrating the MoviePy (Zulko, 2023) library, was used to extract sounds from the videos. The sounds were saved in the Waveform Audio File Format (i.e., .wav).
Step 3: Denoising. During the video and sound recording in the laboratory, various noises such as vehicle sounds, bird chirps, and human activity were inadvertently captured. To address this issue, Audacity (Audacity, 2023) software was used to denoise the .wav sound clips.
Step 4: Extracting and annotating cricket chirps. After the 30-minute .wav files were extracted and denoised using a tailored Python script (icipe, 2023a), the timestamps recorded in Step 2 (the start and end of each chirp) were used to extract the cricket chirps/sounds. The denoised .wav files were subsequently saved in a separate folder, and the metadata associated with these files were documented in the CSV file (as shown in Figure 2, top right). The cricket species is known to produce three types of chirps/songs: aggression, courtship, and calling songs (Alexander, 1961; Miyashita et al., 2016; Lin and Hedwig, 2021). In this step, the extracted sound clips were annotated with the guidance of an entomologist, the domain expert. This procedure involved referencing the CSV metadata file, retrieving the exact start and end timestamps of cricket chirps, listening to the corresponding songs, observing cricket behavior in the video footage, and labeling the behavior (e.g., aggression, courtship, or calling) in alignment with the corresponding record in the metadata file. In data engineering, this process is commonly referred to as "labeling."
Label: The actual behavior of the cricket as identified by a human expert.
Step 5: Merging cricket chirps with weather variables. The CSV metadata, described in the top right of Figure 2, was used to merge the annotated cricket songs with their corresponding temperature and humidity based on the timestamp of the chirps and the day of data recording.
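The merge of annotated chirps with weather readings can be sketched as follows. This is a minimal illustration rather than the authors' script: the record fields, file names, dates, and the choice of matching each chirp to the reading nearest in time are all assumptions made for the example.

```python
from datetime import datetime

def nearest_reading(chirp_start, readings):
    """Return the weather reading logged closest in time to the chirp start."""
    return min(readings, key=lambda r: abs((r["time"] - chirp_start).total_seconds()))

def merge_chirps_with_weather(chirps, readings):
    """Attach temperature and humidity to every annotated chirp record."""
    merged = []
    for chirp in chirps:
        reading = nearest_reading(chirp["start"], readings)
        merged.append({**chirp,
                       "temperature": reading["temperature"],
                       "humidity": reading["humidity"]})
    return merged

# Hypothetical hourly sensor readings and annotated chirps (illustrative values only).
readings = [
    {"time": datetime(2023, 1, 1, 10, 0), "temperature": 23.1, "humidity": 44.0},
    {"time": datetime(2023, 1, 1, 11, 0), "temperature": 24.0, "humidity": 41.5},
]
chirps = [
    {"clip": "clip_001.wav", "start": datetime(2023, 1, 1, 10, 12), "label": "calling"},
    {"clip": "clip_002.wav", "start": datetime(2023, 1, 1, 10, 50), "label": "courtship"},
]
merged = merge_chirps_with_weather(chirps, readings)
```

Because the sensor reports hourly, a chirp at 10:50 is paired here with the 11:00 reading; pairing with the preceding reading would be an equally plausible convention.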
Step 6: Sound splitting (segmentation). In their natural behavior, cricket chirps can vary in duration, ranging from 0.4 seconds to 3 minutes (Jones, 1966; Mhatre and Balakrishnan, 2006). The duration used for splitting sound signals affects the size of the resultant feature matrix. For example, a longer split duration leads to a larger feature matrix and consequently increases the training time of the algorithm (Gold et al., 2011). Conversely, splitting chirps into shorter lengths increases the overall data volume, resulting in ample training data for the models. To strike a balance, this study divided cricket chirps into uniform 1-second segments, resulting in a total of 83,809 records.
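The segmentation step can be illustrated with a short sketch. It assumes the chirp is available as a sequence of samples with a known sample rate; the function name and the decision to drop a trailing segment shorter than one second are assumptions, since the text does not state how remainders were handled.

```python
def split_into_segments(samples, sample_rate, segment_seconds=1.0, keep_remainder=False):
    """Split a chirp into uniform segments of segment_seconds each."""
    seg_len = int(sample_rate * segment_seconds)
    segments = [samples[i:i + seg_len] for i in range(0, len(samples), seg_len)]
    # Assumption: a final segment shorter than the target length is discarded.
    if segments and len(segments[-1]) < seg_len and not keep_remainder:
        segments.pop()
    return segments

# Toy example: a 2.5-second "chirp" at an artificially low rate of 8 samples/s.
sample_rate = 8
chirp = list(range(20))
segments = split_into_segments(chirp, sample_rate)
```

With these toy values the chirp yields two full 1-second segments and the remaining half second is dropped.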
Step 7: Feature extraction. Besides the individual chroma, MFCC, and LFCC features, this research combined chroma+MFCC, chroma+LFCC, and MFCC+LFCC and used them to train the various shallow machine learning algorithms described in Section 2.3. The default hyperparameters were applied during feature extraction. The extracted features are described as follows:
a) Mel-Frequency Cepstral Coefficients: MFCCs are coefficients that collectively make up a Mel-Frequency Cepstrum (MFC). An MFC is a representation of the short-term power spectrum of a sound signal based on a linear cosine transform of a log power spectrum on a nonlinear Mel scale of frequency (Le-Qing, 2011). In this study, the feature comprised 13 numerical coefficients. MFCC is the most commonly used feature for processing insect songs and has been used by Zhang and Guo (2010), Silva et al. (2013), Noda et al. (2016, 2019), Yazgaç et al. (2016), Phung et al. (2017), Amlathe (2018), and Kawakita and Ichikawa (2019). This feature is often preferred because its frequency bands are equally spaced on the Mel scale, which approximates the human auditory system's response more closely than the linearly spaced frequency bands used in the normal spectrum. The feature was extracted using the librosa.feature.mfcc() function in the Python-based Librosa (McFee et al., 2023) library.
b) Chroma: A pitch is separated into two components, i.e., the tone height and the chroma. The tone height represents the octave number, and the chroma represents the respective pitch spelling attribute (pitch class). An octave comprises 12 pitches. Conversion of audio to chroma can be performed using the short-time Fourier transform (STFT) or the constant-Q transform (CQT) (McFee et al., 2023). In this study, chroma STFT was used, which computes a chromagram from a power spectrogram or waveform. The feature was extracted using the librosa.feature.chroma_stft() function in the Librosa (McFee et al., 2023) library. The extracted chroma features can be matched with different frequencies to determine the pitches within which the cricket songs fall.
c) Linear-Frequency Cepstral Coefficients: LFCC is a feature representation commonly used in audio signal processing and speech recognition tasks. LFCC is computed in the same way as MFCC but provides a linear-scale representation of the cepstral coefficients. This feature has been used in previous experiments by different authors (Potamitis et al., 2007; Silva et al., 2013; Noda et al., 2016, 2019; Yazgaç et al., 2016). The feature was extracted using the spafe.features.lfcc.lfcc() function of the Python-based spafe library.
d) Spectrograms: Spectrogram features are represented as a 2D image. The x-axis represents time, the other axis represents frequency, and the color brightness represents the strength of each frequency component at each time frame. Spectrograms show where there is high or low energy and how energy levels vary over time (Ali et al., 2024). In insect song synthesis, spectrograms capture the temporal and spectral characteristics of the insect sounds. The features were extracted using the librosa.feature.melspectrogram() function in the Librosa library.
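To make the link between chroma features and musical pitches concrete, the sketch below maps a frequency to one of the 12 pitch classes using the standard equal-temperament relation with A4 = 440 Hz. It is an illustration only: the function name and example frequencies are hypothetical, and it assumes the chroma bins are ordered from C upward.

```python
import math

# The 12 pitch classes of the chromatic scale, assuming bin 0 corresponds to C.
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def frequency_to_pitch_class(freq_hz, a4_hz=440.0):
    """Map a frequency to its nearest equal-tempered pitch class."""
    # MIDI note 69 is A4; each semitone is a factor of 2**(1/12) in frequency.
    midi_note = round(69 + 12 * math.log2(freq_hz / a4_hz))
    return PITCH_CLASSES[midi_note % 12]

# Hypothetical example frequencies (not measurements from this study).
examples = {f: frequency_to_pitch_class(f) for f in (440.0, 277.18, 4698.64)}
```

The tone height (octave) is discarded by the modulo operation, which is why frequencies several octaves apart can fall into the same chroma bin.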
Step 8: Class balancing. In machine learning, training an algorithm with imbalanced data leads the model to learn the majority class far better than the minority classes. The dataset was explored to check for class imbalance based on the target feature. To handle the class imbalance problem, the Synthetic Minority Oversampling Technique (SMOTE) was used. SMOTE oversamples the minority class by creating synthetic samples rather than duplicating existing ones. It works by finding the nearest neighbors of each minority-class sample and generating new points along the vectors joining the sample to those neighbors. As such, the method increases the number of minority class instances (to a level set by the user) in the neighborhood, thereby helping the classifiers fitted on the data to improve their generalization capacity (Fernández et al., 2018).
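The core idea of SMOTE, interpolating between a minority sample and one of its nearest minority neighbors, can be sketched in a few lines. This is a simplified illustration under stated assumptions and not the implementation used in the study: the function name, parameters, and toy data are hypothetical.

```python
import random

def smote_like_samples(minority, k=2, n_new=3, seed=0):
    """Create n_new synthetic minority samples by neighbor interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbors of x by squared Euclidean distance.
        neighbors = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from x to the neighbor
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbor)])
    return synthetic

# Toy minority class with two features per sample (illustrative values only).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
synthetic = smote_like_samples(minority)
```

Every synthetic point lies on a line segment between two existing minority samples, so the new data stay inside the region already occupied by that class.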
Step 9: Label encoding. Machine learning algorithms typically work with numerical data; therefore, categorical labels must be converted into numerical labels before training. In this context, the labels (calling, aggression, and courtship) were encoded using the preprocessing.LabelEncoder() class of the Sklearn (Pedregosa et al., 2011) Python library. This allowed the predictor variables to be related to the target variable during classification.
Step 10: Feature scaling. This technique normalizes independent variables to a certain range. Feature scaling ensures that all features contribute proportionally to distance calculations, so that features with larger scales cannot dominate the distances and bias the results (Disha and Waheed, 2022). In this study, the data (MFCC, LFCC, spectrograms, and the numerical weather variables) were normalized using the min-max scaler (Pedregosa et al., 2011) defined in Equation 1,

$$X_{scaled} = \frac{X - X_{min}}{X_{max} - X_{min}} \tag{1}$$

where $X$ is the original value, $X_{max}$ is the maximum value, and $X_{min}$ is the minimum value. The scaler normalizes data to a range of 0 to 1. Chroma features were already on a scale of 0 to 1 and were therefore not normalized.
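As a worked example of min-max scaling, the sketch below rescales a small list of values to the range 0 to 1. The function name is hypothetical, and the three temperatures are used purely as illustrative inputs; in practice the minimum and maximum would be learned from the training data.

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant feature is mapped to 0 to avoid division by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

temperatures = [20.2, 22.5, 25.5]  # degrees Celsius, illustrative inputs
scaled = min_max_scale(temperatures)
```

The middle value maps to (22.5 - 20.2) / (25.5 - 20.2), i.e., roughly 0.43 of the way between the two extremes.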
Step 11: Machine learning ready data. After normalization, the data were machine-learning-ready. The following section explains the machine learning modeling steps.

2.3 Machine learning modeling
Figure 3 shows the steps in machine learning modeling. They are described below.
Step 1: Data splitting. The data (83,809 records) were split into train and test sets in a ratio of 70:30 respectively using the sklearn.model_selection.train_test_split function in the Sklearn (Pedregosa et al., 2011) Python library. By dividing the available data into separate training and testing sets, the model was trained on a portion of the data and its performance was evaluated on the unseen data. This helps in estimating how well the model generalizes to new/unseen instances. Data splitting also helps in preventing data leakage, which occurs when information from the test set inadvertently influences the model training. Keeping the test set separate ensures that the model is evaluated on unseen data, providing a more accurate assessment of its performance (Joseph, 2022; Joseph and Vakayil, 2022).
Step 2: Training machine learning algorithms. This study implemented both shallow and CNN machine learning algorithms. The shallow algorithms were imported from the Sklearn (Pedregosa et al., 2011) Python library and executed. The single features (described in Section 2.2, Step 7) and combinations of them were used to train the SVM, k-nearest neighbors (KNN), and RF shallow machine learning algorithms, which were configured with their default hyperparameters. The spectrogram image features (described in Section 2.2, Step 7) were used to train the CNN deep learning algorithms. These algorithms are briefly discussed as follows:
a) Support Vector Machine: This algorithm is used for both regression and classification tasks. In the algorithm, each data item is plotted as a point in n-dimensional space, where n is the number of features used by the model. Classification is done by finding an optimal hyperplane that separates the classes (Suthaharan and Suthaharan, 2016).
b) K-Nearest Neighbor: The algorithm evaluates the similarity between new data and the available cases and puts the new case into the category to which it is most similar. The algorithm stores all the available data and classifies a new data point based on this similarity. When new data appear, they are easily assigned to the best-suited category using the KNN algorithm (Kramer and Kramer, 2013).
c) Random Forest: This algorithm is mainly used for classification and regression problems. The algorithm consists of N decision trees trained on bootstrap random subsets of the data. RF utilizes ensemble learning, a technique that combines many classifiers to provide solutions to complex problems. The algorithm makes use of feature bagging, which has the advantage of significantly decreasing the correlation between the decision trees and thus increasing predictive accuracy on average (Breiman, 2001).
d) Convolutional Neural Network: CNN is a deep neural network designed to process grid-like data such as images, audio spectrograms, and time-series data. The algorithm works by assigning learnable weights and biases to the input features based on the importance of various objects in the input image. CNNs can process data arranged as 1D or 2D arrays. Different CNN architectures (i.e., EfficientNetB4, ResNet152V2, and VGG16) were used in this study. These are pre-trained models built on the ImageNet dataset using the Keras framework. The initial layers of the pre-trained models were frozen, and the other layers were fine-tuned on the acoustic dataset. The three models shared a similar classification head consisting of a 2D average pooling layer, a flatten layer, and a dense layer with 256 neurons and ReLU activation. This was followed by a final dense layer with softmax activation and 3 neurons, one for each class label to be predicted (aggression, calling, and courtship). For the loss, categorical_crossentropy was used since each sample belongs to exactly one class. To prevent overfitting, early stopping with a patience of 10 was implemented, and the models were trained for 50 epochs of 58 steps each. Table 3 contains the optimization settings that were used across the pre-trained models. The pre-trained models took images (i.e., spectrogram features) as inputs. Numerical data (i.e., temperature and humidity) were important variables in understanding the songs produced by the crickets and equally affected their growth. Therefore, the numerical data were injected and merged in the deep learning model architectures as shown in Figure 4.
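The merging of image-derived features with numerical weather variables can be pictured as a concatenation followed by a softmax classification layer. The sketch below is a toy, framework-free illustration of that idea and not the Keras architecture of Figure 4: the feature values and weights are made up, the image features stand in for the output of a CNN backbone, and the 256-neuron ReLU layer is omitted for brevity.

```python
import math

CLASSES = ["aggression", "calling", "courtship"]

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum per output neuron."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def fuse_and_classify(image_features, weather, weights, biases):
    """Concatenate image and weather features, then apply dense + softmax."""
    fused = list(image_features) + list(weather)
    return softmax(dense(fused, weights, biases))

# Hypothetical pooled image features and scaled (temperature, humidity) values.
image_features = [0.2, 0.7, 0.1, 0.5]
weather = [0.62, 0.48]
# Made-up weights: 3 output neurons, each with 4 + 2 inputs.
weights = [
    [0.1, -0.2, 0.3, 0.0, 0.5, -0.1],
    [0.4, 0.6, -0.1, 0.2, 0.1, 0.3],
    [-0.3, 0.1, 0.2, 0.1, -0.2, 0.4],
]
biases = [0.0, 0.0, 0.0]
probs = fuse_and_classify(image_features, weather, weights, biases)
predicted = CLASSES[probs.index(max(probs))]
```

In a trained network the weights would be learned, so that the weather inputs shift the class probabilities only as far as the data support.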
Step 3: Testing and evaluating the machine learning algorithms. There are different evaluation metrics for classification problems, such as accuracy, F1-score, confusion matrix, precision, and recall (Vujović et al., 2021). The choice of an evaluation metric depends on the problem under investigation and the type of data involved. This study employed the F1-score and accuracy to evaluate the performance of the machine learning algorithms. The F1-score was selected due to the context of the problem (multi-class classification) and the nature of the data, which had imbalanced classes. It is noted that accuracy tends to underrepresent classes with a smaller number of samples in relation to those with a larger number (Steiniger et al., 2020). The accuracy score was nevertheless included for comparison with the F1-score, as it is the most used evaluation metric across many insect song classification problems (Silva et al., 2013; Noda et al., 2016, 2019; Amlathe, 2018; Kim et al., 2021). Accuracy and F1-score were used to evaluate the SVM, KNN, RF, and CNN algorithms discussed in the previous section. Based on their performance, the best classifier was selected.
The F1-score and accuracy performance evaluation metrics are discussed below:
a) Accuracy: It measures the number of correct predictions expressed as a percentage of the total number of predictions.
b) F1-score: The F1-score is commonly used to evaluate models built from imbalanced datasets, as it is less influenced by the majority class than accuracy. There are three averaging techniques used with the F1-score: macro, micro, and weighted. The macro technique gives equal weight to all classes, making it suitable for balanced datasets, while the micro technique works by dividing the sum of the diagonal cells of the confusion matrix by the sum of all cells. The weighted technique accounts for class imbalance by computing the average of binary metrics weighted by the number of samples of each class in the target (Pedregosa et al., 2011). The F1-score is defined in Equation 2 (Han et al., 2022) and was calculated using the f1_score function in the Sklearn (Pedregosa et al., 2011) Python library.

$$F1 = 2 \times \frac{Precision \times Recall}{Precision + Recall} \tag{2}$$

Step 4: Picking the best classifier. The algorithm in each of the shallow and deep learning categories that gave the best performance metrics was selected.
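The difference between accuracy and the macro, micro, and weighted F1 averages can be seen in a small worked example. The sketch below is a plain-Python illustration with made-up labels, not the Sklearn implementation used in the study; the function names are hypothetical.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN), equivalent to 2PR / (P + R)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def f1_average(y_true, y_pred, average="macro"):
    labels = sorted(set(y_true) | set(y_pred))
    counts = {c: {"tp": 0, "fp": 0, "fn": 0} for c in labels}
    for t, p in zip(y_true, y_pred):
        if t == p:
            counts[t]["tp"] += 1
        else:
            counts[p]["fp"] += 1  # predicted class gains a false positive
            counts[t]["fn"] += 1  # true class gains a false negative
    if average == "micro":
        totals = {k: sum(c[k] for c in counts.values()) for k in ("tp", "fp", "fn")}
        return f1_from_counts(**totals)
    scores = {c: f1_from_counts(**counts[c]) for c in labels}
    if average == "macro":
        return sum(scores.values()) / len(labels)
    if average == "weighted":
        support = Counter(y_true)
        return sum(scores[c] * support[c] for c in labels) / len(y_true)
    raise ValueError("average must be 'macro', 'micro', or 'weighted'")

# Made-up, imbalanced toy labels (not results from this study).
y_true = ["calling"] * 6 + ["courtship"] * 2 + ["aggression"] * 2
y_pred = ["calling"] * 5 + ["courtship"] * 3 + ["aggression", "calling"]
```

On this toy data the minority aggression class has the lowest per-class F1, which pulls the macro average below the accuracy, while the micro average coincides with accuracy for single-label multi-class problems.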
Step 5: Feature importance, feature selection, and re-training. Feature importance helps us understand which features have the most influence on the model's predictions (Casalicchio et al., 2019). With deep learning, the model automatically locates important features (Liu et al., 2021), and therefore determining feature importance is not easy. Nonetheless, some shallow machine learning algorithms, such as random forest and support vector machine, can provide feature importance. Therefore, in this study, the best-performing shallow machine learning algorithm was selected to perform feature importance and feature selection based on the manually extracted features that were used to train the algorithm. Feature importance was obtained using the inbuilt feature_importances_ attribute in the Sklearn (Pedregosa et al., 2011) Python library. The list of important features is derived using Gini impurity (Disha and Waheed, 2022), defined in Equation 4, where $P_p$ refers to the fraction of positive samples and $P_n$ refers to the fraction of negative samples out of the total number of samples.

$$\text{Gini Impurity} = 1 - \left(P_p^2 + P_n^2\right) \tag{4}$$
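For the two-class case, Gini impurity in its standard form is 1 - (P_p^2 + P_n^2), with P_p and P_n the fractions of positive and negative samples. The short sketch below evaluates it for a few hypothetical node compositions; the function name and the sample counts are illustrative only.

```python
def gini_impurity(n_positive, n_negative):
    """Two-class Gini impurity: 0 for a pure node, 0.5 for an even split."""
    total = n_positive + n_negative
    if total == 0:
        return 0.0  # assumption: an empty node is treated as pure
    p_p = n_positive / total
    p_n = n_negative / total
    return 1.0 - (p_p ** 2 + p_n ** 2)

# Hypothetical node compositions as (positive, negative) sample counts.
pure_node = gini_impurity(10, 0)
even_node = gini_impurity(5, 5)
skewed_node = gini_impurity(3, 1)
```

A split on a given feature is considered useful when it lowers the weighted impurity of the child nodes; tree-based importance scores accumulate these reductions per feature.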
Feature selection involves selecting the model inputs that best inform the model's prediction. Selecting the most relevant features helps improve model performance by reducing overfitting. Irrelevant or noisy features introduce unnecessary complexity to the model, leading to poor generalization to unseen data. Therefore, feature selection enables the model to focus on the most informative features, resulting in lower variance and improved predictive performance. It also reduces the computational cost of training and making predictions because fewer features are used. This study employed feature selection to identify a subset of the features that were ranked using feature importance. The algorithm was then retrained with the selected features.
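One way to picture importance-based feature selection is an incremental loop over the ranked features: train on the top k, record the score, add the next feature, and repeat. The sketch below is a generic illustration, not the study's procedure in code. The `evaluate` callback is a hypothetical stand-in for retraining and scoring a model, the starting point of six features follows the incremental evaluation reported in the results, the last two feature names are placeholders, and the toy scores are invented.

```python
def incremental_feature_selection(ranked_features, evaluate, start=6):
    """Score the top-k ranked features for k = start..len and keep the best subset."""
    results = []
    for k in range(start, len(ranked_features) + 1):
        subset = ranked_features[:k]
        results.append((k, evaluate(subset)))
    best_k, best_score = max(results, key=lambda r: r[1])
    return ranked_features[:best_k], best_score, results

# Ranked feature names; the first six follow the ranking reported in this paper,
# the remaining two are hypothetical placeholders.
ranked = ["humidity", "temperature", "C#", "mfcc11", "mfcc10", "D", "mfcc1", "A"]

def toy_evaluate(subset):
    """Invented scoring rule for illustration: each extra feature costs 0.01."""
    return 0.95 - 0.01 * (len(subset) - 6)

selected, best_score, results = incremental_feature_selection(ranked, toy_evaluate)
```

In a real run, `evaluate` would retrain the classifier on the given columns and return a held-out metric such as the F1-score.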

2.4 Deployment: the decision support system
For the farmer to interact with the system, and get insights (i.e., aggression, courtship, or calling) that inform the health status and activities of the insects under production, it was necessary to build and deploy the intelligent system (with the best shallow/deep machine learning pre-trained model at the back-end) on the web for universal access.The web application was developed on Django (Django, 2023) and deployed on a virtual machine instance in icipe's virtual private network on the cloud.
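The kind of advice such a system could return can be sketched as a small rule function that combines the predicted song class with the current sensor readings. This is a hypothetical illustration and not the deployed Django application: the function name, messages, and default thresholds are assumptions, with the ranges chosen only as placeholders loosely based on the conditions under which songs were recorded in this study.

```python
def advise(song_label, temperature_c, humidity_rh,
           temp_range=(22.5, 25.5), humidity_range=(35.0, 55.0)):
    """Return a list of advisory messages for the farmer."""
    messages = []
    t_lo, t_hi = temp_range
    h_lo, h_hi = humidity_range
    if temperature_c < t_lo:
        messages.append("Temperature below target range: consider heating.")
    elif temperature_c > t_hi:
        messages.append("Temperature above target range: consider cooling.")
    if humidity_rh < h_lo:
        messages.append("Humidity below target range: consider humidifying.")
    elif humidity_rh > h_hi:
        messages.append("Humidity above target range: consider ventilating.")
    if song_label == "aggression":
        messages.append("Aggression song detected: check for fighting males.")
    return messages or ["Conditions within target ranges: no action needed."]

# Example calls with made-up readings.
ok_advice = advise("calling", 24.0, 45.0)
alert_advice = advise("aggression", 21.0, 60.0)
```

Keeping the thresholds as parameters would let them be tuned per farm once reliable optimal ranges are established.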

3.1 Class labels
The three call types (i.e., aggression, calling, and courtship) of the Gryllus bimaculatus cricket species were observed (based on the video recordings noted in Section 2.2, Step 1).

3.2 Behavior of crickets and time
Figure 5 shows the singing behavior of crickets observed in 24 hours over the period of data collection. Calling songs were distributed across the hours of the day and on all the days. Generally, courtship and aggression songs were observed to occur on specific days.

3.3 Effect of temperature and humidity on cricket songs
The lowest and highest temperatures were 20.2 and 25.5 °C, while the lowest and highest humidity recorded were 28.3 and 59.4% RH respectively, throughout data collection under natural conditions. Figure 6 shows how the different cricket songs change under the recorded temperature conditions. The songs were recorded at temperatures from above 22.5 °C to the maximum recorded temperature, i.e., 25.5 °C (as shown in Table 3); a range of approximately 3 °C. Moreover, Figure 6 shows that most courtship and calling songs were recorded between a humidity of 35 and 55% RH. Most aggression songs were recorded between 30 and 55% RH and were generally evenly distributed. Generally, songs were concentrated around approximately 45% RH.

3.4 Training and evaluating machine learning algorithms
The results of training RF, SVM, and KNN with both single and combined features as well as CNN architectures with spectrogram features are reported below:

FIGURE 5
Different behavior recorded over 24 h in 13 days. A courtship song is produced by male crickets toward female crickets to initiate mating. A calling song is produced by male crickets to attract the attention of female crickets. An aggression song is produced by male crickets toward other male crickets in the fight for territory or females.

3.4.1 Single features in shallow learning algorithms
Table 4 summarizes the performance (accuracy and F1-score) of the SVM, KNN, and RF machine learning algorithms trained with single features fused with temperature and humidity variables. The best classifier was RF, with an accuracy of 0.9277 and an F1-score of 0.9394 when trained on chroma features.

3.4.2 Combined features in shallow learning algorithms
The merged chroma+MFCC (25 features), chroma+LFCC (25 features), and MFCC+LFCC (26 features) sets, fused with temperature and humidity variables, were used to train the SVM, KNN, and RF machine learning algorithms, and the results are recorded in Table 4. The best classifier was RF when trained on chroma+MFCC, with accuracy and F1-score of 0.9441 and 0.9463 respectively.

3.4.3 Spectrograms in deep learning algorithms
The extracted spectrograms and the weather data were injected and concatenated into the pre-trained models, and the results were recorded in Table 5. The best pre-trained model was ResNet152V2, with accuracy and F1-score of 0.9942 and 0.9854 respectively.
3.5 Selecting the best classifier, feature importance and feature selection
The results of training SVM, KNN, RF, and pre-trained models with single and combined features and spectrograms that were fused with weather variables (temperature and humidity) are documented in Sections 3.4.1, 3.4.2, and 3.4.3. From the results, the deep learning pre-trained models performed better than the shallow machine learning models. ResNet152V2 was the best model among all the models trained, with a classification F1-score and accuracy of 0.9854 and 0.9942 respectively. Among the shallow machine learning algorithms, RF was the best classifier when trained with chroma+MFCC, temperature, and humidity features. Figure 7 illustrates the features in their order of importance obtained from RF as the best classifier. In that order, starting from the 6th feature, the other variables were added incrementally and trained with the RF algorithm; the resulting F1-score and accuracy are summarized in Figure 8. The top 6 features were humidity, temperature, C#, mfcc11, mfcc10, and D.

3.6 Deployment: the decision support system architecture
Figure 9 gives the overall architecture of the developed decision support system, which was deployed (accessible at icipe, 2023b) on a Kubernetes (Kubernetes, 2023) orchestration system. Generally, the system had three main components: data collection done by the IoT sensors, the back end that processes the collected data, and the front end for visualizing the results. These components are discussed in detail in Section 4.6.

Class labels
The class labels (calling, courtship, and aggression) of cricket songs were similar to the categories of cricket songs reported by other researchers (Alexander, 1961; Miyashita et al., 2016; Lin and Hedwig, 2021). Based on the distribution of those classes, Sections 2.2 (Step 8) and 3.1 identified the class imbalance problem and treated it using SMOTE, to avoid engineering a biased machine learning model that would learn the majority class better than the minority ones. After resolving the class imbalance problem, we anticipate an equal/unbiased interpretation of the classes, leading to a better performance of the model (Wang et al., 2016; Deng et al., 2022). This study recorded more calling songs than courtship and aggression songs. Similarly, Doherty (1985) recorded more calling songs of the Gryllus bimaculatus de Geer cricket species and noted that the calling songs are more important than the others since they trigger recognition and elicit phonotaxis (movement toward males) in female crickets. This could be a natural behavior for the survival of the crickets (males calling the females to mate). Generally, the courtship songs are linked to mating, and the aggression songs are linked to the fighting behavior of the crickets. The calling songs are linked to oviposition, i.e., the males calling and the females laying eggs: calling songs attracted the females toward the males for mating.

Behavior of crickets and time
As noted in the previous section, crickets call to attract possible mates. A calling song was produced in all instances of cricket pairing highlighted in Table 1. Section 3.2 stated that the calling songs were produced throughout the 24 hours of the day. Table 1 indicates the days on which the crickets were paired in the rearing area; comparing these with the calling songs, this study notes that in all pairs the males produced calling songs throughout the day. This was important to attract the females for courtship, mating, and reproduction.
In Section 3.2, it was evident that the courtship song was produced only when the male was paired with a female cricket. This song was produced on days 0, 1, 2, 3, 8, 9, and 10, when a male was paired with a female (Table 1), since the males produce it to initiate the mating process with the females. In the same section, we observed that the aggression song was produced on days 3, 4, 11, and 12, when a male was paired with another male (Table 1). This confirms that the aggression song was produced by males toward other males when fighting for dominance or territory.

Effects of temperature and humidity on cricket songs
As stated in Section 3.3, the distribution of courtship, calling, and aggression songs was within a temperature range of 22.5 to 24.5 °C. This study shows that temperature and humidity were quite influential on the type of songs produced by the [...] 2018), who found that moderate/high temperature and humidity increased crickets' chirping rate. This also coincides with the findings of Doherty (1985), who found that crickets produced more calling songs within a temperature range of 15 to 35 °C. Moreover, Niemelä et al. (2019) found that crickets' behavior was temperature-dependent, since an increase in temperature increased their ability to express their behavior and vice versa. Insects' behaviors are temperature- and humidity-dependent (Holmes, 2010; Ogah et al., 2012; Tochen et al., 2016; Niemelä et al., 2019). This study is confident that the automatic synthesis of crickets' songs, temperature, and humidity using the machine learning approaches described in Section 2 can inform farmers of crickets' health (e.g., growth and reproduction rate; Ogah et al., 2012) and therefore increase their production.

Training and selecting the best classifier
Different machine learning algorithms were trained with single, combined, and spectrogram features. The results are highlighted in Sections 3.4.1, 3.4.2, and 3.4.3. The results from the combined features show that the combination of chroma+MFCC had the best performance when trained on RF. With the spectrograms, ResNet152V2 was the best classifier. Generally, image (spectrogram) features trained on deep learning algorithms had a better performance than numerical features (chroma, MFCC, LFCC, and their combinations) trained on shallow machine learning algorithms. This suggests that deep learning models perform better than shallow machine learning when voluminous data is available to train the algorithms. Therefore, this study considered ResNet152V2 as the best classifier, and it was selected for deployment in the decision support system. Since deep learning models do not provide a mechanism for understanding which variables contribute most to the model performance, we considered the best-performing shallow learning classifier (i.e., RF) to further our understanding of feature importance and feature selection.

FIGURE Deployment of the decision support system. Every 5 min, the IoT device captures and transmits 3-second .wav files, temperature, and humidity data to the virtual private network through a built-in WIFI module. Thereafter, every 5 min, the system's back end fetches the latest records (.wav files, temperature, and humidity), preprocesses them, and passes them to the pre-trained machine learning engine for prediction, and the results are rendered on the front-end dashboard.
Feature importance and feature selection

In Section 3.5, it is seen that temperature and humidity contributed considerably to the singing behavior of the crickets. Temperature affects the crickets' ability to express their behavior (including chirping), i.e., low temperature leaves insects with few/stagnated behaviors and vice versa. The rate at which the crickets chirp is equally influenced by moisture levels: higher humidity makes crickets chirp more. Overall, temperature and humidity serve as key environmental factors that shape the physiology, behavior, and ecological interactions of insects. In this research, perhaps the RF machine learning algorithm deciphered the influence of temperature and humidity on the songs produced by the crickets and ranked them as highly significant parameters.
The 12-tonal chroma variations of the cricket songs can be understood from the relationship between the octaves and the frequency. Gryllus bimaculatus produces calling songs within a frequency range of 4.7-5.7 kHz (Miyashita et al., 2016; Lankheet et al., 2017), while the courtship song is produced within a frequency range of 15-20 kHz (Miyashita et al., 2016). All these frequencies can be represented in the chroma octaves, as in Table 6. Section 3.5 outlined that the C# and D chroma tonal features were ranked as highly significant by the RF algorithm. Looking at the calling (4,700-5,700 Hz) and courtship (15,000-20,000 Hz) song frequencies in Table 6, this study notes that they are (close to) equivalent to those of C# and D, and are indeed part of the target variables (calling, courtship, and aggression) studied. Moreover, C# and D are close on the musical tonal scale; therefore, the cricket song tones are close. Hypothetically, the lower-ranked tones, such as A# and B, are far from the favorable tones (C# and D) on the musical tonal scale, and that could be the reason why they were ranked very low, as shown in Table 6. Regarding the MFCC features, this research has not come across any study that explains how the features are organized or gives the respective meanings of the custom column headers used in this research. This study was therefore unable to give a scientific explanation for the model's selection of mfcc11 and mfcc10 as highly ranked variables.
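As a rough illustration of how a frequency maps onto one of the 12 chroma classes, the sketch below uses the standard equal-temperament relation with A4 = 440 Hz. The function name `pitch_class` and the tuning reference are assumptions made for illustration; they are not taken from the study's feature-extraction code.

```python
import math

# The 12 chroma (pitch) classes, starting from C.
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_class(freq_hz, a4_hz=440.0):
    """Return the nearest equal-tempered pitch class for a frequency in Hz.

    Assumes A4 = 440 Hz tuning (an illustrative assumption).
    """
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    # MIDI note number: 69 is A4; each semitone is a factor of 2**(1/12).
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    return PITCH_CLASSES[midi % 12]


print(pitch_class(440.0))   # A
print(pitch_class(4700.0))  # D
```

Because chroma folds all octaves onto a single one, two frequencies that are an octave apart fall into the same class.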
In Section 3.5, the F1-score and accuracy of the RF algorithm evaluated with the 6 variables improved from 94.63% and 94.41% to 95.63% and 95.37%, respectively, compared to what was reported in Table 4. This study notes that RF had consistent results of approximately 95% F1-score while the other algorithms did not, meaning that RF was stable and dependable compared to the others. Moreover, the less significant features added to the model iteratively perhaps continuously decreased the performance of the RF algorithm. Therefore, 6 features gave the best performance of the model, up to 95.63% F1-score and 95.37% accuracy.
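The incremental procedure discussed above can be sketched as a simple loop over the importance ranking. The `score_fn` callback stands in for training and evaluating a classifier (e.g., RF) on a feature subset; it and the helper name `best_feature_count` are hypothetical, and the sketch is not the study's implementation.

```python
def best_feature_count(ranked_features, score_fn, start=6):
    """Grow the feature set along the importance ranking and keep the best size.

    ranked_features: feature names sorted by descending importance.
    score_fn: callable taking a list of feature names and returning a score
              (e.g., the F1-score of a classifier trained on those features).
    start: size of the first subset to evaluate.
    """
    if not 1 <= start <= len(ranked_features):
        raise ValueError("start must be between 1 and the number of features")
    best_k, best_score = None, float("-inf")
    for k in range(start, len(ranked_features) + 1):
        score = score_fn(ranked_features[:k])
        # Strict '>' keeps the smallest subset when scores tie.
        if score > best_score:
            best_k, best_score = k, score
    return ranked_features[:best_k], best_score
```

If every added feature lowers the score, the function returns the first `start` features, which mirrors the situation described in the text.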
Deployment: the decision support system

The pre-trained model discussed in the previous section was integrated at the back end of a web-based application, as shown in Section 3.6. The main components, namely the Internet of Things device, the front end, and the back end, are discussed below.

Internet of Things (IoT) device
The IoT device implemented in Section 3.6 was built on a Raspberry Pi (RaspberryPi, 2023) to capture and record sound, temperature, and humidity using a microphone and temperature and humidity sensors, respectively. Although temperature and humidity were recorded every hour during the experiments (stated in Section 2.1), in the deployed setup the timestamp, temperature, humidity, and sound (in the form of a 3-second .wav file) were captured every 5 min. The timestamp, temperature, and humidity were recorded in a CSV file stored in the virtual machine. The name of the .wav file was appended with the timestamp, temperature, and humidity values separated by a # symbol. A wireless fidelity (WIFI) module was integrated into the IoT device. Through an internet connection, the files were then transferred to a virtual machine instance in icipe's virtual private network using the file transfer protocol (FTP). The IoT device was also designed to store data on its local storage when there is no access to the internet and to synchronize the data to the cloud servers once the internet becomes available. Each IoT device transferred data to and updated a specific folder (whose name was the unique identification number of a farmer) in the virtual machine. Logically, each folder contained the data of a specific farmer, and the back-end and front-end web scripts were tailored to read and process data from those folders based on the unique identification number assigned to the farmer.
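A minimal sketch of the file-naming scheme described above, assuming the fields appear in the order timestamp, temperature, humidity and that the timestamp uses a `%Y%m%dT%H%M%S` format. The field order, the timestamp format, and the helper names are all assumptions; the text only states that the values are separated by a # symbol.

```python
from datetime import datetime

TS_FORMAT = "%Y%m%dT%H%M%S"  # assumed timestamp format


def build_wav_name(timestamp, temperature, humidity):
    """Encode the metadata in a .wav file name, fields separated by '#'."""
    return f"{timestamp.strftime(TS_FORMAT)}#{temperature}#{humidity}.wav"


def parse_wav_name(name):
    """Recover (timestamp, temperature, humidity) from a file name."""
    stem = name[:-4] if name.lower().endswith(".wav") else name
    ts, temp, hum = stem.split("#")
    return datetime.strptime(ts, TS_FORMAT), float(temp), float(hum)


name = build_wav_name(datetime(2023, 5, 1, 12, 0, 0), 23.5, 61.0)
print(name)  # 20230501T120000#23.5#61.0.wav
```

Keeping the metadata in the file name lets a clip be interpreted on its own, even if the accompanying CSV file has not yet been synchronized.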

Back end
The pre-trained model was saved as a .h5 file and stored in a folder. The data (.wav sound clips) stored in the folders on the cloud servers were linked to each farmer's account. A function extracts the spectrogram features from the 1-second audio file and matches them with the respective temperature and humidity in the CSV metadata file (mentioned in Section 4.6.1). The spectrograms, temperature, and humidity are passed to the pre-trained model (loaded in Django memory), where each set of data goes through the pre-trained model pipeline for the necessary preprocessing and prediction. The predicted output is then passed to the front end and displayed on the farmer's dashboard; more details are discussed in the section below.
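Matching an audio clip to its weather record can be sketched as a lookup on the timestamp. The column names (`timestamp`, `temperature`, `humidity`) and the function name are assumed for illustration; the layout of the actual CSV metadata file is not given in the text.

```python
import csv
import io


def lookup_weather(csv_text, timestamp):
    """Return (temperature, humidity) for the row whose timestamp matches.

    Returns None if no row matches. Column names are assumptions.
    """
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["timestamp"] == timestamp:
            return float(row["temperature"]), float(row["humidity"])
    return None


sample = "timestamp,temperature,humidity\n20230501T120000,23.5,61.0\n"
print(lookup_weather(sample, "20230501T120000"))  # (23.5, 61.0)
```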

Front end
The web application (icipe, 2023b) consists of the home page, which has links to the frequently asked questions (FAQ) page and the log-in interface. Details of these pages are discussed below:
a) The Home Page: This page contains a narration of the overall function of the system.
b) The FAQ Page: This page is intended to contain frequently raised questions about the functioning of the system, together with appropriate answers.
c) Log-in Page: At the log-in interface, the farmer is prompted to provide their username and password to log into the system. The log-in interface also contains functionality for resetting passwords.
d) Personal Information, Dashboard, and Notification Pages: On successful log-in, users are taken to the overview page, where they can (i) interpret the activities happening in their cricket farm on a dashboard, (ii) go to the account page to edit their personal information, and (iii) go to the notifications page to see the raised alerts; the alerts are similar to what the farmer receives on their registered email in real time.

To interpret the activities happening in the cricket farm in real time (every 5 min), the front-end system asynchronously passes (by calling an application programming interface micro-service) the farmer's unique identification to the back end for processing and thereafter waits for feedback. The feedback, i.e., the temperature, humidity, and predicted label (calling, aggression, or courtship), is displayed on a dashboard. The dashboard's rule-based results for temperature and humidity were interpreted with colors: blue means low conditions, green means suitable conditions, and red means high conditions. Literature provided different optimal temperature and humidity ranges that crickets prefer. For instance, Busvine (1955) [...] These findings offer initial insights, prompting farmers to adjust temperature and humidity through measures such as cooling/heating or de/humidifying the rearing area. Conversely, the machine learning insights delve deeper. For instance, negative alerts, such as aggression, prompt the farmer to immediately check the cricket-rearing area, and actions like chasing aggressors can be taken. Positive alerts like calling and courtship indicate good cricket health and growth. With this information, farmers can enhance cricket production as a viable human food source, thereby contributing to alleviating food insecurity.
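The color coding of the dashboard described above can be expressed as a small rule. The threshold values below are placeholders chosen only for illustration, not the ranges used by the system, and the function name is likewise hypothetical.

```python
def condition_color(value, low, high):
    """Map a reading to a dashboard color.

    blue  -> below the suitable range (low conditions)
    green -> within the suitable range
    red   -> above the suitable range (high conditions)
    """
    if low > high:
        raise ValueError("low must not exceed high")
    if value < low:
        return "blue"
    if value > high:
        return "red"
    return "green"


# Placeholder ranges (assumptions, not values from the study).
TEMP_RANGE_C = (22.0, 30.0)
HUMIDITY_RANGE_PCT = (50.0, 70.0)

print(condition_color(18.0, *TEMP_RANGE_C))        # blue
print(condition_color(60.0, *HUMIDITY_RANGE_PCT))  # green
print(condition_color(35.0, *TEMP_RANGE_C))        # red
```

Passing the range as parameters keeps the rule independent of any particular published optimum, so the thresholds can be changed without touching the logic.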

Conclusion
An experimental setup was created to collect the humidity, temperature, video, and sound signals of the edible cricket, Gryllus bimaculatus. Chroma, Mel-Frequency Cepstral Coefficients (MFCC), Linear Frequency Cepstral Coefficients (LFCC), chroma+MFCC, chroma+LFCC, MFCC+LFCC, and spectrograms were extracted from the sound signals. The numerical features were fused with the weather (temperature and humidity) variables and trained on the support vector machine, random forest, and k-nearest neighbors machine learning algorithms. The spectrogram features fused with temperature and humidity were trained on pre-trained CNN deep learning models (EfficientNetB4, VGG16, and ResNet152V2). Using machine learning, this study affirms that temperature and humidity highly influence the behavior (chirping) of crickets. Moreover, the frequencies associated with the highly ranked C# and D chroma features during calling and courtship were also identified. This shows that machine learning was able to identify natural processes associated with insect behavior. Furthermore, we deployed the pre-trained ResNet152V2 deep learning model at the back end of a web-based decision support system. The system collected data in real time (every 5 min), and farmers were informed of the predicted output (calling, aggression, or courtship) on the dashboard and notified appropriately. Thereafter, a farmer can put in place appropriate measures (such as cooling, humidifying, getting rid of aggressors, etc.) to avoid the loss of the crickets and improve production. This decision support system can be fine-tuned further (by considering more cricket phenomena labels and wider temperature and humidity ranges) and adopted by cricket farmers to improve the production of crickets as food for humans and contribute to alleviating food insecurity.
was recorded for 2 days before a different cricket pairing was replicated.

FIGURE The pre-processing steps for the behavioral and weather data of the edible cricket Gryllus bimaculatus involved several stages, namely extracting cricket sound signals from the videos, removing noise from the extracted sound signals, annotating/labeling the sound signals, splitting sound signals, extracting features, class balancing, label encoding, and feature scaling.
library. b) Chroma: Chroma features represent audio signals in 12 tonal variations (C, C#, D, D#, E, F, F#, G, G#, A, A#, and B). The 12 pitches indicate the amount of energy in each pitch class present in the signal.

FIGURE A summary of the machine learning steps: (1) splitting the machine learning-ready data, (2) training various classification algorithms, (3) testing and evaluating the model performance, (4) picking the best classifier algorithm evaluated on test data, (5) ranking features in order of their importance based on the best classifier, (6) selecting the number of features that best predict the cricket songs, (7) re-training the model with the selected features, and (8) deploying the pre-trained model as a .h5 file.

FIGURE The general architecture of the ResNet152V2, EfficientNetB4, and VGG16 pre-trained models. The numerical data (i.e., temperature and humidity) layer was injected and merged in the pre-trained deep learning model.

FIGURE An illustration of how the cricket songs vary with the different temperature and humidity values recorded.

FIGURE A graphical summary of the variables' contribution to the performance of the RF classifier in descending order. The top 6 variables were humidity, temperature, C#, mfcc11, mfcc10, and D.

FIGURE A graphical representation of the RF classifier performance, evaluated with accuracy and F1-score, as the number of variables was increased incrementally.
TABLE A summary of data collection dates and the cricket pairing over the period of data collection.

TABLE A description of all the variables contained in the final metadata and used for exploratory data analysis.
TABLE A summary of the performance of the single and combined features trained on the various algorithms and evaluated on accuracy and F1-score.
TABLE A summary of the performance of the CNN architectures.