A reliable approach for identifying acute lymphoblastic leukemia in microscopic imaging

Makem, Mimosette; Tamas, Levente; Bușoniu, Lucian

doi:10.3389/frai.2025.1620252

ORIGINAL RESEARCH article

Front. Artif. Intell., 17 July 2025

Sec. Medicine and Public Health

Volume 8 - 2025 | https://doi.org/10.3389/frai.2025.1620252

This article is part of the Research Topic: Artificial Intelligence-based Multimodal Imaging and Multi-omics in Medical Research

Mimosette Makem¹

Levente Tamas²*

Lucian Bușoniu²

¹Signal, Image, and Systems Laboratory, Department of Medical and Biomedical Engineering, HTTTC EBOLOWA, University of Ebolowa, Ebolowa, Cameroon
²Department of Automation, Technical University of Cluj-Napoca, Cluj-Napoca, Romania

Leukemia is a deadly disease, and the patient’s recovery rate is very dependent on early diagnosis. However, its diagnosis under the microscope is tedious and time-consuming. The advancement of deep convolutional neural networks (CNNs) in image classification has enabled new techniques in automated disease detection systems. These systems serve as valuable support and secondary opinion resources for laboratory technicians and hematologists when diagnosing leukemia through microscopic examination. In this study, we deployed a pre-trained CNN model (MobileNet) that has a small size and low complexity, making it suitable for mobile applications and embedded systems. We used the L1 regularization method and a novel dataset balancing approach, which incorporates HSV color transformation, saturation elimination, Gaussian noise addition, and several established augmentation techniques, to prevent model overfitting. The proposed model attained an accuracy of 95.33% and an F1 score of 0.95 when evaluated on the held-out test set extracted from the C_NMC_2019 public dataset. We also evaluated the proposed model by adding zero-mean Gaussian noise to the test images. The experimental results indicate that the proposed model is both efficient and robust, even when subjected to additional Gaussian noise. The comparison of the proposed MobileNet_M model’s results with those of ALNet and various other existing models on the same dataset underscores its superior efficacy. The code is available for reproducing the experimental results at https://tamaslevente.github.io/ALLM/.

1 Introduction

Leukemia is a serious type of blood cancer marked by the uncontrolled and excessive generation of abnormal and immature white blood cells within the bone marrow. According to the Leukemia and Lymphoma Society (LLS, n.d.), between 2013 and 2017, leukemia was the sixth leading cause of cancer deaths in males and the seventh in females in the United States. In 2023, an estimated 13,900 males and 9,810 females were expected to die from leukemia in the US (Blood Cancer CH, 2023). The 2020 report from the World Health Organization estimated 1,342 deaths in Romania alone (Leukemia in Romania, 2020). These statistics show the deadly nature of leukemia. Nevertheless, early diagnosis of this disease improves the chances of recovery, particularly in children (Bain et al., 2017). Consequently, early and accurate identification of leukemia is crucial to lowering its death rates.

Usually carried out by laboratory technicians, blood specimen analysis under a microscope is a vital and reasonably priced method among several leukemia diagnosis techniques (Das et al., 2021; Makem and Tiedeu, 2020; Paiva et al., 2018; Walker et al., 1994; Dorfman et al., 2018; Alexander and Mullighan, 2021). However, this procedure requires technicians to perform visual analysis and leucocyte classification, a task that is both labor-intensive and time-consuming. Researchers have developed several computational image analysis techniques to overcome these challenges and diagnose leukemia using blood smear pictures. Traditional computerised image analysis methods for leukemia diagnosis usually consist of preprocessing, segmentation, feature extraction, and classification (Mohapatra and Patra, 2010; Putzu et al., 2014; Rawat et al., 2015; Singhal and Singh, 2014; Rawat et al., 2017; El Houby, 2018; Mohammed and Abdulla, 2021; Bodzas et al., 2020; Mishra et al., 2019a; Mishra et al., 2019b; Abdulla, 2020; Bhattacharjee and Saini, 2016). Consequently, the efficacy of each phase is contingent upon the efficacy of the preceding stage. Deep learning architectures have proven to be more efficient and accurate for disease detection than traditional methods. They learn and extract complex features directly from the images without a previous segmentation step. A solitary deep learning model can execute both feature extraction and classification tasks in various domains (Eralp and Sefer, 2024). One shortcoming of such models is, however, that they require a large amount of data to yield good performance. We can address the lack of extensive datasets by implementing a transfer learning-based methodology (Das and Meher, 2021).

In this study, we adapted the MobileNet architecture (Howard et al., 2017) to identify the presence of acute lymphoblastic leukemia (ALL) in a collection of microscopic blood smear pictures. Global average pooling, dropout, batch normalization, and dense layers are used to modify the MobileNet architecture to enhance its ability to differentiate between normal and leukemic blood cells. We applied L1 regularization to improve the model’s generalization. In addition, a new dataset augmentation process was used, involving Gaussian noise and existing augmentation techniques. The principal contributions of this study are as follows:

1. The HSV color space is used, with the saturation removed and additional Gaussian noise;

2. The base MobileNet architecture performs compression to detect ALL;

3. Transfer of classification knowledge learnt on the ImageNet dataset to the acute lymphoblastic leukemia classification task;

4. Development of an efficient model that is robust to small Gaussian noise;

5. Evaluation of the classification efficacy of the suggested acute lymphoblastic leukemia detection against contemporary methodologies.

The remainder of the work is structured as follows. Section 2 presents related works, followed by the proposed technique in Section 3. Section 4 describes experimental validation and discussion. Section 5.2 delineates conclusions and future work.

2 Related work

2.1 Traditional methods

Over the years, researchers have developed traditional methods for diagnosing lymphoblastic leukemia. These methods focused mainly on distinguishing normal (healthy) leukocytes from abnormal (lymphoblast) leukocytes. Mohapatra and Patra (2010) suggested a framework for the identification of acute leukemia. This approach begins by applying a selective filter to the blood smear picture and transforming the resultant image into the L*a*b color space (Putzu et al., 2014). The K-means algorithm is subsequently employed on the transformed image to isolate the white blood cell nucleus from the other elements. Each white blood cell nucleus is segmented into a sub-image. Subsequently, shape and texture features are extracted from the nucleus. The SVM classifier categorizes nucleus pictures as healthy or leukemic based on the extracted features. The algorithm was evaluated on 108 blood smear images collected at Ispat General Hospital in Rourkela, Odisha, and at the University of Virginia. The authors reported a lymphoblast detection accuracy of 95%.

Putzu et al. (2014) also created a method for identifying and categorizing white blood cells in pictures of blood smears. The original RGB image is transformed into grayscale and CMYK color spaces, followed by the application of enhancement techniques, including histogram equalization and linear contrast stretching, to increase the image quality. The Zack algorithm is employed for thresholding to segment white blood cells. The morphological opening operator was employed to eliminate the residual undesirable objects. Ultimately, 30 morphological variables, four chromatic features, and 16 textural features were retrieved from the nucleus and cytoplasm regions to categorize cells as normal or aberrant via an SVM classifier and cross-validation. Putzu et al. (2014) suggested an approach that achieved an accuracy of 93% for 33 images from the ALL-IDB1 database (Mondal et al., 2021) acquired under the same conditions. The main limitation of this approach is that images acquired with different cameras and different lighting conditions are not considered.

Rawat et al. (2015) suggested an approach to differentiate lymphoblastic cells from healthy lymphocytes. Their method involved initially segmenting the leukocytes from other blood cells, followed by the separation of the isolated leukocytes into their nucleus and cytoplasm components. Subsequently, texture features from the gray-level co-occurrence matrix and shape features are extracted from the nucleus and cytoplasm areas, respectively. The extracted features were classified using a binary Support Vector Machine (SVM) to detect the presence of lymphoblast cells (leukemic cells). The method of Rawat et al. (2015) was tested on 196 images of the ALL-IDB2 database, and a classification accuracy of 89.8% was obtained. That method is therefore limited in terms of detection accuracy.

Singhal and Singh (2014) conducted a comparative investigation of the performance of two types of texture feature descriptors. In their study, the original RGB image is initially transformed into the HSI color space, followed by the extraction of the white blood cell nucleus with the use of a manual threshold on the S component. Two categories of texture feature descriptors, specifically the local binary pattern (LBP) and the gray-level co-occurrence matrix (GLCM), are used for feature extraction, followed by a polynomial kernel SVM classifier for classification. The ALL-IDB2 image database was used to evaluate the performance of this algorithm. The classification accuracy obtained with the LBP features was 93.84%, while the GLCM features yielded 87.30%. Although it achieves a detection accuracy of about 93%, this classification system is unsuitable for computer-aided diagnosis of leukemia because the threshold for leukocyte nucleus extraction is determined manually.

Rawat et al. (2017) proposed a hybrid hierarchical diagnostic support system that analyzes the blood smear image for rapid detection of acute lymphoblastic leukemia. This system not only distinguishes between healthy leukocyte cells and leukemia cells, but also categorizes ALL cells into subtypes. The leukocyte nucleus and cytoplasmic components are extracted from the original smear image in this system. The properties of texture, color, and form of the segmented nucleus and cytoplasm are retrieved for the classification of lymphoblasts. A PCA-based dimensionality reduction module is employed to decrease the size of the retrieved features. The classifiers SVM, KNN, PNN, ANFIS, and SSVM are employed in a hierarchical sequence for classification purposes. The computer aided diagnosis system of Rawat et al. (2017) evaluated with 260 images of ALL-IDB was able to classify cancerous leukocytes from healthy leukocytes while classifying cancerous leukocytes into subtypes according to the FAB classification with an average accuracy of 97.6%. This algorithm is powerful, but extremely slow to execute due to the number of components that need to be executed.

El Houby (2018) focused their interest on identifying healthy leukocytes from lymphoblast leukocytes. In their method, shape, texture, and color attributes were collected from nuclear regions, cytoplasm, and whole leukocytes respectively, previously segmented from other cells present in blood smear images and cropped into sub-images. Ant colony optimization (ACO) was employed to choose characteristics derived from the segmented cellular components to enhance classification performance. Classification was performed using decision tree (DT), K-nearest neighbor (K-NN), Naïve Bayes (NB), and support vector machine (SVM) algorithms. Their proposed classification system evaluated with 260 images of the ALL-IDB2 database showed classification accuracy of 96.25%, achieved with the DT classifier. The main shortcoming of this classification system is that it was tested with a small number of images.

Mohammed and Abdulla (2021) focused on the development of an automatic system to assist in the identification of ALL cells. The suggested methodology comprises two phases. The first phase emphasizes the segmentation of leukocytes. The second stage recovers characteristics such as shape, geometry, statistics, and discrete cosine transform from the segmented cells. They apply KNN, SVM, and NB to the retrieved characteristics to classify the segmented cells as either normal or pathological. This method’s efficacy was assessed using the ALL-IDB2 blood smear image database. The experimental results achieved a maximum accuracy of 97.45% with the SVM. Although performing well, this algorithm should be evaluated by considering a different image set for validation.

Bodzas et al. (2020) created an algorithm for leukemia identification, employing a three-tiered filtering procedure for the segmentation of the nucleus and cytoplasm of white blood cells. Additionally, the algorithm considered the detection and separation of agglomerated white blood cells. Sixteen shape and texture parameters were retrieved from the segmented regions to enable the classifier to differentiate between normal and diseased cells. The employed classifier was Support Vector Machine (SVM). The University Hospital in Ostrava’s Department of Haemato-Oncology supplied a private image collection for training and testing the system created by Bodzas et al. (2020). The image set consisted of 33 images, 18 of which were acquired from healthy people and 13 from people with ALL. According to the authors, their system achieved a classification accuracy of 96.72%. This algorithm requires an expansion of the database used for testing and training.

The traditional computer-vision methods presented so far focus mainly on segmentation and the extraction of specific features. However, nucleus and cytoplasm identification and extraction are challenging tasks due to the contrast variations across different types of leucocytes, which leads to low accuracy. Also, it is not trivial to extract a suitable feature set able to distinguish healthy leucocytes from cancerous leucocytes.

2.2 Deep learning methods

Recent proposals have emerged for the detection and categorization of ALL using deep learning approaches (Shah et al., 2021; Deshpande et al., 2020; Loey et al., 2020) due to the advancements in artificial intelligence and extensive data analysis. Vogado et al. (2018) used pretrained CNN models (AlexNet, VGG_f, and CaffeNet) for feature extraction and employed SVM for classification in an ALL detection system. The hybrid dataset, which included ALL-IDB, CellaVision, and leukocyte mixed databases, yielded an average accuracy of 99.20%. In order to identify white blood cells in blood smear images and categorize them as either leukemia or healthy, Di Ruberto et al. (2020) presented an algorithm. The white blood cell detection step employed the H and S color components, Otsu’s thresholding method, a new object detection technique, and the watershed method. The AlexNet convolutional network was used to extract 3 different feature vectors of white blood cells previously segmented and cropped into a sub-image. The retrieved feature vectors are classified using three distinct linear SVMs, and their results are amalgamated through a voting process. The classification system was tested on 33 images of the ALL-IDB2 database and an accuracy of 94.1% was obtained. This classification system is limited because it was built and assessed based on smear images captured with the identical camera and under uniform illumination circumstances.

Shafique and Tehsin (2018) employed the pre-trained deep convolutional neural network AlexNet to identify ALL and categorize it into its subtypes (L1, L2, and L3). This algorithm involved modifying the architecture of the AlexNet network by substituting the final three layers of the pre-trained model with a new fully connected layer containing 1,024 neurons, succeeded by a ReLU layer and an additional fully connected layer, wherein all units were connected to the output probabilities of two classes via the softmax function. For the classification of ALL into subtypes, the last fully connected layer with 2 output probability classes was changed to 4 output probability classes. The ALL-IDB2 database, comprising 260 smear images, was utilized for the assessment of their algorithm. The outcomes attained an average classification accuracy of 96.06%. In their study, noise removal from the original image was omitted, yet the presence of noise results in erroneous features and therefore degrades the performance of the classification system. In addition, only a limited number of images was considered for training and testing, which has a negative effect on the learning of the deep network.

Claro et al. (2020) developed two deep convolutional neural networks (DCNs) designated Alert Net-R and Alert Net-X to diagnose myeloid and acute lymphoblastic leukemia. The Alert Net-R network was designed by inserting residual structures similar to those of ResNet into the original Alert Net architecture. The Alert Net-X network was built using the technology implemented in Xception. In addition, a data augmentation procedure was employed to enhance the volume of the training dataset. The designed CNNs were trained and tested with 16 image databases including 2,415 images, and an overall classification accuracy of 97.23% was obtained. This accuracy is good, considering that it was obtained on a heterogeneous image database. However, the implementation requires substantial computational resources.

Hegde et al. (2019) assessed the impact of features derived from conventional image processing and from the pre-trained CNN AlexNet on a neural network classifier. In their study, the neural network classifier was used to classify abnormal and normal WBCs, and also to classify normal WBCs into their subtypes. Eralp and Sefer (2024), however, suggested a hybrid transfer learning approach in which ResNet18 and MobileNetV2 are hybridized based on their proposed weight factor. The ALL-IDB1 and ALL-IDB2 datasets were considered to evaluate the performance of the methods. According to the authors, when the dataset is divided into 50% training and 50% testing, the computer-aided system’s performance is limited.

Nabilah et al. (2020) proposed a comparison analysis of three pre-trained convolutional neural networks, specifically VGG, GoogleNet, and AlexNet, for the identification of acute lymphoblastic leukemia. That study showcased VGG as the best architecture based on its testing and training accuracy. The main limitation is that only a few image samples were considered in this study. In contrast, Mondal et al. (2021) developed a weighted ensemble classifier for ALL detection by applying transfer learning to pre-trained CNNs including VGG-16, Xception, MobileNet, InceptionResNet-V2, and DenseNet-121. The dataset C-NMC-2019 was utilized to train and evaluate the constructed model, yielding an average accuracy of 86.2%.

Ullah et al. (2021) adopted the efficient channel attention module to enhance the VGG-16 architecture’s ALL detection. The developed model was assessed using the C-NMC-2019 dataset and attained an accuracy of 91.1%, indicating a need for enhancement. Jawahar et al. (2022) introduced a CNN model called ALNett, which is founded on a depth-wise convolutional architecture. On the training folder of the C-NMC-2019 dataset, the ALNett model achieved an F1 score of 0.96 and a classification accuracy of 91.31%. Magpantay et al. (2022) developed a transfer learning approach utilizing YOLOv3 for the classification of ALL cells and normal cells. Only 300 images selected from the C-NMC-2019 dataset were considered in this study, and the model achieved a training accuracy of 97.2% and an mAP of 99.8% on testing images. This method thus used only a small fraction of a large dataset for training and testing the models. An effective and resilient leukemia diagnosis system, however, must provide results for a substantial volume of smear images, including those from alternative leukocyte datasets.

YOLOv5 was applied for the detection and counting of blood cells in Rohaziat et al. (2022). Priyanka et al. (Liu and Hu, 2022) proposed a model named LeuFeatx, an adapted, fine-tuned feature extractor model based on VGG16. LeuFeatx demonstrated promising performance both in the leukemia subgroup classification and the binary classification. The ALL-IDB2 dataset was utilized for binary classification, yielding an accuracy of 96.15%. Abhishek et al. (2025) proposed the fuzzification of pretrained convolutional neural networks with the Gompertz function; the developed methodology categorized blood smear pictures into five classes: AML, CML, ALL, CALL, and normal. Similarly, VGG16 and XceptionNet models were combined for the classification of four types of diabetic eye disease (Hasan et al., 2025). An Election-Based Chameleon Swarm algorithm was used with a multiscale adaptive and attention-based DCNN method for leukemia detection (Gokulkannan et al., 2024).

Existing methods for leukemia diagnosis based on transfer learning have not considered the complexity of the pre-trained CNN, including the parameter count. This leads to resource-intensive models that are unsuitable for embedded systems and computers with limited performance. To address this limitation, our proposed models are based on MobileNet (Howard et al., 2017), which has a smaller size and lower complexity, and is therefore suitable for mobile applications and embedded systems. Moreover, MobileNet has superior classification accuracy compared to alternative lightweight approaches.

3 Proposed methodology

The overall schematic of the proposed methodology is illustrated in Figure 1. The methodology has two stages. The first stage includes dataset processing, modification, and augmentation. The second stage performs transfer learning classification using MobileNet. The following subsections explain all the components in the figure.

Figure 1

Flowchart illustrating a deep learning process. It begins with a database, which undergoes preprocessing and splitting into training, validation, and test sets. The training set goes through modification and augmentation. A MobileNet architecture is modified and combined with ImageNet weights to create MobileNet_M. This model is validated and results in a trained model for prediction. Predictions classify into

Figure 1. Proposed methodology of leukemia diagnosis.

3.1 Dataset description

The pictures used in this paper are from the public ISBI 2019 database available on Kaggle (Gupta and Gupta, 2019). The ISBI 2019 database contains blood smear images with a resolution of 450 × 450 × 3 pixels, designed to distinguish leukemic B lymphoblast cells (ALL) from normal B lymphoid precursors (HEM). The database consists of a training set, a preliminary test set, and a final test set. The training set comprises 10,661 white blood cell images, categorized into 7,272 leukemic images (ALL) and 3,389 healthy images (HEM). These images are from 47 leukemia patients and 26 healthy individuals. The training set is divided into three separate folders. The preliminary test set contains 1,219 ALL and 648 HEM images extracted from the blood smears of 13 leukemia patients and 15 healthy individuals, respectively. The final test set includes 2,586 white blood cell pictures from nine patients with acute lymphoblastic leukemia and eight healthy individuals.

Figure 2 illustrates the preprocessing steps and modifications applied to the database, which we discuss next:

Figure 2

Flowchart of image processing steps. Images are cropped and resized, then split into train (70), validation (10), and test (20) sets. Gaussian noise is added with sigma values of 3 and 9. Further processing includes HSV conversion and saturation removal, resulting in HEM and ALL datasets. Final outputs include image augmentation.

Figure 2. Preprocessing steps of the proposed approach.

3.2 Modification of dataset

In this study, the three separate folds of the ISBI 2019 database training set, each of which is stratified into HEM and ALL classes, were independently preprocessed and split into training, validation, and test sets in proportions of 70, 10, and 20%, respectively. Table 1 displays the distribution of each fold. All images in the dataset underwent cropping and resizing. We executed the cropping from the center of the original image to preserve the entire white blood cell. We resized the resulting image to 224 × 224 × 3 to meet MobileNet’s input requirements. Table 1 reveals a class imbalance problem in the ISBI 2019 database. Indeed, the ratio of the number of ALL pictures to the number of HEM pictures is approximately 2 across all folders in the training set. According to Mayouf et al. (2020), a dataset is qualified as slightly unbalanced when this ratio is in the range [1.5, 3]. Class imbalance in a database biases the model generalization in favor of the majority class (Johnson and Khoshgoftaar, 2019; Kaope and Pristyanto, 2023). There are two ways to rebalance data sets (Orriols-Puig and Bernadó-Mansilla, 2009): the minority class is oversampled, or the majority class is undersampled. This study adopted the oversampling technique to lessen the disparity between the ALL and HEM classes in the training set for the three folders. This procedure was selected because, unlike the undersampling strategy, it does not diminish the size of the data collection. The oversampling process was based on converting each image of the HEM class of the training set into the HSV color space; then the saturation was removed from the resulting image. These two operations doubled the number of images in the HEM class, as shown in Table 2, without any duplication of information.
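
The class-balancing step described above can be sketched as follows. This is a minimal illustration rather than the authors’ code. It assumes that removing the saturation means setting the S channel to zero in HSV space and converting back to RGB, which leaves every pixel with R = G = B = V = max(R, G, B). The function names `remove_saturation` and `oversample_hem` are hypothetical, and the images are synthetic.

```python
import numpy as np

# Image counts of the ISBI 2019 training set, as reported in Section 3.1.
N_ALL, N_HEM = 7272, 3389
imbalance_ratio = N_ALL / N_HEM  # about 2.15, inside the [1.5, 3] range


def remove_saturation(rgb):
    """Return an RGB image whose HSV saturation is zero.

    Assumption: with S = 0, the HSV -> RGB conversion yields
    R = G = B = V, where V = max(R, G, B) of the original pixel.
    """
    value = rgb.max(axis=-1, keepdims=True)  # V channel of HSV
    return np.repeat(value, 3, axis=-1)      # gray image, same shape and dtype


def oversample_hem(hem_images):
    """Double the HEM class: originals plus their desaturated versions."""
    return list(hem_images) + [remove_saturation(img) for img in hem_images]


# Tiny synthetic example: two random 224 x 224 RGB "cells".
rng = np.random.default_rng(0)
hem = [rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8) for _ in range(2)]
balanced_hem = oversample_hem(hem)
```

Because each desaturated copy is derived from a different original, the HEM class grows by a factor of two without repeating any image verbatim, which brings the ALL-to-HEM ratio close to 1.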

Table 1

Table 1. Train, test, and validation distribution over the different batches denoted as folds.

Table 2

Table 2. Images in the HEM class before and after class balance.

3.3 Data augmentation

Prior research has demonstrated the beneficial effects of image augmentation on enhancing the characteristics and diversity of a dataset. We therefore apply the following augmentation procedures to our data. Gaussian noise with a mean of zero and standard deviations of 3 and 9 was injected in equal proportions into the training dataset for augmentation. This step aimed to build a model that is insensitive to Gaussian noise. The sigma values were selected experimentally. Other augmentation steps, namely rotation of 45 degrees, horizontal flip, brightness range [0.4, 0.8], width shift range (0.1), height shift range (0.1), zoom range [0.8, 1], and shear range (3), were applied to generate new images during the training phase. Pixel values of all the images of our dataset were rescaled by 1/255 to the range [0, 1]. Figures 3 and 4 illustrate some image samples of the ISBI 2019 database and the augmentation.
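
The noise injection can be illustrated with a short sketch. It is not the authors’ implementation and rests on two assumptions: the sigma values 3 and 9 are expressed on the 0 to 255 intensity scale, and "equal proportions" means that half of the noisy copies use sigma = 3 and the other half sigma = 9. The helper names and the dictionary keys are illustrative only and do not correspond to a specific library’s API.

```python
import numpy as np


def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise to a uint8 image and clip to [0, 255]."""
    noise = rng.normal(loc=0.0, scale=sigma, size=image.shape)
    noisy = image.astype(np.float64) + noise
    return np.clip(np.rint(noisy), 0, 255).astype(np.uint8)


def augment_with_noise(images, sigmas=(3, 9), seed=0):
    """Create one noisy copy per image, alternating between the sigma values.

    Assumption: "equal proportions" is read as half of the copies using
    sigma = 3 and the other half sigma = 9.
    """
    rng = np.random.default_rng(seed)
    return [add_gaussian_noise(img, sigmas[k % len(sigmas)], rng)
            for k, img in enumerate(images)]


# Remaining augmentation settings listed in the text, collected in a plain
# dictionary (the key names are illustrative, not a specific library's API).
AUGMENTATION = {
    "rotation_degrees": 45,
    "horizontal_flip": True,
    "brightness_range": (0.4, 0.8),
    "width_shift_range": 0.1,
    "height_shift_range": 0.1,
    "zoom_range": (0.8, 1.0),
    "shear_range": 3,
    "rescale": 1 / 255,
}
```

Noise is added before rescaling here, so the clipping bounds are 0 and 255; if the images were already scaled to [0, 1], the sigma values would have to be divided by 255 as well.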

Figure 3

Eight panels labeled (a) to (h) show progressively zoomed-in images of a purple, round, textured object against a black background. The object appears larger and more detailed in subsequent panels.

Figure 3. (a) Example of ALL image; (b) cropped and resized ALL image; (c) image with Gaussian noise using sigma = 3; (d) image with Gaussian noise using sigma = 9; (e–h) augmented images.

Figure 4

A series of ten visual panels labeled from (a) to (c) and (e) to (k), displaying various views of a white blood cell nucleus. Panels (a), (b), (e), (f), and (h) show a purple-stained structure on a black background, while panels (c), (g), (i), and (k) depict the same structure in grayscale. Panel (j) combines purple and grayscale effects. Each panel presents a slightly different perspective or enhancement, emphasizing structural details.

Figure 4. (a) Example of HEM picture; (b) cropped and resized picture; (c) HSV picture with saturation removed; (e) picture with Gaussian noise using sigma = 3; (f) picture with Gaussian noise using sigma = 9; (g–k) augmented pictures.

3.4 MobileNet

The transfer learning technique is based on choosing a pre-trained model and fine-tuning it to solve a new classification problem. The main benefits of transfer learning are sharing knowledge and saving training time and resources (Sarkar et al., 2018). MobileNet was selected as the pre-trained convolutional neural network for classification because it is built on depthwise separable convolutions, which make it highly efficient and give it a low computational cost compared with models based on standard convolutions.

MobileNet comprises convolutions, depthwise separable convolutions, batch normalization, ReLU activations, and fully connected layers (Ashwinkumar et al., 2021). Among these layers, depthwise separable convolution serves as the fundamental layer of the MobileNet model, minimizing both the number of parameters and the computing expense by decomposing a standard convolution into a depthwise convolution followed by a pointwise convolution. The initial MobileNet design is displayed in Table 3. Batch normalization and ReLU activations follow each layer of the architecture, as illustrated in Figure 5. The MobileNet depthwise separable convolutional layer is split into 3×3 depthwise convolution filters and a 1×1 pointwise convolution. A single 3×3 depthwise convolution filter is applied to each input channel, followed by a 1×1 pointwise convolution that generates a linear combination of the depthwise outputs (Howard et al., 2017). The mathematical expression of depthwise convolution with one filter per input channel is given in Equation 1:

\begin{array}{ll} \hat{G}_{k, l, m} = \sum_{i, j} {\hat{K}}_{i, j, m} \cdot F_{k + i - 1, l + j - 1, m} & (1) \end{array}

where $k$ and $l$ are locations in the $m$-th feature map, $\hat{K}$ represents the depthwise convolutional kernel of size $D_{k} \times D_{k} \times M$, and $\hat{G}$ is the output feature map obtained when the $m$-th filter in $\hat{K}$ is applied to the $m$-th channel in $F$.
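
To make Equation 1 concrete, the sketch below implements a depthwise convolution, followed by a pointwise convolution, in NumPy. It is an illustrative reading of the formula under simple assumptions (stride 1, no padding, no bias), not MobileNet’s actual implementation. The pointwise weight matrix `P`, the output channel count `N`, and the helper `conv_params` are introduced here for illustration only.

```python
import numpy as np


def depthwise_conv(F, K):
    """Depthwise convolution of Equation 1 (stride 1, no padding).

    F: input feature map of shape (H, W, M).
    K: depthwise kernel of shape (Dk, Dk, M), one filter per channel.
    """
    H, W, M = F.shape
    Dk = K.shape[0]
    out_h, out_w = H - Dk + 1, W - Dk + 1
    G = np.zeros((out_h, out_w, M))
    for i in range(Dk):
        for j in range(Dk):
            # Kernel tap (i, j) weights the input window shifted by (i, j);
            # channels never mix, matching the shared index m in Equation 1.
            G += K[i, j, :] * F[i:i + out_h, j:j + out_w, :]
    return G


def pointwise_conv(G, P):
    """1x1 convolution: linear combination of the M channels into N outputs."""
    return G @ P  # P has shape (M, N)


def conv_params(Dk, M, N):
    """Weight counts (no biases) of a standard vs. a depthwise separable layer."""
    standard = Dk * Dk * M * N
    separable = Dk * Dk * M + M * N
    return standard, separable
```

For a 3×3 kernel with M = 32 input channels and N = 64 output channels, `conv_params(3, 32, 64)` gives 18,432 weights for the standard convolution against 2,336 for the separable one, roughly eight times fewer, which illustrates the parameter saving the text attributes to depthwise separable convolutions.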

Table 3

Table 3. Architecture of MobileNet.

Figure 5

Diagram comparing two convolutional neural network architectures. (a) Shows a sequence: 3x3 Depthwise Convolution, Batch Normalization (BN), ReLU, 1x1 Convolution, BN, ReLU. (b) Shows: 3x3 Convolution, BN, ReLU.

Figure 5. (a) Depthwise separable convolutions consisting of depthwise and pointwise layers, succeeded by batch normalization and ReLU; (b) standard convolutional layer accompanied by batch normalization and ReLU.

3.5 MobileNet_M: extension of MobileNet

In the proposed work, we replaced the MobileNet average pooling, fully connected, and softmax layers with a set of layers appropriate for our classification problem. The resulting model is called MobileNet_M. Table 4 illustrates the structure of layers used to solve our classification problem. In this structure, the dropout layer randomly deactivates a fraction of the neurons to avoid overfitting, and Batch Normalization (BN) is employed to stabilize the distribution of the activations during training, hence expediting model training. Mathematically, we can define the BN for convolutional neural networks using Equation 2 (Bjorck et al., 2018):

\begin{array}{l} O_{b, c, i, j} = \gamma_{c} \frac{I_{b, c, i, j} - \mu_{c}}{\sqrt{\sigma_{c}^{2} + \epsilon}} + \beta_{c} \quad \forall\, b, c, i, j, \quad \text{with} \quad \mu_{c} = \frac{1}{|B|} \sum_{b, i, j} I_{b, c, i, j} & (2) \end{array}

where $O_{b, c, i, j}$ and $I_{b, c, i, j}$ are the BN output and input, respectively, $b$ indexes the images in the mini-batch, $c$ is the channel, and $i$ and $j$ give the spatial location in the feature map of channel $c$. $\mu_{c}$ is the mean activation of channel $c$ over all images in the mini-batch and all spatial locations, and $\sigma_{c}$ is the corresponding standard deviation. $B$ is the set of all activations in channel $c$, across every image $b$ in the mini-batch and all spatial locations $i, j$. The small constant $\epsilon$ prevents division by zero. Finally, $\gamma_{c}$ and $\beta_{c}$ are the parameters of the channel-wise affine transformation.
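
As an illustration of Equation 2, the following NumPy sketch normalizes a batch of feature maps channel by channel using the batch statistics, as done during training. The value of `eps`, the tensor layout (batch, channels, height, width), and the random input are assumptions made for the example.

```python
import numpy as np

def batch_norm(I, gamma, beta, eps=1e-5):
    """Channel-wise batch normalization of I with shape (batch, channels, height, width)."""
    # Mean and variance over the batch and spatial axes: one value per channel c.
    mu = I.mean(axis=(0, 2, 3), keepdims=True)
    var = I.var(axis=(0, 2, 3), keepdims=True)
    # Normalize, then apply the channel-wise affine transformation (gamma_c, beta_c).
    I_hat = (I - mu) / np.sqrt(var + eps)
    return gamma.reshape(1, -1, 1, 1) * I_hat + beta.reshape(1, -1, 1, 1)

rng = np.random.default_rng(0)
I = rng.normal(loc=3.0, scale=2.0, size=(8, 4, 5, 5))  # hypothetical activations
O = batch_norm(I, gamma=np.ones(4), beta=np.zeros(4))
print(O.mean(axis=(0, 2, 3)), O.std(axis=(0, 2, 3)))
```

With $\gamma_{c} = 1$ and $\beta_{c} = 0$, each output channel has approximately zero mean and unit standard deviation. At inference time, frameworks typically replace the batch statistics with running averages collected during training.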

Table 4. Architecture of the proposed MobileNet_M.

The dense output layer contains two neurons and a softmax activation function that gives an output probability for each neuron, one per class. The softmax function exponentiates and normalizes the vector of raw network outputs $z$ (the logits) so that each output is a probability $p_{i} \in [0, 1]$ and the outputs sum to 1, as defined in Equation 3:

\begin{array}{l} \mathrm{softmax}(z)_{i} = \frac{\exp (z_{i})}{\sum_{j = 1}^{2} \exp (z_{j})}, \quad i = 1, 2 & (3) \end{array}
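
A short Python sketch of Equation 3 is given below. The logit values are hypothetical, and subtracting the maximum before exponentiating is a common numerical-stability step that does not change the result.

```python
import math

def softmax(z):
    """Map a list of raw network outputs (logits) to probabilities that sum to 1."""
    z_max = max(z)  # subtracting the maximum avoids overflow and leaves the result unchanged
    exps = [math.exp(v - z_max) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 0.5])  # hypothetical logits for the two output neurons
print(probs)
```

With two classes, the first probability equals the logistic function applied to the difference of the two logits, so the predicted class is simply the neuron with the larger logit.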

3.6 Tunable hyper-parameters of MobileNet_M

We trained MobileNet_M on a substantial dataset of images. For transfer learning, we initialized all MobileNet layers with the weights of the MobileNet model pre-trained on the ImageNet dataset. We initialized the last dense layer, which has two units, using a uniform distribution. To improve generalization, we added L1 weight regularization to the dense layer with a regularization parameter of 0.015. During training, we set the initial learning rate to 0.001 and reduced it whenever the validation loss did not improve for 5 epochs. We set the reduction factor to 0.1 and the minimum learning rate to 0.000001. We applied early stopping when the validation loss had not decreased for 20 epochs, indicating that the model had stopped learning meaningfully. All hyperparameters used during training are presented in Table 5.
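
The interaction between the learning-rate reduction and early stopping can be sketched in plain Python. The function below replays a list of validation losses with the settings described above (factor 0.1, patience 5, minimum rate 0.000001, stopping patience 20). It is a simplified illustration and not the authors' training code: the function name and the loss values are hypothetical, and details such as minimum improvement thresholds are ignored. In Keras, comparable behavior is available through the `ReduceLROnPlateau` and `EarlyStopping` callbacks; whether the authors used exactly these callbacks is an assumption.

```python
def simulate_schedule(val_losses, lr=1e-3, factor=0.1, lr_patience=5,
                      min_lr=1e-6, stop_patience=20):
    """Replay per-epoch validation losses; return (learning rates used, stop epoch or None)."""
    best_for_lr = float("inf")    # best loss seen by the plateau rule
    best_for_stop = float("inf")  # best loss seen by the early-stopping rule
    lr_wait = stop_wait = 0
    lrs = []
    for epoch, loss in enumerate(val_losses, start=1):
        lrs.append(lr)
        # Plateau rule: cut the learning rate after lr_patience epochs without improvement.
        if loss < best_for_lr:
            best_for_lr, lr_wait = loss, 0
        else:
            lr_wait += 1
            if lr_wait >= lr_patience:
                lr = max(lr * factor, min_lr)
                lr_wait = 0
        # Early stopping: halt after stop_patience epochs without improvement.
        if loss < best_for_stop:
            best_for_stop, stop_wait = loss, 0
        else:
            stop_wait += 1
            if stop_wait >= stop_patience:
                return lrs, epoch
    return lrs, None

# Hypothetical losses: improve for 10 epochs, then stay flat.
losses = [1.0 - 0.05 * e for e in range(10)] + [0.6] * 30
lrs, stopped_at = simulate_schedule(losses)
print(stopped_at, lrs[-1])
```

In this toy run the learning rate is cut three times before it reaches its floor, and training stops 20 epochs after the last improvement.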

Table 5. Training hyperparameters of the proposed model.

3.7 Evaluation metrics

We assessed the performance of the proposed model using the confusion matrix, accuracy, precision, recall, F1 score, and AUC. The confusion matrix gives a quick view of how well a classifier performs: the closer it is to a diagonal matrix, the better the classification. Accuracy is the proportion of correctly classified leukemia cells (true positives) and healthy cells (true negatives) relative to the total number of cells. Recall is the proportion of actual leukemia cells that the model identifies, as opposed to those it overlooks. Precision is the proportion of cells predicted as leukemic that are actually leukemic. The F1 score is the harmonic mean of precision and recall. The AUC is the area under the complete receiver operating characteristic (ROC) curve. Equations 4–7 provide the mathematical definitions of accuracy, recall, precision, and F1 score:

\begin{array}{l} Accuracy = \frac{TP + TN}{TP + TN + FP + FN} & (4) \end{array}

\begin{array}{l} Recall = \frac{TP}{TP + FN} & (5) \end{array}

\begin{array}{l} Precision = \frac{TP}{TP + FP} & (6) \end{array}

\begin{array}{l} F1\_score = \frac{2 \times Precision \times Recall}{Precision + Recall} & (7) \end{array}

TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively. True positives (TP) are instances where the model correctly predicts the ALL (leukemia) class, and true negatives (TN) are instances where it correctly predicts the HEM (healthy) class. False positives (FP) occur when the model erroneously predicts the HEM class as the ALL class. False negatives (FN) occur when the model erroneously predicts the ALL class as the HEM class.
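
Equations 4–7 translate directly into code. The sketch below computes the four metrics from confusion-matrix counts; the counts are hypothetical and are not results from this study, and returning 0 when a denominator is zero is a convention chosen for the example.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, recall, precision, and F1 score (Equations 4-7) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "recall": recall, "precision": precision, "f1": f1}

# Hypothetical counts, with ALL as the positive class and HEM as the negative class.
m = classification_metrics(tp=90, tn=80, fp=20, fn=10)
print(m)
```

Because the F1 score combines precision and recall, it penalizes a model that favors one class, which is why it is reported alongside accuracy on imbalanced data.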

4 Experimental validation and discussion

4.1 Experimental setup

The proposed work was implemented on a server running Ubuntu 20.04 with two Intel Xeon Gold 6226R CPUs, four Nvidia A100 40 GB GPUs, and 756 GB of RAM; a single GPU was used for training. We used Python 3.9.16 with the tensorflow_gpu 2.4.1, keras 2.10, scikit_learn 1.2.2, numpy 1.23.4, and matplotlib 3.7.1 packages. The imageio 2.30 and imgaug 0.4.0 libraries were used for image modification and for adding Gaussian noise to the dataset. Zero-mean Gaussian noise with sigma values of 2, 5, 6, and 10 was added to the test samples, with a different noise value for each pixel and each channel (that is, a different value for the red, green, and blue channels of the same pixel). The aim was to evaluate the model's sensitivity to Gaussian noise.
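
The noise model described above can be sketched with NumPy alone. This is an illustrative stand-in for the imgaug-based pipeline, not the authors' code: it assumes 8-bit RGB images and that sigma is expressed in 8-bit intensity units, and the function name and the flat gray test image are hypothetical.

```python
import numpy as np

def add_gaussian_noise(image, sigma, seed=None):
    """Add zero-mean Gaussian noise to a uint8 RGB image, sampled independently per pixel and channel."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(loc=0.0, scale=sigma, size=image.shape)
    noisy = image.astype(np.float64) + noise
    # Round and clip back to the valid 8-bit range.
    return np.clip(np.rint(noisy), 0, 255).astype(np.uint8)

image = np.full((64, 64, 3), 128, dtype=np.uint8)  # hypothetical flat gray test image
for sigma in (2, 5, 6, 10):                         # sigma values listed in the text
    noisy = add_gaussian_noise(image, sigma, seed=0)
    print(sigma, float(noisy.astype(float).std()))
```

Because the noise array has the same shape as the image, each of the three color channels of a pixel receives its own independent sample, as stated in the setup.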

4.2 Performance analysis: training step

The proposed MobileNet_M model was trained and tested on the three folds of the data to assess its effectiveness in leukemia detection. As presented in the previous section, MobileNet_M is based on depthwise separable convolutions, and L1 weight regularization was applied to its last dense layer to avoid overfitting. The model had 3,211,074 trainable parameters, which were trained with early stopping, using the validation loss as the monitored quantity. Figure 6 depicts the accuracy and loss curves on the training and validation sets for each fold. The validation loss begins at a high level and ultimately converges with the training loss in all folds. A similar trend is observed at the 16th, 23rd, and 27th epochs for Folds 0, 1, and 2, respectively. The training and validation accuracies start low, increase progressively, and show no further significant improvement after a few epochs. Figure 6 also shows that the proposed model converges fast: early stopping occurs after 30 training epochs for Fold 0, after 60 epochs for Fold 1, and after close to 40 epochs for Fold 2. Fold 0 achieves the best and fastest convergence. The average training accuracy was 95.83% for Fold 0, 96.60% for Fold 1, and 94.24% for Fold 2.

Graphs depicting accuracy and loss across three folds for training and validation sets. Each row displays two graphs: accuracy (left) shows a steady increase, while loss (right) shows a decrease over epochs. Fold 0 has 30 epochs, Fold 1 has 60 epochs, while Fold 2 has 35 epochs. Training performs better than validation in all cases.

Figure 6. Training and validation curves.

4.3 Performance analysis: prediction phase

To evaluate the model's sensitivity to Gaussian noise, we added Gaussian noise with sigma values of 2, 5, 6, and 10 to the test samples. Figure 7 shows a sample test image with Gaussian noise. We computed the evaluation metrics for both the noisy and the clean datasets, and Tables 6–8 present the average results obtained. Table 6 shows that, for Fold 0, the proposed MobileNet_M model achieved the same accuracy of 96% on the clean test images and on the noisy test images for sigma values of 2 and 8. These results illustrate the efficacy of the proposed model and its resilience to variations in the dataset (folds) and to additive Gaussian noise. The prediction time per image was 4.92 ms on a personal computer and 44.68 ms in a remote Google Colab runtime.

Three panels comparing images of a circular shape.

Figure 7. Gaussian noise with sigma = 10 on test sample image.

Table 6. Test classification performance of noisy and clean images in Fold 0.

Table 7. Test classification performance of noisy and clean images in Fold 1.

Table 8. Test classification performance of noisy and clean images in Fold 2.

To assess the impact of class imbalance on MobileNet_M, we compared the model trained on the imbalanced dataset without data augmentation with the model trained after balancing the dataset. Table 9 shows the comparison. The per-class accuracies for HEM and ALL show that class imbalance biases the generalization of MobileNet_M in favor of the ALL class.

Table 9. Test classification performance of MobileNet_M trained on the class-imbalanced dataset without data augmentation and of MobileNet_M after balancing the dataset.

4.4 Comparison with existing models on ALL

The proposed model was first compared to the ALNett model on the basis of the confusion matrix and accuracy, using the test dataset of clean images. ALNett (Jawahar et al., 2022) is a recently proposed deep convolutional neural network designed for the classification of acute lymphoblastic leukemia. On the same dataset, it achieved a higher F1 score and accuracy than the ResNet-50, AlexNet, VGG16, and GoogleNet transfer learning models. Table 10 compares the accuracy of the proposed model with that of ALNett. According to the table, the average accuracy of the proposed model is higher than that of ALNett in each of the three folds. Figure 8 further illustrates this outcome through the confusion matrices, showing that the proposed model surpasses ALNett in leukemia detection. For instance, in Fold 0, MobileNet_M correctly classified 680 images (TP and TN), for a classification accuracy of 96%, while ALNett correctly classified 650 images (TP and TN), for an accuracy of 92.70%.

Table 10. Leukemia prediction accuracy of the proposed model and ALNett.

Comparison of confusion matrices for the proposed model versus ALNett (2022) across three folds. Each matrix shows true positives, false positives, false negatives, and true negatives.

Figure 8. Confusion matrices of the proposed and ALNett models on the clean test dataset.

Considering the three folds of the dataset, we also compared the average performance (recall, precision, accuracy and F1 score) of the proposed MobileNet_M with transfer learning models such as GoogleNet, ResNet-50, AlexNet and VGG16 reported in Jawahar et al. (2022), as shown in Figure 9. The proposed model obtained the highest recall, precision, accuracy and F1 score values. This comparison reveals the effectiveness of the proposed MobileNet_M.

Bar chart comparing recall, precision, accuracy, and F1-score for various transfer learning models: AlexNet, VGG16, GoogleNet, ResNet-50, and Proposed MobileNet_M. Proposed MobileNet_M shows the highest scores in all metrics, while GoogleNet has the lowest.

Figure 9. Comparison of common transfer learning models with the proposed MobileNet_M.

On the other hand, Table 11 compares the proposed model with various approaches evaluated on the ISBI 2019 database in recent years. Mathury et al. (2020) achieved an F1 score of 91% with their MMA-MTL model by improving model generalization through local spatial attention block learning, pointwise attention convolution layers, and Rademacher Mixup. Khandekar et al. (2021) used YOLOv4 for cell detection and leukemia classification and achieved an F1 score of 92%. However, as illustrated in Tables 6–8, the proposed MobileNet_M remains robust under additive Gaussian noise and achieves better results than MMA-MTL or YOLOv4. This may be due to the proposed dataset balancing method and its specific augmentation process. The results in Tables 10 and 11 indicate that the proposed model outperformed contemporary state-of-the-art approaches, achieving an average accuracy of 95.33% and an average F1 score of 95%. The L1 regularization method, the learning rate range, and the early stopping callback were key to obtaining this performance.

Table 11. F1_score comparison of the proposed model with existing models.

5 Conclusion

This paper proposes a computationally efficient, high-performing model for the classification of acute lymphoblastic leukemia in microscopic images that is resilient to Gaussian noise. Discriminating between leukemic and healthy cells is a very challenging task. The pre-trained MobileNet architecture was modified and fine-tuned to address this classification challenge. A new augmentation procedure was proposed both to avoid overfitting and to build an efficient model. The MobileNet_M model was trained and evaluated using the C_NMC_2019 dataset (Mourya et al., 2019). This study achieved an overall test accuracy of 95.33% and an F1 score of 0.95. The effectiveness and robustness of the proposed model were demonstrated by adding Gaussian noise to the test images. The proposed MobileNet_M model yields better average performance than ALNett and several other competitive models.

Based on the results obtained, the proposed model is useful as a guide and second-opinion tool for laboratory technicians and hematologists in the diagnosis of acute lymphoblastic leukemia under a microscope. In future work, the proposed MobileNet_M model will be deployed on an embedded system or an Android phone to build cost-effective devices for computer-assisted diagnosis of leukemia.

Data availability statement

The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author.

Author contributions

MM: Writing – original draft, Software, Visualization, Investigation. LT: Resources, Methodology, Supervision, Investigation, Writing – review & editing. LB: Formal analysis, Project administration, Funding acquisition, Writing – review & editing.

Funding

The author(s) declare that financial support was received for the research and/or publication of this article. The paper was supported by the project “Romanian Hub for Artificial Intelligence-HRIA,” Smart Growth, Digitization and Financial Instruments Program, MySMIS no. 334906, and the travel and stay of Mimosette Makem was financed by AUF.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Generative AI statement

The author(s) declare that no Gen AI was used in the creation of this manuscript.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

References

Abdulla, A. A. (2020). Efficient computer-aided diagnosis technique for leukaemia cancer detection. IET Image Process. 14, 4435–4440. doi: 10.1049/iet-ipr.2020.0978

Abhishek, A., Deb, S. D., Jha, R. K., Sinha, R., and Jha, K. (2025). Ensemble learning using Gompertz function for leukemia classification. Biomed. Signal Process. Control 100:106925. doi: 10.1016/j.bspc.2024.106925

Alexander, T. B., and Mullighan, C. G. (2021). Molecular biology of childhood leukemia. Annu. Rev. Cancer Biol. 5, 95–117. doi: 10.1146/annurev-cancerbio-043020-110055

Ashwinkumar, S., Rajagopal, S., Manimaran, V., and Jegajothi, B. (2021). Automated plant leaf disease detection and classification using optimal MobileNet based convolutional neural networks. Mater. Today Proc. 51, 480–487. doi: 10.1016/j.matpr.2021.05.584

Bain, B. J., Bates, I., and Laffan, M. A. (2017). Dacie and Lewis practical haematology. Twelfth Edn. Netherlands: Elsevier Ltd.

Bhattacharjee, R., and Saini, L. M. (2016). Robust technique for the detection of acute lymphoblastic leukemia. 2015 IEEE power, commun. inf. technol. conf. PCITC 2015- proc 2015, 657–662. doi: 10.1109/PCITC.2015.7438079

Bjorck, J., Gomes, C., Selman, B., and Weinberger, K. Q. (2018). Understanding batch normalization. Adv. Neural Inf. Proces. Syst. 2018, 7694–7705. doi: 10.48550/arXiv.1806.02375

Blood Cancer CH. (2023). Leukemia and lymphoma society. Available online at: https://www.lls.org/facts-and-statistics/facts-and-statistics-overview#Leukemia (accessed September 6, 2023).

Bodzas, A., Kodytek, P., and Zidek, J. (2020). Automated detection of acute lymphoblastic leukemia from microscopic images based on human visual perception. Front. Bioeng. Biotechnol. 8, 1–13. doi: 10.3389/fbioe.2020.01005

Claro, M., Vogado, L., Veras, R., Santana, A., Tavares, J., Santos, J., et al. (2020). Convolution neural network models for acute leukemia diagnosis. Int. Conf. Syst. Signals, Image Process., 63–68. doi: 10.1109/IWSSIP48289.2020.9145406

Das, P. K., and Meher, S. (2021). An efficient deep convolutional neural network based detection and classification of acute lymphoblastic leukemia. Expert Syst. Appl. 183:115311. doi: 10.1016/j.eswa.2021.115311

Das, P. K., Meher, S., Panda, R., and Abraham, A. (2021). An efficient blood-cell segmentation for the detection of hematological disorders. IEEE Trans. Cybern., 52, 10615–10626. doi: 10.1109/TCYB.2021.3062152

Deshpande, N. M., Gite, S. S., and Aluvalu, R. (2020). A brief bibliometric survey of leukemia detection by machine learning and deep learning approaches. Libr. Philos. Pract. 2020, 1–23.

Di Ruberto, C., Loddo, A., and Puglisi, G. (2020). Blob detection and deep learning for leukemic blood image analysis. Appl. Sci. 10, 1176–1183. doi: 10.3390/app10031176

Dorfman, L. E., Floriani, M. A., Oliveira, T. M. R. D. R., Cunegatto, B., Rosa, R. F. M., and Zen, P. R. G. (2018). The role of cytogenetics and molecular biology in the diagnosis, treatment and monitoring of patients with chronic myeloid leukemia. J. Bras. Patol. Med. Lab. 54, 83–91. doi: 10.5935/1676-2444.20180015

El Houby, E. M. F. (2018). Framework of computer aided diagnosis Systems for Cancer Classification Based on medical images. J. Med. Syst. 42:157. doi: 10.1007/s10916-018-1010-x

Eralp, B., and Sefer, E. (2024). Reference-free inferring of transcriptomic events in cancer cells on single-cell data. BMC Cancer 24:607. doi: 10.1186/s12885-024-12331-5

Gokulkannan, K., Mohanaprakash, T. A., Rose, J. D., and Sriman, B. (2024). Multiscale adaptive and attention-dilated convolutional neural network for efficient leukemia detection model with multiscale trans-res-Unet3+−based segmentation network. Biomed. Signal Process. Control. 90:105847. doi: 10.1016/j.bspc.2023.105847

Gupta, Anubha, and Gupta, Ritu. C_NMC_2019 dataset: ALL challenge dataset of ISBI 2019 (C-NMC 2019), cancer imag arch. (2019) Available online at: https://www.kaggle.com/datasets/avk256/cnmc-leukemia (accessed July 21, 2023).

Hasan, N., Rabbi, E., Das, S., Siddique, N., and Wang, H. (2025). DIA-VXNET: a framework for automated diabetic eye disease detection using transfer learning with feature fusion network. Biomed. Signal Process. Control. 100:106907. doi: 10.1016/j.bspc.2024.106907

Hegde, R. B., Prasad, K., Hebbar, H., and Singh, B. M. K. (2019). Feature extraction using traditional image processing and convolutional neural network methods to classify white blood cells: a study. Australas. Phys. Eng. Sci. Med. 42, 627–638. doi: 10.1007/s13246-019-00742-9

Howard, A. G., Zhu, M., Chen, B., and Kalenichenko, D. (2017). MobileNets: efficient convolutional neural networks for mobile vision applications, arXiv: arXiv:1704.04861. doi: 10.48550/arXiv.1704.04861

Jawahar, M., H, S., and Gandomi, A. H. (2022). ALnett: a cluster layer deep convolutional neural network for acute lymphoblastic leukemia classification. Comput. Biol. Med. 148:105894. doi: 10.1016/j.compbiomed.2022.105894

Johnson, J. M., and Khoshgoftaar, T. M. (2019). Survey on deep learning with class imbalance. J. Big Data 6, 1–54. doi: 10.1186/s40537-019-0192-5

Kaope, C., and Pristyanto, Y. (2023). The effect of class imbalance handling on datasets toward classification algorithm performance. MATRIK: Jurnal Manajemen, Teknik Informatika dan Rekayasa Komputer 22, 227–238. doi: 10.30812/matrik.v22i2.2515

Khandekar, R., Shastry, P., Jaishankar, S., Faust, O., and Sampathila, N. (2021). Automated blast cell detection for acute lymphoblastic leukemia diagnosis. Biomed. Signal Process. Control. 68, 1–12. doi: 10.1016/j.bspc.2021.102690

Leukemia in Romania. (2020). World health rankings. Available online at: https://www.worldlifeexpectancy.com/romania-leukemia (accessed September 6, 2023).

Liu, K., and Hu, J. (2022). Classification of acute myeloid leukemia M1 and M2 subtypes using machine learning. Comput. Biol. Med. 147:105741. doi: 10.1016/j.compbiomed.2022.105741

LLS. (n.d.). Lymphoma survival rate | Blood cancer survival rates | LLS. Available online at: https://www.lls.org/facts-and-statistics/facts-and-statistics-overview (accessed December 29, 2021).

Loey, M., Naman, M. R., and Zayed, H. H. (2020). A survey on blood image diseases detection using deep learning. Int. J. Serv. Sci. Manag. Eng. Technol. 11, 18–32. doi: 10.4018/IJSSMET.2020070102

Magpantay, L. D. C., Alon, H. D., Melegrito, M. P., and Fernando, G. J. O. (2022). A transfer learning-based deep CNN approach for classification and diagnosis of acute lymphocytic leukemia cells, 280–284.

Makem, M., and Tiedeu, A. (2020). An efficient algorithm for detection of white blood cell nuclei using adaptive three stage PCA-based fusion. Inform. Med. Unlocked 20:100416. doi: 10.1016/j.imu.2020.100416

Mathury, P., Piplani, M., Sawhney, R., Jindal, A., and Shah, R. R.. “Mixup multi-attention multi-tasking model for early-stage leukemia identification,” in: ICASSP, IEEE Int. Conf. Acoust. Speech signal process. - Proc., IEEE, (2020): pp. 1045–1049.

Mayouf, M. S., Dupin De Saint-Cyr, F., Mayouf, M., and Dupin, F. (2020). On data-preparation efficiency application on breast Cancer classification. France: IRIT: Institut de Recherche en Informatique de Toulouse.

Mishra, S., Majhi, B., and Sa, P. K. (2019a). GLRLM-based feature extraction for acute lymphoblastic leukemia (ALL) detection Sonali, recent fin. Singapore: Springer.

Mishra, S., Majhi, B., and Sa, P. K. (2019b). Texture feature based classification on microscopic blood smear for acute lymphoblastic leukemia detection. Biomed. Signal Process. Control. 47, 303–311. doi: 10.1016/j.bspc.2018.08.012

Mohammed, Z. F., and Abdulla, A. A. (2021). An efficient CAD system for ALL cell identification from microscopic blood images. Multimed. Tools Appl. 80, 6355–6368. doi: 10.1007/s11042-020-10066-6

Mohapatra, S., and Patra, D. (2010). Automated cell nucleus segmentation and acute leukemia detection in blood microscopic images. Int. Conf. Syst. Med. Biol. ICSMB 2010- Proc. 10, 49–54. doi: 10.1109/ICSMB.2010.5735344

Mondal, C., Hasan, M. K., Jawad, M. T., Dutta, A., Islam, M. R., Awal, M. A., et al. (2021). Acute lymphoblastic leukemia detection from microscopic images using weighted ensemble of convolutional neural networks, 1–31. doi: 10.48550/arXiv.2105.03995

Mourya, S., Kant, S., Kumar, P., Gupta, A., and Gupta, R. (2019) ALL challenge dataset of ISBI 2019 (C-NMC 2019). (Eds.) Gupta A and Gupta R. Singapore: Springer.

Nabilah, S., Safuan, M., Tomari, M. R., Nurshazwani, W., and Zakaria, W. (2020). Investigation of white blood cell biomaker model for acute lymphoblastic leukemia detection based on convolutional neural network. Bulletin Electrical Eng. Info. 9, 611–618. doi: 10.11591/eei.v9i2.1857

Orriols-Puig, A., and Bernadó-Mansilla, E. (2009). Evolutionary rule-based systems for imbalanced data sets. Soft. Comput. 13, 213–225. doi: 10.1007/s00500-008-0319-7

Paiva, A. S., Paiva, H. D. D. O., Cavalcanti, G. B., Silveira, L. S., Silva, L. K., Gil, E. A., et al. (2018). Contribution of flow cytometry immunophenotyping in diagnostic of acute and chronic leukemias. Blood 132:5198. doi: 10.1182/blood-2018-99-118923

Putzu, L., Caocci, G., and Di Ruberto, C. (2014). Leucocyte classification for leukaemia detection using image processing techniques. Artif. Intell. Med. 62, 179–191. doi: 10.1016/j.artmed.2014.09.002

Rawat, J., Singh, A., Bhadauria, H. S., and Virmani, J. “Computer aided diagnostic system for detection of leukemia using microscopic images,” in: 4th International Conf. Eco-friendly Comput. Commun. Syst. Comput., Elsevier Masson SAS, (2015): pp. 748–756.

Rawat, J., Singh, A., Bhadauria, H. S., Virmani, J., and Devgun, J. S. (2017). Classification of acute lymphoblastic leukaemia using hybrid hierarchical classifiers. Multimed. Tools Appl. 76, 19057–19085. doi: 10.1007/s11042-017-4478-3

Rohaziat, N., Tomari, M.R.M., and Zakaria, W.N.W.. “White blood cells type detection using YOLOv5,” in: 2022 IEEE 5th Int. Symp. Robot. Manuf. Autom. ROMA 2022, Institute of Electrical and Electronics Engineers Inc., (2022).

Sarkar, D., Bali, R., and Ghosh, T. Hands-on transfer learning with Python: implement advanced deep learning and neural network models using TensorFlow and Keras. J. Mater. Process. Technol. 1 (2018) 1–8. Available online at: https://dl.acm.org/doi/10.5555/3294531 (accessed July 21, 2023).

Shafique, S., and Tehsin, S. (2018). Acute lymphoblastic leukemia detection and classification of its subtypes using pretrained deep convolutional neural networks. Technol. Cancer Res. Treat. 17, 1–7. doi: 10.1177/1533033818802789

Shah, A., Naqvi, S. S., Naveed, K., Salem, N., Khan, M. A. U., and Alimgeer, K. S. (2021). Automated diagnosis of leukemia: a comprehensive review. IEEE Access 9, 132097–132124. doi: 10.1109/ACCESS.2021.3114059

Singhal, V., and Singh, P. (2014). Local binary pattern for automatic detection of acute lymphoblastic leukemia, 2014 20th Natl. Conf. Commun. NCC 2014, 1–15. doi: 10.1109/NCC.2014.6811261

Ullah, M. Z., Zheng, Y., Song, J., Aslam, S., Xu, C., Kiazolu, G. D., et al. (2021). An attention-based convolutional neural network for acute lymphoblastic leukemia classification. Appl. Sci. 11, 1–12. doi: 10.3390/app112210662

Vogado, L. H. S., Veras, R. M. S., Araujo, F. H. D., Silva, R. R. V., and Aires, K. R. T. (2018). Leukemia diagnosis in blood slides using transfer learning in CNNs and SVM for classification. Eng. Appl. Artif. Intell. 72, 415–422. doi: 10.1016/j.engappai.2018.04.024

Walker, H., Smith, F. J., and Betts, D. R. (1994). Cytogenetics in acute myeloid leukaemia. Blood Rev. 8, 30–36. doi: 10.1016/0268-960X(94)90005-1

Keywords: leukemia classification, image processing, CNN, disease detection, data augmentation

Citation: Makem M, Tamas L and Bușoniu L (2025) A reliable approach for identifying acute lymphoblastic leukemia in microscopic imaging. Front. Artif. Intell. 8:1620252. doi: 10.3389/frai.2025.1620252

Received: 29 April 2025; Accepted: 26 June 2025;
Published: 17 July 2025.

Edited by:

Xian-Hua Han, Rikkyo University, Japan

Reviewed by:

Emre Sefer, Özyeğin University, Türkiye
Shanmugavalli Venkatachalam, Manipal Institute of Technology Bengaluru, India

Copyright © 2025 Makem, Tamas and Bușoniu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Levente Tamas, Levente.Tamas@aut.utcluj.ro
