A Precise Segmentation Process to Detect Leukemia Using Image Processing
Abstract- In the modern world, many challenging tasks can be performed effectively with image processing. In a manual process, physicians observe abnormalities in the internal structure of the cells present in an image and analyze them, which is quite difficult for microscopic images. An automatic process is therefore needed to detect such abnormalities. White blood cells help protect the body from different kinds of infection, and abnormality of the white blood cells causes leukemia. Leukemia is a common and dangerous type of blood cancer that can result in death if it is detected late. Image processing is now used to detect leukemia. The overall detection process using blood cell images includes filtering, clustering and detection. Segmentation, which identifies specific cells, is an important step in detecting leukemia from an image. White blood cells (WBC) and red blood cells (RBC) are then counted automatically, and a percentage is calculated from the WBC and RBC counts. Depending on this percentage, a decision is made as to whether the sample indicates leukemia. This paper compares two segmentation algorithms, K-means and K-medoids, for detecting leukemia.
Keywords— Leukemia, segmentation, K-means, K-medoids, image processing, detection.
Leukemia is a blood cancer that arises in blood stem cells. Stem cells are the elementary cells from which the different types of blood cells, each with its own task, develop. Leukemia is caused by the rapid production of aberrant white blood cells. These aberrant cells cannot fight infection, and they also impair the ability of the bone marrow to produce red blood cells and platelets. The human body has three main types of blood cells: red blood cells (RBC), white blood cells (WBC) and platelets (PLT) [1]. An increased WBC count and abnormal WBC structure are associated with leukemia.

Different approaches have been applied to detect leukemia. Tathagata Hazra et al. [1] used several filtering techniques and the K-means algorithm for image preprocessing and segmentation, and finally used the Viola-Jones object detection algorithm to detect and count WBCs. Preeti Jagadev et al. [2] focused mainly on the detection of leukemia and its classification into four types. They described three segmentation algorithms (K-means clustering, marker-controlled watershed and HSV color-based segmentation) and used an SVM classifier to detect leukemia. Khaled A. S. Abu Daqqa et al. [3] used the KNN algorithm for segmentation, and decision trees and SVM for classification. Endah Purwanti et al. [4] detected acute lymphocytic leukemia using the K-nearest neighbor algorithm based on shape and histogram features. Shaikh Mohammed Bilal et al. [5] classified cells by morphological features, then used linear interpolation and the discrete Fourier transform to create a graph of the discontinuities in order to identify overlapping cells.

In this paper, our sole focus is detecting leukemia, so we concentrate on counting WBCs (leucocytes). There are two types of stem cells, myeloid and lymphoid. A myeloid stem cell gives rise to a myeloid blast.
The myeloid blast gives rise to RBCs (erythrocytes), WBCs (leucocytes) and platelets (PLT). A lymphoid stem cell gives rise to a lymphoid blast, which generates only white blood cells. In leukemia, the bone marrow produces abnormal white blood cells [1]. These abnormal cells should die after some time, but instead they survive and accumulate. The aberrant white blood cells then prevent the normal white blood cells from performing their natural task. This condition is the disease called leukemia, as shown in Fig. 1.
Fig. 1. Leukemia blood cell
This work focuses on the segmentation step for detecting leukemia accurately. To make the results easy to interpret, the performance of the segmentation algorithms is presented in graphical and tabular form.
Several methods have been applied to find leukemic cells and count them automatically. The method we propose for detecting leukemia from microscopic blood images is shown in Fig. 2.
Image acquisition is the first and an important step in any image processing approach: if an image has not been acquired satisfactorily, the intended tasks cannot be achieved. Leukemia detection likewise starts with the image acquisition stage. This work uses 108 images of leukemic and non-leukemic patients from the ALL-IDB dataset. All images are in JPG format with the same color depth and resolution.
After the JPG image, which is in the RGB color space, is read, it is converted into a grayscale image to reduce the dimensionality of the original image [7], because a grayscale image is easier to work with. The standard MATLAB function rgb2gray converts RGB images to grayscale by eliminating the hue and saturation information; it converts the 3-dimensional RGB values to 1-dimensional grayscale values by forming a weighted sum of the R, G and B components.
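The weighted sum can be sketched as follows. This is a minimal Python illustration, not the MATLAB implementation: it assumes 8-bit pixels stored as nested lists of (R, G, B) tuples and uses the luminance weights 0.2989, 0.5870 and 0.1140 documented for rgb2gray. The function name rgb_to_gray is hypothetical.

```python
def rgb_to_gray(image):
    """Convert an RGB image (rows of (R, G, B) tuples, values 0-255)
    into a grayscale image (rows of ints 0-255) by a weighted sum.
    The weights are the luminance coefficients documented for rgb2gray."""
    gray = []
    for row in image:
        gray_row = []
        for r, g, b in row:
            value = 0.2989 * r + 0.5870 * g + 0.1140 * b
            # Round to the nearest integer and clamp to the 8-bit range.
            gray_row.append(min(255, max(0, round(value))))
        gray.append(gray_row)
    return gray
```

For example, a pure red pixel (255, 0, 0) maps to a gray level of 76 and a white pixel (255, 255, 255) maps to 255, so the three color channels collapse into a single intensity value per pixel.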
Fig. 2. Proposed method for automated identification of leukemia
Preprocessing enhances the quality of the image so that subsequent operations perform better. The acquired image may contain noise or other unwanted artifacts, and removing them makes the image more suitable for further processing. Enhancing contrast, removing noise and isolating regions are the main tasks of image preprocessing [1]. Here, median filtering is used to remove noise.
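A median filter replaces each pixel with the median of its neighborhood, which removes isolated noisy pixels while preserving edges better than averaging. The following Python sketch assumes a grayscale image stored as a list of lists and a square window of odd size; median_filter is a hypothetical name, and replicating edge pixels at the border is an assumption of this sketch (MATLAB's medfilt2, for instance, pads with zeros by default).

```python
def median_filter(gray, size=3):
    """Apply a size x size median filter to a 2-D list of intensities.
    Border pixels are handled by replicating the nearest edge pixel."""
    if size % 2 == 0:
        raise ValueError("size must be odd")
    half = size // 2
    rows, cols = len(gray), len(gray[0])
    out = []
    for i in range(rows):
        out_row = []
        for j in range(cols):
            window = []
            for di in range(-half, half + 1):
                for dj in range(-half, half + 1):
                    # Clamp coordinates so the window never leaves the image.
                    ii = min(rows - 1, max(0, i + di))
                    jj = min(cols - 1, max(0, j + dj))
                    window.append(gray[ii][jj])
            # The window has an odd number of values, so this is the median.
            out_row.append(sorted(window)[len(window) // 2])
        out.append(out_row)
    return out
```

For example, a single bright pixel of value 255 in a uniform region of value 10 is replaced by 10, because it is outnumbered by its neighbors in every window.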
Segmentation is the main stage of this detection process, because better segmentation gives better results. Segmentation is a method that divides an image into regions that share similar characteristics [9]. Blood contains three main categories of cells: red blood cells, white blood cells and platelets. Since leukemia detection focuses only on WBCs, the other components should be eliminated during segmentation. Clustering algorithms are generally used to obtain good segmentation, and there are several such algorithms. Here, the K-means and K-medoids clustering algorithms are used for segmentation.
In image processing, region filling is a morphological operation that fills regions of the image with a specific value. Region filling, or hole filling, helps form complete cells in the image, which leads to better results. In this case, a hole is an area of dark pixels surrounded by lighter pixels. The standard MATLAB function imfill is used for region filling; it uses an algorithm based on morphological reconstruction.
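For a binary image, hole filling can be sketched by flood-filling the background from the image border: background pixels that the fill cannot reach are holes. The Python sketch below uses this flood-fill formulation rather than the morphological reconstruction used by imfill; with 4-connected background it should identify the same holes. The function name fill_holes is hypothetical.

```python
from collections import deque


def fill_holes(binary):
    """Fill holes in a binary image given as a 2-D list of 0/1 values.
    A hole is a background (0) region that cannot be reached from the
    image border through 4-connected background pixels."""
    rows, cols = len(binary), len(binary[0])
    reachable = [[False] * cols for _ in range(rows)]
    queue = deque()
    # Seed the flood fill with every background pixel on the border.
    for i in range(rows):
        for j in range(cols):
            on_border = i in (0, rows - 1) or j in (0, cols - 1)
            if on_border and binary[i][j] == 0:
                reachable[i][j] = True
                queue.append((i, j))
    # Breadth-first flood fill through 4-connected background pixels.
    while queue:
        i, j = queue.popleft()
        for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
            if (0 <= ni < rows and 0 <= nj < cols
                    and binary[ni][nj] == 0 and not reachable[ni][nj]):
                reachable[ni][nj] = True
                queue.append((ni, nj))
    # Background pixels the fill never reached lie inside holes: set them to 1.
    return [[1 if binary[i][j] == 1 or not reachable[i][j] else 0
             for j in range(cols)]
            for i in range(rows)]
```

A closed ring of foreground pixels therefore becomes a solid disk, while a ring with a gap to the outside is left unchanged, which is the behavior needed to turn cell outlines into complete cells.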
Post-processing removes remaining unwanted objects. Different filtering techniques can be used in this step; here, a median filter is used for post-processing the blood image. In blood cell images, the post-processing step helps bring the automatic WBC count close to the true count.
Counting WBCs manually in a blood cell image is time-consuming, so in this work WBCs are counted automatically. MATLAB provides standard functions (for example, bwconncomp) that find the number of connected components (objects) in the image.
Counting gives the number of WBCs in the blood cell image. A rapid increase of WBCs to an excessive amount indicates leukemia. The count is therefore compared with its normal value: if it is much higher than normal, the sample is considered leukemic. On this basis, the sample can be classified into the normal, initial and extreme stages.
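Counting connected components can be sketched as a breadth-first search over foreground pixels. The Python sketch below assumes the segmented WBCs are foreground pixels (value 1) in a binary image; count_objects is a hypothetical name, and the 8-connected default is an assumption (it matches the 2-D default of MATLAB's bwconncomp).

```python
from collections import deque


def count_objects(binary, connectivity=8):
    """Count connected foreground (1) components in a 2-D binary image."""
    if connectivity == 8:
        offsets = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)
                   if (di, dj) != (0, 0)]
    elif connectivity == 4:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    else:
        raise ValueError("connectivity must be 4 or 8")
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for i in range(rows):
        for j in range(cols):
            if binary[i][j] == 1 and not seen[i][j]:
                # An unvisited foreground pixel starts a new component.
                count += 1
                seen[i][j] = True
                queue = deque([(i, j)])
                # Visit every foreground pixel connected to this one.
                while queue:
                    ci, cj = queue.popleft()
                    for di, dj in offsets:
                        ni, nj = ci + di, cj + dj
                        if (0 <= ni < rows and 0 <= nj < cols
                                and binary[ni][nj] == 1 and not seen[ni][nj]):
                            seen[ni][nj] = True
                            queue.append((ni, nj))
    return count
```

The choice of connectivity matters: with 8-connectivity, two diagonally touching pixels form one object, while with 4-connectivity they count as two.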
In this work we used two popular clustering algorithms, K-means and K-medoids. Both are partition-based iterative methods.
Clustering finds natural groupings among objects. K-means is a partitioning method that divides objects into k groups. It is an unsupervised clustering algorithm [9] that assigns data points to classes according to their characteristics, so that points in one class differ from those in the others. The algorithm finds a centroid for every cluster in an iterative process, using the Euclidean distance between objects and centroids to form the clusters. In each iteration, the points are reassigned to the nearest centroid and the centroids are updated; the updates stop when the clusters no longer change, and the final centroids define the final clusters. The K-means algorithm consists of the following steps [11]:
Input: m = number of clusters to be formed,
D = data set.
Output: m clusters.
(a) randomly choose m data points as the initial cluster centers;
(b) repeat
(c) calculate the distance between each data point and each cluster center using the Euclidean distance;
(d) assign each data point to the nearest cluster center;
(e) recalculate each cluster center as the mean of the points assigned to it;
(f) until no update is found.
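The steps above can be sketched in Python for one-dimensional data such as gray levels. This is an illustration under stated assumptions, not the paper's MATLAB code: the data are scalar pixel intensities, so the Euclidean distance reduces to the absolute difference; an empty cluster keeps its previous center; and kmeans_1d is a hypothetical name. Passing explicit initial centers makes the result reproducible.

```python
import random


def kmeans_1d(values, k, init=None, max_iter=100, seed=None):
    """Cluster scalar values (e.g. gray levels) into k groups.
    Returns (centers, labels), where labels[i] is the cluster of values[i]."""
    rng = random.Random(seed)
    # (a) initial centers: given explicitly, or k random data points.
    centers = list(init) if init is not None else rng.sample(list(values), k)
    labels = [0] * len(values)
    for _ in range(max_iter):  # (b) repeat
        # (c) + (d) assign each value to the nearest center; in 1-D the
        # Euclidean distance is the absolute difference.
        labels = [min(range(k), key=lambda c: abs(v - centers[c]))
                  for v in values]
        # (e) recompute each center as the mean of its members;
        # an empty cluster keeps its previous center.
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members
                               else centers[c])
        # (f) stop when no center changes.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels
```

For example, kmeans_1d([10, 12, 11, 200, 210, 205], 2, init=[0, 255]) converges to centers [11.0, 205.0], separating dark pixels from bright ones.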
The random selection of the initial cluster centers can change the result and the efficiency of this algorithm. To reduce this effect, K-means can be run multiple times.
K-medoids clustering, also known as partitioning around medoids (PAM), is a slight modification of the K-means algorithm. Its most important characteristic is that the cluster centers are data points themselves [10]. K-means minimizes the sum of squared distances to the cluster centroids, whereas K-medoids minimizes the sum of dissimilarities to the medoids, which makes K-medoids more robust to noise and outliers than K-means. A medoid is the object of a cluster whose average dissimilarity to all the other objects in the cluster is minimal. The basic idea of the algorithm is first to find k representative objects, called medoids. Each object of the data set is then assigned to the closest medoid; that is, an object is placed in a cluster when that cluster's medoid is closer to it than any other medoid. The K-medoids algorithm consists of the following steps [12]:
Algorithm: K-Medoids
Input: m = number of clusters to be formed,
D = data set.
Output: m clusters.
(a) initially select m random points from data set D as the medoids;
(b) repeat
(c) associate each data point with the closest medoid using any common distance metric;
(d) for each pair of a non-selected object and a selected object (medoid), calculate the total swapping cost;
(e) swap the selected object with the non-selected object if the swapping cost is negative (the total cost decreases);
(f) until no update is found.
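The swap-based steps above can be sketched in Python as follows. It is a minimal PAM-style illustration on scalar values, assuming the absolute difference as the distance metric; at each pass it evaluates every medoid/non-medoid swap, applies the swap with the most negative cost change, and stops when no swap lowers the total cost. The names kmedoids_1d and total_cost are hypothetical.

```python
import random


def total_cost(values, medoids):
    """Sum of distances from every value to its closest medoid.
    Step (c): each value is implicitly associated with the nearest medoid."""
    return sum(min(abs(v - values[m]) for m in medoids) for v in values)


def kmedoids_1d(values, k, init=None, seed=None):
    """PAM-style k-medoids on scalar values.
    Medoids are indices into values. Returns (medoid_indices, labels)."""
    rng = random.Random(seed)
    n = len(values)
    # (a) initial medoids: given indices, or k distinct random data points.
    medoids = list(init) if init is not None else rng.sample(range(n), k)
    cost = total_cost(values, medoids)
    while True:  # (b) repeat
        best_delta, best_swap = 0, None
        # (d) total swapping cost for every (medoid, non-medoid) pair.
        for pos in range(k):
            for o in range(n):
                if o in medoids:
                    continue
                candidate = medoids[:pos] + [o] + medoids[pos + 1:]
                delta = total_cost(values, candidate) - cost
                if delta < best_delta:
                    best_delta, best_swap = delta, candidate
        # (f) stop when no swap has a negative cost, i.e. no update is found.
        if best_swap is None:
            break
        # (e) perform the swap that lowers the total cost the most.
        medoids = best_swap
        cost = total_cost(values, medoids)
    # Final assignment of each value to its closest medoid.
    labels = [min(range(k), key=lambda c: abs(v - values[medoids[c]]))
              for v in values]
    return medoids, labels
```

Unlike K-means, the resulting centers are always actual data points: on [10, 12, 11, 200, 210, 205] with initial medoids at indices 0 and 1, the medoids converge to the values 205 and 11.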
DATASET AND ENVIRONMENT
The ALL-IDB1 dataset [13] is used here. Its images were captured with an optical laboratory microscope coupled with a Canon PowerShot G5 camera. All images are in JPG format with 24-bit color depth and a resolution of 2592 x 1944. The images were taken at microscope magnifications ranging from 300 to 500. All experiments were carried out in MATLAB R2017a on the Windows platform.
The ALL-IDB1 dataset contains 108 blood smear images, and the proposed methodology was applied to these images. The original ALL-IDB1 images are used as input and converted to grayscale. After denoising, segmentation is performed with the K-means and K-medoids clustering algorithms. Finally, the WBCs in the segmented image are counted. Images of the different stages, from the original to the result, are shown in Fig. 3.
Fig. 3. Steps of leukemia detection
Fig. 3 also shows the segmentation outputs obtained from K-means and K-medoids. The two segmentation algorithms differ, and the differences are measured by accuracy (how accurately the WBCs are counted) and time efficiency (how much time each algorithm needs to run). These differences are shown in Table I and Table II.
TABLE I. Accuracy and time efficiency of the K-means algorithm
Image (2592 x 1944) | RBC+WBC | Manual Count (WBC) | Automatic Count (WBC) | Counting Accuracy (%) | Percentage of Cancerous Cells (%) | Leukemia Stage | Execution Time (s)
Image 1 | 110 | 17 | 15 | 88.23 | 15.08 | Initial | 0.31
Image 2 | 129 | 21 | 20 | 95 | 15.08 | Initial | 0.23
Image 3 | 110 | 35 | 35 | 100 | 31.82 | Extreme | 0.22
Image 4 | 85 | 12 | 11 | 91.67 | 12.94 | Initial | 0.18
Image 5 | 123 | 18 | 17 | 94.44 | 13.82 | Initial | 0.29
Image 6 | 147 | 21 | 16 | 76.19 | 10.88 | Initial | 0.27
Image 7 | 83 | 9 | 8 | 88.89 | 9.62 | Normal | 0.19
Image 8 | 171 | 43 | 42 | 97.67 | 24.58 | Initial | 0.33
TABLE II. Accuracy and time efficiency of the K-medoids algorithm
Image (2592 x 1944) | RBC+WBC | Manual Count (WBC) | Automatic Count (WBC) | Counting Accuracy (%) | Percentage of Cancerous Cells (%) | Leukemia Stage | Execution Time (s)
Image 1 | 109 | 17 | 16 | 94.12 | 14.63 | Initial | 0.94
Image 2 | 105 | 21 | 22 | 95.45 | 20.95 | Initial | 1.08
Image 3 | 91 | 35 | 35 | 100 | 38.46 | Extreme | 1.05
Image 4 | 154 | 24 | 21 | 87.50 | 13.63 | Initial | 0.95
Image 5 | 109 | 18 | 16 | 88.89 | 14.67 | Initial | 1.05
Image 6 | 81 | 21 | 22 | 95.45 | 27.16 | Extreme | 1.05
Image 7 | 104 | 9 | 10 | 90.00 | 9.56 | Normal | 0.95
Image 8 | 106 | 45 | 46 | 93.47 | 43.40 | Extreme | 0.97
Here, the percentage of cancerous cells is the percentage of WBCs among all counted cells, and the leukemia stage is defined from it: a percentage below 10 is considered the normal stage, a percentage between 10 and 25 the initial stage, and a percentage above 25 the extreme stage [11]. The last column gives the execution time of each algorithm.
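The percentage and stage rule can be sketched in Python. The thresholds come from the text; how values of exactly 10 or 25 are classified is not specified, so treating 10 as initial and 25 as extreme is an assumption, as are the function names cancerous_percentage and leukemia_stage.

```python
def cancerous_percentage(wbc_count, total_cells):
    """Percentage of WBCs among all counted cells (RBC + WBC)."""
    if total_cells <= 0:
        raise ValueError("total_cells must be positive")
    return 100.0 * wbc_count / total_cells


def leukemia_stage(percentage):
    """Map the WBC percentage to a stage using the thresholds in the text.
    Assumption: exactly 10 counts as Initial and exactly 25 as Extreme."""
    if percentage < 10:
        return "Normal"
    if percentage < 25:
        return "Initial"
    return "Extreme"
```

For Image 3 in Table I, 35 WBCs among 110 cells give 31.82%, which maps to the extreme stage, consistent with the table.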
CONCLUSION AND DISCUSSION
Leukemia is a type of cancer that must be detected at an early stage, since early detection creates an opportunity for further diagnosis. The goal of this paper is to find a segmentation process that helps detect leukemia effectively from blood cell images. Manual counting of WBCs in blood cell images is time-consuming, and it is difficult to count them precisely. Leukemia can be detected in various ways; for segmentation, we compared the K-means and K-medoids clustering algorithms. Both tables show that the counting accuracy is higher with K-medoids than with K-means (an average of 93.11% versus 91.51% over the eight images), but K-medoids needs more execution time (0.94 to 1.08 s per image versus 0.18 to 0.33 s for K-means). A limitation of the K-means algorithm is that its result depends on the randomly chosen initial centers, which makes it difficult to obtain good clusters; the K-medoids algorithm can overcome this limitation. In future work, the method can be extended by extracting features from the WBCs present in the image, and the images can then be classified with a more adaptive classifier. Such a classification would identify the type of leukemia and make it easier to take the necessary steps to save the lives of leukemia patients.
[1] Tathagata Hazra, Mrinal Kumar and Dr. Sanjaya Shankar Tripathy, “Automatic Leukemia Detection Using Image Processing Technique,” International Journal of Latest Technology in Engineering, Management & Applied Science (IJLTEMAS), Volume VI, Issue IV, April 2017.
[2] Preeti Jagadev and Dr. H.G. Virani, “Detection of Leukemia and its Types using Image Processing and Machine Learning,” International Conference on Trends in Electronics and Informatics (ICEI), 2017.
[3] Khaled A. S. Abu Daqqa, Ashraf Y. A. Maghari and Wael F. M. Al Sarraj, “Prediction and Diagnosis of Leukemia Using Classification Algorithms,” International Conference on Information Technology (ICIT), 2017.
[4] Endah Purwanti and Evelyn Calista, “Detection of acute lymphocyte leukemia using k-nearest neighbor algorithm based on shape and histogram feature,” Journal of Physics: Conference Series, vol. 853, 012011, 2017.
[5] Shaikh Mohammed Bilal N and Sachin Deshpande, “Computer Aided Leukemia Detection using Digital Image Processing Techniques,” International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT), IEEE, 2017.
[6] Fahd Sabry Esmail, M. Badr Senousy and Mohamed Ragaie, “Predication Model for Leukemia Diseases Based on Data Mining Classification Algorithms with Best Accuracy,” International Journal of Computer and Information Engineering, Vol. 10, No. 5, 2016.
[7] Hamali P. Vaghela, Hardik Modi, Manoj Pandya and M.B. Potdar, “Leukemia Detection using Digital Image Processing Techniques,” International Journal of Applied Information Systems, Volume 10, No. 1, November 2015.
[8] J. V. Arati and R.C. Thool, “Automatic Detection of Acute Lymphoblastic Leukemia Based on HIS Color Space,” IET Digital Library, 2012.
[9] Ms Chinki Chandhok, Mrs. Soni Chaturvedi and Dr. (Mrs.) A.A Khurshid, “An approach to Image Segmentation using K-means Clustering Algorithm,” UACEE International Journal of Artificial Intelligence and Neural Networks, Volume 2, Issue 2, ISSN 2250-3749, 2012.
[10] Norazam Arbin, Nur Suhailayani Suhaimi, Nurul Zafirah Mokhtar and Zalinda Othman, “Comparative Analysis between K-Means and K-Medoids for Statistical Clustering,” International Conference on Artificial Intelligence, Modelling and Simulation, 2015.
[11] Md. Sabbir Ejaz, Dr. Md. Ali Hossain, Abdul Matin and Md. Tanvir Ahmed, “Performance Comparison of Partition Based Clustering Algorithms on Iris Image Preprocessing,” International Conference on Electrical & Electronic Engineering (ICEEE), IEEE, 2017.
[12] Vikekkumer, “leukemia cancer detection,” [Online]. Available: https://www.mathworks.com/matlabcentral/fileexchange/65389-leukemia-cancer-detection. Accessed 12 August 2018.
[13] Fabio Scotti, “ALL-IDB Acute Lymphoblastic Leukemia Image Database for Image Processing,” [Online]. Available: https://homes.di.unimi.it/scotti/all. Accessed 12 August 2018.