Pectoral Muscle Removal in Digital Mammograms Using Region Based Standard Otsu Technique

ABSTRACT


INTRODUCTION
Cancer is a chronic disease that poses serious challenges to health systems globally [1]. In 2020, the World Health Organization recorded about 10 million cancer-related deaths worldwide, approximately 70% of which occurred in low- and middle-income countries [2]. In Nigeria, about 102,000 new cases of different cancers occur annually, with a high number of recorded cases of breast cancer [3], [4]. Breast cancer results from abnormal changes in the genes responsible for regulating cell growth in the breast tissues. It is the most common cancer among women because of the structure, orientation, and presence of developed breast muscles. Breast cancer can begin in any part of the breast, giving rise to different progressions of the disease that depend on the specific breast tissue affected and the degree of spread [5], [6].
Medical imaging is an effective way to reveal the inner structure of the human body for clinical analysis, making diagnosis and treatment less cumbersome. It also reduces the need for surgery in which body parts are opened on the basis of assumptions [7], [8]. Imaging tools such as Mammography, MRI, Ultrasound, and Thermography [5], [9] are used in breast screening to detect breast cancer. However, the results obtained from the acquired medical images depend on the expertise and visual interpretation of the radiologist. The design of a computer-aided system for medical image assessment depends on the type of medical examination, the part of the body affected, the disease type, and the desired findings from the image. The computer-aided system helps resolve shortcomings associated with the manual interpretation of medical images [10], [11], as detection is improved and the error due to human fatigue is drastically reduced. Computer-aided systems are categorized into two schemes: Computer-Aided Diagnosis (CADx) and Computer-Aided Detection (CADe). CADe focuses on identifying potential abnormalities in the image and outlining suspicious Regions of Interest (ROI) for further examination by the radiologist. Although the detailed characteristics of a detected tumor remain unknown, the radiologist's attention is drawn to a tumor that might otherwise go unnoticed. CADx targets the diagnosis and classification of suspected regions as normal, benign, or malignant [12], [13].
ISSN: 2089-3272. IJEEI, Vol. 11, No. 2, June 2023: 337-348
Mammography is usually the first option for detecting breast lesions because of its high sensitivity in detecting cancer at an early stage. Mammography uses a low X-ray dose of 0.4 mSv (the millisievert, mSv, is a unit of measure for radiation dose) [5], [9] to produce a mammogram (breast image), usually taken in two views: Cranio-Caudal (CC) and Medio-Lateral Oblique (MLO). These views allow dense breasts to be visualized to some extent, and the MLO view depicts more breast tissue and Pectoral Muscle (PM) area than the CC view [14], [15]. Thus, mammograms in MLO projection are used for this study.
Although the presence of the PM is a criterion used to show proper positioning during image acquisition [15], and also an index for comparing the symmetry between the left and right breast images [16], the PM contains no relevant information for automatic intensity-based mammographic analysis such as breast tissue density estimation and the identification of cancer lesions [17]. Thus, the removal of the PM is an essential preprocessing step for most CADx systems to minimize biased results and false positive cases [18], [19], because the intensity of the pectoral muscle area is similar to, or in some cases even higher than, that of breast tissue and lesions if present. Recently, different researchers have proposed and developed various automated and semi-automated methods for PM removal in MLO view mammogram images.
Fadhil and Dawood [19] developed an algorithm based on Split Orientation Local Thresholding (SLOTH) to remove the PM area. The method was implemented by splitting the pre-processed mammogram image into four quadrants and selecting the upper left or upper right quadrant containing the PM region based on image orientation. A set threshold of 150 was used to perform binarization on the selected quadrant, thereby eliminating the PM region. The image without PM was obtained by recombining the selected quadrant, now without PM, with the rest of the quadrants split earlier. The algorithm was implemented using C# and tested using images 100, 110, and 322 from the MIAS database, with accuracies of 98%, 90.9%, and 93.2% obtained respectively. There were two limitations in this approach. First, because the algorithm was applied only to the selected quadrant, portions of the PM that appear in the other quadrants are missed. Secondly, the manually set threshold of 150 used for all images considered in the experiment might not be optimal for some images.
Vagssa et al. [20] assumed the PM border to be a straight line and developed a Hough transform algorithm for the deletion of the PM area in a mammogram. In this work, left image orientation was used to simplify the algorithm, such that a right mammogram was flipped to the left. Also, the upper left quarter of the mammogram was used as a region of interest to define the Hough mask, obtained using the Canny edge detection method and a Hough transformation equation defined in [21]. Although the deletion accuracy rate obtained was 93.8%, the PM boundary in some mammograms is not a straight line.
Yoon et al. [16] proposed a nonlinear Random Sample Consensus (RANSAC) algorithm to remove the PM. In the PM segmentation, an oblique kernel was used to show the outline of the enhanced image, and binarization was done using Otsu thresholding to detect the edges of the outline. The Hough transform was used to connect lines having similar angles in the range of 100 to 170 degrees for left MLO images and 280 to 350 degrees for right MLO images. The longest line was selected as the corresponding PM outline, which was further interpolated using the RANSAC algorithm to obtain an optimal PM outline. They obtained an acceptable rate of 92.2%, an average False Negative (FN) of 5.68%, and an average False Positive (FP) of 4.51%.
Makandar and Halalli [13] proposed a technique based on thresholding and a modified region-growing technique to remove the pectoral muscle. The seed point for the region-growing algorithm was selected automatically by considering the orientation of the mammogram. 100 images from the MIAS database were used to implement the proposed method, and an accuracy of 97% was obtained. Also, the Wiener, adaptive minmax, and median filters were compared; the PSNR of the Wiener filter was high, and the RMSE and image quality index (IQI) were reduced. Thus, the Wiener filter and CLAHE were used to preprocess the mammograms and enhance image quality.
Sreedevi and Sherly [22] developed an algorithm for segmenting and eliminating the pectoral muscle. In this work, global thresholding was used to identify the pectoral muscle, the boundaries were detected using Canny edge detection, and the regions were extracted using Connected Component Labeling (CCL). An accuracy of 90.06% was obtained when the method was implemented on 161 images from the MIAS database.
In summary, the bounding box approach based on a defined region within the mammogram image has been widely used in the literature as a criterion when developing PM removal algorithms. The main constraints of the works reviewed were the susceptibility of the methods used to eliminate the PM from the defined region of the bounding box and the complexity of the method used to define the PM boundary. The authors of [19] and [20] tried to improve the bounding box by dividing preprocessed mammograms into four quadrants and considering only the upper quadrant at the vertical edge of the breast profile. Some images, however, have a significant portion of the PM in the lower quadrant at the vertical edge of the breast region, which was not accounted for. Also, the assumption in [20] that the PM boundary is a straight line does not always hold. In this study, a new segmentation method called Region Based Standard Otsu thresholding is developed for the elimination of the PM. It considers desired regions based on vertical distance, since the height of the PM is unknown and unique to different mammograms. The threshold value can in general be obtained automatically, semi-automatically, or manually; in this research it was obtained automatically using Otsu thresholding.

RESEARCH METHOD AND MATERIALS
The methods used in this work combine two stages of digital image processing techniques: image enhancement and image segmentation. The methods employed in these stages are described in detail in the subsections below.

Image Enhancement Stage
Enhancement techniques are goal-driven: they modify an image so that the result is more suitable than the original for a specific application. Digital mammograms are susceptible to noise and naturally have poor contrast between the densities of different breast tissues and cancer tumors if present. As seen in Figure 2.2, a typical mammogram contains various information, not all of which is relevant. This work combined different enhancement techniques to improve image contrast and remove noise, tape artifacts, and labels if present.

Thresholding
Thresholding is a segmentation technique that categorizes image pixel values using a set threshold [7], [24]. This method is described mathematically in Equation 2.1. Binarization thresholding segments a grayscale image into a binary image such that the background is set to '0' (black) and the foreground is set to '1' (white) following a defined threshold [25]. Double thresholding is a modification of binarization in which the threshold criterion is a given range of threshold values.
Multi-thresholding segments objects in a grayscale image into a limited number of gray levels defined by more than one threshold value or range of values.
The main parameter for optimal thresholding segmentation is the threshold value used. A threshold value can be obtained manually by random selection, automatically from an image histogram, or using statistical measures such as the mode, mean, or variance. There are different approaches in the literature for choosing an optimal threshold value.
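The three thresholding variants described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the MATLAB code used in this work: the function names are hypothetical, the image is assumed to be a list of rows of integer gray levels, and the choice of a strict comparison (pixel > threshold) and an inclusive range for double thresholding are assumptions.

```python
def binarize(image, threshold):
    """Binarization: pixels above `threshold` become 1 (white), others 0 (black)."""
    return [[1 if px > threshold else 0 for px in row] for row in image]


def double_threshold(image, low, high):
    """Double thresholding: pixels inside the range [low, high] become 1."""
    return [[1 if low <= px <= high else 0 for px in row] for row in image]


def multi_threshold(image, thresholds):
    """Multi-thresholding: label each pixel by how many thresholds it exceeds."""
    ordered = sorted(thresholds)
    return [[sum(px > t for t in ordered) for px in row] for row in image]


# Tiny 2 x 3 example image with gray levels in 0..255.
image = [[10, 60, 120],
         [200, 150, 30]]

print(binarize(image, 100))
print(double_threshold(image, 50, 150))
print(multi_threshold(image, [50, 140]))
```

With two thresholds, `multi_threshold` yields three gray-level classes (0, 1, and 2), which mirrors how a multi-level method partitions the histogram.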

Otsu Thresholding
This method treats the Probability Density Function (PDF) of the image pixel values as a bimodal distribution, such that the background and foreground are clearly defined in the image histogram. With this assumption, it automatically finds the optimal value for global thresholding [7] through an exhaustive search over the possible threshold values, calculating the corresponding weighted between-class variance for each.
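The optimal threshold value is the threshold with the minimum within-class variance or, equivalently, the maximum weighted between-class variance. The intra-class or within-class variance is defined as the weighted sum of the variances of each cluster, as in Equation 2.2:

$$\sigma_w^2(t) = \omega_0(t)\,\sigma_0^2(t) + \omega_1(t)\,\sigma_1^2(t) \quad (2.2)$$

where $\omega_0(t) = \sum_{i=0}^{t} p(i)$ and $\omega_1(t) = \sum_{i=t+1}^{L-1} p(i)$ are the class probabilities for threshold $t$, $p(i)$ is the normalized image histogram, and $L$ is the number of gray levels. Also, the between-class variance is calculated using Equation 2.4:

$$\sigma_b^2(t) = \omega_0(t)\,[\mu_0(t) - \mu_T]^2 + \omega_1(t)\,[\mu_1(t) - \mu_T]^2 \quad (2.4)$$

The class variance is given by Equation 2.5:

$$\sigma_0^2(t) = \sum_{i=0}^{t} [i - \mu_0(t)]^2\,\frac{p(i)}{\omega_0(t)}, \qquad \sigma_1^2(t) = \sum_{i=t+1}^{L-1} [i - \mu_1(t)]^2\,\frac{p(i)}{\omega_1(t)} \quad (2.5)$$

where $\mu_0(t) = \sum_{i=0}^{t} i\,p(i)/\omega_0(t)$ and $\mu_1(t) = \sum_{i=t+1}^{L-1} i\,p(i)/\omega_1(t)$ are the class means. The variable $\mu_T = \sum_{i=0}^{L-1} i\,p(i)$ is the mean of the image histogram. Also, the between-class variance $\sigma_b^2(t)$ can be further simplified as in Equation 2.7:

$$\sigma_b^2(t) = \omega_0(t)\,\omega_1(t)\,[\mu_0(t) - \mu_1(t)]^2 \quad (2.7)$$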
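The search for the Otsu threshold can be sketched in plain Python as follows. This is a minimal illustration, not the MATLAB graythresh function used in this work: it scores every candidate gray level with the simplified between-class variance (the product of the two class weights and the squared difference of the class means) and keeps the best one. The function name, the flat list of integer pixel values, and the tie-breaking rule (the lowest gray level wins) are assumptions made for the example.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes the between-class variance.

    Class 0 holds pixels <= t and class 1 holds pixels > t.
    `pixels` is a flat list of integers in the range 0..levels-1.
    """
    hist = [0] * levels
    for px in pixels:
        hist[px] += 1

    total = len(pixels)
    sum_all = sum(i * count for i, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    count0 = 0  # number of pixels in class 0
    sum0 = 0    # sum of gray levels in class 0
    for t in range(levels - 1):
        count0 += hist[t]
        sum0 += t * hist[t]
        count1 = total - count0
        if count0 == 0 or count1 == 0:
            continue  # one class is empty, so the variance is undefined
        mu0 = sum0 / count0
        mu1 = (sum_all - sum0) / count1
        w0, w1 = count0 / total, count1 / total
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:  # strict '>' keeps the lowest t on ties
            best_t, best_var = t, var_between
    return best_t


# Two well separated clusters: {1, 2, 3} and {10, 11, 12}.
print(otsu_threshold([1, 2, 3, 10, 11, 12], levels=16))  # prints 3
```

Because the running count and sum are updated incrementally, the search visits each gray level once after the histogram is built.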

Pectoral Muscle Removal
The effective removal of the PM is a function of the breast density type, quality of image contrast, size of the breast region, and PM area. The following characteristics were considered when developing the PM removal algorithm:
• The PM is usually at the top corner of the vertical edge of the breast region.
• The PM always intersects the vertical edge of the breast region.
• The PM tissue has a higher intensity than the surrounding tissue.
• There is a gradual decrease in the PM width from top to bottom.
The threshold used was obtained by taking the average of the standard Otsu threshold for the breast region defined by points ABDE and the standard Otsu threshold for the pectoral muscle region defined by points ABCF, as in Figure 2.3. This average value was used as the set threshold to perform binarization and CCL to eliminate the pectoral muscle region. For some MLO mammograms, the PM boundary is almost identical to the surrounding tissues, especially in the lower portion of the PM. Because the threshold obtained does not completely separate the PM from the breast tissue in such images, an iterative threshold selection was used to optimize the segmentation process. The algorithm applied in PM removal is illustrated in Table 2.1.

Table 2.1. Pseudocode for PM removal
1. Img1 = Grayscale mammogram without noise, labels, and contrast enhanced
2. Th = Possible values of optimal threshold obtained using multi Otsu thresholding (multithresh MATLAB function) with gray levels of 5
3. A1 = Breast region rectangular border ABDE as the area obtained when the minimum value of Th is used to perform binarization
4. A2 = PM rectangular border ABCF as the area obtained when the median of Th is used to perform binarization
5. T1 = Compute the Otsu threshold for region A1 using the graythresh MATLAB function
6. T2 = Compute the Otsu threshold for region A2, also using the graythresh MATLAB function
7. Thr = Average of T1 and T2
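The region-based threshold selection can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions, not the authors' MATLAB implementation: the two rectangles are passed in directly as hypothetical (top, bottom, left, right) boxes instead of being derived from multi-level Otsu binarization, the helper names are invented for the example, and the threshold is returned in raw gray levels, whereas graythresh returns a value normalized to the range 0 to 1.

```python
def otsu_threshold(pixels, levels=256):
    """Gray level maximizing the between-class variance (lowest level on ties)."""
    hist = [0] * levels
    for px in pixels:
        hist[px] += 1
    total = len(pixels)
    sum_all = sum(i * count for i, count in enumerate(hist))
    best_t, best_var, count0, sum0 = 0, -1.0, 0, 0
    for t in range(levels - 1):
        count0 += hist[t]
        sum0 += t * hist[t]
        count1 = total - count0
        if count0 == 0 or count1 == 0:
            continue
        diff = sum0 / count0 - (sum_all - sum0) / count1
        var_between = (count0 / total) * (count1 / total) * diff ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def region_pixels(image, top, bottom, left, right):
    """Flatten the rectangle image[top:bottom][left:right] into one list."""
    return [px for row in image[top:bottom] for px in row[left:right]]


def region_based_threshold(image, breast_box, pm_box):
    """Average the Otsu thresholds of the breast box and the PM box."""
    t1 = otsu_threshold(region_pixels(image, *breast_box))
    t2 = otsu_threshold(region_pixels(image, *pm_box))
    return (t1 + t2) / 2


# Synthetic left-oriented "mammogram": bright PM (200) in the top-left corner,
# breast tissue (100), and background (0).
image = [[200, 200, 100, 0],
         [200, 100, 100, 0],
         [100, 100, 100, 0],
         [100, 100, 0, 0]]

thr = region_based_threshold(image, breast_box=(0, 4, 0, 4), pm_box=(0, 2, 0, 3))
print(thr)  # prints 50.0
```

In this toy image the breast-region threshold separates background from tissue and the PM-region threshold separates tissue from muscle, so their average falls between the two.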

Method of Evaluating the Result of Proposed Algorithm
The area-based metric is a common quantitative method used in the literature to evaluate the quality of segmented regions by comparing the area of the region segmented by an algorithm with the actual area determined manually by an expert [26]. Three parameters, the True Positive area (TP), False Negative area (FN), and False Positive area (FP), are used to define the segmentation metrics. In Figure 2.4, the TP area is the segmented region that matches the ground truth region determined by experts. The FN area is the part of the expert reference region that is not present in the region segmented by the algorithm. The FP area is the part of the segmented region that lies outside the expert manual contour [21], [26]. The metrics used to evaluate the quality of segmented regions are as follows.
i. False Positive Rate (FPR): expressed mathematically in Equation 2.8 as the ratio of the extra pixels to the total number of pixels obtained.
The FPR and FNR are used to evaluate the missed segmentation rate. Thus, the smaller the FPR and FNR, the better the segmentation algorithm.
iii. Jaccard Similarity Coefficient: measures the similarity and diversity of two sample images, the ground truth segmentation by experts and the predicted segmentation, by computing the intersection or region overlap between them divided by their union [18], as in Equation 2.10:

$$J(A_g, A_s) = \frac{|A_g \cap A_s|}{|A_g \cup A_s|} \quad (2.10)$$

where $A_g$ is the region segmented manually by experts, and $A_s$ is the region segmented by a proposed segmentation method. The value of $J(A_g, A_s)$ lies between 0 and 1. For a perfect match of the two images, the Jaccard similarity coefficient is 1. The Jaccard index can also be expressed in terms of TP, FP, and FN as shown in Equation 2.11:

$$J = \frac{TP}{TP + FP + FN} \quad (2.11)$$

The agreement index based on the Jaccard similarity coefficient is shown in Table 2.2.
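As an illustration of the area-based evaluation, the following Python sketch counts the TP, FP, and FN pixels from two binary masks and computes the Jaccard index in both its set form and its count form. The function name and the mask representation (lists of rows of 0/1 values) are assumptions for this example. The FPR and FNR are deliberately left out of the sketch, since they follow from the same three counts once the denominators of Equations 2.8 and 2.9 are fixed.

```python
def area_counts(truth, predicted):
    """Count TP, FP, and FN pixels between two binary masks of equal size."""
    tp = fp = fn = 0
    for truth_row, pred_row in zip(truth, predicted):
        for t, p in zip(truth_row, pred_row):
            if t and p:
                tp += 1   # segmented and inside the expert region
            elif p:
                fp += 1   # segmented but outside the expert region
            elif t:
                fn += 1   # inside the expert region but missed
    return tp, fp, fn


def jaccard_index(truth, predicted):
    """Jaccard index as TP / (TP + FP + FN); 1.0 when both masks are empty."""
    tp, fp, fn = area_counts(truth, predicted)
    union = tp + fp + fn
    return tp / union if union else 1.0


# Expert mask and algorithm mask for a tiny 2 x 3 image.
truth = [[1, 1, 0],
         [1, 0, 0]]
predicted = [[1, 0, 0],
             [1, 1, 0]]

print(area_counts(truth, predicted))    # prints (2, 1, 1)
print(jaccard_index(truth, predicted))  # prints 0.5
```

The count form equals the set form because the intersection of the two masks has TP pixels and their union has TP + FP + FN pixels.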

Experimental Set Up
The algorithms developed for this research were implemented in MATLAB R2018a on a computer with a 2.16 GHz Intel Pentium processor and 4 GB of RAM. The results obtained at the different steps are shown below.

Results
The 16 image samples in Table 3.1 were used to illustrate the experimental results obtained using the developed algorithm. A total of 322 images from the MIAS database were also analyzed. The segmented PM results obtained were compared with contours drawn manually by an expert radiologist in mammography. The quantitative metrics, namely the False Positive Rate (FPR), False Negative Rate (FNR), and Jaccard index defined in Equations 2.8, 2.9, and 2.11 respectively, were computed. Table 3.2 shows the results for each of the 16 images, and Table 3.3 gives a summary of the results for the 322 images in the MIAS dataset.

Discussion
In Table 3.1, the first column shows the image identifier as indexed in the MIAS database. The third column of Table 3.1 shows that, although the original image contrast was enhanced, additive noise was also introduced. The fourth column shows that noise was removed, as well as other trivial information present in the image: additive noise present in all image samples, scanning artifacts in 'mdb 104 and 117', low-intensity labels in 'mdb 120, 218, 58, and 28', and high-intensity labels in 'mdb 117, 75, 104, 206, 170, 58, 05, 184, 195 and 193' were eliminated. For this work, the left orientation was used, so images in the right orientation like 'mdb 117, 75, 25, 213, 05, 195 and 193' were flipped to the left, as seen in the fourth column. The fifth column shows a further enhanced image contrast, and the last column shows the resulting grayscale image after the removal of the PM.
From Table 3.2, an average Jaccard index of 98.69% was obtained. This result shows that the concordance between the PM region segmented manually by an expert and by the proposed method was excellent. In Table 3.2 the FNR and FPR fall below 5%, and the average FNR and FPR are 1.24% and 1.03% respectively. These low values show that the proposed segmentation algorithm was effective. Visual inspection of the results presented in Table 3.1 supports this finding.
The overall performance of the algorithm for PM removal can be grouped into two categories: acceptable and unacceptable. An image is said to be acceptable if the PM is correctly removed or a majority of the PM area is removed, and unacceptable if most or all of the pectoral muscle is not removed. Acceptable cases for our proposed method have a Jaccard similarity index greater than 81%, FNR less than 12%, and FPR less than 12%. For the MIAS dataset, our proposed method produced an acceptable rate of 95.65%. Also, an average Jaccard similarity index of 93.2%, an average FPR of 3.54%, and an average FNR of 5.68% were obtained. The proposed algorithm for PM removal achieved better results than previous works, as seen in Table 3.4, due to the threshold selection that considered specific regions in the mammogram image and took their average, as against using the entire image threshold or a manually selected threshold value. Also, the result at the image enhancement stage contributed greatly to the success of the segmentation algorithm.
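The acceptability criterion stated above can be written as a small Python check. This is a hedged sketch: the function names and the sample values are hypothetical and are not results from this work, and the metrics are expressed as fractions (0.81 for 81%) instead of percentages.

```python
def is_acceptable(jaccard, fnr, fpr):
    """Apply the stated cut-offs: Jaccard > 81%, FNR < 12%, and FPR < 12%."""
    return jaccard > 0.81 and fnr < 0.12 and fpr < 0.12


def acceptable_rate(results):
    """Fraction of (jaccard, fnr, fpr) triples that pass `is_acceptable`."""
    passed = sum(1 for jaccard, fnr, fpr in results if is_acceptable(jaccard, fnr, fpr))
    return passed / len(results)


# Hypothetical per-image metrics, for illustration only.
sample = [
    (0.95, 0.02, 0.03),  # passes all three cut-offs
    (0.80, 0.05, 0.05),  # Jaccard too low
    (0.90, 0.15, 0.01),  # FNR too high
    (0.85, 0.11, 0.11),  # passes all three cut-offs
]

print(acceptable_rate(sample))  # prints 0.5
```

As a sanity check on the reported figure, an acceptable rate of 95.65% over 322 images is what 308 acceptable images would give, since 308 / 322 rounds to 95.65%.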

CONCLUSION
In this paper, a Region-Based Standard Otsu thresholding method for the removal of the PM has been presented. The proposed algorithm was accurate across variations in PM size, boundary curvature, and tissue density. Also, the results obtained show that the proposed algorithm achieved a higher performance when compared with previous works.
The performance of the algorithm was evaluated using the Jaccard similarity index, average FNR, and average FPR. The algorithm was also compared with the related works reviewed based on the acceptable rate, and it performed better for the wide range of data used. In the future, the image obtained after removing the PM can be used for further image analysis, such as the design of a CADx system for detecting breast cancer lesions in mammograms. Also, other segmentation methods can be applied to the regions considered in this work.
Pectoral Muscle Removal in Digital Mammograms (Anusionwu et al.)
Figure 2.1 shows the flowchart of the PM removal algorithm used to implement this work.

Figure 2.1. Flowchart of PM removal algorithm

2.1. Input Mammogram Image Dataset
Breast images used in this work were obtained from the widely used MIAS mammogram database [23]. The MIAS database is an online digital mammography dataset obtained from an organization of UK research

Figure 2.4. Illustration of True Positive, False Positive and False Negative

ii. False Negative Rate (FNR): expressed mathematically as shown in Equation 2.9. It is the ratio of the missing pixels to the total number of pixels obtained.

$$FNR = \frac{FN}{TP + FN} \quad (2.9)$$

Table 2.1. Pseudocode for PM removal (continued)
8. Img2 = Binary image of Img1 using the imbinarize function with the computed value of Thr
9. Img3 = Apply binary morphological filters on Img2 using the imfill function
10. Extract PM region
   i. Cc1 = Find the connected objects in Img3 using the bwconncomp function
   ii. L1 = Create label matrix of Cc1 using the labelmatrix function
   iii. Mv1 = Minimum column coordinate value for pixel value > 0 in Cc1
   iv. Mn = Search the Mv1+10 columns in the matrix L1 for the mode of the label values
   v. BwPimg = Select connected object with label Mn as the Pectoral Muscle
11. Check for optimal PM selection
   i. W = 0
   ii. Store the maximum column coordinate value for pixel > 0 in vector D for {(W+1) x 20} consecutive rows in BwPimg
   iii. W = Number of elements in vector D
   iv. For W < 3, repeat step 11ii. End
   v. Compare the consecutive elements in vector D for a decrease
   vi. Y = 0
   vii. For step v = False and Y < 2, repeat step 8 and increase Thr by 0.05. End
12. Pimg = 1 - BwPimg
13. Fimg = Product of the binary image Pimg and Img1 using the immultiply function
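Steps 11 to 13 of the pseudocode can be illustrated with a short Python sketch. This is a simplified reading under stated assumptions, not the authors' MATLAB code: the function names are hypothetical, the mask is a list of rows of 0/1 values for a left-oriented image, the rightmost PM column is sampled once every `step` rows, and a non-increasing sequence is accepted as a valid top-to-bottom decrease in PM width.

```python
def right_edges(pm_mask, step=20):
    """Rightmost foreground column of the PM mask, sampled every `step` rows."""
    edges = []
    for r in range(0, len(pm_mask), step):
        cols = [c for c, value in enumerate(pm_mask[r]) if value]
        if cols:  # rows without PM pixels are skipped
            edges.append(max(cols))
    return edges


def width_decreases(pm_mask, step=20, min_samples=3):
    """True if at least `min_samples` edges exist and never increase downwards."""
    edges = right_edges(pm_mask, step)
    if len(edges) < min_samples:
        return False
    return all(lower <= upper for upper, lower in zip(edges, edges[1:]))


def remove_region(image, pm_mask):
    """Set PM pixels to 0, i.e. multiply the image by the inverted mask."""
    return [[0 if m else px for px, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, pm_mask)]


# Tiny triangular PM mask in the top-left corner (step=1 for this toy size).
pm_mask = [[1, 1, 1, 0],
           [1, 1, 0, 0],
           [1, 0, 0, 0],
           [0, 0, 0, 0]]
image = [[9, 9, 9, 9] for _ in range(4)]

print(right_edges(pm_mask, step=1))      # prints [2, 1, 0]
print(width_decreases(pm_mask, step=1))  # prints True
print(remove_region(image, pm_mask))
```

If the check fails, the pseudocode raises Thr by 0.05 and repeats the binarization; that retry loop is omitted here to keep the example short.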

Table 3.1. Results of the Different Steps of the Algorithm Developed

Table 3.2. Performance Evaluation of PM for 16 Sample Images

Table 3.3. Summary of Performance Evaluation of PM for 322 Sample Images

Table 3.4. Comparison of Proposed Algorithm for PM Removal with Previous Works