A Novel Approach for Improving Post Classification Accuracy of Satellite Images by Using Majority Analysis

ABSTRACT


INTRODUCTION
Uttarakhand has seen a number of fire incidents in the past. It is a Himalayan state, so its ecology is quite delicate. It has the largest forest cover among the northern states. It is also a habitat for a wide range of flora and fauna, which are preserved by sanctuaries, national parks and biosphere reserves. Forest fires in the state are classified into three types: 1) surface fires, 2) ground fires and 3) crown fires. Surface fires spread very quickly and mostly consume small vegetation and surface litter. Ground fires consume organic matter; they mostly reach the muck, duff or peat lying beneath the surface litter of the forest floor. Crown fires burn the tops of trees and shrubs without any close link to a surface fire [1].
Change detection (CD) is essential for determining the changes that a specific land cover has undergone over time. Performing CD requires image analysis and classification of satellite images [2]. CD is considered one of the biggest challenges in remote sensing applications [3,4]. It is crucial in applications such as deforestation, forest burning, the study of land cover dynamics, vegetation change and the surveillance of shifting cultivation [5]. These applications involve examining large geographical areas [6]. CD ascertains change in a geographical area by comparing multiple images taken at different times [7]. The main steps of the CD process are 1) image pre-processing, 2) computing the difference image (DI), and 3) analysing the difference image. Here, image classification is done using a supervised machine learning technique, the Support Vector Machine (SVM), which groups pixels with the same spectral signature into one class [8][9][10][11][12]. For post-classification, a hybrid technique combining SVM and Majority Analysis (MA) is proposed. In this technique, each unclassified pixel is compared with a group of surrounding pixels (its neighbourhood) and is labelled according to the class held by the majority of those pixels. The objective of this technique is to improve the accuracy of the classified image. Lastly, CD is performed on an area that was severely affected by the forest fires in Nainital, Uttarakhand, India.
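As a generic illustration of steps 2 and 3 (not necessarily the procedure used in this study, which compares classified maps), a difference image can be formed from two co-registered single-band images and then thresholded. The sketch below assumes NumPy arrays of equal shape; the function name `difference_image_change_mask` and the threshold rule (mean plus `k` standard deviations of the absolute difference, with a hypothetical default `k = 2.0`) are illustrative choices.

```python
import numpy as np


def difference_image_change_mask(img_t1, img_t2, k=2.0):
    """Flag changed pixels between two co-registered single-band images.

    Step 2: difference image (DI) = |img_t2 - img_t1|.
    Step 3: DI analysis, here a simple global threshold at mean + k * std
    (an illustrative rule, not the one used in this study).
    """
    t1 = np.asarray(img_t1, dtype=float)
    t2 = np.asarray(img_t2, dtype=float)
    if t1.shape != t2.shape:
        raise ValueError("images must be co-registered and the same shape")
    di = np.abs(t2 - t1)
    threshold = di.mean() + k * di.std()
    return di, di > threshold
```

A single strongly changed pixel in an otherwise unchanged scene is flagged, while unchanged pixels stay below the threshold.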

STUDY AREA
For this research, an area of Nainital District in the state of Uttarakhand, India, has been selected. It covers a geographical area of 195 km². The study area is bounded by 29° 13' 22" to 29° 18' 30" North latitude and 79° 13' 22" to 79° 29' 30" East longitude, as shown in Figure 1. The area has a wide variety of land cover, such as agricultural fields, dense forest, scarce forest and infrastructure. Prominent towns in the area include Haldwani, Kaladhungi, Chausala, Basani and Fatehpur. According to the Forest Survey of India, Nainital, Pauri Garhwal and Almora are the most fire-prone zones of Uttarakhand. To compare the effect of fire from 2020 to 2021, multi-temporal imagery was taken from Landsat 7. The images are dated 7 April 2020 and 26 April 2021.

PROPOSED METHODOLOGY
As shown in Figure 2, the first step is to study forest fires in order to understand their nature, their causes, their impacts and so on. Next, the acquisition dates of the images must be chosen; for this study, two images one year apart were selected. The next step is image pre-processing, in which atmospheric corrections are applied. Since the study area lies in Nainital District, that area must be extracted from the satellite image: a mask is created in QGIS 3.18 and layered over the satellite image. This layering gives the exact coordinates of the study area, which is then cropped. Once cropped, the image is classified based on the spectral signatures reflected by its pixels. Seven different signatures were identified, and supervised classification was performed with seven corresponding classes: Sediments, Agriculture, Forest, Scarce Forest, Shrub Land, Barren Land and Infrastructure. Samples are then collected from each class to act as training data for the SVM. After sampling, the classification is carried out and the accuracy of the classified image is checked. The accuracy assessment is followed by the post-classification technique, Majority Analysis (MA), and this process is iterated until a satisfactory accuracy is achieved. Finally, the images with the higher accuracy are used to identify the changes that the area underwent due to the forest fires.
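The extraction step can be sketched as follows, assuming the study-area mask made in QGIS has already been rasterised to a boolean NumPy array aligned with the image grid (True inside the district). The function `crop_to_mask` is a hypothetical helper: it blanks pixels outside the mask and trims the image to the mask's bounding box.

```python
import numpy as np


def crop_to_mask(image, mask, fill_value=0):
    """Crop a (bands, rows, cols) image to the bounding box of a boolean mask.

    Pixels outside the mask are set to `fill_value` so that later
    classification steps can ignore them.
    """
    image = np.asarray(image)
    mask = np.asarray(mask, dtype=bool)
    if image.shape[-2:] != mask.shape:
        raise ValueError("mask must match the image's spatial dimensions")
    rows = np.flatnonzero(mask.any(axis=1))  # rows containing study-area pixels
    cols = np.flatnonzero(mask.any(axis=0))  # columns containing study-area pixels
    if rows.size == 0:
        raise ValueError("mask is empty")
    r0, r1 = rows[0], rows[-1] + 1
    c0, c1 = cols[0], cols[-1] + 1
    masked = np.where(mask, image, fill_value)  # mask broadcasts over the band axis
    return masked[..., r0:r1, c0:c1]
```

Blanking before cropping matters when the district boundary is irregular: the bounding box still contains some outside pixels, and the fill value keeps them out of the training and classification steps.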

Algorithm
Step 1 Select an arbitrary size for the group of pixels (the analysis window).
Step 2 Form an analysis window around pixel p. This window is called the neighbourhood of the misclassified pixel.
Step 3 Count the number of pixels belonging to each class.
Step 4 If one class holds a majority of the pixels, label the misclassified pixel with that class. The majority candidate is found in a single pass:
a. Initialize count = 1 and majindex = 0, so that the first pixel is the initial candidate.
b. Iterate over each pixel in the neighbourhood, maintaining a count for the majority candidate.
c. Maintain the index of the majority candidate, majindex.
d. If the next pixel has the same class as the candidate, increment count by 1.
e. Otherwise, decrement count by 1.
f. If count = 0, set majindex to the current pixel and reset count = 1.
Step 5 Traverse the neighbourhood again and count the occurrences of the candidate class. If this count is greater than half the size of the neighbourhood, label the pixel with the candidate class; otherwise there is no majority.
Step 6 Repeat steps 4 and 5 until all pixels are labelled.
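The algorithm above can be sketched in Python as follows. Steps 4 and 5 correspond to the Boyer-Moore majority vote: one voting pass to find a candidate, then a verification pass. Several details are assumptions rather than statements from the text: label 0 marks unclassified pixels, the window includes the centre pixel and is clipped at the image border, each pass reads only labels from the previous pass, and a hypothetical `max_passes` cap guarantees termination because some pixels may never reach a strict majority. The default 5 × 7 window is taken from the Tools and Technology section; the pixel weight is not modelled.

```python
import numpy as np

UNCLASSIFIED = 0  # assumption: label 0 marks unclassified pixels


def boyer_moore_majority(labels):
    """Steps 4-5: return the label held by more than half of `labels`, else None."""
    candidate, count = None, 0
    for label in labels:                 # Step 4: single voting pass
        if count == 0:
            candidate, count = label, 1  # adopt the current pixel as candidate
        elif label == candidate:
            count += 1
        else:
            count -= 1
    # Step 5: verify that the candidate really is a strict majority.
    occurrences = sum(1 for label in labels if label == candidate)
    if candidate is not None and 2 * occurrences > len(labels):
        return candidate
    return None


def majority_analysis(class_map, win_rows=5, win_cols=7, max_passes=10):
    """Relabel unclassified pixels with the strict-majority class of their window."""
    out = np.array(class_map, copy=True)
    half_r, half_c = win_rows // 2, win_cols // 2
    for _ in range(max_passes):          # Step 6: repeat until nothing changes
        snapshot = out.copy()            # read from the previous pass only
        changed = False
        for r, c in zip(*np.nonzero(snapshot == UNCLASSIFIED)):
            # Step 2: window around the pixel, clipped at the image border;
            # it includes the centre pixel itself.
            window = snapshot[max(r - half_r, 0): r + half_r + 1,
                              max(c - half_c, 0): c + half_c + 1]
            winner = boyer_moore_majority(window.ravel().tolist())
            if winner is not None and winner != UNCLASSIFIED:
                out[r, c] = winner
                changed = True
        if not changed:
            break
    return out
```

The Boyer-Moore vote needs only one counter and one candidate, so the cost per window is linear in the window size; the second pass is required because the first pass alone can return a candidate that is not actually a majority.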

Tools and Technology
The Support Vector Machine (SVM) is used as the supervised classification technique, with a Radial Basis Function (RBF) kernel and a gamma value of 0.125. For MA, the neighbourhood size is 5 × 7 and the pixel weight is 5. The pixel weight determines the number of times the unclassified pixel is iterated before it is labelled. For the SVM training set, the image is classified into seven classes based on their spectral signatures: Sediments, Agriculture, Forest, Scarce Forest, Shrub Land, Barren Land and Infrastructure. Since the classified image uses colours that differ from what the eye would perceive in a natural-colour image, Table 2 gives the colour interpretation of the classified image. Table 1 shows the number of samples collected from each image.
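For reference, the RBF kernel scores the similarity of two pixel spectral vectors x and y as K(x, y) = exp(-γ‖x - y‖²); with γ = 0.125, the similarity falls to exp(-0.5) ≈ 0.61 when the squared Euclidean distance is 4. The NumPy sketch below computes this kernel matrix directly; in scikit-learn the same configuration would be `SVC(kernel="rbf", gamma=0.125)`. The function name `rbf_kernel` and the toy spectral vectors are illustrative, and the effect of a given γ depends on how the band values are scaled.

```python
import numpy as np

GAMMA = 0.125  # gamma value reported for the RBF kernel


def rbf_kernel(X, Y, gamma=GAMMA):
    """Return K with K[i, j] = exp(-gamma * ||X[i] - Y[j]||^2).

    X and Y hold one pixel per row and one spectral band per column.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    sq_dists = (np.sum(X ** 2, axis=1)[:, None]
                + np.sum(Y ** 2, axis=1)[None, :]
                - 2.0 * X @ Y.T)
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


pixels = np.array([[0.0, 0.0], [2.0, 0.0]])  # two toy 2-band pixel vectors
K = rbf_kernel(pixels, pixels)
```

Each pixel has similarity 1 with itself, and the two toy pixels, at squared distance 4, have similarity exp(-0.5).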

RESULTS AND DISCUSSION
Initially, the multispectral image was classified by the machine learning algorithm; to improve the accuracy further, majority analysis was then applied to the classified image. Figures 4 and 5 show the supervised classification and post-classification results. Some of the prominent differences for the year 2020 are highlighted in Figure 6. In the confusion matrices of Tables 3, 4, 5 and 6, each row gives the number of pixels predicted as a class and each column corresponds to the actual ground-truth pixels. Correct predictions for a class therefore lie on the diagonal, where the predicted row and the ground-truth column refer to the same class, whereas incorrect predictions lie off the diagonal, in the row of the predicted class and the column of the true class. The values highlighted in red show that the predicted classes agree more closely with the ground truth under the majority analysis technique. The overall accuracy and the kappa coefficient were calculated from the detailed accuracy report for both methods, and both increase when the post-classification technique is applied to the SVM-classified image. Based on the confusion matrices, Figures 7 and 8 show the per-class classification results of SVM and MA. The comparison shows that, for the 2020 image, the accuracy of the classes sediments, agriculture, forest, scarce forest, shrub land, barren land and infrastructure rose by 13%, 6%, 5%, 17%, 7%, 10% and 30% respectively. Similarly, for the 2021 image, the accuracy of the same classes rose by 10%, 5%, 26%, 11%, 19%, 9% and 17% respectively. Figures 9 and 10 give a further comparison of the existing method and the proposed technique in terms of how many pixels each leaves unclassified: the total percentage of unclassified pixels was 11%, which fell to 2% for the 2020 image and 0% for the 2021 image when the MA method was applied to the classified image.
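Overall accuracy and the kappa coefficient can both be read off a confusion matrix. The sketch below follows the layout described above (rows are predicted classes, columns are ground truth): overall accuracy is the trace divided by the total pixel count N, and Cohen's kappa compares it with the agreement expected by chance, κ = (p_o - p_e) / (1 - p_e), where p_e = Σ_k (row total_k × column total_k) / N². The function name is illustrative, and the matrix in the usage line is a made-up two-class example, not data from this study.

```python
import numpy as np


def accuracy_and_kappa(confusion):
    """Overall accuracy and Cohen's kappa from a square confusion matrix.

    confusion[i, j] = pixels predicted as class i whose ground truth is class j.
    """
    cm = np.asarray(confusion, dtype=float)
    n = cm.sum()
    p_observed = np.trace(cm) / n  # overall accuracy: diagonal over total
    # Chance agreement from the row (predicted) and column (ground-truth) totals.
    p_expected = np.sum(cm.sum(axis=1) * cm.sum(axis=0)) / n ** 2
    kappa = (p_observed - p_expected) / (1.0 - p_expected)
    return p_observed, kappa


# Made-up two-class example (not results from this study).
oa, kappa = accuracy_and_kappa([[45, 5], [5, 45]])
```

In this toy matrix 90 of 100 pixels lie on the diagonal (overall accuracy 0.9) and chance agreement is 0.5, giving κ = 0.8. Because kappa discounts chance agreement, it is always lower than overall accuracy unless the classification is perfect.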

Change Detection
To identify the damage caused by the forest fires, change detection has been performed on the multi-temporal images. Table 7 lists the areas that underwent changes due to the fires in the forest and nearby regions. The change statistics table lists the initial-state classes in the columns and the final-state classes in the rows. The class total row shows the total number of pixels in each initial-state class, and the class total column shows the total number of pixels in each final-state class. The row total column is a class-by-class sum of all final-state pixels that fall within the selected initial-state classes. The class changes row gives the total number of initial-state pixels that changed from one class to another. The image difference row is the difference between the total number of pixels in each class in the two images, calculated by subtracting the initial-state class totals from the final-state class totals. A positive image difference indicates an increase in the number of pixels and a negative value indicates a decrease. Figure 11 compares the land cover before and after the fires. It is evident that the forest fires caused a decline in shrub land, forest and scarce forest, while barren land increased as an after-effect. The sediments, i.e. ash and debris from the fires, have settled in the lower-lying areas. The health of the vegetation has also been severely affected as a consequence. Figure 12 shows the percentage increase or decrease in each class attributed to the increased number of forest fires in Uttarakhand over the last year. The changes observed are:
• Sediment settlement has increased by up to 320%.
• Very little change is observed in agricultural land, at 4%.
• As severe fires were reported, barren land coverage has increased by up to 90%; on the contrary, forest, scarce forest and shrub land have decreased by up to 40%, 20% and 70% respectively.
• Since some town areas were also affected by the forest fires, human settlement has also decreased by 33%.
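The change statistics described above can be computed from two co-registered class maps. The sketch below assumes classes are encoded as integers 0 to `n_classes - 1`; `change_statistics` is a hypothetical helper. Its percentage change (image difference divided by the initial class total) is one plausible way to obtain figures like those in Figure 12, not necessarily the authors' exact calculation.

```python
import numpy as np


def change_statistics(initial, final, n_classes):
    """Cross-tabulate two co-registered class maps.

    matrix[f, i] counts pixels that were class i in the initial image and
    class f in the final image (final-state classes in rows, initial-state
    classes in columns).
    """
    initial = np.asarray(initial).ravel()
    final = np.asarray(final).ravel()
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(matrix, (final, initial), 1)  # unbuffered: repeated pairs all count

    initial_totals = matrix.sum(axis=0)  # class total row: pixels per initial class
    final_totals = matrix.sum(axis=1)    # row totals: pixels per final class
    class_changes = initial_totals - np.diag(matrix)  # initial pixels that switched class
    image_difference = final_totals - initial_totals  # > 0 gain, < 0 loss
    with np.errstate(divide="ignore", invalid="ignore"):
        percent_change = np.where(initial_totals > 0,
                                  100.0 * image_difference / initial_totals,
                                  np.nan)
    return matrix, class_changes, image_difference, percent_change
```

`np.add.at` is used instead of `matrix[final, initial] += 1` because the latter would count a repeated (final, initial) pair only once.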

CONCLUSION
This study shows that the proposed post-classification technique yields better accuracy than standalone SVM. Support Vector Machine (SVM) classification is performed first, and a post-classification technique called Majority Analysis (MA) is then applied to improve the accuracy of the classified image. When the multispectral images are classified using SVM alone, the accuracy achieved is 89.35% and 88.52% for the before and after images respectively. When the MA technique is applied during post-classification, the unclassified pixels are assigned to their respective classes and the class boundaries are smoothed. This raises the accuracy for the before and after images to 98.71% and 99.76% respectively. The change detection study showed a drastic increase in barren land due to the forest fires, whereas the forest, scarce forest and shrub land areas decreased. In future work, this method can be combined with other existing machine learning techniques and used to identify changes caused by other natural calamities, such as earthquakes and volcanic eruptions.

Figure 1.
Location of Nainital District - Study Area

Panchromatic sharpening (PAN sharpening) is a widely used technique for improving the quality of satellite images. It combines low-resolution colour bands with the corresponding high-resolution grayscale band. To improve image resolution and make the pixels more clearly visible, PAN sharpening of the images is performed, as shown in Figure 3. Panchromatic sharpening uses the spatial information in the high-resolution grayscale band and the colour information in the multispectral bands to create a high-resolution colour image, which increases the resolution of the colour information in the data set. The panchromatic band has a spatial resolution of 15 m, whereas the other, lower-resolution bands have a resolution of 30 m. The panchromatic band of the Landsat 8 satellite also captures twice as much light as the other bands, which results in much sharper images.
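The text does not state which PAN sharpening algorithm was used. As an illustration only, the sketch below applies the Brovey transform, one common method: each multispectral band is upsampled to the panchromatic grid (here by simple pixel replication, so 30 m to 15 m is a factor of 2) and multiplied by the ratio of the panchromatic value to the sum of the upsampled multispectral bands. The function name, the replication-based upsampling and the small `eps` guard against division by zero are assumptions.

```python
import numpy as np


def brovey_pansharpen(ms, pan, eps=1e-12):
    """Brovey-transform pan sharpening.

    ms  : (bands, rows, cols) multispectral image at the coarse resolution.
    pan : (rows * f, cols * f) panchromatic image at the fine resolution.
    Each band is upsampled by pixel replication, then scaled by
    pan / (sum of the upsampled bands).
    """
    ms = np.asarray(ms, dtype=float)
    pan = np.asarray(pan, dtype=float)
    f_r = pan.shape[0] // ms.shape[1]
    f_c = pan.shape[1] // ms.shape[2]
    if (ms.shape[1] * f_r, ms.shape[2] * f_c) != pan.shape:
        raise ValueError("pan grid must be an integer multiple of the ms grid")
    up = np.repeat(np.repeat(ms, f_r, axis=1), f_c, axis=2)
    ratio = pan / (up.sum(axis=0) + eps)
    return up * ratio  # ratio broadcasts over the band axis
```

Because every band at a pixel is scaled by the same ratio, the relative proportions of the bands (the colour) are preserved while the spatial detail comes from the panchromatic band.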

Figure 2. Flow of the proposed system
Figure 3.

Comparing panels 1(a), 1(b) and 3(a), 3(b) of Figure 6, it is clearly seen that after applying the majority analysis technique, the sediments are classified more clearly. Similarly, comparing panels 2(a) and 2(b), the scarce forest area is correctly classified as scarce forest instead of forest.

Table 2.
Colour interpretation of the classified image

Table 5.
Confusion matrix of SVM - 2021

The overall accuracy for the 2020 images is 98% with a kappa coefficient of 0.97 for the proposed technique, compared with only 89% accuracy and a kappa coefficient of 0.87 for SVM classification alone. Similarly, for the 2021 images, the proposed technique achieves an overall accuracy of 99% (kappa coefficient 0.), compared with only 88% accuracy and a kappa coefficient of 0.86 for SVM classification alone.

A Novel Approach for Improving Post Classification Accuracy… (S. Patel and P. Swaminarayan)

Table 7.
Change detection