The Prediction of Earthquake Building Structure Strength: Modified K-Nearest Neighbour Employment

Okfalisa Okfalisa, Septian Nugraha, Saktioto Saktioto, Zahidah Zulkifli, S.S.M. Fauzi
Department of Informatics Engineering, Faculty of Science and Technology, Universitas Islam Negeri Sultan Syarif Kasim Riau, Pekanbaru, Indonesia
Department of Physics, Faculty of Mathematics and Natural Sciences, Universitas Riau, Pekanbaru, Indonesia
Department of Information Systems, Kulliyyah of Information and Communication Technology, International Islamic University Malaysia, Malaysia
Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA Perlis Branch, Arau Campus 02600 Arau, Perlis, Malaysia


INTRODUCTION
Technological advancement now shapes many areas of human life, including the construction of concrete buildings. In principle, the design of a multi-story structure should be accompanied by structural analysis documents and building information that support the planned design through to the physical development stage [1]. Building Information Modelling (BIM) integrates intelligent objects with data about each component, starting from its geometric characteristics, to provide building information [1]. Previous researchers regard BIM as an evolving technology that offers considerable benefits to the architecture, engineering, and construction industry [2]. Its most significant contribution to structural engineering is better analysis of situations through simulation, coordination, data consistency, and visualization [3], [4]. Reductions in design errors, drafting errors, and costs have been reported as a result of improved productivity [5][6][7]. BIM offers facilities in developing, executing, and managing  [8]. However, the emergence of big data challenges BIM, which still relies on outdated data and experience-based decision-making [9]. Heterogeneous information, storage complexity, and users' specialized functions tend to produce data that are non-intuitive, inaccurate, tedious, and costly to handle [10]. Inaccurate records in turn lead to confused data analysis, lower data quality, and negative implications [11]. Lin et al. [11] found that data mining can enhance BIM data. Data mining can uncover latent patterns and support prediction as information requirements grow [8]. It has been applied successfully to large-scale data analysis in many fields, especially the built environment, with or without the support of a BIM platform.
Data mining classifies and predicts big data according to the analytical problem at hand. Yu et al. [12] proposed a data mining framework, including classification, clustering, and association rule mining, to analyze building-related data more efficiently. Xiao and Fan [13] applied clustering analysis to identify the typical power consumption patterns of the tallest building in Hong Kong. Hung-Ming et al. [14] used a big data analysis framework to retrieve and compute the required information from dynamic BIM effectively. A. Costa et al. [15] proposed a novel integrated toolkit for monitoring and analyzing building operation and energy performance, in which BIM served as both method and technology for structured performance definitions. These studies show that data mining techniques are valuable for knowledge discovery aimed at better building operational performance. Data mining has great potential to discover hidden knowledge in large data sets and has shown superior prediction accuracy [16,17]. It can also speed up analysis and yield more accurate, reliable, and computationally efficient models [12].
Earthquakes are devastating and can halt the socio-economic activities of a region within a short period. Because of their effect, the structural configuration, the type of material used, and the structural system of a building are critical considerations [18]. The magnitude of earthquake loads on a building structure depends on the horizontal force, the vertical force, the earthquake-induced torsional moment in the structure, the weight and stiffness of the structural material, the configuration and structural system, the vibration period, ground conditions, earthquake zones, and earthquake behavior. Dynamic behavior is hard to predict, and dynamic analysis procedures require classifying the earthquake loads and the seismic response of the building structure [19]. Besides construction details, the uncertainty in a structure's collapse capacity during an earthquake is influenced by concrete compressive strength, steel tensile strength, earthquake energy content and frequency, and the structural resistance model [20]. Predicting seismic response therefore involves considerable effort and risk under uncertainty.
In Indonesia, seismic design follows the earthquake resilience planning guidelines for houses and building structures in SNI 03-1726-2002. This standard was aligned with international seismic codes for seismic-resistant structural design, describing structural analysis performance with respect to the structure's weight, its considerable dead load, and earthquake resistance in sensitive areas. In this way, the behavior of the forces acting on the building and its deformations are kept under control [21]. Moreover, an improperly designed building causes discomfort and damage to non-structural components such as partition walls, windows, and doors, which can block evacuation routes [22]. SNI 03-1726-2002 has since become outdated [23] and was replaced by SNI 03-1726-2012, which changes the probability level used for peak ground acceleration (PGA) from 10% to 2% [20,21]. This research therefore adopted SNI 03-1726-2012 as the basis for seismic-resistant structural design in the primary data.
Previous research has examined the seismic response of building structures using artificial intelligence techniques, including machine learning and intelligent or big data analysis. Maram et al. [24] predicted the seismic behavior of reinforced concrete buildings using artificial neural networks (ANN). An ANN algorithm was also used to account for structural vibration uncertainties when computing damage identification data [25]. Focusing on soil type, Karbassi et al. [26] applied decision tree algorithms to predict the stability of reinforced building structures and reported 95% accuracy for hard soil and 97% for soft soil. Y. Zhang et al. [27] assessed post-earthquake building safety using machine learning methods, namely Classification and Regression Tree (CART) and Random Forest. Their assessment achieved 91% and 88% accuracy for the safety state based on response and damage patterns, respectively. For a prestressed concrete bridge subjected to earthquakes, Pei and Smyth [28] successfully applied a feedforward neural network. Abd-elhamed et al. [18] proposed logical analysis of data (LAD) to simulate and blindly predict the dynamic response of building structures under earthquake loads. Nevertheless, the number of input variables in these studies is still relatively small and is pre-defined from domain knowledge. In contrast, this research selects the influential variables from a large dataset of building resistance under earthquake loading and maps them to predicted classes as outputs. The large dataset is mined with a computationally intensive method to explore the strength of the database and its possible integration as valuable information for BIM. Such a prediction task requires intelligent exploratory data analysis and, in turn, a reliable and accurate algorithm.
K-Nearest Neighbors (KNN) is a non-parametric data mining algorithm that classifies and predicts big data for analytical problems. It is one of the most popular neighborhood classifiers. It offers a flexible approach that remains easy to understand and interpret [29], yet it delivers highly competitive results [30]. Its other advantages include robustness to noisy training data and effectiveness on large training sets [31]. Compared with the Bayes algorithm, SVM, Fisher's linear discriminant analysis, partial least squares discriminant analysis, classification trees, Random Forest, and other Euclidean distance calculations, KNN has shown better efficiency and performance [32][33][34], even against SVM [35]. However, KNN lets all features contribute to the classification, which increases classification variance and lowers precision [36].
Moreover, its computation cost is relatively high because every query instance must be compared with the training data, its memory load is large, its accuracy is low on multidimensional datasets, and its distance-based learning is not well defined [32]. The modified KNN (MK-NN) was introduced as an improvement. It applies weighted voting over the training data, which plain KNN does not handle [36]. The method assigns the class label of a data point according to k validated points of the training set and eliminates the instances that fail the validity test. It thereby overcomes the low accuracy and other disadvantages of KNN [37]. Hamid et al. [31] evaluated KNN and MK-NN on five different data sets, and the results showed improved accuracy compared with KNN.
This study applies MK-NN to predict the performance level of building structures in two-story offices. The proposed data mining method has not previously been used to classify the resistance of building structures subjected to earthquakes, which poses a new challenge for this research. As a case study, 6663 records from building construction in Bangkinang City, Riau Province, Indonesia, were analyzed. The earthquake time history, concrete quality, displacement, velocity, and acceleration were the variables considered in classifying damage into two classes: Safe and Immediate Occupancy (IO). A software prototype was created to run the MK-NN calculation on the data automatically. The software can serve as a decision aid when preparing new buildings and as a recommendation system for concrete construction companies, particularly with respect to earthquakes. Mining the database in this way also provides information relevant to BIM integration.

RESEARCH METHOD
Data mining is a branch of science that studies methods for finding useful information in big data. It uses statistics, mathematics, artificial intelligence, and machine learning to extract and interpret useful information from large data sets and to present it as valuable information and knowledge. Depending on the purpose of the analysis, data mining can address classification, regression, clustering, and association rule learning. The resulting data patterns are presented descriptively and predictively.
Within Knowledge Discovery in Databases (KDD), data mining analyzes data by applying algorithms that generate patterns and models [38]. KDD is an interactive process that lets the user engage directly with the knowledge base (see Figure 1). Its outputs are potentially useful, lead to useful insight, and are understandable either immediately or after some post-processing [38]. The process is an iterative and interactive sequence of activities: selection, which chooses the subset of variables on which discovery is to be performed; preprocessing, which cleans the data by removing or modeling noise, handling missing data fields and miscoding, and accounting for time sequence information, since dirty big data can lead to inaccurate analytics, uncertain outcomes, and unpredictable conclusions [39]; transformation, which reduces and projects the data for the specific task; data mining, which extracts interesting patterns by applying methods such as summarization, classification, clustering, and regression; and interpretation/evaluation, which visualizes the patterns to interpret them and extract knowledge. Figure 1 illustrates these stages [40].
In a nutshell, the KNN algorithm is a lazy learning method commonly used for prediction. It classifies an object according to the classes of its k nearest neighbors in the training data.
The distance between each training sample x and the testing sample y is calculated with the Euclidean distance equation [41]:
$$d(x, y) = \sqrt{\sum_{i=1}^{p} (x_i - y_i)^2} \tag{1}$$
where d is the distance between a point x in the training data and the point y in the testing data that will be classified, with x = (x_1, x_2, …, x_p) and y = (y_1, y_2, …, y_p); i indexes the attributes; and p is the attribute dimension.
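As a minimal sketch of Equation 1 and of the nearest-neighbor vote built on it, the Python below computes the Euclidean distance and a plain KNN prediction. The function names, the toy coordinates, and the way ties are resolved are illustrative assumptions; they are not taken from the authors' implementation.

```python
import math
from collections import Counter


def euclidean_distance(x, y):
    """Equation 1: distance between two feature vectors of equal length."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of attributes")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


def knn_predict(train_X, train_y, query, k):
    """Plain KNN: majority label among the k training points nearest to query."""
    ranked = sorted(
        range(len(train_X)),
        key=lambda j: euclidean_distance(train_X[j], query),
    )
    votes = Counter(train_y[j] for j in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-attribute records, for illustration only.
train_X = [[0, 0], [0, 1], [5, 5], [6, 5]]
train_y = ["Safe", "Safe", "IO", "IO"]
prediction = knn_predict(train_X, train_y, [0.2, 0.3], k=3)
```

With an odd k and two classes, the vote in this sketch cannot end in a tie.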
MK-NN extends KNN by weighting the votes of the training data, with the aim of keeping the merits of KNN while addressing its demerits. It adds two steps to KNN, namely computing the validity of each training point and weighting the votes, using the following formulae [8].
$$\mathrm{Validity}(x) = \frac{1}{K} \sum_{i=1}^{K} S\big(\mathrm{label}(x), \mathrm{label}(N_i(x))\big) \tag{2}$$
where K is the number of closest points, label(x) is the class of x, and label(N_i(x)) is the class label of the i-th nearest point to x. The function S calculates the resemblance between the point x and the i-th of its nearest neighbors:
$$S(a, b) = \begin{cases} 1, & a = b \\ 0, & a \neq b \end{cases} \tag{3}$$
where a is the class in the training data, and b is another one.
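The validity step can be sketched in Python as follows. This is a hedged illustration that assumes the neighbors of a training point are drawn from the other training points (the point itself is excluded) and that distances are Euclidean; the function name and the one-attribute toy data are hypothetical.

```python
import math


def validity(train_X, train_y, k):
    """For each training point, the fraction of its k nearest other
    training points that carry the same class label."""
    scores = []
    for i, xi in enumerate(train_X):
        # Assumption: a point is never counted as its own neighbor.
        others = [j for j in range(len(train_X)) if j != i]
        others.sort(key=lambda j: math.dist(xi, train_X[j]))
        same = sum(1 for j in others[:k] if train_y[j] == train_y[i])
        scores.append(same / k)
    return scores


# Hypothetical records; the last one is an "IO" point sitting among "Safe" points.
train_X = [[0.0], [1.0], [2.0], [10.0], [11.0], [12.0], [1.5]]
train_y = ["Safe", "Safe", "Safe", "IO", "IO", "IO", "IO"]
scores = validity(train_X, train_y, k=2)
```

In this toy set the isolated "IO" point at 1.5 receives validity 0, so it contributes nothing to a later weighted vote, while points surrounded by their own class receive validity 1.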
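The following formula is used for weight voting [6].
$$W(i) = \mathrm{Validity}(i) \times \frac{1}{d_e + 0.5} \tag{4}$$
where W(i) is the weight voting calculation, Validity(i) is the validity value, and d_e is the Euclidean distance of Equation 1. The constant 0.5 is the smoothing term of the standard MK-NN formulation, which keeps the weight finite when d_e is zero.
The accuracy value calculated from the confusion matrix is required to determine the success rate of the classification. Accuracy and error rate are given in Equations 5 and 6 [13].
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\% \tag{5}$$
$$\mathrm{Error\ rate} = \frac{FP + FN}{TP + TN + FP + FN} \times 100\% \tag{6}$$
where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives.
Finally, the MK-NN prediction procedure was implemented in a software system built with the PHP programming language and MySQL for the database. The MK-NN prediction system was designed as a simulation tool for mining earthquake data that can be embedded in BIM to provide better analysis of situations in structural engineering activities. The architecture of the MK-NN prediction system is explained as the outcome of this procedure. Black-box testing and user acceptance testing (UAT) were carried out to ensure the reliability of the MK-NN prediction system.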
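The weight voting step of this section can be illustrated with a short Python sketch. It assumes that validity values have already been computed for every training point and that the weight takes the standard MK-NN form Validity(i) / (d_e + 0.5); the smoothing constant, the function name, and the toy data are assumptions for illustration, and the authors' own system was written in PHP.

```python
import math
from collections import defaultdict


def mknn_predict(train_X, train_y, validities, query, k, alpha=0.5):
    """Weighted vote over the k nearest training points.

    Each neighbor adds validity / (distance + alpha) to its class total;
    alpha = 0.5 is an assumed smoothing constant.
    """
    ranked = sorted(range(len(train_X)), key=lambda j: math.dist(train_X[j], query))
    totals = defaultdict(float)
    for j in ranked[:k]:
        distance = math.dist(train_X[j], query)
        totals[train_y[j]] += validities[j] / (distance + alpha)
    # The class with the largest accumulated weight wins.
    return max(totals, key=totals.get)


# Hypothetical data: the point at 1.5 is labelled "IO" but has validity 0.
train_X = [[0.0], [1.0], [1.5], [10.0]]
train_y = ["Safe", "Safe", "IO", "IO"]
validities = [1.0, 1.0, 0.0, 1.0]
label = mknn_predict(train_X, train_y, validities, [1.4], k=3)
```

Here the nearest neighbor of the query is the zero-validity "IO" point, yet the weighted vote returns "Safe" because that neighbor's weight is zero. This is the behavior that distinguishes MK-NN from an unweighted majority vote.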

RESULTS AND DISCUSSION
As shown in Figure 2, the research began with a review of previous studies, mainly on K-NN and MK-NN. The algorithm was then studied and developed for the case of predicting the structural strength of concrete buildings against earthquakes. Next, 6663 records from two-story office building structures in Bangkinang City, Riau Province, Indonesia, were collected and analyzed according to SNI 03-1726-2012. SAP2000, a civil engineering simulation package, was used to generate the structural systems from the earthquake data and SNI 03-1726-2012. Civil engineering experts from Universitas Riau were engaged for validation.
The classification uses the following input parameters: the time history time (seconds), the concrete quality (f'c), and the displacement, velocity, and acceleration at points 118 and 124, each in the x, y, and z directions. The value of k is generally chosen as an odd number to avoid ties during the classification process. Here, k was set to 1, 3, 5, 7, 9, and 11. The MK-NN analysis assigned each record to one of two output classes, "Safe" or "Immediate Occupancy" (IO). The confusion matrix was used to measure accuracy for test and training data simulated at ratios of 10:90, 20:80, and 30:70 [14]. Owing to its speed, simplicity, and versatility, this scheme allowed the model to be trained and tested on different portions of the data [14].
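The experimental grid described above (three split ratios by six values of k) can be sketched as follows. This is an illustration under stated assumptions: the source does not say how records were assigned to the test and training sets, so the seeded shuffle, the function names, and the 1000 placeholder records are all hypothetical.

```python
import random

K_VALUES = [1, 3, 5, 7, 9, 11]        # odd values of k, as in the text
TEST_FRACTIONS = [0.10, 0.20, 0.30]   # test share for 10:90, 20:80, 30:70


def holdout_split(records, test_fraction, seed=0):
    """Shuffle a copy of the records and split it into (train, test)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # assumed: random assignment
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def experiment_grid(records):
    """Yield one (test_fraction, k, n_train, n_test) tuple per experiment."""
    for fraction in TEST_FRACTIONS:
        train, test = holdout_split(records, fraction)
        for k in K_VALUES:
            yield fraction, k, len(train), len(test)


# Placeholder records standing in for the real data set.
grid = list(experiment_grid(range(1000)))
```

Each tuple in `grid` corresponds to one cell of an accuracy or error-rate table, giving 18 experiments in total.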

KDD flow process: data selection and transformation
The KDD method follows Figure 1. The data were cleaned so that missing values were eliminated, leaving a complete data set ready for analysis. The selection step removed attributes such as the time history time and the concrete quality (f'c), which were only indirectly evaluated during mining (see Figure 3). Figure 3 shows the KDD transformation process from the primary data generated by SAP2000 to the data used for mining.
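A minimal Python sketch of the cleaning and selection steps is given below. The column names (`time`, `fc`, `disp_118_x`, `vel_118_x`, `label`) are hypothetical placeholders for the SAP2000 export, and representing a missing value as `None` is an assumption.

```python
def clean_and_select(rows, drop_columns, label_column):
    """Drop incomplete rows, then split each row into (features, label)."""
    features, labels = [], []
    for row in rows:
        if any(value is None for value in row.values()):
            continue  # preprocessing: discard records with missing values
        kept = [
            row[name]
            for name in sorted(row)  # fixed column order across rows
            if name not in drop_columns and name != label_column
        ]
        features.append(kept)
        labels.append(row[label_column])
    return features, labels


# Hypothetical records for illustration only.
rows = [
    {"time": 0.01, "fc": 25, "disp_118_x": 0.1, "vel_118_x": 0.2, "label": "Safe"},
    {"time": 0.02, "fc": 25, "disp_118_x": None, "vel_118_x": 0.3, "label": "IO"},
]
X, y = clean_and_select(rows, drop_columns={"time", "fc"}, label_column="label")
```

Only the displacement, velocity, and acceleration attributes would remain as features after this step, which matches the three parameter groups used for mining.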

MK-NN process
The MK-NN analysis began by calculating the Euclidean distance between training and test data (Equations 1 and 2), followed by the determination of S as in Equation 3. Equation 4 was then used for the weight calculation. The results are presented in Table 1. The dominant class from majority-weight voting was cross-checked against the actual class to assess its effectiveness. Equations 5 and 6 were then evaluated on the confusion matrix to obtain the class prediction accuracy. For example, Table 2 shows that only nine of the 666 test records (1.35 percent) were predicted falsely at the 10:90 ratio.

Comparative analysis of testing results
The confusion matrix was used to evaluate accuracy and error rate according to Equations 5 and 6, as presented in Table 3, which gives the values for the 90:10 simulation with 5973 training and 666 test records at k=1. The values obtained were TP=651, FN=6, FP=3, and TN=6, giving an accuracy of 98.65% and an error rate of 1.35%. The comparison covered the 90:10, 80:20, and 70:30 ratios with k=1, 3, 5, 7, 9, and 11. The results are summarized in Table 4 and Table 5. Table 4 shows the highest accuracy at k=1 (98.85%) and the lowest at k=11 (98.30%), both at the 30:70 ratio. Table 5 shows correspondingly that the lowest error rate (1.15 percent) occurred at k=1 and the highest (1.70 percent) at k=11, both at 30:70. MK-NN is therefore a strong predictor, with a minimum accuracy of 98.30 percent. Tables 6 and 7 report the confusion matrix results for KNN. Its highest accuracy was obtained at k=9 and ratio 30:70 (97.83%), and its lowest error rate was 1.84% at the same k and ratio. In support of this finding, Okfalisa et al. [42] compared KNN and MK-NN in classifying 7395 records from a conditional cash transfer implementation unit. Their confusion matrix testing gave a highest accuracy of 94.95% for KNN and 99.51% for MK-NN. Hamid et al. [31] found that MK-NN outperformed KNN by up to 3.2% across different values of k on five different datasets. The case study used a two-class data set with 34 features and 351 sample points. In a nutshell, MK-NN achieves a substantial increase in accuracy by applying weighted KNN [16].
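The k=1 figures reported for the 90:10 simulation can be checked with a few lines of Python. The function name is illustrative; the four counts are the ones stated in the text.

```python
def accuracy_and_error(tp, fn, fp, tn):
    """Accuracy and error rate (in percent) from confusion matrix counts."""
    total = tp + fn + fp + tn
    accuracy = 100.0 * (tp + tn) / total
    error_rate = 100.0 * (fp + fn) / total
    return accuracy, error_rate


# Counts reported for k=1 at the 90:10 ratio (666 test records).
acc, err = accuracy_and_error(tp=651, fn=6, fp=3, tn=6)
print(f"accuracy = {acc:.2f}%, error rate = {err:.2f}%")
# prints: accuracy = 98.65%, error rate = 1.35%
```

The nine misclassified records (FN + FP) out of 666 account for the 1.35% error rate.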
By comparison, a previous investigation using ANN achieved only 95% accuracy with comparable primary data [43]. Reliable prediction for earthquake cases depends strongly on the shape of the loading vector used in the pushover analysis and on building conditions such as the number of stories, PGA values, earthquake region, surrounding environment, and other earthquake parameters [26] [27]. Machine learning, meanwhile, explores how prediction accuracy relates to the different training values of each attribute, so the algorithms can be easily updated and improved [27]. This study showed how the likelihood values revealed by MK-NN can enhance the accuracy of predictions for reinforced concrete buildings subjected to earthquakes [31] [37].

Method testing through software MK-NN prediction
The software application was designed and developed following a detailed analysis. It covers the KDD analysis, the MK-NN calculation, and the confusion matrix assessment phases (see Figure 4). The application automatically computes and presents the knowledge analysis on a data mining platform. The end user supplies the generated earthquake data, which are processed through the KDD steps, including the Euclidean distance and weighted voting calculations. The MK-NN algorithm then takes the prepared data and classifies each record according to its parameters. The system interface displays the classification results together with the accuracy and error rate. The MK-NN prediction system handles the 6663 records of this case study, and it was built dynamically so that it can accept other data sets and earthquake parameters. Figure 5 shows an example of the system interface for the Euclidean calculation. Black-box testing used the equivalence class partitioning technique and verified three primary interfaces: the login page, the data management page, and the evaluation page. The tests showed that all functions of the application run as expected. The user acceptance test (UAT) was carried out by distributing questionnaires to 30 respondents in the civil engineering field. The questionnaire used a five-point Likert scale, from strongly agree to strongly disagree, covering the interface, contribution, and functionality of the system. Eighty-seven percent of respondents strongly agreed that the application helped them apply the MK-NN method to predict concrete building structures, and eighty-eight percent found the interface extremely user friendly.

CONCLUSION
This study successfully applied the MK-NN approach to predict the earthquake resistance of concrete building structures. Three significant parameters of the building data set, namely displacement, velocity, and acceleration, were analyzed, yielding a highest accuracy of 98.85 percent and a lowest error rate of 1.15 percent. The confusion matrix calculation indicated that the 30:70 ratio with k=1 gave promising prediction results. A comparative analysis revealed that MK-NN improved on the accuracy and error rate of KNN by up to 1.02% and 0.69%, respectively. A prototype application has been successfully developed and tested to process large data sets using MK-NN. This tool predicts the performance of concrete construction structures. It may thus play an essential role in the preparation and