Classifications of Arabic Customer Reviews Using Stemming and Deep Learning

Hawraa Fadhil Khelil; Mohammed Fadhil Ibrahim; Hafsa Ataallah Hussein

doi:10.52549/ijeei.v12i3.5452

Abstract

With the emergence of AI text-based tools and applications, the need for effective text-processing tools has grown. NLP tools and techniques have developed rapidly for some languages, such as English, but other languages, such as Arabic, still lack comparable methods and techniques. In this study, we present a model to classify customer reviews written in Arabic, using the HARD dataset. Three deep learning classifiers are adopted (CNN, LSTM, RNN), three stemmers are used for text preprocessing (Khoja, Snowball, Tashaphyne), and three feature extraction methods are utilized (TF-IDF, N-gram, BoW). The best performance resulted from the combination CNN + Snowball + N-gram, with an accuracy of 93.5%. The results show that some classifiers are sensitive to the choice of stemmer, and that accuracy can also change with the feature extraction method; both stemming and feature extraction have an impact on accuracy. The results also indicate that dialectal language can cause some limitations, since the same expression can carry conflicting meanings across different regions or countries. The outcomes of the study open the way to investigating other tools and methods to enrich Arabic natural language processing and to developing new applications that support Arabic content.
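The preprocessing and feature extraction stages named in the abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation: the `light_stem` function and its prefix and suffix lists are hypothetical stand-ins for the Khoja, Snowball, and Tashaphyne stemmers, the two example reviews are invented, and the TF-IDF weighting shown (term frequency times the natural log of the inverse document frequency) is only one common variant. No classifier is included; the resulting features would be fed to a model such as a CNN.

```python
import math
from collections import Counter

# Hypothetical toy affix lists for illustration only;
# these are NOT the rules of Khoja, Snowball, or Tashaphyne.
PREFIXES = ("ال", "و")
SUFFIXES = ("ات", "ون", "ة")


def light_stem(token):
    """Strip at most one prefix and one suffix, keeping at least 3 characters."""
    for prefix in PREFIXES:
        if token.startswith(prefix) and len(token) - len(prefix) >= 3:
            token = token[len(prefix):]
            break
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            token = token[:-len(suffix)]
            break
    return token


def tokenize(text):
    """Split on whitespace and stem every token."""
    return [light_stem(t) for t in text.split()]


def ngrams(tokens, n):
    """Return the contiguous word n-grams of a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(tokens):
    """Bag-of-words representation: raw term counts."""
    return Counter(tokens)


def tf_idf(docs):
    """Map a list of token lists to a list of {term: tf * idf} dictionaries."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        counts = Counter(doc)
        weighted.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weighted


# Two invented example reviews ("the hotel is beautiful", "the hotel is bad").
reviews = ["الفندق جميل", "الفندق سيء"]
docs = [tokenize(r) for r in reviews]
print(docs)
print([bag_of_words(d) for d in docs])
print([ngrams(d, 2) for d in docs])
print(tf_idf(docs))
```

In this variant, a term that occurs in every review receives a TF-IDF weight of zero, which is one reason the choice of feature extraction method can change what a downstream classifier sees.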

Keywords

Customer Reviews; Arabic Text Classification; Text Stemming; Text Feature Extraction

Indonesian Journal of Electrical Engineering and Informatics (IJEEI)
ISSN 2089-3272

This work is licensed under a Creative Commons Attribution 4.0 International License.