Sentiment Analysis in Karonese Tweet using Machine Learning
Abstract
Recently, many social media users have expressed their conditions, ideas, and emotions in local languages, for example in tweets or status updates. Because of the large volume of text, sentiment analysis is used to identify opinions, ideas, or thoughts from social media. Sentiment analysis research has also been widely applied to local languages. Karonese is one of the largest local languages in North Sumatera, Indonesia, and the Karo community actively uses it on Twitter. This study makes two contributions: a Karonese tweet dataset for classification, and a sentiment analysis of Karonese. Four machine learning algorithms are implemented in this research: logistic regression, Naive Bayes, K-nearest neighbor, and Support Vector Machine (SVM). The Karonese tweets were collected from Twitter timelines using several keywords and hashtags. Transcribers drawn from Karo ethnic figures helped annotate the tweets into three classes: positive, negative, and neutral. To find the best model, several scenarios were run with different proportions of training and test data. The SVM algorithm achieved the highest accuracy, precision, recall, and F1 scores among the four algorithms. As this is a preliminary study of sentiment analysis on the Karonese language, much future work remains to improve it.
Keywords
Karonese; Sentiment analysis; Logistic regression; Naïve Bayes; K-nearest neighbor; Support Vector Machine
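The abstract compares classifiers by accuracy, precision, recall, and F1 over three classes (positive, negative, neutral). The following is a minimal sketch of how such scores could be computed for a three-class problem. It assumes macro-averaging, which the abstract does not specify, and the function name `macro_scores` and the toy labels are hypothetical illustrations, not data or code from the study.

```python
from collections import Counter

# The three sentiment classes named in the abstract.
LABELS = ("positive", "negative", "neutral")


def macro_scores(y_true, y_pred, labels=LABELS):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro-averaging is an assumption made for this sketch; the abstract
    does not state which averaging scheme was used.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true) if y_true else 0.0,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical toy labels, for illustration only.
y_true = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
y_pred = ["positive", "negative", "negative", "negative", "neutral", "positive"]
scores = macro_scores(y_true, y_pred)
print(scores)
```

Under this scheme each class contributes equally to the averaged scores regardless of how many tweets it contains, which matters when the neutral class is much larger or smaller than the others.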
Indonesian Journal of Electrical Engineering and Informatics (IJEEI)
ISSN 2089-3272
This work is licensed under a Creative Commons Attribution 4.0 International License.