Automated Learning of Hungarian Morphology for Inflection Generation and Morphological Analysis

Gabor Szabo, Laszlo Kovacs

Abstract


The automated learning of morphological features of highly agglutinative languages is an important research area for both machine learning and computational linguistics. In this paper we present a novel morphology model that can solve the inflection generation and morphological analysis problems, managing all the affix types of the target language. The proposed model can be taught using (word, lemma, morphosyntactic tags) triples. From this training data, it can deduce word pairs for each affix type of the target language, and learn the transformation rules of these affix types using our previously published, lower-level morphology model called ASTRA. Since ASTRA can only handle a single affix type, a separate model instance is built for every affix type of the target language. Besides learning the transformation rules of all the necessary affix types, the proposed model also calculates the conditional probabilities of the affix type chains using relative frequencies, and stores the valid lemmas and their parts of speech. With these pieces of information, it can generate the inflected form of input lemmas based on a set of affix types, and analyze input inflected word forms. For evaluation, we use Hungarian data sets and compare the accuracy of the proposed model with that of state of the art morphology models published by SIGMORPHON, including the Helsinki (2016), UF and UTNII (2017), Hamburg, IITBHU and MSU (2018) models. The test results show that using a training data set consisting of up to 100 thousand random training items, our proposed model outperforms all the other examined models, reaching an accuracy of 98% in case of random input words that were not part of the training data. Using the high-resource data sets for the Hungarian language published by SIGMORPHON, the proposed model achieves an accuracy of about 95-98%.

Keywords


morphology; agglutination; inflection generation; morphological analysis; automated learning

Full Text: PDF

Refbacks

  • There are currently no refbacks.


 

Indonesian Journal of Electrical Engineering and Informatics (IJEEI)
ISSN 2089-3272

Creative Commons Licence

This work is licensed under a Creative Commons Attribution 4.0 International License.

web analytics
View IJEEI Stats

Error. Page cannot be displayed. Please contact your service provider for more details. (11)