Universal Journal of Educational Research Vol. 8(10), pp. 4996 - 5004
DOI: 10.13189/ujer.2020.081073
Building a Corpus for Vietnamese Text Readability Assessment in The Literature Domain
An-Vinh Luong 1,2,*, Diep Nguyen 3,2, Dien Dinh 1,2
1 Computational Linguistics Center, University of Science, Ho Chi Minh City, Vietnam
2 Vietnam National University, Ho Chi Minh City, Vietnam
3 Department of Linguistics, University of Social Sciences & Humanities, Ho Chi Minh City, Vietnam
ABSTRACT
Text readability is a measure of how easy or difficult a text is to read. It plays a crucial role in drafting and comprehending texts and affects the choice of appropriate reading material. Studies on text readability began in the late nineteenth century and have led to many practical applications. However, these studies have mainly been carried out for English and other widely used languages. For Vietnamese, text readability remains relatively unexplored and has received attention only in recent years, as part of efforts to improve the curriculum and teaching methods. Recent studies on the readability of Vietnamese texts are still limited, largely because of the lack of text resources, namely corpora graded by difficulty level. Therefore, in this study, we focus on building a corpus for assessing the readability of Vietnamese texts in the literature domain through a process of collecting, processing and evaluating documents. The result is a corpus of 1,825 Vietnamese texts divided into four levels of difficulty (Very easy, Easy, Medium and Difficult). Experiments with existing Vietnamese readability assessment methods show that the corpus is reliable and usable for further research on text readability.
KEYWORDS
Text Readability, Vietnamese Language, Vietnamese Text Readability, Text Readability Corpus
Cite This Paper in IEEE or APA Citation Styles
(a). IEEE Format:
[1] An-Vinh Luong, Diep Nguyen, Dien Dinh, "Building a Corpus for Vietnamese Text Readability Assessment in The Literature Domain," Universal Journal of Educational Research, Vol. 8, No. 10, pp. 4996 - 5004, 2020. DOI: 10.13189/ujer.2020.081073.
(b). APA Format:
An-Vinh Luong, Diep Nguyen, Dien Dinh (2020). Building a Corpus for Vietnamese Text Readability Assessment in The Literature Domain. Universal Journal of Educational Research, 8(10), 4996 - 5004. DOI: 10.13189/ujer.2020.081073.