CORPUS ANALYSIS IN THE EXPRESSION OF STATUS VERBS
Keywords:
EXPRESSION, CORPUS ANALYSIS
Abstract
A corpus is a language resource consisting of a large, systematically organized set of texts. In corpus linguistics, corpora are used to perform statistical analyses and to test hypotheses, linguistic phenomena, or theoretical rules within a specific language or a specific section of a language. A corpus can contain textual data in one language or in several. The word "corpus" usually means a textual corpus, but nowadays corpora are no longer limited to texts, so we use the more precise term "text corpus" instead. Corpora are annotated to make language research more efficient. One type of annotation is part-of-speech tagging (POS tagging), in which each word is tagged with its word class and the grammatical subcategories of that class. For example, the word "kutdim" carries the following information: verb, singular, tense, person-number. This information is attached to the word through tags. Another form of annotation is lemmatization, which indicates the base form (lemma) of a word. For example, the words "kutdim", "kutgandim", and "kutganimga" share the same base form, the verb meaning "to wait". The concepts of root and base should not be confused here. For example, the word "bostirma" is formed as "bostir+ma", but "bostir" cannot be taken as its lemma, because "bostirma" is a single word. When lemmatizing the words "bostirmada", "bostirmaga", and "bostirmaning", the correct lemma is therefore "bostirma". In simple terms, a lemma is what remains of a word once its form-forming suffixes are removed.
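The lemmatization examples above can be illustrated with a minimal sketch. It is a toy under stated assumptions, not a description of any real lemmatizer: the lexicon LEMMAS, the suffix list SUFFIXES, and the function name lemmatize are hypothetical, the verb is represented by the stem "kut" purely for illustration, and the suffix list covers only the word forms quoted in the text.

```python
# Toy lexicon of known lemmas (hypothetical; a real system would use a full dictionary).
# "bostirma" is listed as a whole word, so it is never reduced to the root "bostir".
LEMMAS = {"kut", "bostirma"}

# Form-forming suffix sequences taken from the examples in the text
# (illustrative only, far from exhaustive).
SUFFIXES = ["ganimga", "gandim", "dim", "ning", "da", "ga"]


def lemmatize(word: str) -> str:
    """Return the lemma of `word` by stripping one listed suffix sequence.

    A suffix is stripped only if what remains is a known lemma, which is
    what keeps "bostirma" from being cut down to its root.
    """
    if word in LEMMAS:
        return word
    # Try longer suffixes first so that "gandim" wins over "dim".
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            if stem in LEMMAS:
                return stem
    # Unknown form: leave it unchanged rather than guess.
    return word


if __name__ == "__main__":
    for form in ["kutdim", "kutgandim", "kutganimga",
                 "bostirmada", "bostirmaga", "bostirmaning", "bostirma"]:
        print(form, "->", lemmatize(form))
```

Checking the stripped remainder against a lexicon, rather than stripping suffixes blindly, is the design choice that separates a lemma from a root in this sketch.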