Advanced Linguistic Analysis


Humans don’t have to think about how to parse and understand their native language. They know whether a word is a real word, they know whether a sentence is meaningful, and they can use context to resolve ambiguity. Computers, however, need to be told explicitly what a verb and a noun are, how sentences are formed, and what words and sentences mean. Supplying this knowledge through linguistic analysis is what enables the automatic processing and parsing of natural text.

Linguistic analysis is the description of language with regard to its morphological, syntactic, and semantic structures. Morphology describes the internal structure of words and how they can be modified, syntax describes how words combine to form grammatical sentences, and semantics is the study of the meaning of words and phrases and how these combine to form the meanings of sentences. For language to sound natural, a detailed understanding of its syntactic and semantic representation is necessary. Deep linguistic processing provides a knowledge-rich analysis of language through manually developed grammars and language resources.

Building the fundamental natural-language processing (NLP) components that every downstream analysis depends on takes a great deal of specialized linguistic resources and expertise. At Appen, we provide those resources in 140 languages, including tagged data (lexical and part-of-speech), morphological rules, and treebanks for developing these underlying components in any language.
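To make the idea of a treebank and part-of-speech-tagged data more concrete, here is a minimal Python sketch. It assumes a Penn-Treebank-style bracketed notation, which is one common convention and not necessarily the format of any particular deliverable. The example sentence, the tag labels, and the helper names parse_bracketed and tagged_words are all hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: reads one bracketed parse tree and recovers the
# part-of-speech-tagged words from it. The notation and all names here are
# assumptions for the example, not a description of any specific product.


def parse_bracketed(text):
    """Parse a bracketed tree string into nested (label, children) tuples."""
    # Pad the parentheses with spaces so a plain split() yields clean tokens.
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        label = tokens[pos]  # syntactic category or part-of-speech tag
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())  # nested constituent
            else:
                children.append(tokens[pos])  # a word of the sentence
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    tree = read_node()
    if pos != len(tokens):
        raise ValueError("unexpected tokens after the end of the tree")
    return tree


def tagged_words(tree):
    """Return (word, tag) pairs from the leaves, in sentence order."""
    label, children = tree
    # A node whose only child is a string is a tagged word, e.g. (NN dog).
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], label)]
    pairs = []
    for child in children:
        if isinstance(child, tuple):
            pairs.extend(tagged_words(child))
    return pairs


# Hypothetical treebank entry for the sentence "The dog chased the cat".
EXAMPLE = "(S (NP (DT The) (NN dog)) (VP (VBD chased) (NP (DT the) (NN cat))))"

tree = parse_bracketed(EXAMPLE)
pairs = tagged_words(tree)
for word, tag in pairs:
    print(word, tag)
```

In this sketch the nested structure carries the syntactic analysis (which words group into noun and verb phrases), while the leaves carry the part-of-speech tags. A tagger or parser would typically be trained and evaluated on many such annotated sentences.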