A Brief Introduction to NLP with Phoebe Liu
Have you ever interacted with a chatbot? Or requested something from a virtual assistant like Siri, Alexa, or your car’s infotainment system? What about translating something online? Most of us have interacted with these types of artificial intelligence (AI) before without ever stopping to contemplate the ease with which we could communicate our needs and receive an appropriate response. But pause for a moment to reflect on the complexities of human language: isn’t it a wonder that machines can communicate with us at all?
It’s all thanks to natural language processing. But what is natural language processing (NLP)? Natural language processing is the technology used to teach computers how to understand human language and generate appropriate responses in a human-like manner. With NLP, machines learn to read, decipher, and interpret written and spoken human language, as well as create narratives that describe, summarize, or explain input (such as structured data) in a human-like manner. NLP is the driving force behind many AI solutions you interact with regularly and enables comprehension between humans and machines.
Today, NLP is becoming increasingly popular thanks to tremendous improvements in data access and increases in computational power.
Why Natural Language Processing is Difficult
NLP can be challenging. But why is natural language processing difficult? A computer’s native language, at its base level, is simply a collection of millions of ones and zeros, a binary assortment of yes’s and no’s. Computers don’t think contextually like humans – they think logically. When you speak to an AI-powered computer, that machine must somehow understand and interpret what was said, calculate an appropriate response, and convert that response to human (or natural) language—all in a matter of milliseconds. It’s hard to imagine the level of processing power required for this feat, and computers are doing this all the time.
The intricacies of natural language shouldn’t be understated, either. Humans express themselves in an infinite number of ways. There are hundreds of languages and dialects, and each has its own syntax rules and slang, which may vary depending on whether the language is written or spoken. Individuals also write and speak differently from one another. Some may talk with a lisp, for instance, or write with abbreviations. For a computer to understand all of these deviations, it must have encountered them before: it must be trained on similar data. Another challenge is that the training corpus should come from the same domain as the intended application. For example, conversations collected in a medical environment differ from those in the customer support domain. This makes data collection all the more challenging, because it is hard, but necessary, to gather data from the right domain.
These factors all contribute to the difficulty involved in the implementation of NLP. You must have access to large amounts of natural language data so a computer is prepared for a vast range of interactions. The computational power to service those interactions and bridge the gap between ones and zeros and natural language is critical. It’s little wonder that NLP has only recently become a prominent part of machine learning.
NLP breaks down language into shorter segments to understand the relationships between the segments and how they connect to create meaning. The two components of language it analyzes are syntax (the arrangement of words in a sentence such that they make grammatical sense) and semantics (the meaning conveyed by the text). Each category has its own core NLP techniques.
These are a few standard methods machines use to analyze syntax:
- Segmentation: Breaking a sentence down into smaller pieces.
- Lemmatization: Reducing a word to its dictionary base form (its lemma) so that inflected forms of the same word are grouped together.
- Part-of-speech tagging: Identifying the part-of-speech for each word.
- Stemming: Removing affixes (prefixes and suffixes) from words to obtain a root form.
Note that these are just a selection of the many approaches to syntactic analysis.
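To make the four syntactic techniques above more concrete, here is a minimal Python sketch. It is a toy illustration only: the word lists (`LEMMAS`, `POS_LEXICON`, `SUFFIXES`) and the function names are invented for this example, and real systems typically rely on trained models and far larger linguistic resources.

```python
import re

# Toy resources invented for this illustration; real systems use much larger ones.
LEMMAS = {"ran": "run", "running": "run", "mice": "mouse", "were": "be"}
POS_LEXICON = {"the": "DET", "mice": "NOUN", "ran": "VERB", "quickly": "ADV"}
SUFFIXES = ("ing", "ly", "ed", "s")


def segment(text):
    """Segmentation: split text into sentences, then each sentence into lowercase word tokens."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [re.findall(r"[A-Za-z']+", s.lower()) for s in sentences]


def stem(word):
    """Stemming: strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def lemmatize(word):
    """Lemmatization: look up the dictionary base form; unknown words are returned unchanged."""
    return LEMMAS.get(word, word)


def tag(tokens):
    """Part-of-speech tagging: look up each token; unknown tokens are tagged 'UNK'."""
    return [(token, POS_LEXICON.get(token, "UNK")) for token in tokens]


if __name__ == "__main__":
    for tokens in segment("The mice ran quickly. Cats were running!"):
        print(tag(tokens))
        print([stem(t) for t in tokens])
        print([lemmatize(t) for t in tokens])
```

Note the difference between the last two functions: the crude stemmer turns "running" into "runn" and leaves "mice" untouched, while the lookup-based lemmatizer maps them to "run" and "mouse".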
The following are two popular methods machines use to analyze meaning:
- Named entity recognition: Identifying mentions of predefined entity types (such as people and places) and categorizing them.
- Word sense disambiguation: Determining which meaning of a word is intended based on its context.
A machine may use a combination of the above techniques to derive syntax and semantics from a given text.
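The two semantic techniques above can be sketched in a similar way. The following Python example is an assumption-laden toy: the `GAZETTEER` and `SENSES` tables are made up for illustration, entity recognition is reduced to a dictionary lookup, and disambiguation uses a simple word-overlap heuristic rather than a trained model.

```python
import re

# Hypothetical lookup tables, invented for this sketch only.
GAZETTEER = {"phoebe liu": "PERSON", "appen": "ORGANIZATION", "paris": "LOCATION"}

SENSES = {
    "bank": {
        "financial institution": {"money", "deposit", "loan", "account", "cash"},
        "river edge": {"river", "water", "shore", "fishing", "mud"},
    }
}


def recognize_entities(text):
    """Named entity recognition by lookup: find known names and return (mention, label) pairs."""
    found = []
    for name, label in GAZETTEER.items():
        pattern = r"\b" + re.escape(name) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            found.append((match.start(), match.group(0), label))
    # Sort by position so the results follow the order of the text.
    return [(mention, label) for _, mention, label in sorted(found)]


def disambiguate(word, sentence):
    """Word sense disambiguation: choose the sense whose cue words overlap most with the sentence."""
    context = set(re.findall(r"[a-z]+", sentence.lower()))
    senses = SENSES[word]
    # On a tie, max() returns the first sense listed.
    return max(senses, key=lambda sense: len(senses[sense] & context))


if __name__ == "__main__":
    print(recognize_entities("Phoebe Liu works at Appen."))
    print(disambiguate("bank", "She opened an account at the bank to deposit money"))
    print(disambiguate("bank", "We went fishing on the bank of the river"))
```

With these toy tables, the first sentence about a bank resolves to "financial institution" and the second to "river edge", because each shares more cue words with the corresponding sense.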
What Can Natural Language Processing Do?
NLP has many use cases. It helps scale language-related tasks by enabling machines to carry out repetitive tasks that would otherwise be done by humans. A variety of industries use NLP, including:
- Social media analytics: NLP can track sentiments about brands, products, or specific topics and help determine how customers make choices. It can also help flag potentially misleading content, for example by detecting political bias.
- Text-to-speech applications: Text-to-speech apps provide information in more ways for greater inclusivity, as well as create richer interactive experiences for call centers, video games, and language education domains.
- Personal assistants and chatbots: NLP enables AI to communicate with people for routine questions and transactions, freeing humans for more high-level, strategic efforts.
- Search queries: Especially useful in eCommerce, NLP helps identify key search terms to drive more relevant search results.
- Language translation: NLP is used to translate across a full range of languages and dialects.
- Information extraction: Used, for instance, in healthcare for patient records, data extraction via NLP is vital for distilling critical information quickly.
While this list is by no means exhaustive, it illustrates the incredible progress already made in natural language processing. The transformative power of NLP will continue to color our interactions with technology. Undoubtedly, we’ll see more breakthroughs in this space as we further bridge the gap between human and machine communications.
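As one concrete illustration of the use cases listed above, brand sentiment tracking can be sketched in a few lines of Python. This is a deliberately simplified, lexicon-based approach: the word lists, the brand names, and the function names are all hypothetical, and the scoring ignores negation, sarcasm, and context, which production systems must handle.

```python
import re
from collections import defaultdict

# Tiny hand-made sentiment lexicons, invented for illustration.
POSITIVE = {"love", "great", "excellent", "fast", "happy"}
NEGATIVE = {"hate", "slow", "broken", "terrible", "disappointed"}


def sentiment_score(post):
    """Return (# positive words) minus (# negative words) in the post."""
    tokens = re.findall(r"[a-z']+", post.lower())
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


def brand_sentiment(posts, brands):
    """Sum the sentiment scores of all posts that mention each brand."""
    totals = defaultdict(int)
    for post in posts:
        lowered = post.lower()
        score = sentiment_score(post)
        for brand in brands:
            if brand.lower() in lowered:
                totals[brand] += score
    return dict(totals)


if __name__ == "__main__":
    posts = [
        "I love my Acme phone, great battery",
        "Acme support was slow and terrible",
        "Globex delivery was fast",
    ]
    print(brand_sentiment(posts, ["Acme", "Globex"]))
```

In this example the two posts about the hypothetical brand "Acme" cancel each other out (a total of 0), while "Globex" ends up with a score of 1.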
Insight from an Appen NLP Expert – Phoebe Liu
At Appen, we rely on our team of experts to help you build NLP models that enable a quality customer experience. Phoebe Liu, one of our senior data scientists, works to ensure that Appen customers’ NLP models are executed successfully. She has spoken at the O’Reilly and KDD conferences, been featured in BBC and Al Jazeera documentary series for her work in conversational robotics, and won the Best Picture award at the 2018 Robot Film Festival. Phoebe’s top three insights on natural language processing are:
1. The most successful projects start with understanding the business problem and requirements. This helps define how you should collect data, who should annotate your training corpora, and whether domain experts or linguists are needed in the data collection process. Adopt a clear, unambiguous definition of the problem and the role of NLP in that solution.
2. Ensure user satisfaction through user testing. For automatic speech recognition, test with speakers who have different accents and different ways of saying the same thing. For natural language understanding (NLU) in chatbots and voice AI, test with users who interact naturally, as if they were chatting with another human. The more you conduct user testing in real-world situations, the smoother the interaction will be between your users and the NLP system.
3. ML models are not magic. Design “fallback” methods for when NLP does not yield 100% accurate results. NLP is still an evolving field that requires domain expertise and good training corpora to implement properly. Be sure to have a backup plan and manage the NLP output (think human-in-the-loop) for those critical times when NLP falls short.
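The fallback idea in the third insight can be sketched as a confidence threshold that routes uncertain predictions to a person. The Python example below is a minimal sketch under stated assumptions: the keyword-based `classify` function stands in for a real NLU model, and the intent names, the `route` function, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import re
from dataclasses import dataclass

# Hypothetical intents and cue words standing in for a trained NLU model.
INTENT_KEYWORDS = {
    "check_balance": {"balance", "account", "much"},
    "reset_password": {"password", "reset", "forgot", "login"},
}


@dataclass
class Prediction:
    intent: str
    confidence: float


def classify(utterance):
    """Toy classifier: confidence is the fraction of an intent's cue words found in the utterance."""
    tokens = set(re.findall(r"[a-z']+", utterance.lower()))
    best = Prediction("unknown", 0.0)
    for intent, keywords in INTENT_KEYWORDS.items():
        score = len(tokens & keywords) / len(keywords)
        if score > best.confidence:
            best = Prediction(intent, score)
    return best


def route(utterance, threshold=0.5):
    """Answer automatically only when confidence meets the threshold; otherwise hand off to a human."""
    prediction = classify(utterance)
    if prediction.confidence >= threshold:
        return ("automated", prediction.intent)
    return ("human_review", prediction.intent)


if __name__ == "__main__":
    print(route("I forgot my password"))
    print(route("What is my balance?"))
    print(route("Can you help me with something"))
```

Here only the first utterance is handled automatically. The other two fall below the threshold and are sent for human review, which is the human-in-the-loop backup plan described above. In practice, the threshold would be tuned against real user-testing data.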
What Appen Can Do For You
At Appen, our natural language processing expertise spans more than 20 years, during which we have acquired advanced resources and a deep understanding of the best formula for successful NLP projects. Thanks to the support of our team of experts like Phoebe, the Appen Data Annotation Platform, and our crowd, we give you the high-quality training data you need to deploy world-class models at scale. Whatever your NLP needs may be, we are standing by to assist you in deploying and maintaining your AI and ML projects.