Natural Language Processing – Connecting the World Through Language with AI

As very young children, before we can even walk or talk, we're listening. We hear the sounds and vocalizations made by other people, attach those combinations of sounds to meanings such as 'mother' and 'door', and learn to read the facial expressions of those around us to deepen our understanding of strings of words. Then we go to school and start interacting with other forms of language representation — cartoons, TV, tablets and mobile phones, as well as books — where we refine our understanding of language.

What is a natural process for the majority of people is incredibly difficult for computers. Languages are complex data types, with flexible formal rules and many exceptions, and they are exceptionally hard to understand when context and intent are missing. Imagine a child comes into your room and says "Door!" Without context (Why are they saying this? Is the door open?) and intent (Do they want me to close it?), it's almost impossible to know how to respond appropriately.

It’s no wonder that it’s taken decades of slow, tedious work to train AI to “understand” language. As machine learning capabilities grow, so does our ability to improve Natural Language Processing, or NLP.

As AI and NLP technology gets better and better, it’s being applied in different ways to make the world a better place.


What is NLP?

NLP, or Natural Language Processing, is the manipulation of language by software. During processing, software breaks language down into parts so that it can be understood and interpreted. Depending on the software, this can be done with speech or text. Combined with AI and machine learning, NLP data sets have grown exponentially, allowing the technology to do more and to do it better.
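To make "breaking language down into parts" concrete, here is a minimal sketch of tokenization, the usual first step of an NLP pipeline. This toy example uses only Python's standard library; production systems use far more sophisticated tokenizers.

```python
import re

def tokenize(text):
    """Split raw text into lowercase word and punctuation tokens."""
    return re.findall(r"[a-z']+|[.,!?]", text.lower())

tokens = tokenize("The door is open. Close it, please!")
print(tokens)
# ['the', 'door', 'is', 'open', '.', 'close', 'it', ',', 'please', '!']
```

Once text has been split into tokens like these, later stages can tag, parse, and interpret them.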

The first iterations of NLP began more than 50 years ago and evolved out of the field of linguistics. Today, the most common example of NLP technology is right in your purse or pocket. Smart assistants in your home or on your smartphone use NLP and AI to provide a voice-driven interface for intelligent search.

The next time you call out to Alexa, Siri, Google, Bixby, or any other virtual assistant — remember, you’re using a technology that was decades in the making and it wouldn’t be possible without advanced AI.

NLP and AI Projects Making the World a Better Place

In the beginning, NLP, like linguistics, was a way of developing a deeper understanding of language. As the field grows and AI technology improves, NLP can be scaled for use by a number of different industries while making the world a better and more efficient place.

As AI data-handling improves and access to huge amounts of computational power becomes commonplace, NLP and AI applications will continue to expand in scope. And, when done well with a partner that understands data storage, transformation, and labeling, the technology can benefit so many people.

Here are a few notable examples of how companies are combining an understanding of data, AI, and NLP to make the world a better place.

AI and NLP for Healthcare

With its mountains of undigitized data and handwritten notes, healthcare is a booming area for NLP use cases. NLP is not only being used to improve healthcare; it’s also working to bring down costs. With AI and automation, NLP can take on rote, repetitive jobs while humans get back to caring for one another.

Most health data is in text form, in doctors’ notes, clinical trial reports, and patient medical records. NLP is currently being used to speed up the process of digitizing paper medical records, which will make sharing those records with patients and other doctors faster and more comprehensive.

Once records have been digitized, tools such as Amazon Comprehend Medical can be used to interpret those records and look for patterns to improve diagnosis. NLP enables the recognition and prediction of diseases through digitized health records. This can lead to earlier and more accurate diagnoses.

Where Amazon Comprehend Medical truly shines is in its ability to extract and organize data. Simple rule-based data organization falls short because it doesn’t understand context, leaving the data insufficiently structured and hard to use. With Amazon Comprehend Medical, the extracted data can be compared to medical ontologies (structured representations of medical knowledge) to build relationships from the extracted medical information, leading to better, faster diagnosis of disease for patients.
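The idea of linking extracted terms to an ontology can be sketched with a toy dictionary lookup. This is an illustration of the concept only, not Amazon Comprehend Medical's actual API; the tiny term-to-code mapping below stands in for real ontologies such as ICD-10-CM or SNOMED CT, which are vastly larger.

```python
# Toy ontology: maps surface terms to concept codes. Note that a synonym
# ("high blood pressure") maps to the same concept as its clinical term.
ONTOLOGY = {
    "hypertension": "I10",
    "high blood pressure": "I10",
    "type 2 diabetes": "E11",
}

def link_entities(note):
    """Find ontology terms in a clinical note and return concept codes."""
    note = note.lower()
    return sorted({code for term, code in ONTOLOGY.items() if term in note})

codes = link_entities("Patient has high blood pressure and type 2 diabetes.")
print(codes)  # ['E11', 'I10']
```

Mapping free text onto shared concept codes like this is what lets records from different doctors, written in different words, be compared and analyzed together.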

Another example of NLP and AI being used to improve healthcare is at Winterlight Labs, where they’ve created a tool that can monitor cognitive impairment through speech. Their tool is being used to quickly and objectively analyze speech in order to detect dementia and mental illness.

NLP is also being used to treat anxiety and other mental health disorders through the use of Woebot, a chatbot therapist developed by Stanford University. Where Woebot stands out from other chatbots is its ability to form a therapeutic bond with humans, making cognitive and behavioural change possible.

With healthcare costs growing and an increasing demand for mental health care, NLP and AI tools are in high demand for their efficiency, efficacy, and ability to reduce costs.

Improve Information Sharing and Slow the Distribution of Fake News

One of the major struggles of the last few years, especially during the pandemic, has been the distribution of false and inflammatory information. Concerns over bias and truth have led to deep divisions. To help identify fake news, the NLP Group at MIT developed software that can examine a news source and determine whether it is accurate and trustworthy or politically biased. Over time, the group has worked to improve the software and remove bias that had been programmed into the data analysis.

While slowing the spread of fake news is intended to improve the quality of information that’s available, data scientists have also found that a lack of information can be harmful. To improve data sharing, we worked together with Translators without Borders, Carnegie Mellon University, Johns Hopkins University, big tech companies, and language services companies as part of TICO-19, a data-sharing and translation organization seeking to combat the lack of information on COVID-19 in low-resource languages. The organization has used NLP and AI tools to translate and share information about COVID-19 from high-resource languages to low-resource languages.

AI-Powered Predictive Text for Mobile Devices

When it comes to improving people’s everyday lives, NLP tools are already hard at work. You can see NLP and AI working together in smartphones, email clients, and smart assistants.

Predictive text, autocorrect, and autocomplete all use NLP to improve search efficiency and support written work. These small improvements make everyday work more efficient. A well-built autocomplete should learn from every interaction, so it gets better over time.
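One of the simplest ways predictive text can "learn from every interaction" is by counting which word tends to follow which. The sketch below is a toy bigram model, not how any particular phone keyboard is actually implemented, but it captures the core idea: update counts on every sentence seen, then suggest the most frequent follower.

```python
from collections import defaultdict, Counter

class BigramPredictor:
    """Toy predictive-text model: suggests the word most often seen
    after the previous word, updating its counts with every sentence."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def learn(self, sentence):
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            self.counts[prev][nxt] += 1

    def predict(self, prev_word):
        options = self.counts[prev_word.lower()]
        return options.most_common(1)[0][0] if options else None

model = BigramPredictor()
model.learn("see you at the office")
model.learn("meet me at the office")
model.learn("I am at home")
print(model.predict("at"))  # 'the' (seen twice, vs 'home' once)
```

Real systems condition on much longer context and use neural language models, but the feedback loop is the same: every keystroke refines the statistics behind the next suggestion.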

On the back end, search engines use NLP to return the right results to searchers. Through an understanding of intent and extrapolation, searches are no longer literal and rule-based. For example, typing in a flight number no longer simply returns results for which airline operates that flight; you’ll get the flight’s current status and arrival or departure information. And if your search engine provider is also your email provider and holds your ticket confirmation, you’ll see your own upcoming flight details.

Improve Customer Service Through Sentiment Analysis

If you’ve recently visited a website for a large company and have been greeted by a chatbot, you’ve interacted with NLP and AI customer service technology. These chatbots use NLP and algorithms to understand customer questions and respond appropriately in real time.

The latest advances in NLP now enable sentiment analysis. Earlier iterations of NLP technology could only understand words, not the feeling behind them. Sentiment analysis allows technology to understand the emotions behind our words. Using it, organizations can smooth out interactions with customers and prevent bigger issues from developing on social media, for example.
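A very early form of sentiment analysis simply counted positive and negative words. The sketch below shows that lexicon-based idea; the word lists are illustrative only, and modern systems use machine-learned models that handle negation, sarcasm, and context far better than this toy can.

```python
# Illustrative word lists, not a real sentiment lexicon.
POSITIVE = {"great", "helpful", "love", "thanks", "fast"}
NEGATIVE = {"broken", "slow", "terrible", "refund", "angry"}

def sentiment(message):
    """Classify a message by counting positive vs. negative words."""
    words = message.lower().replace(",", " ").replace(".", " ").split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("Thanks, the support was great and fast"))          # positive
print(sentiment("My order arrived broken and support is slow"))     # negative
```

A customer-service system scoring messages this way could, for instance, route "negative" conversations to a human agent before frustration escalates.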

NLP software is being used by companies on social media and on customer service calls to better understand customer sentiment and train their software to do the same. Any time you hear “this call may be recorded for training purposes,” it may mean your call is being filtered through NLP software to improve customer service in the future.

NLP and sentiment analysis are also used in the new Google Assistant technology that can make phone calls and appointments for users.

AI-Powered Translation and Sign-to-Text

Ten years ago, if you needed help with your foreign language homework, you could ask Google Translate, but it was risky. Even just a few years ago, online translators weren’t robust enough to handle colloquialisms or complex grammar. Instead, they’d give you a literal translation that often left sentences hard to understand.

With advances in NLP, online translators can now translate languages more accurately and with proper grammar. Many online tools now also recognize which language is being used and automatically translate from it. You can see this in real time if you visit a website in another language via Google.

Other translation tools have used NLP to advance sign language translation. SignAll helps those who are deaf or hard of hearing to communicate with those who don’t know sign language. The technology uses a camera to capture and interpret sign language and translate it into written words. It will also have uses in VR, where understanding specific, minute hand movements has been a significant challenge.

NLP is not only being used to make translation easier between people who speak different languages, it’s also being used to maintain and revitalize languages. Microsoft recently added text translation of Inuktitut, an Indigenous language in Canada, to Microsoft Translator, a project that Appen contributed training data to. Inuktitut is currently spoken by approximately 40,000 Inuit people across Canada. By enabling the language to be used more widely in everyday computing environments at work and school, this development supports the ongoing vitality of the language.

NLP and AI Data Analysis

One of the main constraints for NLP technology over the years is simply the fact that language is incredibly complicated. Words with the same spelling have alternate meanings, words with different pronunciations have the same spellings, and words can be used creatively to have many different emotional meanings through sarcasm. That’s a lot to take in!

As NLP is combined with improved data analysis and machine learning techniques, the technology is getting better at understanding what’s being communicated. Through data labeling and analysis, NLP technology is improving and making the world a better place.

Without high-quality annotated training data, however, NLP can’t continue to improve. At Appen, we recommend smart labeling techniques such as pre-labeling, speed labeling, and smart validators to make producing NLP data more efficient and the data itself more useful.
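Pre-labeling means a model proposes labels that human annotators then confirm or correct, rather than labeling everything from scratch. A common pattern is to auto-accept high-confidence predictions for review and route low-confidence items to annotators. The sketch below illustrates that routing; `model_predict` is a hypothetical stand-in, not a real Appen or platform API.

```python
def model_predict(text):
    """Hypothetical stand-in for a trained model that proposes a label
    and a confidence score for a piece of text."""
    label = "question" if text.strip().endswith("?") else "statement"
    confidence = 0.9 if text.strip().endswith((".", "?", "!")) else 0.6
    return label, confidence

def pre_label(texts, confidence_threshold=0.8):
    """Route confident model predictions to quick human review; send
    low-confidence items to annotators for labeling from scratch."""
    auto, manual = [], []
    for text in texts:
        label, confidence = model_predict(text)
        (auto if confidence >= confidence_threshold else manual).append((text, label))
    return auto, manual

auto, manual = pre_label(["Is the door open?", "Close the door.", "door"])
print(auto)    # [('Is the door open?', 'question'), ('Close the door.', 'statement')]
print(manual)  # [('door', 'statement')]
```

The efficiency gain comes from annotators spending most of their time verifying rather than typing, while the hardest, lowest-confidence items still get full human attention.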

Using high quality labeled data, NLP and AI companies are working together to make the world a more efficient place with predictive text and smart assistants. They’re also making the world an easier place to live in and navigate through with improved customer service, better translation services, and better health care.

Expert Insight from Dr. Judith Bishop – Senior Director, Solutions & Advanced Research

For NLP technology to be successful in the long term, whether in business, finance, medicine or any other domain, it has to work equally well for every user, and not perpetuate patterns of discrimination. Clients ask us all the time, ‘How can we ensure our training data reflects the diversity of our customers’ interactions?’

In the context of NLP, that diversity is present in all the different ways we speak and write. Language diversity is not the same as traditional demographics, though; you can cover age groups, regions and genders in your training data, and yet not adequately account for the spectrum of ways that people really communicate. Understanding all the ways that language varies in the real world ensures we’re not wasting time and money collecting the wrong data — or worse, creating systems that work poorly for some customer segments.

To answer our clients’ question, there are three things we can do.

  1. Have linguists co-design and guide data collection and annotation efforts.
    Linguists understand real-world language variation and language behaviour and can ensure that NLP training data is truly fit for purpose. If not guided by experts, data collection guidelines can inadvertently impact variation in the data. Something as simple as requiring punctuation in a text data collection can bias the data collected toward more formal writing — which may not be representative of the informal text users will actually enter in the resulting NLP application, such as a chatbot.
  2. Have data annotated by people whose diversity matches the diversity of the data.
    There’s a growing body of evidence that data annotations — such as labels on images, but also speech transcriptions and translations — are as much a source of bias as the data itself. We all filter the world, including language, through the lens of our experience, attitudes and perceptions. If I’m familiar with a regional or dialectal word, I’m more likely to transcribe, label or translate it correctly.
  3. Work with diverse staff.
    Their sensitivity to difference can guide our understanding of what data bias looks like and how to avoid it when collecting and annotating training data. Gender bias in NLP has gained a lot of attention recently, with research showing that a negative bias toward female gender terms persists in training data sets and resulting applications. Working with gender-expansive colleagues has led me to ask how gender-expansive identities — including the use of the singular ‘they’ pronoun — are handled by NLP applications, which in many cases have been trained using a binary or, at most, ternary set of gender labels (male/female/other). By working with diverse staff, we can anticipate such questions and concerns and proactively build more inclusive NLP.

What We Can Do For You

At Appen, our natural language processing expertise spans more than 20 years, during which time we have built advanced resources and deep expertise in what makes NLP projects successful. With the support of experts like Dr. Judith Bishop, the Appen Data Annotation Platform, and our crowd, we deliver the high-quality training data you need to deploy world-class models at scale. Whatever your NLP needs may be, we are standing by to help you deploy and maintain your AI and ML projects.

Learn more about how our expertise can help you with your next NLP project, or contact us today to speak with someone directly.
