What is Natural Language Processing?
What’s this about: Natural language processing, or NLP, is the study and application of computational techniques that enable computers to process, analyze, and interpret human language. NLP combines linguistics and computer science, and it is a component of artificial intelligence (AI).
NLP has been around since the 1950s, and many of today's most familiar applications are digital assistants like Siri and Alexa, chatbots, machine translation, and search engines. It is also used in fields like medical research and business intelligence.
The Basics of NLP
One of the greatest advancements of modern computers is their ability to interpret human language. For this to happen, language must be converted into a form that computers can represent and manipulate, and machines must identify patterns within the language and extract data from, for example, the thousands of words within a document. The central challenge is that human language is incredibly complex: words and phrases carry different meanings for different groups of people, context is everything, and we humans have our own ways of injecting sarcasm and other types of sentiment.
That said, NLP offers a set of established techniques that, used together, help extract meaning from text. That meaning can then be used to build machine learning models that interpret language accurately.
NLP algorithms often convert unstructured data, or data that isn’t organized, into structured data, or data that is clearly defined and searchable. When this is done successfully, an application like translation software correctly accomplishes its task. But if it fails, the machine loses the meaning of certain words and phrases.
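As a rough illustration of the idea, the sketch below pulls a date and an email address out of a free-form note and stores them in a small structured record. The note, field names, and patterns are made up for the example; real NLP systems use far more sophisticated methods than simple pattern matching.

```python
import re

# Illustrative only: turn a snippet of unstructured text into a small
# structured, searchable record using basic pattern matching.
raw_note = "Meeting with ACME Corp on 2024-05-17. Contact: jane.doe@example.com"

record = {
    "date": re.search(r"\d{4}-\d{2}-\d{2}", raw_note).group(),   # ISO-style date
    "email": re.search(r"[\w.]+@[\w.]+", raw_note).group(),      # email address
}

print(record)  # {'date': '2024-05-17', 'email': 'jane.doe@example.com'}
```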
The Different NLP Techniques
NLP techniques generally fall into two categories:
Syntax: Techniques based on the ordering and structure of words.
Semantics: Techniques based on the meaning of words.
Let’s take a deeper look at each category, starting with syntax techniques:
Lemmatization: Reducing words to their base, or dictionary, forms for processing. For example, tenses and plurals are normalized, so “apples” becomes “apple” and “feet” becomes “foot.” This makes it easier for an algorithm to treat different forms of the same word as one.
Part-of-Speech Tagging: Words are marked with their part of speech, such as noun, verb, or adjective.
Morphological Segmentation: Words are divided into their smallest meaningful parts, known as morphemes. For example, the word “untestably” could be broken down into the parts “un,” “test,” “able,” and “ly.” This technique is often used in machine translation and speech recognition.
Parsing: The grammatical analysis of a sentence. For example, the sentence “The dog barked” can be parsed to identify “dog” as the noun and “barked” as the verb.
Sentence Breaking: Sentence boundaries are placed in large texts, establishing where a sentence begins and ends.
Word Segmentation: Running text is broken down into its component words. For example, an algorithm could recognize that words in an English document are separated by white space.
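One way to see several of these syntax techniques in action is with the open-source spaCy library (other toolkits, such as NLTK, work just as well; spaCy is simply one possible choice here). The sketch below tokenizes a short text (word segmentation), prints each token's lemma, part-of-speech tag, and dependency relation (a simple form of parsing), and then prints the detected sentence boundaries. It assumes spaCy and its small English model, en_core_web_sm, have been installed.

```python
import spacy

# Assumes: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

doc = nlp("The dogs barked at the mail carrier. Their feet were muddy.")

# Word segmentation, lemmatization, POS tagging, and dependency parsing
# all happen in one pass over the text.
for token in doc:
    print(f"{token.text:12} lemma={token.lemma_:10} pos={token.pos_:6} dep={token.dep_}")

# Sentence breaking: spaCy marks sentence boundaries on the same doc.
for sent in doc.sents:
    print("SENTENCE:", sent.text)
```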
Here is a breakdown of the different semantic techniques:
Word Sense Disambiguation: This method derives the meaning of a word based on the context. For example, in the sentence “The pen holds chickens,” the word pen could be confused given its various meanings. But the algorithm would be able to analyze the context to recognize the word pen is referring to a fenced-in area, not a writing utensil.
Natural Language Generation: Structured data is turned into natural language text. In other words, the system determines what meaning to convey and generates new text, for example transforming a table of statistics into a written summary.
Named Entity Recognition: This technique takes words and sorts them into predefined categories. For example, an algorithm could analyze an article and group words into categories such as cities, companies, dates, and individuals.
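The semantic side can be sketched with the same library. The example below runs named entity recognition over a short, made-up sentence; the labels shown (ORG, GPE, DATE, PERSON) are the categories the small English model happens to use, and the exact output depends on the model version installed.

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # same small English model as above

doc = nlp("Apple opened a new office in Berlin on March 3, 2023, "
          "and Tim Cook spoke at the event.")

# Named entity recognition: each detected entity carries a predefined label
# such as ORG (company), GPE (city/country), DATE, or PERSON.
for ent in doc.ents:
    print(f"{ent.text:15} -> {ent.label_}")
```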
Human or Machine?
Natural language processing has come a long way since Alan Turing published his paper “Computing Machinery and Intelligence” in 1950. The paper led to the Turing Test, which defined an intelligent computer as a machine that could converse with a human being without that human being realizing it was a machine.
NLP has enabled us to get close to achieving such a goal, as it allows us to interact with machines in an incredibly natural way. NLP’s great importance also stems from its ability to enable non-programmers to obtain crucial data from computing systems.
As NLP continues to evolve rapidly, it remains one of the most consequential areas of artificial intelligence. Non-experts can pose questions in plain natural language and receive answers from machines. NLP is already reshaping sectors such as healthcare, manufacturing, advertising, and the automotive industry, and its reach will only grow. With these advances, we are getting closer to a time when it will be extremely difficult, if not impossible, to tell whether we are interacting with a fellow human being or a machine in many of our daily activities.