Have you ever wondered how Google Assistant sounds so real, or why you see advertisements related to a single word you typed in a message to a friend? Much of this is the work of Natural Language Processing (NLP).
NLP is a subfield of Artificial Intelligence (AI) that deals with interactions between computers and human language. By combining linguistics and computer science, it uses computational techniques to build systems that can read, decipher, and extract meaning from both spoken and written language.
NLP is at the heart of many AI-based technologies, such as virtual assistants (like Siri and Cortana), automated grammar and spelling checkers (like Grammarly or BonPatron), and machine translation tools (like Google Translate or DeepL).
5 NLP Techniques
According to Mordor Intelligence, the global NLP market was valued at $10.72 billion in 2020 and is expected to reach $48.46 billion by 2026. NLP is the driving force behind machine intelligence in many real-world applications, such as spam detection, machine translation, chatbots, sentiment analysis, text generation, and text extraction.
Five basic NLP techniques are:
- Tokenization: Often considered the most important step in text preprocessing, it involves segmenting the input data into tokens, which can be sentences, words, characters, etc.
- Stemming: As the name suggests, this reduces a word to its stem by stripping its affixes (suffixes and prefixes). The resulting stem is not always a valid word.
Example: "Unbelievable" has the prefix "un" and the suffix "able". Stripping both leaves the stem "Believ".
- Lemmatization: This step relies on the morphological analysis of words. It reduces a word to its dictionary form (its lemma) while retaining its meaning.
Example: For the word "Believing", lemmatization returns the dictionary form "Believe", whereas a stemmer would truncate it to "Believ".
- Part Of Speech (POS) Tagging: This process assigns each word a grammatical category (noun, verb, adjective, etc.) based on its definition and context, revealing how the word functions in a given sentence.
Example:
1/ He can fly a plane.
2/ The fly is an insect.
POS tagging identifies "fly" as a verb in the first sentence and as a noun in the second.
- Named Entity Recognition (NER): This is the process of identifying named entities in a text and classifying them into predefined categories, such as people, organizations, locations, quantities, movies, etc.
Example: In the sentence "KAISENS DATA is a deep-tech company based in La Défense", "KAISENS DATA" is identified as an organization and "La Défense" as a location.