NLP in a Nutshell

Natural Language Processing (NLP) is a subset of Artificial Intelligence (AI). It is the driving force behind machine intelligence in many real-world applications, enabling human-like systems adept at reading, deciphering, and extracting meaning from both spoken and written words.

Have you ever wondered how Google Assistant sounds so real, or why you see advertisements based on a single word that you typed in a message to a friend? This is all the work of Natural Language Processing (NLP).

NLP is a subset of Artificial Intelligence (AI) that delves into computer-to-human interactions. By combining the fields of Linguistics and Computer Science, and using different computational techniques, NLP manages to produce a human-like system adept at reading, deciphering, and extracting meaning from both spoken and written words. 

NLP is at the heart of many AI-based technologies, such as virtual assistants (like Siri and Cortana), automated grammar and spelling correctors (like Grammarly or BonPatron), and machine translation tools (like Google Translate or DeepL).

 

5 NLP Techniques 

 

According to Mordor Intelligence, the global NLP market was valued at $10.72 billion in 2020 and is expected to reach $48.46 billion by 2026. NLP is the driving force behind machine intelligence in many real-world applications, such as spam detection, machine translation, chatbots, sentiment analysis, text generation, and text extraction.

5 basic NLP techniques include: 

 

  1. Tokenization: A key first step in text preprocessing, it involves segmenting input data into tokens. Tokens can be sentences, words, characters, etc.
  2. Stemming: As the name indicates, this reduces a word to its stem by stripping its affixes (suffixes and, in some stemmers, prefixes).
    Example: "Unbelievable" has the prefix "un" and the suffix "able". Stripping both leaves the stem "Believ", which is not necessarily a valid word.
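  3. Lemmatization: This step relies on the morphological analysis of words. It reduces a word to its dictionary form (its lemma) while retaining its meaning, so the result is always a valid word.
    Example: Lemmatization reduces "believing" or "believed" to the base form "Believe". A word such as "Unbelievable" is already a dictionary form and is left unchanged.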
  4. Part of Speech (POS) Tagging: This process assigns each word a grammatical category (noun, verb, adjective, etc.) based on its meaning and context. This reveals how a word functions within a given sentence.
    Example:
    1/ He can fly a plane.
    2/ The fly is an insect.
    POS tagging identifies "fly" as a verb in the first sentence and as a noun in the second.
  5. Named Entity Recognition (NER): This is the process of identifying named entities and classifying them into pre-defined categories, such as people, organizations, locations, quantities, movies, etc.
    Example: In the sentence "KAISENS DATA is a deep-tech company based in La Défense.", "KAISENS DATA" is identified as an organization and "La Défense" as a location.
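
To make these techniques more concrete, here is a minimal Python sketch using only the standard library. It is a toy illustration, not how production tools work: the suffix list, the lemma dictionary, and the entity dictionary (gazetteer) are hypothetical, hand-written assumptions, and the stemmer strips suffixes only, so a prefix such as "un" is left in place. Real projects would normally rely on a dedicated NLP library with trained models.

```python
import re

# Toy suffix list (assumption, for illustration only); real stemmers use
# ordered rule sets with extra conditions.
SUFFIXES = ("able", "ing", "ed", "s")

# Tiny hand-written lemma dictionary (hypothetical); real lemmatizers rely
# on a full vocabulary and morphological analysis.
LEMMAS = {"believing": "believe", "believed": "believe", "flies": "fly", "is": "be"}

# Hand-written gazetteer (hypothetical) mapping known names to categories.
GAZETTEER = {"KAISENS DATA": "ORGANIZATION", "La Défense": "LOCATION"}


def sentence_tokenize(text):
    """Split text into sentences after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def word_tokenize(text):
    """Split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def lemmatize(word):
    """Look the word up in the lemma dictionary; fall back to the word itself."""
    word = word.lower()
    return LEMMAS.get(word, word)


def find_entities(text):
    """Return (name, category) pairs for gazetteer names found in the text,
    ordered by where they first appear."""
    found = [
        (text.find(name), name, label)
        for name, label in GAZETTEER.items()
        if name in text
    ]
    return [(name, label) for _, name, label in sorted(found)]


if __name__ == "__main__":
    print(sentence_tokenize("He can fly a plane. The fly is an insect."))
    print(word_tokenize("The fly is an insect."))
    print(stem("believing"), lemmatize("believing"))
    print(find_entities("KAISENS DATA is a deep-tech company based in La Défense."))
```

Note the contrast between the two reductions: the toy stemmer turns "believing" into "believ", which is not a valid word, while the dictionary lookup returns the lemma "believe". POS tagging is left out of the sketch because it depends on context and is usually handled by a trained model rather than a few rules.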

