
NLP Building Blocks

Harvard (Charles River) IMG 7698
(Harvard University - Harvard Taiwan Student Association)

- Overview 

In Natural Language Processing (NLP), "building blocks" are the fundamental components and techniques used to process and understand human language. They include tokenization, stemming, stop word removal, part-of-speech tagging, named entity recognition, and sentiment analysis. These steps are often chained into a pipeline, in which each step breaks the text down into meaningful units or annotates it for the steps that follow.

Key NLP building blocks:

  • Text Preprocessing: The initial step, in which raw text is cleaned and transformed into a structured format through steps such as tokenization (splitting text into tokens, typically words), punctuation removal, and lowercasing.
  • Lexical Analysis: Identifying individual words and their meanings within a sentence, often including stemming (heuristically stripping affixes to reduce a word to a stem, which need not be a real word) and lemmatization (mapping a word to its dictionary base form, or lemma).
  • Part-of-Speech (POS) Tagging: Assigning grammatical categories to words (like noun, verb, adjective) to understand their role in a sentence.
  • Named Entity Recognition (NER): Identifying and classifying specific entities like people, locations, organizations, and dates within a text.
  • Syntactic Analysis: Analyzing the grammatical structure of a sentence, including parsing to identify the relationships between words and phrases.
  • Semantic Analysis: Understanding the meaning of words and phrases within the context of a sentence, often using techniques like word embeddings, which represent words as numeric vectors.
  • Sentiment Analysis: Determining the emotional tone or sentiment expressed in a piece of text, whether positive, negative, or neutral. 
  • Discourse Analysis: Analyzing how sentences relate to each other within a larger piece of text, considering context and overall meaning.
  • Pragmatic Analysis: Interpreting the intended meaning of a statement based on the speaker's intent and the situation.
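
As a rough illustration of the preprocessing and lexical steps listed above, the following Python sketch lowercases and tokenizes a sentence, removes stop words, and applies a deliberately naive suffix-stripping stemmer. The stop word list, the suffix list, and all function names are assumptions made for this example; they are not taken from any particular NLP library, and production systems use far larger resources and more careful rules.

```python
import re

# Tiny illustrative word lists (assumed for this sketch only).
STOP_WORDS = {"the", "a", "an", "is", "are", "was", "and", "of", "to", "in"}
SUFFIXES = ("ing", "ed", "es", "s")  # tried in this order


def tokenize(text):
    """Lowercase the text and extract runs of the letters a-z as tokens.

    Punctuation and digits are discarded, so "don't" becomes "don" and "t".
    """
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Drop very common words that carry little content on their own."""
    return [t for t in tokens if t not in STOP_WORDS]


def naive_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, remove stop words, then stem each remaining token."""
    return [naive_stem(t) for t in remove_stop_words(tokenize(text))]


print(preprocess("The cats are chasing the mice in the garden."))
# → ['cat', 'chas', 'mice', 'garden']
```

The output shows why stemming and lemmatization are treated as different techniques: the stemmer produces "chas", which is not a word, and leaves "mice" unchanged, whereas a lemmatizer with a dictionary would be expected to return "chase" and "mouse".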

 

Stages of NLP: 

NLP processing is commonly divided into five phases: lexical analysis, syntactic analysis, semantic analysis, discourse integration, and pragmatic analysis. The phases move from individual words, to sentence structure, to meaning, to relationships between sentences, and finally to the speaker's intended meaning.
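
One way to picture the five phases is as an ordered pipeline in which each stage can read the results of the stages before it. The Python sketch below shows only that control flow. The `run_pipeline` function, the two toy stages, and the `TOY_TAGS` lookup table are hypothetical names invented for this example; the semantic, discourse, and pragmatic stages are intentionally left out rather than guessed at.

```python
# The five phases, in the order given in the text above.
PHASES = (
    "lexical analysis",
    "syntactic analysis",
    "semantic analysis",
    "discourse integration",
    "pragmatic analysis",
)


def run_pipeline(text, stages):
    """Run the supplied stage functions in phase order.

    `stages` maps a phase name to a function that receives the results
    gathered so far and returns that phase's output. Phases without a
    registered function are skipped.
    """
    results = {"text": text}
    for phase in PHASES:
        stage = stages.get(phase)
        if stage is not None:
            results[phase] = stage(results)
    return results


def lexical(results):
    """Toy lexical stage: lowercase, drop a trailing period, split on spaces."""
    return results["text"].lower().rstrip(".").split()


# Hand-made part-of-speech lookup, assumed purely for illustration.
TOY_TAGS = {"dogs": "NOUN", "bark": "VERB"}


def syntactic(results):
    """Toy syntactic stage: tag the tokens produced by the lexical stage."""
    tokens = results["lexical analysis"]
    return [(tok, TOY_TAGS.get(tok, "UNK")) for tok in tokens]


out = run_pipeline(
    "Dogs bark.",
    {"lexical analysis": lexical, "syntactic analysis": syntactic},
)
print(out["syntactic analysis"])
# → [('dogs', 'NOUN'), ('bark', 'VERB')]
```

Keeping the phase order in one tuple means a stage registered for a later phase never runs before the stages it depends on, regardless of the order in which the stages are supplied.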

 

[More to come ...] 