Role of Natural Language Processing in Big Data
XenonStack - A Stack Innovator

Overview of Artificial Intelligence and Role of Natural Language Processing in Big Data

by Jagreet Kaur | April 21, 2017 | Categories - Natural Language Processing, Big Data, Log Analytics, Text Analytics

 

Artificial Intelligence Overview

 

AI stands for ‘Artificial Intelligence’, which means making machines capable of performing intelligent tasks the way human beings do. AI performs automated tasks using intelligence.

 

The term Artificial Intelligence has two key components -

  • Automation  
  • Intelligence

 

Goals of Artificial Intelligence

 

[Figure: Key Goals of Artificial Intelligence]

 

Stages of Artificial Intelligence

 

Stage 1 - Machine Learning - It is a set of algorithms used by intelligent systems to learn from experience.

 

Stage 2 - Machine Intelligence - It is an advanced set of algorithms used by machines to learn from experience, e.g., Deep Neural Networks.

 

Artificial Intelligence technology is currently at this stage.

 

Stage 3 - Machine Consciousness - It refers to self-learning from experience without the need for external data.

 

[Figure: Different Stages of Artificial Intelligence]

 

Types of Artificial Intelligence

 

ANI - Artificial Narrow Intelligence - It comprises basic, rote tasks such as those performed by chatbots and personal assistants like Siri by Apple and Alexa by Amazon.

 

AGI - Artificial General Intelligence - Artificial General Intelligence comprises human-level tasks, such as those targeted by Uber's self-driving cars and Tesla's Autopilot. It involves continual learning by the machines.

 

ASI - Artificial Super Intelligence - Artificial Super Intelligence refers to intelligence far beyond that of humans.

 

What Makes a System AI Enabled

 

[Figure: AI Enabled Systems]

 

Difference Between NLP, AI, ML, DL & NN

 

AI or Artificial Intelligence - Building systems that can do intelligent things.

 

NLP or Natural Language Processing - Building systems that can understand language. It is a subset of Artificial Intelligence.

 

ML or Machine Learning - Building systems that can learn from experience. It is also a subset of Artificial Intelligence.

 

NN or Neural Network - Biologically inspired network of Artificial Neurons.

 

DL or Deep Learning - Building systems that use deep neural networks on large sets of data. It is a subset of Machine Learning.

 

[Figure: Difference Between NLP, AI, ML, DL and NN]

 

What is Natural Language Processing?

 

Natural Language Processing (NLP) is the “ability of machines to understand and interpret human language the way it is written or spoken”.

 

The objective of NLP is to make computers as intelligent as human beings at understanding language.

 

[Figure: What Is NLP]

 

The ultimate goal of NLP is to fill the gap between how humans communicate (natural language) and what computers understand (machine language).

 

There are three different levels of linguistic analysis done before performing NLP -

 

Syntax - Which parts of the given text are grammatically correct?

Semantics - What is the meaning of the given text?

Pragmatics - What is the purpose of the text?

 

NLP deals with different aspects of language, such as:

 

Phonology - It is the systematic organization of sounds in a language.

 

Morphology - It is the study of how words are formed and how they relate to each other.

 

Approaches to Semantic Analysis in NLP

 

  • Distributional - It employs large-scale statistical techniques from Machine Learning and Deep Learning.

  • Frame-Based - Sentences that are syntactically different but semantically the same are represented inside a data structure (a frame) for a stereotyped situation.

  • Theoretical - This approach is based on the idea that sentences refer to the real world (e.g., “the sky is blue”) and that parts of a sentence can be combined to represent its whole meaning.

  • Interactive Learning - It involves a pragmatic approach in which the user teaches the computer the language step by step in an interactive learning environment.

 

The true success of NLP would be for humans to be deceived into believing that they are talking to humans instead of computers.

 

Why Do We Need NLP?

 

With NLP, tasks such as automated speech and automated text writing can be performed in less time.

 

Given the large amount of text data available, it makes sense to use computers' untiring ability to run many algorithms and perform such tasks quickly.

 

These tasks include other NLP applications like Automatic Summarization (generating a summary of a given text) and Machine Translation (translating one language into another).

 

Process of NLP

 

If the input is speech, speech-to-text conversion is performed first.

 

The mechanism of Natural Language Processing involves two processes:

 

  • Natural Language Understanding

  • Natural Language Generation

 

Natural Language Understanding

 

NLU or Natural Language Understanding tries to understand the meaning of a given text. For NLU, the nature and structure of each word in the text must be understood. To understand structure, NLU tries to resolve the following kinds of ambiguity present in natural language:

 

  • Lexical Ambiguity - A word has multiple meanings.

  • Syntactic Ambiguity - A sentence has multiple parse trees.

  • Semantic Ambiguity - A sentence has multiple meanings.

  • Anaphoric Ambiguity - It is unclear which previously mentioned word or phrase a later word or phrase (such as a pronoun) refers to.

 

Next, the meaning of each word is understood by using lexicons (vocabulary) and a set of grammatical rules.

 

However, different words can have similar meanings (synonymy), and a single word can have more than one meaning (polysemy), which complicates this step.

 

Natural Language Generation

 

Natural Language Generation (NLG) is the process of automatically producing readable text, with meaningful phrases and sentences, from structured data. It is a subset of NLP, and the problem is hard to deal with.

 

Natural language generation is divided into three proposed stages:

 

1. Text Planning - The basic content in the structured data is ordered.

2. Sentence Planning - Sentences are composed from the structured data to represent the flow of information.

3. Realization - Grammatically correct sentences are finally produced to represent the text.
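The three stages can be illustrated with a minimal template-based sketch. It assumes a hypothetical weather record (the fields city, condition, temp_c and humidity are invented for illustration) and hard-coded phrase templates; real NLG systems use much richer planning and grammar components.

```python
from typing import Dict, List

# Hypothetical structured input (an assumption for this sketch, not data from the article).
record: Dict[str, object] = {"city": "Pune", "condition": "sunny", "temp_c": 31, "humidity": 40}

def text_planning(data: Dict[str, object]) -> List[str]:
    """Stage 1: decide which facts to mention and in what order."""
    order = ["condition", "temp_c", "humidity"]
    return [key for key in order if key in data]

def sentence_planning(plan: List[str]) -> List[List[str]]:
    """Stage 2: group the ordered facts into sentences (weather facts together, humidity separately)."""
    weather = [key for key in plan if key in ("condition", "temp_c")]
    other = [key for key in plan if key not in weather]
    return [group for group in (weather, other) if group]

# Stage 3 templates: one phrase per fact (hypothetical wording).
TEMPLATES = {
    "condition": "it is {condition}",
    "temp_c": "the temperature is {temp_c} °C",
    "humidity": "humidity is {humidity}%",
}

def realization(data: Dict[str, object], sentences: List[List[str]]) -> str:
    """Stage 3: render each group of facts as a grammatical sentence."""
    output: List[str] = []
    for group in sentences:
        body = " and ".join(TEMPLATES[key].format(**data) for key in group)
        if not output:
            output.append(f"In {data['city']}, {body}.")
        else:
            output.append(body[0].upper() + body[1:] + ".")
    return " ".join(output)

print(realization(record, sentence_planning(text_planning(record))))
```

Keeping each stage in its own function mirrors the three-stage split, so a planner or realizer could be swapped out independently.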

 

Difference Between NLP and Text Mining or Text Analytics

 

Natural language processing is responsible for understanding the meaning and structure of a given text.

 

Text Mining or Text Analytics is the process of extracting hidden information from text data through pattern recognition.

 

[Figure: Difference Between NLP and Text Mining]

 

Natural language processing is used to understand the meaning (semantics) of a given text, while text mining is used to understand its structure (syntax).

 

As an example, take the sentence “I found my wallet near the bank.” The task of NLP is to work out whether ‘bank’ refers to a financial institution or a river bank.
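One classic way to attack this kind of lexical ambiguity is a simplified Lesk algorithm: pick the sense whose dictionary gloss shares the most words with the sentence. The sketch below assumes a tiny, hypothetical sense inventory for ‘bank’ with glosses written for this example; the wallet sentence is genuinely ambiguous, so the answer depends entirely on those toy glosses.

```python
import re
from typing import Dict, Set

# Toy sense inventory for "bank"; the glosses are hypothetical, written for this example.
SENSES: Dict[str, str] = {
    "financial institution": "an institution where people deposit money, withdraw cash and keep a wallet card or account",
    "river bank": "the sloping land beside a river or stream where water flows",
}
STOPWORDS: Set[str] = {"a", "an", "the", "i", "my", "we", "on", "near", "where", "or", "and", "of", "to", "beside"}

def content_words(text: str) -> Set[str]:
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}

def simplified_lesk(sentence: str, senses: Dict[str, str] = SENSES) -> str:
    """Pick the sense whose gloss shares the most content words with the sentence."""
    context = content_words(sentence)
    return max(senses, key=lambda sense: len(context & content_words(senses[sense])))

print(simplified_lesk("I found my wallet near the bank."))
print(simplified_lesk("We sat on the bank and watched the river."))
```

Here ‘wallet’ overlaps with the first gloss and ‘river’ with the second; real systems use full lexicons such as WordNet or learned context representations instead of hand-written glosses.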

 

What is Big Data?

 

According to Dr. Kirk Borne, Principal Data Scientist, big data is “everything, quantified, and tracked.”

 


 

NLP for Big Data is the Next Big Thing

 

Today, around 80% of all data is available in raw form. Big Data comes from information stored in big organizations and enterprises. Examples include employee information, company purchase and sales records, business transactions, past organizational records, social media, etc.

 

Although human language is ambiguous and unstructured, and therefore hard for computers to interpret, NLP makes it possible to harness this huge unstructured data, find patterns in it, and better understand the information it contains.

 

By using Big Data, NLP can help solve big problems in the business world, whether in retail, healthcare, or financial institutions.

 

What is a Chatbot?

 

Chatbots or Automated Intelligent Agents

 

  • These are computer programs you can talk to through messaging apps, chat windows, or voice calling apps.

  • These are intelligent digital assistants used to resolve customer queries in a cost-effective, quick, and consistent manner.

 

Importance of Chatbots

 

Chatbots are important for keeping up with changes in digital customer care services and for handling the many routine queries that are asked most frequently.

 

Chatbots are most useful when customer service requests are specific to one area and highly predictable, so that a high volume of similar requests can be handled with automated responses.

 

Working of Chatbot

 

[Figure: What is Chatbot (Image Source - blog.wizeline.com)]

 

Knowledge Base - It contains the database of information that chatbots need in order to respond to customer queries.

 

Data Store - It contains the chatbot's interaction history with users.

 

NLP Layer - It translates users' free-form queries into information that can be used to select appropriate responses.

 

Application Layer - It is the application interface that is used to interact with the user.

 

Chatbots learn from each interaction with a user, using machine learning to match user queries with the information in the knowledge base.
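The four components can be sketched as a tiny retrieval-based bot. This is a minimal illustration under stated assumptions: the knowledge-base entries, the Jaccard word-overlap matcher and the 0.3 threshold are all hypothetical choices, and, unlike the learning chatbots described above, this sketch only records history and does not learn from it.

```python
import re
from typing import List, Set, Tuple

# Knowledge base: hypothetical canned questions mapped to answers (illustrative only).
KNOWLEDGE_BASE = {
    "how do i reset my password": "Use the 'Forgot password' link on the login page.",
    "what are your opening hours": "We are open 9am to 6pm, Monday to Friday.",
    "how can i track my order": "Enter your order number on the tracking page.",
}
FALLBACK = "Sorry, let me connect you to a human agent."

# Data store: (user query, bot reply) pairs.
history: List[Tuple[str, str]] = []

def nlp_layer(text: str) -> Set[str]:
    """Turn a free-form query into a set of lowercase word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap between two token sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def reply(query: str, threshold: float = 0.3) -> str:
    """Application layer: answer from the closest knowledge-base entry, or fall back."""
    tokens = nlp_layer(query)
    best = max(KNOWLEDGE_BASE, key=lambda question: jaccard(tokens, nlp_layer(question)))
    answer = KNOWLEDGE_BASE[best] if jaccard(tokens, nlp_layer(best)) >= threshold else FALLBACK
    history.append((query, answer))
    return answer

print(reply("How can I reset my password?"))
print(reply("Tell me a joke"))
```

A production bot would replace the word-overlap matcher with a trained intent classifier, but the split into knowledge base, data store, NLP layer and application layer stays the same.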

 

Why Deep Learning Is Needed in NLP

 

  • Traditional NLP uses a rule-based approach that represents words as ‘one-hot’ encoded vectors.

  • Traditional methods focus on syntactic representation instead of semantic representation.

  • Bag-of-words classification models are unable to distinguish certain contexts.
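Both limitations can be shown in a few lines. The sketch below uses a hypothetical five-word vocabulary: any two distinct one-hot vectors have a cosine similarity of zero, so related words such as ‘hotel’ and ‘motel’ look completely unrelated, and a bag-of-words representation gives identical counts for sentences with opposite meanings.

```python
import math
from collections import Counter
from typing import Dict, List

def one_hot(vocab: List[str]) -> Dict[str, List[int]]:
    """Each word gets a vector with a single 1 at its own index."""
    return {w: [1 if i == j else 0 for j in range(len(vocab))] for i, w in enumerate(vocab)}

def cosine(u: List[int], v: List[int]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def bag_of_words(sentence: str) -> Counter:
    """Word counts only; word order is discarded."""
    return Counter(sentence.lower().split())

vectors = one_hot(["hotel", "motel", "dog", "bites", "man"])
print(cosine(vectors["hotel"], vectors["motel"]))  # related words still look unrelated
print(bag_of_words("dog bites man") == bag_of_words("man bites dog"))  # different meanings, same bag
```

Dense word embeddings learned by deep networks address the first problem, and sequence models that keep word order address the second.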

 

[Figure: Machine Learning Approach]

 

Three Capabilities of Deep Learning

 

Expressibility - This quality describes how well a machine can approximate arbitrary functions.

 

Trainability - How well and quickly a DL system can learn its problem.

 

Generalizability - How well the machine can perform predictions on data that it has not been trained on.

 

There are, of course, other capabilities that also need to be considered in Deep Learning, such as interpretability, modularity, transferability, latency, adversarial stability, and security. But these three are the main ones.

 

Common Tasks of Deep Learning in NLP

 

Deep Learning Algorithm - NLP Usage

  • Feed-forward Neural Network (NN) - Part-of-speech tagging, tokenization, named entity recognition, intent extraction

  • Recurrent Neural Network (RNN) - Machine translation, question answering systems, image captioning

  • Recursive Neural Network - Parsing sentences, sentiment analysis, paraphrase detection, relation classification, object detection

  • Convolutional Neural Network (CNN) - Sentence/text classification, relation extraction and classification, spam detection, categorization of search queries, semantic relation extraction

 

Difference Between Classical NLP & Deep Learning NLP

 

 

[Figure: Difference Between Classical NLP and Deep Learning (Image Source - blog.aylien.com)]

 

NLP For Log Analysis and Log Mining

 

What is Log?

 

A log is a time-ordered collection of messages from different network devices and hardware. Logs may be written to files on hard disks or sent over the network as a stream of messages to a log collector.

 

Logs make it possible to maintain and track hardware performance, tune parameters, handle emergencies and system recovery, and optimize applications and infrastructure.

 


 

What is Log Analysis?

 

Log analysis is the process of extracting information from logs, taking into account the different syntax and semantics of the messages in the log files and interpreting them in the context of the application, so that log files from different sources can be compared for anomaly detection and for finding correlations.

 

What is Log Mining?

 

Log mining, or log knowledge discovery, is the process of extracting patterns and correlations from logs to reveal knowledge and to detect or predict anomalies in log messages.

 

 

Techniques Used for Log Analysis and Log Mining

 

The different techniques used for performing log analysis are described below:

 

  • Pattern recognition - It involves comparing log messages with messages stored in a pattern book in order to filter them.

 

  • Normalization - Normalization converts different log messages into the same format. It is needed when log messages that use different terminology but have the same interpretation come from different sources, such as applications or operating systems.

 

  • Classification & Tagging - Classification & tagging of log messages involves ordering the messages and tagging them with different keywords for later analysis.

 

  • Artificial Ignorance - It is a technique that uses machine learning algorithms to discard uninteresting log messages. It is also used to detect anomalies in the normal working of systems.
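Normalization and pattern-book filtering can be sketched with regular expressions. This is a minimal illustration, not a specific product's method: the timestamp and IPv4 formats, the placeholder names and the "ignore" templates are assumptions. Variable fields are replaced with placeholders so that messages with the same meaning collapse into one template, templates known to be routine are discarded (a simple, rule-based form of artificial ignorance rather than a learned one), and what remains is counted.

```python
import re
from collections import Counter
from typing import Iterable

# Normalization rules (assumed formats): timestamps, IPv4 addresses, then any standalone number.
RULES = [
    (re.compile(r"\d{4}-\d{2}-\d{2}[t ]\d{2}:\d{2}:\d{2}"), "<TS>"),
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]

# Hypothetical pattern book of routine templates to ignore.
IGNORE = {
    "<TS> user login ok from <IP>",
    "<TS> heartbeat seq=<NUM>",
}

def normalize(message: str) -> str:
    """Lowercase the message, then replace variable fields with placeholders."""
    message = message.lower()
    for pattern, placeholder in RULES:
        message = pattern.sub(placeholder, message)
    return message

def interesting_templates(lines: Iterable[str]) -> Counter:
    """Normalize every line, drop templates found in the pattern book, and count the rest."""
    return Counter(t for t in map(normalize, lines) if t not in IGNORE)

logs = [
    "2017-04-21 10:00:01 user login ok from 10.0.0.5",
    "2017-04-21 10:00:02 heartbeat seq=42",
    "2017-04-21 10:00:03 disk /dev/sda1 error code 5",
    "2017-04-21 10:00:09 disk /dev/sda1 error code 7",
]
print(interesting_templates(logs))
```

The two disk errors differ only in their error code, so after normalization they are counted as one template that occurred twice.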

 

Role of NLP in Log Analysis & Log Mining

 

Natural language processing techniques are widely used in log analysis and log mining.

 

Techniques such as tokenization, stemming, lemmatization, and parsing are used to convert log messages into a structured form.

 

Once logs are available in a structured form, log analysis and log mining are performed to extract useful information and discover knowledge from it.

 

[Figure: Example of an error log caused by a server failure]
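Turning a raw log line into a structured record can be sketched as follows. The sketch assumes a hypothetical syslog-like layout ("date time host process[pid]: message") and a hand-made keyword list for tagging; real log formats vary, so the regular expression would need to be adapted to the actual source.

```python
import re
from typing import Dict, List, Optional, Set

# Assumed syslog-like layout: "<date> <time> <host> <process>[<pid>]: <message>" (hypothetical format).
LINE_RE = re.compile(
    r"(?P<date>\S+) (?P<time>\S+) (?P<host>\S+) (?P<process>[\w-]+)\[(?P<pid>\d+)\]: (?P<message>.*)"
)

# Hypothetical keyword tags used to classify messages for later analysis.
TAGS: Dict[str, Set[str]] = {
    "failure": {"error", "fail", "failed", "failure"},
    "network": {"upstream", "connect", "connecting", "connection", "timeout"},
}

def structure(line: str) -> Optional[Dict[str, object]]:
    """Turn one raw log line into a record with fields, tokens, and keyword tags."""
    match = LINE_RE.match(line)
    if match is None:
        return None  # the line does not follow the assumed layout
    record: Dict[str, object] = dict(match.groupdict())
    tokens: List[str] = re.findall(r"[a-z]+", match.group("message").lower())
    record["tokens"] = tokens
    record["tags"] = sorted(tag for tag, words in TAGS.items() if words & set(tokens))
    return record

print(structure("2017-04-21 10:15:02 web01 nginx[2310]: upstream server failed while connecting"))
```

Once every line is a record like this, the analysis and mining steps can group, count and correlate messages by host, process or tag.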

 

Diving into Natural Language Processing

 

Natural language processing is a complex field that lies at the intersection of artificial intelligence, computational linguistics, and computer science.

 

Getting started with NLP 

 

The user first needs to import a file containing written text, and then perform the following natural language processing steps.

 

Sentence Segmentation
Example: Mark met the president. He said: “Hi! What’s up, Alex?”
Output:

  • Sentence 1 - Mark met the president.

  • Sentence 2 - He said: “Hi! What’s up, Alex?”

Tokenization
Example: My phone tries to ‘charging’ from ‘discharging’ state.
Output:

  • [My] [phone] [tries] [to] [‘] [charging] [’] [from] [‘] [discharging] [’] [state] [.]

Stemming/Lemmatization
Example: Drinking, Drank, Drunk
Output:

  • Drink

Part-of-Speech Tagging
Example: If you build it he will come.
Output:

  • If - IN - Preposition or subordinating conjunction

  • you - PRP - Personal pronoun

  • build - VBP - Verb, non-3rd person singular present

  • it - PRP - Personal pronoun

  • he - PRP - Personal pronoun

  • will - MD - Modal verb

  • come - VB - Verb, base form

Parsing
Example: Mark and Joe went into a bar.
Output:

  • (S (NP (NP Mark) and (NP Joe)) (VP went (PP into (NP a bar))))

Named Entity Recognition
Example: Let’s meet Alice at 6 am in India.
Output:

  • Alice - Person

  • 6 am - Time

  • India - Location

Coreference Resolution
Example: Mark went into the mall. He thought it was a shopping mall.
Output:

  • ‘He’ refers to Mark; ‘it’ refers to the mall.
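The sentence segmentation and tokenization examples above can be reproduced with a small rule-based sketch. The rules are assumptions chosen for these examples: a sentence ends at ‘.’, ‘!’ or ‘?’ (optionally followed by a closing quote) when whitespace or the end of text follows, except inside “…” quotes, and every punctuation mark becomes its own token. Abbreviations such as “Dr.” would still be split incorrectly.

```python
import re
from typing import List

def segment_sentences(text: str) -> List[str]:
    """Split at ., ! or ? followed by whitespace (or end of text), but not inside curly double quotes."""
    sentences: List[str] = []
    start, depth = 0, 0
    for i, ch in enumerate(text):
        if ch == "“":
            depth += 1
        elif ch == "”":
            depth = max(depth - 1, 0)
        at_gap = i + 1 == len(text) or text[i + 1].isspace()
        ends_sentence = ch in ".!?" or (ch == "”" and i > 0 and text[i - 1] in ".!?")
        if depth == 0 and at_gap and ends_sentence:
            sentence = text[start:i + 1].strip()
            if sentence:
                sentences.append(sentence)
            start = i + 1
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences

def tokenize(text: str) -> List[str]:
    """Word characters form tokens; every other non-space character is a token of its own."""
    return re.findall(r"\w+|[^\w\s]", text)

print(segment_sentences("Mark met the president. He said: “Hi! What’s up, Alex?”"))
print(tokenize("My phone tries to ‘charging’ from ‘discharging’ state."))
```

Tracking quote depth is what keeps “Hi!” from being split off as a sentence of its own; statistical segmenters learn such cases from data instead.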

 

  • Sentence segmentation - It identifies sentence boundaries in the given text, i.e., where one sentence ends and the next begins. Sentence endings are often marked by punctuation such as ‘.’

 

  • Tokenization - It splits text into tokens such as words, numbers, and punctuation symbols.

 

  • Stemming - It strips word endings; for example, ‘eating’ is reduced to ‘eat’.

 

  • Part of speech (POS) tagging - It assigns each word in a sentence its part-of-speech tag, such as noun or adverb.

 

  • Parsing - It analyzes the grammatical structure of a sentence by dividing the text into constituents (phrases), answering questions such as which part of a sentence modifies another part.

 

  • Named Entity Recognition - It identifies entities such as persons, locations, and times within documents.

 

  • Co-Reference resolution - It determines which words or phrases refer to the same entity, for example linking a pronoun to a name mentioned in a previous sentence.
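The difference between stemming and lemmatization shows up clearly on the “Drinking, Drank, Drunk” example. The sketch below uses a naive suffix-stripping stemmer (illustrative rules, not the Porter algorithm) and a toy lemma dictionary standing in for a real lexicon; both the rules and the dictionary entries are assumptions.

```python
from typing import Dict

# Naive suffix-stripping rules (illustrative only, not the Porter algorithm).
SUFFIXES = ("ing", "ed", "s")

def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least three letters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w

# Toy lemma dictionary standing in for a real lexicon (hypothetical entries).
LEMMAS: Dict[str, str] = {"drinking": "drink", "drank": "drink", "drunk": "drink", "eating": "eat", "ate": "eat"}

def lemmatize(word: str) -> str:
    """Look the word up in the lexicon; fall back to the stemmer for unknown words."""
    w = word.lower()
    return LEMMAS.get(w, naive_stem(w))

words = ["Drinking", "Drank", "Drunk"]
print([naive_stem(w) for w in words])
print([lemmatize(w) for w in words])
```

Suffix stripping handles ‘drinking’ but cannot relate the irregular forms ‘drank’ and ‘drunk’ to ‘drink’; that needs the vocabulary knowledge a lemmatizer brings.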

 

Further Key Application Areas of NLP

 

Apart from its applications in Big Data, log mining, and log analysis, NLP has other major application areas.

 

Although the term ‘NLP’ is not as popular as ‘big data’ or ‘machine learning’, we use NLP every day.

 

Automatic summarizer - Given an input text, the task is to write a summary of it, discarding irrelevant points.
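A simple extractive summarizer can be sketched with word frequencies: score each sentence by the average frequency of its content words and keep the top-scoring sentences in their original order. The stopword list and the scoring rule are assumptions for illustration; modern summarizers use trained models, and abstractive ones write new sentences rather than selecting existing ones.

```python
import re
from collections import Counter
from typing import List

# Small illustrative stopword list (an assumption; real systems use larger lists).
STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in", "it", "on", "for", "with", "as", "by", "this", "that"}

def content_tokens(text: str) -> List[str]:
    """Lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def summarize(text: str, n: int = 1) -> List[str]:
    """Return the n highest-scoring sentences, in their original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(content_tokens(text))

    def score(sentence: str) -> float:
        tokens = content_tokens(sentence)
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:n])]

TEXT = "Logs record server events. Server logs help find server errors. The weather was nice."
print(summarize(TEXT, n=1))
```

Averaging rather than summing the scores keeps long sentences from winning just because they contain more words.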

 

Sentiment analysis - It is performed on a given text to determine the attitude it expresses, e.g., whether a judgment, opinion, or review is positive, negative, or neutral.
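A lexicon-based scorer is the simplest form of sentiment analysis: sum the polarity of known words, flipping a word's polarity when it directly follows a negation. The tiny lexicon and negation list below are hypothetical; practical systems use curated lexicons or trained classifiers that also handle sarcasm, intensifiers and longer-range negation.

```python
import re
from typing import Dict, Set

# Tiny hand-made polarity lexicon (hypothetical values for illustration).
LEXICON: Dict[str, int] = {"good": 1, "great": 2, "love": 2, "excellent": 2, "bad": -1, "poor": -1, "terrible": -2, "hate": -2}
NEGATIONS: Set[str] = {"not", "never", "no"}

def sentiment(text: str) -> str:
    """Classify text as positive, negative, or neutral from summed word polarities."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        value = LEXICON.get(token, 0)
        if i > 0 and tokens[i - 1] in NEGATIONS:
            value = -value  # flip polarity right after a simple negation word
        score += value
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("The battery life is great and I love it"))
print(sentiment("The service was not good"))
```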

 

Text classification - It is performed to categorize documents such as journals and news stories according to their domain. Multi-document classification is also possible. A popular example of text classification is spam detection in emails.

 

Similarly, attributes of a journal article's writing style can be used to identify its author.
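Spam detection is often introduced with a multinomial Naive Bayes classifier. The sketch below trains on four hypothetical example messages and uses add-one (Laplace) smoothing with log probabilities; it shows the technique, not a usable filter, which would need far more data and better tokenization.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Set, Tuple

Model = Tuple[Dict[str, Counter], Counter, Set[str]]

def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z]+", text.lower())

def train(examples: List[Tuple[str, str]]) -> Model:
    """Count words per label, documents per label, and the overall vocabulary."""
    word_counts: Dict[str, Counter] = {}
    doc_counts: Counter = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab

def classify(text: str, model: Model) -> str:
    """Pick the label with the highest log prior plus log likelihood of the words."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = "", -math.inf
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)
        for w in tokenize(text):
            # Add-one smoothing so unseen words do not zero out the probability
            score += math.log((counts[w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training data for illustration only.
TRAINING = [
    ("win a free prize now", "spam"),
    ("claim your free money now", "spam"),
    ("meeting moved to monday", "ham"),
    ("please review the attached report", "ham"),
]
model = train(TRAINING)
print(classify("free prize money", model))
```

The same code classifies any labelled text (news categories, or authors from word-usage patterns) simply by changing the training examples.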

 

Information Extraction - It pulls structured information out of text; for example, it allows an email program to add events mentioned in an email to the calendar automatically.
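The calendar example can be sketched with a pattern-based extractor. The grammar here is a hypothetical toy ("title on Weekday at H[:MM] am/pm"), chosen only to show how free text becomes a structured event record; real extractors handle dates, relative expressions and many more phrasings, usually with trained models.

```python
import re
from typing import Dict, Optional

# Hypothetical toy grammar: "<title> on <Weekday> at <H>[:<MM>] am|pm".
EVENT_RE = re.compile(
    r"(?P<title>[A-Za-z ]+?) on (?P<day>Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)"
    r" at (?P<hour>\d{1,2})(?::(?P<minute>\d{2}))? ?(?P<ampm>am|pm)",
    re.IGNORECASE,
)

def extract_event(text: str) -> Optional[Dict[str, str]]:
    """Return a calendar-style record (title, day, 24-hour time) or None if no event is found."""
    m = EVENT_RE.search(text)
    if m is None:
        return None
    hour = int(m.group("hour")) % 12  # 12 am -> 0, 12 pm -> 12 after the pm adjustment
    if m.group("ampm").lower() == "pm":
        hour += 12
    minute = m.group("minute") or "00"
    return {"title": m.group("title").strip(), "day": m.group("day").capitalize(), "time": f"{hour:02d}:{minute}"}

print(extract_event("Hi team, Project review on Friday at 3:30 pm in room 4."))
```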

 

[Figure: Applications of NLP]

 

XenonStack Offerings

 

XenonStack is a leading Software Company in Product Development and Solution Provider for DevOps, Big Data Integration, Real Time Analytics & Data Science.

 

Product NexaStack - Unified DevOps Platform. It provides monitoring of Kubernetes, Docker, and OpenStack infrastructure as well as Big Data infrastructure, and uses advanced machine learning techniques for log mining and log analytics.

 

Product ElixirData - Modern Data Integration Platform. It enables enterprises and different agencies to perform log analytics and log mining.

 

Product Akira.AI is an Automated & Knowledge Driven Artificial Intelligence Platform that enables you to automate the infrastructure to train and deploy Deep Learning models on public cloud as well as on-premises.

 
