Real-Time Anomaly Detection for Cognitive Intelligence

Dr. Jagreet Kaur Gill | 04 September 2024


Evolution of Analytics in the last decade

Around ten years ago, the tools available for analytics were Excel, SQL databases, and similar resources that were relatively simple compared to what is available today. Analytics was also aimed at tasks such as reporting, customer segmentation, and tracking whether sales trends were going up or down. In this article, we will discuss Real-Time Anomaly Detection.


What is Big Data Analytics?

Over the past five years, the amount of data has exploded due to factors such as social media activity, transaction records, and sensor information. With this growth, the way data is stored has also changed. SQL databases were once the norm, and analytics ran against them during idle time, typically as serialized jobs. Later, NoSQL databases began to replace traditional SQL databases as data sizes became huge, and analysis shifted from serial processing to parallel processing on distributed systems for quicker results.

What is Cognitive Analytics?

In the contemporary world, analytics mainly targets predictions with very high accuracy, trying to get as close to human understanding as possible. The aim is essentially to give machines the cognitive intelligence that humans have. This analysis should be accurate, fast, and continuously learning. The output is expected in real time and should also predict future events.

What are the Layers of Cognitive Intelligence?

Cognitive computing accelerates human intelligence by combining learning, thinking, and adaptability. Moving toward machine capability does not merely augment human potential; it increases individual creativity and creates new waves of innovation. The key areas of capability are -

1. Sensory Perception

Machines are enabled to simulate human senses such as smell, touch, taste, and hearing. Accordingly, machine counterparts of these senses, such as visual and auditory perception, are developed.

2. Deduction, Reasoning, and Learning

Here, machines are made to simulate human thinking for decision-making. Therefore, technologies such as machine learning, deep learning, and neural networks are deployed as a system of intelligence to extract meaningful, useful information and apply judgment.

3. Data processing

Large datasets are accessed to facilitate decision-making and provide practical suggestions. Hyperscale computing, knowledge representation, and natural language processing together provide the processing power needed for the system to engage in real time.


Features of Cognitive Computing Solutions

The purpose of cognitive computing is to create a computing framework in which complex problems are solved easily without human intervention. Its features are listed below -

1. Adaptive: This is one of the first steps in developing a machine learning-based cognitive system. The solution imitates the human ability to learn from its surroundings and is dynamic in gathering data and understanding goals and requirements.

2. Interactive: The cognitive solution should interact bidirectionally and dynamically with every element in the system, such as processes, devices, users, and cloud services. The system can understand human input and return results using natural language processing and deep learning models.

3. Iterative and Stateful: The system should learn from previous iterations and return the information that is most relevant at that moment. It must follow data quality and visualization methodologies so that it provides enough information and its data sources operate on reliable, up-to-date data.

4. Contextual: The system should be able to understand, identify, and extract contextual elements from the data, such as meaning, syntax, time, location, task, and goal. It draws on multiple sources of information, including structured, semi-structured, and unstructured data as well as sensor inputs.

Working of Cognitive Intelligence in Real-Time Anomaly Detection

Cognitive applications use deep learning and neural network algorithms to power capabilities such as data mining, pattern recognition, and natural language processing. The system gathers a variety of information and processes it against what it already knows. After the analysis is complete, it integrates with adaptive displays to visualize the content for specific audiences in specific situations.

How does Cognitive Intelligence relate to Analytics?

Cognitive computing takes analytics to the next level using new technologies. It is used when a vast corpus of textual data is available as free-text documents, which it can analyze to support a wide range of activities. A cognitive system supports training a model on data containing a large set of training examples, for instance an extensive collection of questions with their corresponding answers.

The goal of cognitive computing is to simulate human thought processes in a computerized model. Source: Cognitive Computing

What is an Anomaly?

While analyzing data, unexpected occurrences sometimes appear; these are called anomalies. An "anomaly" is abnormal or unexpected behavior, a deviation from the regular trend. What anomalies are encountered in daily life?

1. A server going down

2. An unauthorized person entering a network

3. Data leakage

The Need for Anomaly Detection

Anomalies occur seldom, but if an anomaly is not detected and the right actions are not taken, the consequences can prove costly in situations such as network intrusion, changes in log patterns, data leaks, fraudulent transactions, insider trading, and many more. Just imagine the loss that could be incurred if any of the above occurred.

Real-Time Anomaly Detection in Docker and Hadoop Clusters

Anomaly detection in Docker runs at the same level as the Docker daemon and keeps track of its events. When an event indicates that a new container has started, anomaly detection begins for it simultaneously. The anomaly detection algorithm queries the daemon to find the process IDs running inside the container. The syscalls of those processes are then recorded (using the process IDs and root privileges) and sent to a Redis queue via the message service. The anomaly detection process analyzes this behavior and sends notifications to the administrators.
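
As an illustration, the event-driven part of such a pipeline might look like the minimal Python sketch below, assuming the docker and redis client libraries and a local Redis instance; the queue name and payload format are hypothetical, not the actual implementation.

```python
import json

import docker   # Docker SDK for Python
import redis

client = docker.from_env()          # talks to the local Docker daemon
queue = redis.Redis()               # local Redis used as the message queue

# Listen to daemon events and react whenever a new container starts.
for event in client.events(decode=True):
    if event.get("Type") == "container" and event.get("Action") == "start":
        container = client.containers.get(event["id"])
        # Query the daemon for the process IDs running inside the container.
        top = container.top()
        pid_column = top["Titles"].index("PID")
        pids = [row[pid_column] for row in top["Processes"]]
        # Hand the PIDs to the syscall recorder / anomaly detector via Redis.
        queue.lpush("anomaly:containers", json.dumps(
            {"container": container.name, "pids": pids}))
```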

Hadoop is one of the most widely used big data platforms across industries and businesses. With increased usage, data availability has grown, and so has the need to detect anomalies in Hadoop clusters. Traditionally, rule-based models are used to alert on anomalies, relying on domain knowledge and experience with the target system. The problem arises when hidden patterns cannot be explored, particularly in business problems where domain knowledge is limited. For this purpose, pattern recognition techniques such as DBSCAN and PCA are used to find anomalies without any prior experience.
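
A rough sketch of that idea, using scikit-learn's PCA and DBSCAN on a hypothetical matrix of per-node cluster metrics (the metric values below are random placeholders, and eps/min_samples would need tuning on real data):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import DBSCAN

# Hypothetical per-node metrics collected from a Hadoop cluster:
# columns could be CPU %, heap used, HDFS I/O, GC time, dropped packets, ...
metrics = np.random.rand(500, 12)

# Scale, then project onto a few principal components to remove noise and correlation.
reduced = PCA(n_components=3).fit_transform(StandardScaler().fit_transform(metrics))

# DBSCAN groups densely packed "normal" behaviour; points it cannot assign
# to any cluster are labelled -1 and treated as anomalies.
labels = DBSCAN(eps=0.5, min_samples=10).fit_predict(reduced)
anomalous_nodes = np.where(labels == -1)[0]
print(f"{len(anomalous_nodes)} nodes flagged as anomalous")
```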

Anomaly Detection in IoT

The Internet of Things, in simple terms, is the connection between everyday devices and the Internet. Once a device is connected with Internet access, its data can be obtained. The tools that most need anomaly detection techniques are the sensor-equipped ones used in industries and business organizations. Data is received continuously, with every passing second, through these sensors. Notably, for system maintenance, the sensors have to be monitored to predict anomalies. These predictions have high economic value because, in IoT, multiple things are interlinked. While working with such immense data, the following are required:

1. Feature selection

2. Feature Transformation
3. Pattern recognition

 

Which features are responsible for the anomaly? What transformations should be applied to these features to detect patterns? Which patterns signify an anomaly? As a simple example, consider a sensor that reports the temperature of a piece of equipment in a plant. A change in sensor values is used to know whether the equipment is stable or about to fail. Tracking is done on statistical measures: the mean and standard deviation of the temperature over a period of time.

If there are changes, that is, a shift in the mean or considerable fluctuations in the standard deviation, something is wrong with the equipment and immediate action is required; this is sent out as an alert. With advances in technology, multiple machine learning and statistical techniques are used to identify and predict anomalies accurately. The significant advantage is that once the complete system is automated, no one needs to constantly watch the equipment to know whether everything is okay.
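
A minimal sketch of this kind of tracking with pandas, assuming a single temperature sensor; the window size and the three-sigma threshold are illustrative choices, not fixed rules:

```python
import pandas as pd

# Hypothetical stream of temperature readings from one equipment sensor,
# indexed by timestamp.
readings = pd.Series(
    [70.1, 70.4, 69.8, 70.2, 70.0, 75.9, 76.3, 77.0],
    index=pd.date_range("2024-01-01", periods=8, freq="min"),
)

window = 5
# Compare each new reading against the statistics of the previous window.
rolling_mean = readings.rolling(window).mean().shift(1)
rolling_std = readings.rolling(window).std().shift(1)

# Flag a reading that drifts more than 3 rolling standard deviations
# away from the rolling mean (threshold is illustrative).
alerts = (readings - rolling_mean).abs() > 3 * rolling_std
print(readings[alerts])
```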


Anomaly Detection with Deep Learning

Deep learning can be thought of as an extension of artificial neural networks (ANNs). An artificial neural network consists of input nodes whose values are passed through a series of hidden layers; an activation function determines the signal strength sent to connected nodes. The output is the result of a classification or regression task. The model uses the result and the error to learn, updating its parameters. Deep learning is the more involved case, drawing on mathematical concepts such as matrix algebra and probability and demanding intense hardware resources.

Training neural networks and deep learning models happens by optimizing an objective function with a procedure such as stochastic gradient descent: predict the value, calculate the error, and update the model to reduce that error. Neural networks and deep learning can be implemented using Keras and TensorFlow, which are open source and have Python libraries. Recurrent neural networks store information in their hidden state, which is updated as new data is fed to the network, so they can be used on time series. Applications of RNNs include handwriting recognition, speech recognition, log analysis, and anomaly detection, so RNNs can be used to find anomalies in any of these use cases.
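
As one hedged example of the approach, the sketch below trains a small LSTM forecaster in Keras on a hypothetical metric series and flags points with unusually large prediction error; the architecture, epochs, and threshold are assumptions for illustration only.

```python
import numpy as np
from tensorflow import keras

# Hypothetical univariate metric series (e.g. requests per minute), scaled to [0, 1].
series = np.random.rand(1000).astype("float32")

# Turn the series into (window -> next value) training pairs.
window = 30
X = np.array([series[i:i + window] for i in range(len(series) - window)])
y = series[window:]
X = X[..., np.newaxis]                      # shape: (samples, timesteps, features)

model = keras.Sequential([
    keras.layers.LSTM(32, input_shape=(window, 1)),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=64, verbose=0)

# Large prediction errors suggest behaviour the network has not seen before.
errors = np.abs(model.predict(X, verbose=0).ravel() - y)
threshold = errors.mean() + 3 * errors.std()     # illustrative threshold
anomalies = np.where(errors > threshold)[0] + window
```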

1. Transitioning from Anomalies to Correlation

Well, if everything goes fine and, let us assume, a reasonably good anomaly detection system is developed, should that be it? No. The system should also determine which factors are responsible for the anomaly. The challenge is: what if the data is enormous? Across so many fields/variables, which ones are responsible for an anomaly? Finding this is essential because, once the root cause is found, immediate action can be taken to reduce the loss. This analysis to find the reasons for anomalies is called correlation analysis. Some examples -

i. In log data, downloading a huge file explains a spike in memory usage over a short time.

ii. Running a machine learning model over a massive amount of data results in an anomaly in CPU usage.

iii. A suspicious person trying to enter the network shows up as sudden changes in log patterns.

2. Transitioning from Monitoring to Optimization

Once correlation is done, the next step is to predict potential anomalies that may occur in the future. So, how can future anomalies be predicted? One method is to use Bayesian classification and Markov models. If any feature of a component with various features is failing, it ends up failing the component, which is an anomaly.

Based on the trends of the features, the Markov model helps identify future values up to some k timestamps ahead, and the Bayesian method is used to predict the probability of anomaly symptoms. Say feature "A" is expected to fail after timestamp "T," which would result in the failure of component "C." If feature A is repaired within timestamp T, component "C" keeps running as usual. If the reason cannot be predicted in advance and the component fails, the loss is -

i. The time to identify the reason for the failure.
ii. The time to get the feature (sub-component) repaired.
iii. The time to get the component running again.

Instead, one should be able to predict the failure of any feature and take action before it occurs. Hence, instead of monitoring every component in the system and resolving issues after the fact, optimization is necessary, followed by automation of this process. This moves the practice from finding anomalies to preventing anomalies.
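
A toy sketch of the Markov side of this idea: given an (illustrative) transition matrix estimated from history, step the chain k timestamps ahead and read off the probability that the feature ends up failed.

```python
import numpy as np

# Hypothetical states of a monitored feature (sub-component).
states = ["healthy", "degraded", "failed"]

# Transition probabilities estimated from historical observations (illustrative values).
P = np.array([
    [0.95, 0.04, 0.01],   # healthy  -> healthy / degraded / failed
    [0.10, 0.80, 0.10],   # degraded -> ...
    [0.00, 0.00, 1.00],   # failed is absorbing
])

# Start from the current state and step the chain k timestamps ahead.
current = np.array([0.0, 1.0, 0.0])          # the feature is currently "degraded"
k = 5
future = current @ np.linalg.matrix_power(P, k)

# The probability mass on "failed" after k steps is the signal used to
# schedule a repair before the whole component goes down.
print(dict(zip(states, future.round(3))))
```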

How to Find Anomalies in Real Time

The approaches to finding anomalies are listed below:

1. Supervised Anomaly Detection

In this approach, historical data is used that contains data points together with a class defining whether each point is an anomaly, making it similar to a classification problem. The anomaly column is taken as the class variable, and models such as Random Forest, XGBoost, or SVM (or regression algorithms) are trained on the data. The trained model is then applied to new data points to decide whether each one is an anomaly. One should be careful about the ratio of anomalies to non-anomalies in the dataset: if it is too skewed, for example worse than 1:10, it becomes a class imbalance problem.
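
A brief scikit-learn sketch of this supervised setup, using a synthetic imbalanced dataset and a Random Forest with balanced class weights to soften the imbalance problem (all numbers are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Synthetic, heavily imbalanced data: roughly 5% of points labelled as anomalies.
X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.95, 0.05], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

# class_weight="balanced" compensates for the anomaly/normal imbalance.
clf = RandomForestClassifier(n_estimators=200, class_weight="balanced",
                             random_state=42)
clf.fit(X_train, y_train)

print(classification_report(y_test, clf.predict(X_test),
                            target_names=["normal", "anomaly"]))
```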

2. Unsupervised Anomaly Detection

Clustering techniques are applied here, since it is not known in advance whether a data point is an anomaly. Clustering algorithms are used to detect the anomalies.

K-Means clustering

The K-Means clustering algorithm takes the number of clusters to form as an initial input and, based on that value, outputs that many clusters, assigning every data point to one of them. A boundary is then placed on the distance from each cluster's centroid; the cutoff can be the 95th or 99th percentile of distances, depending on the requirement. The points that fall outside this range after clustering are the anomalies.
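
A compact sketch of that procedure with scikit-learn's KMeans, where points beyond the 99th percentile of distance to their centroid are treated as anomalies (the percentile, cluster count, and data are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical feature matrix (e.g. scaled metrics per observation).
X = np.random.rand(1000, 4)

kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)

# Distance from every point to the centroid of its assigned cluster.
distances = np.linalg.norm(X - kmeans.cluster_centers_[kmeans.labels_], axis=1)

# Points beyond the 99th percentile of distances fall outside the cluster
# "boundary" and are flagged (use the 95th percentile for a stricter alarm rate).
threshold = np.percentile(distances, 99)
anomalies = np.where(distances > threshold)[0]
```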

DBSCAN

This is also a clustering algorithm, but it differs from K-Means. In DBSCAN, a minimum number of points and a maximum neighborhood distance are chosen as parameters. The algorithm starts from a random point, selects the points in its neighborhood, and links them; each of those points continues the same process. In this way, the dense patterns are found and all possible clusters are formed, and the leftover points are considered anomalies. In the case of a supervised algorithm, the model must already be trained and can detect only the kinds of anomalies it has learned before, so it is not feasible for detecting all types of anomalies. Unsupervised algorithms can be used instead: they can identify anomalies in data even when those anomalies have never been seen before.
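
And a matching sketch for DBSCAN on synthetic data, where eps plays the role of the maximum neighborhood distance and min_samples the minimum number of points, and anything labelled -1 is treated as an anomaly:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Dense "normal" behaviour plus a few scattered outliers (synthetic data).
X, _ = make_blobs(n_samples=500, centers=3, cluster_std=0.6, random_state=0)
X = np.vstack([X, np.random.uniform(-10, 10, size=(10, 2))])

# eps is the maximum neighborhood distance, min_samples the minimum points
# required to keep growing a cluster; both are the parameters described above.
labels = DBSCAN(eps=0.8, min_samples=10).fit_predict(X)

# Points DBSCAN could not attach to any cluster are labelled -1: the anomalies.
anomalies = X[labels == -1]
print(f"{len(anomalies)} points flagged as anomalies")
```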

Use Case of Real-Time Anomaly Detection


1. Log Analysis

Log data is a time series, so whether a data point is an anomaly depends on time: a particular value might be an anomaly at one timestamp but not at another, and vice versa. A server generates a lot of information, such as memory usage, CPU usage, read-write operations, and power consumption. Let's consider power consumption to detect abnormal behavior.

Consider a hypothetical table of usage readings recorded over time.

The basic idea is to first look at the data for extreme values. A normal distribution can be assumed and the extreme values flagged as anomalies. So, based on the mean and standard deviation of the usage values, the points beyond the 90%, 95%, or 99% range (depending on the requirement) are flagged as anomalies. In this example, all usage values above 210 would be considered anomalies. But is this explanatory enough for time series data?
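
For example, a simple mean-and-standard-deviation cut over hypothetical usage readings could look like this (the 99% band of roughly ±2.58 standard deviations is one common choice for normally distributed data):

```python
import numpy as np
import pandas as pd

# Hypothetical per-minute power-consumption readings from a server.
usage = pd.Series(np.random.normal(loc=180, scale=15, size=1440))

mean, std = usage.mean(), usage.std()

# Values outside the chosen confidence band (90%, 95%, or 99%) are flagged.
# For a normal distribution, ~99% of values lie within 2.58 standard deviations.
upper, lower = mean + 2.58 * std, mean - 2.58 * std
anomalies = usage[(usage > upper) | (usage < lower)]
print(f"{len(anomalies)} readings outside the 99% band "
      f"(above ~{upper:.0f} or below ~{lower:.0f})")
```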

2. Moving Averages

Moving averages are used under the assumption that sudden changes do not occur over time; if one does occur, it is an anomaly. In simple terms, the usage value at a timestamp should be close to the average usage of the past few timestamps. How many timestamps to take into account depends on the data. This technique is called a simple moving average, in which the latest points are as important as the old ones. Another technique is the exponential moving average, in which the latest points are given more weight in the analysis than the old ones: S(t+1) = α·y(t) + (1 − α)·S(t), with 0 < α ≤ 1 and t > 0.
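
Both averages are one-liners in pandas; the sketch below computes them for a hypothetical usage series and flags a point that jumps far away from the previous smoothed value (the window, alpha, and threshold are illustrative):

```python
import pandas as pd

# Hypothetical usage series indexed by timestamp, with one sudden spike.
usage = pd.Series(
    [100, 102, 99, 101, 100, 103, 160, 101],
    index=pd.date_range("2024-01-01", periods=8, freq="h"),
)

# Simple moving average: every point in the window weighs the same.
sma = usage.rolling(window=4).mean()

# Exponential moving average: S(t+1) = alpha*y(t) + (1 - alpha)*S(t),
# so the most recent points dominate. alpha = 0.5 is illustrative.
ema = usage.ewm(alpha=0.5, adjust=False).mean()

# Flag points that deviate strongly from the previous smoothed value.
deviation = (usage - ema.shift(1)).abs()
print(usage[deviation > 40])   # the 160 spike stands out; the threshold is illustrative
```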

Neither technique is universally superior; which one to use depends on the problem and the data. The point to note is that the exponential moving average is more sensitive to the latest points than the simple moving average. Sometimes the data can mislead by including specific scenarios that seem anomalous but are not. For instance, on an off day, when there are no employees, usage is expected to be low. In such scenarios, two problems arise -

  • False alarms on the off day, because there is a mismatch between typical day values and off day values.

  • The off-day values are used in the analysis to predict the following day's values, giving scope for erred results.

To deal with such incidents, impute the off-day values with the values expected had the day not been off, and continue the analysis. This way, precision and recall can be kept in balance.

3. Importance of Real-Time Anomaly Detection

In certain use cases, anomaly detection has to work in real time: as soon as an anomaly is detected, measures can be taken to mitigate the loss. The techniques used in real-time anomaly detection have to evolve over time. Static methods based on existing training data, built from a sample taken at one point, may not serve the purpose. That's because data changes quickly and in immense volume, and the models have to keep learning from the data to make the right predictions.

The actions taken to resolve an anomaly can be delayed, but detecting the anomaly in real time cannot be missed, because the data containing the anomaly can carry information that leads to further loss (or gain) for the business. Building a real-time anomaly detection platform requires the following:

  • Collection of data

  • Aggregation of data

  • Visualization

  • Alerts/Notifications

  • Exportation

  • Analytics

A monitoring solution of such caliber must be able to correlate metrics across nodes, VMs, containers, and applications, perform capacity planning, perform proactive monitoring, and generate alerts, notifications, and reports. Any solution with all the above capabilities will surely see widespread adoption.


Summarizing Real-Time Anomaly Detection

The architecture can be based on five critical pillars: collection of data, aggregation of data, visualization, alerts/notifications, and exportation, complemented by analytics. Many of the challenges mentioned at the beginning of the article are remedied by this approach and should be addressed by the ideal next-generation monitoring tool. The challenges of visualization are handled by a well-thought-out UI that consumes the API exposed by the solution and provides a single integrated dashboard backed by the solution's metrics store.

It should also offer a dedicated set of APIs for monitoring hosts, virtual machines, containers, and applications. The five pillars also improve the usability of an ideal solution, which should come packaged with scripts that make it easy to set up on a single workstation and should provide a rich set of APIs for onboarding, catalogs, labels, availability, metrics, dashboards, and exporters.