Around ten years ago, the tools available for analytics were Excel, SQL databases, and similarly simple resources compared with the advanced ones available nowadays. Analytics also used to target things like reporting, customer classification, and sales trends, i.e., whether they are going up or down. In this article, we will discuss Real Time Anomaly Detection.
SQL databases can also be termed Relational Database Management Systems (RDBMS), a classical approach to storing and operating on historical data.
What is Big Data Analytics?
As time passed, the amount of data saw a revolutionary explosion in the past five years, driven by factors like social media data, transaction records, and sensor information. With the increase in data, how data is stored has also changed. SQL databases used to dominate, and analytics on them happened during idle time and was serialized. Later, NoSQL databases started to replace traditional SQL databases as data sizes became huge, and analysis shifted from serial analytics to parallel processing and distributed systems for quick results.
What is Cognitive Analytics?
In the contemporary world, analytics mainly targets predictions with impeccably high accuracy, trying to be as close to human understanding as possible: essentially, trying to give machines the cognitive intelligence that humans have. This analysis should be accurate, fast, and involve constant learning. The output is expected in real time and should also predict future events.
What are the Layers of Cognitive Intelligence?
Cognitive Computing helps accelerate human intelligence through learning, thinking, and adaptivity. By moving towards machine capacity, it not only augments human potential; it also increases the creativity of individuals and creates new waves of innovation. The key areas of capability are -
Sensory Perception- Machines are enabled to simulate human senses such as smell, touch, taste, and hearing. Therefore, they are developed in terms of machine simulation, such as visual and auditory perception.
Deduction, Reasoning, and Learning- Here, machines simulate human thinking for decision making. Technologies such as machine learning, deep learning, and neural networks are deployed as an intelligent system to extract meaningful, useful information and apply judgment.
Data Processing- Here, larger datasets are accessed to facilitate the decision-making process and provide practical suggestions. Hyperscale computing, knowledge representation, and natural language processing together provide the processing power required to enable the system to engage in real time.
What are the features of Cognitive Computing Solutions?
The purpose of cognitive computing is to create computing frameworks in which complex problems are solved easily without human intervention. Features are listed below -
Adaptive- This is one of the first steps in developing a machine learning-based cognitive system. The solution imitates the human ability to adapt by learning from the surroundings. It is dynamic in data gathering and in understanding goals and requirements.
Interactive- The cognitive solution should interact bidirectionally and dynamically with each element in the system, such as processes, devices, users, and cloud services. The system understands human input and provides results using natural language processing and deep learning models.
Iterative and Stateful- The system should be able to learn from previous iterations and return the information that is specifically crucial at that time. The system must follow data quality and visualization methodologies so that it provides enough information and the data sources supply reliable and up-to-date data.
Contextual- The system should be able to understand, identify, and extract contextual elements from the data, such as meaning, syntax, time, location, task, and goal. The system draws on multiple sources of information, including structured, semi-structured, and unstructured data as well as sensor inputs.
How does Cognitive Computing work?
Cognitive applications use deep learning and neural network algorithms to drive technological applications such as data mining, pattern recognition, and natural language processing. The system gathers a variety of information and processes it against what it already knows. After data analysis is complete, it integrates with adaptive page displays to visualize the content for specific audiences in specific situations.
The goal of cognitive computing is to simulate human thought processes in a computerized model.
How does Cognitive Intelligence relate to analytics?
Cognitive Computing takes analytics to the next level using new technologies. It is used when a vast corpus of textual data is available as free-text documents, and it analyzes these documents to support a wide range of activities. The cognitive system supports training a model on data containing a large set of training examples, for instance, training the model using an extensive collection of questions with their corresponding answers.
What is an Anomaly?
While analyzing data, one sometimes runs into unexpected occurrences, otherwise called ANOMALIES. An "Anomaly" means abnormal or unexpected behavior, or a deviation from a regular trend. What are the anomalies that are encountered in daily life?
A server going down
An unauthorized person entering a network
Data leakage, etc.
Why is Anomaly Detection needed?
Earlier, anomalies seldom occurred. If an anomaly is not detected and rightful actions are not taken, the consequences may soon prove costly in situations like network intrusion, changes in log patterns, data leaks, fraudulent transactions, insider trading, and many more. Just imagine the loss that could be incurred if any of the above occurs.
Real Time Anomaly Detection in Docker and Hadoop clusters
Anomaly detection in Docker runs at the same level as the Docker daemon and keeps track of events. When an event signals that a new container has started, anomaly detection begins simultaneously. The anomaly detection algorithm queries the daemon to find the process IDs running inside a container. The syscalls are then recorded using those process IDs (with root privileges) and sent to a Redis queue via a messaging service. The anomaly detection process analyzes this behavior and sends notifications to the administrators.
Hadoop is one of the most widely used big data platforms across industries and businesses. With the increase in usage, data availability has also increased, and so has the necessity for detecting anomalies in Hadoop clusters. Traditionally, rule-based models are used for alerting on the occurrence of anomalies, using domain knowledge and experience about the target system. The problem occurs when hidden patterns cannot be explored; in business problems where the scope of domain knowledge is limited, it is hard to know where the anomalies lie. For this purpose, pattern recognition techniques such as DBSCAN and PCA are used to find anomalies without any prior experience.
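As a rough sketch of this idea, the snippet below (using scikit-learn, on synthetic node metrics invented purely for illustration) standardizes hypothetical cluster metrics, reduces them with PCA, and lets DBSCAN flag points that fall in no dense region (label -1) as anomalies. The metric names, parameters, and data are assumptions, not taken from a real Hadoop deployment.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import DBSCAN

# Hypothetical per-sample node metrics: CPU %, memory %, I/O wait %
# (synthetic values, for illustration only).
rng = np.random.default_rng(0)
normal = rng.normal(loc=[50, 60, 5], scale=[5, 8, 1], size=(200, 3))
outliers = np.array([[95.0, 99.0, 40.0], [5.0, 10.0, 30.0]])
X = np.vstack([normal, outliers])

# Standardize, reduce with PCA, then cluster with DBSCAN; points that
# fall in no dense region get the label -1 and are treated as anomalies.
X_scaled = StandardScaler().fit_transform(X)
X_pca = PCA(n_components=2).fit_transform(X_scaled)
labels = DBSCAN(eps=0.7, min_samples=5).fit_predict(X_pca)

anomaly_idx = np.where(labels == -1)[0]
print(anomaly_idx)
```

No labelled history or domain rules are needed here; the density structure of the data alone decides which points are unusual.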
Anomaly Detection in IoT
The Internet of Things, in simple terms, is the connection of everyday devices to the internet. Once a device is connected, internet access to that device's data is obtained. The essential tools that require anomaly detection techniques are the sensor-equipped ones used in industries and business organizations. Data is received continuously, with every passing second, through these sensors. Notably, for the maintenance of these systems, the sensors have to be monitored to predict anomalies. These predictions have high economic value because, in IoT, multiple things are interlinked. While working with such immense data, the following questions arise:
Which features are responsible for the anomaly? What transformations should be made on these features to detect patterns? What patterns signify an anomaly? To explain with a simple example, let's consider a sensor that gives temperature values of special equipment in an industry. Changes in sensor values are used to know whether the equipment is stable or about to fail. Tracking is done on the statistical measures mean and standard deviation of the temperatures over some time.
If there are changes, that is, a shift in the mean or considerable fluctuations in the standard deviation, something is wrong with the equipment and immediate action is required. This notification is sent as an alert. With advances in technology, multiple machine learning and statistical techniques are used to identify and predict anomalies accurately. The significant advantage is that once the complete system is automated, one need not always keep track of the equipment to know whether everything is okay.
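A minimal sketch of this rule, in plain NumPy on made-up temperature readings: each new reading is compared against the mean and standard deviation of a rolling window of recent readings, and a large deviation raises an alert. The window size and z-threshold are illustrative assumptions.

```python
import numpy as np

def detect_shift(temps, window=20, z_thresh=3.0):
    """Flag timestamps where a reading deviates strongly from the
    rolling mean/std of the preceding window (a hypothetical rule)."""
    temps = np.asarray(temps, dtype=float)
    alerts = []
    for t in range(window, len(temps)):
        ref = temps[t - window:t]
        mu, sigma = ref.mean(), ref.std()
        if sigma > 0 and abs(temps[t] - mu) > z_thresh * sigma:
            alerts.append(t)
    return alerts

# Stable equipment around 70 degrees, then a sudden jump at t = 50
rng = np.random.default_rng(1)
readings = np.concatenate([70 + rng.normal(0, 0.5, 50),
                           [80.0],
                           70 + rng.normal(0, 0.5, 10)])
alerts = detect_shift(readings)
print(alerts)
```

In a deployed system, the alert would trigger a notification instead of a print, and the threshold would be tuned to the equipment's tolerance.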
Anomaly detection on Time-Series data with Deep Learning
Deep Learning can basically be thought of as an extension of Artificial Neural Networks (ANNs). An artificial neural network consists of input nodes whose values are passed through a series of hidden layers, and an activation function determines the signal strength sent to connected nodes. The output is the result of a classification or regression problem. The model uses the result and the error to learn, updating itself by changing the parameters involved. Deep Learning is the more complex case, involving mathematical concepts like matrix algebra and probability, as well as intense hardware resources.
Model training with neural networks and deep learning happens using an objective function optimized by a process such as stochastic gradient descent: predict the value, calculate the error, and update the model to reduce the error. Neural networks and deep learning can be implemented using Keras and TensorFlow, which are open source and have Python libraries. Recurrent neural networks store information in hidden layers, which are updated as new data is fed to the network; this makes them suitable for time series. The applications of RNNs include handwriting recognition, speech recognition, log analysis, anomaly detection, etc. So RNNs can be used to find anomalies in any of the mentioned use cases.
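To illustrate how an RNN carries state forward, here is a minimal vanilla RNN cell in plain NumPy. The weights here are random and untrained, so only the mechanics are shown: a real detector would train such a model (e.g. with Keras/TensorFlow) and flag timestamps whose one-step prediction error is large.

```python
import numpy as np

# A minimal vanilla RNN cell: the hidden state h carries information
# forward as each new point in the series is fed in. Weights are random
# here; a trained model would learn them by gradient descent.
rng = np.random.default_rng(42)
input_dim, hidden_dim = 1, 8
W_x = rng.normal(0, 0.3, (hidden_dim, input_dim))
W_h = rng.normal(0, 0.3, (hidden_dim, hidden_dim))
W_out = rng.normal(0, 0.3, (1, hidden_dim))

def rnn_predict(series):
    """One-step-ahead outputs; large prediction error would flag an anomaly."""
    h = np.zeros((hidden_dim, 1))
    preds = []
    for x in series:
        h = np.tanh(W_x @ np.array([[x]]) + W_h @ h)  # hidden state update
        preds.append((W_out @ h).item())
    return np.array(preds)

series = np.sin(np.linspace(0, 6, 60))
preds = rnn_predict(series)
errors = np.abs(preds[:-1] - series[1:])  # error of predicting the next value
print(errors.shape)
```

With trained weights, `errors` would stay small on normal data, and a spike in it would mark an anomalous timestamp.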
Transitioning from anomalies to correlation
Well, if everything goes fine and, let us assume, a reasonably good anomaly detection system is developed, should this be it? NO. It should also be able to determine which factors are responsible for the anomaly. The challenge is: what if the data is enormous? Across so many fields/variables, which fields are responsible for an anomaly? Finding this is essential because, if the root cause is found, immediate action can be taken to reduce the loss. This analysis to find the reasons for anomalies is called correlation analysis.
Downloading a huge file is the reason for high memory usage in a short time in log data.
Running a machine learning model with a massive amount of data results in an anomaly in CPU usage.
A suspicious person trying to enter the network shows up as sudden changes in log patterns.
Transitioning from monitoring to optimization
Once the correlation is done, the next step is to predict potential anomalies that may occur in the future. So, how does one predict future anomalies? One method is to use Bayesian classification and Markov models. Given a component with various features, if any feature fails its function and ends up failing the component, that failure is an anomaly.
Based on the trends of the features, the Markov model helps in identifying future values up to some k timestamps, and the Bayesian method is used to predict the probability of anomaly symptoms. Suppose feature "A" is expected to fail after timestamp "T," which results in the failure of component "C." If feature A is repaired within timestamp T, component "C" keeps running as usual. If the reason cannot be predicted up front and the component fails, the loss is -
Time to identify the reason for the failure.
Time to get the feature (sub-component) repaired.
Time to get the component running again.
Instead, one should be careful enough to predict the failure of any feature and take several actions before it occurs. Hence, instead of monitoring every component in the system and resolving issues, optimization is necessary, followed by automation of this process: improving the steps from finding anomalies through to preventing them.
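The Markov part of the prediction above can be sketched with a toy two-state chain for feature "A" (state 0 = healthy, state 1 = failed). The transition probabilities are made up for illustration; a real system would estimate them from historical data.

```python
import numpy as np

# Hypothetical one-step transition matrix: P[i, j] is the probability of
# moving from state i to state j per timestamp. Failure is absorbing.
P = np.array([[0.95, 0.05],
              [0.00, 1.00]])

def failure_probability(k, start=np.array([1.0, 0.0])):
    """Probability that feature A has failed within k timestamps,
    starting from the healthy state."""
    return (start @ np.linalg.matrix_power(P, k))[1]

for k in (1, 10, 50):
    print(k, round(failure_probability(k), 3))
```

Once this probability crosses an acceptable risk threshold at some timestamp T, the feature can be scheduled for repair before the component fails.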
Supervised Anomaly Detection
In this approach, historical data is used, i.e., data points together with a class defining whether each point is abnormal or not. It is similar to a classification problem. So, the class variable is taken as the anomaly column, and models such as "Random Forest," "XGB," or "SVM," or regression algorithms, are applied to train on the data. This model is then used on new data points to know whether they are anomalies or not. One should be careful about the ratio of anomaly to non-anomaly points in the dataset; it shouldn't be too skewed, for example, more extreme than 1:10, since it becomes a class imbalance problem.
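A minimal sketch of this supervised setup, using scikit-learn's Random Forest on synthetic labelled data (the metrics, values, and class ratio are invented for illustration):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical labelled history: two metrics per point plus an
# "is_anomaly" label. Normal points cluster around (50, 50);
# anomalies sit far away (synthetic data, for illustration only).
rng = np.random.default_rng(7)
normal = rng.normal(50, 5, size=(300, 2))
anomalous = rng.normal(90, 3, size=(30, 2))
X = np.vstack([normal, anomalous])
y = np.array([0] * 300 + [1] * 30)

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Score previously unseen points: 0 = normal, 1 = anomaly
print(clf.predict([[51, 49], [92, 88]]))
```

Note that such a model can only recognize anomalies resembling those present in the training labels, which is exactly the limitation that motivates the unsupervised methods below.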
Unsupervised Anomaly detection
Clustering techniques are applied here, as it is not known beforehand whether a data point is an anomaly or not. So, clustering algorithms are used to detect anomalies.
K-Means
The K-Means clustering algorithm must be given the number of clusters to form as an initial input; based on that value, the algorithm outputs that many clusters. In this process, it considers all the data points and forms clusters. Then, restrict the distance of the boundary from the centroid of each cluster; the restriction can be the 95th or 99th percentile, based on the requirement. The points outside this range after the clustering process is done are the anomalies.
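A short sketch of this procedure with scikit-learn's KMeans on synthetic data: after clustering, each point's distance to its own centroid is computed, and points beyond a percentile threshold are flagged. The cluster count, seed, and the 99th-percentile cutoff are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data: two dense groups plus a couple of distant points.
rng = np.random.default_rng(3)
X = np.vstack([
    rng.normal([0, 0], 0.5, (100, 2)),
    rng.normal([5, 5], 0.5, (100, 2)),
    [[10.0, -5.0], [-6.0, 9.0]],   # injected anomalies
])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Distance of every point to its own cluster centroid
dists = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)

# Points beyond the 99th percentile of these distances are flagged
threshold = np.percentile(dists, 99)
anomalies = np.where(dists > threshold)[0]
print(anomalies)
```

A stricter or looser percentile shifts the trade-off between catching more anomalies and raising more false alarms.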
DBSCAN
DBSCAN is also a clustering algorithm, but it differs from K-Means. In DBSCAN, a minimum number of points and a maximum distance (Ɛ) are chosen as parameters. The algorithm starts with a random point, selects the points in its Ɛ-neighborhood, and links them; each of those points continues the same process. In this way the clusters grow, and all the possible clusters are formed. The leftover points are considered anomalies. In the case of a supervised algorithm, the model has to be trained in advance and can detect only the kinds of anomalies it has learned before; it is therefore not feasible for detecting all types of anomalies. Unsupervised algorithms can be used instead, since they can identify anomalies in any data, i.e., anomalies that have never been seen before.
Log data is time-series data, so whether a data point is an anomaly depends on time. A particular value might be an anomaly at one timestamp but not at another, and vice-versa. A lot of information is generated by the server, like memory usage, CPU usage, read-write operations, power consumption, etc. Let's consider power consumption usage to detect whether there is any abnormal behavior.
Hypothetical data in a table is described below.
The basic idea is to first look at the data for extreme values. A normal distribution may be assumed: look for extreme values and conclude that they are anomalies. So, based on the mean and standard deviation of the usage values, get the points that are beyond the 90%, 95%, or 99% range, based on our requirement, and conclude that they are anomalies. In that case, all usage values above 210 are considered anomalies. Let us look at the plot for the same data. But is this explanatory enough for time-series data?
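This static threshold can be sketched in plain NumPy on made-up usage values: assuming roughly normal behaviour, values outside the central 99% band (about mean ± 2.58σ) are flagged. The data and cutoff are illustrative assumptions.

```python
import numpy as np

# Hypothetical usage readings with two injected extremes.
rng = np.random.default_rng(5)
usage = np.concatenate([rng.normal(150, 20, 500), [260.0, 300.0]])

mu, sigma = usage.mean(), usage.std()

# Flag values outside roughly the central 99% of a normal distribution
# (beyond about mean +/- 2.58 standard deviations).
z = 2.58
anomalies = usage[np.abs(usage - mu) > z * sigma]
print(anomalies)
```

As the article notes, this ignores the time dimension entirely, which motivates the moving-average techniques that follow.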
Moving averages are used assuming that sudden changes do not occur over time; if there is one, it is an anomaly. To explain the moving average in simple terms: the usage value at a timestamp should be close to the average usage of the past few timestamps. The number of timestamps to take into consideration depends on the data. This technique is called the simple moving average; in it, the latest points are as important as the old ones. Another technique is the exponential moving average, in which the latest points are given more importance in the analysis than the old ones: S_{t+1} = α·y_t + (1−α)·S_t, with 0 < α ≤ 1 and t > 0.
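Both averages can be sketched in a few lines of NumPy; the series and α below are illustrative. The EMA loop is the recurrence above written with the index shifted by one (S_t = α·y_t + (1−α)·S_{t−1}).

```python
import numpy as np

def simple_moving_average(y, window):
    """Each of the last `window` values is weighted equally."""
    return np.convolve(y, np.ones(window) / window, mode="valid")

def exponential_moving_average(y, alpha):
    """S_t = alpha * y_t + (1 - alpha) * S_{t-1}: recent points weigh more."""
    s = np.asarray(y, dtype=float).copy()
    for t in range(1, len(s)):
        s[t] = alpha * s[t] + (1 - alpha) * s[t - 1]
    return s

y = [10, 10, 10, 10, 30, 10, 10]   # one sudden spike
print(simple_moving_average(y, 3))
print(exponential_moving_average(y, 0.5))
```

A point whose deviation from either average exceeds some tolerance would be flagged as an anomaly; note how the EMA reacts to the spike immediately and then decays.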
Neither technique is universally superior; which one to use depends on the problem and the data. The point to consider is that the exponential moving average is sensitive to the latest points, while the simple moving average is not. Sometimes one gets misled by the data by ignoring specific scenarios which seem anomalous but are not. For instance, on an off day, when there are no employees, the values are expected to be low. In such scenarios, two problems arise -
False alarms on the off day, because there is a mismatch between typical-day values and off-day values.
The off-day values are used in the analysis to predict the following day's values, leading to erroneous results.
To deal with such incidents, impute the values of the off day with the expected values, assuming the day was not off, and continue the analysis. This way, precision and recall can be adjusted.
Importance of Real-Time Anomaly Detection
In specific use cases, anomaly detection has to work in real time. As soon as an anomaly is detected, several measures can be taken to mitigate the loss. The techniques used in real-time anomaly detection have to evolve with time. Static methods based on an existing training dataset, formed from a sample taken at one point, may not serve the purpose. That's because data changes fast, with immense volume, and accordingly the models have to keep learning from the data to make rightful predictions.
The actions carried out to resolve an anomaly can be delayed, but the detection of an anomaly in real time cannot be missed. This is because the data containing the anomaly can hold information that further leads to loss or gain for the business. For building a real-time anomaly detection platform, the following are the requirements -
Collection of data
Aggregation of data
Capability to correlate metrics across nodes, VMs, containers, and applications
Capacity planning and proactive monitoring
Ability to generate alerts, notifications, and reports
These are must-haves in a monitoring solution of such caliber, and any solution with all the above capabilities will surely see widespread adoption.
The architecture can be based on five critical pillars: Collection of data, Aggregation of data, Visualization, Alerts/Notifications, and Exportation and Analytics. Many of the challenges mentioned at the beginning of the article are remedied by this approach and should be addressed by the ideal next-generation monitoring tool. The challenges faced in visualization are taken care of by a well-thought-out UI which consumes the API exposed by the solution and provides a single integrated dashboard, backed by a dedicated metrics store for the solution.
It should also offer a unique set of APIs for monitoring hosts, virtual machines, containers, and applications. The five pillars also improve usability; an ideal solution should come packaged with scripts that make it easy to set up on a single workstation, and should have a rich set of APIs for on-boarding, catalog, labels, availability, metrics, dashboards, and exporters.