Real-time Machine Learning | The Complete Guide

Dr. Jagreet Kaur Gill | 16 May 2023

Real-time Machine Learning


There is a growing need for compact artificial intelligence (AI) national security solutions that can be deployed quickly while still being highly accurate. However, machine learning systems are typically pre-trained and cannot adapt to new datasets in real time, limiting their ability to learn and respond to changing scenarios.

To address this challenge, the Real Time Machine Learning (RTML) program is developing hardware generators and compilers that can automatically create machine learning (ML) Application-Specific Integrated Circuits (ASICs) from high-level source code without human intervention. The aim is a compiler that can convert code written for a high-level machine learning framework into Verilog, streamlining the development of customized hardware that can perform complex ML tasks on a chip.

Real-time ML refers to machine learning systems that can adapt and make predictions in real time or near real time. For example, in the e-commerce industry, a real-time ML system can quickly respond to changing customer preferences and provide updated recommendations to improve the customer experience. This is achieved by continuously learning from new data as it arrives rather than relying solely on pre-existing datasets.

The RTML program is working towards developing real-time ML systems that can quickly adapt to changing scenarios and provide highly accurate predictions with limited computational resources. These systems can potentially revolutionize industries such as e-commerce, entertainment, and advertising, where customer satisfaction is crucial.

What is Real-time Machine Learning?

Real-time machine learning involves continuously training a machine learning model with live data instead of relying only on historical data in offline mode. This approach is practical when little initial data is available for training or when the model needs to adapt to new patterns in the data. For instance, a product recommendation engine based on machine learning can quickly adjust to changes in consumer tastes without a separate retraining cycle. The benefits of real-time machine learning include faster decision-making, improved customer experiences, and better predictions. Additionally, this approach can be helpful in situations where retraining the model is expensive or when there is a need to respond quickly to changes in the data. Real-time machine learning involves:

  • Deploying machine learning models into production environments.
  • Receiving live data.
  • Continuously updating the model's predictions.
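
The loop of receiving live data and updating the model can be sketched in a few lines. This is a minimal illustration, assuming a logistic regression trained with one stochastic gradient step per event; the class `OnlineLogisticRegression` and the generator `live_events` are hypothetical names, and the generator is only a stand-in for a real event stream.

```python
import math
import random


class OnlineLogisticRegression:
    """Logistic regression updated one event at a time with SGD."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        # Clamp z so math.exp cannot overflow on extreme inputs.
        z = max(-35.0, min(35.0, z))
        return 1.0 / (1.0 + math.exp(-z))

    def learn_one(self, x, y):
        """Take a single gradient step on one labelled event (y is 0 or 1)."""
        error = self.predict_proba(x) - y
        for i, xi in enumerate(x):
            self.weights[i] -= self.learning_rate * error * xi
        self.bias -= self.learning_rate * error


def live_events(n=2000, seed=0):
    """Stand-in for a live stream: the label is 1 when the first feature is larger."""
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.random(), rng.random()]
        yield x, 1 if x[0] > x[1] else 0


model = OnlineLogisticRegression(n_features=2)
for features, label in live_events():
    model.learn_one(features, label)  # the model changes after every event

print(model.predict_proba([0.9, 0.1]))
print(model.predict_proba([0.1, 0.9]))
```

Because each event is used for one update and then discarded, the model never needs the full history in memory, which is the property that makes this style of training fit a live stream.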

Real-time machine learning systems can be complex and require careful design and implementation to ensure they are reliable, scalable, and secure. The deployment of real-time machine learning models requires monitoring to ensure their performance and to identify any issues that may arise.

In short, real-time machine learning suits scenarios where data changes continuously: it lets models adapt quickly to new patterns and helps organizations make faster decisions and improve customer experiences.

Why is there a need for Real-time Machine Learning?

Real-time machine learning has become increasingly important due to the need for quick and accurate decision-making in various industries. For example, real-time ML can help detect fraudulent transactions and prevent monetary losses in the financial sector. In healthcare, real-time ML can be used for disease diagnosis and monitoring, allowing for prompt interventions and improving patient outcomes. Real-time ML can also help the e-commerce industry by providing personalized recommendations based on user behavior in real time, improving the customer experience and increasing sales. Furthermore, real-time ML can adapt to rare events and learn from each new input, making it essential in situations where user preferences change rapidly. Real-time ML is crucial for businesses that want to stay competitive in today's fast-paced environment and provide the best user experience.

Machine Learning Model Preparation

Real-time machine learning enables organizations to make faster and more accurate decisions in today's fast-paced and competitive business environment. Real-time machine learning models are commonly deployed in an event-driven architecture where a data stream is continuously fed into the model. The processing pipeline takes care of any data transformations and enrichment necessary to prepare the data for input into the model. The data pipeline updates the model in real time by combining the current live data with the reference data set on which the model is built. This allows the model to continuously learn and adapt to new patterns in the incoming data.

The feature store, which holds the reference data used to train the model, is an essential component of the real-time architecture. The feature store must be continually updated based on new data points in the data stream. In deployments with high input data rates, the feature store must be fast and have extremely low latency, so it should be powered by in-memory technologies. The feature store is critical in ensuring the model is accurately trained and continuously learning from the incoming data.
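
As a rough illustration of the idea, a feature store can be thought of as a keyed table that the stream keeps overwriting and the model reads at prediction time. The sketch below assumes a plain in-process dictionary; `InMemoryFeatureStore`, its methods, and the feature names are hypothetical, and a production system would use a dedicated in-memory database rather than a Python dict.

```python
import time


class InMemoryFeatureStore:
    """Keeps the latest feature values per entity in a plain dict."""

    def __init__(self):
        self._rows = {}  # entity_id -> {feature_name: (value, updated_at)}

    def update(self, entity_id, feature_name, value, now=None):
        """Called by the stream pipeline whenever a new data point arrives."""
        now = time.time() if now is None else now
        self._rows.setdefault(entity_id, {})[feature_name] = (value, now)

    def get_vector(self, entity_id, feature_names, max_age_s, now=None, default=0.0):
        """Return features in a fixed order; stale or missing values use the default."""
        now = time.time() if now is None else now
        row = self._rows.get(entity_id, {})
        vector = []
        for name in feature_names:
            value, updated_at = row.get(name, (default, now))
            vector.append(value if now - updated_at <= max_age_s else default)
        return vector


store = InMemoryFeatureStore()
store.update("user_42", "clicks_last_hour", 7, now=1000.0)
store.update("user_42", "avg_basket_value", 31.5, now=1000.0)
store.update("user_42", "clicks_last_hour", 8, now=1060.0)  # newer event overwrites

names = ["clicks_last_hour", "avg_basket_value"]
fresh = store.get_vector("user_42", names, max_age_s=300, now=1100.0)
stale = store.get_vector("user_42", names, max_age_s=300, now=1350.0)
print(fresh)
print(stale)
```

The freshness check is one possible design choice: it makes the cost of a slow pipeline visible at read time instead of silently serving outdated values.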

Difference between Real-time and Analytical Machine Learning

Real-time machine learning and analytical machine learning are two different approaches to using machine learning in an organization. Real-time machine learning is used for mission-critical applications requiring fast real-time decision-making. Examples of these applications include credit card fraud detection, recommendation systems, and loan application approvals. On the other hand, analytical machine learning is used for human-in-the-loop decision-making and operates at a human timescale. Examples of these applications include churn predictions, customer segmentation, and sales forecasting tools. The two approaches are implemented differently and serve different functions within an organization.

What is Real-time Data Pipeline?

Real-time machine learning applications require real-time data pipelines, which ingest raw data from event sources and transform it before delivering it to machine learning models. These pipelines must be designed to handle errors so that the models receive all data. Real-time data pipelines allow for real-time feature engineering, which adapts to changes in user behavior by making features available through online feature stores. These stores allow complex calculations to be run in real time and facilitate feature sharing for machine learning models at scale.

Online training, which involves adjusting model parameters as new data arrives, can be challenging to implement. As a result, frequent offline training on fresh data is often used instead. For example, Weibo's streaming machine learning downloads real-time logs to generate training data, performs offline training, and uploads the updated model parameters to the online parameter service, all within minutes to hours.

Real-time feature engineering and online feature stores are therefore essential for keeping machine learning models up to date in real-time machine learning applications. By leveraging real-time data pipelines and event-driven architectures, data can be processed and delivered quickly and accurately to the models, enabling them to make informed decisions in real time.
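
The ingest, transform, and deliver flow, together with the error handling mentioned above, might look roughly like the following. This is a sketch under assumptions: events are JSON strings, failed events are parked in a dead-letter list for later inspection, and the names `transform`, `run_pipeline`, and the field names are all hypothetical.

```python
import json


def transform(raw_line):
    """Parse one raw event and derive the fields the model expects."""
    event = json.loads(raw_line)
    amount = float(event["amount"])
    return {
        "user_id": str(event["user_id"]),
        "amount": amount,
        "is_large": amount > 100.0,
    }


def run_pipeline(raw_events, deliver, dead_letters):
    """Transform each event; park failures instead of dropping them silently."""
    delivered = 0
    for raw_line in raw_events:
        try:
            record = transform(raw_line)
        except (ValueError, KeyError, TypeError) as exc:
            # json.JSONDecodeError is a ValueError, so malformed payloads land here too.
            dead_letters.append({"raw": raw_line, "error": repr(exc)})
            continue
        deliver(record)
        delivered += 1
    return delivered


raw_events = [
    '{"user_id": 1, "amount": "250.0"}',
    '{"user_id": 2}',                      # missing field
    'not json at all',                     # malformed payload
    '{"user_id": 3, "amount": 12}',
]
model_inputs, dead_letters = [], []
count = run_pipeline(raw_events, model_inputs.append, dead_letters)
print(count, len(dead_letters))
```

Keeping the failed events rather than discarding them means they can be replayed once the cause is fixed, which is one way to work towards the goal that models eventually receive all data.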

What is Online/Real-time Prediction?

Although widespread adoption of continual learning may be a few years away, companies are making significant investments to shift towards real-time inference. The first step is a basic online prediction system that uses batch features and adapts to users within a session. The next step is a more advanced online prediction system that combines complex streaming features with batch features. This shows a growing interest among businesses in moving towards online prediction systems.

Batch Prediction

Batch prediction is typically used in scenarios where predictions can be pre-computed at regular intervals and don't require immediate response times. This approach is common in collaborative filtering and content-based recommendations, where predictions can be generated in bulk every few hours or once a day. Companies like DoorDash, Reddit, and Netflix have extensively used batch prediction. Batch prediction is not ideal for scenarios where predictions need to be made in real-time or near real-time, such as fraud detection, anomaly detection, or personalized recommendations for new or unregistered users. In these cases, online or real-time prediction is required, which involves processing incoming data streams and generating predictions in real-time.

Batch prediction is still widely used in industries that rely on legacy batch systems such as Hadoop, which are well suited to the periodic processing of large volumes of data. However, as the demand for real-time and near real-time predictions grows, businesses are increasingly shifting towards online prediction systems that can handle data streams and generate predictions in real time.
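
The contrast between the two modes can be made concrete with a small sketch. It assumes a periodic job that precomputes one prediction per known user and a serving function that falls back to scoring on request for users the batch job has never seen; `score`, `run_batch_job`, and `serve` are hypothetical names, and the scoring formula is a placeholder rather than a real model.

```python
def score(user_features):
    """Hypothetical model: any scoring function could stand in here."""
    return round(0.1 * user_features["orders"] + 0.5 * user_features["is_member"], 3)


def run_batch_job(all_users):
    """Periodic job: precompute a prediction for every known user."""
    return {user_id: score(features) for user_id, features in all_users.items()}


def serve(user_id, prediction_table, request_features=None):
    """Serve from the precomputed table; score online for users it does not contain."""
    if user_id in prediction_table:
        return prediction_table[user_id], "batch"
    if request_features is not None:
        return score(request_features), "online"
    return None, "miss"


known_users = {
    "u1": {"orders": 3, "is_member": 1},
    "u2": {"orders": 0, "is_member": 0},
}
table = run_batch_job(known_users)

print(serve("u1", table))
print(serve("new_visitor", table, {"orders": 1, "is_member": 0}))
print(serve("new_visitor", table))
```

The last call shows the gap that pure batch prediction leaves: a new or unregistered user has no row in the table, so without an online path there is nothing to return.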

Real-time Prediction with Batch Features

Real-time inference is well suited to scenarios where predictions must be generated quickly in response to user requests. In e-commerce, for example, it can provide personalized product recommendations as users browse a website or app. It is also helpful in fraud detection, where companies can detect and respond to suspicious activity by analyzing user behavior as it happens. In general, any application that requires real-time decision-making can benefit from real-time inference.

For session-based recommendations, items are typically represented as embeddings, and a user's recent activity in the session is used to fetch a small set of relevant items before a ranking model scores them. This first step is called candidate generation or retrieval. The embeddings can be learned separately or together with the ranking model, and the approach can be applied to tasks like ads CTR prediction and search. The goal of session-based predictions is to increase conversion and retention rates. Many companies, including Netflix, YouTube, Roblox, and Coveo, are already using or planning to use online inference. This trend is expected to grow, with most recommender systems becoming session-based within the next two years.

Huyen's Stage 3 in the evolutionary scale of machine learning involves online prediction with complex streaming and batch features. Companies at this stage require many streaming features to make accurate predictions. These features are extracted from streaming data and are also known as dynamic or online features. Companies like Stripe, Uber, and Faire use this technology for fraud detection, credit scoring, driving and delivery estimates, and recommendations. Moving machine learning workflows to this stage requires a mature streaming infrastructure, a feature store, a model store, and a better development environment. Data scientists need direct access to data streams to validate new streaming features, and notebook integrations for technologies like Flink and Kafka make this possible today. Huyen's path is based on her experience with some of the most technologically advanced organizations.

What is the Efficiency of Stream Processing and Batch Processing?

The efficiency of stream and batch processing can be compared using three criteria: cost efficiency, performance efficiency, and talent efficiency. This means looking at how cost-effectively each approach achieves its goals, how computationally efficient it is, and how well it uses the skills and expertise of the team.

Cost Efficiency

There is no universal formula for calculating cost efficiency between stream and batch processing, as it depends on factors like latency requirements, data size, and failure tolerance. Stream processing is helpful for continuous, unbounded data processing and is often less expensive than stateless batch processing when used correctly. For example, streaming can avoid redundant processing when computing a sliding window over time, making it less computationally expensive than batch processing. However, batch processing is still the better choice for the one-time processing of large datasets at rest. The most effective teams are skilled in both stream and batch processing.
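
The sliding-window point can be illustrated by counting arithmetic operations. The sketch below compares a stateless recomputation of every window with a stateful incremental update; both functions and the operation counters are hypothetical helpers written for this comparison, not part of any streaming framework.

```python
from collections import deque


def sliding_sums_recompute(values, window):
    """Stateless style: re-add every element of the window at each step."""
    sums, additions = [], 0
    for end in range(window, len(values) + 1):
        chunk = values[end - window:end]
        sums.append(sum(chunk))
        additions += len(chunk)
    return sums, additions


def sliding_sums_incremental(values, window):
    """Stateful style: add the newest value and subtract the one that left."""
    sums, operations = [], 0
    buffer, running = deque(), 0
    for value in values:
        buffer.append(value)
        running += value
        operations += 1
        if len(buffer) > window:
            running -= buffer.popleft()
            operations += 1
        if len(buffer) == window:
            sums.append(running)
    return sums, operations


values = list(range(1, 11))  # 1..10
batch_sums, batch_ops = sliding_sums_recompute(values, window=4)
stream_sums, stream_ops = sliding_sums_incremental(values, window=4)
print(batch_sums == stream_sums, batch_ops, stream_ops)
```

Both versions produce the same sums, but the recomputing version does work proportional to the window size at every step, while the incremental version does at most two operations per event regardless of window size.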

Performance Efficiency

Stream processing, particularly with frameworks like Apache Flink, has become a sophisticated technology that is highly scalable and fully distributed. It is optimized for speed and unbounded data, and its performance is measured by throughput per operator. However, it is less well suited to processing very large, bounded datasets, where batch processing remains the better fit for one-time jobs. Although stream processing and batch processing each have their strengths and weaknesses, a cohesive and unified architecture that leverages both is currently an active development area. Users of data processing systems need not worry too much about the current divide between streaming and batch processing, because the gap will eventually close as the abstraction moves up.

Talent Efficiency

It is commonly agreed that managing stream processing in-house requires a highly skilled team of infrastructure engineers capable of both technical depth and operational work. However, as data processing becomes increasingly commoditized, many companies will consider purchasing solutions rather than building them in-house. The analogy used is that just as you don't need nuclear scientists to turn electricity into something of higher value, companies don't necessarily need to handle stream processing internally.

Real-time Machine Learning trends in the real world

Real-time machine learning is already used in production to predict delivery times, and the step of converting raw data into machine learning features is common to all real-time machine learning use cases. This section looks at the importance of modern data architecture in enabling real-time machine learning and at how companies like Tecton are building central feature platforms to help businesses build and deploy machine learning models more efficiently.

In this example from Uber Eats, real-time machine learning is used to predict food delivery wait times and show those predictions live in the app. Behind the simple user interface, a complex system draws data from multiple sources, such as driver availability, restaurant kitchen capacity, and customer location, to make accurate delivery time predictions.

To accomplish this, Uber Eats uses a feature platform called Michelangelo that converts raw data into machine learning features, which are then used to make real-time predictions. For example, the feature 'num_orders_last_30_min' could be used to predict delivery time.

This process of converting raw data into features and then using those features to make predictions is common across all real-time machine learning use cases, regardless of the problem being solved. This commonality allows Tecton to build a central feature platform for all real-time machine learning use cases.

By centralizing the feature engineering process, Tecton's platform can help companies build and deploy machine learning models more quickly and efficiently. This can lead to more accurate predictions and better business outcomes across various use cases, from fraud detection to recommendation engines.

Uber's success in using real-time machine learning was due to its modern data architecture, which collects data from various sources and converts it into machine learning features for predictions. Other industries have also modernized their data architectures to keep up with technological advancements.

Handling Dynamic Real-time Features

Real-time features are computed on the fly in an event-stream processing pipeline. This approach differs from the batch approach because it requires a list of aggregated values, one for each time window in a period, rather than a single overall aggregation. This is useful for use cases like predicting engine failures, recommending news articles, and estimating delivery times. A Dataflow streaming pipeline can capture and aggregate real-time events and store them in a low-latency read/write database. Cloud Bigtable is a suitable option for such a database. These dynamically created features can then be used as input to the model to produce predictions. Figure 5 shows a high-level architecture of a stream processing pipeline.

  1. The system retrieves real-time events from Pub/Sub with a Dataflow streaming pipeline, which applies time-window aggregations.
  2. The aggregations of real-time data are stored in Bigtable.
  3. The input features for the AI Platform models are taken from the values stored in Bigtable in real time.
  4. The predictions generated by the AI Platform models are saved in Datastore. Other systems can use these predictions, or the predictions can be published to Pub/Sub topics for further processing so that real-time notifications, ads, and similar content are delivered to the right user.
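
The four steps above can be imitated in plain Python to show the data flow without any cloud services. Everything here is a stand-in chosen for illustration: a list replaces Pub/Sub, dictionaries replace Bigtable and Datastore, `predict_wait_minutes` is a made-up formula rather than a trained model, and the 60-second window is an arbitrary assumption.

```python
from collections import defaultdict

WINDOW_SECONDS = 60


def aggregate_by_window(events):
    """Step 1: group events into fixed time windows and count them per key."""
    counts = defaultdict(int)
    for event in events:
        window_start = (event["ts"] // WINDOW_SECONDS) * WINDOW_SECONDS
        counts[(event["restaurant_id"], window_start)] += 1
    return counts


def recent_counts(feature_table, restaurant_id, now, n_windows):
    """Step 3: read the last n completed windows as a list of feature values."""
    current = (now // WINDOW_SECONDS) * WINDOW_SECONDS
    starts = [current - WINDOW_SECONDS * i for i in range(n_windows, 0, -1)]
    return [feature_table.get((restaurant_id, start), 0) for start in starts]


def predict_wait_minutes(features):
    """Hypothetical model: a base time plus a penalty per recent order."""
    return 10 + 2 * sum(features)


events = [
    {"restaurant_id": "r1", "ts": 5},
    {"restaurant_id": "r1", "ts": 50},
    {"restaurant_id": "r1", "ts": 70},
    {"restaurant_id": "r2", "ts": 65},
    {"restaurant_id": "r1", "ts": 130},
]

feature_table = dict(aggregate_by_window(events))           # step 2: store stand-in
features = recent_counts(feature_table, "r1", now=185, n_windows=3)
prediction_store = {"r1": predict_wait_minutes(features)}   # step 4: saved for other systems
print(features, prediction_store)
```

Note that the feature is a list with one count per window rather than a single total, which matches the distinction drawn above between windowed and overall aggregation.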

What are the technologies used to make Real-time Machine Learning?

The scenario involves building an app that utilizes machine learning to learn from user data in real time. The data is first streamed in real-time using Apache Kafka software to a model built and trained using the Spark library. The resulting predictions are then saved to a database for analysis, visualizations, and updating of the user interface. While these specific technologies are not mandatory, they are commonly used for this use case. Kafka is a popular choice for data streaming due to its powerful features.

Applying an ML model to the feature vector to obtain a prediction is the central step when responding to real-time requests with machine learning. However, using Golang for the serving runtime can complicate things. Several approaches can be taken:

  • Inference endpoint
  • Custom application code
  • Cross-runtime algorithms
  • Portable model formats

When the training environment is different from the runtime environment, like PySpark for training and Golang for runtime, it can be challenging to use models trained with standard data science stacks such as sklearn, XGBoost, SparkML, and TensorFlow. Several approaches can be used to deal with these limitations, but pure Go implementations still need to be improved.
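
To make the portable-format idea concrete, the sketch below exports a linear model as JSON and rebuilds a scorer from that payload alone. It is an illustration under assumptions, not a recommendation: the JSON layout is hand-rolled for this example (standard formats such as ONNX or PMML serve the same purpose for richer models), the weights are invented, and the serving half is written in Python only to keep the example in one language, where in the scenario above it would be reimplemented in Go.

```python
import json
import math

# Training side: export only what inference needs, in a language-neutral format.
trained_model = {
    "format_version": 1,
    "feature_names": ["amount", "num_orders_last_30_min"],
    "weights": [0.8, -0.3],   # invented values for illustration
    "bias": -0.5,
}
payload = json.dumps(trained_model)


def load_scorer(serialized):
    """Serving side: any runtime that can parse JSON can rebuild this scorer."""
    spec = json.loads(serialized)
    names, weights, bias = spec["feature_names"], spec["weights"], spec["bias"]

    def score(features):
        z = bias + sum(w * features[name] for name, w in zip(names, weights))
        return 1.0 / (1.0 + math.exp(-z))

    return score


score = load_scorer(payload)
print(score({"amount": 2.0, "num_orders_last_30_min": 1.0}))
```

The serving code depends only on the payload, not on the training library, which is what lets the training and runtime environments differ.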

What are the Challenges of Real-time Machine Learning?

There are many challenges facing continual learning, both theoretical and practical.


Continual learning challenges traditional machine learning approaches by removing the notion of epochs and convergence. In this approach, the model encounters each data point only once, and there is no stationary point to converge to, as the underlying data distribution is constantly shifting. Another challenge is evaluating the model, as traditional batch training evaluates the model on stationary test sets. Continual learning, on the other hand, requires online evaluation using current data. Companies often subject new models to offline tests before deploying them online and then evaluate them in parallel with existing models via A/B testing to ensure they perform better. The choice of evaluation metric is also a critical consideration.
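
One common way to evaluate a model on current data is to score each incoming point before the model trains on it, sometimes called progressive validation or test-then-train. The sketch below assumes a deliberately trivial model that predicts the running mean of the targets; `RunningMeanModel` and `progressive_validation` are hypothetical names, and mean absolute error is just one possible metric.

```python
class RunningMeanModel:
    """Tiny online regressor: predicts the mean of the targets seen so far."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def predict(self):
        return self.mean

    def learn_one(self, y):
        self.count += 1
        self.mean += (y - self.mean) / self.count


def progressive_validation(model, stream):
    """Score each point before training on it, so every point is tested unseen."""
    total_abs_error, n = 0.0, 0
    for y in stream:
        total_abs_error += abs(model.predict() - y)  # test first
        model.learn_one(y)                            # then train
        n += 1
    return total_abs_error / n if n else 0.0


stream = [10.0, 12.0, 11.0, 13.0]
mae = progressive_validation(RunningMeanModel(), stream)
print(mae)
```

Because every data point serves first as test data and then as training data, no separate stationary test set is needed, which fits the setting where the distribution keeps shifting.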


A standard infrastructure for online training is not yet available, leaving companies to build their own in-house solutions. While some companies have adopted a streaming architecture with parameter servers, others have kept their online training infrastructure confidential because it gives them a competitive advantage. As a result, little public information is available on the infrastructure companies use for online training.


In conclusion, real-time machine learning is a powerful tool that can benefit businesses significantly. Still, it requires a mature streaming infrastructure and collaboration between the data science/ML and platform teams to be implemented effectively. Fortunately, many solutions are now available to make streaming more straightforward and accessible. Companies can also draw on industry surveys to better understand the challenges and adoption of real-time ML. With the help of teams specializing in online prediction, model evaluation, and automated training, businesses can unlock the full potential of real-time machine learning and gain a competitive advantage in their industries.