Machine Learning Platform for Training, Validation, and Serving
XenonStack is a leading machine learning platform company that builds products based on machine learning and artificial intelligence. Our client wanted a product that could automate the complete machine learning process, from model building to model deployment in production. We were asked to automate and unify the following operations:
- Building a machine learning model
- Integrating machine learning model with the existing data pipelines
- Training model over a large data set
- Versioning of ML/DL models
- A/B testing of machine learning models
- Analyzing the performance of various models
- Serving machine learning models
- Continuous deployment of machine learning models
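Of the operations listed above, A/B testing is the one that most often needs a concrete routing rule. A common approach, shown here only as a minimal sketch and not as the platform's actual mechanism, is to hash a stable user identifier into a bucket and map buckets to model variants. The function name `assign_variant` and the variant names are hypothetical.

```python
import hashlib
from typing import Dict


def assign_variant(user_id: str, traffic_split: Dict[str, float]) -> str:
    """Map a user to a model variant; the same user always gets the same variant.

    traffic_split maps a variant name to its share of traffic (shares sum to 1).
    """
    if abs(sum(traffic_split.values()) - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")

    # Hash the user id into a number in [0, 1).
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 16**8

    # Walk the cumulative shares until the bucket falls inside one.
    cumulative = 0.0
    for variant, share in traffic_split.items():
        cumulative += share
        if bucket < cumulative:
            return variant

    # Guard against floating-point rounding at the upper edge.
    return list(traffic_split)[-1]
```

Because the assignment depends only on the user id and the split, a user sees the same model on every request, which keeps the comparison between variants clean.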
Challenges for Building a Machine Learning Platform
We needed to build a platform that can build, version, validate, and serve machine learning models. A platform where:
- A data scientist can build, version, and train machine learning models
- A data engineer can integrate the model with the existing data pipelines
- An analyst can visualize the data generated from the machine learning model
- A testing team can perform A/B testing
- We can establish a standardized platform that enables cross-company sharing of feature data and components
- We can “Make it easy to do the right thing” (e.g., consistent training/streaming/scoring logic)
The platform also had to address problems with the existing setup:
- No consistency between ML workflows
- New teams struggle to begin using ML
- Existing ML workflows are slow, fragmented, and brittle
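One way to read the "consistent training/streaming/scoring logic" goal is that feature computation lives in a single function that both the training job and the serving path call. The following is a minimal sketch of that idea under that assumption, not the platform's actual code; `RawEvent`, its fields, and `extract_features` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class RawEvent:
    """Hypothetical raw record, as it might arrive from a data pipeline."""
    amount: float
    item_count: int


def extract_features(event: RawEvent) -> Dict[str, float]:
    """Single source of truth for feature logic (illustrative only)."""
    avg = event.amount / event.item_count if event.item_count else 0.0
    return {
        "amount": event.amount,
        "item_count": float(event.item_count),
        "avg_item_price": avg,
    }


def build_training_rows(events: List[RawEvent]) -> List[Dict[str, float]]:
    """Batch path: used when preparing a training set."""
    return [extract_features(e) for e in events]


def score_request(event: RawEvent, weights: Dict[str, float]) -> float:
    """Online path: reuses the same feature code before applying a linear score."""
    features = extract_features(event)
    return sum(weights.get(name, 0.0) * value for name, value in features.items())
```

Since both paths go through `extract_features`, a change to a feature definition reaches training and scoring together and the two cannot drift apart.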
Solution Offered to Build ML Platform
Akira AI is a complete platform for model building, validation, versioning, serving, and deployment of machine learning models. Our solution was inspired by Uber’s Michelangelo and Netflix’s Meson projects. Features of Akira AI:
- Distributed training of machine learning models over big data
- Model versioning
- Machine learning model analytics
- Model validation
- Model visualization
- Model impact analysis
- Model comparison to find out which model is best suited
- Model serving in production and sandbox environments
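The versioning and comparison features above can be illustrated with a small in-memory registry. This is a sketch under stated assumptions, not Akira AI's implementation: `ModelRegistry`, `ModelVersion`, and the metric names are hypothetical, and a real registry would persist model artifacts rather than keep only metrics in memory.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ModelVersion:
    name: str
    version: int
    metrics: Dict[str, float]


@dataclass
class ModelRegistry:
    """Hypothetical in-memory registry keyed by model name."""
    _versions: Dict[str, List[ModelVersion]] = field(default_factory=dict)

    def register(self, name: str, metrics: Dict[str, float]) -> ModelVersion:
        """Store a new version; numbers start at 1 and increase per model name."""
        history = self._versions.setdefault(name, [])
        entry = ModelVersion(name=name, version=len(history) + 1, metrics=dict(metrics))
        history.append(entry)
        return entry

    def best(self, name: str, metric: str) -> ModelVersion:
        """Return the version with the highest value for the given metric."""
        candidates = [v for v in self._versions.get(name, []) if metric in v.metrics]
        if not candidates:
            raise KeyError(f"no versions of {name!r} report {metric!r}")
        return max(candidates, key=lambda v: v.metrics[metric])
```

Registering two versions of the same model and then calling `best(name, "accuracy")` returns whichever version reported the higher accuracy, which is the kind of comparison used to decide which model to promote.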
Overview of Solution Architecture for Building ML Platform
Feature storage
Data from the data warehouse is processed and stored in an extracted-feature repository, which data scientists can later use to build machine learning models.
Model building
The model building service consists of Jupyter notebooks, which simplify model building and data visualization.
Model training
Models are trained in a distributed manner, using specialized hardware such as TPUs for faster training. Training runs on multiple nodes simultaneously over big data.
Model versioning
A solution usually involves multiple machine learning models, so we store and version the models in our model repositories.
Model deployment
The deployment service deploys the built machine learning models and makes them available in different regions.
Model validation
Once the models are deployed to production, they are either made available to end users or A/B tested. Models are validated on the basis of their impact and performance, which we can monitor via a monitoring dashboard. Once the best model is selected, it is made available across all regions.
Technology stack:
- Model building: Jupyter
- Machine learning: TensorFlow, Keras, scikit-learn
- Model training (distributed / standalone): Google Cloud TPU / Cloud Machine Learning Engine
- Data warehouse: BigQuery
- Data pipeline: Cloud Dataflow
- Data visualization: Google Data Studio
- Model versioning and serving: TensorFlow Serving
- Deployment: Kubernetes, Docker
Impacts of Solution Architecture for Building ML Platform
- Enabled more users to create machine learning based products
- Reduced time and effort
- Enabled easier model evaluations
- Increased insight into machine learning models in production
- Standardized environments for machine learning model development
- Real-time view of model performance
- Better business decisions based on deep insights from the user data
- Reduced development time
- No need to extract features again and again