Understanding Observability for AI Infrastructure
- Build an observability platform for real-time monitoring, logging, tracing, and visualization of Big Data clusters, Kubernetes, ML and deep learning workloads, and data pipelines, and receive alerts when any critical activity occurs in the cluster.
- Access cluster reports on HDFS quota, number of files, and directory access, showing which directories are used the most and the least over a given time span.
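As an illustration of the directory access report in the last item, here is a minimal Python sketch. The `AccessEvent` record, the `access_report` and `quota_usage_percent` helpers, and the idea that access events arrive as a simple list are all assumptions made for this example; they are not part of HDFS or of any specific platform.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AccessEvent:
    """Hypothetical record of one access to an HDFS directory."""
    path: str
    timestamp: datetime


def access_report(events, start, end):
    """Return the (path, count) pairs for the most and least accessed
    directories within the half-open time span [start, end)."""
    counts = Counter(e.path for e in events if start <= e.timestamp < end)
    if not counts:
        return None, None
    # Sort by descending count; ties are broken by path for stable output.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[0], ranked[-1]


def quota_usage_percent(used_bytes, quota_bytes):
    """Share of a space quota that is already consumed, in percent."""
    if quota_bytes <= 0:
        raise ValueError("quota_bytes must be positive")
    return 100.0 * used_bytes / quota_bytes
```

A report for one day would call `access_report(events, day_start, day_end)`; directories that were never accessed in the span do not appear in the counts, so a real report would likely need to merge in the full directory listing.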
Challenge for Building AIOps Platform
Defining the Four Pillars of Observability
- Distributed Systems Tracing Infrastructure
- Log Aggregation/Analytics
Solution Offered for Building Reactive Platform
- Build a reactive platform for Big Data analytics using Apache Flink and Scala, with a microservices architecture on Kubernetes, tracing, monitoring, and log aggregation for object storage.
- Provide an alerting system to process alerts, and an analytics platform that detects data anomalies and aggregates logs so that all related log files for any given period appear in one place, saving the development team time and effort.
- Build an extensible platform for observability and monitoring of microservices, Kubernetes, and Big Data, with an approach involving continuous security, compliance, and automation for cloud-native applications across continuous integration, testing, deployment, delivery, and the DevOps pipeline for enterprises.
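The anomaly detection and time-bounded log lookup described in the list above can be sketched roughly as follows. This is a Python illustration rather than the Flink and Scala implementation the platform uses; the z-score rule, the `find_anomalies` and `logs_in_window` helpers, and the `(timestamp, source, message)` tuple layout are assumptions chosen to keep the example small.

```python
from datetime import datetime
from statistics import mean, pstdev


def find_anomalies(values, threshold=2.5):
    """Return the indices of values whose z-score magnitude exceeds
    the threshold (a simple statistical rule, assumed for this sketch)."""
    if len(values) < 2:
        return []
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so nothing stands out.
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


def logs_in_window(log_lines, start, end):
    """Collect log lines from every source that fall within [start, end),
    ordered by timestamp. Each line is a (timestamp, source, message) tuple."""
    selected = [line for line in log_lines if start <= line[0] < end]
    return sorted(selected, key=lambda line: line[0])
```

When `find_anomalies` flags a data point, an alert handler could call `logs_in_window` with a span around that point so that the related lines from all services are shown in one place.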
Understanding Reactive Monitoring
Observe and monitor the progress or state of something over a span of time. Keep it under well-organized review. Maintain constant surveillance. Monitoring levels include infrastructure monitoring, data pipeline monitoring, and application/job monitoring.
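To make "state over a span of time" concrete, the following Python sketch keeps a rolling window of samples for one monitoring level and reports when their average crosses a limit. The `Level` enum mirrors the three levels named above; the `RollingMonitor` class, its window size, and the averaging rule are hypothetical choices for illustration only.

```python
from collections import deque
from enum import Enum


class Level(Enum):
    """The monitoring levels named in the text."""
    INFRASTRUCTURE = "infrastructure"
    DATA_PIPELINE = "data_pipeline"
    APPLICATION = "application"


class RollingMonitor:
    """Keeps the most recent samples of one metric at one level."""

    def __init__(self, level, window_size, limit):
        self.level = level
        self.limit = limit
        # A bounded deque drops the oldest sample automatically.
        self.samples = deque(maxlen=window_size)

    def observe(self, value):
        self.samples.append(value)

    def breached(self):
        """True when the average of the retained samples exceeds the limit."""
        if not self.samples:
            return False
        return sum(self.samples) / len(self.samples) > self.limit
```

For example, a monitor created as `RollingMonitor(Level.INFRASTRUCTURE, 3, 80.0)` would track the last three CPU readings of a node and report a breach once their average rises above 80.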
Understanding Reactive Observability
Provide extremely granular insights into the performance of systems along with rich context. Provide clarity into implicit failure modes. Provide on-the-fly generation of the information required for debugging.
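One common way to attach rich context to fine-grained events is to emit them as structured records that share a trace identifier. The Python sketch below assumes a hypothetical `make_event` helper and an in-memory list of events; a real platform would ship these records to its tracing and log aggregation back ends instead.

```python
import json
import time
import uuid


def make_event(name, trace_id=None, **context):
    """Build a structured event; a fresh trace id is generated when none is given."""
    return {
        "event": name,
        "trace_id": trace_id or uuid.uuid4().hex,
        "timestamp": time.time(),
        "context": context,
    }


def filter_by_trace(events, trace_id):
    """Return every event that belongs to the given trace, in original order."""
    return [e for e in events if e["trace_id"] == trace_id]


def to_json_line(event):
    """Serialize one event as a single JSON line for a log stream."""
    return json.dumps(event, sort_keys=True)
```

During debugging, `filter_by_trace` pulls together everything recorded for one request, which is the kind of on-demand view that helps expose failure modes not captured by predefined dashboards.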