Overview of ASIC Monitoring
Traditional general-purpose processors (CPUs) cannot keep up with the rising computational demand of artificial intelligence. Model complexity and computation demands are growing by a factor of 10 each year, which far outpaces improvements in CPU performance. Facebook therefore introduced accelerators: hardware devices that optimize AI prediction and video encoding by meeting their computation and latency demands. These accelerators deliver 10x-30x more performance on Facebook's most significant AI models and a 3-10x performance-per-watt improvement over a CPU.
What are the challenges of ASIC Monitoring?
- Cloud infrastructure needs to keep accelerators running smoothly and reliably to provide a good user experience.
- Accelerators undoubtedly improve data center performance, but their heterogeneous nature makes them challenging to operate efficiently at scale.
- Each accelerator is a complex software and hardware system in its own right.
- Operating them smoothly therefore requires an observability platform. Facebook introduced three tools for this task: Asicmon, Asimov, and Atrace.
Why do we need Asicmon or accelerator observability?
Accelerator observability matters for the following reasons:
While running, an accelerator may overheat, hit a fault condition, or encounter a functional bug. An automated process is therefore required to monitor ASIC health and remediate issues by resetting the accelerator or repairing it when required.
Monitoring performance and system load is essential for scaling AI jobs to meet the day's demands. It also helps detect performance regressions introduced by new models and software deployments.
Issues such as timeouts and poor performance can become bottlenecks when software runs; resolving them requires knowing how an accelerator behaves. Software developers also need tools that help them understand their application's performance while it runs on accelerators.
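The automated health monitoring described above can be illustrated with a minimal Python sketch. It is not Facebook's implementation: the health fields, the temperature threshold, and the reset limit are all hypothetical values chosen for illustration. It only shows the general shape of the idea, which is to try a reset first and escalate to repair if resets do not help.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NONE = "none"
    RESET = "reset"
    REPAIR = "repair"


@dataclass
class HealthSample:
    # Hypothetical health signals; a real driver exposes its own fields.
    temperature_c: float
    error_count: int
    responsive: bool


def choose_action(sample: HealthSample, resets_so_far: int,
                  max_temp_c: float = 95.0, max_resets: int = 3) -> Action:
    """Pick a remediation for one accelerator from its latest health sample.

    The default thresholds are assumptions, not values from the article.
    """
    healthy = (sample.responsive
               and sample.error_count == 0
               and sample.temperature_c < max_temp_c)
    if healthy:
        return Action.NONE
    # Try a reset first; escalate to repair once resets have been exhausted.
    if resets_so_far < max_resets:
        return Action.RESET
    return Action.REPAIR
```

A caller would run `choose_action` on every new sample and keep the per-device reset count itself, so the decision logic stays free of side effects and is easy to test.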
What is Asicmon?
Asicmon offers an abstraction to upstream monitoring software. It also makes development easier by leveraging a custom-built specification language, Asimov.
Asimov helped prototype and onboard new accelerators quickly, reducing onboarding time from months to weeks. Tracing also plays a vital role in understanding performance and the interaction between the CPU and the accelerator. Atrace, a tracing framework, collects and processes traces at scale and provides insights such as operator profiles and critical path analysis.
In addition, native tracing capabilities can be extended by correlating events to the CPU in the open-source Glow and PyTorch software stack. This allowed engineers to close a 10% performance gap between PyTorch and Caffe2 implementations of AI models.
What are the design objectives of Asicmon?
- Abstraction: A simple, uniform interface for all internal monitoring and operational tools, so that infrastructure engineers and other teams can operate multiple accelerators in a common, effective way.
- Development velocity: The framework should be quick to iterate on and easy to understand.
- Performance: The observability system should be lightweight in its resource usage, so that it minimizes interference with high-throughput video and AI applications.
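The abstraction objective can be sketched in Python as a uniform interface with one implementation per accelerator type. Everything here is hypothetical: the class names, the two made-up vendors, and the counters they expose are assumptions for illustration, not Asicmon's actual API. The point is only that upstream tools depend on the interface and never on a vendor's driver details.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple


class AcceleratorMonitor(ABC):
    """Hypothetical uniform interface that upstream tools would code against."""

    @abstractmethod
    def device_utilization(self) -> float:
        """Fraction of the device in use, from 0.0 to 1.0."""

    @abstractmethod
    def is_healthy(self) -> bool:
        """True if the device reports no fault."""


class VendorAMonitor(AcceleratorMonitor):
    # Made-up vendor whose driver reports busy and total cycle counts.
    def __init__(self, busy_cycles: int, total_cycles: int, fault: bool) -> None:
        self.busy_cycles = busy_cycles
        self.total_cycles = total_cycles
        self.fault = fault

    def device_utilization(self) -> float:
        if not self.total_cycles:
            return 0.0
        return self.busy_cycles / self.total_cycles

    def is_healthy(self) -> bool:
        return not self.fault


class VendorBMonitor(AcceleratorMonitor):
    # Made-up vendor whose driver reports a busy percentage and a status string.
    def __init__(self, busy_percent: float, status: str) -> None:
        self.busy_percent = busy_percent
        self.status = status

    def device_utilization(self) -> float:
        return self.busy_percent / 100.0

    def is_healthy(self) -> bool:
        return self.status == "ok"


def fleet_report(monitors: List[AcceleratorMonitor]) -> List[Tuple[str, float, bool]]:
    """Summarize a mixed fleet using only the shared interface."""
    return [(type(m).__name__, m.device_utilization(), m.is_healthy())
            for m in monitors]
```

Adding a new accelerator type in this sketch means writing one more subclass; `fleet_report` and any other upstream tool stay unchanged.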
What are the benefits of Asicmon?
- Asicmon acts as a connector between individual accelerator drivers and the rest of the internal monitoring software.
- The health check tools at the top left of the diagram watch for any change in the health signals and then automatically fix faulty ASICs.
- On the right, a telemetry daemon periodically publishes performance metrics so that engineers can inspect the accelerators.
- Automated load balancing and auto-scaling systems such as Shard Manager then use these counters.
How does Asicmon work?
Asicmon creates one instance per accelerator device to monitor that module. Each instance maintains a cache of statistics, which it updates periodically by probing the accelerator driver and computing derived metrics.
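A minimal Python sketch of this per-device design follows. The class name, the counter names (`busy_cycles`, `total_cycles`), and the refresh interval are assumptions for illustration. As a simplification, the cache here is refreshed lazily when a reader finds it stale, whereas a real system would more likely refresh it from a background timer.

```python
import time
from typing import Callable, Dict, Optional


class DeviceMonitor:
    """One instance per accelerator device, holding a cache of statistics."""

    def __init__(self, device_id: str,
                 probe: Callable[[], Dict[str, float]],
                 refresh_interval_s: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.device_id = device_id
        self._probe = probe              # stands in for a driver query
        self._interval = refresh_interval_s
        self._clock = clock              # injectable so tests can fake time
        self._cache: Dict[str, float] = {}
        self._last_refresh: Optional[float] = None

    def _refresh(self) -> None:
        raw = self._probe()
        stats = dict(raw)
        # Derived metric computed from raw counters (hypothetical names).
        total = raw.get("total_cycles", 0.0)
        busy = raw.get("busy_cycles", 0.0)
        stats["device_utilization"] = busy / total if total else 0.0
        self._cache = stats
        self._last_refresh = self._clock()

    def stats(self) -> Dict[str, float]:
        """Return cached statistics, probing the driver only when stale."""
        now = self._clock()
        stale = (self._last_refresh is None
                 or now - self._last_refresh >= self._interval)
        if stale:
            self._refresh()
        return dict(self._cache)
```

Because readers are served from the cache, many upstream tools can query the same device without each one probing the driver, which fits the goal of keeping the monitoring overhead low.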
What is Asimov?
Facebook's accelerators solve the computation problem, but they leave another difficulty: writing the glue code that connects each accelerator driver to these standard metrics. This has to be done separately for each accelerator, often on aggressive and overlapping timelines.
A way to develop on Asicmon was therefore required that is quick to iterate on, easy to ramp up on, and efficient. This is where Asimov comes in.
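To illustrate why a specification can be quicker to write than hand-written glue code, here is a small Python sketch. It is not Asimov syntax, which this article does not show; the spec format, the vendor, and the raw counter names are all hypothetical. The idea is that each standard metric is described as a short expression over raw driver counters, and one generic function evaluates any such description.

```python
from typing import Callable, Dict

Raw = Dict[str, float]
Spec = Dict[str, Callable[[Raw], float]]

# Hypothetical spec for a made-up accelerator: each standard metric is
# defined as an expression over that driver's raw counters.
VENDOR_A_SPEC: Spec = {
    "device_utilization": lambda raw: raw["busy_cycles"] / raw["total_cycles"],
    "temperature_c": lambda raw: raw["temp_millideg"] / 1000.0,
    "memory_used_bytes": lambda raw: raw["mem_used_kb"] * 1024,
}


def evaluate(spec: Spec, raw: Raw) -> Dict[str, float]:
    """Apply a spec to one set of raw counters and return standard metrics."""
    return {name: expr(raw) for name, expr in spec.items()}
```

With this shape, onboarding another accelerator means writing another small spec rather than another monitoring module, since `evaluate` is shared.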
How does Asimov work?
Facebook uses Shard Manager to scale inference service instances automatically. Here, a shard is a copy of an AI model that can serve inference. Asicmon measures the load on a device using an abstract metric called accelerator device utilization. Shard Manager uses this metric to balance the load among servers and to scale the number of shards accordingly.
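The scaling step can be sketched with a simple proportional rule in Python. This is an assumption for illustration and not Shard Manager's actual algorithm: the function name, the 0.6 target utilization, and the minimum shard count are all hypothetical. Given the device utilization of each current shard, it estimates how many shards would bring the average utilization down to the target.

```python
import math
from typing import Sequence


def desired_shards(utilizations: Sequence[float], target: float = 0.6,
                   min_shards: int = 1) -> int:
    """Estimate the shard count that brings mean utilization near `target`.

    `utilizations` holds one device-utilization value (0.0 to 1.0) per
    current shard. The proportional rule and defaults are assumptions.
    """
    if not utilizations:
        return min_shards
    total_load = sum(utilizations)
    # Round first so that floating-point noise does not add an extra shard.
    needed = math.ceil(round(total_load / target, 6))
    return max(min_shards, needed)
```

For example, three shards that each run at 0.9 utilization carry a total load of 2.7, so a target of 0.6 per shard calls for five shards under this rule.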
Facebook's accelerators remove the computational bottlenecks of AI applications. But monitoring an accelerator is harder than monitoring a CPU because of the accelerator's complexity. Facebook therefore built Asicmon, which monitors accelerator performance, detects faults, and repairs them. It also provides metrics for tracking performance trends, which makes it easier to use accelerators efficiently at scale.