AIOps in IT Operations, Application Performance, and Event Management

Gartner coined the term AIOps to refer broadly to how AI (Artificial Intelligence) and ML (Machine Learning) will be infused into every aspect of managing modern IT Operations: software and hardware infrastructure, applications, transactions, microservices, and business services. AIOps will also play a crucial role in helping enterprises manage the new software-driven services that come from their Digitalization and Digital Transformation initiatives. As a result, AIOps plays a role in the following areas:

  • Application Performance Management (APM) – APM tools collect streams of metrics about the microservices, transactions, and applications that they monitor. These metrics measure response time, throughput (the amount of work done per unit of time), and error rate. APM vendors are incorporating AIOps into their products so that the AI can learn the normal state of each metric for each monitored object and then automate the detection of anomalies in these metrics. The next step for these products is to automate, to the greatest extent possible, the process of determining where in the monitored code the cause of a problem lies.
  • Infrastructure Monitoring or IT Operations Management – With the death of the ITOM suites from IBM, BMC, HP and CA, the emergence of open source tools for infrastructure monitoring, the breakup of infrastructure monitoring into many point tools, and the emergence of virtualization (VMware) and cloud platforms (Amazon AWS, Microsoft Azure, and the Google Cloud Platform), managing the availability and performance of the software and hardware infrastructure has become significantly more difficult and complicated. AIOps is expected to help these tools cope with the deluge of metrics that come from the hardware and software infrastructure layer and help operators of the environment automatically find anomalies and prioritize them.
  • Event Management – Event Management refers to software that consumes all of the events and alarms in the environment, deduplicates them, prioritizes them, and then facilitates the resolution of each event by the appropriate team. Legacy event management systems like IBM Netcool were rule-based and fell into disfavor because, in a rapidly changing environment, it was impossible to keep the rules up to date. Modern Event Management systems use AIOps to automate the process of sorting and prioritizing the events.
  • Digitalization and Digital Transformation – Digitalization and Digital Transformation mean that many new software-based business services are being put into production, and that each of them is now changed more frequently than legacy online applications were. These new applications tend to be built around microservice-based architectures, which means that there are many more things to monitor. The rate of change in these new microservices means that they must be monitored more frequently. The combination of the explosion in the number of things to be monitored with the increased monitoring frequency creates a real-time big data problem that AIOps is well positioned to handle.
  • AIOps Platforms – In addition to AIOps being infused into every existing category of monitoring and management solution, Gartner is projecting that a new category of monitoring and management solution will emerge: the AIOps platform. The AIOps platform will consume logs, metrics, events, alarms and relationships from all of the existing platforms and tools and then apply AIOps across this consolidated and related set of data.
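
The deduplicate-then-prioritize flow described under Event Management can be illustrated with a short Python sketch. It is only an illustration of the pipeline steps, not any vendor's implementation: the `Event` fields, the severity names, and the 300-second window are all assumptions made for this example, and the ordering uses a fixed rule where a modern AIOps product would apply learned models.

```python
from dataclasses import dataclass

# Hypothetical severity ranking; real tools define their own levels.
SEVERITY_RANK = {"critical": 3, "major": 2, "minor": 1, "info": 0}


@dataclass
class Event:
    source: str       # host or service that raised the event
    check: str        # what was being checked, e.g. "cpu" or "disk"
    severity: str     # one of the keys in SEVERITY_RANK
    timestamp: float  # seconds since some reference point


def deduplicate(events, window_seconds=300.0):
    """Collapse repeats of the same (source, check) that arrive within
    window_seconds of the previous occurrence into a single group."""
    groups = []
    open_groups = {}  # (source, check) -> most recent group for that key
    for ev in sorted(events, key=lambda e: e.timestamp):
        key = (ev.source, ev.check)
        group = open_groups.get(key)
        if group is not None and ev.timestamp - group["last_seen"] <= window_seconds:
            # Repeat of a known event: count it and keep the worst severity.
            group["count"] += 1
            group["last_seen"] = ev.timestamp
            if SEVERITY_RANK[ev.severity] > SEVERITY_RANK[group["severity"]]:
                group["severity"] = ev.severity
        else:
            group = {
                "source": ev.source,
                "check": ev.check,
                "severity": ev.severity,
                "count": 1,
                "first_seen": ev.timestamp,
                "last_seen": ev.timestamp,
            }
            open_groups[key] = group
            groups.append(group)
    return groups


def prioritize(groups):
    """Order groups by severity, then by repeat count, then by age."""
    return sorted(
        groups,
        key=lambda g: (-SEVERITY_RANK[g["severity"]], -g["count"], g["first_seen"]),
    )
```

A fixed ordering rule like the one in `prioritize` is exactly the kind of logic that becomes hard to maintain as the environment changes, which is the gap that learned prioritization is meant to close.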

Why is an AIOps Platform Needed?

For most enterprises, the modern software and hardware infrastructure environment has never been more complex or more dynamic. Today the question is not whether to adopt the cloud, but how many and which public clouds (Amazon AWS, Microsoft Azure, and the Google Cloud Platform) are to be added to the existing on-premises private cloud, which is usually based upon VMware vSphere.

Complex Multi-Cloud Environments

[Figure: Multi-cloud architecture]

The complexity of just the hardware and software infrastructure in most environments defies monitoring by a single vendor. This is a primary driver of the need for an AIOps platform.

The second driver of the need for an AIOps platform is the pace of innovation in the software stack supporting all of the new business services that are being put into production and the resulting diversity in this stack.

In response to the need for business and technical agility, applications are being architected around a microservices model, the process of delivering code into production is being streamlined around CI/CD, and a very diverse set of languages, middleware components and database components is being used to improve developer productivity and time to market.

Together, the rate of change and the dynamic behavior of the resulting hardware and software stack form the second major driver of the need for an AIOps platform.

Innovation and Diversity at Every Layer of the Stack

[Figure: Innovation and dynamic behavior in digitalization]

Introduction to the AIOps Platform

The most important point about an AIOps platform is that its value is not limited to the teams whose tools and platforms feed it metrics, events and relationships. Its value extends to all of these teams and to the business constituents whose results rely upon the operation, availability and performance of digitally enabled business services. The image below provides Gartner’s overview of how an AIOps platform fits with the existing categories of tools.

[Figure: AIOps overview]

The image below highlights two things: the role of the AIOps Platform in consuming and creating value out of the different types of data (logs, events, metrics and relationships), and the opportunity for the AIOps Platform to create new value for enterprises that the component tools and platforms cannot create on their own.

[Figure: AIOps platform]

Business Benefits of an AIOps Platform

If you are a typical enterprise, you already have between 20 and 200 different monitoring and management tools. Why do you need another one (an AIOps Platform) on top of what you already have? The AIOps Platform offers the following benefits that are not available from the point monitoring tools and platforms that feed it:

  • An AIOps Platform combines multiple applications and their supporting software and hardware stacks into Business Services that support the operations of business constituents who are responsible for the revenue, market share, and customer satisfaction of these critical online services.
  • An AIOps Platform organizes the hardware and software components that support digitalization and digital transformation initiatives into business service views that are relevant to the business owners and Product Managers of these digital initiatives.
  • An AIOps Platform creates and manages the service levels of each of these critical business services.
  • An AIOps Platform applies AIOps (AI and ML) to all of the metrics, logs and events across the entire stack of software and hardware that supports each critical business service.
  • The AI in the AIOps Platform automatically learns the normal state of each critical business service and the normal behavior of the underlying hardware and software services that support it, and automatically flags anomalies.
  • The AIOps Platform has a real-time understanding of how each business service is composed which dramatically facilitates root cause analysis when issues occur.
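
As a rough illustration of what "learning the normal state" of a metric and flagging anomalies can mean, the Python sketch below keeps a rolling baseline per monitored object and metric and flags values that fall far outside it. This is a deliberately simple stand-in (a rolling mean and standard deviation) and not the method of any particular product; the class name `MetricBaseline`, the helper `check`, the window size, and the threshold of three standard deviations are all assumptions chosen for the example.

```python
from collections import deque
from statistics import mean, stdev


class MetricBaseline:
    """Rolling baseline for one metric of one monitored object."""

    def __init__(self, window=60, threshold=3.0, min_samples=10):
        self.samples = deque(maxlen=window)  # most recent normal values
        self.threshold = threshold           # allowed deviation, in standard deviations
        self.min_samples = min_samples       # values needed before judging anything

    def observe(self, value):
        """Return True if value is anomalous relative to the baseline."""
        anomalous = False
        if len(self.samples) >= self.min_samples:
            mu = mean(self.samples)
            sigma = stdev(self.samples)
            if sigma > 0:
                anomalous = abs(value - mu) > self.threshold * sigma
            else:
                # A perfectly flat baseline: any change counts as a deviation.
                anomalous = value != mu
        if not anomalous:
            # Only normal values extend the baseline, so a spike does not
            # immediately become part of what is considered normal.
            self.samples.append(value)
        return anomalous


# One baseline per (monitored object, metric) pair.
baselines = {}


def check(obj, metric, value):
    baseline = baselines.setdefault((obj, metric), MetricBaseline())
    return baseline.observe(value)
```

For example, after a few dozen calls such as `check("checkout-service", "response_time_ms", 101)` with similar values, a sudden value of 500 would be flagged, while the first `min_samples` values are never flagged because no baseline exists yet. A production system would also need to account for seasonality and trend, which this sketch ignores.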

Summary

Modern business services are composed of new applications, existing applications, custom-developed applications and purchased applications. The software and hardware infrastructure for these business services is updated frequently and operates in a dynamic manner. This creates a new imperative to monitor the resulting business services continuously and across the entire stack. The AIOps platform is well suited to meet this new need.