Four-Step Approach to AI-Ops for Greater Cloud Success
April 27, 2020 / Suzanne Taylor
Our Cloud Success Barometer™ study shows that an organization’s cloud success is directly tied to whether it has made an advanced, core commitment to cloud. In other words, if an organization doesn’t make cloud a priority, projects fail. In my experience, two key obstacles often present themselves: weak cloud commitment and too many manual operations. In the first case, the client has very few cloud champions advocating for cloud transformation. At such sites, cloud initiatives languish for lack of enthusiasm and fail to deliver. Operations, however, is often overlooked as another “cloud-busting” culprit, yet it, too, can stall and even halt cloud innovation.
How? Once a cloud is deployed, managing operations effectively takes a different set of tools and processes, but the newly deployed cloud may lack the proper tools and procedures that the data center had. One recent client comes to mind. After the migration of an Azure workload, testing produced a multitude of incident tickets, which is to be expected after initial startup. Getting to the root causes of these startup incidents takes experience with migration and cloud operations, something that is in short supply at many organizations. Our CloudForte® Center of Excellence cloud experts were able to resolve most issues quickly, knowing just what to expect and look for in this routine migration. Most client sites do not have such expertise, however, and faced with recurring batches of incidents, they may think twice about attempting another migration. By quickly resolving the bulk of the incident tickets, we saved the client hours of time, hassle, and potential frustration.
At Unisys, we are automating this entire process, using tools and artificial intelligence to improve operations. We have found that many incident tickets can be routinely identified and automatically resolved using AI, reserving high-priority and more complex incidents for human intervention. In essence, we are “packaging” our expertise into automated systems. On a daily basis, the majority of commonly found incident tickets can and should be resolved using AI and automation, greatly reducing ticket counts and time to resolution. This predictive, proactive approach to operations saves time and money, and it builds “cloud morale.”
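To make the routing idea concrete, here is a minimal sketch in Python of how routine tickets might be separated from those reserved for people. It is an illustration only, not a description of our tooling: the `Incident` fields, the priority scale, and the `KNOWN_FIXES` catalogue are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical catalogue mapping routine incident categories to automated fixes.
KNOWN_FIXES = {
    "disk_full": "expand_volume",
    "service_down": "restart_service",
    "cert_expired": "renew_certificate",
}


@dataclass
class Incident:
    ticket_id: str
    category: str
    priority: int  # assumed scale: 1 = highest, 4 = lowest


def triage(incident: Incident) -> str:
    """Return the automated fix to run, or 'escalate' for human handling."""
    # High-priority incidents always go to a person, even if a fix is known.
    if incident.priority <= 2:
        return "escalate"
    # Routine incidents are auto-resolved only when the category is recognized.
    return KNOWN_FIXES.get(incident.category, "escalate")


tickets = [
    Incident("T-1", "disk_full", 3),
    Incident("T-2", "service_down", 1),
    Incident("T-3", "unknown_latency", 4),
]
for ticket in tickets:
    print(ticket.ticket_id, triage(ticket))
```

In this sketch, anything unrecognized is escalated by default, so automation only ever acts on incident types it has been explicitly taught to handle.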
We’re working on a four-step approach for our AI-Ops:
1) Collect operations data, such as our “new migration” experience
2) Feed that data into the machine learning (ML) engine
3) Drive specific outcomes from the ML, then
4) Feed the data into governance.
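The four steps above can be sketched end to end in a few lines of Python. This is a toy under stated assumptions, not our implementation: the "ML engine" here is just a most-frequent-resolution lookup standing in for a real model, and the record fields, function names, and the audit-log form of governance are all hypothetical.

```python
from collections import Counter, defaultdict


def collect(history):
    """Step 1: keep only past incidents that have a recorded resolution."""
    return [(r["symptom"], r["resolution"]) for r in history if r.get("resolution")]


def train(samples):
    """Step 2: toy stand-in for the ML engine.

    Learns the most frequently used resolution for each symptom.
    """
    counts = defaultdict(Counter)
    for symptom, resolution in samples:
        counts[symptom][resolution] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}


def recommend(model, symptom):
    """Step 3: drive an outcome; None means no learned fix, so a human decides."""
    return model.get(symptom)


def governance_record(symptom, action):
    """Step 4: record what was decided so it can be reviewed (assumed audit log)."""
    return {
        "symptom": symptom,
        "action": action or "manual_review",
        "automated": action is not None,
    }


# Hypothetical operations data, e.g. gathered from a previous migration.
history = [
    {"symptom": "login_timeout", "resolution": "restart_gateway"},
    {"symptom": "login_timeout", "resolution": "restart_gateway"},
    {"symptom": "login_timeout", "resolution": "scale_out"},
    {"symptom": "dns_failure", "resolution": "flush_cache"},
    {"symptom": "slow_query", "resolution": None},  # unresolved, so not learned
]

model = train(collect(history))
audit_log = [
    governance_record(s, recommend(model, s))
    for s in ["login_timeout", "unknown_error"]
]
for entry in audit_log:
    print(entry)
```

A real system would replace `train` with an actual model and would likely feed the governance records back into step 1 as new operations data.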
In this way, a large share of incidents can be resolved through AI and automation, reserving more important incidents for human intervention. Incidents are also resolved more quickly, and human error is reduced. By streamlining and automating operations with predictive, proactive AI, organizations have one less obstacle in the way of their continued cloud success.