Top 6 Machine Learning Use Cases of Automation in ITOps

Oksana Riabichko

Oksana is a highly driven Head of Operations with extensive experience in management, including business strategy, operations, and consulting. She has a proven hands-on record of solving technical, management, and strategic issues. An excellent and extremely personable communicator, she is proficient at tailoring information to meet the needs of the customer.

February 2, 2021

As we leave the tumultuous 2020 and all its challenges behind, are you already automating IT processes in your enterprise?

Whether you have just heard about automating IT operations, are taking your first steps in automation, or have already made significant progress in your company and are looking to enhance it even more, you’re in the right place. Botprise is a platform for low-code or no-code enterprise automation.

This post is the first in a series about HyperAutomation and the solutions we provide for it at Botprise. Here we go through 3 specific use cases for AIOps automation and show how you can easily implement them on the Botprise platform. Within these 3 use cases we show how to apply 6 different ML techniques to automate your operations:

  1. ML enhanced SLI monitoring
    1. Time Series Forecast
  2. Automated Smart Ticketing 
    1. Tickets Classification
    2. Tickets Clustering and Deduplication
    3. Named Entity Recognition
    4. Workflow recommendation
    5. Workflow generation
  3. ML enhanced Root Cause Identification
    1. Time Series Anomaly detection
    2. ML for RCA

So, let’s get started! 🤖

Introduction. Smart Workflow Studio

Let me first introduce Smart Workflow Studio, in case you’re not familiar with it yet:

This is a drag-and-drop tool for building automation workflows within the enterprise. It has some basic objects such as data connectors and branching, but the main building blocks are Automation Units (AU) and Decision Units (DU). You can use predefined AUs and DUs or build your own. Basically, an AU is a piece of code, written by Botprise or by the user, that performs a predefined automation step, while a DU is an intelligent unit backed by AI/ML models. The main goal of DUs is to provide intelligent steps in the automation workflow where human expertise was needed before.

Let’s start with the actual use cases.

ML enhanced Service Level Indicators monitoring

SLA violations are a pain point for many enterprises when outages occur. To keep SLAs from being violated, you can use the Botprise platform to add intelligence to proactive system remediation workflows. The same logic can be applied to cloud cost monitoring workflows and many more. All that is needed is to set up a data ingestion block, connect it to a Time Series DU, and assign a remediation AU that is triggered when the DU forecasts an SLA violation.
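To make the forecasting idea concrete, here is a minimal sketch in plain Python. It is not how a Botprise Time Series DU works internally: it assumes the SLI is a latency series sampled at equal intervals, swaps in a simple least-squares trend line for the forecast model, and uses hypothetical names (`linear_forecast`, `sla_at_risk`) and a made-up threshold.

```python
from statistics import mean

def linear_forecast(values, steps_ahead):
    """Fit a least-squares line to equally spaced samples and extrapolate it."""
    n = len(values)
    xs = list(range(n))
    x_mean, y_mean = mean(xs), mean(values)
    slope = (
        sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, values))
        / sum((x - x_mean) ** 2 for x in xs)
    )
    intercept = y_mean - slope * x_mean
    # Future samples sit at x = n, n + 1, ...
    return [intercept + slope * (n - 1 + k) for k in range(1, steps_ahead + 1)]

def sla_at_risk(values, threshold, steps_ahead=5):
    """True if any forecast sample exceeds the SLA threshold (assumed: higher is worse)."""
    return any(v > threshold for v in linear_forecast(values, steps_ahead))

# Hypothetical latency SLI in milliseconds, growing steadily.
latency_ms = [100, 110, 120, 130, 140, 150]
forecast = linear_forecast(latency_ms, 3)
trigger_remediation = sla_at_risk(latency_ms, threshold=175, steps_ahead=3)
```

In a workflow, `trigger_remediation` would play the role of the DU output that decides whether the remediation AU runs.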

Automated Smart Ticketing

Monitoring and ITSM tools generate a lot of alarms, and an enterprise can suddenly fall into an alarm storm after a system failure. It’s not easy for a small team of support engineers to make sense of all the alarms coming from a massive infrastructure. Service Desk automation helps to overcome this. Here we describe several levels of Service Desk automation and how to implement them on our platform.

1st automation level

The simplest automation a user can set up is a workflow triggered by a specified event, for example, restarting a service if it is not running. This is straightforward to implement on the platform: just connect a monitoring tool or agent to a remediation AU.
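As an illustration of this first level, the sketch below maps an event status to a remediation action. The `Event` fields, the `RULES` table, and `restart_service` are hypothetical stand-ins for a monitoring event and a remediation AU, not Botprise APIs; the action only returns a description instead of touching a real host.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Event:
    source: str   # host that reported the event (assumed field)
    service: str  # affected service (assumed field)
    status: str   # e.g. "not_running"

def restart_service(event: Event) -> str:
    # A real AU would restart the service; here we only describe the action.
    return f"restart {event.service} on {event.source}"

# Static mapping from event status to the action it triggers.
RULES: Dict[str, Callable[[Event], str]] = {"not_running": restart_service}

def handle(event: Event) -> List[str]:
    """Return the actions triggered by the event (empty if no rule matches)."""
    action = RULES.get(event.status)
    return [action(event)] if action else []

actions = handle(Event(source="web-01", service="nginx", status="not_running"))
```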

2nd automation level (classification and deduplication)

Next, users can bring intelligence to the workflow by deduplicating and classifying tickets. Based on the classification result, one of the predefined workflows is assigned. Below is how to implement such a workflow on our platform.
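Deduplication can be sketched with a simple text similarity measure. The example below is an assumption-laden illustration, not the platform's algorithm: it uses Jaccard similarity over word tokens and a greedy grouping pass, with a hypothetical threshold of 0.6.

```python
import re

def tokens(text):
    """Lowercased alphanumeric tokens as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def deduplicate(tickets, threshold=0.6):
    """Greedy grouping: attach each ticket to the first group whose
    representative (its first ticket) is similar enough."""
    groups = []  # list of (representative_tokens, member_tickets)
    for ticket in tickets:
        tok = tokens(ticket)
        for representative, members in groups:
            if jaccard(tok, representative) >= threshold:
                members.append(ticket)
                break
        else:
            groups.append((tok, [ticket]))
    return [members for _, members in groups]

tickets = [
    "Disk usage above 90% on db-01",
    "disk usage above 90% on db-01!",
    "Service nginx is not running on web-02",
]
groups = deduplicate(tickets)
```

Each resulting group can then be classified once and mapped to a predefined workflow, instead of handling every duplicate ticket separately.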

3rd automation level (workflow customization with Named Entity Recognition)

Further automation can be achieved by adding a NER DU to the workflow. Users can specify workflows that depend on the data in tickets: which service is not running, which server sent the error, which VM failed, what specific type of issue happened, and so on. The NER DU extracts this information and feeds it into the AU as parameters.
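To show the shape of the data flowing from a NER step into an AU, here is a small sketch. It swaps the ML-based NER model for plain regular expressions, and the patterns and entity labels (`service`, `host`) are hypothetical; a trained NER model would replace `extract_entities` in practice.

```python
import re

# Hypothetical patterns standing in for a trained NER model.
PATTERNS = {
    "service": re.compile(r"\bservice\s+([\w.-]+)", re.IGNORECASE),
    "host": re.compile(r"\bon\s+(?:server|host|vm)\s+([\w.-]+)", re.IGNORECASE),
}

def extract_entities(text):
    """Return a dict of entity label -> first matching value in the text."""
    found = {}
    for label, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            found[label] = match.group(1)
    return found

# The extracted entities become parameters of the remediation step.
params = extract_entities("Service nginx is not running on server web-02")
```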

4th automation level (workflow recommendation)

The next step in workflow automation is to assign not just a remediation step but a whole workflow, based on the recommendations of a DU. With a list of hundreds of possible workflows and enough training data, human interaction with Bots is reduced even further.
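One simple way to think about workflow recommendation is a nearest-neighbour vote over previously resolved tickets. The sketch below assumes a non-empty history of (ticket text, workflow name) pairs and uses token overlap as the similarity; the workflow names are invented, and a production DU would likely use a learned model instead.

```python
import re
from collections import Counter

def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def similarity(a, b):
    """Jaccard similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend_workflow(ticket, history, k=3):
    """Vote among the k most similar resolved tickets.
    history: non-empty list of (ticket_text, workflow_name) pairs."""
    query = tokens(ticket)
    ranked = sorted(
        history,
        key=lambda item: similarity(query, tokens(item[0])),
        reverse=True,
    )
    votes = Counter(workflow for _, workflow in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical resolved tickets and the workflows that fixed them.
history = [
    ("disk usage above 90% on db-01", "cleanup_disk"),
    ("disk almost full on db-02", "cleanup_disk"),
    ("service nginx not running on web-01", "restart_service"),
    ("service redis not running on cache-01", "restart_service"),
]
recommended = recommend_workflow("disk usage high on db-03", history)
```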

5th automation level (automatic remediation workflow generation)

The highest level of automation is achieved when a DU dynamically generates the graph that represents the remediation workflow. This requires complex feature engineering and a specific workflow representation space. To see how we do workflow self-healing and self-reconfiguration, stay tuned for the next posts.

Variants of classification DUs

Within the platform, users have several options for obtaining a ticket classification DU:

  • Use pre-trained DUs from Botprise
  • Do supervised learning, if the user has data and labels
  • Do unsupervised learning, if the user doesn’t have labels, and create categories interactively

Users also have several options for algorithms, from simple TF-IDF to powerful BERT models. In one of the next posts we will dive deeper into how we do NLP on our platform, so stay tuned!
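For the supervised, TF-IDF end of that spectrum, a ticket classifier can be sketched in a few lines of plain Python. This is an illustrative nearest-neighbour classifier over TF-IDF vectors with cosine similarity, not the platform's implementation; the class name, the smoothed IDF formula, and the training tickets and labels are all assumptions.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

class TfidfNearestNeighbor:
    """Label a ticket with the class of its most similar training ticket."""

    def fit(self, texts, labels):
        docs = [Counter(tokenize(t)) for t in texts]
        n = len(docs)
        # Document frequency: in how many tickets each term appears.
        df = Counter(term for doc in docs for term in doc)
        # Smoothed IDF so that every known term gets a positive weight.
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
        self.vectors = [self._vector(doc) for doc in docs]
        self.labels = list(labels)
        return self

    def _vector(self, counts):
        # Unit-length TF-IDF vector; terms unseen in training are ignored.
        vec = {t: c * self.idf[t] for t, c in counts.items() if t in self.idf}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else {}

    def predict(self, text):
        query = self._vector(Counter(tokenize(text)))
        # Cosine similarity reduces to a dot product of unit vectors.
        scores = [
            sum(w * vec.get(t, 0.0) for t, w in query.items())
            for vec in self.vectors
        ]
        return self.labels[scores.index(max(scores))]

model = TfidfNearestNeighbor().fit(
    [
        "disk usage above 90% on db-01",
        "no free space left on device",
        "service nginx is not running",
        "process redis stopped unexpectedly",
    ],
    ["storage", "storage", "service", "service"],
)
label = model.predict("disk space running low on db-02")
```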

ML enhanced Root Cause Identification

Even with great fault prevention strategies, breakdowns are inevitable. Unfortunately, modern IT systems generate so much data that it becomes very difficult, if possible at all, for humans to navigate it. This is where correlation and ML-supported RCI engines come into play: they analyse massive amounts of data, correlate events, and build a causal graph. With the assistance of these algorithms, engineers can significantly reduce the time and effort needed to detect the actual cause of a problem.

With Smart Workflows we simplify building such a pipeline to nearly a drag-and-drop exercise. All that is needed is to specify the input data (logs from Linux, Microsoft Server, Cisco, Juniper routers, monitoring tools), the anomaly detection algorithms, and the RCI algorithms. Once you create such a pipeline, you’re done: all data preprocessing and feature engineering is done automatically under the hood. As output, the DU provides a causal graph and suggested root causes. There is a separate page for each RCA DU in the “Model Monitoring” tab.
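The two stages of such a pipeline, anomaly detection followed by root cause identification, can be illustrated with a toy example. This sketch makes strong assumptions that the platform does not require: anomalies are found with a z-score against a short baseline window, and the dependency graph (`depends_on`) is given up front rather than learned as a causal graph. All metric names and values are invented.

```python
from statistics import mean, pstdev

def first_anomaly(series, baseline=5, z_threshold=3.0):
    """Index of the first sample far from the baseline window, or None."""
    mu = mean(series[:baseline])
    sigma = max(pstdev(series[:baseline]), 1e-9)  # avoid a zero threshold
    for i, value in enumerate(series[baseline:], start=baseline):
        if abs(value - mu) > z_threshold * sigma:
            return i
    return None

def root_cause_candidates(metrics, depends_on):
    """Anomalous components none of whose dependencies are anomalous."""
    anomalous = {
        name for name, series in metrics.items()
        if first_anomaly(series) is not None
    }
    return sorted(
        c for c in anomalous
        if not any(dep in anomalous for dep in depends_on.get(c, []))
    )

# Hypothetical per-component error metrics over time.
metrics = {
    "web":   [10, 11, 10, 9, 10, 10, 40, 45],
    "api":   [20, 21, 19, 20, 20, 60, 70, 65],
    "db":    [5, 5, 6, 5, 4, 30, 35, 33],
    "cache": [3, 3, 4, 3, 3, 3, 4, 3],
}
# Assumed, hand-written dependency graph: web -> api -> db, cache.
depends_on = {"web": ["api"], "api": ["db", "cache"], "db": [], "cache": []}
candidates = root_cause_candidates(metrics, depends_on)
```

Here `web`, `api`, and `db` are all anomalous, but only `db` has no anomalous dependency, so it is the suggested root cause.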

Summary

Here we gave an overview of several examples of adding intelligent automation to ITOps and how to accomplish it with Smart Workflow Studio on the Botprise Platform.

To check out new ML-based hyperautomation use cases and algorithm details, stay tuned: we will share them in new blog posts!