
April 10, 2019 By Ankit Goel

Improve Alert Visibility and Monitoring with Sumo Logic and Opsgenie

Dealing with IT outages and downtime is one of the biggest technical challenges of the modern era, costing North American businesses an estimated $700 billion per year. Today's world of interconnected cloud services and microservice architectures has created far more opportunities for something to go wrong and disrupt service. When that happens, there's an urgent need to alert the right people or teams to fix things.

Opsgenie is an incident management service that empowers DevOps teams to plan for service disruptions and stay in control during incidents. Opsgenie helps keep your services up and running and ensures your team never misses a critical service alert. It tracks alerts, filters out the noise, and notifies you using multiple channels, providing the information your team needs to begin resolution immediately. Opsgenie also manages team on-call schedules, escalations, routing rules, and on-call reminder notifications.

At Sumo Logic we’re focused on providing customers the ability to proactively detect issues before they occur and deliver end-to-end monitoring via our bi-directional integration with Opsgenie.

With the Sumo Logic integration for Opsgenie you can:

  • Visualize trends and patterns of alert resolution statistics over time and understand which teams and alert sources need the most improvement
  • Detect anomalies in alert creation and escalation events via Sumo Logic’s machine learning capabilities
  • Identify and prioritize events in the DevOps infrastructure that warrant alert creation and leverage Sumo Logic to automatically send these to Opsgenie for tracking and resolution

Integration Overview

Sumo Logic provides a bi-directional integration with Opsgenie via:

  1. The Opsgenie App, available in the Sumo Logic App Catalog, which collects and analyzes alert data from Opsgenie. For those interested, you can also access it on the Atlassian Marketplace.
  2. A webhook mechanism by which you can create alerts in Opsgenie based on analysis of the log, metric, and event data from across your DevOps infrastructure.

The Opsgenie App

The Opsgenie App, available in the Sumo Logic App Catalog, is designed to monitor Opsgenie alerts and team performance, detect outliers, and track each team’s mean time to repair (MTTR) for incidents. With Sumo Logic dashboards you can easily identify:

  • Alerts by type over time
  • Alerts created - Outlier
  • Alerts escalated - Outlier
  • Alerts breakdown by team/priority/users/sources/tags
  • Alerts created/closed/escalated/acknowledged/escalated to next
  • Alerts - one day time comparison
  • Alerts MTTR with additional details

Installation

Installation is simple: it requires configuring Opsgenie to send its alerts to Sumo Logic through a webhook. The Sumo Logic Opsgenie integration supports the following alert types:

  • Create
  • AddRecipient
  • Acknowledge
  • AddNote
  • UnAcknowledge
  • EscalateToNext
  • Escalate
  • Close

Once Opsgenie Alerts are configured to send data to Sumo Logic through webhooks, the Sumo Logic App can be installed. Simply navigate to the App Catalog in your Sumo Logic account and add the Opsgenie app to the library after providing the sources configured in the previous step.

For more details on app configuration, please see the instructions on Sumo Logic’s DocHub.
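To make the list of supported alert types concrete, here is a minimal Python sketch of how a receiver might recognize them in an incoming webhook event. It assumes the event body is JSON with a top-level `action` field and an `alert` object containing an `alertId`; that payload shape and the function name `parse_alert_event` are assumptions for illustration, so check the actual payload your Opsgenie webhook sends.

```python
import json

# Alert types listed above as supported by the integration.
SUPPORTED_ACTIONS = {
    "Create", "AddRecipient", "Acknowledge", "AddNote",
    "UnAcknowledge", "EscalateToNext", "Escalate", "Close",
}


def parse_alert_event(raw_body):
    """Return (action, alert_id) for a supported event, or None otherwise.

    Assumes a JSON body with an "action" string and an "alert" object;
    this shape is an assumption, not a documented contract.
    """
    event = json.loads(raw_body)
    action = event.get("action")
    if action not in SUPPORTED_ACTIONS:
        return None
    alert = event.get("alert", {})
    return action, alert.get("alertId")


# Hypothetical sample event, for illustration only.
sample = json.dumps(
    {"action": "Create", "alert": {"alertId": "a-1", "message": "Disk full"}}
)
print(parse_alert_event(sample))
```

Events with any other action are simply ignored here; in practice the Sumo Logic app handles this parsing for you once the webhook source is configured.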

Sumo Logic Opsgenie App Dashboards

In any DevOps environment, there are often multiple teams responsible for various software services, and it gets further complicated with different on-call and escalation schedules.

The Opsgenie App provides at-a-glance views and detailed analytics for alerts in your DevOps environment, allowing you to monitor incidents effectively and respond in a timely fashion. It helps you track MTTR for service incidents, as well as team performance for resource allocation and planning.
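As a rough illustration of what an MTTR figure means, the Python sketch below averages the time between each alert's Create and Close events. It is not how the app's dashboards compute the metric; the event tuple layout and the function name `mean_time_to_repair` are assumptions made for this example.

```python
from datetime import datetime, timedelta


def mean_time_to_repair(events):
    """Average Create-to-Close duration across alerts.

    events: iterable of (alert_id, action, timestamp) tuples (assumed layout).
    Alerts that were never closed are excluded. Returns a timedelta, or
    None when no alert has both a Create and a Close event.
    """
    created = {}
    durations = []
    for alert_id, action, ts in sorted(events, key=lambda e: e[2]):
        if action == "Create":
            created.setdefault(alert_id, ts)
        elif action == "Close" and alert_id in created:
            durations.append(ts - created.pop(alert_id))
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)


# Hypothetical events: two resolved alerts and one still open.
events = [
    ("a1", "Create", datetime(2019, 4, 10, 10, 0)),
    ("a1", "Close", datetime(2019, 4, 10, 10, 30)),
    ("a2", "Create", datetime(2019, 4, 10, 11, 0)),
    ("a2", "Close", datetime(2019, 4, 10, 12, 30)),
    ("a3", "Create", datetime(2019, 4, 10, 13, 0)),
]
print(mean_time_to_repair(events))
```

Breaking the same calculation down by team or priority is what makes the figure useful for resource planning.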

Opsgenie Overview Dashboard

Let’s take a closer look at the Alerts Over Time panel.

The Alerts Over Time panel displays trends of different alert types. This helps you identify any spikes or unusual behavior in your DevOps environment, and allows you to identify the root cause by drilling further down and looking at related fields.

One Day Time Comparison of alerts also tells us if there is unusually high activity of alerts, which may be due to impacted services.

If we drill down into the alert details, this dashboard gives visibility into the summary and source of the incident to help the DevOps team debug the root cause of the issue. In addition, you can use panel filters to slice the information by source, priority, team, and user.

The Opsgenie overview dashboard provides visibility into your infrastructure by monitoring KPIs such as alerts created, closed, escalated, acknowledged, and escalated to next.

Drilling down to the queries of the panels will help in determining the exact resource list with the selected message.

The Sumo Logic Opsgenie App comes with prepackaged, parameterized searches that let you filter Opsgenie alerts by type.

Creating Automated Alerts in Opsgenie

With Sumo Logic’s scheduled search functionality, you can detect critical events in your DevOps infrastructure that point to potential application or infrastructure outages, and automatically send them to Opsgenie as alerts for tracking and resolution.

To set this up, follow the steps below:

  1. Set up and configure a Sumo Logic webhook for Opsgenie by following these instructions
  2. Write a scheduled search to identify critical events. For instance, you may want to detect an unusually high number of server errors in your Apache access logs using Sumo Logic’s outlier detection capabilities as shown below:
  3. Once this is done, you can send these as alerts to Opsgenie via the webhook connection that you set up in step 1. These alerts will show up in Opsgenie as shown below.
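As a rough illustration of steps 2 and 3, the Python sketch below flags a spike in per-minute server error counts using a trailing mean and standard deviation, then builds an alert body for it. This is a simplified stand-in for the idea of outlier detection, not Sumo Logic's implementation, and the function names, default parameters, and payload values are assumptions. The payload keys (`message`, `description`, `priority`, `tags`) mirror fields used when creating alerts in Opsgenie, but the exact body your webhook connection expects should be taken from the setup instructions.

```python
from statistics import mean, pstdev


def find_outliers(counts, window=5, threshold=3.0):
    """Return indices whose count exceeds the trailing mean by more than
    `threshold` standard deviations, computed over the previous `window`
    points. A simplified sketch, not Sumo Logic's outlier algorithm.
    """
    outliers = []
    for i in range(window, len(counts)):
        trailing = counts[i - window:i]
        limit = mean(trailing) + threshold * pstdev(trailing)
        if counts[i] > limit:
            outliers.append(i)
    return outliers


def build_alert(index, count):
    """Build an illustrative alert body; all values are hypothetical."""
    return {
        "message": "Unusually high number of Apache server errors",
        "description": f"{count} server errors in time slice {index}",
        "priority": "P1",
        "tags": ["apache", "outlier"],
    }


# Hypothetical per-minute counts of 5xx responses from Apache access logs.
error_counts = [4, 5, 3, 4, 5, 4, 42, 5]
for i in find_outliers(error_counts):
    print(build_alert(i, error_counts[i]))
```

Only the sixth index (the jump to 42) crosses the limit here; the point right after the spike does not, because the spike itself inflates the trailing statistics.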

Closed loop visibility into your DevOps Cycle

Sumo Logic can correlate alert data from Opsgenie and Jira to give on-call teams real-time information from both systems and better visibility into the DevOps lifecycle. The flow of data across different DevOps tools is outlined by the following diagram:

Using data from both systems, Sumo Logic can provide visibility into all Jira issues created or escalated by Opsgenie as shown below:

Get started now!

The Sumo Logic App for Opsgenie monitors your entire DevOps infrastructure spanning hundreds of services and helps determine the appropriate corrective and preventative actions.

The bi-directional Sumo Logic integration with Opsgenie helps you proactively detect incidents before they occur and delivers end-to-end monitoring of the alert lifecycle.

To get started, check out the Sumo Logic Opsgenie App help doc and how to configure a webhook for Opsgenie. If you don’t yet have a Sumo Logic account, you can sign up for a free trial today.


Ankit Goel


Ankit Goel is a Solutions Architect at Sumo Logic with 10+ years of experience in designing and architecting applications. He is passionate about Machine Learning and Big Data projects. Ankit graduated from Carnegie Mellon University with a master's degree in Information Systems.
