---
title: "Lightning-fast troubleshooting for AWS: how to find the root cause fast with Sumo Logic"
page_name: "Lightning-fast troubleshooting for AWS: How to find the root cause fast with Sumo Logic"
type: "blog"
slug: "aws-observability-fast-troubleshooting"
published_at: "2023-11-27"
modified_at: "2025-12-08"
url: "https://www.sumologic.com/blog/aws-observability-fast-troubleshooting"
canonical: "https://www.sumologic.com/blog/aws-observability-fast-troubleshooting"
markdown_url: "https://www.sumologic.com/blog/aws-observability-fast-troubleshooting.md"
lang: "en"
excerpt: "AWS Observability offers a unified approach to monitoring and troubleshooting, improving reliability and cutting troubleshooting toil in clicks."
taxonomy_blog_category:
  - "Cloud SIEM"
  - "DevOps &amp; IT Operations"
  - "SecOps &amp; Security"
---


# Lightning-fast troubleshooting for AWS: How to find the root cause fast with Sumo Logic

By Michael Riordan and Greg Ziemiecki

November 27, 2023


[Cloud SIEM](https://www.sumologic.com/blog/cloud-siem), [DevOps &amp; IT Operations](https://www.sumologic.com/blog/devops-it-operations), [SecOps &amp; Security](https://www.sumologic.com/blog/secops-security)


It’s time to stop firefighting. With Sumo Logic’s AWS Observability, companies like [Snoop](https://aws.amazon.com/partners/success/snoop-sumo-logic/) have simplified data collection, achieved unified visibility across AWS accounts and regions, and used machine learning to troubleshoot fast.

At this year’s re:Invent, we’re excited to showcase how our capabilities for AWS have evolved. With a unified approach to monitoring and troubleshooting for AWS, Sumo Logic lets [DevOps](https://www.sumologic.com/glossary/devops/) and SRE teams improve the reliability of their services and cut troubleshooting toil in just a few clicks.

Looking for lightning-fast troubleshooting? Here’s how Sumo Logic can help you find the root cause and reclaim your time.

## Your starting point: a unified view of your AWS environment

In the fast-paced world of e-commerce, timely order processing and inventory updates are crucial for maintaining customer satisfaction. But what happens when an efficient, serverless architecture starts showing intermittent delays?

In this example, the order processing and inventory update system for our e-commerce site uses [Amazon SQS](https://www.sumologic.com/blog/cloud-messaging-and-collaboration/) to queue orders, [AWS Lambda](https://www.sumologic.com/blog/lambda-extensions/) for the core business logic, and [Amazon RDS](https://www.sumologic.com/glossary/aws-rds/) as the persistent data store. Customers are reporting intermittent delays when placing orders and during checkout.
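To make the data flow concrete, here is a minimal Python sketch of the Lambda side of this architecture. It assumes each SQS message body is a JSON order such as `{"sku": "sku-123", "quantity": 2}` and uses an in-memory dictionary as a hypothetical stand-in for the RDS inventory table; the real function's code, message schema, and database access are not part of this post.

```python
import json

# Hypothetical in-memory stand-in for the inventory table in Amazon RDS.
INVENTORY = {"sku-123": 10, "sku-456": 3}


def handler(event, context):
    """Lambda entry point for a batch of SQS messages.

    Lambda delivers SQS messages in event["Records"]; each record's "body"
    holds the message payload (assumed here to be a JSON order).
    """
    processed = 0
    for record in event.get("Records", []):
        order = json.loads(record["body"])
        # In the real system this would be an UPDATE against Amazon RDS.
        INVENTORY[order["sku"]] = INVENTORY.get(order["sku"], 0) - order["quantity"]
        processed += 1
    return {"processed": processed}
```

Every message in the batch waits on that database write, so a slow RDS query stretches the function's duration and, in turn, how quickly the queue drains.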

To understand what might be going wrong, you first need a centralized view of your AWS environment that brings together the relevant logs and metrics. AWS Observability gives you a comprehensive view across your AWS accounts, regions, and individual namespaces. This content is available out of the box once you deploy the solution via the CloudFormation template or Terraform.

## Detecting issues with pre-built alerts

AWS Observability comes with pre-built alerts for many AWS services, including Amazon SQS, AWS Lambda, and Amazon RDS. These alerts can notify you about the problem with the e-commerce site. In our example, the “Amazon SQS – Message processing not fast enough” alert was triggered.

From the alert, you can determine the characteristics of the issue: how often it triggers, how long it has been unresolved, and other relevant details. You can also see how long messages wait in the queue before they are processed.
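The alert's exact rule isn't described here, but the idea behind a check like “message processing not fast enough” can be sketched in Python: treat `ApproximateAgeOfOldestMessage` datapoints as a time series and fire when the most recent ones all exceed a threshold. The 300-second threshold, three-datapoint window, and 60-second period below are illustrative assumptions, not the values Sumo Logic ships.

```python
def evaluate_oldest_message_age(ages_s, period_s=60, threshold_s=300, min_breaches=3):
    """Evaluate ApproximateAgeOfOldestMessage datapoints (oldest first).

    Returns whether the check would fire and how long the queue has been
    continuously over the threshold. All defaults are illustrative.
    """
    trailing = 0
    # Walk back from the newest datapoint until one is within the threshold.
    for age in reversed(ages_s):
        if age <= threshold_s:
            break
        trailing += 1
    return {
        "firing": trailing >= min_breaches,
        "breaching_for_s": trailing * period_s,
    }


print(evaluate_oldest_message_age([60, 400, 500, 900]))
# {'firing': True, 'breaching_for_s': 180}
```

Counting the trailing breaches also gives a rough answer to “how long has this been unresolved?”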

## High-speed troubleshooting in action

Now, with this knowledge, the troubleshooting begins.

You start your investigation by diving into SQS, where messages from the Order Processing Service are queued. CloudWatch metrics for SQS provide the first clues.

You observe that the `NumberOfMessagesSent` is much higher than `NumberOfMessagesReceived`,
 indicating that messages are
 being queued faster than they are being consumed. The `ApproximateAgeOfOldestMessage` metric shows that
 some messages
 have been in the queue for a long time, which could indicate a bottleneck.
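A quick way to reason about those two metrics is to subtract consumption from production period by period and accumulate the difference. The minimal Python sketch below assumes you already have per-period `Sum` values for both metrics (for example, exported from CloudWatch); the numbers in the usage line are made up. Treat the result as approximate: `NumberOfMessagesReceived` can count the same message more than once if it is redelivered after its visibility timeout expires.

```python
def backlog_trend(sent, received):
    """Return the running net change in queue depth for each period.

    `sent` and `received` are per-period sums of NumberOfMessagesSent and
    NumberOfMessagesReceived for the same queue and the same periods.
    """
    if len(sent) != len(received):
        raise ValueError("sent and received must cover the same periods")
    running, trend = 0, []
    for s, r in zip(sent, received):
        running += s - r  # positive: more messages queued than consumed this period
        trend.append(running)
    return trend


print(backlog_trend([100, 120, 150], [90, 80, 70]))  # [10, 50, 130]
```

A series that keeps climbing, as it does here, matches the picture from `ApproximateAgeOfOldestMessage`: consumers are falling behind.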

Next, you turn your attention to AWS Lambda, which processes the SQS messages to update your inventory. Log entries show prolonged function execution and timeouts, suggesting potential issues with the Lambda function’s efficiency or resource allocation.
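Lambda writes a `REPORT` line at the end of each invocation with its duration and memory usage, and logs a “Task timed out after N seconds” message when an invocation hits its timeout. The Python sketch below scans raw log lines for invocations that ran close to the timeout or close to the memory limit. The 80% cut-off is an arbitrary assumption, and the regular expressions cover only the fields used here.

```python
import re

# Lambda's end-of-invocation REPORT line (fields separated by whitespace, typically tabs).
REPORT_RE = re.compile(
    r"REPORT RequestId: (?P<request_id>\S+)\s+"
    r"Duration: (?P<duration_ms>[\d.]+) ms.*?"
    r"Memory Size: (?P<memory_mb>\d+) MB\s+"
    r"Max Memory Used: (?P<max_memory_mb>\d+) MB"
)
# Message Lambda logs when an invocation exceeds its configured timeout.
TIMEOUT_RE = re.compile(r"Task timed out after [\d.]+ seconds")


def summarize_lambda_logs(lines, timeout_s, warn_ratio=0.8):
    """Count timeouts and flag invocations near the time or memory limit.

    warn_ratio (80%) is an arbitrary cut-off chosen for this sketch.
    """
    summary = {"timeouts": 0, "near_timeout": [], "high_memory": []}
    for line in lines:
        if TIMEOUT_RE.search(line):
            summary["timeouts"] += 1
            continue
        m = REPORT_RE.search(line)
        if m is None:
            continue
        if float(m["duration_ms"]) >= warn_ratio * timeout_s * 1000:
            summary["near_timeout"].append(m["request_id"])
        if int(m["max_memory_mb"]) >= warn_ratio * int(m["memory_mb"]):
            summary["high_memory"].append(m["request_id"])
    return summary
```

Invocations that cluster near either limit are the natural candidates for the timeout and memory adjustments discussed below.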

Here, Sumo Logic’s out-of-the-box dashboards for [AWS Lambda error analysis](https://help.sumologic.com/docs/observability/aws/integrations/aws-lambda/#3-aws-lambda---error-analysis) surface the related error log entries.

Because the Lambda function interacts with an Amazon RDS instance, checking RDS is your next step.

The RDS performance metrics show high CPU utilization and errors related to database locks.

Again, Sumo Logic’s out-of-the-box dashboards for [Amazon RDS error log analysis](https://help.sumologic.com/docs/observability/aws/integrations/amazon-rds/#07-amazon-rds---mysql-logs---error-logs-analysis) help you locate the specific error messages that confirm the database issue:

```
2023-11-09T01:45:00Z [ERROR] Deadlock found when trying to get lock; try restarting transaction
```
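The error text itself suggests a mitigation: restart the transaction. MySQL reports this condition as error 1213 (`ER_LOCK_DEADLOCK`). If the Lambda code doesn't already retry, a minimal Python sketch with jittered exponential backoff could look like this. `DatabaseError` is a hypothetical stand-in for whatever exception your MySQL driver raises, and retries only ease the symptom; the underlying lock contention still needs fixing.

```python
import random
import time

MYSQL_ER_LOCK_DEADLOCK = 1213  # MySQL error code for "Deadlock found when trying to get lock"


class DatabaseError(Exception):
    """Hypothetical driver exception that carries a MySQL error code."""

    def __init__(self, code, message):
        super().__init__(code, message)
        self.code = code


def run_with_deadlock_retry(txn, max_attempts=3, base_delay_s=0.05, sleep=time.sleep):
    """Run txn(), restarting it with jittered exponential backoff on deadlock.

    Any other error, or a deadlock on the final attempt, is re-raised.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except DatabaseError as err:
            if err.code != MYSQL_ER_LOCK_DEADLOCK or attempt == max_attempts:
                raise
            # Back off 1x, 2x, 4x ... the base delay, scaled by random jitter.
            sleep(base_delay_s * 2 ** (attempt - 1) * random.uniform(0.5, 1.5))
```

Passing `sleep` as a parameter keeps the function easy to test without real delays.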

A closer look at the out-of-the-box [RDS slow query log analysis](https://help.sumologic.com/docs/observability/aws/integrations/amazon-rds/#08-amazon-rds---mysql-logs---slow-query-analysis) dashboard reveals suboptimal queries that significantly drag down performance:

```
# Query_time: 899.00 Lock_time: 0.594385 Rows_sent: 45 Rows_examined: 54392
SELECT * FROM inventory;
```
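The header line of a MySQL slow query log entry carries the numbers that matter here. This Python sketch pulls them out and computes rows examined per row sent; a large ratio is a common hint that the database scanned far more data than it returned. The regular expression assumes the `Query_time`, `Lock_time`, `Rows_sent`, and `Rows_examined` fields appear in that order, as in the entry above.

```python
import re

SLOW_LOG_HEADER = re.compile(
    r"#\s*Query_time:\s*(?P<query_time>[\d.]+)\s+"
    r"Lock_time:\s*(?P<lock_time>[\d.]+)\s+"
    r"Rows_sent:\s*(?P<rows_sent>\d+)\s+"
    r"Rows_examined:\s*(?P<rows_examined>\d+)"
)


def parse_slow_query_header(line):
    """Parse a slow query log header line; return None if it doesn't match."""
    m = SLOW_LOG_HEADER.search(line)
    if m is None:
        return None
    entry = {
        "query_time": float(m["query_time"]),
        "lock_time": float(m["lock_time"]),
        "rows_sent": int(m["rows_sent"]),
        "rows_examined": int(m["rows_examined"]),
    }
    # Rows examined per row returned (guarding against division by zero).
    entry["examined_per_sent"] = entry["rows_examined"] / max(entry["rows_sent"], 1)
    return entry


entry = parse_slow_query_header(
    "# Query_time: 899.00 Lock_time: 0.594385 Rows_sent: 45 Rows_examined: 54392"
)
print(entry["rows_examined"], entry["rows_sent"], round(entry["examined_per_sent"]))  # 54392 45 1209
```

For the entry above, the database examined roughly 1,209 rows for every row it returned.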

The culprit is a full table scan caused by a missing index: the query examined 54,392 rows to return only 45.
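You can see the effect of an index on a query plan without touching production. The self-contained Python sketch below uses the standard library's `sqlite3` module as a stand-in for MySQL on RDS; the `inventory` schema, the `sku` column, and the `WHERE sku = ?` filter are hypothetical, since the logged query doesn't show its filter. On MySQL itself, the equivalent check is running `EXPLAIN` on the query and looking for an access `type` of `ALL`, which means a full table scan.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (id INTEGER PRIMARY KEY, sku TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO inventory (sku, quantity) VALUES (?, ?)",
    [(f"sku-{i}", i % 50) for i in range(10_000)],
)

lookup = "SELECT * FROM inventory WHERE sku = ?"
before = query_plan(conn, lookup, ("sku-42",))
conn.execute("CREATE INDEX idx_inventory_sku ON inventory (sku)")
after = query_plan(conn, lookup, ("sku-42",))

print(before)  # detail mentions SCAN: every row is read
print(after)   # detail mentions SEARCH ... USING INDEX idx_inventory_sku
```

The exact wording of the plan varies between SQLite versions, but the switch from a scan to an index search is the same signal the slow query log points to.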

By examining each component of the serverless architecture in turn, you can now address the delays. As next steps, you can adjust the Lambda function’s timeout setting and increase its memory allocation. You can also add an index to the affected table on the RDS instance to speed up the problematic query.
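Before changing the function, it helps to sanity-check the new values. The Python sketch below builds the parameters for a Lambda configuration update and checks them against documented limits: a timeout between 1 and 900 seconds, memory between 128 and 10,240 MB, and AWS's guidance to set an SQS source queue's visibility timeout to at least six times the function timeout. The function name `order-processor` and the values in the usage line are hypothetical.

```python
LAMBDA_TIMEOUT_RANGE_S = (1, 900)       # Lambda timeout limits, in seconds
LAMBDA_MEMORY_RANGE_MB = (128, 10_240)  # Lambda memory limits, in MB


def plan_lambda_update(function_name, timeout_s, memory_mb, queue_visibility_timeout_s):
    """Validate proposed settings and return (params, warnings)."""
    lo, hi = LAMBDA_TIMEOUT_RANGE_S
    if not lo <= timeout_s <= hi:
        raise ValueError(f"timeout must be between {lo} and {hi} seconds")
    lo, hi = LAMBDA_MEMORY_RANGE_MB
    if not lo <= memory_mb <= hi:
        raise ValueError(f"memory must be between {lo} and {hi} MB")

    warnings = []
    # AWS recommends at least 6x the function timeout for SQS event sources;
    # shorter values risk in-flight messages becoming visible and being reprocessed.
    if queue_visibility_timeout_s < 6 * timeout_s:
        warnings.append(f"raise the queue visibility timeout to at least {6 * timeout_s} s")
    params = {"FunctionName": function_name, "Timeout": timeout_s, "MemorySize": memory_mb}
    return params, warnings


params, warnings = plan_lambda_update(
    "order-processor", timeout_s=30, memory_mb=1024, queue_visibility_timeout_s=120
)
print(warnings)  # ['raise the queue visibility timeout to at least 180 s']
```

The resulting `params` dictionary matches the keyword arguments of boto3's `update_function_configuration`, so `boto3.client("lambda").update_function_configuration(**params)` would apply it.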


## It’s time to reclaim your time

Without a unified view of your AWS environment, centralized logging, and the ability to pivot between services, getting to the root cause of this issue could have been extremely difficult, if not impossible. You can learn more from our guides:

- [Log management](https://www.sumologic.com/guides/log-management-process/)
- [Artificial intelligence for log analytics](https://www.sumologic.com/guides/machine-data-analytics/)
- [Log analytics](https://www.sumologic.com/guides/log-analytics/)

Looking to reclaim your time? Get started today with AWS Observability, which you can deploy in minutes via the CloudFormation template or Terraform. [Learn more and start your trial](https://www.sumologic.com/solutions/aws-monitoring/).

### Article Tags

- [Cloud SIEM](https://www.sumologic.com/blog/cloud-siem)
- [DevOps &amp; IT Operations](https://www.sumologic.com/blog/devops-it-operations)
- [SecOps &amp; Security](https://www.sumologic.com/blog/secops-security)

Michael Riordan

Senior Product Marketing Manager

Michael is a member of the Observability Product Marketing team at Sumo Logic. Before Sumo Logic, he worked as a PMM at forward-looking technology companies Axon and Fastly. When he’s not working with sellers and product managers, Michael enjoys watching reality TV and collecting vintage clothes.

Greg Ziemiecki

Senior Technical Product Manager

Greg is a Product Manager with about 15 years of experience in IT, half of it in product management. His experience includes SIEM, monitoring, and network management products in the telco space. At Sumo Logic, he works in the observability area, looking at how to provide best-in-class out-of-the-box insights into our customers’ cloud infrastructure.

When not working, he enjoys motorcycle riding, hiking, and playing chess.

