It’s time to stop firefighting. With Sumo Logic’s AWS Observability, companies like Snoop have been
able to simplify data collection, achieve unified visibility across AWS accounts and regions and leverage machine
learning to troubleshoot — fast.

This re:Invent, we’re excited to showcase how our capabilities for AWS have evolved. Offering a unified approach to
monitoring and troubleshooting for AWS, Sumo Logic lets DevOps and SRE teams improve the reliability of their services and cut
troubleshooting toil in just a few clicks.

Looking for lightning-speed troubleshooting? Here’s how Sumo Logic can help you find the root cause and reclaim your
time.

Your starting point: a unified view of your AWS environment

In the fast-paced world of e-commerce, timely order processing and inventory updates are crucial for maintaining
customer satisfaction. But what happens when an efficient, serverless architecture starts showing intermittent
delays?

Here, the order processing and inventory update system for our e-commerce site uses Amazon
SQS to queue orders, AWS Lambda for the core business logic, and Amazon RDS as the persistent data store. Customers report
intermittent delays when placing orders and during checkout.

To understand what might be going wrong, you first need a centralized view of your AWS environment that brings
together your relevant logs and metrics. AWS Observability gives you a comprehensive view across your AWS
accounts, regions and individual namespaces. This content is provided out of the box after you deploy the solution via
the CloudFormation template or Terraform.

Detecting issues with pre-built alerts

AWS Observability comes with pre-built alerts for different AWS services, including Amazon SQS, AWS Lambda, and
Amazon RDS. These alerts can notify you when something goes wrong with the e-commerce site. In our example, the “Amazon SQS –
Message processing not fast enough” alert was triggered.

From the alert, you can determine the characteristics of the issue: how often it triggers, how long it has been
unresolved, and other relevant details. You can also see how long messages wait in the queue
before they are processed.

High-speed troubleshooting in action

Now, with this knowledge, the troubleshooting begins.

You start your investigation by diving into SQS, where messages from the Order Processing Service are queued.
CloudWatch metrics for SQS provide the first clues.

You observe that the NumberOfMessagesSent is much higher than NumberOfMessagesReceived, indicating that messages are
being queued faster than they are being consumed. The ApproximateAgeOfOldestMessage metric shows that some messages
have been in the queue for a long time, which could indicate a bottleneck.
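
The comparison above can be sketched in a few lines of Python. This is a minimal illustration, not Sumo Logic's alert logic: the QueueSample type, the sample values and the 300-second threshold are all assumptions made for the example. Note also that NumberOfMessagesReceived counts receive calls, so sent minus received is only an approximation of backlog growth.

```python
from dataclasses import dataclass


@dataclass
class QueueSample:
    """One metric period for a queue (all values below are illustrative)."""
    sent: int                  # NumberOfMessagesSent for the period
    received: int              # NumberOfMessagesReceived for the period
    oldest_age_seconds: float  # ApproximateAgeOfOldestMessage at period end


def backlog_growth(samples):
    """Approximate net number of messages added to the queue across all periods."""
    return sum(s.sent - s.received for s in samples)


def is_backing_up(samples, max_age_seconds=300.0):
    """True when producers outpace consumers and the oldest message is stale.

    max_age_seconds is a hypothetical threshold chosen for this sketch.
    """
    if not samples:
        return False
    latest_age = samples[-1].oldest_age_seconds
    return backlog_growth(samples) > 0 and latest_age > max_age_seconds


# Hypothetical datapoints for three consecutive periods.
samples = [
    QueueSample(sent=1200, received=900, oldest_age_seconds=120.0),
    QueueSample(sent=1300, received=850, oldest_age_seconds=410.0),
    QueueSample(sent=1250, received=800, oldest_age_seconds=760.0),
]

print("approximate backlog growth:", backlog_growth(samples))
print("queue is backing up:", is_backing_up(samples))
```

A growing gap combined with a rising oldest-message age is the pattern that points downstream, toward the consumer.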

Next, you turn your attention to AWS Lambda, responsible for processing SQS messages to update your inventory. Log
entries give evidence of prolonged function execution and timeouts, suggesting potential issues with the Lambda
function’s efficiency or resource allocation.
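
To make this step concrete, here is a small Python sketch that scans Lambda log lines for timeouts and for invocations running close to their memory limit. The line shapes follow the standard Lambda "Task timed out" and REPORT messages, but the sample lines, the request IDs and the 90% memory threshold are assumptions made for illustration; they are not taken from the dashboards described here.

```python
import re

# Standard Lambda runtime messages; the sample lines further down are made up.
TIMEOUT_RE = re.compile(r"Task timed out after (?P<seconds>\d+(?:\.\d+)?) seconds")
REPORT_RE = re.compile(
    r"REPORT RequestId: (?P<request_id>\S+)\s+"
    r"Duration: (?P<duration_ms>\d+(?:\.\d+)?) ms\s+"
    r"Billed Duration: \d+ ms\s+"
    r"Memory Size: (?P<memory_mb>\d+) MB\s+"
    r"Max Memory Used: (?P<used_mb>\d+) MB"
)


def summarize(lines, memory_ratio=0.9):
    """Count timeouts and list request IDs that used most of their memory.

    memory_ratio is a hypothetical cut-off for "close to the limit".
    """
    summary = {"timeouts": 0, "near_memory_limit": [], "max_duration_ms": 0.0}
    for line in lines:
        if TIMEOUT_RE.search(line):
            summary["timeouts"] += 1
        report = REPORT_RE.search(line)
        if report:
            duration = float(report["duration_ms"])
            summary["max_duration_ms"] = max(summary["max_duration_ms"], duration)
            if int(report["used_mb"]) >= memory_ratio * int(report["memory_mb"]):
                summary["near_memory_limit"].append(report["request_id"])
    return summary


# Hypothetical log lines for two invocations.
sample_lines = [
    "2023-11-09T01:44:10.512Z req-1 Task timed out after 30.03 seconds",
    "REPORT RequestId: req-1\tDuration: 30033.12 ms\tBilled Duration: 30000 ms"
    "\tMemory Size: 128 MB\tMax Memory Used: 126 MB",
    "REPORT RequestId: req-2\tDuration: 812.40 ms\tBilled Duration: 813 ms"
    "\tMemory Size: 128 MB\tMax Memory Used: 71 MB",
]

summary = summarize(sample_lines)
print(summary)
```

Timeouts alone suggest a slow dependency, while timeouts paired with near-limit memory use suggest the function itself is under-provisioned.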

Here, Sumo Logic’s out-of-the-box dashboards for AWS Lambda error analysis surface the log entries behind these timeouts.

Because the Lambda function interacts with an Amazon RDS instance, checking RDS would be your next step.

The RDS performance metrics show high CPU utilization and errors related to database locks.

Again, Sumo Logic’s out-of-the-box dashboards for Amazon RDS error log analysis help locate the specific error log messages
that confirm the database issue.

2023-11-09T01:45:00Z [ERROR] Deadlock found when trying to get lock; try restarting transaction

A closer look at the out-of-the-box RDS slow query log analysis dashboard reveals sub-optimal
queries that significantly drag down performance.

# Query_time: 899.00 Lock_time: 0.594385 Rows_sent: 45 Rows_examined: 54392
SELECT * FROM inventory;

The query examined 54,392 rows to return just 45, which points to a full table scan caused by a missing index.
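
The same reading of the slow query log header can be automated. The sketch below parses the header line shown above and computes how many rows were examined per row returned; the function names are hypothetical, and a high ratio is only a hint of a scan, not proof.

```python
import re

# Matches the statistics header that precedes each statement in the slow query log.
HEADER_RE = re.compile(
    r"# Query_time: (?P<query_time>[\d.]+)\s+Lock_time: (?P<lock_time>[\d.]+)\s+"
    r"Rows_sent: (?P<rows_sent>\d+)\s+Rows_examined: (?P<rows_examined>\d+)"
)


def parse_slow_query_header(line):
    """Return the header statistics as numbers, or None if the line does not match."""
    match = HEADER_RE.search(line)
    if match is None:
        return None
    return {
        "query_time": float(match["query_time"]),
        "lock_time": float(match["lock_time"]),
        "rows_sent": int(match["rows_sent"]),
        "rows_examined": int(match["rows_examined"]),
    }


def examined_per_row_sent(stats):
    """Rows scanned for each row returned; guards against division by zero."""
    return stats["rows_examined"] / max(stats["rows_sent"], 1)


header = "# Query_time: 899.00 Lock_time: 0.594385 Rows_sent: 45 Rows_examined: 54392"
stats = parse_slow_query_header(header)
ratio = examined_per_row_sent(stats)
print(f"{ratio:.0f} rows examined per row returned")
```

For the entry above, the database reads more than a thousand rows for every row it returns, which is the signature of a query that cannot use an index.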

Having examined each component of the serverless architecture, you can now address the delays. As next
steps, you can adjust the Lambda function’s timeout settings and increase its memory allocation. Additionally, you can
add an index to the inventory table in the RDS database to speed up the problematic query.
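
The effect of adding an index can be demonstrated locally. The sketch below uses Python's built-in SQLite as a stand-in for the RDS engine, so the query plan wording will differ from what your database reports. The sku column, the WHERE clause and the index name are hypothetical, since the logged query does not show which column is filtered.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's query plan for a statement as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the plan detail


# In-memory stand-in for the inventory table; the schema is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory (id INTEGER PRIMARY KEY, sku TEXT, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO inventory (sku, quantity) VALUES (?, ?)",
    ((f"SKU-{i:05d}", i % 50) for i in range(5000)),
)

lookup = "SELECT * FROM inventory WHERE sku = ?"

# Without an index, the engine has to scan the whole table.
plan_before = query_plan(conn, lookup, ("SKU-00042",))

# With an index on the filtered column, it can seek directly to matching rows.
conn.execute("CREATE INDEX idx_inventory_sku ON inventory (sku)")
plan_after = query_plan(conn, lookup, ("SKU-00042",))

print("before:", plan_before)
print("after: ", plan_after)
```

On a production database, run the engine's own EXPLAIN against the real query before and after the change to confirm the index is actually used.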

It’s time to reclaim your time

Without a unified view of your AWS environment and the ability to pivot between services and centralized logs,
getting to the root cause of this issue would have been extremely difficult, if not impossible.

Looking to reclaim your time? Get started today with AWS Observability, which you can deploy in minutes via the
CloudFormation template or Terraform. Learn more and start your trial here.

Lightning-fast troubleshooting for AWS: How to find the root cause fast with Sumo Logic
