
Resource Center

Browse our library of ebooks, solutions briefs, research reports, case studies, webinars and more.

Blog resources

Blog

How Doximity solved the high operational overhead of their Elastic Stack with Sumo Logic

Blog

5 business reasons why every CIO should consider Kubernetes

Blog

Exploring Nordcloud’s Promise to Deliver 100 Percent Alert-Based Security Operations to Customers

Blog

Pokemon Co. International and Sumo Logic's Joint Journey to Build a Modern Day SOC

The world is changing. The way we do business, the way we communicate, and the way we secure the enterprise are all vastly different today than they were 20 years ago. This natural evolution of technology innovation is powered by the cloud, which has not only freed teams from on-premises security infrastructure, but has also provided them with the resources and agility needed to automate mundane tasks. The reality is that we have to automate in the enterprise if we are to remain relevant in an increasingly competitive digital world. Automation and security are a natural pairing, and when we think about the broader cybersecurity skills gap, we really should be thinking about how we can replace simple tasks through automation to make way for teams and security practitioners to be more innovative, focused and strategic.

A Dynamic Duo

That's why Sumo Logic and our partner, The Pokemon Co. International, are all in on bringing together the tech and security innovations of today and using those tools and techniques to completely redefine how we do security operations, starting with creating a new model for how a security operations center (SOC) should be structured and how it should function. So how exactly are we teaming up to build a modern day SOC, and what does it look like in terms of techniques, talent and tooling? We'll get into that, and more, in this blog post.

Three Pillars of the Modern Day SOC

Adopt Military InfoSec Techniques

The first pillar is all about mindset and adopting a new level of rigor and way of thinking about security. Both the Sumo Logic and Pokemon security teams are built on the backbone of a military technique called the OODA loop, originally coined by U.S. Air Force fighter pilot and Pentagon consultant John Boyd in the late twentieth century. Boyd created the OODA loop to implement a change in military doctrine that focused on an air-to-air combat model. OODA stands for observe, orient, decide and act, and Boyd's thinking was that if you followed this model and ensured that your OODA loop was faster than your adversary's, then you'd win the conflict.

Applying that to today's modern security operations, all of the decisions made by your security leadership — whether around the people, processes or tools you're using — should be aimed at reducing your OODA loop to a point where, when a situation happens, or when you're preparing for one, you can easily follow the protocol: observe the behavior, orient yourself, make effective and efficient decisions, and then act upon those decisions. Sound familiar? This approach is almost identical to most current incident response and security protocols, because we live in an environment where every six, 12 or 24 months we're seeing tactics and techniques change. That's why the SOC of the future is going to depend on a security team's ability to break down barriers and abandon older schools of thought for faster decision-making models like the OODA loop. This model is also applicable across an organization to encourage teams to be more efficient and collaborative cross-departmentally, and to move faster and with greater confidence in order to achieve mutually beneficial business goals.

Build and Maintain an Agile Team

But it's not enough to have the right processes in place. You also need the right people, collectively and transparently working toward the same shared goal.
Historically, security has been full of naysayers, but it's time to shift our mindset to one of transparency and enablement, where security teams are plugged into other departments and able to move forward with their programs as quickly and as securely as they can without creating bottlenecks. This dotted-line approach is how Pokemon operates, and it has allowed the security team to share information horizontally, which empowers development, operations, finance and other cross-functional teams to also move forward in true DevSecOps spirit. One of the main reasons this new and modern Sumo Logic security team structure has been successful is that it enables each function — data protection/privacy, SOC, DevSecOps and federal — to work in unison not only with each other, but also cross-departmentally.

In addition to knowing how to structure your security team, you also need to know what to look for when recruiting new talent. Here are three tips from Pokemon's Director of Information Security and Data Protection Officer, John Visneski:

Go Against the Grain. Unfortunately, there are no purple security unicorns out there. Instead of searching for the "ideal" security professional, go against the grain. Find people with the attitude and aptitude to succeed, regardless of direct security experience. The threat environment is changing rapidly, and burnout can happen fast, which is why it's more important to have someone on your team with those two qualities. Why? No one can know everything about security, and sometimes you have to adapt and throw old rules and mindsets out the window.

Prioritize an Operational Mindset. QA and test engineers are good at automation and at finding gaps in seams, skills that are very applicable to security. Some of Pokemon's best security engineers didn't know a thing about security before joining the company, but they brought valuable skill sets. Find talent pools that know how the sausage is made. The best and brightest security professionals often didn't even start out in security; their value add is that they are problem solvers first, and security pros second.

Think Transparency. The goal is to get your security team to a point where they're sharing information at a rapid enough pace and integrating themselves with the rest of the business. This allows core functions to help solve each other's problems and share use cases, and it can only be successful if you create a culture that is open and transparent.

The bottom line: Don't be afraid to think outside of the box when it comes to recruiting talent. It's more important to build a team based on want, desire and rigor, which is why bringing in folks with military experience has been vital to both Sumo Logic's and Pokemon's security strategies. Security skills can be learned. What delivers real value to a company are people who have a desire to be there, a thirst for knowledge and the capability to execute on the job.

Build a Modern Day Security Stack

Now that you have your process and your people, you need your third pillar: tool sets. This is the Sumo Logic reference architecture that empowers us to be more secure and agile. You'll notice that all of these providers are either born in the cloud or open source. The Sumo Logic platform is at the core of this stack, but it's these partnerships and tools that enable us to deliver our cloud-native machine data analytics as a service, and to provide SIEM capabilities that easily prioritize and correlate sophisticated security threats in the most flexible way possible for our customers.
We want to grow and transform with our own customers' modern application stacks and cloud architectures as they digitally transform. Pokemon has a very similar approach to its security stack: the driving force behind Pokemon's modern toolset is the move away from the old-school customer mentality of presenting a budget and asking for services. The customer-vendor relationship needs to mirror a two-way partnership with mutually invested interests and clear benefits on both sides. Three vendors — AWS, CrowdStrike and Sumo Logic — comprise the core base of the Pokemon security platform, and the remainder of the stack is modular in nature. This plug-and-play model is key as the security and threat environments continue to evolve, because it allows for flexibility in swapping new vendors and tools in and out as they come along. As long as the foundation of the platform is strong, the rest of the stack can evolve to match the current needs of the threat landscape.

Our Ideal Model May Not Be Yours

We've given you a peek inside the security kimono, but it's important to remember that every organization is different, and what works for Pokemon or Sumo Logic may not work for every particular team dynamic. While you can use our respective approaches as a guide to implement your own modern day security operations, the biggest takeaway here is to find a framework that is appropriate for your organization's goals and that will help you build success and agility within your security team and across the business. The threat landscape is only going to grow more complex, technologies more advanced and attackers more sophisticated. If you truly want to stay ahead of those trends, then you've got to be progressive in how you think about your security stack, teams and operations. Regardless of whether you run an on-premises, hybrid or cloud environment, the industry and the business are going to leave you no choice but to adopt a modern application stack whether you want to or not.

Additional Resources

Learn about Sumo Logic's security analytics capabilities in this short video. Hear how Sumo Logic has teamed up with HackerOne to take a DevSecOps approach to bug bounties in this SnapSecChat video. Learn how Pokemon leveraged Sumo Logic to manage its data privacy and GDPR compliance program and improve its security posture.

Blog

The 3 Phases Pitney Bowes Used to Migrate to AWS

Blog

Finding and Debugging Memory Leaks with Sumo

Memory leaks happen when programs allocate more memory than they return. Alongside compute, memory is one of the critical assets of any computer system. If a machine runs out of memory, it cannot provide its service; in the worst case, the entire machine might crash and tear down all running programs. The bugs responsible for that misbehavior are often hard to find. Sumo's collector enables monitoring memory consumption out of the box. Using some additional tooling, it is possible to collect fine-grained logs and metrics that accelerate finding and efficiently debugging memory leaks.

Memory Management and Memory Leaks

Memory management is done on multiple levels: the operating system (OS) keeps track of memory allocated by its programs in kernel space, while in user space, virtual machines like the JVM might implement their own memory management component. At its core, memory management follows a producer-consumer pattern. The OS or VM gives away (produces) chunks of memory whenever programs request (consume) memory. Since memory is a finite resource in any computer system, programs have to release allocated memory, which is then returned to the pool of available memory managed by the producer. For some applications the programmer is responsible for releasing memory; in others, like the JVM, a thread called the garbage collector collects all objects that are no longer used.

A healthy system runs through this give-and-take in a perfect circle. In a bad system, the program fails to return unused memory. This happens, for example, if the programmer forgets to call the function free, or if some objects keep being referenced from a global scope after usage. In that case, new operations will allocate more memory on top of the already allocated, but unused, memory. This misbehavior is called a memory leak. Depending on the size of the objects, a leak can be as little as a few bytes or kilobytes, or even megabytes if the objects, for example, contain images. Depending on how frequently the erroneous allocation is called, the free space can fill up in as little as a few microseconds, or it can take months to exhaust the memory of a server. This long time-to-failure can make memory leaks very tricky to debug, because it is hard to track an application running over a long period. Moreover, if the leak is just a few bytes, this marginal amount gets lost in the noise of common allocation and release operations, and the usual observation period might be too short to recognize a trend.

This article describes a particularly interesting instance of a memory leak. The example uses the Akka actor framework, but for simplicity you can think of an actor as an object. The specific operation in this example is downloading a file:

An actor is instantiated when the user invokes a specific operation (download a file).
The actor accumulates memory over its lifetime (it keeps adding to the temporary file in memory).
After the operation completes (the file has been saved to disk), the actor is not released.

The root cause of the memory leak is that the actor can handle only one request and is useless after saving the content of the file, yet it is never released. There are no references to the actor in the application code, but there is still a parent-child relationship defined in the actor system that defines a global scope.
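The article's example is an Akka actor in Scala, which isn't reproduced here; as a rough, hypothetical Python analogue of the same leak shape (a per-request worker held by a global parent scope and never released), consider:

```python
# Hypothetical sketch of the leak pattern described above; the article's real
# example is an Akka actor, and all names below are made up for illustration.

class DownloadWorker:
    """Handles exactly one download and buffers the file in memory."""

    def __init__(self, url):
        self.url = url
        self.buffer = bytearray()            # grows while the download runs

    def download(self):
        self.buffer.extend(b"x" * 100_000)   # stand-in for fetched bytes
        # ... write self.buffer to disk ...
        # Bug: the worker is finished now, but nothing removes it from REGISTRY.

# Global "parent" scope, analogous to the actor system's parent-child links.
REGISTRY = []

def handle_request(url):
    worker = DownloadWorker(url)
    REGISTRY.append(worker)                  # parent keeps a reference forever
    worker.download()

for i in range(100):
    handle_request(f"https://example.com/file{i}")

# Every finished worker (and its buffer) is still reachable, so none can be
# garbage collected: the program leaks one buffer per completed download.
print(f"workers still referenced: {len(REGISTRY)}")
```

The fix mirrors the article's conclusion: release the worker once its single request is done (here, remove it from REGISTRY; in Akka, stop the actor).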
From After-the-Fact Analysis to Online Memory Supervision

Usually, when a program runs out of memory, it terminates with an "Out of Memory" error or exception. In the case of the JVM, it will create a heap dump on termination. A heap dump is an image of the program's memory at the instant of termination, saved to disk. This heap dump file can then be analyzed using tools such as MemoryAnalyzer, YourKit, or VisualVM for the JVM. These tools are very helpful for identifying which objects are consuming what memory. They operate, however, on a snapshot of the memory and cannot keep track of the evolution of memory consumption; verifying that a patch works is out of scope for these tools. With a little scripting, we can remedy this and use Sumo to build an "Online Memory Supervisor" that stores and processes this information for us. In addition to keeping track of the memory consumption history of our application, it saves us from juggling heap dump files that can potentially become very large. Here's how we do it:

1. Mechanism to interrogate the JVM for current objects and their size

The JVM provides an API for creating actual memory dumps during runtime, or for just retrieving a histogram of all current objects and their approximate size in memory. We want the latter, as it is much more lightweight. The jmap tool in the Java SDK makes this interface accessible from the command line:

jmap -histo PID

Getting the PID of the JVM is as easy as grepping for it in the process table. Note that if the JVM runs as a server using an unprivileged user, we need to run the command as this user via su. A bash one-liner to dump the object histogram could look like:

sudo su stream -c "jmap -histo `ps -ax | grep '[0-9]* java' | awk '{print $1}'` > /tmp/${HOSTID}_jmap-histo-`date +%s`.txt"

2. Turn the result into metrics for Sumo, or just drop it as logs

As a result of the previous operation, we now have a file containing a table with object names, counts, and retained memory. In order to use it in Sumo we need to submit it for ingestion. Here we have two options: (a) send the raw file as logs, or (b) convert the counts to metrics. Each object's measurement becomes part of a time series tracking the evolution of that object's memory consumption. Sumo Metrics ingests various time series input formats; we'll use Graphite because it's simple. To perform the conversion of a jmap histogram to Graphite, we use bash scripting. The script cuts off the beginning and end of the file and then parses the histogram to produce two measurements:

<class name, object count, timestamp>
<class name, retained size, timestamp>

Sending these measurements to Sumo can be done through Sumo's collector, using collectd with the Sumo plugin, or by sending directly to the HTTP endpoint. For simplicity, we've used the Graphite format and target the Sumo collector. To be able to differentiate both measurements as well as different hosts, we prepend this information to the classpath:

<count|size>.<host>.classpath

For example, a jmap histogram might contain data in tabular form like:

69: 18 1584 akka.actor.ActorCell
98: 15 720 akka.actor.RepointableActorRef
103: 21 672 akka.actor.ChildActorPath
104: 21 672 akka.actor.Props

Our script turns that into Graphite format and adds some more hierarchy to the package name. In the next section, we will leverage this hierarchy to perform queries on object counts and sizes.
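The post refers to a small conversion script without listing it. The sketch below shows what such a script might look like in Python; the `jmap -histo` column layout and the Graphite plaintext format are standard, but the host name and the count./size. prefixes are assumptions rather than the authors' exact code.

```python
#!/usr/bin/env python3
"""Convert a `jmap -histo` dump into Graphite plaintext lines (sketch)."""
import re
import sys
import time

HOST = "memleak1"   # hypothetical host identifier
# jmap histogram rows look like: "  69:        18        1584  akka.actor.ActorCell"
ROW = re.compile(r"^\s*\d+:\s+(\d+)\s+(\d+)\s+(\S+)")

def to_graphite(histo_path, timestamp=None):
    ts = int(timestamp or time.time())
    with open(histo_path) as f:
        for line in f:
            m = ROW.match(line)
            if not m:                        # skip header, separator and Total rows
                continue
            count, size, klass = m.groups()
            yield f"count.{HOST}.{klass} {count} {ts}"
            yield f"size.{HOST}.{klass} {size} {ts}"

if __name__ == "__main__":
    for metric in to_graphite(sys.argv[1]):
        print(metric)   # pipe stdout to the collector, e.g. `| nc -q0 localhost 2003`
```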
count.memleak1.akka.actor.ActorCell 18 123
count.memleak1.akka.actor.RepointableActorRef 15 123
count.memleak1.akka.actor.ChildActorPath 21 123
count.memleak1.akka.actor.Props 21 123

In our case, we'll just forward these logs to the Sumo collector. Previously, we defined a Graphite source for Metrics. Then it's as easy as cat histogram-in-graphite | nc -q0 localhost 2003.

3. Automate processing via Ansible and StackStorm

We are now capable of creating a fine-grained measurement of an application's memory consumption using a couple of shell commands and scripts. Using the DevOps automation tools Ansible and StackStorm, we can turn this manual workflow into an Online Memory Supervision System. Ansible helps us automate taking the measurement on multiple hosts. For each individual host, it connects via ssh, runs the jmap command and the Python conversion script, and submits the measurement to Sumo. StackStorm manages this workflow for us: at a given interval it kicks off Ansible and logs the process, and in case something goes wrong, it defines remediation steps. Of course, there are alternatives to the myriad of available tools. Ansible competes with SaltStack, Chef, and Puppet. StackStorm is event-driven automation with all the bells and whistles; for this example, we could have used a shell script with sleep or a simple cron job.

Using Sumo to Troubleshoot Memory Leaks

Now it's time to use Sumo to analyze our memory. In the previous steps, we submitted and ingested our application's fine-grained memory consumption data. After this preparation, we can leverage Sumo to query the data and build dashboards. Using queries, we can perform in-depth analysis. This is useful as part of a post-mortem analysis to track down a memory leak, or during development to check whether a memory allocation/deallocation scheme actually works. During runtime, dashboards can monitor critical components of the application.

Let's check this out with a live example. We'll use a setup of three JVMs simulating an application, plus a StackStorm instance. Each runs in its own Docker container, simulating a distributed system. To make our lives easier, we orchestrate this demo setup using Vagrant:

Figure 1: Memory leak demo setup and control flow

A Memory Measurement node orchestrates the acquisition process. We've developed a short Ansible script that connects to several application nodes and retrieves a histogram dump from the JVMs running the faulty program from [1]. It converts the dumps to Graphite metrics and sends them via the collector to Sumo. StackStorm periodically triggers the Ansible workflow. Finally, we use the UI to find and debug memory leaks.

Analyze memory consumption

First, we want to get an overview of what's going on in memory, so we start by looking at the total memory consumption of a single host. A simple sum over all object sizes yields the application's memory consumption over time. The steeply increasing curve abruptly comes to an end at a total of about 800 MB. This is the total memory that we assigned to the JVM (java -Xmx800m -jar memleak-assembly-0.1.jar).

Figure 2: Total memory consumption of host memleak3

Drilling down on the top memory consumers often points to the classes responsible for a memory leak. For that query, we parse out all objects and sum their counts and sizes. Then we display only the top 10 counts. In the size query, we filter out objects above a certain size; these objects are the root objects of the application and do not contain much information.
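The figures that follow show these aggregations in Sumo. Purely as an offline sanity check of the same idea (not the Sumo query language), the two aggregations could be reproduced over the raw Graphite lines like this, assuming the count./size. naming used above:

```python
# Sketch: offline analogues of the "total memory" and "top consumers" queries,
# run over Graphite lines such as "size.memleak1.akka.actor.ActorCell 1584 123".
from collections import defaultdict

def parse(lines):
    for line in lines:
        path, value, ts = line.split()
        kind, host, klass = path.split(".", 2)   # count|size, host, class name
        yield kind, host, klass, int(value), int(ts)

def total_memory_over_time(lines, host):
    """Sum of all object sizes per timestamp for one host (cf. Figure 2)."""
    totals = defaultdict(int)
    for kind, h, _klass, value, ts in parse(lines):
        if kind == "size" and h == host:
            totals[ts] += value
    return dict(sorted(totals.items()))

def top_consumers(lines, host, ts, n=10):
    """Top-n classes by retained size at one timestamp (cf. Figures 3 and 4)."""
    sizes = defaultdict(int)
    for kind, h, klass, value, t in parse(lines):
        if kind == "size" and h == host and t == ts:
            sizes[klass] += value
    return sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)[:n]
```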
Figure 3: Top memory consumers on a single node

Figure 4: Top memory consumers by size

We find out that a red-black tree dominates the objects. Looking at the Scala manual tells us that HashMaps make extensive use of this data structure: "Scala provides implementations of immutable sets and maps that use a red-black tree internally. Access them under the names TreeSet and TreeMap." We know that the ActorSystem uses HashMaps to store and maintain actors.

Parsing and aggregating queries help to monitor entire subsystems of a distributed application. We use that to find out that the ActorSystem accumulates memory not only on a single host but over a set of hosts. This leads us to believe that the increase might not be an individual error, but a systemic issue.

Figure 5: Use query parsing and aggregation operations to display the ActorSystem's memory consumption

A more detailed view of the Child actor reveals the trend of how it accumulates memory. The trick in this query is that in the search part we filter on the akka.actor.* packages in the search expression, and then use the aggregation part to parse out the individual hosts and sum the size values of their objects. Since all three JVMs started at the same time, their memory usage increases at a similar rate in this picture. We can also split this query into three separate queries, as below, looking at how the Child actors on all three hosts are evolving.

Figure 6: The bad Child actor accumulating memory

Finally, we verify that the patch worked. The latest chart shows that allocation and deallocation are now in balance on all three hosts.

Figure 7: Memory leak removed, all good now

Memory Analysis for Modern Apps

Traditional memory analyzers were born in the era of standalone desktop applications. Therefore, they work on snapshots and heap dumps and cannot track the dynamics of memory allocation and deallocation patterns. Moreover, they are restricted to working on single images, and it is not easy to adapt them to a distributed system. Modern apps have different requirements. Digital businesses provide service 24/7, scale out in the cloud, and compete in terms of feature velocity. To achieve feature velocity, detecting memory issues online is more useful than after the fact. Bugs such as memory leaks need to be detected rapidly and bugfixes rolled out frequently, without stopping services. Pulling heap dumps and starting memory analyzers just won't work in many cases.

Sumo takes memory analysis to the next level. Leveraging Sumo's Metrics product, we can track memory consumption for classes and objects within an application. We look at aggregations of their counts and sizes to pinpoint the fault. Memory leaks are often hard to find and need superior visibility into an application's memory stack to become debuggable. Sumo achieves this not only for a single instance of an application but scales memory analysis across the cloud. Additionally, Sumo's Unified Logs and Monitoring (ULM) enables correlating logs and metrics and facilitates understanding the root cause of a memory leak.

Bottom Line

In this post, we showed how to turn Sumo into a fine-grained, online memory supervision system using modern DevOps tools. The fun doesn't stop here: the presented framework can easily be extended to include metrics for threads and other resources of an application. As a result of this integration, developers and operators gain high visibility into the execution of their application.
References

[1] Always stop unused Akka actors – Blog Post
[2] Acquire object histograms from multiple hosts – Ansible Script
[3] Sumo's Modern Apps report – BI Report

Blog

Customer Blog: OpenX

Blog

Designing a Data Analytics Strategy for Modern Apps

Yesterday at AWS re:Invent 2016, Sumo Logic co-founder and CTO Christian Beedgen presented his vision for machine data analytics in a world where modern apps are disrupting virtually every vertical market in business. Every business is a software business, Marc Andreessen wrote more than five years ago. Today, driven by customer demand, the need to differentiate and the push for agility, digital transformation initiatives are disrupting every industry. "We are still at the very beginning of this wave of digital transformation," Christian said. "By 2020 half of all businesses will have figured out digitally enhanced products and services."

The result is that modern apps are being architected differently than they were just three years ago. Cloud applications are being built on microservices by DevOps teams that automate to deliver new functionality faster. "It used to be that you could take the architecture and put it on a piece of paper with a couple of boxes and a couple of arrows. Our application architecture was really clean." But with this speed and agility comes complexity, and the need for visibility has become paramount. "Today our applications look like spaghetti. Building microservices, wiring them up, integrating them so they can work with something else, foundational services, SQL databases, NoSQL databases…" You need to be able to see what's going on, because you can't fix what you cannot see. Modern apps require continuous intelligence to provide insights, continuously and in real time, across the entire application lifecycle.

Designing Your Data Analytics Strategy

Ben Newton, Sumo Logic's Principal Product Manager for the Metrics team, took the stage to look at the various types of data and what you can do with them. Designing a data analytics strategy begins by understanding the data types that machine data produces, then focusing on the activities that data supports. The primary activities are monitoring, where you detect and notify (or alert), and troubleshooting, where you identify, diagnose, restore and resolve. "What we often find is that users can use that same data to do what we call app intelligence – the same logs and metrics that allow you to figure out something is broken also tell you what your users are doing. If you know what users are doing, you can make life better for them, because that's what really matters."

So who really cares about this data? When it comes to monitoring, where the focus is on user-visible functionality, it's your DevOps and traditional IT ops teams; engineering and development are also responsible for monitoring their code. In troubleshooting apps, where the focus is on end-to-end visibility, customer success and technical support teams also become stakeholders. For app intelligence, where the focus is on user activity and visibility, everyone is a stakeholder, including sales, marketing and product management. "Once you have all of this data, all of these people are going to come knocking on your door," said Ben.

Once you understand the data types you have, where they live within your stack and the use cases, you can begin to use data to solve real problems. In defining what to monitor and measure, Ben highlighted:

Monitor what's important to your business and your users.
Measure and monitor user-visible metrics.
Build fewer, higher-impact, real-time monitors.

"Once you get to the troubleshooting side, it gets back to you can't fix what you can't measure." Ben also made these points:

You can't improve what you can't measure.
You need both activity metrics and detailed logs.
Up-to-date data drives better data-driven decisions.
You need data from all parts of your stack.

So what types of data will you be looking at? Ben broke it down into the following categories:

Infrastructure: rollups vs. detailed. What resolution makes sense? Is real time necessary?
Platform: rollups vs. detailed. Coverage of all components, detailed logs for investigations, and architecture in the metadata.
Custom: How is your service measured? What frustrates users? How does the business measure itself?

"Everything you have produces data. It's important to ensure you have all of the components covered." Once you have all of your data, it's important to think about the metadata. Systems are complex, and the way you make sense of them is through your metadata, which you use to describe or tag your data. "For the customer, this is the code you wrote yourself. You are the only people that can figure out how to monitor that. So one of the things you have to think about is the metadata."

Cloud Cruiser – A Case Study

Cloud Cruiser's lead DevOps engineer, Ben Abrams, took the stage to show how the company collects data and to provide some tips on tagging it with metadata. Cloud Cruiser is a SaaS app that enables you to easily collect, meter, and understand your cloud spend in AWS, Azure, and GCP. Cloud Cruiser's customers are large enterprises and mid-market players globally distributed across all verticals, and they manage hundreds of millions in cloud spend.

Cloud Cruiser had been using an Elastic (Elasticsearch, Logstash, and Kibana) stack for their log management solution. They discovered that managing their own logging solution was costly and burdensome. Ben cited the following reasons for making a change:

Operational burden was a distraction to the core business.
Improved security.
Ability to scale, plus cost.

Cloud Cruiser runs on AWS (300-500 instances) and utilizes microservices written in Java using the Dropwizard framework. Their front-end web app runs on Tomcat and uses AngularJS. Figure 1 shows the breadth of the technology stack.

In evaluating a replacement solution, Ben said, "We were spending too much time on our ELK stack." Sumo Logic's Unified Logs and Metrics (ULM) was also a distinguishing factor: the inclusion of metrics meant that they didn't have to employ yet another tool that would likewise have to be managed. "Logs are what you look at when something goes wrong. But metrics are really cool." Ben summarized the value and benefits they achieved this way:

Logs: reduced operational burden, reduced cost, increased confidence in log integrity, the ability to reduce the number of people needing VPN access, and alerting based on searches that did not need ops handholding.
Metrics: increased visibility into system and application health, used in an ongoing effort with application and infrastructure changes in which they were able to reduce their monthly AWS bill by over 100%.

Ben then moved into a hands-on session, showing how they automate the configuration and installation of Sumo Logic collectors, and how they tag their data using source categories. Cloud Cruiser currently collects data from the following sources:

Chef: automation of config and collector install.
Application Graphite metrics from Dropwizard.
Other Graphite metrics forwarded by Sensu to Sumo Logic.

"When I search for something I want to know what environment is it, what type of log is it, and which server role did it come from." One of their decisions was to differentiate log data from metrics data, as shown below.
Using this schema allows them to search logs and metrics by environment, type of log data and corresponding Chef role. Ben walked through the Chef cookbook they used for deploying with Chef and shared how they automate the configuration and installation of Sumo Logic collectors. For those interested, I'll follow up on this in the DevOps Blog. A key point from Ben, though, was "Don't log secrets." The access ID and key should be defined elsewhere, out of scope, and stored in an encrypted data bag. Ben also walked through the searches they used to construct the following dashboard. Through this one dashboard, Cloud Cruiser can use both metrics and log data to get an overview of the health of their production deployment.

Key Takeaways

Designing your data analytics strategy is highly dependent on your architecture. Ultimately it's about the experience you provide to your users: it's no longer just about troubleshooting issues in production environments, it's also about understanding the experience you deliver. The variety of data streaming in real time from the application, operating environment and network layers produces an ever-increasing volume of data every day. Log analytics provides the forensic data you need, and time-series-based metrics give you insights into the real-time changes taking place under the hood. To understand both the health of your deployment and the behavior and experience of your customers, you need to gather machine data from all of its sources, then apply both logs and metrics to give teams from engineering to marketing the insights they need. Download the slides and view the entire presentation below:
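The post doesn't include Cloud Cruiser's exact naming scheme, but a source category that encodes environment, log type and Chef role (the three things Ben says he wants to search by) could be generated at collector-install time along these lines. The field names follow the collector's JSON source format shown in the Trek10 post later on this page; the category layout itself is an assumption.

```python
import json

def sumo_source(env, log_type, chef_role, path):
    """Build one local-file source whose category encodes env/type/role (sketch)."""
    return {
        "sourceType": "LocalFile",
        "name": f"{env}_{chef_role}_{log_type}",
        "pathExpression": path,
        # Searching by _sourceCategory=prod/app/web-frontend then answers
        # "what environment, what type of log, and which server role?"
        "category": f"{env}/{log_type}/{chef_role}",
    }

config = {
    "api.version": "v1",
    "sources": [sumo_source("prod", "app", "web-frontend", "/var/log/tomcat/*.log")],
}
print(json.dumps(config, indent=2))
```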

Blog

Customers Share their AWS Logging with Sumo Logic Use Cases

In June, Sumo Dojo (our online community) launched a contest to learn more about how our customers are using Amazon Web Services like EC2, S3, ELB, and AWS Lambda. The Sumo Logic service is built on AWS and we have deep integration with Amazon Web Services, and as an AWS Technology Partner we've collaborated closely with AWS to build apps like the Sumo Logic App for Lambda. So we wanted to see how our customers are using Sumo Logic to do things like collecting logs from CloudWatch to gain visibility into their AWS applications. We thought you'd be interested in hearing how others are using AWS and Sumo Logic, too, so in this post I'll share their stories along with announcing the contest winner.

The contest narrowed down to two finalists. SmartThings, a Samsung company, operates in the home automation industry and provides access to a wide range of connected devices to create smarter homes that enhance the comfort, convenience, security and energy management for the consumer. WHOmentors, Inc., our second finalist, is a publicly supported scientific, educational and charitable corporation, and fiscal sponsor of Teen Hackathon. The organization is, according to their site, "primarily engaged in interdisciplinary applied research to gain knowledge or understanding to determine the means by which a specific, recognized need may be met." At stake was a DJI Phantom 3 Drone. All entrants were awarded a $10 Amazon gift card.

AWS Logging Contest Rules

The drone winner was selected based on the following criteria:

You had to be a user of Sumo Logic and AWS.
To enter the contest, a comment had to be placed on this thread in Sumo Dojo.
The post could not be anonymous – you were required to log in to post and enter.
Submissions closed August 15th.

As noted in the Sumo Dojo posting, the winner would be selected based on our own editorial judgment and community reactions to the post (in the form of comments or "likes"), to pick the one that's most interesting, useful and detailed.

SmartThings

SmartThings has been working on a feature to enable over-the-air (OTA) firmware updates of Zigbee devices on users' home networks. For the uninitiated, Zigbee is an IEEE specification for a suite of high-level communication protocols used to create personal area networks with small, low-power digital radios; see the Zigbee Alliance for more information. According to one of the firmware engineers at SmartThings, there are a lot of edge cases and potential points of failure for an OTA update, including:

The cloud platform
An end user's hub
The device itself
Power failures
RF interference on the mesh network

Disaster in this scenario would be a user's device ending up in a broken state. As Vlad Shtibin related:

"Our platform is deployed across multiple geographical regions, which are hosted on AWS. Within each region we support multiple shards, and within each shard we run multiple application clusters. The bulk of the services involved in the firmware update are JVM-based application servers that run on AWS EC2 instances. Our goal for monitoring was to be able to identify as many of these failure points as possible and implement a recovery strategy. Identifying these points is where Sumo Logic comes into the picture. We use a key-value logger with a specific key/value for each of these failure points, as well as a correlation ID for each point of the flow. Using Sumo Logic, we are able to aggregate all of these logs by passing the correlation ID when we make calls between the systems.
We then created a search query (eventually a dashboard) to view the flow of the firmware updates as they went from our cloud down to the device and back up to the cloud to acknowledge that the firmware was updated. This query parses the log messages to retrieve the correlation ID, hub, device, status, firmware versions, etc. These values are then fed into a Sumo Logic transaction, enabling us to easily view the state of a firmware update for any user in the system at a micro level, and the overall health of all OTA updates at the macro level. Depending on which part of the infrastructure the OTA update failed in, engineers are then able to dig deeper into the specific EC2 instance that had a problem. Because our application servers produce logs at the WARN and ERROR level, we can see if the update failed because of a timeout from the AWS ElastiCache service, or from a problem with a query on AWS RDS. Having quick access to logs across the cluster enables us to identify issues across our platform regardless of which AWS service we are using."

As Vlad noted, this feature is still being tested and hasn't been rolled out fully in production yet. "The big takeaway is that we are much more confident in our ability to identify updates, triage them when they fail and ensure that the feature is working correctly, because of Sumo Logic."

WHOmentors.com

WHOmentors.com, Inc. is a nonprofit scientific research organization and the 501(c)(3) fiscal sponsor of Teen Hackathon. To facilitate their training in languages like Java, Python, and Node.js, each individual participant begins with the Alexa Skills Kit, a collection of self-service application program interfaces (APIs), tools, documentation and code samples that make it fast and easy for teens to add capabilities to Alexa-enabled products such as the Echo, Tap, or Dot.

According to WHOmentors.com CEO Rauhmel Fox, "The easiest way to build the cloud-based service for a custom Alexa skill is by using AWS Lambda, an AWS offering that runs inline or uploaded code only when it's needed and scales automatically, so there is no need to provision or continuously run servers. With AWS Lambda, WHOmentors.com pays only for what it uses. The corporate account is charged based on the number of requests for created functions and the time the code executes. While the AWS Lambda free tier includes one million free requests per month and 400,000 gigabyte (GB)-seconds of compute time per month, it becomes a concern when the students create complex applications that tie Lambda to other expensive services, or when their Lambda programs grow too large.

Ordinarily, someone would be assigned to use Amazon CloudWatch to monitor and troubleshoot the serverless system architecture and multiple applications using existing AWS system, application, and custom log files. Unfortunately, there isn't a central dashboard to monitor all created Lambda functions. With the integration of a single Sumo Logic collector, WHOmentors.com can automatically route all Amazon CloudWatch logs to the Sumo Logic service for advanced analytics and real-time visualization using the Sumo Logic Lambda functions on GitHub."

Using the Sumo Logic Lambda Functions

"Instead of a 'pull data' model, the Sumo Logic Lambda function grabs files and sends them to the Sumo Logic web application immediately.
Their online log analysis tool offers reporting, dashboards, and alerting, as well as the ability to run specific advanced queries as needed. The real-time log analysis combination of the Sumo Logic Lambda function helps me quickly catch and troubleshoot performance issues, such as the request rate of concurrent executions from both stream-based and non-stream-based event sources, rather than having to wait hours to identify whether there was an issue. I am most concerned about AWS Lambda limits (i.e., code storage) that are fixed and cannot be changed at this time. By default, AWS Lambda limits the total concurrent executions across all functions within a given region to 100. Why? The default limit is a safety limit that protects the corporate account from costs due to potential runaway or recursive functions during initial development and testing. As a result, I can quickly determine the performance of any Lambda function and clean up the corporate account by removing Lambda functions that are no longer used, or figure out how to reduce the code size of the Lambda functions that should not be removed, such as apps in production."

The biggest relief for Rauhmel is that he is able to encourage the trainees to focus on coding their applications instead of pressuring them to worry about the logs associated with the Lambda functions they create.

And the Winner of the AWS Logging Contest Is…

Just as at the end of an epic World Series battle between two MLB teams, you sometimes wish both could be declared the winner. Alas, there can be only one. We looked closely at the use cases, which were very different from one another. Weighing factors like the breadth of usage of the Sumo Logic and AWS platforms added to the drama. While SmartThings uses Sumo Logic broadly to troubleshoot and prevent failure points, WHOmentors.com's use case is specific to AWS Lambda. But we couldn't ignore the cause of helping teens learn to write code in popular programming languages and build skills that may one day lead them to a job. Congratulations to WHOmentors.com. Your drone is on its way!
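The Sumo Logic Lambda functions WHOmentors.com uses live on GitHub and aren't reproduced in the post. As a rough sketch of the general pattern they implement (forwarding a CloudWatch Logs subscription event to a hosted HTTP source), a hypothetical handler might look like this; the endpoint environment variable and header are assumptions, and the official functions add batching, retries and richer metadata.

```python
# Hypothetical sketch of a CloudWatch Logs -> Sumo Logic forwarder.
# Not the official Sumo Logic Lambda function; the endpoint URL is a placeholder.
import base64
import gzip
import json
import os
import urllib.request

SUMO_HTTP_ENDPOINT = os.environ["SUMO_HTTP_ENDPOINT"]  # hosted HTTP source URL

def handler(event, context):
    # CloudWatch Logs subscriptions deliver a base64-encoded, gzipped payload.
    payload = gzip.decompress(base64.b64decode(event["awslogs"]["data"]))
    data = json.loads(payload)

    # One line per log event, so each message can be parsed separately downstream.
    body = "\n".join(e["message"] for e in data["logEvents"]).encode("utf-8")

    req = urllib.request.Request(
        SUMO_HTTP_ENDPOINT,
        data=body,
        headers={"X-Sumo-Name": data.get("logStream", "lambda")},
    )
    with urllib.request.urlopen(req) as resp:
        return {"status": resp.status, "forwarded": len(data["logEvents"])}
```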

Blog

Sending JMX Metrics to Sumo Logic Unified Logs and Metrics

This is a brief excerpt of how Mayvenn started sending JMX metrics to the Sumo Logic Unified Logs and Metrics solution. For the full blog post, please visit Mayvenn's engineering blog. We've been using Sumo Logic for logs and were excited to have one tool and dashboard to visualize logs and metrics! In order to wire this up, we decided to use jmxtrans to regularly pipe the JMX metrics we query, with Graphite-formatted output, to the new Sumo Logic collectors. These collectors can essentially be thought of as a hosted version of Graphite.

Step 1: Upgrade/Install Sumo Logic Collectors

There are a lot of guides out there on this one, but just in case you have existing collectors, they do need to be updated to have the new Graphite source.

Step 2: Add a Graphite Source for the Collector

This step can either be done in the Sumo Logic dashboard or through a local file for the collector that configures the sources. Either way, you will need to decide what port to run the collector on and whether to use TCP or UDP. For our purposes, the standard port of 2003 is sufficient, and we don't have an extremely high volume of metrics with network/CPU concerns to justify UDP. For configuring this source in the dashboard, the Sumo Logic guide to adding a Graphite source does a pretty thorough walkthrough. To summarize, though, the steps are pretty simple: go to the collector management page, select the relevant collector, click add source, choose the Graphite source and configure it with the port and TCP/UDP choices. This method is certainly a fast way to try out Sumo Logic metrics.
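Once the Graphite source is listening, anything that can write Graphite's plaintext protocol ("metric.path value timestamp") to that port can feed it. Before wiring up jmxtrans, a quick way to verify the source is a few lines of Python; the metric name and port below are assumptions matching the setup described above.

```python
import socket
import time

def send_graphite(metric, value, host="localhost", port=2003):
    """Write one Graphite plaintext line to the collector's Graphite source."""
    line = f"{metric} {value} {int(time.time())}\n"
    with socket.create_connection((host, port)) as sock:
        sock.sendall(line.encode("ascii"))

# e.g. a fake JMX heap reading to confirm the source shows up in Sumo
send_graphite("jvm.test.heap.used", 123456789)
```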

Blog

CI/CD, Docker and Microservices - by JFrog and Sumo Logic’s Top Developers

Blog

Confessions of a Sumo Dashboard Addict

When you think of Ulta Beauty, you most likely think of makeovers, not technology. But in fact, we've also been doing makeovers of our technology, bringing together all the technical groups that touch our guests (application development, operations, e-commerce, plus our off-shore teams) under one organization to drive the best possible guest experience. For those of you who aren't familiar with Ulta Beauty, we're the fastest growing beauty retailer, with both a brick-and-mortar and an online presence.

Back in 2014, we experienced some challenges with online guest order fulfillment during the holiday season (our busiest time of the year). Our business partners simply lacked visibility into inventory levels during peak season. We identified the problem in advance but, due to time constraints, weren't able to resolve it, so we took a novel approach, using machine data analytics for better visibility. We knew we needed a solution to streamline operations and proactively identify product trends. We selected Sumo Logic to help us get a real-time view of our order throughput so that we could manually bring down order levels if throughput went too high.

In my role as VP of Guest-facing Services, I rely on deep technical knowledge and business sense to make sure our applications are running smoothly. Sumo Logic was easy enough for us to manage on our own. It's flexible and simple but powerful, which enables me to ensure my business stakeholders are empowered with the information they need to be successful. Fast forward to holiday season 2015. We not only improved our backend systems but also expanded and operationalized Sumo Logic with our DevOps, app dev and business partner teams. We created multiple dashboards and reports to identify hot-selling items, what's trending, inventory levels and more. This was huge for our business partners, who could then consciously make inventory business decisions on their own. The biggest impact of rolling out Sumo Logic has been the ability to impact the guest experience in a positive way and effectively manage the channel.

I confess that I became a bit of a Sumo Logic dashboard addict. During the holiday season, if I was out and about, I would check the mobile view so frequently that I blew my cellular data plan. What's next for Ulta and Sumo Logic? We're expanding our use of Sumo and validating new business use cases for point-of-sale, network infrastructure and warehouse management systems. With Sumo's assistance, we're creating an enterprise application performance management roadmap that incorporates Sumo Logic's machine data analytics to ensure the maximum reliability and stability of our business-critical systems. Now that's a beautiful makeover!

Blog

Delivering Analytics Behind the Analytics at Dodge Data & Analytics

If you're not a builder or architect, you may not be familiar with Dodge Data & Analytics. We help building product manufacturers, general contractors and subcontractors, architects and engineers to size markets, prioritize prospects, strengthen market positions and optimize sales strategies. Simply put, we build the analytics engine for the builder community. In our industry, it's important that we deliver a consistent level of operational availability in order to best serve our clients. We didn't have a solution in place for machine data analytics and needed a way to make better use of our logs and time-series metrics data to quickly surface and remediate known and unknown issues.

Sumo Logic was a great choice for helping us monitor and manage our system's operational metrics, based on ease of deployment, maintenance and support, and powerful search queries. Sumo Logic's machine data analytics platform allows our teams to accurately correlate meaningful information to provide root cause analysis and operational behavior patterns. Literally with one click of a button we have access to our data, giving us better real-time insights into our own infrastructure, so we can in turn serve up better insights for our customers. Sumo Logic calls this continuous intelligence, and it's our approach as well.

Tighter organizational collaboration is also important to us. Sumo Logic is helping to bring together various teams within Dodge, including our IT operations team, DevOps, DBAs and incident managers. It provides a single version of the truth for monitoring and troubleshooting so that we work better together and solve problems faster. And isn't that what it's all about?

About the bloggers: Doug Sheley is Director of Infrastructure Systems. Jay Yeras is a system administrator at Dodge Data & Analytics, unofficially known as "Doer of Things".

Blog

Carsales Drives Confidently into the Cloud with Sumo Logic

I always love talking to customers and hearing how they're using Sumo Logic to help solve challenges within their organizations, particularly those that are in the middle of their journey to moving workloads and data to the cloud. Without fail, I'm always surprised to learn how hard the day-to-day was for IT teams, and how, by taking a cloud-native approach to managing log data analytics, they're able to open up a whole new world of insights and intelligence that really impacts their business.

I recently spoke with one of our newest customers in the Asia Pacific region, carsales. One of Australia's leading automotive classifieds websites (think of the equivalent of CarFax, TraderOnline or Craigslist here in the U.S.), carsales services both consumers and more than 6,000 dealers across the country. As you can imagine, the company experiences a huge amount of website traffic and processes more than 450 million searches and over 12.5 billion image downloads. As a growing enterprise, carsales had long been looking to transition from a legacy data center to the cloud. Interestingly, this journey became a priority when their executive team asked about their disaster recovery plan. "We originally started moving our infrastructure to the cloud because our site traffic varies greatly throughout the day – no one day is the same. The cloud is perfect for allowing us to adjust our footprint as necessary. It also made it easy for us to develop a solid disaster recovery plan without having to pay for and manage separate data centers," said Michael Ridgway, director of engineering for Ryvuss at carsales.com.

The carsales team quickly discovered that retrieving logs manually from machines wasn't practical, so they started looking for a log management solution. One of their non-negotiable requirements for this solution was to avoid managing any additional infrastructure or software. Since moving to Sumo Logic, the carsales team is now in the driver's seat and has gained operational visibility across their entire infrastructure stack, along with new insights into application health and performance. "With Sumo Logic we've just scratched the surface. Our entire development team now has real-time access to our log applications and can see trending metrics over time. As a result, we can now put the power in the hands of the people who can actually fix the problem. Our average response times have decreased from hours to minutes and we can detect and resolve issues before they have the potential to impact our customers." For more information on how carsales is getting value from Sumo Logic, check out the full case study.

Blog

Why Twitter Chose Sumo Logic to Address PCI Compliance

Blog

The Digital Universe and PCI Compliance – A Customer Story

According to IDC, the digital universe is growing at 40% a year, and will continue to grow well into the next decade. It is estimated that by 2020, the digital universe will contain nearly as many digital bits as there are stars in the universe. To put this into perspective, the data we create and copy annually will reach 44 zettabytes, or 44 trillion gigabytes. In 2014 alone, the digital universe will equal 1.7 megabytes a minute for every person on earth. That is a lot of data! As a new employee at Sumo Logic, I've had the opportunity to come in contact with a lot of people in my first few weeks – employees, customers and partners. One interaction with a global, multi-billion-dollar travel powerhouse really stood out for me, as they are a great example of an organization grappling with massive growth in an ever-expanding digital universe.

The Business

The travel company provides a world-class product-bookings engine and delivers fully customized shopping experiences that build brand loyalty and drive incremental revenue. The company is also responsible for safeguarding the personal data and payment information of millions of customers. "Customer security and being compliant with PCI DSS is essential to our business" was echoed many times.

The Challenge

As a result of phenomenal growth in their business, the volume of ecommerce transactions and logs produced was skyrocketing, more than doubling from the previous year. The company was processing over 5 billion web requests per month, generating on average close to 50 GB of daily log data across 250 production AWS EC2 instances. It became clear that an effective solution was required to enable the company to handle this volume of data more effectively. Current manual processes using syslog and other monitoring tools were not manageable, searchable or scalable, and it was very difficult to extract actionable intelligence. Additionally, this effort was extremely time intensive and diverted limited resources from focusing on more important areas of the business: driving innovation and competitive differentiation.

PCI Compliance: The ability to track and monitor all access to network resources and cardholder data (PCI DSS Requirement 10) was of particular importance. This is not surprising, as logging mechanisms and the ability to track user activities are critical in minimizing the impact of a data compromise. The presence of and access to log data across the AWS infrastructure is critical to provide the necessary tracking, alerting and analysis when something goes wrong.

The Solution

While multiple solutions were considered – including Splunk, Loggly and the ELK stack – the company selected Sumo Logic for its strong time to value, feature set, and low management overhead. Additionally, the security attestations, including PCI DSS 3.0 Service Provider Level 1, as well as data encryption controls for data at rest and in motion, were levels above what other companies provided. Being able to not worry about the execution environment – handled by Sumo Logic – and focus on extracting value from the service was extremely valuable.

The Results

The most important immediate benefits for the client included being able to reduce the time, cost and complexity of their PCI audit. They were also able to leverage the platform for IT ops and development use cases, reducing mean time to investigate (MTTI) and mean time to resolve (MTTR) by over 75%.
As I was wrapping up our conversation, I asked if they had any "aha moments" in leveraging the Sumo Logic platform and dealing with this exponential growth in their digital universe. Their response was: "I've been really impressed with how fast the team has been able to identify and resolve problems. Sumo Logic's solution has helped us change the playing field in ways that were just not possible before."

To learn more about Sumo Logic's compliance & security solutions for AWS, please visit: http://www.sumologic.com/aws-trial
To try Sumo Logic for free, please visit: http://www.sumologic.com/pricing

Blog

Monitoring SumoLogic Usage for Faster Issue Resolution

Guest blog post by Ethan Culler-Mayeno, TechOps Engineer at LogicMonitor.

On the LogicMonitor technical operations team, we love our logs. We know that logs are more than just "those files you need to delete to clear a disk space alert." Logs provide patterns of normal behavior, and therefore make it easier to identify anomalies or changes in these patterns. Here at LogicMonitor, while our own product is the primary method used for monitoring performance and identifying issues, we also use logs as a tool to investigate (or better yet, prevent) issues.

Recently, one of our servers managed to find its way into an email loop with an auto-responder. Unfortunately, in this game of digital chicken, it was our machine which reached its port 25 saturation point first. This email loop resulted in degraded performance for other applications running on that machine. Now you may be saying something like "Bummer, but just put in a rule to discard mail from the auto-responder and Bob's your uncle." While that is certainly the correct way to address this issue, let's take a step back – how would you identify the issue? In most troubleshooting scenarios, fixing the issue is easy. It's finding the issue that is the hard (and often time-consuming) part. How would you know that your machine was getting blown up by an email responder if you got an alert for HTTPS performance issues for a web application on that machine? Well, now I am sure you're guessing something along the lines of "the logs have the answer!" But mail server logs are not usually the first place you look when investigating a web service performance issue. So let's take a look at how we were able to use LogicMonitor to identify this issue.

Our team uses SumoLogic as a log analysis tool. SumoLogic provides an excellent API for programmatically performing queries against logs, which allows our team to monitor and alert on subsets of our logs. We alert on specific events and exceptions, but we also use a groovy-based LogicModule (created by our engineers) that uses SumoLogic's API to monitor the rate of log messages being written per device. Below is a graph for that datasource that shows the total number of log entries written for the server that was hit by the aforementioned mail loop. Because we were trending the number of log messages, as soon as we started looking at the performance of that server in LogicMonitor, it was very clear that we needed to investigate the logged messages in SumoLogic for details of the issue – which immediately led us to the mail loop, and a quick resolution.

Monitoring your logs at a high level can fill in the pieces that content-based log monitoring might miss. In many cases logs do not contain content that would cause a watcher (or monitoring solution) to bat an eye. However, when a device is logging 30x as many messages per minute as normal, it's pretty safe to say that there is something wrong. You can download the SumoLogic LogicModule we used – SumoLogic_Logs_Per_Host – from our core repository, by selecting "Settings..Datasources…Add.. From LogicMonitor Repository" from within LogicMonitor. (Some more information about it is available on this help page.) You can also easily modify it to track, graph and alert on other data from SumoLogic. Let us know if you have other cool ways to tie into SumoLogic, or other logging systems, too!
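The LogicModule itself is Groovy and isn't shown in the post, but the underlying idea (flag a host whose per-minute log volume far exceeds its recent baseline) is easy to sketch. In the sketch below the message-count fetch is a stub and the 30x threshold is borrowed from the example above; both are assumptions rather than LogicMonitor's actual implementation.

```python
from statistics import mean

def fetch_messages_per_minute(host, minutes):
    """Stub: in practice this would query the log platform's search API
    for a per-minute count of messages written by `host`."""
    raise NotImplementedError

def log_rate_alert(host, baseline_minutes=60, threshold_multiple=30):
    """Return True if the latest minute's log volume is anomalously high."""
    counts = fetch_messages_per_minute(host, baseline_minutes + 1)
    baseline, latest = counts[:-1], counts[-1]
    normal = mean(baseline) or 1          # avoid divide-by-zero on quiet hosts
    return latest > threshold_multiple * normal
```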

Blog

How Trek10 Uses Sumo Logic to Monitor Amazon EC2 Container Service

Guest Blog Post by Andrew Warzon, Founder at Trek10, and Jared Short, Director of DevOps at Trek10

You've probably heard of Docker by now – maybe you've toyed with Dockerizing your app, or maybe you're even running Docker in staging or production (we are!). For us, Docker means we can focus on building solid infrastructure and monitoring without having to worry about managing all sorts of things, like application-specific dependencies, bootstrapping scripts, baking AMIs, or environment management. Docker also enables high-fidelity parity from dev through to production, since everything runs in the same Docker containers. No more "but it works on my machine" bugs, or nasty dependency surprises between staging and production.

With the explosive growth of Docker, it is no surprise that AWS announced the EC2 Container Service (ECS) in November 2014. ECS entered general availability in April 2015, and at Trek10 we have been running apps in ECS since May. ECS enables you to take your lovingly Dockerized applications and processes and distribute them across a cluster. It handles tasks like load balancing, rolling deploys, service/application updates, healing, and more. Only because of Docker & ECS can we confidently say, "Running eight different applications on a cluster is no different than running one or two applications." Powerful stuff!

As great as Docker & ECS are, one of the biggest hurdles we faced was logging in a reliable, easy-to-manage way that lets Docker & ECS do what they do best, with minimal engineering, configuration, and resource overhead. Unfortunately, collecting logs from a container has no accepted, perfect solution; there are a lot of options out there. Our primary goal is simplicity: we just want the logging problem to get out of the way so we can push ahead with building value. We've opted for a relatively simple solution: installing the Sumo collector on the ECS host and using mounted volumes.

For the impatient, here is the quick summary of how we make this work:

1. Install the Sumo agent unattended on the ECS host with user data.
2. Make sure your sumosources.json file points to a new directory, like /tmp/logs, which you will map into your containers.
3. Make sure your logs inside the container are written to some directory, like /tmp/clogs.
4. Use the ECS task definition to map /tmp/clogs to /tmp/logs/mycontainer.

Here is some more detail on each step.

Step 1: Write a script to install Sumo on the host unattended. We will run this script in the EC2 user data. User data is EC2's way of letting you run scripts upon launching an instance. This way, we can customize the host without maintaining our own AMIs; we simply add this script to the existing ECS AMI. There are many ways to accomplish this, but our script does the following:

- Copy some configs out of an S3 bucket, including Sumo access keys and a Sumo sources JSON file.
- Create /etc/sumo.conf and /etc/sumosources.json on the host machine.
- Actually install the Sumo collector.
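For illustration only (the real bootstrap is a shell script run from EC2 user data), here is a rough Python sketch of the "create /etc/sumosources.json" step: filling in the placeholders and writing the file. It assumes the template was already copied out of S3 and that the remaining values are exported as environment variables by an earlier step; the template path and variable names here are hypothetical. The file it renders is shown next.

# Hypothetical sketch of rendering /etc/sumosources.json at instance boot.
# Assumes CLIENT_NAME, ECS_CLUSTER, LOGDIR, MULTILINE and APP_REGEX were exported
# earlier in the bootstrap, and that the template was copied down from S3.
import os
import string
import urllib.request

# The instance ID comes from the EC2 instance metadata service.
with urllib.request.urlopen(
    "http://169.254.169.254/latest/meta-data/instance-id", timeout=2
) as resp:
    instance_id = resp.read().decode()

# Hypothetical template path; the template uses ${VAR} placeholders, which
# string.Template substitutes directly.
with open("/tmp/sumosources.json.template") as f:
    template = string.Template(f.read())

rendered = template.safe_substitute(
    CLIENT_NAME=os.environ["CLIENT_NAME"],
    ECS_CLUSTER=os.environ["ECS_CLUSTER"],
    INSTANCE_ID=instance_id,
    LOGDIR=os.environ.get("LOGDIR", "/tmp/logs"),
    MULTILINE=os.environ.get("MULTILINE", "false"),
    APP_REGEX=os.environ.get("APP_REGEX", ".*"),
)

# User data runs as root, so writing to /etc is fine here.
with open("/etc/sumosources.json", "w") as f:
    f.write(rendered)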
The key here is the sumosources.json file. Here is ours:

{
  "api.version": "v1",
  "sources": [
    {
      "sourceType": "LocalFile",
      "name": "${CLIENT_NAME}_${ECS_CLUSTER}-ecs_apps",
      "pathExpression": "${LOGDIR}/**",
      "category": "${CLIENT_NAME}_${ECS_CLUSTER}-ecs",
      "hostName": "${CLIENT_NAME}_${INSTANCE_ID}",
      "useAutolineMatching": false,
      "multilineProcessingEnabled": ${MULTILINE},
      "manualPrefixRegexp": "${APP_REGEX}",
      "timeZone": "UTC",
      "automaticDateParsing": true,
      "forceTimeZone": false,
      "defaultDateFormat": "MMM dd HH:mm:ss"
    },
    {
      "sourceType": "LocalFile",
      "name": "${CLIENT_NAME}_${ECS_CLUSTER}-ecs_messages",
      "pathExpression": "/var/log/messages",
      "category": "${CLIENT_NAME}_${ECS_CLUSTER}-ecs",
      "hostName": "${CLIENT_NAME}_${INSTANCE_ID}",
      "useAutolineMatching": false,
      "multilineProcessingEnabled": false,
      "timeZone": "UTC",
      "automaticDateParsing": true,
      "forceTimeZone": false,
      "defaultDateFormat": "MMM dd HH:mm:ss"
    },
    {
      "sourceType": "LocalFile",
      "name": "${CLIENT_NAME}_${ECS_CLUSTER}-ecs_secure",
      "pathExpression": "/var/log/secure",
      "category": "${CLIENT_NAME}_${ECS_CLUSTER}-ecs",
      "hostName": "${CLIENT_NAME}_${INSTANCE_ID}",
      "useAutolineMatching": false,
      "multilineProcessingEnabled": false,
      "timeZone": "UTC",
      "automaticDateParsing": true,
      "forceTimeZone": false,
      "defaultDateFormat": "MMM dd HH:mm:ss"
    }
  ]
}

Note the pathExpression setting in the first source – this is the key. We define $LOGDIR to be some path on the host instance where we will later put our logs, and this config simply says to push anything in that directory into Sumo.

Step 2: Pick some directory inside the container where your logs will exist. How you accomplish this will vary significantly based on your application. We point ours to a separate directory inside the container, /tmp/clogs. One key tip here: if whatever you are doing is different than how you would usually run this container, use the ECS task definition "Command" field to override the default command for your container.

Step 3: Mount your volumes with the ECS task definition. Here we are basically telling ECS to map all of the log files from inside the container (/tmp/clogs in our case) to the location outside the container where Sumo will be looking for log files, as defined in sumosources.json.

In the ECS task definition, this is done with two pieces. First, you must define a volume: the path on the host that will now be available to be mapped into containers. In the AWS Management Console, this is edited in the "Task Definition Builder" GUI. One key note here: make sure that this source path is a subdirectory of $LOGDIR as defined in sumosources.json, and that the subdirectory is unique for each container you define across all task definitions in your cluster. This way, any given host can run an arbitrary number of containers and an arbitrary number of tasks, and Sumo will get all of the logs and keep them separate.

The second piece of the task definition is the "mount points" section of each container defined in your task definition. Use the volume name defined above, and map it to the log path inside the container.
If you prefer to write the JSON for the task definition directly rather than using the Task Definition Builder, here is a generic task definition with these two pieces:

{
  "family": "my-container",
  "containerDefinitions": [
    {
      "name": "MyContainer",
      "image": "python",
      "cpu": 400,
      "memory": 800,
      "entryPoint": [],
      "environment": [
        {
          "name": "MY_VAR",
          "value": "foo"
        }
      ],
      "command": [],
      "portMappings": [
        {
          "hostPort": 8080,
          "containerPort": 8080
        }
      ],
      "volumesFrom": [],
      "links": [],
      "mountPoints": [
        {
          "sourceVolume": "logs",
          "containerPath": "/tmp/clogs",
          "readOnly": false
        }
      ],
      "essential": true
    }
  ],
  "volumes": [
    {
      "name": "logs",
      "host": {
        "sourcePath": "/tmp/logs/mycontainer"
      }
    }
  ]
}

So that's it… a simple, low-maintenance, and flexible way to get all of your logs from ECS-run Docker containers into Sumo Logic. Good luck!
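If you manage task definitions from code rather than from the console, the same two pieces map directly onto the ECS API. Here is a minimal boto3 sketch (our own illustration, not part of Trek10's post) that registers the generic task definition above; it assumes your AWS credentials and region are already configured.

# Minimal sketch: register the generic task definition above via the ECS API.
# Assumes AWS credentials and region are configured (environment, profile, or role).
import boto3

ecs = boto3.client("ecs")

response = ecs.register_task_definition(
    family="my-container",
    containerDefinitions=[
        {
            "name": "MyContainer",
            "image": "python",
            "cpu": 400,
            "memory": 800,
            "portMappings": [{"hostPort": 8080, "containerPort": 8080}],
            # The mount point maps the host volume to /tmp/clogs inside the
            # container, which is where the application writes its logs.
            "mountPoints": [
                {"sourceVolume": "logs", "containerPath": "/tmp/clogs", "readOnly": False}
            ],
            "essential": True,
        }
    ],
    # The volume exposes a host path under $LOGDIR, so the Sumo collector on the
    # ECS host picks up everything the container writes there.
    volumes=[{"name": "logs", "host": {"sourcePath": "/tmp/logs/mycontainer"}}],
)

print(response["taskDefinition"]["taskDefinitionArn"])

Whether you register it from code or paste the JSON into the console, the volume-plus-mount-point pairing is what lets the host-level Sumo collector see every container's logs without any per-container collector configuration.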