Building Cross-platform Mobile Apps

The way we all experience and interact with apps, devices, and data is changing dramatically. End users demand that apps are responsive, stable, and offer the same experience no matter which platform they are using. To meet that demand, many developers consider building cross-platform apps. Although building a separate native app per platform is the preferred approach for mass-market consumer apps, there are still many situations where it makes more sense to go cross-platform. In this post I'll look at the most popular strategies a developer faces when building a mobile app, and some tools that help you build well.

Mobile Web Apps

This is probably the easiest way onto a mobile device. Mobile web apps are hosted on a remote server and built using the same technologies as desktop web apps: HTML5, JavaScript, and CSS. The primary difference is that they are accessed via the mobile device's built-in web browser, which may require you to apply responsive web design principles so that the user experience is not degraded by the limited screen size. This can be costly to build and maintain: applying responsive design principles to a site can be a significant fraction of the cost of developing a mobile app.

Native Mobile Apps

Native apps are developed using the device's out-of-the-box SDK. This is a huge advantage, as you have full access to the device's APIs, features, and inter-app integration. However, it also means you need to learn Java to build apps for Android, Objective-C for iOS, and C# for Windows Phone. Whether you are a single developer or part of a multi-skilled team, learning to code in multiple languages is costly and time-consuming. And, most of the time, not all features will be available on every platform.

Cross-Platform Mobile Apps

Cross-platform apps have somewhat of a reputation for not being competitive with native apps, but we continue to see more and more world-class apps using this strategy. Developers only have to maintain a single code base for all platforms, they can reuse the same components across platforms, and, most importantly, they can still access native APIs via native modules. Below are some tools that support building cross-platform apps.

PhoneGap

Owned by Adobe, PhoneGap is a free tool that packages HTML5, CSS, and JavaScript code as a mobile app. Once the app is ready, the community can help review it, and all major platforms are supported, including BlackBerry.

Xamarin.Forms

With a free starter option, Xamarin.Forms is a great tool for C# developers to build cross-platform apps, with the option of accessing each native platform's API. It also offers a wide store of components to help you reach your goal faster. Xamarin has created a robust cross-platform mobile development platform that has been adopted by big names like Microsoft, Foursquare, IBM, and Dow Jones.

Unity 3D

This tool is mainly focused on building games, and it is very useful when graphics are the most important detail. It goes beyond simple translation: after developing your code in UnityScript or C#, you can export your games to 17 different platforms, including iOS, Android, Windows, Web, PlayStation, Xbox, Wii, and Linux.

When it comes to building an app, whether cross-platform or not, views always differ. My preference is cross-platform for one main reason: it is less time-consuming, which is critical because I can then focus on adding new features to the app, or on building another one.

About the Author

Mohamed Hasni is a Software Engineer focusing on end-to-end web and mobile development and delivery. He has deep experience in building line-of-business applications for large-scale enterprise deployments.


Sumo Logic Expands into Japan to Support Growing Cloud Adoption

In October of last year, I joined Sumo Logic to lead sales and go-to-market functions with the goal of successfully launching our newly established Japan region in Tokyo. The launch was well received by our customers, partners, prospects and peers in the Japanese market, and everyone walked away from the event optimistic about the future and hungry for more! It was certainly an exciting time, not only for the company but for me personally, and as I reflect on the past few months here, I wanted to share a little bit about why the company's launch in Japan came at a very strategic and opportune time, and why Sumo Logic is a great fit for the market.

Market Opportunity

In terms of overall IT spend and market size, Japan remains the second largest enterprise technology market in the world behind the U.S. A large part of that is service spending rather than traditional hardware and software. For years, Japan had been a half step or more behind in cloud technology innovation and adoption, but that has since changed. Now, Japan is experiencing a tsunami of cloud adoption, with major influence from Amazon Web Services (AWS), which has aggressively invested in building data centers in Japan over the past several years. The fact that AWS began heavily investing in the Japanese technology market was a strong indication to us at Sumo Logic that, as we continue to expand our global footprint, the time was finally right to capitalize on this market opportunity.

Sumo Logic Opportunity

Market opportunity aside, the nature of our SaaS machine data analytics platform and the services we provide across operations, security and the business is a perfect fit for the needs of innovating Japanese enterprises. I've been here in Tokyo for over 30 years, so I feel with confidence that it was our moment to shine in Japan.
From a sales perspective, we're very successful with a land-and-expand approach: we start with only a small subset of the business, and then gradually grow into other areas (operations, security and the business) as we continue to deliver great customer experiences that demonstrate long-term value and impact. That level of trust building and attentiveness, which we provide to our global customer base, is very typical of how Japanese enterprises like to conduct business. In other words, the core business model and approach of Sumo Logic are immediately applicable to the Japanese market. Anyone with experience in global IT will understand the simple but powerful meaning of this: Sumo Logic's native affinity with Japan is an enormous anomaly. And Japan can be a very unforgiving market. It's not a place where you want to come with half-baked products or a smoke-and-mirrors approach. Solid products, solutions and hard work, on the other hand, are highly respected.

Vertical Focus

As I mentioned above, the Japan market is mostly enterprise, which is a sweet spot for Sumo Logic, and there's also a heavy concentration of automotive and internet of things (IoT) companies here. In fact, four of the world's largest automotive companies are headquartered in Japan, and their emerging autonomous driving needs align directly with the real-time monitoring, troubleshooting and security analytics capabilities that are crucial for modern innovations around connected cars and IoT, both of which generate massive amounts of data. Customers like Samsung SmartThings, Sharp and Panasonic leverage our platform for DevSecOps teams that want to visualize, build, run and secure that data. The connected car today has become less about the engine and more about the driver experience, which is 100 percent internet-enabled.
Japan is also one of the two major cryptocurrency exchange centers in the world, which is why financial services, and especially fintech, bitcoin and cryptocurrency companies, are another focus vertical for Sumo Logic Japan. Our DevSecOps approach and cloud-native, multi-tenant platform provide mission-critical operations and security analytics capabilities for crypto companies. Most of these financial services companies are struggling to stay on top of increasingly stringent regulatory and data requirements, and one of the biggest use cases for these industries is compliance monitoring. Japan is regulatory purgatory, so our customers look to us to help automate parts of their compliance checks and security audits.

Strong Partner Ecosystem

Having a strong partner ecosystem was another very important piece of our overall go-to-market strategy in Japan. We were very fortunate to have forged an early partnership with AWS Japan that led to an introduction to one of their premium consulting partners, Classmethod, the first regional partnership with AWS APN. The partnership is already starting to help Japanese customers maximize their investment in the Sumo Logic platform by providing locally guided deployment, support and storage in AWS. In addition, Sumo Logic provides the backbone for Classmethod's AWS infrastructure, providing the continuous intelligence needed to serve and expand their portfolio of customers. Going forward, we'll continue to grow our partner ecosystem with the addition of service providers, telecoms, MSPs and, for security, MSSPs to meet our customers' evolving needs.

Trusted Advisor

At the end of the day, our mission is to help our customers continue to innovate and to provide support in the areas where they most need it: economically visualizing data across their modern application stacks and cloud infrastructures. We're in a position to help all kinds of Japanese customers across varying industries modernize their architectures.
Japanese customers know that they need to move to the cloud and continue to adopt modern technologies. We're in the business of empowering our customers to focus on their core competencies while they leave the data component to us. By centralizing all of this disparate data into one platform, they can better understand their business operations, be more strategic and focus on growing their business. We've gone beyond selling a service to becoming both data steward and trusted data advisor for our customers. Japanese business is famous for its organic partnering model (think of supply chain management, and so on), and Sumo Logic's core strategy of pioneering machine data stewardship is a natural extension of this to meet the rapidly evolving needs of the digital economy in Japan. Now that we have a local presence with a ground office and support team, we can deliver a better and more comprehensive experience to new and existing customers, like Gree and OGIS-RI, and we look forward to continued growth and success in this important global market.

Additional Resources

Read the press release for more on Sumo Logic's expansion into Japan. Download the white paper on cloud migration. Download the "State of Modern Apps & DevSecOps in the Cloud" report.

AWS

January 31, 2019

Recapping the Top 3 Talks on Futuristic Machine Learning at Scale By the Bay 2018

As discussed in our previous post, we recently had the opportunity to present some interesting challenges and proposed directions for data science and machine learning (ML) at the 2018 Scale By the Bay conference. While the excellent talks and panels at the conference were too numerous to cover here, I wanted to briefly summarize three talks that I found to represent some particularly interesting directions for ML on the Java virtual machine (JVM).

Talk 1: High-performance Functional Bayesian Inference in Scala, by Avi Bryant (Stripe)

Probabilistic programming lies at the intersection of machine learning and programming languages, where the user directly defines a probabilistic model of their data. This formal representation has the advantage of neatly separating conceptual model specification from the mechanics of inference and estimation, with the intention that this separation will make modeling more accessible to subject matter experts while allowing researchers and engineers to focus on optimizing the underlying infrastructure. Rainier is an open-source library in Scala that allows the user to define their model and do inference in terms of monadic APIs over distributions and random variables. Some key design decisions are that Rainier is "pure JVM" (i.e., no FFI) for ease of deployment, and that the library targets single-machine (i.e., not distributed) use cases but achieves high performance via the nifty technical trick of inlining training data directly into dynamically generated JVM bytecode using ASM.

Talk 2: Structured Deep Learning with Probabilistic Neural Programs, by Jayant Krishnamurthy (Semantic Machines)

Machine learning examples and tutorials often focus on relatively simple output spaces. Is an email spam or not? Binary outputs: Yes/No, 1/0, +/-. What is the expected sale price of a home? Numerical outputs: $1M, $2M, $5M (this is the Bay Area, after all!).
However, what happens when we want our model to output a more richly structured object? Say that we want to convert a natural language description of an arithmetic formula into a formal binary tree representation that can then be evaluated; for example, "three times four minus one" would map to the binary expression tree "(- (* 3 4) 1)". The associated combinatorial explosion in the size of the output space makes brute-force enumeration and scoring infeasible. The key idea of this approach is to define the model outputs in terms of a probabilistic program (which allows us to concisely define structured outputs), but with the probability distributions of the random variables in that program parameterized in terms of neural networks (which are very expressive and can be efficiently trained). The talk consisted mostly of live coding, using an open-source Scala implementation of a monadic API for a function from neural network weights to a probability distribution over outputs.

Talk 3: Towards Typesafe Deep Learning in Scala, by Tongfei Chen (Johns Hopkins University)

For a variety of reasons, the most popular deep learning libraries, such as TensorFlow and PyTorch, are primarily oriented around the Python programming language. Code using these libraries consists mostly of transformation or processing steps applied to n-dimensional arrays (ndarrays). It is easy to accidentally introduce bugs by confusing which of the n axes you intended to aggregate over, mismatching the dimensionalities of two ndarrays you are combining, and so on. These errors occur at run time and can be painful to debug. This talk proposes a collection of techniques for catching these issues at compile time via type safety in Scala, and walks through an example implementation in an open-source library.
The mechanics of the approach are largely based on type-level programming constructs and ideas from the shapeless library, although you don't need to be a shapeless wizard to simply use the library, and the corresponding paper demonstrates how some famously opaque compiler error messages can be made more meaningful for end users.

Conclusion

Aside from being great, well-delivered talks, several factors made these presentations particularly interesting to me. First, all three had associated open-source Scala libraries. There is no substitute for actual code when it comes to exploring implementation details and trying out an approach on your own test data sets. Second, these talks shared a common theme of using the type system and API design to supply a higher-level mechanism for specifying modeling choices and program behaviors. This can make end-user code easier to understand, and it can unlock opportunities for the underlying machinery to automatically do work on your behalf in terms of error checking and optimization. Finally, all three talks illustrated interesting connections between statistical machine learning and functional programming patterns, which I find promising as a longer-term direction for building practical machine learning systems.

Additional Resources

Learn how to analyze Killer Queen game data with machine learning and data science with Sumo Logic Notebooks. Interested in working with the Sumo Logic engineering team? We're hiring! Check out our open positions. Sign up for a free trial of Sumo Logic.
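The monadic model-definition style that Talk 1 describes can be sketched with a toy example. To be clear, this is a hedged illustration, not the Rainier API: `RandomVariable` and `normal` are invented names, and real probabilistic programming libraries perform actual inference rather than the naive forward sampling shown here.

```scala
import scala.util.Random

// Toy sketch (hypothetical names, not the Rainier API): a sampling-based
// "random variable" monad, showing monadic model definition.
final case class RandomVariable[A](sample: Random => A) {
  def map[B](f: A => B): RandomVariable[B] =
    RandomVariable(rng => f(sample(rng)))
  def flatMap[B](f: A => RandomVariable[B]): RandomVariable[B] =
    RandomVariable(rng => f(sample(rng)).sample(rng))
}

object RandomVariable {
  def normal(mu: Double, sigma: Double): RandomVariable[Double] =
    RandomVariable(rng => mu + sigma * rng.nextGaussian())
}

// A tiny "model": two independent standard-normal priors combined
// monadically with a for-comprehension.
val prediction: RandomVariable[Double] = for {
  slope     <- RandomVariable.normal(0.0, 1.0)
  intercept <- RandomVariable.normal(0.0, 1.0)
} yield slope * 10.0 + intercept

val rng   = new Random(42)
val draws = Seq.fill(10000)(prediction.sample(rng))
val mean  = draws.sum / draws.size // empirically close to 0 for this model
```

The point of the sketch is the separation the talk emphasizes: the for-comprehension specifies the model, while the sampling (or, in a real library, inference) machinery lives behind the `map`/`flatMap` interface.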

The Insider’s Guide to Sumo Cert Jams

What are Sumo Cert Jams?

Sumo Logic Cert Jams are one- and two-day training events held in major cities all over the world to help you ramp up your product knowledge, improve your skills and walk away with a certification confirming your product mastery. We started doing Cert Jams about a year ago to help educate our users around the world on what Sumo can really do, and to give you a chance to network and share use cases with other Sumo Logic users. Not to mention, you get a t-shirt. So far, we've had over 4,700 certifications from 2,700+ unique users across 650+ organizations worldwide. And we only launched the Sumo Cert Jam program in April! If you're still undecided, check out this short video where our very own Mario Sanchez, Director of the Sumo Logic Learn team, shares why you should get the credit and recognition you deserve!

Currently there are four Sumo Logic certifications: Pro User, Power User, Power Admin, and Security User. These are offered in a choose-your-own-adventure format: everyone starts with the Pro User certification to learn the fundamentals, and then you can take any of the remaining exams depending on your interest in DevOps (Power User), Security, or Admin. For a more detailed breakdown of the different certification levels, check out our web page or our Top Reasons to Get Sumo Certified blog.

What's the Value?

Customers often ask me one-on-one what the value of certification is, and I tell them that we have seen significant gains in user understanding, operator usage and search performance once users get certified. (Pictured: our first Cert Jam in Delhi, India, with members of the Bed Bath & Beyond team showing their certification swag.) First, there's the ability to rise above "mere mortals" (those who haven't been certified) and write better and more complex queries.
From parsing to correlation, there's a significant increase among certified users taking Pro (Level 1), Power User (Level 2), Admin (Level 3) and Security. Certified users take advantage of more Sumo Logic features, not only getting more value out of their investment but also creating more efficient, performant queries. From a more general perspective, once you know how to write better queries and dashboards, you can create the kind of custom content you want. When it comes to monitoring and alerting, certified users are more likely to create dashboards and alerts to stay on top of what's important to their organizations, further benefiting from Sumo Logic as part of their daily workload. Certified users show an increase in the creation of searches, dashboards and alerts, as well as in key optimization features such as Field Extraction Rules (FERs), scheduled views and partitions.

Join Us

If you're looking to host a Cert Jam at your company and have classroom space for 50, reach out to our team. We are happy to work with you to see if we can host one in your area. If you're looking for ways to get certified, or know someone who would benefit, check out our list of upcoming Cert Jams. Don't have Sumo Logic but want to get started? Sign up for Sumo Logic for free! (Pictured: our Cert Jam hosted by Tealium in May. Everyone was so enthusiastic to be certified.)

Careful Data Science with Scala

This post gives a brief overview of some ideas we presented at the recent Scale By the Bay conference in San Francisco; for more details you can see a video of the talk or take a look at the slides.

The Problems of Sensitive Data and Leakage

Data science and machine learning have gotten a lot of attention recently, and the ecosystem around these topics is moving fast. One significant trend has been the rise of data science notebooks (including our own here at Sumo Logic): interactive computing environments that allow individuals to rapidly explore, analyze, and prototype against datasets. However, this ease and speed can compound existing risks. Governments, companies, and the general public are increasingly alert to the potential issues around sensitive or personal data (see, for example, GDPR). Data scientists and engineers need to continuously balance the benefits of data-driven features and products against these concerns. Ideally, we'd like technological assistance that makes it easier for engineers to do the right thing and avoid unintended data processing or revelation.

Furthermore, there is also a subtle technical problem known in the data mining community as "leakage". Kaufman et al. won the best paper award at KDD 2011 for "Leakage in Data Mining: Formulation, Detection, and Avoidance," which describes how it is possible to (completely by accident) allow your machine learning model to "cheat" because of unintended information leaks in the training data contaminating the results. This can produce machine learning systems that work well on sample datasets but whose performance is significantly degraded in the real world. It can be a major problem, especially in systems that pull data from disparate sources to make important predictions. Oscar Boykin of Stripe presented an approach to this problem at Scale By the Bay 2017 using functional-reactive feature generation from time-based event streams.
Information Flow Control (IFC) for Data Science

My talk at Scale By the Bay 2018 discussed how we might use Scala to encode notions of data sensitivity, privacy, or contamination, thereby helping engineers and scientists avoid these problems. The idea is based on programming languages (PL) research by Russo et al., where a sensitive value is put in a container data type (a "box") associated with some security level. Other code can apply transformations or analyses to the data in place (the Functor "map" operation in functional programming), but only specially trusted code with an equal or greater security level can "unbox" the data. To encode the levels, Russo et al. propose using the lattice model of secure information flow developed by Dorothy E. Denning. In this model, the security levels form a partially ordered set with the guarantee that any given pair of levels has a unique greatest lower bound and least upper bound. This allows for a clear and principled mechanism for determining the appropriate level when combining two pieces of information. In the Russo paper and our Scale By the Bay presentation, we use two levels for simplicity: High for sensitive data, and Low for non-sensitive data.

To map this research to our problem domain, recall that we want data scientists and engineers to be able to quickly experiment and iterate when working with data. However, when data may come from sensitive sources or be contaminated with prediction-target information, we want only certain, specially audited or reviewed code to be able to directly access or export the results. For example, we may want to lift this restriction only after data has been suitably anonymized or aggregated, perhaps according to some quantitative standard like differential privacy.
Another use case might be that we are constructing data pipelines or workflows and we want the code itself to track the provenance and sensitivity of different pieces of data, to prevent unintended or inappropriate usage. Note that, unlike much of the research in this area, we are not aiming to prevent truly malicious actors (internal or external) from accessing sensitive data; we simply want to provide automatic support to help engineers handle data appropriately.

Implementation and Beyond

Depending on how exactly we want to adapt the ideas from Russo et al., there are a few different ways to implement our secure data wrapper layer in Scala. Here we demonstrate one approach using typeclass instances and implicit scoping (similar to the paper), as well as two versions where we modify the formulation slightly to allow changing the security level as a monadic effect (i.e., with flatMap) having last-write-wins (LWW) semantics, and create a new Neutral security level that always "defers" to the other security levels, High and Low.

Implicit scoping: Most similar to the original Russo paper, we can create special "security level" object instances and require one of them to be in implicit scope when de-classifying data. (Thanks to Sergei Winitzki of Workday, who suggested this at the conference!)

Value encoding: For LWW flatMap, we can encode the levels as values. In this case, the security level is dynamically determined at runtime by the type of the associated level argument, and the de-classify method reveal() returns an Option[T], which is None if the level is High. This implementation uses Scala's pattern matching.

Type encoding: For LWW flatMap, we can encode the levels as types. In this case, the compiler itself will statically determine whether reveal() calls are valid (i.e., made against the Low security level type), and will simply fail to compile code that accesses sensitive data illegally.
This implementation relies on some tricks derived from Stefan Zeiger's excellent "Type-Level Computations in Scala" presentation.

Data science and machine learning workflows can be complex, and there are often potential problems lurking in the data handling aspects. Existing research in security and PL can be a rich source of tools and ideas to help navigate these challenges, and my goal for the talk was to give people some examples and starting points in this direction. Finally, it must be emphasized that a single software library can in no way replace a thorough, organization-wide commitment to responsible data handling. By encoding notions of data sensitivity in software, we can automate some best practices and safeguards, but that will necessarily be only part of a complete solution.

Watch the Full Presentation at Scale by the Bay

Learn More
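As a rough illustration of the value-encoding variant described above, here is a minimal sketch. The type and method names (`Level`, `Boxed`, `reveal`) are invented for illustration and this is not the actual library code from the talk: levels are ordinary runtime values, map transforms data without de-classifying it, flatMap is last-write-wins, and reveal only succeeds at Low.

```scala
// Minimal sketch of the "value encoding" approach (hypothetical names,
// not the library from the talk): security levels as runtime values.
sealed trait Level
case object Low extends Level
case object High extends Level

final case class Boxed[A](private val value: A, level: Level) {
  // Functor map: transform the data in place; the level is unchanged.
  def map[B](f: A => B): Boxed[B] = Boxed(f(value), level)

  // Last-write-wins flatMap: the result takes whatever level the
  // continuation assigns.
  def flatMap[B](f: A => Boxed[B]): Boxed[B] = f(value)

  // De-classification via pattern matching: None if the data is High.
  def reveal: Option[A] = level match {
    case Low  => Some(value)
    case High => None
  }
}

// Usage: sensitive input stays unrevealable until code explicitly
// downgrades it, e.g. after coarsening the value.
val salary    = Boxed(123456, High)
val doubled   = salary.map(_ * 2)                        // still High
val coarsened = doubled.flatMap(v => Boxed(v / 1000 * 1000, Low))

// doubled.reveal   == None
// coarsened.reveal == Some(246000)
```

The design choice worth noting is that ordinary analysis code composes freely via map/flatMap, while the single choke point (reveal) is where review or audit effort can be concentrated.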

Blog

Why European Users Are Leveraging Machine Data for Security and Customer Experience

To gain a better understanding of the adoption and usage of machine data in Europe, Sumo Logic commissioned 451 Research to survey 250 executives across the UK, Sweden, the Netherlands and Germany, and to compare this data with a previous survey of U.S. respondents who were asked the same questions. The research set out to answer a number of questions, including: Is machine data in fact an important source of fuel in the analytics economy? Do businesses recognize the role machine data can play in driving business intelligence? Are businesses that recognize the power of machine data leaders in their fields? The report, “Using Machine Data Analytics to Gain Advantage in the Analytics Economy, the European Edition,” released at DockerCon Europe in Barcelona this week, reveals that companies in the U.S. are currently more likely to use and understand the value of machine data analytics than their European counterparts, but that Europeans lead the U.S. in using machine data for security use cases. Europeans Trail US in Recognizing Value of Machine Data Analytics Let’s dig deeper into the stats showing that U.S. respondents were more likely to use and understand the value of machine data analytics. For instance, 36 percent of U.S. respondents have more than 100 users interacting with machine data at least once a week, while in Europe, only 21 percent of respondents have that many users. Likewise, 64 percent of U.S. respondents said that machine data is extremely important to their company’s ability to meet its goals, with 54 percent of European respondents saying the same. When asked if machine data tools are deployed on-premises, only 48 percent of European respondents responded affirmatively, compared to 74 percent of U.S. respondents. The gap might be explained by the idea that U.S. businesses are more likely to have a software-centric mindset. According to the data, 64 percent of U.S.
respondents said most of their company had a software-centric mindset, while only 40 percent of European respondents said the same. Software-centric businesses are more likely to recognize that machine data can deliver critical insights, from both an operational and a business perspective, as they are more likely to integrate their business intelligence and machine data analytics tools. Software-centric companies are also more likely to say that a wide variety of users, including heads of IT, heads of security, line-of-business users, product managers and C-level executives, recognize the business value of machine data. Europeans Lead US in Using Machine Data for Security At 63 percent, European companies are ahead of the U.S. in recognising the benefit of machine data analytics in security use cases. Given strict data privacy regulations in Europe, including the new European Union (EU) General Data Protection Regulation (GDPR), it only seems natural that security is a significant driver for machine data tools in the region. Business Insight Recognized by Europeans as Valuable Beyond security, the other top use cases cited for machine data in Europe are monitoring (55 percent), troubleshooting (48 percent) and business insight (48 percent). This means Europeans are clearly recognizing the value of machine data analytics beyond the typical security, monitoring and troubleshooting use cases — they’re using it as a strategic tool to move the business forward. When IT operations teams have better insight into business performance, they are better equipped to prioritize incident response and improve their ability to support business goals. A Wide Array of European Employees in Different Roles Use Machine Data Analytics The data further show that, in addition to IT operations teams, a wide array of employees in other roles commonly use machine data analytics.
Security analysts, product managers and data analysts — some of whom may serve lines of business or senior executives — all appeared at the top of the list of roles using machine data analytics tools. The finding emphasizes that companies recognize the many ways that machine data can drive intelligence across the business. Customer Experience and Product Development Seen as Most Beneficial to Europeans Although security emerged as an important priority for users of machine data, improved customer experience and more efficient product development emerged as the top benefits of machine data analytics tools. Businesses are discovering that the machine data analytics tools they use to improve their security posture can also drive value in other areas, including better end-user experiences, more efficient and smarter product development, optimized cloud and infrastructure spending, and improved sales and marketing performance. Barriers Preventing Wider Usage of Machine Data The report also provided insight into the barriers preventing wider usage of machine data analytics. The number one capability that users said was lacking in their existing tools was real-time access to data (37 percent), followed by fast, ad hoc querying (34 percent). Another notable barrier to broader usage is the lack of capabilities to effectively manage different machine data analytics tools. European respondents also stated that the adoption of modern technologies makes it harder to get the data they need for speedy decision-making (47 percent). Whilst moving to microservices and container-based architectures like Docker makes it easier to deploy at scale, it is hard to effectively monitor activities over time without the right approach to logs and metrics in place. In Conclusion Europe is adopting modern tools and technologies at a slower rate than the U.S., and fewer European companies currently have a ‘software-led’ mindset in place.
Software-centric businesses are doing more than their less advanced counterparts to make the most of the intelligence available to them in machine data analytics tools. However, a desire for more continuous insights derived from machine data is there: the data show that once European organisations start using machine data analytics to gain visibility into their security operations, they start to see the value for other use cases across operations, development and the business. The combination of customer experience and compliance with security represents strong value for European users of machine data analytics tools. Users want their machine data tools to drive even more insight into the customer experience, which is increasingly important to many businesses, and at the same time help ensure compliance. Additional Resources Download the full 451 Research report for more insights Check out the Sumo Logic DockerCon Europe press release Download the Paf customer case study Read the European GDPR competitive edge blog Sign up for a free trial of Sumo Logic

Blog

Announcing Extended AWS App Support at re:Invent for Security and Operations

Blog

Complete Visibility of Amazon Aurora Databases with Sumo Logic

Sumo Logic provides digital businesses with a powerful and complete view of modern applications and cloud infrastructures such as AWS. Today, we’re pleased to announce complete visibility into the performance, health and user activity of the leading Amazon Aurora database via two new applications – the Sumo Logic MySQL ULM application and the Sumo Logic PostgreSQL ULM application. Amazon Aurora is a MySQL- and PostgreSQL-compatible relational database available on the AWS RDS platform. Amazon Aurora is up to five times faster than standard MySQL databases and three times faster than standard PostgreSQL databases. By providing complete visibility across your Amazon Aurora databases with these two applications, Sumo Logic delivers the following benefits via advanced visualizations: Optimize your databases by understanding query performance, bottlenecks and system utilization Detect and troubleshoot problems by identifying new errors, failed connections, database activity, warnings and system events Monitor user activity by detecting unusual logins, failed events and geo-locations In the following sections of this blog post, we discuss in detail how these applications provide value to customers. Amazon Aurora Logs and Metrics Sources Amazon provides a rich set of log and metrics sources for monitoring and managing Aurora databases. The Sumo Logic Aurora MySQL ULM app works with the following three sources: AWS CloudTrail event logs AWS CloudWatch metrics AWS CloudWatch logs For Aurora MySQL databases, error logs are pushed to CloudWatch by default. Aurora MySQL also supports pushing slow query logs, audit logs and general logs to CloudWatch; however, you need to enable these features explicitly. The Sumo Logic Aurora PostgreSQL ULM app works with the following sources: AWS CloudTrail event logs AWS CloudWatch metrics For more details on setting up logs, please check the documentation for the Amazon Aurora PostgreSQL app and the Amazon Aurora MySQL app.
Installing the Apps for Amazon Aurora Analyzing each of the above logs in isolation to debug a problem, or to understand how your database environments are performing, can be a daunting and time-consuming task. With the two new Sumo Logic applications, you can instantly get complete visibility into all aspects of running your Aurora databases. Once you have configured your log sources, the Sumo Logic apps can be installed. Navigate to the Apps Catalog in your Sumo Logic instance and add the “Aurora MySQL ULM” or “Aurora PostgreSQL ULM” apps to your library after providing references to the sources configured in the previous step. Optimizing Database Performance In running today’s digital businesses, customer experience is a key outcome, and toward that end, closely monitoring the health of your databases is critical. The following dashboards provide an instant view of how your Amazon Aurora MySQL and PostgreSQL databases are performing across various important metrics. Using the queries from these dashboards, you can build scheduled searches and real-time alerts to quickly detect common performance problems. The Aurora MySQL ULM Logs – Slow Query Dashboard allows you to view log details on slow queries, including the number of slow queries, trends, execution times, time comparisons, command types, users, and IP addresses. The Aurora MySQL ULM Metric – Resource Utilization Monitoring Dashboard allows you to view an analysis of resource utilization, including usage, latency, active and blocked transactions, and login failures. The Aurora PostgreSQL ULM Metric – Latency, Throughput, and IOPS Monitoring Dashboard allows you to view granular details of database latency, throughput, IOPS and disk queue depth. It is important to monitor the performance of database queries, and latency and throughput are the key performance metrics. Detect and Troubleshoot Errors To provide the best service to your customers, you need to address issues quickly and minimize impacts to your users.
Database errors can be hard to detect and sometimes surface only after users report application errors. The following set of dashboards helps quickly surface unusual or new activity across your AWS Aurora databases. The Aurora MySQL ULM Logs – Error Logs Analysis Dashboard allows you to view details for error logs, including failed authentications, error outliers, top and recent warnings, log levels, and aborted connections. Monitor User Activity With cloud environments, it’s becoming even more critical to investigate user behavior patterns and make sure your database is being accessed by the right staff. The following set of dashboards tracks all user and database activity and can help prioritize and identify patterns of unusual behavior for security and compliance monitoring. The Aurora MySQL ULM Logs – Audit Log Analysis Dashboard allows you to view an analysis of events, including accessed resources, destination and source addresses, timestamps, and user login information. These logs are specifically enabled to audit activities that are of interest from an audit and compliance perspective. The Aurora MySQL Logs – Audit Log SQL Statements Dashboard allows you to view details for SQL statement events, including top SQL commands and statements, trends, user management, and activity for various types of SQL statements. You can drill deeper into the various SQL statements and commands executed by clicking on the “Top SQL Commands” panel in the dashboard. This opens the Aurora MySQL ULM – Logs – Audit Log SQL Statements Dashboard, which helps with identifying trends, specific executions, user management activities performed, and dropped objects. The Aurora PostgreSQL ULM CloudTrail Event – Overview Dashboard allows you to view details for event logs, including geographical locations, trends, successful and failed events, user activity, and error codes.
In case you need to drill down for details, the CloudTrail Event – Details Dashboard helps you monitor the most recent changes made to resources in your Aurora database ecosystem, including creation, modification, deletion and reboot of Aurora clusters and instances. Get Started Now! The Sumo Logic apps for Amazon Aurora help optimize, troubleshoot and secure your AWS Aurora database environments. To get started, check out the Sumo Logic MySQL ULM application and the Sumo Logic PostgreSQL ULM application. If you don’t yet have a Sumo Logic account, you can sign up for a free trial today. For more great DevOps-focused reads, check out the Sumo Logic blog.

November 27, 2018

Blog

The Latest Trends for Modern Apps Built on AWS

Blog

Comparing a Multi-Tenant SaaS Solution vs. Single Tenant

Blog

An Organized Workflow for Prototyping

In the world of agile there’s a demand to solve grey areas throughout the design process at lightning speed. Prototypes help the scrum team test ideas and refine them. Without prototypes, we can’t test ideas until the feature or product has been built, which can be a recipe for disaster. It’s like running a marathon without training. During a two-week sprint, designers often need to quickly turn around prototypes in order to test. It can be hectic to juggle meetings, design and prototyping without a little planning. The guiding principles below, inspired by my time working with one of our lead product designers at Sumo Logic — Rebecca Sorensen, will help you build prototypes more effectively for usability testing under a time crunch. Define the Scope From the start, it’s essential that we understand who the audience is and what the goal of the prototype is, so we can determine other parts of the prototyping process like content, fidelity level and tools for the job. We can easily find out the intent by asking the stakeholder what he or she wants to learn. By defining the scope from the beginning we are able to prioritize our time more effectively throughout the prototyping process and tailor the content for the audience. For testing, our audience is usually internal users or customers. The scrum team wants to know if the customer can complete a task successfully with the proposed design. Or they may want to validate a flow to determine design direction. If we’re testing internally, we have more flexibility to show a low- or mid-fidelity prototype. However, when testing with customers, we sometimes have to consider more polished prototypes with real data. Set Expectations There was a time when designers made last-minute changes to the prototype — sometimes while the prototype was being tested, because a stakeholder added last-minute feedback — which impacted the outcome and did not give the researcher enough time to understand the changes.
Before jumping into details, we create milestones to set delivery expectations. This helps the scrum team understand when to give feedback on the prototype and when the research team will receive the final prototype for testing. This timeline is an estimate, and it might vary depending on the level of fidelity. We constantly experiment until we find our sweet spot. The best way to get started is to start from a desired end state, like debriefing the researcher on the final prototype, and work backward. The draft of the prototype doesn’t have to be completely finished and polished. It just needs to have at least some structure so we can get it in front of the team for feedback. Sometimes, we don’t have to incorporate all the feedback. Instead, we sift through the feedback and choose what makes sense given the time constraints. Tailor the Content for Your Audience Content is critical to the success of a prototype. The level of detail we need in the prototype depends on the phase of our design process. Discovery In the exploration phase we are figuring out what we are building and why, so at this point the content tends to be more abstract. We’re trying to understand the problem space and our users, so we shouldn’t be laser-focused on details; only structure and navigation matter. Abstraction allows us to have a more open conversation with users that’s not solution-focused. Sometimes we choose metaphors that allow us to be on the same playing field as our users to deconstruct their world more easily. We present this in the form of manipulatives — small cutouts of UI or empty UI elements the customer can draw on during a quick participatory design session. Cutting and preparing manipulatives is also a fun team activity. Delivery When we move into the delivery phase of design, where our focus is on how we are building the product, content needs to reflect the customer’s world. We partner closely with our Product Manager to structure a script.
Context in the form of relevant copy, charts, data and labels helps support the script and the various paths the user can take when interacting with the prototype. Small details like data ink and data points, along with the correct labels, help us make the prototype more realistic so the user doesn’t feel they’re stepping into an unfamiliar environment. Even though a prototype is still an experiment, using real data gives us a preview of design challenges like truncation or readability. We are lucky to have real data from our product. CSVJSON helps us convert files into JSON format so we can use the data with chart plugins and CRAFT. Collaborate to Refine Prototyping is fun and playful — so much so that it can be easy to forget that there are also other people who are part of the process. Prototyping is also a social way to develop ideas with non-designers, so when choosing which tool to present our prototype in, we need to keep in mind collaboration outside the design team, not just the final output. We use InVision to quickly convey flows along with annotations, but it has a downside during this collaborative process: annotations can leave room for interpretation, since every stakeholder has their own vocabulary. Recently, a couple of our engineers in Poland started using UXPin. At first it was used to sell their ideas, but for usability testing it has also become a common area where we can work off each other’s prototypes. They like the ability to duplicate prototypes and reshuffle screens, so prototypes can be updated quickly without having to write another long document of explanations. By iterating together we are able to create a common visual representation and move fast. UXPin came to the rescue when collaborating with cross-regional teams. It’s an intuitive tool for non-designers that allows them to duplicate the prototype and make their own playground too.
Tools will continue to change, so it’s important to have an open mindset and be flexible about learning and making judgments about when to switch tools to deliver the prototype on time to research. Architect Smartly Although we are on a time crunch when prototyping for research, we can find time to experiment by adjusting the way we build our prototype. Make a playground Our lead product designer Rohan Singh invented the hamster playground to go wild with design explorations. The hamster playground is an experimental space which comes in handy when we need to quickly whip something up without messing up the rest of the design. It started as a separate page in our Sketch files, and now it is also present in our prototyping workspace. When we design something in high fidelity, we become attached to the idea right away. This can cripple experimentation. We need that sacred space, detached from the main prototype, that allows us to experiment with animations or dynamic elements. The hamster playground can also be a portable whiteboard or pen and paper. Embrace libraries Libraries accelerate the prototyping process exponentially! For the tool you commonly use to prototype, invest some time (hackathons or end of quarter) to create a pattern library of the most common interactions (this is not a static UI kit). If the prototype we’re building has some of those common elements, we save them into the library so other team members can reuse them on another project. Building an interactive library is time-consuming, but it pays off because it allows the team to easily drag, drop and combine elements like Legos. Consolidate the flow We try to remove non-essential items from the prototype and replace them with screenshots, or turn them into loops, so we can focus only on the area that matters for testing. Consolidation also forces us not to overwhelm the prototype with many artboards; otherwise we risk clunky interactions during testing.
The other advantage of consolidating is that you can easily map out interactions by triggers, states and animations/transitions. Prepare Researchers for Success Our job is not done until research, our partners, understand what we built. As a best practice, set up some time with the researcher to review the prototype. Present the limitations, discrepancies across browsers and devices, and any other instructions that are critical for the success of the testing session. A short guide that outlines the different paths, with screenshots of what the successful interactions look like, can greatly aid researchers when they are writing the testing script. Ready, Set…Prototype! Just like marathoners, who intuitively know when to move fast, adjust and change direction, great prototypers work from principles to guide their process. Throughout the design process the scrum team constantly needs answers to many questions. By becoming an effective prototyper, rather than the master of any particular tool, you can help the team find those answers right away. The principles outlined above will guide your process so you are more aware of how you spend your time and know when you’re prototyping too much, too little or the wrong thing. Organization doesn’t kill experimentation; it makes more time for playfulness and for solving the big grey areas. This post originally appeared on Medium. Check it out here. Additional Resources Check out this great article to learn how our customers influence the Sumo Logic product and how UX research is key to improving overall experiences Curious to know how I ended up at Sumo Logic doing product design/user experience? I share my journey in this employee spotlight article. Love video games and data? Then you’ll love this article from one of our DevOps engineers on how we created our own game (Sumo Smash Bros) to demonstrate the power of machine data

Blog

Understanding Transaction Behavior with Slick + MySQL InnoDB

MySQL has always been among the top few database management systems used worldwide, according to DB-Engines, one of the leading ranking websites. And thanks to the large open source community behind MySQL, it serves a wide variety of use cases. In this blog post, we are going to focus on how to achieve transactional behavior with MySQL and Slick. We will also discuss how these transactions resulted in one of our production outages. But before going any further into the details, let’s first define what a database transaction is. In the context of relational databases, a sequence of operations that satisfies some common properties is known as a transaction. This common set of properties, which determines the behavior of these transactions, is referred to as the atomic, consistent, isolated and durable (ACID) properties. These properties are intended to guarantee the validity of the underlying data in case of power failures, errors, or other real-world problems. The ACID model describes the basic supporting principles one should think about before designing database transactions. All of these principles are important for any mission-critical application. InnoDB is one of the most popular storage engines for MySQL, while Slick is a modern database query and access library for Scala. Slick exposes the underlying data stored in these databases as Scala collections, so that the data is seamlessly available. Database transactions come with their own overhead, especially in cases where we have long-running queries wrapped in a transaction. Let’s look at the transaction behavior we get with Slick. Slick offers ways to execute transactions on MySQL.
For example, the following Slick action runs a query and a series of deletes as a single transaction:

val a = (for {
  ns <- coffees.filter(_.name.startsWith("ESPRESSO")).map(_.name).result
  _  <- DBIO.seq(ns.map(n => coffees.filter(_.name === n).delete): _*)
} yield ()).transactionally

These transactions are executed with the help of the auto-commit feature provided by the InnoDB engine. We will go into this auto-commit feature later in this article, but first, let me tell you about an outage that happened in one of our production services at Sumo Logic. For the rest of the article, I will be talking about this minor outage, which happened due to a lack of understanding of this transaction behavior. Whenever any user fires a query, the query follows this course of action before getting started:

1. Query metadata (i.e., user, customerID) is first sent to Service A.
2. Service A asks a common Amazon MySQL RDS database for the number of concurrent sessions for this user running across all the instances of Service A.
3. If the number is greater than some threshold, we throttle the request and send a 429 to the user. Otherwise, we add the metadata of the session to the table stored in RDS.

All of these actions are executed within the scope of a single Slick transaction. Recently we started receiving lots of lock wait timeouts on Service A. On debugging further, we saw that from the time we started getting lots of lock wait timeouts, there was also an increase in the average CPU usage across the Service A cluster.
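The throttling decision described in the steps above can be modeled in isolation. This is a simplified, database-free sketch with hypothetical names; the real check runs inside a single Slick transaction against RDS, as shown later in the post.

```scala
// Hypothetical, database-free model of Service A's throttling decision:
// count the user's concurrent sessions, reject with HTTP 429 above a
// threshold, otherwise record the new session.
final case class Session(user: String, customerId: String, sessionId: String)

def admit(existing: List[Session],
          incoming: Session,
          maxConcurrent: Int): Either[Int, List[Session]] = {
  val current = existing.count(_.user == incoming.user)
  if (current >= maxConcurrent) Left(429)   // throttle the request
  else Right(incoming :: existing)          // record the new session
}
```

The count and the insert must happen atomically, which is why the real service wraps both steps in one transaction.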
Looking into some of these lock wait timeouts, we noticed that whenever an instance in the cluster went through full GC cycles, we saw a higher number of lock wait timeouts across the cluster. Interestingly, these lock wait timeouts occurred all across the cluster and were not isolated to the single instance that suffered from the full GC cycles. Based on that, we knew that full GC cycles on one of the nodes were somehow responsible for causing those lock wait timeouts across the cluster. As mentioned above, we used the transaction feature provided by Slick to execute all of the actions as a single command. So the next logical step was to dig deeper into the question: “How does Slick implement these transactions?” We found out that Slick uses the InnoDB auto-commit feature to execute transactions. With auto-commit disabled, a transaction is kept open until it is committed from the client side, which essentially means that the connection executing the current transaction holds all of its locks until the transaction is committed. Auto-Commit Documentation from the InnoDB Manual In InnoDB, all user activity occurs inside a transaction. If auto-commit mode is enabled, each SQL statement forms a single transaction on its own. By default, MySQL starts the session for each new connection with auto-commit enabled, so MySQL does a commit after each SQL statement if that statement did not return an error. If a statement returns an error, the commit or rollback behavior depends on the error. See Section 14.21.4, “InnoDB Error Handling”. A session that has auto-commit enabled can perform a multiple-statement transaction by starting it with an explicit START TRANSACTION or BEGIN statement and ending it with a COMMIT or ROLLBACK statement. See Section 13.3.1, “START TRANSACTION, COMMIT, and ROLLBACK Syntax”.
If auto-commit mode is disabled within a session with SET autocommit = 0, the session always has a transaction open. A COMMIT or ROLLBACK statement ends the current transaction and a new one starts. Pay attention to the last sentence above: if auto-commit is disabled, then a transaction is always open, which means all the locks are still held by that transaction. All the locks, in this case, will be released only when we explicitly COMMIT the transaction. So in our case, the inability to execute the remaining commands within the transaction, due to a long GC pause, meant that we were still holding the locks on the table; therefore, other JVMs executing transactions touching the same table (which was, in fact, the case) would also suffer from high latencies. But we needed to be sure that this was what happened in our production environments, so we went ahead and reproduced the production issue on a local testbed, verifying that locks were still held by the transaction on the node undergoing high GC cycles. Steps to Reproduce the High DB Latencies on One JVM Due to GC Pauses on Another JVM Step One We needed some way to know when the queries in the transactions were actually getting executed by the MySQL server. The MySQL general log shows the recent queries executed by the server:

mysql> SET GLOBAL general_log = 1;
mysql> SET GLOBAL log_output = 'table';
mysql> SELECT * FROM mysql.general_log;

Step Two We needed two different transactions executing at the same time in different JVMs to understand this lock wait timeout.
Transaction One:

val query = (for {
  ns <- userSessions.filter(_.email.startsWith(name)).length.result
  _  <- {
    println(ns)
    if (ns > n) DBIOAction.seq(userSessions += userSession)
    else DBIOAction.successful(())
  }
} yield ()).transactionally
db.run(query)

Transaction Two:

db.run(userSessions.filter(_.id === id).delete)

Step Three Now we needed to simulate long GC pauses in one of the JVMs to mimic the production environment. While mimicking those pauses, we monitored the mysql.general_log to find out when each command reached the MySQL server for execution. The table below depicts the order of SQL statements executed on both JVMs:

1. JVM 1 (adding the user's session): SET autocommit = 0
2. JVM 1: SELECT count(*) FROM USERS WHERE user_id = "temp" -- locks acquired
3. JVM 2 (deleting the user's session if present): SET autocommit = 0
4. JVM 1: INSERT INTO USERS user_session
5. JVM 2: DELETE FROM USERS WHERE sessionID = "121" -- started
6. JVM 1: high latency introduced on the client side for 40 seconds
7. JVM 2: DELETE operation is blocked, waiting on the lock
8. JVM 1: COMMIT
9. JVM 2: DELETE FROM USERS WHERE sessionID = "121" -- completed
10. JVM 2: COMMIT

In the image below, you can see the SQL statements getting executed on both JVMs; it shows a lock wait time of around 40 seconds on JVM 2 for the DELETE SQL command. We can clearly see from the logs how pauses in one JVM cause high latencies across the different JVMs querying the MySQL server. Handling Such Scenarios with MySQL We often need to handle this kind of scenario, where we execute MySQL transactions across JVMs. So how can we achieve low MySQL latencies for transactions even in cases of pauses in one of the JVMs? Here are some solutions: Using Stored Procedures With stored procedures, we can easily extract this throttling logic into a function call and store it as a function on the MySQL server.
Stored procedures can be called by clients with the appropriate arguments and executed all at once on the server side, unaffected by client-side pauses. Combined with transactions inside the procedures, we can ensure they execute atomically, so results remain consistent for the entire duration of the transaction.

Delimit Multiple Queries

With delimited multi-statement queries, we can build transactions on the client side and execute them atomically on the server side without worrying about pauses. Note: you will need to enable allowMultiQueries=true, because this flag allows batching multiple queries together into a single query, so you will be able to run a transaction as a single query.

Better Indexes on the Table

With better indexes, we can ensure that SELECT statements with a WHERE condition touch a minimal number of rows, and therefore take a minimal number of row locks. Suppose we don't have any index on the table: in that case, any SELECT statement needs to take a shared row lock on all the rows of the table, which means that during the execution of the transaction all deletes and updates are blocked. So it's generally advised that the WHERE condition of a SELECT be on an indexed column.

Lower Isolation Levels for Executing Transactions

With the READ UNCOMMITTED isolation level, we can always read rows that have not yet been committed.

Additional Resources

Want more articles like this? Check out the Sumo Logic blog for more technical content!
Read this blog to learn how to triage test failures in a continuous delivery lifecycle.
Check out this article for some clear-cut strategies on how to manage long-running API queries using RxJS.
Visit the Sumo Logic App for MySQL page to learn about cloud-native monitoring for MySQL.
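The failure mode described in this post (a paused client holding row locks and stalling every other client) can be sketched without MySQL at all. Below is a simplified, hypothetical model: a plain Python threading lock stands in for an InnoDB row lock, and the "GC pause" is simply a sleep while the lock is held.

```python
import threading
import time

table_lock = threading.Lock()  # stands in for an InnoDB row lock
delete_wait = {}               # records how long the second "JVM" waited

def jvm1_insert_with_gc_pause(pause_seconds):
    # Transaction with autocommit disabled: locks are held until COMMIT.
    with table_lock:              # the SELECT acquires the lock
        time.sleep(pause_seconds) # long GC pause before the COMMIT runs
    # lock released here, i.e. only at COMMIT

def jvm2_delete():
    start = time.monotonic()
    with table_lock:              # the DELETE blocks until JVM 1 commits
        pass
    delete_wait["seconds"] = time.monotonic() - start

t1 = threading.Thread(target=jvm1_insert_with_gc_pause, args=(0.5,))
t2 = threading.Thread(target=jvm2_delete)
t1.start()
time.sleep(0.1)  # make sure JVM 1 acquires the lock first
t2.start()
t1.join()
t2.join()

print(f"DELETE waited {delete_wait['seconds']:.1f}s while the paused transaction held the lock")
```

The point of the sketch is that JVM 2's lock wait tracks the length of JVM 1's pause, not anything JVM 2 did, which is exactly the shape of the production incident.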


Exploring Nordcloud’s Promise to Deliver 100 Percent Alert-Based Security Operations to Customers


Strategies for Managing Long-running API Calls with RxJS


Near Real-Time Log Collection From Amazon S3 Storage


SnapSecChat: Sumo Logic CSO Recaps HackerOne's Conference, Security@


Illuminate 2018 Product Update Q&A with Sumo Logic CEO Ramin Sayar


How to Triage Test Failures in a Continuous Delivery Lifecycle


Gain Visibility into Your Puppet Deployments with the New Sumo Logic Puppet App

Puppet is a software configuration management and deployment tool that is available both as an open source tool and as commercial software. It's most commonly used on Linux and Windows to pull the strings on multiple application servers at once, and it includes its own declarative language to describe system configurations. In today's cloud environments, which consist of hundreds of distributed machines, Puppet can help reduce development time and resources by automatically applying these configurations. But just like with any other DevOps tool, there can be errors and configuration issues. With the new Sumo Logic Puppet integration and application, customers can now leverage the Sumo Logic platform to monitor Puppet performance, configurations and errors.

Puppet Architecture and Logging

Puppet can apply required configurations across new and existing servers or nodes. You can configure systems with Puppet in either a client-server architecture or a stand-alone architecture. The client-server architecture is the most commonly used for Puppet implementations: Puppet agents apply the required changes and send reports to the Puppet master describing the run and the details of the client resources. These reports can help answer questions like "how often are the resources modified," "how many events were successful in the past day" and "what was the status of the most recent run?"

In addition to reports, Puppet also generates an extensive set of log files. From a reporting and monitoring perspective, the two log files of interest are the Puppet server logs and the HTTP request logs. Puppet server messages and errors are logged to the file /var/log/puppetlabs/puppetserver/puppetserver.log. This logging can be configured using the /etc/puppetlabs/puppetserver/logback.xml file, and these logs can be used to monitor the health of the server. The /var/log/puppetlabs/puppetserver/puppetserver-access.log file contains the HTTP traffic being routed via your Puppet deployment.
This logging can be configured using the file /etc/puppetlabs/puppetserver/request-logging.xml. Puppet agent requests to the master are logged into this file.

Sumo Logic Puppet App

The Sumo Logic Puppet app is designed to effectively manage and monitor Puppet metrics, events and errors across your deployments. With Sumo Logic dashboards you will be able to easily identify:

Unique nodes
Puppet node run activity
Service times
Catalog application times
Puppet execution times
Resource transitions (failures, out-of-sync, modifications, etc.)
Error rates and causes

Installation

To get started, the app requires three data sources:

Puppet server logs
Puppet access logs
Puppet reports

The Puppet server logs and Puppet access logs are present in the directory /var/log/puppetlabs/puppetserver/. Configure separate local file sources for both of these log files. Puppet reports are generated as YAML files; these need to be converted into JSON before ingesting into Sumo Logic, so to ingest Puppet reports you must configure a script source.

Once the log sources are configured, the Sumo Logic app can be installed. Simply navigate to the App Catalog in your Sumo Logic instance and add the Puppet app to the library after providing the sources configured in the previous step. For more details on app configuration, please see the instructions on Sumo Logic's DocHub.

Sumo Logic Puppet App Visualizations

In any given Puppet deployment there can be a large number of nodes. Some of the nodes may be faulty, others may be very active, and the Puppet server that manages them may be suffering from issues itself. The Sumo Logic Puppet app consists of predefined dashboards and search queries that help you monitor the Puppet infrastructure. The Puppet Overview dashboard shown below gives you an overview of activity across nodes and servers.
If a Puppet node is failing, you can quickly find out when the node made requests, what version it is running and how much time the server is taking to prepare the catalog for that node.

Puppet Overview Dashboard

Let's take a closer look at the Error Rate panel. The Error Rate panel displays the error rates per hour. This helps identify when error rates spiked, and by clicking on the panel, you can identify the root cause at either the node level or the server level via the Puppet Error Analysis dashboard. In addition, this dashboard highlights the most erroneous nodes along with the most recent errors and warnings. With this information, it is easier to drill down into the root cause of the issues.

The Top Erroneous Nodes panel helps in identifying the most unstable nodes. Drill down to view the search query by clicking on the "Show in Search" icon highlighted in the above screenshot. The node name and the errors can be easily identified, and corrective actions can be performed by reviewing the messages in the search results, as shown in the screenshot below.

With the help of information on the Puppet – Node Puppet Run Analysis dashboard, node health can be easily determined across different deployments, such as production and pre-production. The "Slowest Nodes by Catalog Application Time" panel helps you determine the slowest nodes, which can potentially be indicative of problems and issues within those nodes. From there, you can reference the Puppet Error Analysis dashboard to determine the root cause. The "Resource Status" panel helps you quickly determine the status of various resources, and further details can be obtained by drilling down to the query behind it. By reviewing the panels on this dashboard, the highest-failing or most out-of-sync resources can be easily determined, which may be indicative of problems on the respective nodes.
To compare average catalog application times, take a look at the "Average Catalog Application Time" and "Slowest Nodes by Catalog Application Time" panels. The resource panels show resources that failed, were modified, are out-of-sync or were skipped. Drilling down to the queries behind the panels will help in determining the exact list of resources with the selected status.

Note: All the panels in the Puppet – Node Puppet Run Analysis dashboard, and some panels of the Puppet Overview dashboard, can be filtered by environment, such as production or pre-production, as shown below.

Get Started Now!

The Sumo Logic app for Puppet monitors your entire Puppet infrastructure, potentially spanning hundreds of nodes, and helps determine the right corrective and preventative actions. To get started, check out the Sumo Logic Puppet app help doc. If you don't yet have a Sumo Logic account, you can sign up for a free trial today. For more great DevOps-focused reads, check out the Sumo Logic blog.
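As noted in the installation steps, Puppet's YAML reports need to be converted to JSON before they can be ingested via a script source. The helper below is a hypothetical sketch that handles only a flat "key: value" subset of YAML for illustration; a real conversion script would use a full YAML parser such as PyYAML.

```python
import json

def flat_yaml_to_json(yaml_text):
    """Convert a flat 'key: value' YAML document to a JSON string.

    Only handles top-level scalar keys -- enough to illustrate the
    shape of the conversion, not a real YAML parser.
    """
    report = {}
    for line in yaml_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        report[key.strip()] = value.strip()
    return json.dumps(report)

# Hypothetical report fields for illustration only.
sample = """\
host: agent-01.example.com
status: changed
puppet_version: 6.4.2
"""
print(flat_yaml_to_json(sample))
```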


Pokemon Co. International and Sumo Logic's Joint Journey to Build a Modern Day SOC

The world is changing. The way we do business, the way we communicate, and the way we secure the enterprise are all vastly different today than they were 20 years ago. This natural evolution of technology innovation is powered by the cloud, which has not only freed teams from on-premises security infrastructure, but has also provided them with the resources and agility needed to automate mundane tasks. The reality is that we have to automate in the enterprise if we are to remain relevant in an increasingly competitive digital world. Automation and security are a natural pairing, and when we think about the broader cybersecurity skills gap, we really should be thinking about how we can replace simple tasks through automation to make way for teams and security practitioners to be more innovative, focused and strategic.

A Dynamic Duo

That's why Sumo Logic and our partner, The Pokemon Co. International, are all in on bringing together the tech and security innovations of today and using those tools and techniques to completely redefine how we do security operations, starting with creating a new model for how a security operations center (SOC) should be structured and how it should function. So how exactly are we teaming up to build a modern day SOC, and what does it look like in terms of techniques, talent and tooling? We'll get into that, and more, in this blog post.

Three Pillars of the Modern Day SOC

Adopt Military InfoSec Techniques

The first pillar is all about mindset: adopting a new level of rigor and way of thinking for security. Both the Sumo Logic and Pokemon security teams are built on the backbone of a military technique called the OODA loop, originally coined by U.S. Air Force fighter pilot and Pentagon consultant John Boyd in the late twentieth century. Boyd created the OODA loop to implement a change in military doctrine focused on an air-to-air combat model.
OODA stands for observe, orient, decide and act. Boyd's thinking was that if you followed this model and ensured that your OODA loop was faster than your adversary's, then you'd win the conflict.

Applying that to today's modern security operations, all of the decisions made by your security leadership — whether around the people, processes or tools you're using — should be aimed at shrinking your OODA loop to the point where, when a situation happens, or when you're preparing for one, you can easily follow the protocol: observe the behavior, orient yourself, make effective and efficient decisions, and then act upon those decisions. Sound familiar? This approach is almost identical to most current incident response and security protocols, because we live in an environment where every six, 12 or 24 months we see tactics and techniques change. That's why the SOC of the future will depend on a security team's ability to break down barriers and abandon older schools of thought for faster decision-making models like the OODA loop. This model is also applicable across an organization to encourage teams to be more efficient and collaborative cross-departmentally, and to move faster and with greater confidence in order to achieve mutually beneficial business goals.

Build and Maintain an Agile Team

But it's not enough to have the right processes in place. You also need the right people, collectively and transparently working towards the same shared goal. Historically, security has been full of naysayers, but it's time to shift our mindset to one of transparency and enablement, where security teams are plugged into other departments and are able to move forward with their programs as quickly and as securely as they can without creating bottlenecks.
This dotted-line approach is how Pokemon operates, and it has allowed the security team to share information horizontally, which empowers development, operations, finance and other cross-functional teams to also move forward in true DevSecOps spirit. One of the main reasons this new and modern Sumo Logic security team structure has been successful is that it enables each function — data protection/privacy, SOC, DevSecOps and federal — to work in unison not only with each other, but also cross-departmentally.

In addition to knowing how to structure your security team, you also need to know what to look for when recruiting new talent. Here are three tips from Pokemon's Director of Information Security and Data Protection Officer, John Visneski:

Go Against the Grain. Unfortunately, there are no purple security unicorns out there. Instead of searching for the "ideal" security professional, go against the grain: find people with the attitude and aptitude to succeed, regardless of direct security experience. The threat environment is changing rapidly and burnout can happen fast, which is why it's more important to have someone on your team with those two qualities. Why? No one can know everything about security, and sometimes you have to adapt and throw old rules and mindsets out the window.

Prioritize an Operational Mindset. QA and test engineers are good at automation and at finding gaps in seams, skills that are very applicable to security. Find talent pools that know how the sausage is made: some of the best and brightest security professionals didn't even start out in security, and their value-add is that they are problem solvers first and security pros second. One of Pokemon's best security engineers didn't know a thing about security before joining, but he had a valuable skill set.

Think Transparency. The goal is to get your security team to a point where they're sharing information at a rapid enough pace and integrating themselves with the rest of the business.
This allows core functions to help solve each other's problems and share use cases, and it can only be successful if you create a culture that is open and transparent. The bottom line: don't be afraid to think outside the box when it comes to recruiting talent. It's more important to build a team based on want, desire and rigor, which is why bringing in folks with military experience has been vital to both Sumo Logic's and Pokemon's security strategies. Security skills can be learned. What delivers real value to a company are people who have a desire to be there, a thirst for knowledge and the capability to execute on the job.

Build a Modern Day Security Stack

Now that you have your process and your people, you need the third pillar: tool sets. Below is the Sumo Logic reference architecture that empowers us to be more secure and agile. You'll notice that all of these providers are either born in the cloud or open source. The Sumo Logic platform is at the core of this stack, but it's these partnerships and tools that enable us to deliver our cloud-native machine data analytics as a service, and to provide SIEM capabilities that easily prioritize and correlate sophisticated security threats in the most flexible way possible for our customers. We want to grow and transform with our own customers' modern application stacks and cloud architectures as they digitally transform.

Pokemon has a very similar approach to its security stack. The driving force behind Pokemon's modern toolset is the move away from the old-school customer mentality of presenting a budget and asking for services. The customer-vendor relationship needs to mirror a two-way partnership with mutually invested interests and clear benefits on both sides. Three vendors — AWS, CrowdStrike and Sumo Logic — comprise the core base of the Pokemon security platform, and the remainder of the stack is modular in nature.
This plug-and-play model is key as the security and threat environments continue to evolve, because it allows for flexibility in swapping new vendors and tools in and out as they come along. As long as the foundation of the platform is strong, the rest of the stack can evolve to match the current needs of the threat landscape.

Our Ideal Model May Not Be Yours

We've given you a peek inside the security kimono, but it's important to remember that every organization is different, and what works for Pokemon or Sumo Logic may not work for every particular team dynamic. While you can use our respective approaches as a guide to implement your own modern day security operations, the biggest takeaway here is to find a framework that is appropriate for your organization's goals and that will help you build success and agility within your security team and across the business.

The threat landscape is only going to grow more complex, technologies more advanced and attackers more sophisticated. If you truly want to stay ahead of those trends, then you've got to be progressive in how you think about your security stack, teams and operations. Regardless of whether you run an on-premises, hybrid or cloud environment, the industry and the business are going to leave you no choice but to adopt a modern application stack, whether you want to or not.

Additional Resources

Learn about Sumo Logic's security analytics capabilities in this short video.
Hear how Sumo Logic has teamed up with HackerOne to take a DevSecOps approach to bug bounties in this SnapSecChat video.
Learn how Pokemon leveraged Sumo Logic to manage its data privacy and GDPR compliance program and improve its security posture.


The 3 Phases Pitney Bowes Used to Migrate to AWS


How to Use the New Sumo Logic Terraform Provider for Hosted Collectors

Over the years, automation has become a key component in the management of the entire software release lifecycle. Automation helps teams get code from development into the hands of users faster and more reliably. While this principle is critical to your source code and continuous integration and delivery processes, it is equally essential to the underlying infrastructure you depend on. As automation has increased, a new principle for managing infrastructure has emerged to prevent environment drift and ensure your infrastructure is consistently and reliably provisioned.

What Is Infrastructure as Code?

Infrastructure as code (IaC) is a principle where infrastructure is defined using a declarative model and version controlled right along with your source code. The desired infrastructure is declared in a higher-level descriptive language. Every aspect of your infrastructure, including servers, networks, firewalls and load balancers, can be declared using this model. The infrastructure is provisioned from the defined model automatically, with no manual intervention required. This provisioning happens with a tool that interacts with APIs to spin up your infrastructure as needed. IaC ensures that your infrastructure can be created and updated reliably, safely and consistently anytime you need.

It can be challenging to practice IaC without the proper tools, because doing so requires a lot of scripting, which can also be very time consuming. Luckily, a few tools exist to help DevOps teams practice IaC, including one well-known and widely used tool: Terraform.

Why Terraform?

Terraform is an open source tool developed by HashiCorp to address the needs of IaC. Terraform can be used to create, manage and update infrastructure resources. You can use Terraform to manage physical machines, virtual machines (VMs), load balancers, firewalls and many other resources. It provides ways to represent almost any type of infrastructure.
In Terraform, you use a "provider" to define these resources. A provider understands the various APIs and contracts required to create, manage and update the various resources. Providers exist for IaaS offerings such as Amazon Web Services (AWS), Google Cloud Platform (GCP), Microsoft Azure and OpenStack, as well as for SaaS offerings like Terraform Enterprise and CloudFlare. The provider defines a declarative model for creating the various infrastructure resources it offers.

Introducing the Sumo Logic Terraform Provider

That's why today we're happy to announce that we have released a Terraform provider for Sumo Logic, just in time for HashiConf '18 this week in San Francisco. This provider helps you treat your Sumo Logic Hosted Collectors and Sources as code, ensuring consistency across your cloud infrastructure monitoring for AWS, GCP, Azure and other cloud environments supported by Terraform. Using our Terraform provider along with the existing provider you use to manage your cloud infrastructure, you can easily set up Sumo Logic to monitor those resources and link that setup directly to the provisioning of your infrastructure. For example, if you are using the AWS provider to create an Elastic Load Balancer (ELB) and an S3 bucket to capture its logs, you can also define a Sumo Logic Hosted Collector with an ELB source that brings those logs from the S3 bucket straight into the Sumo Logic platform. This configuration is declared in code and version controlled to give you a consistent and reliable setup for creating and monitoring your infrastructure.

Let's walk through an example of how you can use the Terraform provider. In this example, we will use the Sumo Logic Terraform provider to create a Hosted Collector with an HTTP Source. I will be demonstrating this example on my Mac.

Step-by-Step Instructions

The first thing we need to do is install Terraform. I will be installing Terraform using Homebrew, a package manager for Mac OS X.
Once Terraform is installed, we need to download the Sumo Logic Terraform provider from the GitHub release page. In this case, I will be downloading the binary for Mac OS X. We then need to copy it to the Terraform plugins directory. Next, we initialize Terraform to ensure it is ready to go by running terraform init.

With the provider in place and Terraform initialized, we can now define a configuration file for our Hosted Collector and HTTP Source. The following Terraform configuration will create a Hosted Collector with an HTTP Source:

    provider "sumologic" {
      access_id   = "sumo-logic-access-id"
      access_key  = "sumo-logic-access-key"
      environment = "us2"
    }

    resource "sumologic_collector" "example_collector" {
      name     = "Hosted Collector"
      category = "my/source/category"
    }

    resource "sumologic_http_source" "example_http_source" {
      name         = "HTTP Source"
      category     = "my/source/category"
      collector_id = "${sumologic_collector.example_collector.id}"
    }

Let's break down the above file. The provider section defines the required properties for the Sumo Logic provider. You need to create an Access ID and Access Key, which will be the credentials for the Sumo Logic API. The environment should be set based on where your Sumo Logic account is located; in this case it is US2. There are then two resource sections, where we define the Hosted Collector and the HTTP Source. An HTTP Source requires the ID of the collector you wish to assign it to, so in the HTTP Source resource section we reference the ID of the Hosted Collector we created above.

With the file in place and all inputs set, we can now spin up our collector. To have Terraform create it for us, we simply run terraform apply. Terraform will prompt us to confirm that we want to perform this action. After entering yes, you should see output indicating that our resources have been created.
And now if we go to Sumo Logic and look at our Collector page, we see our Hosted Collector and HTTP Source, just as we defined them in our configuration file! Our Terraform provider allows you to configure your Hosted Collectors and Sources with all the same properties you would expect; you can see all the different options on our documentation page.

More To Come

At Sumo Logic, we are developing many new APIs to give our users full control over the provisioning of all their Sumo Logic configurations. As these APIs continue to roll out, we will be updating our provider to expose these additional resources. This will allow users to manage all aspects of Sumo Logic, including Collection, User Management and Content, as code, as well as automate every aspect of Sumo Logic.

Additional Resources

Download our 2018 State of Modern Apps and DevSecOps in the Cloud report for trending insights into how some of the world's top cloud-savvy companies like Twitter, Airbnb, Adobe and Salesforce build and manage their modern applications.
Want to know how to integrate Sumo Logic's monitoring platform into your Terraform-scripted cloud infrastructure for EC2 resources? Read the blog.
Thinking about adopting modern microservices-based infrastructure? Check out part one and part two of our blog series on how to manage Kubernetes/Docker with Sumo Logic.


Exploring the Future of MDR and Cloud SIEM with Sumo Logic, eSentire and EMA

At Sumo Logic's annual user conference, Illuminate, we announced a strategic partnership with eSentire, the largest pure-play managed detection and response (MDR) provider, that will leverage security analytics from the Sumo Logic platform to deliver full-spectrum visibility across the organization, eliminating common blind spots that are easily exploited by attackers.

Today's digital organizations operate on a wide range of modern applications, cloud infrastructures and methodologies, such as DevSecOps, that accumulate and release massive amounts of data. If that data is managed incorrectly, it could allow malicious threats to slip through the cracks and negatively impact the business. This partnership combines the innovative MDR and cloud-based SIEM technologies from eSentire and Sumo Logic, respectively, providing customers with improved analytics and actionable intelligence to rapidly detect and investigate machine data, identify potential threats to cloud or hybrid environments, and strengthen overall security posture.

Watch the video to learn more about this joint effort, as well as the broader security, MDR and cloud SIEM market outlook, from Jabari Norton, VP global partner sales & alliances at Sumo Logic; Sean Blenkhorn, field CTO and VP sales engineering & advisory services at eSentire; and Dave Monahan, managing research director at analyst firm EMA. For more details on the specifics of this partnership, read the joint press release.


Accelerate Security and PCI Compliance Visibility with New Sumo Logic Apps for Palo Alto Networks


Artificial Intelligence vs. Machine Learning vs. Deep Learning: What's the Difference?


Illuminate 2018 Video Q&A with Sumo Logic CEO Ramin Sayar


Intrinsic vs Meta Tags: What’s the Difference and Why Does it Matter?

Tag-based metrics are typically used by IT operations and DevOps teams to make it easier to design and scale their systems. Tags help you make sense of metrics by allowing you to filter on things like host, cluster, service, etc. However, knowing which tags to use, and when, can be confusing. For instance, have you ever wondered about the difference between intrinsic tags (or dimensions) and meta tags with respect to custom application metrics? If so, you're not alone. It is pretty common to get the two confused, but don't worry: this blog post will help explain the difference.

Before We Get Started

First, let's start with some background. Metrics in Carbon 2.0 take on the following format:

    intrinsic_tags  meta_tags value timestamp

Note that there are two spaces between intrinsic_tags and meta_tags. If a tag is listed before the double space, it is an intrinsic tag; if it is listed after the double space, it is a meta tag. Meta tags are also optional: if no meta tags are provided, there must be two spaces between intrinsic_tags and value. An example of a Carbon 2.0 metric might be:

    metric=cpu_idle host=server-01  version=2.1 98 1543343423

Understanding Intrinsic Tags

Intrinsic tags, which may also be referred to as dimensions, are metric identifiers. If two data points are sent with the same set of dimension values, they will be values in the same metric time series; two data points with different dimensions will belong to separate time series.

Understanding Meta Tags

On the other hand, meta tags are not used as metric identifiers. This means that if two data points have the same intrinsic tags (dimensions) but different meta tags, they will still be values in the same metric time series. Meta tags are meant to be used in addition to intrinsic tags so that you can more conveniently select the metrics.

Let's Look at an Example

To make this more clear, let's use another example. Say you have 100 servers in your cluster reporting host metrics like "metric=cpu_idle." This would be an intrinsic tag.
You may also want to track the version of the code running on that cluster. If you put the code version in an intrinsic tag, you'll get a completely new set of metrics every time you upgrade to a new code version. Unless you want to maintain the metrics "history" of the old code version, you probably don't want this behavior. However, if you put the version in a meta tag instead, you will be able to change the version without creating a new set of metrics for your cluster.

To take the example even further, let's say you have upgraded half of your cluster to a new version and want to compare the CPU idle of the old and new code versions. You could do this in Sumo Logic using the query "metric = cpu_idle | avg by version."

Knowing the Difference

To summarize: if you want two values of a given tag to be separate metrics at the same time, then that tag should be an intrinsic tag and not a meta tag. Hopefully this clears up some of the confusion regarding intrinsic versus meta tags. By tagging your metrics appropriately, you will make them easier to search and ensure that you are tracking all the metrics you expect. If you already have a Sumo Logic account, then you are ready to start ingesting custom metrics. If you are new to Sumo Logic, start by signing up for a free account here.

Additional Resources

Learn how to accelerate data analytics with Sumo Logic's Logs to Metrics solution in this blog.
Want to know how to transform Graphite data into metadata-rich metrics? Check out our Metrics Rules solution.
Read the case study to learn how Paf leveraged the Sumo Logic platform to derive critical insights that enabled them to analyze log and metric data, perform root-cause analysis, and monitor apps and infrastructure.
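The Carbon 2.0 layout described above (intrinsic tags, a double space, optional meta tags, then the value) can be captured in a small helper. This is an illustrative sketch, not official client code; the tag names and the trailing timestamp field are assumptions for the example.

```python
def carbon2_line(intrinsic, meta, value, timestamp):
    """Render a Carbon 2.0 metric line.

    Intrinsic tags identify the time series; meta tags are optional
    and separated from the intrinsic tags by a double space.
    """
    intrinsic_part = " ".join(f"{k}={v}" for k, v in intrinsic.items())
    meta_part = " ".join(f"{k}={v}" for k, v in meta.items())
    tail = f"{value} {timestamp}"
    if meta_part:
        return f"{intrinsic_part}  {meta_part} {tail}"
    # No meta tags: the double space still precedes the value.
    return f"{intrinsic_part}  {tail}"

line = carbon2_line(
    {"metric": "cpu_idle", "host": "server-01"},  # intrinsic: identifies the series
    {"version": "2.1"},                           # meta: selectable, not identifying
    98,
    1543343423,
)
print(line)
# -> metric=cpu_idle host=server-01  version=2.1 98 1543343423
```

Because the version lives after the double space, bumping it changes only the meta tags, so the data points keep landing in the same time series.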


Why is Oracle and Microsoft SQL Adoption Low for Developers on AWS?


Why Decluttering Complex Data in Legends is Hard


5 Best Practices for Using Sumo Logic Notebooks for Data Science

This year, at Sumo Logic's third annual user conference, Illuminate 2018, we presented Sumo Logic Notebooks as a way to do data science in Sumo Logic. Sumo Logic Notebooks are an experimental feature that integrates Sumo Logic, notebooks and common machine learning frameworks. They are a bold attempt to go beyond what the current Sumo Logic product has to offer and enable a data science workflow leveraging our core platform.

Why Notebooks?

In the data science world, notebooks have emerged as an important tool. Notebooks are active documents created by individuals or groups to write and run code, display results, and share outcomes and insights. Like every other story, a data science notebook follows a structure that is typical for its genre. We usually have four parts: we (a) start by defining a data set, (b) continue by cleaning and preparing the data, (c) perform some modeling using the data, and (d) interpret the results. In essence, a notebook should record an explanation of why experiments were initiated, how they were performed, and what the results were.

Anatomy of a Notebook

A notebook segments a computation into individual steps called paragraphs. A paragraph contains an input section and an output section. Each paragraph executes separately and modifies the global state of the notebook. State can be defined as the ensemble of all relevant variables, memories and registers. Paragraphs need not contain computations; they can also contain text or visualizations to illustrate the workings of the code. The input section (blue) contains the instruction to the notebook execution engine (sometimes called a kernel or interpreter). The output section (green) displays a trace of the paragraph's execution and/or an intermediate result. In addition, the notebook software exposes controls (purple) for managing and versioning notebook content, as well as operational aspects such as starting and stopping executions.
Human Speed vs. Machine Speed

The power of the notebook is rooted in its ability to segment and then slow down computation. Common executions of computer programs run at machine speed: when a program is submitted to the processor for execution, it runs from start to end as fast as possible and only blocks for I/O or user input. Consequently, the state of the program changes so fast that it is neither observable nor modifiable by humans. Programmers typically attach debuggers, physically or virtually, to stop programs during execution at so-called breakpoints and then read out and analyze their state. In doing so, they slow execution down to human speed. Notebooks make interrogating the state more explicit. Certain paragraphs are dedicated to making progress in the computation, i.e., advancing the state, whereas other paragraphs simply serve to read out and display the state. Moreover, it is possible to rewind state during execution by overwriting certain variables. It is also simple to kill the current execution, thereby deleting the state and starting anew.

Notebooks as an Enabler for Productivity

Notebooks increase productivity because they allow for incremental improvement. It is cheap to modify code and rerun only the relevant paragraph. So when developing a notebook, the user builds up state and then iterates on that state until progress is made. Running a stand-alone program, by contrast, incurs more setup time and can be prone to side effects. A notebook will most likely keep all its state in working memory, whereas every new execution of a stand-alone program needs to rebuild that state each time it runs. This takes more time, and the required I/O operations might fail. Working off a program state held in memory and iterating on it has proved to be very efficient.
This is particularly true for data scientists, as their programs usually deal with large amounts of data that have to be loaded in and out of memory, as well as computations that can be time-consuming. From an organizational point of view, notebooks are a valuable tool for knowledge management. As they are designed to be self-contained, sharable units of knowledge, they lend themselves to:

- Knowledge transfer
- Auditing and validation
- Collaboration

Notebooks at Sumo Logic

At Sumo Logic, we expose notebooks as an experimental feature to empower users to build custom models and analytics pipelines on top of log and metrics data sets. The notebooks provide the framework to structure a thought process. This thought process can be aimed at delivering a special kind of insight or outcome: it could be drilling down on a search, or an analysis specific to a vertical or an organization. We provide notebooks to enable users to go beyond what Sumo Logic operators have to offer, and to train and test custom machine learning (ML) algorithms on your data. Inside notebooks we deliver data using data frames as a core data structure. Data frames make it easy to integrate logs and metrics with third-party data. Moreover, we integrate with other leading data wrangling, model management, and visualization tools and services to provide a blend of the best technologies for creating value with data.

Technology Stack

Sumo Logic Notebooks are an integration of several software packages that makes it easy to define data sets using the Sumo query language and use the resulting data set as a data frame in common machine learning frameworks. Notebooks are delivered as a Docker container and can therefore be installed on laptops or cloud instances without much effort. The most common machine learning libraries, such as Apache Spark, pandas, and TensorFlow, are pre-installed, but others are easy to add through Python’s pip installer, or using apt-get and other package management software from the command line.
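To illustrate why data frames make third-party integration easy, here is a minimal pandas sketch. The log fields and the lookup table of service owners are invented for the example; they do not come from any Sumo Logic data set:

```python
import pandas as pd

# A small result set, as it might look after parsing log messages.
logs = pd.DataFrame({
    "service": ["checkout", "search", "checkout", "auth"],
    "latency_ms": [120, 45, 310, 80],
})

# Third-party data: a hypothetical table mapping services to owning teams.
owners = pd.DataFrame({
    "service": ["checkout", "search", "auth"],
    "team": ["payments", "discovery", "identity"],
})

# Joining the two sources is a one-liner on the shared key,
# after which standard aggregations apply to the enriched data.
enriched = logs.merge(owners, on="service", how="left")
mean_latency = enriched.groupby("team")["latency_ms"].mean()
print(mean_latency)
```

The same pattern applies whether the frame originates from a Sumo Logic query result or from a CSV export: once the data is a data frame, the rest of the pandas ecosystem is available unchanged.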
Changes can be made persistent by committing the Docker image. The key to Sumo Logic Notebooks is the integration of the Sumo Logic API data adapter with Apache Spark. After a query has been submitted, the adapter loads the data and ingests it into Spark. From there we can switch over to a Python/pandas environment or continue with Spark. The notebook software provides the interface to specify data science workflows.

Best Practices for Writing Notebooks

#1 One notebook, one focus

A notebook contains a complete record of procedures, data, and thoughts to pass on to other people. For that purpose, it needs to be focused. Although it is tempting to put everything in one place, this can be confusing for readers. It is better to write two or more notebooks than to overload a single one.

#2 State is explicit

A common source of confusion is program state being passed between paragraphs through hidden variables. The set of variables that represents the interface between two subsequent paragraphs should be made explicit. Referencing variables from paragraphs other than the immediately preceding one should be avoided.

#3 Push code into modules

A notebook integrates code; it is not a tool for code development. That is what an integrated development environment (IDE) is for. Therefore, a notebook should only contain glue code and maybe one core algorithm. All other code should be developed in an IDE, unit tested, version controlled, and then imported into the notebook via libraries. Modularity and all other good software engineering practices remain valid in notebooks. As with practice number one, too much code clutters the notebook and distracts from the original purpose or analysis goal.

#4 Use descriptive variable names and tidy up your code

Notebooks are meant to be shared and read by others. Others might not have an easy time following our thought process if we did not come up with good, self-explanatory names. Tidying up the code goes a long way, too.
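Practice #3 can be made concrete with a small sketch: the analysis function lives in a unit-tested module, and the notebook paragraph reduces to glue code. The module and function names here are hypothetical, chosen only for the example:

```python
# anomaly_utils.py -- developed, tested, and versioned outside the notebook.
def error_rate(status_codes):
    """Return the fraction of responses with a 5xx status code."""
    if not status_codes:
        return 0.0
    errors = sum(1 for code in status_codes if code >= 500)
    return errors / len(status_codes)


# --- Notebook paragraph: glue code only. ---
# In a real notebook this would be `from anomaly_utils import error_rate`;
# the function is defined inline above only to keep this sketch self-contained.
codes = [200, 200, 500, 404, 503]
print(f"error rate: {error_rate(codes):.0%}")
```

The notebook stays short and readable, while the logic in the module can be covered by ordinary unit tests and reviewed like any other code.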
Notebooks impose an even higher quality standard than traditional code.

#5 Label diagrams

A picture is worth a thousand words. A diagram, however, needs some words to label axes, describe lines and dots, and convey other important information such as sample size. Without that information, a reader can have a hard time gauging the proportions or importance of a diagram. Also keep in mind that diagrams are easily copy-pasted from the notebook into other documents or chats; they then lose the context of the notebook in which they were developed.

Bottom Line

The segmentation of a thought process is what fuels the power of the notebook. Facilitating incremental improvements when iterating on a problem boosts productivity. Sumo Logic enables the adoption of notebooks to foster the use of data science with logs and metrics data.

Additional Resources

- Visit our Sumo Logic Notebooks documentation page to get started
- Check out Sumo Logic Notebooks on DockerHub or Read the Docs
- Read our latest press release announcing new platform innovations, including our new Data Science Insights innovation

Blog

How to Monitor Azure Services with Sumo Logic

Blog

Illuminate Day Two Keynote Top Four Takeaways

Day two of Illuminate, Sumo Logic’s annual user conference, started with a security bang, hearing from our founders, investors, customers and a special guest (keep reading to see who)! If you were unable to attend the keynote in person or watch via the Facebook livestream, we’ve recapped the highlights below for you. If you are curious about the day one keynote, check out that recap blog post as well.

#1: Dial Tones are Dead, But Reliability Is Forever

Two of our founders, Christian Beedgen and Bruno Kurtic, took the stage Thursday morning to kick off the second day keynote talk, and they did not disappoint.

Sumo Logic founders Bruno Kurtic (left) and Christian Beedgen (right) kicking off the day two Illuminate keynote

Although the presentation was full of cat memes, penguins and friendly banter, they delivered an earnest message: reliability, availability and performance are important to our customers, and they are important to us at Sumo Logic. But hiccups happen; it’s inevitable, and that’s why Sumo Logic is committed to constantly monitoring for any hiccups so that we can troubleshoot instantly when they happen. The bottom line: our aspiration at Sumo Logic is to be the dial tone for those times when you absolutely need Sumo Logic to work. And we do that through total transparency. Our entire team has spent time building a reliable service, built on transparency and constant improvement. It really is that simple.

#2: The Platform is the Key to Democratizing Machine Data (and, Penguins)

We also announced a number of new platform enhancements, solutions and innovations at Illuminate, all with the goal of improving our customers’ experiences. All of that goodness can be found in a number of places (linked at the end of this article), but what was most exciting to hear from Bruno and Christian on stage was what Sumo Logic is doing to address major macro trends. The first trend is the proliferation of users and access.
What we’ve seen from our customers is that the Sumo Logic platform is brought into a specific group, like the security team or the development team, and then it spreads like wildfire until the entire company (or all of the penguins) wants access to the rich data insights. That’s why we’ve taken an API-first approach to everything we do. To keep your workloads running around the globe, we now have 20 availability zones across five regions, and we will continue to expand to meet customer needs. The second trend is cloud-scale economics, because Moore’s Law is, in fact, real. Data ingest trends are going up, and for years our customers have relied on Sumo Logic to manage mission-critical data in order to keep their modern applications running and secured. Not all data is created equal, and different data sets have different requirements. Sometimes, it can be a challenge to store data outside of the Sumo Logic platform, which is why our customers now have brand-new capabilities for basic and cold storage within Sumo Logic. (Christian can confirm that the basic storage is still secure — by packs of wolves). The third trend is the unification of modern apps and machine data. While the industry is buzzing about observability, one size does not fit all. To address this challenge, the Sumo Logic team asked: what can we do to deliver on the vision of unification? The answer is in the data. For the first time ever, we will deliver the State of Modern Applications report live, where customers can push their data to dynamic dashboards, and all of this information will be accessible in new easy-to-read charts that are API-first, templatized and, most importantly, unified. Stay tuned for more on the launch of this new site!
#3: The State of Security from Greylock, AB InBev and Pokemon

One of my favorite highlights of the second day keynote was the security panel, moderated by our very own CSO, George Gerchow, with guests from one of our top investors, Greylock Partners, and two of our customers, Anheuser-Busch InBev (AB InBev) and Pokemon.

From left to right: George Gerchow, CSO, Sumo Logic; Sara Guo, partner, Greylock; Khelan Bhatt, global director, security architecture, AB InBev; John Visneski, director infosecurity & DPO, Pokemon

Sara Guo, general partner at Greylock, spoke about three constantly changing trends, or waves, she’s tracking in security, and what she looks for when her firm is considering an investment: the environment, the business and the attackers. We all know the IT environment is changing drastically, and as it moves away from on-premises protection, it’s not a simple lift and shift process; we have to actually do security differently. Keeping abreast of attacker innovation is also important for enterprises, especially as cybersecurity resources continue to be sparse. We have to be able to scale our products, automate, know where our data lives and come together as a defensive community. When you think of Anheuser-Busch, you most likely think of beer, not digital transformation or cybersecurity. But there’s actually a deep connection, said Khelan Bhatt, global director, security architecture, AB InBev. As the largest beer distributor in the world, Anheuser-Busch has 500 different breweries (brands) in all corners of the world, and each one has its own industrial IoT components that are sending data back to massive enterprise data lakes. The bigger these lakes get, the bigger targets they become for attackers. Sumo Logic has played a big part in helping the AB InBev security team digitally transform their operations and build secure enterprise data lakes, maintaining their strong connection to the consumer while keeping that data secure.
John Visneski, director of information security and data protection officer (DPO) for The Pokémon Company International, had an interesting take on how he and his team approach security: be a problem solver first, and a security pro second. Although John brought on Sumo Logic to help him fulfill security and General Data Protection Regulation (GDPR) requirements, our platform has become a key business intelligence tool at Pokemon. With over 300 million active users, Pokemon collects sensitive personally identifiable information (PII) from children, including names, addresses and some geolocation data. Sumo Logic has been key for helping John and his team deliver on the company’s core values: providing child and customer safety, trust (and uninterrupted fun)!

#4: Being a Leader Means Being You, First and Foremost

When our very special guest, former CIA Director George Tenet, took the stage, I did not expect to walk away with some inspiring leadership advice. In a fireside chat with our CEO, Ramin Sayar, George talked about how technology has changed the threat landscape, and how nation-state actors are leveraging the pervasiveness of data to get inside our networks and businesses. Data is a powerful tool that can be used for good or bad. At Sumo Logic, we’re in it for the good. George also talked about what it means to be a leader and how to remain steadfast, even in times of uncertainty. Leaders have to lead within the context of who they are as human beings. If they try to adopt the persona of someone else, it destroys their credibility. The key to leadership is self-awareness of who you are, and understanding your limitations so that you can hire smart, talented people to fill those gaps. Leaders don’t create followers, they create other leaders.

And that’s a wrap for Sumo Logic’s second annual user conference. Thanks to everyone who attended and supported the event. If we didn’t see you at Illuminate over the last two days, we hope you can join us next year!
Additional Resources For data-driven industry insights, check out Sumo Logic’s third annual ‘State of Modern Applications and DevSecOps in the Cloud’ report. You can read about our latest platform innovations in our press release, or check out the cloud SIEM solution and Global Intelligence Service blogs. Check out our recent blog for a recap of the day one Illuminate keynote.

Blog

Illuminate Day One Keynote Top Five Takeaways

Today kicked off day one of Sumo Logic’s second annual user conference, Illuminate, and there was no better way to start the day than with a keynote presentation from our CEO, Ramin Sayar, and some of our most respected and valued customers, Samsung SmartThings and Major League Baseball (MLB). The event was completely sold out, and the buzz and excitement could be felt as customers, industry experts, thought leaders, peers, partners and employees made their way to the main stage. If you were unable to catch the talk in person or tune in for the Facebook livestream, then read on for the top five highlights from the day one keynote.

#1: Together, We’ve Cracked the Code to Machine Data

At Sumo Logic, we’re experts in all things data. But, to make sure we weren’t biased, we partnered with 451 Research earlier this year to better understand how the industry is using machine data to improve overall customer experiences in today’s digital world. We found that 60 percent of enterprises are using machine data analytics for business and customer insights, and to help support digital initiatives, usage and app performance. These unique findings have validated what we’ve been seeing within our own customer base over the past eight years — together, we can democratize machine data to make it easily accessible, understandable and beneficial to all teams within an organization. That’s why, as Ramin shared during the keynote, we’ve committed to hosting more meet-ups and global training and certification sessions, and providing more documentation, videos, Slack channels and other resources for our growing user base — all with the goal of ‘illuminating’ machine data for the masses, and to help customers win in today’s analytics economy.

#2: Ask, and You Shall Receive Continued Platform Enhancements

Day one was also a big day for some pretty significant platform enhancements and new solutions centered on three core areas: development, security and operations.
The Sumo Logic dev and engineering teams have been hard at work, and have over 50 significant releases to show for it, all focused on meeting our customers’ evolving needs. Some of the newer releases on the ops analytics side include Search Templates and Logs to Metrics. Search Templates empower non-technical users, like customer support and product management, to leverage Sumo Logic’s powerful analytics without learning the query language. Logs to Metrics allows users to extract business KPIs from logs and cost-effectively convert them to high-performance metrics for long-term trending and analysis. We’ve been hard at work on the security side of things as well, and are happy to announce the new cloud SIEM solution that’s going to take security analytics one step further. Our customers have been shouting from the rooftops for years that their traditional on-premises SIEM tools and rules-based correlation have let them down, and so they’ve been stuck straddling the line between old and new. With this entirely new, first-of-its-kind cloud SIEM solution, customers have a single, unified platform in the cloud to help them meet their modern security needs. And we’re not done yet; there’s more to come.

#3: Samsung SmartThings is Changing the World of Connected IoT

Scott Vlaminck, co-founder and VP of engineering at Samsung SmartThings, shared his company’s vision for SmartThings to become the definitive platform for all IoT devices, in order to deliver the best possible smart home experience for their customers. And, as Scott said on stage, Sumo Logic helps make that possible by providing continuous intelligence on all operational, security and business data flowing across the SmartThings IoT platform, which receives about 200,000 requests per second!
Scott talked about the company’s pervasive usage of the Sumo Logic platform, in which 95 percent of employees use Sumo Logic to report on KPIs, customer service, product insights, security metrics, app usage trends and partner health metrics to drive deeper customer satisfaction. Having a fully integrated tool available to teams outside of traditional IT and DevOps is what continuous intelligence means for SmartThings.

#4: Security is Everyone’s Responsibility at MLB

When Neil Boland, the chief information security officer (CISO) for Major League Baseball, took the stage, he shared how he and his security team are completely redefining what enterprise security means for a digital-first sports organization that has to manage, maintain and secure over 30 different clubs (which translates to 30 unique brands and 30 different attack vectors). Neil’s mission for 2018 is to blow up the traditional SIEM and MSSP models and reinvent them for his company’s 100 percent cloud-based initiatives. Neil’s biggest takeaway is that everyone at MLB is on the cybersecurity team, even non-technical groups like the help desk, and this shared responsibility helps strengthen overall security posture and continue to deliver uninterrupted sports entertainment to their fans. And Sumo Logic has been a force multiplier that helps Neil and his team achieve that collective goal.

#5: Community, Community, Community

Bringing the talk full circle, Ramin ended the keynote with a word about community, and how we are not only in it for our customers, but we’re in it with them, and we want to share data trends, usage and best practices of the Sumo Logic platform with our ecosystem to provide benchmarking capabilities. That’s why today at Illuminate, we launched a new innovation — Global Intelligence Service — focused on three key areas: Industry Insights, Community Insights and Data Science Insights.
These insights will help customers extend machine learning and insights to new teams and use cases across the enterprise, and these are only possible with Sumo Logic’s cloud-native, multi-tenant architecture. For data-driven industry insights, check out Sumo Logic’s third annual ‘State of Modern Applications and DevSecOps in the Cloud’ report. You can read about our latest platform innovations in our press release, or check out the cloud SIEM solution and Global Intelligence Service blogs. Want the Day Two Recap? If you couldn’t join us live for day two of Illuminate, or were unable to catch the Facebook livestream, check out our second day keynote recap blog for the top highlights.

Blog

Announcing the Sumo Logic Global Intelligence Service at Illuminate 2018

In today’s hyper-connected world, a company’s differentiation is completely dependent upon delivering a better customer experience, at scale, and at a lower cost than the competition. This is no easy feat, and it involves a combination of many things, particularly adopting new technologies and architectures, as well as making better use of data and analytics. Sumo Logic is committed to helping our customers excel in this challenging environment by making it easier to adopt the latest application architectures while also making the most of their precious data.

The Power of the Platform

As a multi-tenant, cloud-native platform, Sumo Logic has a unique opportunity to provide context and data to our customers that is not available anywhere else. Why is this? First of all, when an enterprise wants to explore new architectures and evaluate options, it is very difficult to find broad overviews of industry trends based on real-time data rather than surveys or guesswork. Second, it is difficult to find reliable information about how exactly companies are using technologies at the implementation level, all the way down to the configurations and performance characteristics. Finally, once implemented, companies struggle to make the best use of the massive amount of machine data exhaust from their applications, particularly for non-traditional audiences like data scientists. It is with this backdrop in mind that Sumo Logic is announcing the Global Intelligence Service today during the keynote presentation at our second annual user conference, Illuminate, in Burlingame, Calif. This unprecedented initiative of data democratization is composed of three primary areas of innovation.

Industry Insights — What Trends Should I be Watching?

Sumo Logic is continuing to build on the success of its recently released third annual ‘State of Modern Applications and DevSecOps in the Cloud’ report to provide more real-time and actionable insights about industry trends.
In order to stay on top of a constantly changing technology landscape, this report is expanding to include more frequent updates and instant-access options to help customers develop the right modern application or cloud migration strategy for their business, operational and security needs.

Chart depicting clusters of AWS services used frequently together

Community Insights — What Are Companies Like Us, and Teams Like Ours, Doing?

Sumo Logic is applying the power of machine learning to derive actionable insights for getting the most out of your technology investments. We have found that many engineering teams lack the right resources and education needed to make the best technology choices early on in their prototyping phases. And then, when the system is in production, it is often too late to make changes. That’s why Sumo Logic has an opportunity to save our customers pain and frustration by giving them benchmarking and comparison information when they most need it. We all like to think that our use cases are each a beautiful, unique snowflake. The reality is that, while each of us is unique, our uses of technology fall into some predictable clusters. So, looking across a customer base of thousands, Sumo Logic can infer patterns and best practices about how similar organizations are using technologies. Using those patterns, we will build recommendations and content for our customers that can be used to compare performance against a baseline of usage across their peers.

Chart depicting how performance behavior across customers tends to cluster

Data Science Insights — Data Scientists Need Love, Too

Data scientists are under more pressure than ever to deliver stunning results, while also getting pushback from society about the quality of their models and the biases that may or may not be there. At the end of the day, while data scientists have control over their models, they may have less control over the data.
If the data is incomplete or biased in any way, that can directly influence the results. To alleviate this issue, Sumo Logic is providing an open source integration with the industry-standard Jupyter and Apache Zeppelin notebooks in order to make it easier for data scientists to leverage the treasure trove of knowledge currently buried in their application machine data.

Empower the People Who Power Modern Business

You may still be wondering: why does all of this matter? At the end of the day, it is all about making our customers successful by making their people successful. A business is only as effective as the people who do the work, and it is our mission at Sumo Logic to empower those users to excel in their roles, which in turn contributes to overall company growth and performance. And we also want to set users outside of the traditional IT, DevOps and security teams up for success by making machine data analytics more accessible to them. So, don’t forget that you heard it here first: democratizing machine data is all about empowering the people with love (and with unique machine data analytics and insights)!

Additional Resources

- Download the 2018 ‘State of Modern Applications and DevSecOps in the Cloud’ report and/or read the press release for more detailed insights.
- Read the Sumo Logic platform enhancement release to learn more about our latest platform enhancements and innovations.
- Sign up for Sumo Logic for free.

Blog

Introducing Sumo Logic’s New Cloud SIEM Solution for Modern IT

Blog

Sumo Logic's Third Annual State of Modern Apps and DevSecOps in the Cloud Report is Here!

Blog

Why Cloud-Native is the Way to Go for Managing Modern Application Data

Are your on-premises analytics and security solutions failing you in today’s digital world? Don’t have the visibility you need across your full application stack? Unable to effectively monitor, troubleshoot and secure your microservices and multi-cloud architectures? If this sounds like your organization, then be sure to watch this short video explaining why a cloud-native, scalable and elastic machine data analytics platform approach is the right answer for building, running and securing your modern applications and cloud infrastructures. To learn more about how Sumo Logic is uniquely positioned to offer development, security and operations (DevSecOps) teams the right tools for their cloud environments, watch our Miles Ahead in the Cloud and DevOps Redemption videos, visit our website or sign up for Sumo Logic for free here.

Video Transcription

You’ve decided to run your business in the cloud. You chose this to leverage all the benefits the cloud enables: speed to rapidly scale your business; elasticity to handle the buying cycles of your customers; and the ability to offload data center management headaches to someone else so you can focus your time, energy and innovation on building a great customer experience. So, when you need insights into your app to monitor, troubleshoot or learn more about your customers, why would you choose a solution that doesn’t work the same way? Why would you manage your app with a tool that locks you into a peak support contract, one that’s not designed to handle the unpredictability of your data? Sumo Logic is a cloud-native, multi-tenant service that lets you monitor, troubleshoot and secure your application with the same standards of scalability, elasticity and security you hold yourself to. Sumo Logic is built on a modern app stack, for modern app stacks. Its scalable, elastic, resilient cloud architecture has the agility to move as fast as your app moves, quickly scaling up for data volume.
Its advanced analytics, based on machine learning, are designed to cope with change. So, when that data volume spikes, Sumo Logic is there with the capacity and the answers you need. Sumo Logic is built with security as a first principle. That means security is baked in at the code level, and the platform has the credentials and attestations you need to manage compliance for your industry. Sumo Logic’s security analytics and integrated threat intelligence also help you detect threats and breaches faster, with no additional costs. Sumo Logic delivers all this value in a single platform solution. No more swivel-chair analytics to slow you down or impede your decision-making. You have one place to see and correlate the continuum of operations, security and customer experience analytics — this is what we call continuous intelligence for modern apps. So, don’t try to support your cloud app with a tool that was designed for the old, on-premises world, or a pretend cloud tool. Leverage the intelligence solution that fully replicates what you’re doing with your own cloud business — Sumo Logic, the industry-leading, cloud-native, machine data analytics platform delivered to you as a service. Sumo Logic. Continuous Intelligence for Modern Applications.

Blog

Top Reasons Why You Should Get Sumo Logic Certified, Now!

Blog

How Our Customers Influence the Sumo Logic Product

Sumo Logic is no different than most companies — we are in the service of our customers and we seek to build a product that they love. As we continue to refine the Sumo Logic platform, we’re also refining our feedback loops. One of those feedback loops is internal dogfooding: learning how our own internal teams, such as engineering, sales engineering and customer success, experience the newest feature. However, we know that approach can be biased. Our second feedback loop is directly from our customers, whose thoughts are then aggregated, distilled and incorporated into the product. The UX research team focuses on partnering with external customers, as well as internal Sumo Logic teams that regularly use our platform, to hear their feedback and ensure that the product development team takes these insights into account as they build new capabilities.

Our Product Development Process

Sumo Logic is a late-stage startup, which means that we’re in the age of scaling our processes to suit larger teams and to support new functions. The processes mentioned here are in various stages of maturity, and we haven’t implemented all of them to a textbook level of perfection (yet!). Currently, there are two facets to the product development process. The first is the discovery side, for capabilities that are entirely new, while the second is focused on delivery and improving capabilities that already exist in the product. The two sides run concurrently, as opposed to sequentially, with the discovery side influencing the delivery side. The teams supporting both sides are cross-functional in nature, consisting of engineers, product managers and product designers.

Adapted from Jeff Patton & Associates

Now that we’ve established the two aspects of product development, we’ll discuss how customer feedback fits in. Customer feedback is critical to all product decisions at Sumo Logic.
We seek out the opinions of our customers when the product development team has questions that need answers before they can proceed. Customer feedback usually falls into two categories: broad and granular.

Broad Customer Questions

The more high-level questions typically come from the discovery side. For example, we may get a question like this: “Should we build a metrics product?” For the teams focused on discovery, UX research starts with a clear hypothesis and is more open-ended and high level. It may consist of whiteboarding with our customers or observing their use cases in their workspaces. The insights from this research might spawn a new scrum team to build a capability, or the insights could indicate we should focus efforts elsewhere.

Granular Customer Questions

By contrast, UX research for delivery teams is much more focused. The team likely has designs or prototypes to illustrate the feature they’re building, and their questions tend to focus on discoverability and usability. For instance, they may be wondering whether customers can find which filters apply to which dashboard panels. The outcomes from this research give the team the data they need to make decisions and proceed with design and development. Occasionally, the findings from the discovery side will influence what’s happening on the delivery side.

The UX Research Process at Sumo Logic

The diagram below describes the milestones in our current UX research process, for both discovery and delivery teams. As a customer, the most interesting pieces of this are the research execution and the report presentation, as these include your involvement as well as how your input impacts the product.

UX Research Execution

Research execution takes a variety of forms, from on-site observation to surveys to design research with a prototype. As a customer, you’re invited to all types of research, and we are always interested in your thoughts.
Our ideal participants are willing to share how they are using the Sumo Logic platform for their unique operational, security and business needs, and to voice candid opinions. Our participants are also all over the emotional spectrum, from delighted to irritated, and we welcome all types. The immediate product development team takes part in the research execution. For example, if we’re meeting with customers via video conference, we’ll invite engineers, product managers and designers to observe research sessions. There’s a certain realness for the product development team when they see and hear a customer reacting to their work, and we’ve found that it increases empathy for our customers. This is very typical of our qualitative UX research sessions, and what you can expect as a participant. In the above clip, Dan Reichert, a Sumo Logic sales engineer, discusses his vision for a Data Allocation feature to manage ingest.

Report Presentation

After the UX research team has executed the research, we’ll collect all data, video, photos and notes. We’ll produce a report with the key and detailed insights from the research, and we’ll present the report to the immediate product development team. These report readouts tend to be conversational, with a lengthy discussion of the results, anecdotes and recommendations from the UX researcher. I’ve found that the teams are very interested in hearing specifics of how our customers are using the product, and how their efforts will influence that. After the report readout, the product development team will meet to discuss how they’ll implement the feedback from the study. The UX researcher will also circulate the report to the larger product development team for awareness. The insights are often useful for other product development teams, and occasionally fill in knowledge gaps for them.

How Can I Voice My Thoughts and Get Involved in UX Research at Sumo Logic?
We’d love to hear how you’re using Sumo Logic, and your feedback for improvement. We have a recruiting website to collect the basics, as well as your specific interests within the product. Our UX research team looks forward to meeting you!

Blog

Understanding Sumo Logic Query Language Design Patterns

Blog

A Look Inside Being a Web UI Engineering Intern at Sumo Logic

Hello there! My name is Sam and this summer I’ve been an intern at Sumo Logic. In this post I’ll share my experience working on the web UI engineering team and what I learned from it. A year ago I started my Master of Computer Science degree at Vanderbilt University, and since the program is only two years long, there’s only one internship slot before graduation. So I needed to find a good one. Like other students, I wanted the internship to prepare me for my future career by teaching me about work beyond just programming skills, while also adding a reputable line to my resume. So after months of researching, applying, preparing and interviewing, I officially joined the Sumo Logic team in May.

The Onboarding Experience

The first day was primarily meeting a lot of new people, filling out paperwork, setting up my laptop and learning which snacks are best at the office (roasted almonds take the win). The first couple of weeks were a heads-down learning period. I was learning about the Sumo Logic machine data analytics platform — everything from why it is used and how it works to what it is built on. We also had meetings with team members who explained the technologies involved in the Sumo Logic application. In general, though, the onboarding process was fairly flexible and open ended, with a ton of opportunities to ask questions and learn. Specifically, I enjoyed watching React courses as part of my onboarding. In school I pay to learn this, but here I am the one being paid 🙂

Culture and Work Environment

The culture and work environment are super nice and relaxed. The developers are given a lot of freedom in how and what they are working on, and the internship program is very adaptable. I was able to shape my role throughout the internship to focus on tasks and projects that were interesting to me. Of course, the team was very helpful in providing direction and answering my questions, but it was mostly up to me to decide what I would like to do.
The phrase that I remember best was from my manager. In my second week at Sumo Logic he said: “You don’t have to contribute anything — the most important thing is for you to learn.” The thing that surprised me the most at Sumo Logic is how nice everyone is. This is probably the highest “niceness per person” ratio I’ve ever experienced in my life. Almost every single person I’ve met here is super friendly, humble, open minded and smart. These aspects of the culture helped me greatly.

Summer Outside Sumo Logic

One of the important factors in choosing a company for me was its location. I am from Moscow, Russia, and am currently living in Nashville while I attend Vanderbilt, but I knew that this summer I definitely wanted to find an internship in the heart of the tech industry — Silicon Valley. Lucky for me, Sumo Logic is conveniently located right in the middle of it in Redwood City. I also enjoyed going to San Francisco on weekends to explore the city, skateboarding to Stanford from my home and visiting my friend at Apple’s Worldwide Developers Conference (WWDC) in San Jose. I liked the SF Bay Area so much that I don’t want to work anywhere else in the foreseeable future!

Actual Projects: What Did I Work On?

The main project that I worked on is a UI component library. As the company quickly grows, we strive to make the UI components more consistent — both visually and in code standards — and to make the code more maintainable. We also want to simplify communication about the UI between the dev and design teams. I was very excited about the future impact and benefit of this project for the company, and asked to join the team’s effort. A cool thing about this library is that it is a collection of fresh, independent React components that will then be used by developers to build all parts of the Sumo Logic app. It is a pleasure to learn best practices while working with cutting-edge libraries like React.
If that sounds interesting to you, check out this blog from one of my Sumo Logic colleagues on how to evaluate and implement react table alternatives into your project.

Things I Learned That I Didn’t Know Before

How professional development processes are structured
How companies work, grow and evolve
How large projects are organized and maintained
How to communicate and work on a team
What a web-scale application looks like from the inside
And, finally, how to develop high quality React components

Final Reflection

Overall, I feel like spending three months at Sumo Logic was one of the most valuable and educational experiences I’ve ever had. I received a huge return on investment of time and moved much closer to my future goals of gaining relevant software development knowledge and skills to set me up for a successful career post-graduation.

Additional Resources

Want to stay in touch with Sumo Logic? Follow & connect with us on Twitter, LinkedIn and Facebook for updates. If you want to learn more about our machine data analytics platform visit our “how it works” page!

Blog

Black Hat 2018 Buzzwords: What Was Hot in Security This Year?

It’s been a busy security year, with countless twists and turns, mergers, acquisitions and IPOs, and most of that happening in the lead up to one of the biggest security conferences of the year — Black Hat U.S.A. Each year, thousands of hackers, security practitioners, analysts, architects, executives/managers and engineers from varying industries and from all over the country (and world) descend on the desert lands of the Mandalay Bay Resort & Casino in Las Vegas for more than a week of trainings, educational sessions, networking and the good kind of hacking (especially if you stayed behind for DefCon26). Every Black Hat has its own flavor, and this year was no different. So what were some of the “buzzwords” floating around the show floor, sessions and networking areas? The Sumo Logic security team pulled together a list of the hottest, newest, and some old but good, terms that we overheard and observed during our time at Black Hat last week. Read on for more, including a recap of this year’s show trends.

And the Buzzword is…

APT — Short for advanced persistent threat
Metasploit — Provides information about security vulnerabilities and is used in pen testing
Pen Testing (or Pentesting) — Short for penetration testing, used to discover security vulnerabilities
OSINT — Short for open source intelligence technologies
XSS — Short for cross-site scripting, a type of attack commonly launched against web sites to bypass access controls
White Hat — Security slang for an “ethical” hacker
Black Hat — A hacker who violates computer security for little reason beyond maliciousness or personal gain
Red Team — Tests the security program’s (Blue Team’s) effectiveness by using techniques that hackers would use
Blue Team — The defenders against Red Team efforts and real attackers
Purple Team — Responsible for ensuring the maximum effectiveness of both the Red and Blue Teams
Fuzzing or Fuzz Testing — Automated software that feeds invalid, unexpected or random data as inputs to a computer program that is typically expecting structured content, i.e. first name, last name, etc.
Blockchain — Widely used by cryptocurrencies to distribute expanding lists of records (blocks), such as transaction data, which are virtually “chained” together by cryptography. Because of their distributed and encrypted nature, the blocks are resistant to modification of the data.
SOC — Short for security operations center
NOC — Short for network operations center

Black Hat 2018 Themes

There were also some pretty clear themes that bubbled to the top of this year’s show. Let’s dig into them.

The Bigger, the Better…Maybe

Walking the winding labyrinth that is the Mandalay Bay, you might have overheard conference attendees complaining that this year, Black Hat was bigger than in years past, and to accommodate this, the show was more spread out. The business expo hall was divided between two rooms: a bigger “main” show floor (Shoreline), and a second, smaller overflow room (Oceanside), which featured companies new to the security game, startups or those not ready to spend big bucks on flashy booths.
While it may have been a bit confusing or a nuisance for some to switch between halls, the fact that the conference is outgrowing its own space is a good sign that security is an important topic and more organizations are taking a vested interest in it.

Cloud is the Name, Security is the Game

One of the many themes at this year’s show was definitely all things cloud. Scanning the booths, you would have noticed terms around security in the cloud, how to secure the cloud, and similar messaging. Cloud has been around for a while, but seems to be having a moment in security, especially as new, agile cloud-native security players challenge some of the legacy on-premises vendors and security solutions that don’t scale well in a modern cloud, container or serverless environment. In fact, according to recent Sumo Logic research, 93 percent of responding enterprises face challenges with security tools in the cloud, and 49 percent state that existing legacy tools aren’t effective in the cloud.

Roses are Red, Violets are Blue, FUD is Gone, Let’s Converge

One of the biggest criticisms of security vendors (sometimes by other security vendors) is all of the language around fear, uncertainty and doubt (FUD). This year, it seems that many vendors have ditched the fearmongering and opted for collaboration instead. Walking the expo halls, there was a lot of language around “togetherness,” “collaboration” and the general positive sentiment that bringing people together to fight malicious actors is more helpful than going at it alone in siloed work streams. Everything was more blue this year. Usually, you see the typical FUD coloring: reds, oranges, yellows and blacks, and while there was still some of that, the conference felt brighter and more uplifting this year with purples, all shades of blues, bright greens, and surprisingly… pinks!
There was also a ton of signage around converging development, security and operations teams (DevSecOps or SecOps), and messaging, again, that fosters an “in this together” mentality and creates visibility across functions and departments for deeper collaboration. Many vendors, including Sumo Logic, have been focusing on security education, offering and promoting their security training, certification and educational courses to make sure security is a well-understood priority for stakeholders across all lines of the business. Our recent survey findings also validate the appetite for converging workflows, with 54 percent of respondents citing a greater need for cross-team collaboration (DevSecOps) to effectively investigate, prioritize and correlate threats for faster remediation. Three cheers for that!

Sugar and Socks and Everything FREE

Let’s talk swag. Now this trend is not entirely specific to Black Hat, but it seems each year the booth swag gets sweeter (literally), with vendors offering doughnut walls, chocolates, popcorn and all sorts of tasty treats to reel people into conversation (and get those badge scans). There’s no shortage of socks either! Our friends at HackerOne were giving out some serious booth swag, and you better believe we weren’t headed home without grabbing some! Side note: Read the latest HackerOne blog or watch the latest SnapSecChat video to learn how our Sumo Logic security team has taken a DevSecOps approach to bug bounties that creates transparency and collaboration between hackers, developers and external auditors to improve security posture. Sumo swag giveaways were in full swing at our booth, as well. We even raffled off a Vento drone for one lucky Black Hat winner to take home!

Parting Thoughts

As we part ways with 100-degree temps and step back into our neglected cubicles or offices this week, it’s always good to remember the why. Why do we go to Black Hat, DefCon, BSides, and even RSA?
It’s more than socializing and partying; it’s to connect with our community, to learn from each other and to make the world a more secure and better place for ourselves, and for our customers. And with that, we’ll see you next year!

Additional Resources

For the latest Sumo Logic cloud security analytics platform updates, features and capabilities, read the latest press release. Want to learn more about Sumo Logic security analytics and threat investigation capabilities? Visit our security solutions page. Interested in attending our user conference next month, Illuminate? Visit the webpage, or check out our latest “Top Five Reasons to Attend” blog for more information. Download and read our 2018 Global Security Trends in the Cloud report or the infographic for more insights on how the security and threat landscape is evolving in today’s modern IT environment of cloud, applications, containers and serverless computing.

Blog

Top Five Reasons to Attend Illuminate18

Last year Sumo Logic launched its first user conference, Illuminate. We hosted more than 300 fellow Sumo Logic users who spent two days getting certified, interacting with peers to share best practices and mingling with Sumo’s technical experts (all while having fun). The result? Super engaged users with a new toolbox to take back to their teams to make the most of their Sumo Logic platform investment, and get the real-time operational and security insights needed to better manage and secure their modern applications and cloud infrastructures. Watch last year’s highlight reel below. This piece of feedback from one attendee sums up the true value of Illuminate:

“In 48 hours I already have a roadmap of how to maximize the use of Sumo Logic at my company and got a green light from my boss to move forward.” — Sumo Logic Customer / Illuminate Attendee

Power to the People

This year’s theme for Illuminate is “Empowering the People Who Power Modern Business,” and the conference is expected to attract more than 500 attendees who will participate in a unique interactive experience including over 40 sessions, an Ask the Expert bar, a partner showcase and Birds of a Feather roundtables. Not enough to convince you to attend? Here are five more reasons:

Get Certified – Back by popular demand, our multi-level certification program provides users with the knowledge, skills and competencies to harness the power of machine data analytics and maximize investments in the Sumo Logic platform. Bonus: we have a brand new Sumo Security certification available at Illuminate this year, designed to teach users how to increase the velocity and accuracy of threat detection and strengthen overall security posture.
Hear What Your Peers are Doing – Get inspired and learn directly from your peers like Major League Baseball, Genesys, USA TODAY NETWORK, Wag, Lending Tree, Samsung SmartThings, Informatica and more about how they implemented Sumo Logic and are using it to increase productivity, revenue and employee satisfaction, deliver the best customer experiences and more. You can read more about the keynote speaker lineup in our latest press release.

Technical Sessions…Lots of Them – This year we’ve broadened our breakout sessions into multiple tracks, including Monitoring and Troubleshooting, Security Analytics, Customer Experience and Dev Talk, covering tips, tricks and best practices for using Sumo Logic around topics including Kubernetes, DevSecOps, Metrics, Advanced Analytics, Privacy-by-Design and more.

Ask the Experts – Get direct access to expert advice from Sumo Logic’s product and technical teams. Many of these folks will be presenting sessions throughout the event, but we’re also hosting an Ask the Expert bar where you can get all of your questions answered, see demos, get ideas for dashboards and queries, and see the latest Sumo Logic innovations.

Explore the Modern App Ecosystem – Sumo Logic has a rich ecosystem of partners and a powerful set of joint integrations across the modern application stack to enhance overall manageability and security for you. Stop by the Partner Pavilion to see how Sumo Logic works with AWS, Carbon Black, CrowdStrike, JFrog, LightStep, MongoDB, Okta, OneLogin, PagerDuty, Relus and more.

By now you’re totally ready for the Illuminate experience, right? Check out the full conference agenda here. These two days will give you all of the tools you need (training, best practices, new ideas, peer-to-peer networking, access to Sumo’s technical experts and partners) so you can hit the ground running and maximize the value of the Sumo Logic platform for your organization. Register today; we look forward to seeing you there!

Blog

Get Miles Ahead of Security & Compliance Challenges in the Cloud with Sumo Logic

Blog

SnapSecChat: A DevSecOps Approach to Bug Bounties with Sumo Logic & HackerOne

Regardless of industry or size, all organizations need a solid security and vulnerability management plan. One of the best ways to harden your security posture is through penetration testing: inviting hackers to hit your environment to look for weak spots or holes in security. However, for today’s highly regulated, modern SaaS company, the traditional check-box compliance approach to pen testing is failing because it slows them down from innovating and scaling. That’s why Sumo Logic’s Chief Security Officer and his team have partnered with HackerOne to implement a modern bug bounty program that takes a DevSecOps approach. They’ve done this by building a collaborative community for developers, third-party auditors and hackers to interact and share information in an online portal, creating a transparent bug bounty program that uses compliance to strengthen security. By pushing the boundaries and breaking things, it collectively makes us stronger, and it also gives our auditors a look behind the scenes and more confidence in our overall security posture. It also moves the rigid audit process into the DevSecOps workflow for faster and more effective results. To learn more about Sumo Logic’s modern bug bounty program and the benefits and overall positive impact it’s had on not just the security team, but all lines of the business, including external stakeholders like customers, partners and prospects, watch the latest SnapSecChat video with Sumo Logic CSO George Gerchow. And if you want to hear about the results of Sumo Logic’s four bounty challenge sprints, head on over to the HackerOne blog for more. If you enjoyed this video, then be sure to stay tuned for another one coming to a website near you soon! And don’t forget to follow George on Twitter at @GeorgeGerchow, and use the hashtag #SnapSecChat to join the security conversation!
Stop by Sumo Logic’s booth (2009) at Black Hat this week Aug 8-9, 2018 at The Mandalay Bay in Las Vegas to chat with our experts and to learn more about our cloud security analytics and threat investigation capabilities. Happy hacking!

Blog

Building Replicated Stateful Systems using Kafka as a Commit Log

Blog

Employee Spotlight: A Dreamer with a Passion for Product Design & Mentoring

In this Sumo Logic Employee Spotlight we interview Rocio Lopez. A lover of numbers, Rocio graduated from Columbia University with a degree in economics, but certain circumstances forced her to forego a career in investment banking and instead begin freelancing until she found a new career that suited her talents and passions: product design. Intrigued? You should be! Read Rocio’s story below. She was a delight to interview!

When Creativity Calls

Q: So tell me, Rocio, what’s your story?

Rocio Lopez (RL): I am a product designer at Sumo Logic and focus mostly on interaction design and prototyping new ideas that meet our customers’ needs.

Q: Very cool! But that’s not what you went to school for, was it?

RL: No. I studied economics at Columbia. I wanted to be an investment banker. Ever since I was a little girl, I’ve been a nerd about numbers and I love math. Part of it was because I remember when the Peso was devalued and my mom could no longer afford to buy milk. I became obsessed with numbers and this inspired my college decision. But the culture and career path at Columbia was clear — you either went into consulting or investment banking. I spent a summer shadowing at Citigroup (this was during the height of the financial crisis), and although my passion was there, I had to turn down a career in finance because I was here undocumented.

Q: That’s tough. So what did you do instead?

RL: When I graduated in 2011, I started doing the things I knew how to do well, like using Adobe Photoshop and InDesign to do marketing for a real estate company, or even doing telemarketing. I eventually landed a gig designing a database for a company called Keller Williams. They hired an engineer to code the database, but there was no designer around to think through the customer experience, so I jumped in.

Q: So that’s the job that got you interested in product design?

RL: Yes.
And then I spent a few years at Cisco in the marketing organization, where they needed help revamping their training platforms. I started doing product design without even knowing what it was until a lead engineer called it out. I continued doing small design projects, started freelancing and exploring on my own until I connected with my current manager, Daniel Castro. He was hiring for a senior role, and while I was not that senior, the culture of the team drew me in.

Q: Can you expand on that?

RL: Sure. The design team at Sumo Logic is very unique. I’ve spent about seven years total in the industry and what I’ve been most impressed by is the design culture here, and the level of trust and level-headedness the team has. I’ve never come across this before. You would think that because we’re designing an enterprise product everyone would be very serious and buckled up, but it’s the opposite.

The Life of a Dreamer

Q: Let’s switch gears here. I heard you on NPR one morning, before I even started working at Sumo Logic. Tell me about being a dreamer.

RL: People come to the U.S. undocumented because they don’t know of other ways to come legally, or the available paths for a visa aren’t a match for them because they may not have the right skills. And those people bring their families. I fell into that category. I was born in Mexico but my parents came over to the U.S. seeking a better life after the Tequila crisis. I grew up in Silicon Valley and went to school like any other American kid. When Barack Obama was in office, he created an executive order known as the Deferred Action for Childhood Arrivals (DACA) program, since Congress had failed to pass legislative action since 2001. To qualify for the program, applicants had to have arrived in the U.S. before age 16, have lived here continuously since June 15, 2007, and pass a rigorous background check by homeland security every two years. I fell into this category and was able to register in the program.
Because most of the immigrants are young children who were brought here at a very young age, we’ve sort of been nicknamed “dreamers” after the 2001 DREAM Act (short for Development, Relief and Education for Alien Minors Act).

Q: And under DACA you’ve been able to apply for a work permit?

RL: That’s right. I have a work permit, I pay income taxes, and I was able to attend college just like a U.S. citizen, although I am still considered undocumented and that comes with certain limitations. For instance, my employer cannot sponsor me and I cannot travel outside the United States. The hope was that Congress would create a path to citizenship for Dreamers, but now that future is a bit uncertain after they failed to meet the deadline to pass a bill in March. For now I have to wait until the Supreme Court rules on the constitutionality of DACA to figure out my future plans.

Q: I can only imagine how difficult this is to live with. What’s helped you through it?

RL: At first I was a big advocate, but now I try to block it out and live in the present moment. And the opportunity to join the Sumo Logic design team came at the right time in my life. I can’t believe what I do every day is considered work. The team has a very unique way of nurturing talent and it’s something I wish more companies would do. Our team leaders make sure we have fun in addition to getting our work done. We usually do team challenges, dress-up days, etc. that really bring us all together, make us feel comfortable, encourage continued growth, and inspire us to speak up with new ideas. I feel like the work I am doing has value and is meaningful, and we are at the positive end of the “data conversation.” I read the news and see the conversations taking place with companies like Facebook and Airbnb that are collecting our personal data. It’s scary to think about.
And it feels good to be on the other side of the conversation, on the good side of data, and that’s what gets me excited and motivated. Sumo Logic is collecting data and encrypting it, and because we’re not on the consumer-facing side, we can control the lens through which people see that data. We can control not only the way our customers collect data but also how they parse and visualize it. I feel we’re at the cusp of a big industry topic that’s going to break in the next few years.

Q: I take it you’re not on social media?

RL: No. I am completely off Facebook and other social media platforms. When I joined Sumo Logic, I became more cautious of who I was giving my personal data to.

Advice for Breaking into Design & Tech?

Q: Good for you! So what advice do you have for people thinking of switching careers?

RL: From 2011 to now I’ve gone through big career changes. There are a lot of people out there who need to understand how the market is shifting, that some industries, like manufacturing, are not coming back, and that this requires an adaptive mindset. The money and opportunity are where technology and data are, and if people can’t transition to these new careers in some capacity, they’re going to be left out of the economy and will continue to have problems adjusting. It’s a harsh reality, but we have to be able to make these transitions because 15 or 20 years from now, the world will look very different. I’ve been very active in mentoring people who want to break into technology but aren’t sure how.

Q: What’s some of the specific advice related to a career path in UX/design that you give your mentees?

RL: Sometimes you have to break away from traditions like school or doing a master’s program and prioritize the job experience. Design and engineering are about showing you’ve done something, showing a portfolio. If you can change your mindset to this, you will be able to make the transition more smoothly.
I also want to reiterate that as people are looking for jobs or next careers, it’s important to find a place that is fun and exciting. A place where you feel comfortable and can be yourself, and also continue to grow and learn. Find meaning, find value, and find the good weird that makes you successful AND happy.

Stay in Touch

Stay in touch with Sumo Logic & connect with us on Twitter, LinkedIn and Facebook for updates. Want to work here? We’re hiring! Check out our careers page to join the team. If you want to learn more about our machine data analytics platform visit our “how it works” page!

August 1, 2018

Blog

Postmortems Considered Beautiful

Outages and postmortems are a fact of life for any software engineer responsible for managing a complex system. And it can safely be said that those two words, “outage” and “postmortem,” do not carry any positive connotations in the remotest sense. In fact, they are generally dreaded by most engineers. While that sentiment is understandable given the direct impact of such incidents on customers and the accompanying disruption, our individual perspective matters a lot here as well. If we are able to look beyond the damage caused by such incidents, we might just realize that outages and postmortems shouldn’t be “dreaded,” but instead wholeheartedly embraced. One has only to try, and the negative vibes associated with these incidents may quickly give way to an appreciation of the complexity in modern big data systems.

The Accidental Harmony of Layered Failures

As cliche as it may sound, “beauty” indeed lies in the eyes of the beholder. And one of the most beautiful things about an outage/postmortem is the spectacular way in which modern big data applications often blow up. When they fail, there are often dozens of things that fail simultaneously, all of which collude, resulting in an outage. This accidental harmony among failures, and the dissonance among the guards and defenses put in place by engineers, is a constant feature of such incidents and is always something to marvel at. It’s almost as if the resonance frequencies of various failure conditions match, thereby amplifying the overall impact. What’s even more surprising is the way in which failures at multiple layers can collude. For example, it might so happen that an outage-inducing bug is missed by unit tests due to missing test cases, or even worse, a bug in the tests! Integration tests in staging environments may have again failed to catch the bug, either due to a missing test case or a disparity in the workload/configuration of staging and production environments.
There could also be misses in monitoring/alerting, resulting in increased MTTIs (mean times to identify). Similarly, there may be avoidable process gaps in the outage handling procedure itself. For example, some on-calls may have too high an escalation timeout for pages, or may have failed to update their phone numbers in the pager service when traveling abroad (yup, that happens too!). Sometimes the tests are perfect, and they even catch the error in staging environments, but due to a lack of communication among teams, the buggy version accidentally gets promoted to production.

Outages are Like Deterministic Chaos

In some sense, these outages can also be compared to “deterministic chaos” caused by an otherwise harmless trigger that manages to pierce through multiple levels of defenses. To top it off, there are always people involved at some level in managing such systems, so the possibility of a mundane human error is never too far away. All in all, every single outage can be considered a potential case study of cascading failures and their layered harmony.

An Intellectual Journey

Another deeply satisfying aspect of an outage/postmortem is the intellectual journey from “how did that happen?” to “that happened exactly because of X, Y and Z.” Even at the system level, it’s necessary to disentangle the various interactions and hidden dependencies, discover unstated assumptions and dig through multiple layers of “why’s” to make sense of it all. When properly done, root cause analysis for outages of even moderately complex systems demands a certain level of tenacity and perseverance, and the fruits of such labor can be a worthwhile pursuit in and of itself. There is a certain joy in putting the pieces of a puzzle together, and outages/postmortems present us with exactly that opportunity. Besides these intangibles, outages and their subsequent postmortems have other very tangible benefits.
They not only help develop operational knowledge, but also provide a focused path (within the scope of the outage) to learn the nitty-gritty details of the system. At the managerial level, too, they can act as road signs for course correction and help get priorities right. Of course, none of the above is an excuse to have more outages and postmortems! We should always strive to build reliable, fault-tolerant systems to minimize such incidents, but when they do happen, we should take them in stride and try to appreciate the complexity of the software systems all around us. Love thy outages. Love thy postmortems.

Stay in Touch

Want to stay in touch with Sumo Logic? Follow & connect with us on Twitter, LinkedIn and Facebook for updates. Visit our website to learn more about our machine data analytics platform and be sure to check back on the blog for more posts like this one if you enjoyed what you read!

Blog

11 New Google Cloud Platform (GCP) Apps for Continued Multi-Cloud Support

Blog

Sumo Smash Bros: What Creating a Video Game Taught Us About the Power of Data

As a longtime DevOps engineer with a passion for gaming and creating things, I truly believe that in order to present data correctly, you must first understand the utility of a tool without getting hung up on the output (data). To understand why this matters, I’ll use Sumo Logic’s machine data analytics platform as an example. With a better understanding of how our platform works, you’ll be able to turn seemingly disparate data into valuable security, operational or business insights that directly serve your organization’s specific needs and goals.

The Beginning of Something Great

Last year, I was sitting with some colleagues at lunch and suggested that it would be super cool to have a video game at our trade show booth. We all agreed it was a great idea, and what started as a personal at-home project turned into a journey to extract game data and present it in a compelling and instructive manner. The following is how this simple idea unfolded over time and what we learned as a result.

Super Smash Bros Meets Sumo Logic

The overall idea was solid. However, after looking at emulators and doing hours of research (outside of office hours), I concluded that it was a lot harder to extract data from an old school arcade game, even when working with an emulator. My only path forward would be to use a cheat engine to read memory addresses, and all the work would be done in Assembly, a low-level programming language. It’s so retro that the documentation was nearly impossible to find, and I again found myself at an impasse. Another colleague of mine, a board game aficionado, suggested I find an open source game online to which I could add my own code in order to extract data. Before I started my search, I set some parameters. What I was looking for was a game that had the following characteristics:
- It should be multiplayer
- It would ideally produce different types of data
- It would manifest multiple win conditions: game and social

Enter Super Smash Bros (SSB), which met all of the above criteria. If you are not familiar with this game, it’s originally owned/produced by Nintendo, and the appeal is that you and up to three other players battle each other in “King of the Hill” style until there is an “official” game winner. It helps to damage your opponent first before throwing them off the hill. The game win condition is that whoever has the most lives when the game ends, wins; and the game ends when either time runs out or only one player has lives left. However, this leaves holes for friends to argue about who actually won. If you’ve ever played this game (which is one of strategy), there is a second kind of condition — a social win condition. You can “officially” win by the game rules, but there’s context attached to “how” you won — a social win.

Creating Sumo Smash Bros

I found an open source clone of Super Smash Bros written in Javascript which runs entirely in a web browser. It was perfect. Javascript is a simple language, and with the help of a friend to get started, we made it possible to group log messages that would go to the console, where a developer could access them, and then send them directly into the Sumo Logic platform.

PRO TIP: If you want game controllers for an online video game like Super Smash Bros, use Xbox controllers, not Nintendo!

We would record certain actions in the code, such as:

- When a player’s animation changed
- What move a player performed
- Who hit who, when and for how much
- What each player’s lives were

For example, an animation change would be whenever a player was punched by an opponent.
Now, by the game standards, very limited data determines who is the “official” winner of the game based on the predetermined rules, but with this stream of data now flowing into Sumo Logic, we could also identify the contextual “social win” and determine if and how the two differed. Here’s an example of a “social” win condition: Imagine there’s a group of four playing the game, and one of the players (player 1) hangs back avoiding brawls until two of the three opponents are out of lives. Player 1 then jumps into action, gets a lucky punch on the other remaining player who has thus far dominated (and who is heavily damaged) and throws him from the ring to take the “official” game win.

Testing the Theory

When we actually played, the data showed exactly what I had predicted. First, some quick background on my opponents:

Jason E. (AKA Jiggles) — He admits to having spent a good portion of his youth playing SSB, and he may have actually played in tournaments.

Michael H. (AKA Killer) — He’s my partner in crime. We’re constantly cooking up crazy ideas to try out, both in and outside of work. He also had plenty of experience with the game.

Mikhail M. (AKA MM$$DOLLAB) — He has always been a big talker. He too played a lot, and talked a big talk.

Originally, I had intended for us to pseudo-choreograph the game to get the data to come out “how I wanted” in order to show that while the game awarded the “winner” title to one player, the “actual winner” would be awarded by the friends to the player who “did the most damage to others” or some other parameter. It only took about three nanoseconds before the plan was out the window and we were fighting for the top. Our colleague Jason got the clear technical game win. But we had recorded the game and had the additional streams of data, and when the dust had settled, a very different story emerged.
For instance, Jason came in third place for our social win parameter of “damage dealt.” Watching the recording, it’s clear that Jason’s strategy was to avoid fighting until the end. When brawls happened, he was actively jumping around but rarely engaged with the other players. He instead waited for singled-out attacks. Smart, right? Maybe. We did give him the “game win”; however, based on the “damage dealt” social win rule, the order was: Michael, myself, then Jason, and Mikhail. Watch what happened for yourself:

What’s the Bigger Picture?

While this was a fun experiment, there’s also an important takeaway. At Sumo Logic, we ingest more than 100 terabytes of data each day — that’s the equivalent of data from about 200 Libraries of Congress per second. That data comes from all over — it’s a mix of log, event, metrics and security data coming not just from within an organization’s applications and infrastructure, but also from third-party vendors. When you have more information, you can see trends and patterns, make inferences, and make technical and business decisions — you gain an entirely new level of understanding beyond the 1s and 0s staring back at you on a computer screen. People also appreciate the data for different reasons. For example, engineers only care that the web page they served you is the exact page you clicked on. They don’t care whether you searched for hats or dog food or sunscreen. But marketers care, a lot. Marketers care about your buying decisions and patterns, and they use that to inform strong, effective digital marketing campaigns that serve you relevant content. At Sumo Logic, we don’t want our customers or prospects to get hung up on the data; we want them to look past that to first understand what our tool does, and how it can help them get the specific data they need to solve a unique problem or use case.
“In the words of Sherlock Holmes, it’s a capital mistake to theorize before one has data.” — Kenneth Barry, Sumo Logic

The types of data you are ingesting and analyzing only matter if you first understand your end goal and have the proper tools in place — a means to an end. From there, you can extract and make sense of the data in ways that matter to your business, and each use case varies from one customer to another. Data powers our modern businesses, and at Sumo Logic, we empower those who use this data. And we make sure to have fun along the way!

Bonus: Behind the Scenes Video Q&A with Kenneth

Additional Resources

- Visit our website to learn more about the power of machine data analytics and download Sumo Logic for free to try it out for yourself
- Read our 2018 State of Modern Applications in the Cloud report
- Register to attend Illuminate, our annual user conference taking place Sept. 12-13, 2018 in Burlingame, Calif.

Blog

A Primer on Building a Monitoring Strategy for Amazon RDS

In a previous blog post, we talked about Amazon Relational Database Service (RDS). RDS is one of the most popular cloud-based database services today, extensively used by Amazon Web Services (AWS) customers for its ease of use, cost-effectiveness and simple administration. Although as a managed service RDS doesn’t require database administrators (DBAs) to do many of the day-to-day tasks, it still needs to be monitored for performance and availability. That’s because Amazon doesn’t auto-tune any database performance — this is a shared responsibility of the customer. That’s why there should be a monitoring strategy and processes in place for DBAs and operations teams to keep an eye on their RDS fleet. In this blog post, we will talk about an overall best-practice approach for doing this.

Why Database Monitoring

Keeping a database monitoring regimen in place, no matter how simple, can help address potential issues proactively before they become incidents and cost additional time and money. Most AWS infrastructure teams typically have decent monitoring in place for different types of resources like EC2, ELB, Auto Scaling Groups, Logs, etc. Database monitoring often comes at a later stage or is ignored altogether. With RDS, it’s also easy to overlook due to the low-administration nature of the service. The DBA or the infrastructure managers should therefore invest some time in formulating and implementing a database monitoring policy. Please note that designing an overall monitoring strategy is an involved process and is not just about defining database counters to monitor. It also includes areas like:

- Service Level Agreements
- Classifying incident types (Critical, Serious, Moderate, Low, etc.)
- Creating a RACI (Responsible, Accountable, Consulted, Informed) matrix
- Defining escalation paths

A detailed discussion of all these topics is beyond the scope of this article, so we will concentrate on the technical part only.
What to Monitor

Database monitoring, or RDS monitoring in this case, is not about monitoring only database performance. A monitoring strategy should include the following broad categories and their components:

- Availability: Is the RDS instance or cluster endpoint accessible from client tools? Is any instance stopping, starting, failing over or being deleted? Is there a failover of multi-AZ instances?
- Recoverability: Is the RDS instance being backed up, both automatically and manually? Are individual databases being backed up successfully?
- Health and Performance: What’s the CPU, memory and disk space currently in use? What’s the query latency? What’s the disk read/write latency? What’s the disk queue length? How many database connections are active? Are there any blocking and waiting tasks? Are there any errors or warnings reported in database log files? Are these related to application queries, or to non-optimal configuration values? Are any of the scheduled jobs failing?
- Manageability: Are there any changes in the RDS instances’ tags, security groups, instance properties, or parameter and option groups? Who made those changes and when?
- Security: Which users are connecting to the database instance? What queries are they running?
- Cost: How much is each RDS instance costing every month?

While many of these things can be monitored directly in AWS, Sumo Logic can greatly help with understanding all of the logs and metrics that RDS produces. In this article, we will talk about what AWS offers for monitoring RDS. As we go along, we will point out where we think Sumo Logic can make the work easier.

Monitoring with Amazon CloudWatch

You can start monitoring RDS using metrics from Amazon CloudWatch. Amazon RDS, like any other AWS service, exposes a number of metrics which are available through CloudWatch.
There are three ways to access these metrics:

- From the AWS Console
- Using the AWS CLI
- Using REST APIs

The image below shows some of these metrics from the RDS console. Amazon CloudWatch shows two types of RDS metrics: built-in metrics and enhanced monitoring metrics.

Built-in Metrics

These metrics are available from any RDS instance. They are collected from the hypervisor of the host running the RDS virtual machine. Some of the metrics may not be available for all database engines, but the important ones are common. We recommend monitoring the following RDS metrics from CloudWatch:

- CPUUtilization: The % CPU load in the RDS instance. A consistently high value means one or more processes are waiting for CPU time while one or more processes are blocking it.
- DiskQueueDepth: The number of input and output requests waiting for the disk resource. A consistently high value means disk resource contention – perhaps due to locking, long running update queries, etc.
- DatabaseConnections: The number of database connections against the RDS instance. A sudden spike should be investigated immediately. It may not mean a DDoS attack, but it could be an issue with the application generating multiple connections per request.
- FreeableMemory: The amount of RAM available in the RDS instance, expressed in bytes. A very low value means the instance is under memory pressure.
- FreeStorageSpace: The amount of disk storage available, in bytes. A small value means disk space is running out.
- ReadIOPS: The average number of disk read operations per second. Should be monitored for sudden spikes, which can mean runaway queries.
- WriteIOPS: The average number of disk write operations per second. Should be monitored for sudden spikes, which can mean a very large data modification.
- ReadLatency: The average time in milliseconds to perform a read operation from the disk. A higher value may mean a slow disk operation, probably caused by locking.
- WriteLatency: The average time in milliseconds to perform a write operation to disk. A higher value may mean disk contention.
- ReplicaLag: How far in time the read replica of a MySQL, MariaDB or PostgreSQL instance is lagging behind its master. A high lag value can mean that read operations from the replica are not serving current data.

The Amazon RDS Aurora engine also exposes some extra counters which are really useful for troubleshooting. At the time of writing, Aurora supports MySQL and PostgreSQL only. We recommend monitoring these counters:

- DDLLatency: The average time in milliseconds to complete Data Definition Language (DDL) commands like CREATE, DROP, ALTER, etc. A high value means the database is having performance issues running DDL commands. This can be due to exclusive locks on objects.
- SelectLatency: The average time in milliseconds to complete SELECT queries. A high value may mean disk contention, poorly written queries, missing indexes, etc.
- InsertLatency: The average time in milliseconds to complete INSERT commands. A high value may mean locking or a poorly written INSERT command.
- DeleteLatency: The average time in milliseconds to complete DELETE commands. A high value may mean locking or a poorly written DELETE command.
- UpdateLatency: The average time in milliseconds to complete UPDATE commands. A high value may mean locking or a poorly written UPDATE command.
- Deadlocks: The average number of deadlocks happening per second in the database. More than 0 should be a concern – it means the application queries are running in such a way that they frequently block each other.
- BufferCacheHitRatio: The percentage of queries that can be served by data already stored in memory. It should be a high value, near 100, meaning queries don’t have to access the disk to fetch data.
- Queries: The average number of queries executed per second. This should have a steady, average value; any sudden spike or dip should be investigated.
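The metrics above lend themselves to simple alarm-style checks once the datapoints are in hand. Below is a minimal Python sketch that evaluates CloudWatch-shaped datapoints against example thresholds. The `check_rds_metrics` helper and all threshold values are illustrative assumptions for this post, not AWS recommendations or defaults:

```python
# Hypothetical alarm-style checks over CloudWatch-shaped datapoints.
# All thresholds below are illustrative examples only.

THRESHOLDS = {
    "CPUUtilization":   lambda v: v > 80.0,            # percent
    "FreeStorageSpace": lambda v: v < 5 * 1024**3,     # bytes (< 5 GiB)
    "FreeableMemory":   lambda v: v < 256 * 1024**2,   # bytes (< 256 MiB)
    "DiskQueueDepth":   lambda v: v > 10,              # outstanding I/Os
    "ReplicaLag":       lambda v: v > 60,              # seconds
}

def check_rds_metrics(datapoints):
    """datapoints: list of {"MetricName": ..., "Average": ...} dicts,
    shaped like CloudWatch GetMetricStatistics output. Returns the
    names of metrics whose average breaches its example threshold."""
    breaches = []
    for dp in datapoints:
        check = THRESHOLDS.get(dp["MetricName"])
        if check and check(dp["Average"]):
            breaches.append(dp["MetricName"])
    return breaches

sample = [
    {"MetricName": "CPUUtilization", "Average": 92.5},
    {"MetricName": "FreeStorageSpace", "Average": 50 * 1024**3},
    {"MetricName": "DiskQueueDepth", "Average": 14},
]
print(check_rds_metrics(sample))  # ['CPUUtilization', 'DiskQueueDepth']
```

In practice you would feed this from the CloudWatch API or stream the same metrics into a platform like Sumo Logic, and tune the thresholds to your own workload's baseline.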
You can use the AWS documentation for a complete list of built-in RDS metrics.

Enhanced Monitoring Metrics

RDS also exposes “enhanced monitoring metrics.” These are collected by agents running on the RDS instances’ operating system. Enhanced monitoring can be enabled when an instance is first created, or it can be enabled later. We recommend enabling it because it offers a better view of the database engine. Like built-in metrics, enhanced metrics are available from the RDS console. Unlike built-in metrics, though, enhanced metrics are not readily accessible from the CloudWatch Metrics console. When enhanced monitoring is enabled, CloudWatch creates a log group called RDSOSMetrics in CloudWatch Logs. Under this log group, there will be a log stream for each RDS instance with enhanced monitoring. Each log stream will contain a series of JSON documents as records. Each JSON document will show a series of metrics collected at regular intervals (by default, every minute). Here is a sample excerpt from one such JSON document:

{
  "engine": "Aurora",
  "instanceID": "prodataskills-mariadb",
  "instanceResourceID": "db-W4JYUYWNNIV7T2NDKTV6WJSIXU",
  "timestamp": "2018-06-23T11:50:27Z",
  "version": 1,
  "uptime": "2 days, 1:31:19",
  "numVCPUs": 2,
  "cpuUtilization": {
    "guest": 0,
    "irq": 0.01,
    "system": 1.72,
    "wait": 0.27,
    "idle": 95.88,
    "user": 1.91,
    "total": 4.11,
    "steal": 0.2,
    "nice": 0
  },
  …
}

It’s possible to create custom CloudWatch metrics from these logs and view those metrics from the CloudWatch console, although this requires some extra work. However, both built-in and enhanced metrics can be streamed to Sumo Logic, from where you can build your own charts and alarms. Regardless of platform, we recommend monitoring the enhanced metrics for a more complete view of the RDS database engine.
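Because each RDSOSMetrics record is plain JSON, it is straightforward to process outside of CloudWatch. The standard-library Python sketch below parses a record shaped like the sample excerpt above and summarizes CPU usage; the field names follow that sample, and the 90% idle floor is an arbitrary illustration, not an AWS setting:

```python
import json

# Parse one enhanced-monitoring record (shaped like the RDSOSMetrics
# excerpt above) and summarize CPU usage. The record and the idle
# threshold are illustrative.
record = json.loads("""
{
  "engine": "Aurora",
  "instanceID": "prodataskills-mariadb",
  "timestamp": "2018-06-23T11:50:27Z",
  "numVCPUs": 2,
  "cpuUtilization": {
    "system": 1.72, "wait": 0.27, "idle": 95.88,
    "user": 1.91, "steal": 0.2, "nice": 0
  }
}
""")

def summarize_cpu(rec, idle_floor=90.0):
    """Return busy CPU % and a pressure flag for one JSON record."""
    cpu = rec["cpuUtilization"]
    return {
        "instance": rec["instanceID"],
        "busy_pct": round(100.0 - cpu["idle"], 2),
        "under_pressure": cpu["idle"] < idle_floor,
    }

print(summarize_cpu(record))
# {'instance': 'prodataskills-mariadb', 'busy_pct': 4.12, 'under_pressure': False}
```

The same approach scales to a log stream: fetch the records (for example, via the CloudWatch Logs API or a log shipper), parse each one, and alert on the fields you care about.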
The following counters should be monitored for Amazon Aurora, MySQL, MariaDB, PostgreSQL or Oracle:

- cpuUtilization (user): The % of CPU used by user processes.

AWS

July 17, 2018

Blog

Comparing AWS Data Warehouse & Analytics Services — Migrating to AWS Part 3

AWS Data Warehouse and Analytics Services

In this final article of our three-part blog series, we will introduce you to two popular data services from Amazon Web Services (AWS): Redshift and Elastic MapReduce (EMR). These services are ideal for AWS customers who need to store large volumes of structured, semi-structured or unstructured data and query them quickly.

Amazon Redshift

Amazon Redshift is a fully-managed data warehouse platform from AWS. Customers can store large volumes of structured, relational datasets in Redshift tables and run analytical workloads on those tables. This can be an ideal solution for processing and summarizing high-volume sales, clickstream or other large datasets. Although you can create data warehouses in RDS, Redshift would be a better choice for the following reasons:

Amazon Redshift was created as a Massively Parallel Processing (MPP) data warehouse from the ground up. This means data is distributed to more than one node in a Redshift cluster (although you can create one-node clusters too). Redshift uses the combined power of all the computers in a cluster to process this data in a fast and efficient manner. A Redshift cluster can be scaled up from a few gigabytes to more than a petabyte. That’s not possible with RDS. With RDS, you can create a single, large instance with multi-AZ deployment and one or more read replicas. The read replicas can help increase read performance, and the multi-AZ secondary node will keep the database online during failovers, but the actual data processing still happens in one node only. With Redshift, it’s not uncommon to see 50- to 100-node clusters, with all nodes taking part in data storage and processing.

The storage space in Redshift can be used more efficiently than in RDS with suitable column encoding and data distribution styles. With proper column encoding and data distribution, Redshift can squeeze large amounts of data into fewer data pages, thereby dramatically reducing table sizes.
Also, a Redshift data page is 2 MB, compared to the typical 8 KB of a relational database. This also helps store larger amounts of data per page and increases read performance. Amazon Redshift offers a number of ways to monitor cluster and query performance. It’s simple to see each individual running query and its query plan from the Redshift console. It’s also very easy to see how much resource a running query is consuming. This feature is not readily available in RDS yet. Finally, Redshift offers a way to prioritize different types of analytic workloads in the cluster. This allows specific types of data operations to have more priority than others, and ensures that no single query or data load brings down the entire system. This prioritization is made possible with the Workload Management (WLM) configuration. With WLM, administrators can assign groups of similar queries to different workload queues. Each queue is then assigned a portion of the cluster’s resources. When a query running in a queue uses up all its resources or reaches the concurrency limit, it must wait. Meanwhile, unblocked queries in other queues can still run.

Use Cases

- Data warehouse hosting very large amounts of data
- Part of an enterprise data lake

Elastic MapReduce (EMR)

Amazon Elastic MapReduce (EMR) is AWS’ managed Hadoop environment in the cloud. We have already seen some managed systems like RDS, DynamoDB and Redshift, and EMR is no different. Like RDS, customers can spin up Apache Hadoop clusters in EMR by selecting a few options in a series of wizard-like screens. Anyone with experience manually installing a multi-node Hadoop cluster would appreciate the time and effort it takes to install all the prerequisites and the core software, add any additional components and finalize the configuration. With EMR, all this is done behind the scenes, so users don’t need to worry.
EMR also has the ability to make its clusters “transient.” This means an EMR cluster doesn’t have to run when it’s not needed. A cluster can be spun up, made to process data in one or more series of “steps” and then spun down. The results of the processing can be written to S3 for later consumption. Traditional Hadoop installations are quite monolithic in nature, with sometimes hundreds of nodes sitting idle when no jobs are running. With EMR, this waste can be minimized. Finally, EMR adds a new type of file system for Hadoop: the EMR File System (EMRFS). EMRFS extends Amazon S3 as the file system for the Hadoop cluster. With EMRFS, data in a cluster is not lost when the cluster is terminated.

Use Cases

- Any processing workload requiring a Hadoop back-end (e.g. Hive, HBase, Pig, Sqoop, etc.)
- Enterprise data lakes

Conclusion

In this three-part blog series, we gave a brief introduction to some of the most commonly used AWS services. The storage, database and analytics services have evolved over time and have become more robust and scalable as customers have tested them with a multitude of use cases. The following is a “cheat sheet” of the various AWS technologies, their core functions and where you would implement each. Take a look:

- Simple Storage Service: A highly available and durable file system. Use it for static website hosting, backup locations, log file storage, and data pipeline sources and destinations.
- Amazon Glacier: A file system for long-term data storage. Use it for backup archival and critical file archiving.
- Relational Database Service: A fully managed database service for Oracle, Microsoft SQL Server, Aurora, PostgreSQL, MySQL, MariaDB, etc. Use it for website backends for content management systems and enterprise application backends.
- DynamoDB: A fully managed NoSQL database. Use it for user preferences, clickstream data, games and IoT data.
- Elastic Compute Cloud with Elastic Block Storage: A virtual host with attached storage. Use it for hosting databases of any kind.
- Elastic Compute Cloud with Elastic File System: A virtual host with a mounted file system. Use it for data analytics and media servers.
- Amazon Redshift: A petabyte-scale data warehouse. Use it as an enterprise data warehouse or part of a data lake.
- Amazon ElastiCache: A high performance in-memory database (Redis or memcached). Use it for mobile games and IoT applications.
- Elastic MapReduce: A fully managed Hadoop environment. Use it for any application that requires a Hadoop back-end, an Apache Spark backend, or as part of a data lake.

There are also a number of auxiliary services that work as “glue” between these primary services. These auxiliary services include Amazon Data Migration Service (DMS), ElasticSearch, Data Pipeline, AWS Glue, Athena, Kinesis and Lambda. Using these tools, customers can build complex data pipelines with relative ease. These tools are also serverless, which means they can scale up or down automatically as needed. Also, please note that we have not provided any pricing details for any of the services we discussed, nor did we talk about EC2 or RDS instance classes or their capacities. That’s because pricing varies over time and differs between regions, and Amazon brings out new classes of servers at regular intervals.

Additional Resources

- Comparing AWS S3 and Glacier Data Storage Services – Migrating to AWS Part 1
- Comparing RDS, DynamoDB & Other Popular Database Services – Migrating to AWS Part 2
- AWS 101: An Overview of Amazon Web Services

AWS

July 12, 2018

Blog

What is Blockchain, Anyway? And What Are the Biggest Use Cases?

Everyone’s talking about blockchain these days. In fact, there is so much hype about blockchains — and there are so many grand ideas related to them — that it’s hard not to wonder whether everyone who is excited about blockchains understands what a blockchain actually is. If, amidst all this blockchain hype, you’re asking yourself “what is blockchain, anyway?” then this article is for you. It defines what blockchain is and explains what it can and can’t do.

Blockchain Is a Database Architecture

In the most basic sense, blockchain is a particular database architecture. In other words, like any other type of database architecture (relational databases, NoSQL and the like), a blockchain is a way to structure and store digital information. (The caveat to note here is that some blockchains now make it possible to distribute compute resources in addition to data. For more on that, see below.)

What Makes Blockchain Special?

If blockchain is just another type of database, why are people so excited about it? The reason is that a blockchain has special features that other types of database architectures lack. They include:

- Maximum data distribution. On a blockchain, data is distributed across hundreds of thousands of nodes. While other types of databases are sometimes deployed using clusters of multiple servers, this is not a strict requirement. A blockchain by definition involves a widely distributed network of nodes for hosting data.
- Decentralization. Each of the nodes on a blockchain is controlled by a separate party. As a result, the blockchain database as a whole is decentralized. No single person or group controls it, and no single group or person can modify it. Instead, changes to the data require network consensus.
- Immutability. In most cases, the protocols that define how you can read and write data to a blockchain make it impossible to erase or modify data once it has been written. As a result, data stored on a blockchain is immutable.
You can add data, but you can’t change what already exists. (We should note that while data immutability is a feature of the major blockchains created to date, it’s not strictly the case that blockchain data is always immutable.)

Beyond Data

As blockchains have evolved over the past few years, some blockchain architectures have grown to include more than a way to distribute data across a decentralized network. They also make it possible to share compute resources. The Ethereum blockchain does this, for example, although Bitcoin — the first and best-known blockchain — was designed only for recording data, not sharing compute resources. If your blockchain provides access to compute resources as well as data, it becomes possible to execute code directly on the blockchain. In that case, the blockchain starts to look more like a decentralized computer than just a decentralized database.

Blockchains and Smart Contracts

Another buzzword that comes up frequently when discussing what defines a blockchain is the smart contract. A smart contract is code that causes a specific action to happen automatically when a certain condition is met. The code is executed on the blockchain, and the results are recorded there. This may not sound very innovative, but there are some key benefits and use cases. Any application could incorporate code that makes a certain outcome conditional upon a certain circumstance; if-this-then-that code stanzas are not really a big deal. What makes a smart contract different from a typical software conditional statement, however, is that because the smart contract is executed on a decentralized network of computers, no one can modify its outcomes. This feature differentiates smart contracts from conditional statements in traditional applications, where the application is controlled by a single, central authority that has the power to modify it. Smart contracts are useful for governing things like payment transactions.
If you want to ensure that a seller does not receive payment for an item until the buyer receives the item, you could write a smart contract to make that happen automatically, without relying on third-party oversight.

Limitations of Blockchains

By enabling complete data decentralization and smart contracts, blockchains make it possible to do a lot of interesting things that you could not do with traditional infrastructure. However, it's important to note that blockchains are not magic. Most blockchains currently have several notable limitations:

Transactions are not instantaneous. Bitcoin transactions, for example, can take minutes or longer to complete.

Access control is complicated. On most blockchains, all data is publicly accessible. There are ways to limit access, but they are complex. In general, a blockchain is not a good solution if you require sophisticated access control for your data.

Security is not guaranteed. While blockchain is considered a secure way to transact and to store and send sensitive data, there have been several blockchain-related security breaches. Decentralization and encryption do provide an inherent layer of protection, but, like most things, they do not guarantee that a blockchain can't be hacked or exploited.

Additional Resources

Watch the latest SnapSecChat videos to hear what our CSO, George Gerchow, has to say about data privacy and the demand for security as a service. Read a blog on new Sumo Logic research that reveals why a new approach to security in the cloud is required for today's modern businesses. Learn what three security dragons organizations must slay to achieve threat discovery and investigation in the cloud.
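To make the immutability property concrete, here is a minimal, illustrative sketch of the append-only, hash-linked structure a blockchain uses. This is a toy data structure for explanation only, not any real blockchain implementation:

```python
import hashlib
import json

def block_hash(data, prev_hash):
    # The hash covers both the block's data and the previous block's hash,
    # which is what links the blocks into a chain.
    payload = json.dumps({"data": data, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def make_block(data, prev_hash):
    return {"data": data, "prev": prev_hash, "hash": block_hash(data, prev_hash)}

def verify(chain):
    # Recompute every hash: editing any earlier block breaks all later links.
    for prev, block in zip(chain, chain[1:]):
        if block["prev"] != block_hash(prev["data"], prev["prev"]):
            return False
    return True

genesis = make_block("genesis", "0" * 64)
chain = [genesis, make_block("alice pays bob", genesis["hash"])]
```

Editing any earlier block changes its recomputed hash, so every later block's "prev" link no longer matches and verification fails. On a real blockchain, it is the decentralized network's consensus rules that prevent anyone from quietly rewriting and re-hashing the whole chain.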

Blog

Comparing Europe’s Public Cloud Growth to the Global Tech Landscape

Blog

React Tables: How to Evaluate Options and Integrate a Table into Your Project

Blog

Thoughts from Gartner’s 2018 Security & Risk Management Summit

Blog

Deadline to Update PCI SSL & TLS Looms, Are You Ready?

Quick History Lesson

Early internet data communications were enabled through the use of a protocol called HyperText Transfer Protocol (HTTP) to transfer data between nodes on the internet. HTTP essentially establishes the "request-response" rules to be used between a "client" (i.e., a web browser) and a "server" (the computer hosting a website) throughout the session. While the use of HTTP grew along with internet adoption, its lack of security protocols left internet communications vulnerable to attacks from malicious actors. In the mid-nineties, Secure Sockets Layer (SSL) was developed to close this gap. SSL is a "cryptographic protocol" standard established to protect the privacy and integrity of the bidirectional data being transported via HTTP. You may be familiar with HTTPS, or HyperText Transfer Protocol over SSL (a.k.a. HTTP Secure). Transport Layer Security (TLS) version 1.0 (v1.0) was developed in 1999 as an enhancement to the then-current SSL v3.0 protocol standard. TLS standards matured over time with TLS v1.1 [2006] and TLS v1.2 [2008].

Early Security Flaws Found in HTTPS

While both SSL and TLS protocols remained effective for some time, in October of 2014 Google's security team discovered a vulnerability in SSL version 3.0. Skilled hackers were able to use a technique called Padding Oracle On Downgraded Legacy Encryption, widely referred to as the "POODLE" exploit, to bypass SSL security and decrypt sensitive (HTTPS) information, including secret session cookies. By doing this, hackers could then hijack user accounts. In December 2014, the early versions of TLS were also found to be vulnerable to a new variant of the POODLE attack that enabled hackers to downgrade the protocol version to one that was more vulnerable.

POODLE Attacks Spur Changes to PCI Standards

So what do POODLE attacks have to do with Payment Card Industry Data Security Standards (PCI DSS) and compliance?
PCI DSS Requirement 4.1 mandates the use of "strong cryptography and security protocols to safeguard sensitive cardholder data during transmission," and these SSL vulnerabilities (and similar variants) meant that sensitive data associated with payment card transactions was also open to these risks. In April of 2015 the PCI Security Standards Council (SSC) issued a revised set of industry standards, PCI DSS v3.1, which stated: "SSL has been removed as an example of strong cryptography in the PCI DSS, and can no longer be used as a security control after June 30, 2016." This deadline applied to both organizations and service providers, who needed to remedy the situation in their environments by migrating from SSL to TLS v1.1 or higher. The council also included an information supplement, "Migrating from SSL and Early TLS," as a guide. However, due to early industry feedback and pushback, in December of 2015 the PCI SSC issued a bulletin extending the deadline to June 30, 2018 for both service providers and end users to migrate to later, stronger versions of TLS. And in April of 2016 the PCI SSC issued PCI DSS v3.2 to formalize the deadline extension, adding an "Appendix 2" to outline the requirements for conforming with these standards.

Sumo Logic Is Ready, Are You?

The Sumo Logic platform was built with a security-by-design approach, and we take security and compliance very seriously. As a company, we continue to lead the market in securing our own environment and providing the tools to help our customers do the same. Sumo Logic complied with the PCI DSS 3.2 service provider level one standards in accordance with the original deadline (June 30, 2016), and received validation from a third-party expert, Coalfire. If your organization is still using these legacy protocols, it is important to take steps immediately and migrate to the newest versions to ensure compliance by the approaching June 30, 2018 deadline.
If you are unsure whether these vulnerable protocols are still in use in your PCI environment, don't wait until it's too late to take action. If you don't have the resources to perform your own audit, the PCI Security Standards Council has provided a list of Qualified Security Assessors that can help you in those efforts.

What About Sumo Logic Customers?

If you are a current Sumo Logic customer, in addition to ensuring we comply with PCI DSS standards in our own environment, we make every effort to inform you if one or more of your collectors are eligible for an upgrade. If you have any collectors in your PCI DSS environment that do not meet the new PCI DSS standards, you would have been notified through the collectors page in our UI (see image below). It's worth noting that TLS v1.1 is still considered PCI compliant; however, at Sumo Logic we are leapfrogging the PCI requirements and, moving forward, will only be supporting TLS v1.2. If needed, you can follow these instructions to upgrade (or downgrade) as required.

Sumo Logic Support for PCI DSS Compliance

Sumo Logic provides a wealth of information, tools and pre-built dashboards to help customers manage PCI DSS compliance in many cloud and non-cloud environments. A collection of these resources can be found on our PCI Resources page. If you are a cloud user and are required to manage PCI DSS elements in that type of environment, note that in April 2018 the PCI SSC Cloud Special Interest Group issued version 3.0 of its Cloud Computing Guidelines, updating the previous version 2.0 last released in February 2013. Look for another related blog to provide a deeper dive on this subject. PCI SSC Cloud Computing Guidelines version 3.0 includes the following changes:

Updated guidance on roles and responsibilities, scoping cloud environments, and PCI DSS compliance challenges.

Expanded guidance on incident response and forensic investigation.
New guidance on vulnerability management, as well as additional technical security considerations on topics such as Software Defined Networks (SDN), containers, fog computing and the Internet of Things (IoT).

Standardized terminology throughout the document.

Updated references to PCI SSC and external resources.

Additional Resources

For more information on the compliance standards Sumo Logic supports, visit our self-service portal. You'll need a Sumo Logic account to access the portal. Visit our DocHub page for specifics on how Sumo Logic helps support our customers' PCI compliance needs. Sign up for Sumo Logic for free to learn more.
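If you want a quick, programmatic way to confirm from the client side that an endpoint negotiates TLS v1.2 or higher, a minimal sketch using Python's standard ssl module (Python 3.7+) might look like the following. This is a spot check, not a substitute for a proper audit:

```python
import socket
import ssl

def min_tls12_context():
    # A client context that refuses anything below TLS 1.2,
    # in line with the PCI DSS migration deadline discussed above.
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx

def connects_with_tls12(host, port=443, timeout=5):
    # True only if the server completes a handshake at TLS 1.2 or higher.
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with min_tls12_context().wrap_socket(sock, server_hostname=host) as tls:
                return tls.version() in ("TLSv1.2", "TLSv1.3")
    except (ssl.SSLError, OSError):
        return False
```

A server that only accepts SSL v3.0 or early TLS will fail the handshake against this context, so connects_with_tls12 returns False for exactly the endpoints that need remediation.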

Blog

DevOps Redemption: Don't Let Outdated Data Analytics Tools Slow You Down

Blog

SnapSecChat: The Demand for Security as a Service

Blog

Log Management and Analytics for the AWS ELB Classic Service

Quick Refresher

Earlier this year, we showed you how to monitor Amazon Web Services Elastic Load Balancer (AWS ELB) with CloudWatch. This piece is a follow-up that focuses on Classic Load Balancers, which provide basic load balancing across multiple Amazon EC2 instances and operate at both the request level and connection level. Classic Load Balancers are intended for applications that were built within the EC2-Classic network. AWS provides the ability to monitor your ELB configuration with detailed logs of all the requests made to your load balancers. There is a wealth of data in the logs generated by ELB, and it is extremely simple to set up.

How to Get Started: Setting up AWS ELB Logs

Logging is not enabled in AWS ELB by default. It is important to set up logging when you start using the service so you don't miss any important details!

Step 1: Create an S3 Bucket and Enable ELB Logging

Note: If you have more than one AWS account (such as ops, dev, and so on) or multiple regions that generate Elastic Load Balancing data, you'll probably need to configure each of these separately. Here are the key steps you need to follow:

Create an S3 bucket to store the logs. (Want to learn more about S3? Look no further.)

Allow AWS ELB access to the S3 bucket.

Enable AWS ELB logging in the AWS Console.

Verify that it is working.

Step 2: Allow Access to External Log Management Tools

To add AWS ELB logs to your log management strategy, you need to give access to your log management tool. The easiest way to do that is by creating a special user and policy.

Create a user in AWS Identity and Access Management (IAM) with Programmatic Access. For more information about this, refer to the appropriate section of the AWS User Guide. Note: Make sure to store the Access Key ID and Secret Access Key credentials in a secure location. You will need to provide these later to grant access to your tools!

Create a Custom Policy for the new IAM user.
We recommend you use the following JSON policy:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "s3:GetObject",
        "s3:GetObjectVersion",
        "s3:ListBucketVersions",
        "s3:ListBucket"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::your_bucketname/*",
        "arn:aws:s3:::your_bucketname"
      ]
    }
  ]
}

Note: All of the Action parameters shown above are required. Replace the "your_bucketname" placeholders in the Resource section of the JSON policy with your actual S3 bucket name. Refer to the Access Policies section of the AWS User Guide for more info.

What Do the Logs Look Like?

ELB logs are stored as .log files in the S3 buckets you specify when you enable logging. The file names of the access logs use the following format:

bucket[/prefix]/AWSLogs/aws-account-id/elasticloadbalancing/region/yyyy/mm/dd/aws-account-id_elasticloadbalancing_region_load-balancer-name_end-time_ip-address_random-string.log

bucket: The name of the S3 bucket.
prefix: The prefix (logical hierarchy) in the bucket. If you don't specify a prefix, the logs are placed at the root level of the bucket.
aws-account-id: The AWS account ID of the owner.
region: The region for your load balancer and S3 bucket.
yyyy/mm/dd: The date that the log was delivered.
load-balancer-name: The name of the load balancer.
end-time: The date and time that the logging interval ended. For example, an end time of 20140215T2340Z contains entries for requests made between 23:35 and 23:40 if the publishing interval is 5 minutes.
ip-address: The IP address of the load balancer node that handled the request. For an internal load balancer, this is a private IP address.
random-string: A system-generated random string.

The following is an example log file name:

s3://my-loadbalancer-logs/my-app/AWSLogs/123456789012/elasticloadbalancing/us-west-2/2014/02/15/123456789012_elasticloadbalancing_us-west-2_my-loadbalancer_20140215T2340Z_172.160.001.192_20sg8hgm.log

Syntax

Each log entry contains the details of a single request made to the load balancer.
All fields in the log entry are delimited by spaces. Each entry in the log file has the following format:

timestamp elb client:port backend:port request_processing_time backend_processing_time response_processing_time elb_status_code backend_status_code received_bytes sent_bytes "request" "user_agent" ssl_cipher ssl_protocol

The following list explains the different fields in the log file. Note: ELB can process HTTP requests and TCP requests, and the differences are noted below.

timestamp: The time when the load balancer received the request from the client, in ISO 8601 format.
elb: The name of the load balancer.
client:port: The IP address and port of the requesting client.
backend:port: The IP address and port of the registered instance that processed this request.
request_processing_time: [HTTP listener] The total time elapsed, in seconds, from the time the load balancer received the request until the time it sent it to a registered instance.
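Because each entry is space-delimited and only the "request" and "user_agent" fields are quoted, a small parser is easy to sketch. The following is an illustrative Python example; the sample line mirrors the documented format above with placeholder values:

```python
import shlex

# Field names follow the classic ELB access log syntax described above.
ELB_FIELDS = [
    "timestamp", "elb", "client_port", "backend_port",
    "request_processing_time", "backend_processing_time",
    "response_processing_time", "elb_status_code", "backend_status_code",
    "received_bytes", "sent_bytes", "request", "user_agent",
    "ssl_cipher", "ssl_protocol",
]

def parse_elb_log_line(line):
    # shlex.split honors the quoted "request" and "user_agent" fields,
    # splitting everything else on whitespace.
    return dict(zip(ELB_FIELDS, shlex.split(line)))

sample = ('2014-02-15T23:39:43.945958Z my-loadbalancer 192.168.131.39:2817 '
          '10.0.0.1:80 0.000073 0.001048 0.000057 200 200 0 29 '
          '"GET http://example.com:80/ HTTP/1.1" "curl/7.38.0" - -')
entry = parse_elb_log_line(sample)
```

The trailing "-" values stand in for ssl_cipher and ssl_protocol on a non-TLS listener; a real pipeline would also convert the numeric fields to int/float before analysis.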

AWS

June 19, 2018

Blog

Transform Graphite Data into Metadata-Rich Metrics using Sumo Logic’s Metrics Rules

Graphite metrics are one of the most common metrics formats in application monitoring today. Originally designed in 2006 by Chris Davis at Orbitz and open-sourced in 2008, Graphite itself is a monitoring tool now used by many organizations both large and small. It accepts metrics from a wide variety of sources, including popular daemons like collectd and statsd, provided that the metrics are sent in a simple plaintext format of the form "metric.path value timestamp," where the metric path is a unique identifier specified in a dot-delimited format. Implicit in this format is also some logical hierarchy specific to each environment, for example, a naming scheme like app.env.host.assembly.metric. While this hierarchical format has been widely accepted in the industry for years, it creates challenges for usability and ultimately lengthens the time to troubleshoot application issues. Users need to carefully plan and define these hierarchies ahead of time in order to maintain consistency across systems, scale monitoring effectively in the future and reduce confusion for the end user leveraging these metrics. Fortunately, the industry is evolving toward tag-based metrics to make it easier to design and scale these systems, and Sumo Logic is excited to announce the launch of Metrics Rules to take advantage of this new model immediately.

Using Metrics Rules to Bring Graphite Metrics into the New World

Sumo Logic built its metrics platform to support metadata-rich metrics, but we also acknowledged that the broader industry and many of our customers have invested heavily in their Graphite architecture and naming schemas over time. Sumo Logic's Metrics Rules solution now allows users to easily transform these Graphite metrics into the next-generation, tag-based metric format, which provides three key benefits:

Faster Time to Value: No need to re-instrument application metrics to take advantage of this metadata-rich, multi-dimensional format. Send Graphite-formatted metrics to Sumo immediately and enrich them with tag-based metadata later.
Easy Configuration: An intuitive user interface (UI) allows you to validate and edit your transformation rules in real time, while competitive solutions require carefully defined config files that are difficult to set up and prone to errors.

Improved Usability: With rich metadata, use simple key-value pairs to discover, visualize, filter and alert on metrics without knowing the original Graphite-based hierarchy.

Using the example above, we can use Metrics Rules to enrich the dot-delimited Graphite names with key-value tags, which will make it easier for us to monitor metrics by our system's logical groupings in the future.

Intuitive Metrics Rules UI for Easy Validation and Edits

As Graphite monitoring systems grow, so do the complexities in maintaining these dot-delimited hierarchies across the organization. Some teams may have defined Graphite naming schemes with five different path components (e.g., app.env.host.assembly.metric), while others may have more components or a different hierarchical definition altogether. To make it easier to create tags out of these metrics, the Metrics Rules configuration interface allows you to see a preview of your rules and make sure that you've properly captured the different components. Simply specify a match expression (i.e., which metrics the rule will apply to), define variables for each of the extracted fields and then validate that each tag field is extracting the appropriate values. After saving the rule, Sumo Logic will go back in time and tag your metrics with this new metadata so you can take advantage of these rules for prior data points.

Improved Discoverability, Filtering and Alerting with Key-Value Tags

Once these metrics contain the key-value tags that we've applied via Metrics Rules, you can take advantage of several usability features to make finding, visualizing and alerting on your metrics even easier.
For example, Sumo Logic's autocomplete feature makes it easier to find and group metrics based on these key-value tags. Additionally, when using our unified dashboards for logs and metrics, these new tags can be leveraged as filters for modifying visualizations; selecting a value in one of these filters will append a key-value pair to your query and filter down to the data you're interested in. Finally, configuring alerts becomes significantly easier when scoping and grouping your metrics with key-value pairs. In the example below, we selected metric=vcpu.user from one of our namespaces, and we're averaging this across each node in Namespace=csteam. This means that alerts will trigger across each node, and our email and/or webhook notifications will tell us which particular node has breached the threshold.

The Bigger Picture

Users can now convert legacy Graphite-formatted performance metrics into metadata-rich metrics with Sumo Logic, both in real time and after ingestion. This allows customers to increase the usability and accessibility of their analytics by letting users leverage business-relevant tags instead of relying only on obscure, technical ones. With the capability to extract business context (metadata) from IT-focused metrics, organizations can use this data to gain actionable insight to inform strategic business decisions. In a broader context, this is significant because, as we've been seeing from our customers, the hard lines between IT and business are becoming blurred, and there's a strong emphasis on using data to improve the overall end-user experience. As more organizations continue to leverage machine data analytics to improve their security, IT and business operations, the ability to map machine data insights to actionable, contextual business analytics for IT and non-core-IT users is critical.

Learn More

Head over to Sumo Logic DocHub for more details on how to configure Metrics Rules on your account.
Additionally, see how these rules can even be used for non-Graphite metrics by parsing out values from existing key-value pairs such as _sourceCategory and _sourceHost. Are you at DockerCon 2018 at Moscone Center in San Francisco this week? We'll be there! Stop by our booth, S5, to chat with our experts, get a demo and learn more!

Additional Resources

Read the press release on our latest product enhancements unveiled at DockerCon. Download the report by 451 Research & Sumo Logic to learn how machine data analytics helps organizations gain an advantage in the analytics economy. Check out the Logs-to-Metrics blog. Sign up for Sumo Logic for free.
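Conceptually, the transformation Metrics Rules performs can be sketched in a few lines. The following is an illustrative example only, using the hypothetical app.env.host.assembly.metric naming scheme mentioned above rather than any real Metrics Rules syntax:

```python
def graphite_to_tags(name, schema=("app", "env", "host", "assembly", "metric")):
    # Split a dot-delimited Graphite name and pair each component with its
    # position in the naming schema, yielding key-value tags.
    parts = name.split(".")
    if len(parts) != len(schema):
        raise ValueError(f"expected {len(schema)} components, got {len(parts)}")
    return dict(zip(schema, parts))

# Hypothetical metric name following the five-component scheme above.
tags = graphite_to_tags("checkout.prod.host123.payments.request_latency")
```

Once a name like this carries tags such as env=prod, you can filter and group by key-value pairs without memorizing which dot position means what, which is the usability gain described above.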

June 12, 2018

Blog

Accelerate Data Analytics with Sumo Logic’s Logs-to-Metrics Solution

If you're building a new application from scratch and are responsible for maintaining its availability and performance, you might wonder whether you should be monitoring logs or metrics. For us, it's a no-brainer that you'll want both: metrics are fast and efficient for proactively monitoring the health of your system, while logs are essential for troubleshooting the details of an issue to find the root cause. To use a real-world analogy, let's say you go in for an annual checkup and the doctor sees you have elevated blood pressure ("the metric"). He then asks you enough questions to discover that you've been eating fast food five nights a week ("the logs"), and recommends a diet change to normalize your blood pressure levels ("the fix"). But what if you're working with an existing application where logs have always been used for monitoring? Or you're leveraging third-party services that are only sending you logs? These logs may often contain key performance indicators (KPIs) like latency, bytes sent and request time, and Sumo Logic is great for structuring this data to create dashboards and alerts. However, to get the performance benefits of metrics, you might consider re-instrumenting your application to output those KPIs as native metrics instead of logs. But we all know how much free time you have to do that.

Extract Metrics from Logs for High Performance Analytics

Still, you may be wondering: why would I spend time converting my log data to metrics? The long and short of it is this: to deliver the best customer experience to your users. And machine data analytics is essential for that. However, according to data we recently released, one of the biggest barriers to adopting a data analytics tool is the lack of real-time analytics to inform operational, security and business decisions. Without it, you'll suffer from slow analytics and will lose customers in minutes.
No one wants that, especially when customers are relying on your tools to help them resolve critical issues. Sumo Logic's Logs-to-Metrics solution is the answer to that challenge because we make it easy for you to turn logs into metrics that can then be used as valuable KPIs. And since we do the heavy lifting and work with you to create metrics from existing logs, you don't have to worry about creating them from scratch. Whether your KPIs are embedded in the logs themselves (e.g., latency, request_time) or you're looking to compute KPIs by counting the logs (e.g., error count, request count), we've got you covered. Turning some of your logs into metrics will give you several key benefits:

High Performance Analytics: Storing data in a time-series database allows for lightning-fast query times, since the data is optimized for speed and efficiency.

Thirteen-Month Data Retention: For all metrics, Sumo Logic provides 13-month retention by default, enabling quick long-term trending of critical business and operational KPIs.

Flexible and Low Latency Alerting: With metrics, you can leverage Sumo Logic's real-time metrics alerting engine, which includes intuitive UI configuration, multiple threshold settings, missing data alerts, muting and more.

Never Re-Instrument Code Again: Gain all of the benefits of metrics without digging into your code to configure a metrics output.

Easy Configuration with Real-Time Validation

In order to make this metrics extraction as seamless as possible, we've created a fast way for you to validate your rules in real time. There are three simple steps to pick out your metrics:

Specify a Scope: This is the set of logs that contain the metrics you are interested in. Typically, this contains one or more pieces of metadata and some keywords to narrow down the stream of logs. For example, "_sourceCategory=prod/checkout ERROR".
Define a Parse Expression: Use Sumo Logic's parsing language to extract the important fields you'll want to turn into metrics. You can even use regular expressions for more complex log lines.

Select Metrics and Dimensions: After successfully parsing your logs, select which fields are metrics and which are dimensions. Metrics are the actual values you are interested in tracking, while dimensions are the groups you would want to aggregate those values by. For example, if you want to track the number of errors by service and region, "errors" would be a metric while "service" and "region" would be dimensions.

In real time, Sumo Logic will show you a preview of your parse expression to make sure you've correctly extracted the right fields. You can also extract multiple metrics and dimensions from a single rule.

KPIs as Metrics = 100x Performance over Logs

As much as we love the performance of our log analytics at Sumo Logic, we really love the performance of our metrics. Transforming thousands (or millions) of unstructured log messages into structured visualizations on the fly is always possible, but when the data can be stored as a metric in our native time-series database, the resulting query performance can be astounding. In the simple comparison below, it's pretty easy to see which chart belongs to metrics.

Low Latency Monitoring and Highly Flexible Alerting

After extracting metrics from your logs, you can also take advantage of Sumo Logic's real-time alerting engine, which monitors your metrics in real time and triggers notifications within seconds of a condition being met. In addition to the low latency, some other benefits include:

Multiple Thresholds: Create different alerts based on the severity of the metric. For example, create a warning alert if CPU is above 60 for five minutes, but generate a critical alert if it's ever above 90.

Multiple Notification Destinations: Send your alerts to multiple destinations.
For example, create a PagerDuty incident and send an email when the monitor is critical, but just send a Slack message if it's hit the warning threshold.

Missing Data: Get notified when data hasn't been seen by Sumo Logic, which can be a symptom of misconfiguration or a deeper operational issue.

The Bigger Picture

Unstructured machine data is not always optimized for the kind of real-time analytics customers need to inform business decisions. With this new release, users can now take advantage of Sumo Logic's metrics capabilities without re-instrumenting their code by leveraging existing logs for more efficient analytics and insights. In addition to the deep forensics and continuous intelligence provided by logs, customers can take advantage of metrics by easily extracting key performance indicators from unstructured logs, while still retaining those logs for root cause analysis. These metrics can then be used with the Sumo Logic time-series engine, providing 10 to 100 times the analytics performance of unstructured log data searches, as well as supporting long-term trending of metrics. This allows them to move fast and continue to deliver a seamless experience for their end users.

Learn More

Logs-to-Metrics is now generally available to all Sumo Logic customers. Head over to our documentation to learn more about how to get started.

Additional Resources

Read the press release on our latest product enhancements unveiled at DockerCon. Download the report by 451 Research & Sumo Logic to learn how machine data analytics helps organizations gain an advantage in the analytics economy. Check out our new Metrics Rules blog. Sign up for Sumo Logic for free.
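Conceptually, the scope/parse/select workflow amounts to something like the following sketch. The log line, field names and regular expression here are illustrative only, not Sumo Logic's actual parsing language:

```python
import re

# Hypothetical log line that a scope like "_sourceCategory=prod/checkout ERROR"
# might match; the fields are placeholders for illustration.
line = "2018-06-12 10:02:11 ERROR service=checkout region=us-west-2 request_time=512"

# "Parse expression": pull out the metric (request_time)
# and the dimensions (service, region).
pattern = re.compile(
    r"service=(?P<service>\S+)\s+region=(?P<region>\S+)\s+request_time=(?P<request_time>\d+)"
)
match = pattern.search(line)

# "Select metrics and dimensions": the metric is the value being tracked,
# while dimensions are the groups you aggregate that value by.
metric = {"name": "request_time", "value": int(match.group("request_time"))}
dimensions = {"service": match.group("service"), "region": match.group("region")}
```

Storing the resulting (metric, dimensions) pairs in a time-series database, rather than re-parsing raw logs at query time, is where the performance gain described above comes from.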

June 12, 2018

Blog

The Sumo Logic Advantage for the Analytics Economy

Blog

Monitoring Kubernetes: What to Monitor (Crash Course, Part 2)

Blog

Monitoring Kubernetes: The K8s Anatomy (Crash Course, Part 1)

Blog

Employee Spotlight: Exploring the Parallels Between Finance and DevSecOps Engineering

In this Sumo Logic Employee Spotlight we interview Michael Halabi. Mike graduated from UC Santa Cruz with a bachelor's degree in business management economics, spent some time as an auditor with PwC, joined Sumo Logic as the accounting manager in the finance department, and recently transitioned to a new role at the company as a DevSecOps engineer. [Pause here for head scratch] I know what you're thinking, and yes, that is quite the career shift, but if you stay with us, there's a moral to this story, as well as a few lessons learned.

Work Smarter, Not Harder

Q: Why did you initially decide business management economics was the right educational path?

Mike Halabi (MH): I fell into the "uncertain college kid" category. While I was interested in engineering, I was also an entrepreneur at heart and knew that someday, if I were to start my own business, I would need a foundational business background as well as a variety of other life experiences outside of textbook knowledge.

Q: How do you approach your work?

MH: Everything in life, no matter how scary it may appear up front, can be broken into a series of simpler and smaller tasks. If you learn how to think about problem solving in a certain way, you can make anything work, no matter how far beyond your skill set and core competency it may originally seem. This is especially true in the technology industry, where success often depends on doing it not just better, but also faster, than the competition. Breaking down complex problems into bite-size chunks allows you to tackle each piece of the problem quickly and effectively and move on to the next.

Q: What's the best way for a business to achieve that, doing it better and faster?

MH: Automation. This is applicable across the board. The finance industry is full of opportunities to automate processes.
Half of what a traditional finance team spends its time doing is copy/pasting the same information into the same email templates, or copy/pasting a formula in Excel and manually tweaking each line. In other words, a bunch of tedious, outdated practices that could easily be automated thanks to modern programs and technologies. One instance I recall is someone spending a full day calculating a small subset of a massive spreadsheet line by line: eight hours to do one-tenth of the workbook. With a proper understanding of the problem and how to leverage the tools available, I wrote a formula in 30 minutes that could be copy/pasted to complete the entire workbook, and it is still in use today. Scalable, simple, efficient: it removes manual error and works every time. And this was a quarterly project, so weeks' worth of highly paid time are saved every quarter. Low-hanging fruit like this is everywhere.

Q: So how did you capture the attention of Sumo Logic's technical team?

MH: Word got out about my closet-coding (really, I annoyed everyone to death until they let me help with something fun) and soon, various people on various teams were sending side projects on troubleshooting and automation my way. I continued on like this for a while, finance accounting manager by day, coder by night, until I was approached by our CSO and asked if I'd like to transition onto his team as a DevSecOps engineer.

Connect the Dots

Q: Let's back up. How did you initially get into coding?

MH: I took an early liking to video game development, and while I didn't have a formal engineering or coding background, using the above methodology I taught myself how to make simple games using C++/SDL. Then, once I started helping out with various projects at Sumo Logic, I discovered Python, C# and Go. By spending time experimenting with each language I found different use cases for them, and in trying to apply what I'd learned I was pushed into the altogether different world of infrastructure.
Making solutions was easy enough; getting them to less technically inclined folks became a new challenge. In order to deploy many of my cross-functional projects at Sumo Logic, I had to learn about Docker, Lambda, EC2, Dynamo, ELBs, SSL, HTTP, various exploits and security concerns related to web-based tech, etc. I devoted more of my time to learning about the backend and underlying technologies of the modern internet, because making a service scalable and robust requires a holistic skill set beyond simply writing the code.

Q: Are there any interesting parallels between finance and engineering?

MH: As an auditor at PwC, I worked frequently with many companies, from very early startups to large public companies, and the problems almost all of these companies face are the same: how do we get more done without hiring more people or working longer hours, and without sacrificing work quality? In finance, a lot of companies handle that problem simply by hiring more people.

Q: Can you expand on that?

MH: You need to look beyond the company financials. Increased revenue to increased work can't (or should never) be a 1:1 ratio. For a company to scale, each individual employee has to understand how his or her role will scale in the future to keep pace with corporate growth and needs. You scale an organization by using technology, not by mindlessly throwing bodies at the work. That's what I learned from finance. You don't need a team of 10 people to collect money and write the same email to clients multiple times a day when you can automate and have a team of two handle it. Manual processes are slow and result in human error. In engineering I think this concept is well understood, but in finance, in my experience, many companies behave as if they're still in the 1500s with nothing more than an abacus to help them do their job.

Find a Passion Project

Q: What would be your advice to those considering a major career shift?
MH: Our interests and passions will shift over time, and there's nothing wrong with that. If you decide one day to do a complete 180-degree career change, go for it. If you don't genuinely enjoy what you do, you'll never truly advance. I loved designing video games and automating financial processes, which led to my career shift into engineering. Did I put in long hours? Yes. Did I enjoy it? Yes. Passion may be an overused cliché, but if you aren't invested in your work, you'll go through the motions without reaping any of the benefits, including the satisfaction of producing meaningful work that influences others in a positive way.

Q: What's your biggest learning from this whole experience?

MH: The biggest takeaway for me as a coder was that theoretical knowledge doesn't always apply in the real world, because you can't know how to make something until you make it. Coding is an iterative process of creating, breaking and improving. So never be afraid to fail occasionally; learn from it and move on. And don't put yourself in a box or give up on your dreams simply because you don't have a formal education or a piece of paper to prove your worth. The technology industry is hungry for engineering talent, and sometimes it can be found in unusual places. In fact, finding employees with robust skill sets and backgrounds will only positively impact your team. Our collective experiences make us holistically stronger.

Blog

Gain Full Visibility into Microservices Architectures Using Kubernetes with Sumo Logic and Amazon EKS

Blog

Sumo Logic Partners with IP Intelligence Leader Neustar to Meet Growing Customer Needs at Scale

Customers are visiting your website, employees are logging into your systems and countless machines are talking to each other in an effort to deliver the perfect user experience. We'd like to believe that all of these individuals and machines are operating with the best of intentions, but how can we be so sure? One possible answer lies in the connecting device's IP address and its respective physical location. IP geolocation is the process of determining the location of a device based on its unique IP address. It requires knowledge not only about the physical location of the computer where the IP address is assigned, but also about how the device is connecting (e.g., via anonymous proxy, mobile, cable, etc.). This challenge becomes further complicated in an increasingly digital world with proliferating devices and millions of connections being established across the globe daily. That's why we're excited to announce that we've partnered with Neustar, a leading IP intelligence provider, to deliver one of the most comprehensive and precise geolocation databases in the industry. As a Sumo Logic customer, you can now leverage Neustar's 20+ years of experience gathering and delivering IP intelligence insights, all at no additional charge.

Precision Database + Weekly Updates = Higher Confidence Analytics

In the pre-cellphone era (remember that?), everyone had a landline, which meant area codes were fairly accurate identifiers of an end-user's location. I knew that 516 meant someone was calling from Long Island, New York, while 415 was likely coming from the San Francisco Bay Area. But the invention of the cellphone complicated this matter. I might be receiving a call from someone with a 516 number, but because the caller was using a "mobile" device, he or she could be located anywhere in the U.S. IP addresses are like very complicated cellphone numbers — they can be registered in one place, used in another and then re-assigned to someone else without much notice.
Keeping track of this is an enormous task. And over time, malicious actors realized that they could take advantage of it to not only mask their true location, but create false security alerts to distract security teams from identifying and prioritizing legitimate high-risk threats. That's why partnering with a leader like Neustar, which uses a global data collection network and a team of network geography analysts to update its IP GeoPoint database on a daily basis, is key. This accuracy allows security teams to have full visibility into their distributed, global IT environment. When there's an attempt to compromise a user's credentials within an application, they can quickly flag any anomalous activity and investigate suspicious logins immediately.

Proactive Geo Monitoring and Alerting in Sumo Logic

With Neustar's IP GeoPoint database, you can rest assured that your geolocation results are more trustworthy and reliable than ever before. Using Sumo Logic, you can continue to take advantage of proactive alerting and dashboarding capabilities to make sense of IP intelligence across your security and operational teams. For example, you'll have high confidence in your ability to:

Detect Suspicious Logins: alert on login attempts occurring outside of trusted regions.
Maintain Regulatory Compliance: see where data is being sent to and downloaded from to keep information geographically isolated.
Analyze End-User Behavior: determine where your users are connecting from to better understand product adoption and inform advertising campaigns.

With real-time alerts, for example, you can receive an email or Slack notification if a login occurs outside of your regional offices: Configure real-time alerts to get notified when a machine or user appears from outside of a specific region.
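The suspicious-login idea above can be sketched in a few lines. This is a hypothetical illustration, not Sumo Logic's or Neustar's implementation: a real deployment would resolve IPs against a commercial GeoIP database such as Neustar IP GeoPoint, whereas here a tiny stand-in lookup table maps network prefixes to countries.

```python
# Hypothetical sketch: flag logins coming from outside trusted regions.
# The GEO_TABLE below is invented stand-in data for illustration only.
from ipaddress import ip_address, ip_network

# Stand-in geolocation data: network prefix -> country code
GEO_TABLE = {
    ip_network("198.51.100.0/24"): "US",
    ip_network("203.0.113.0/24"): "RU",
}

TRUSTED_REGIONS = {"US", "GB"}

def locate(ip: str) -> str:
    """Resolve an IP to a country code via the stand-in table."""
    addr = ip_address(ip)
    for net, country in GEO_TABLE.items():
        if addr in net:
            return country
    return "UNKNOWN"

def is_suspicious_login(ip: str) -> bool:
    """A login is suspicious if it resolves outside the trusted regions."""
    return locate(ip) not in TRUSTED_REGIONS

print(is_suspicious_login("198.51.100.7"))  # False: trusted region (US)
print(is_suspicious_login("203.0.113.9"))   # True: outside trusted regions
```

In practice the `is_suspicious_login` check corresponds to the scheduled-alert condition: when it fires, the alert delivers the email or Slack notification described above.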
You can also use real-time dashboards to monitor the launch of a new feature, track customer behavior or gain visibility into AWS Console Logins from CloudTrail: Using Sumo Logic's Applications, you can install out-of-the-box dashboards for instant geographic visibility into AWS Console Logins, for example.

The Bigger Picture

Born in AWS, Sumo Logic has always held a cloud-first, security-by-design approach, and our vision is to create a leading cloud security analytics platform to help our customers overcome the challenges of managing their security posture in the cloud. There is a major gap in the available on-premises security tools for customers that not only need to manage security in the cloud, but also meet rigorous regulatory compliance standards, especially the European Union's General Data Protection Regulation (GDPR) that went into effect last week on May 25, 2018. Geolocation is key for those needs, which is why we're thrilled to be rolling this out to our customers as part of a bigger strategy to provide visibility and security across the full application stack.

Learn More

Head over to Sumo Logic DocHub for more details on how to leverage the new database, then schedule some searches and create dashboards to take advantage of the enhanced IP geolocation. Check out our latest press announcement to learn about the additional features of our cloud security analytics solution, including intelligent investigation workflows, privacy and GDPR dashboards, and enhanced threat intelligence.

Blog

Comparing RDS, DynamoDB & Other Popular Database Services – Migrating to AWS Part 2

Blog

Join the Data Revolution - Listen to our New Masters of Data Podcast

In today's world, we are surrounded by data. It's flying through the air all around us over radio signals, wireless internet, mobile networks and more. We're so immersed in data every day that we rarely stop to think about what the data means and why it is there. I, for one, rarely have a quiet moment where I am not directly, or indirectly, absorbing a flow of information, regardless of whether I want to or not. So, I wonder, are we going to master this all-encompassing data, or be mastered by it?

Data is the new currency of our world

Data has become the lifeblood of the modern business, and those who succeed in today's competitive landscape are those who leverage data the best. Amazing applications exist that take the raw data flowing out of the exhaust pipe of the modern economy (and our lives) and enable companies to develop products we couldn't have even conceived of a few years ago. Artificial intelligence-driven innovations are anticipating our next move. Social media products are connecting us and targeting us. Self-driving cars are swimming in data to navigate a world full of faulty humans. But how often do we stop to talk about the good, the bad and the ugly of data?

"A single conversation across the table with a wise man is better than ten years' mere study of books." - Henry Wadsworth Longfellow

The discussions now about data privacy and the nonstop stream of hackers stealing our personal information are actually elevating data from a sub-theme of our culture into a primary topic, even for non-experts. The value of data is also rising into the awareness of boardrooms, and this is a good thing. The only way to keep ourselves honest about how we use data is to talk about how we use data — the wonderful and the despicable, the innovative and the regressive.
A new podcast to explore the data revolution

As a long-time podcast listener and fan of the spoken word, I am excited to announce a new podcast about this data revolution — Masters of Data. In each episode, we interview innovators, big thinkers and provocateurs to learn their views on data. We also want to meet the people behind the data, who can humanize the data by helping us understand its cultural context and its impact on human experience. That way, we turn this from stories about widgets and gimmicks into stories about humans and the value of data, as well as the dangers of misusing it. Our first podcast is now live and features Bill Burns, the chief trust officer at Informatica. Bill and I take a journey through time and security and discuss the evolving tech landscape over the last 20 years. We talk about how he got started in a computer lab, cut his teeth at Netscape, helped change the world at Netflix, and is on his next journey at Informatica.

How to listen and subscribe

To listen, visit www.mastersofdata.com or subscribe via the iTunes or Google Play app stores. Once you've subscribed on your favorite platform, be sure to check back for new episodes and leave us a review to let us know what you think! We will be releasing several discussions over the next few months, and we look forward to your feedback! Until then, listen and enjoy. And don't forget to join the conversation on Twitter #mastersofdata

May 15, 2018

Blog

Sumo Logic For Support and Customer Success Teams

*Authored by Kevin Keech, Director of Support at Sumo Logic, and Graham Watts, Senior Solutions Engineer at Sumo Logic

Many Sumo Logic customers ask, "How can I use Sumo Logic for support and customer success teams?" If you need a better customer experience to stay ahead of the competition, Sumo Logic can help. In this post, I will describe why and how support and customer success teams use Sumo Logic, and summarize the key features to use in support and customer success use cases.

Why Use Sumo Logic For Support and Customer Success Teams?

Improved Customer Experience
- Catch deviations and performance degradation before your customers report them. Using dashboards and scheduled alerts, your CS and Support teams can be notified of any service-impacting issues and can then reach out and provide solutions before your customers may ever know they have a problem.
- This helps your customers avoid frustrations with your service, which in the past may have led them to look into competitive offerings.
- Improve your Net Promoter Score (NPS) and Service Level Agreements (SLAs).
- Alert team members to reach out to a frustrated customer before they go to a competitor's website or log out.

Efficiency and Cost Savings – Process More Tickets, Faster
- Sumo Logic customers report a 2-3x or greater increase in the number of support tickets each team member can handle.
- Direct access to your data eliminates the need for your Support team to request access and wait for engineering resources to grant it. This leads to a higher level of customer satisfaction, and allows you to reallocate engineering time to innovate and enhance your product offerings.
- Your support reps can perform real-time analysis of issues as they occur, locate the root of a problem, and get your customers solutions quicker. Customers report that using LogReduce cuts troubleshooting time down from hours or days to minutes.
- As your teams and products grow, team members can process more tickets instead of needing to hire more staff.

Security
- Eliminate the need to log into servers directly to look at logs – you can Live Tail your logs right in Sumo Logic or via a CLI.
- Use Role-Based Access Control to allow teams to view only the data they need.

How to Use Sumo Logic For Support and Customer Success Teams

Key features that enable your Support Team, Customer Success Team, or another technical team while troubleshooting are:

Search Templates
- See here for a video tutorial of Search Templates.
- Form-based search experience – no need for employees to learn a query language.
- Users type in human-friendly, easy-to-remember values like "Company Name" and Sumo will look up and inject complex IDs, like "Customer ID" or some other UUID, into the query, shown to the right:

LogReduce
- Reduce 10s or 100s of thousands of log messages into a few patterns with the click of a button. This reduces the time it takes to identify the root cause of an issue from hours or days to minutes.
- In the below example, a bad certificate and related tracebacks are exposed with LogReduce.

Dashboards
- Dashboard filters – auto-populating dashboard filters for easy troubleshooting.
- TimeCompare – is now 'normal' compared to historical trends? The example below shows production errors or exceptions today, overlaid with the last 7 days of production errors or exceptions:

Blog

Sumo Logic Announces Search Templates to Improve the Customer Experience with Better, Faster Application Insights

Providing the ultimate customer experience is the goal of every modern company, and to do that they need complete visibility into every aspect of their business. At Sumo Logic, we make it our mission to democratize machine data and make it available for everyone, which allows organizations to gain the required visibility at each step. That’s why today, we are excited to announce the availability of Search Templates to our customers. The Search Templates feature furthers that goal by making data easily accessible to less technical users within an organization. This access to data provides deeper visibility and enables teams across different business units to make informed decisions, faster and more efficiently. Key Benefits of Leveraging Search Templates Reduce application downtime One of the major benefits of Search Templates comes in identifying issues such as production outages. In such instances, there’s a series of steps (generally from a playbook) that an on-call engineer must perform to rapidly identify and mitigate the issue. It typically starts with an alert that a specific component is behaving abnormally. When this alert triggers, the engineer examines an application dashboard and system-level metrics related to that component to identify specific areas for investigation. After that, the engineer would typically want to drill down into each of these areas to gather more information to help answer some basic questions, including: What is the root cause of the issue? Which customers are most affected by the issue? What areas of the product are affected by the issue? Typically these playbooks are stored and executed from a set of documents, notes or scratchpads. Search Templates allow this tribal knowledge to come into Sumo Logic and make it easily accessible to a number of users. 
The operations team can create a set of standardized Search Templates that require engineers to only provide specific inputs (parameters), instead of requiring them to understand technical queries. These templates make it easy for the on-call engineer to drill down to specific log messages for a timeout, connection failure, etc., or determine which customers are the most impacted so support teams can engage with them quickly. The bottom line is that Search Templates allow operations teams to reduce their application downtime and maintain a good customer experience by reducing the friction for engineers to resolve their production issues.

Troubleshoot customer issues faster

This issue-identification use case applies to other teams as well. For instance, we have seen a lot of our customers, such as Lifesize and Samsung SmartThings, use Sumo Logic to help their support teams troubleshoot end-user issues. These support teams generally also have a well-defined set of steps that they perform in order to troubleshoot the issue affecting their users. However, these support engineers are not typically savvy about building queries or scripts that replicate their troubleshooting steps. Search Templates make it easy to reproduce these playbook steps within the Sumo Logic platform in a standardized way, without requiring users to know any technical syntax. A support engineer can quickly open up one of the templates, tweak certain parameters based on the user issue, get the relevant information that would help them understand the issue, and take remedial action quickly. In working with our customers, we have seen that these steps help customer support teams reduce the time to resolve issues by as much as 90 percent.

Gain valuable insights

The key advantage of Search Templates — as outlined earlier — is reducing the friction for users to get value and insights out of their data without requiring mastery of a query language.
Users can provide input using simple text boxes and dropdowns to quickly get back the data without having to waste time or resources learning complicated technical syntax. The autocomplete dropdowns in the text boxes prompt users to choose from a pre-specified list of values, removing the need to remember each and every value. In addition, Search Templates allow users to input human-readable parameters like usernames and customer names and then substitute those with the machine-optimized user and product IDs that are actually required to search the data. This significantly reduces the friction and learning curve for non-expert users, and reduces the amount of time and effort needed to gain valuable insights from the complex underlying data.

Manage content more efficiently

Search Templates make the management of all Sumo Logic-related content easier for administrators, and facilitate the capture of tribal knowledge in the organization. The feature promotes reusability of searches and dashboards. For instance, admins can create a single dashboard that can be used across different environments by using a parameter to toggle between environments. This reduces the number of content items that administrators have to manage. In addition, the reusability ensures that any changes made to content propagate across all the relevant areas/systems.

What's next?

If you are interested in learning more about Search Templates, check out this video. You can also visit our Sumo Logic help docs to learn more about the feature. If you'd like to know about our new feature releases, check out the "what's new in Sumo Logic" section of our website.
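The parameter-substitution idea described above can be illustrated with a short sketch. This is not Sumo Logic's implementation; the company names, IDs, and query string below are invented for illustration: a human-friendly value (a company name) is looked up and replaced with the machine ID the search actually needs.

```python
# Hypothetical sketch of a parameterized search: the user supplies a
# friendly name, and the template is rendered with the underlying ID.
# All names, IDs, and the query syntax here are invented examples.
CUSTOMER_IDS = {
    "Acme Corp": "c9f1-77aa-41b2",
    "Globex": "d410-02bc-9e77",
}

TEMPLATE = '_sourceCategory=prod/checkout error customer_id="{customer_id}"'

def render_search(company_name: str) -> str:
    """Substitute the friendly name with its ID before running the search."""
    customer_id = CUSTOMER_IDS[company_name]
    return TEMPLATE.format(customer_id=customer_id)

print(render_search("Acme Corp"))
# _sourceCategory=prod/checkout error customer_id="c9f1-77aa-41b2"
```

The user only ever sees the "Acme Corp" dropdown; the lookup and the query syntax stay hidden behind the template.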

May 14, 2018

Blog

Call for Speakers for Illuminate 2018 Now Open!

Today at Sumo Logic, we're excited to announce that registration, as well as the call for speakers, is open for our annual user conference, Illuminate 2018! For those that did not attend last year or are unfamiliar with Illuminate, it's a two-day event where Sumo Logic users and ecosystem partners come together to share best practices for shining the light on continuous intelligence for modern applications. Illuminate 2018 takes place Sept. 12-13, 2018 at the Hyatt Regency San Francisco Airport Hotel in Burlingame, Calif., and registration details can be found on the conference website under the frequently asked questions (FAQ) section.

Why Attend?

The better question is why not? Last year, the conference brought together more than 400 customers, partners, practitioners and leaders across operations, development and security for hands-on training and certifications through the Sumo Logic Certification Program, as well as technical sessions and real-world case studies to help attendees get the most out of the Sumo Logic platform. You can register for Illuminate 2018 here. The 2017 keynote was chock-full of interesting stories from some of our most trusted and valued customers, partners and luminaries, including:

Ramin Sayar, president and CEO, Sumo Logic
David Hahn, CISO, Hearst
Chris Stone, chief products officer, Acquia
Special guest Reid Hoffman, co-founder of LinkedIn; partner at Greylock Partners

You can watch the full 2017 keynote here, and check out the highlights reel below!

Interested in Speaking at This Year's Conference?

If you've got an interesting or unique customer use case or story that highlights real-world strategies and technical insights about how machine data analytics have improved operations, security and business outcomes for you and your end-users, then we'd love to hear from you!
These presentations must provide a holistic overview into how today's pioneers are pushing the boundaries of what's possible with machine data analytics and developing powerful use cases within their industries or organizations. Still on the fence? Keep in mind that if your session is accepted, you'll receive one complimentary registration to Illuminate as well as branding and promotional opportunities for your session.

More on Topics, Requirements & Deadline

To make it easier on our customers, we've compiled a list of desired topics to help guide you in the submission process. However, this is not an exhaustive list, so if you have another interesting technical story to tell that is relevant to Sumo Logic users and our ecosystem, we'd love to hear it.

The Journey to the Cloud (Cloud Migration)
Operations and Performance Management of Applications and Infrastructure
Operations and Performance Management for Containers and Microservices
Best Practices for Using Serverless Architectures at Scale
Cloud Security and Compliance
Best Practices for Implementing DevSecOps
Sumo Logic to Enable Improved Customer Support/Success
Unique Use Cases (I Didn't Know You Could Use Sumo to…)
Best Practices for Adopting and Leveraging Sumo Logic Effectively within Your Organization

Regardless of topic, all submissions MUST include:

Title
Brief Abstract (200-400 words)
3 Key Audience Takeaways or Learnings
Speaker Bio and Links to any Previous Presentations

The deadline to submit session abstracts is June 22, 2018. Please email submissions to sumo-con@sumologic.com.

Examples of Previous Customer Presentations

Last year, we heard great customer stories from Samsung SmartThings, Hootsuite, Canary, Xero, and more. Here are a few customer highlight reels to give you examples of the types of stories we'd like to feature during Illuminate 2018.
Final Thoughts

Even if you aren't thinking of submitting a paper, we'd love to see you at Illuminate to participate in two full days of learning, certification and training, content sharing, networking with peers and Sumo Logic experts, and most importantly, fun! You can register here, and as a reminder, the call for papers closes June 22, 2018! Together, we can bring light to dark and democratize machine data analytics for everyone. If you have any additional questions, don't hesitate to reach out to sumo-con@sumologic.com. Hope to see you there!

May 10, 2018

Blog

SnapSecChat: Sumo Logic’s Chief Security Officer Ruminates on GDPR D-Day

Blog

Comparing AWS S3 and Glacier Data Storage Services - Migrating to AWS Part 1

Blog

How to Build a Scalable, Secure IoT Platform on GCP in 10 Days

Blog

Introducing New Content Sharing Capabilities for Sumo Logic Customers

As organizations scale and grow, teams begin to emerge with areas of specialization and ownership. Dependencies develop, with individuals and teams acting as service providers to other functional areas. We're finding information technology and DevOps teams rely on Sumo Logic not just for their own monitoring of application health and infrastructure, but also for sharing this data out to other functional areas, such as customer support and business analysts, to empower them to make stronger data-driven decisions. Our top customers such as Xero, Acquia and Delta each have hundreds if not thousands of users spread across multiple teams who have unique data needs that align with varying business priorities and wide ranges of skill sets. Scaling our platform to support the data distribution and sharing needs of such a broad and diverse user base is a key driver of Sumo Logic's platform strategy. To this end, we are excited to announce the general availability of our new Content Sharing capabilities. Underlying this updated ability to share commonly used assets such as searches and dashboards is a secure, fine-grained and flexible role-based access control (RBAC) model. Check out our intro to content sharing video for more details.

Collaboration with Control

The updated model allows data visualization assets such as searches and dashboards to be shared out individually (or grouped in folders) not just to other users, but also to members of a particular role. Users can be invited to simply view the asset or participate in actively editing or managing it. This allows individual users to collaborate on a search or dashboard before sharing it out to the rest of the team. When new information is discovered or a meaningful visualization is created, it can be easily added to a dashboard by anyone with edit access, and the change is made immediately available to other users.
While ease of use is critical in ensuring a smooth collaboration experience, it should not come at the price of data security. It is always transparent who has what access to an asset, and this access can be revoked at any time by users with the right privileges. Users can control the data that is visible to the viewers of a dashboard, and new security measures such as a "run-as" setting have been introduced to prevent viewing or exploitation of data access settings put in place by administrators.

Business Continuity

Supporting business continuity is a key conversation we have with our customers as their businesses grow and their data use cases evolve and deepen. This new set of features reflects Sumo Logic's belief that the data and its visualizations belong to the organization and should be easily managed as such. We've replaced our single-user ownership model with a model where multiple users can manage an asset (i.e., view, edit and, if required, delete it). This ensures that a team member could, for example, step in and update an email-alert-generating scheduled search when the original author is unavailable. Even if the asset was not shared with other team members, administrators now have the ability (via a newly introduced "Administrative Mode") to manage any asset in the library, regardless of whether it was actively shared with them or not. The user deletion workflow has also been simplified and updated to support business continuity. When a user is deleted, any of their searches and dashboards that are located in a shared folder continue to exist where they are, thus providing a seamless experience for team members depending on them for their business workflows. Administrators can assign and transfer the management of any assets located in the user's personal folder to another team member with the right context. (Previously, administrators would have had to take ownership and responsibility.)
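The sharing model described above can be sketched abstractly: an asset is shared with individual users or with roles, at a view, edit, or manage level, and higher levels imply the lower ones. This is a minimal illustration of the concept, not Sumo Logic's implementation; all names here are invented.

```python
# Minimal sketch (assumed model, not Sumo Logic's code): assets carry
# per-subject grants, where a subject is a user ID or a role name.
LEVELS = {"view": 1, "edit": 2, "manage": 3}

class Asset:
    def __init__(self, name):
        self.name = name
        self.grants = {}  # subject (user or role) -> access level

    def share(self, subject, level):
        self.grants[subject] = level

    def revoke(self, subject):
        self.grants.pop(subject, None)

    def can(self, subjects, level):
        """True if any of the caller's identities (user ID or roles)
        holds at least the requested access level."""
        needed = LEVELS[level]
        return any(LEVELS[l] >= needed
                   for s, l in self.grants.items() if s in subjects)

dashboard = Asset("Prod Errors")
dashboard.share("role:support", "view")
dashboard.share("alice", "manage")

print(dashboard.can({"bob", "role:support"}, "view"))  # True, via the role
print(dashboard.can({"bob", "role:support"}, "edit"))  # False
print(dashboard.can({"alice"}, "edit"))                # True, manage implies edit
```

The ordering in `LEVELS` captures the "view < edit < manage" hierarchy, and `revoke` reflects the point above that access can be withdrawn at any time.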
Teams as Internal Service Providers

Development teams that own a specific app or run an area of infrastructure are increasingly seen as internal service providers, required to meet SLAs and demonstrate application health and uptime. They need to make this information easy to discover and access by field and service teams who, in turn, rely on this data to support end customers. Our newly introduced Admin Recommended section allows administrators to set up organizational folder structures that facilitate this easy interaction between producers and consumers. Relevant folders are pinned to the very top of our Library view, enabling easy discoverability for infrequent users. The contents of these folders can be managed and updated by individual development teams, without the administrator needing to intervene. This allows for sharing of best practices and commonly used searches and dashboards, as well as enabling users to create team-specific best practice guides. As data-driven decision-making becomes the norm across all job functions and industries, the Sumo Logic platform is designed to scale in performance, usability and security to meet these needs. Features like fine-grained RBAC and administrative control, coupled with intuitive design, make information accessible to those who need it the most and enable the creation of workflows that democratize data and drive business growth.

Upcoming Webinars

To learn more about how these features can be applied at your organization, please sign up for one of our upcoming webinars on May 1 (2:00 p.m. PST) or May 3 (9:00 a.m. PST).

April 26, 2018

Blog

SnapSecChat: RSA 2018 Musings from Sumo Logic's Chief Security Officer

Blog

Comparing Kubernetes Services on AWS vs. Azure vs. GCP

Blog

IRS 2018 Tax Day Website Outage: Calculating The Real Monetary Impact

Every year, the proverbial cyber gods anticipate a major data breach on tax day, and this year, they weren't completely wrong. While we didn't have a cyber meltdown, we did experience a major technology failure. Many U.S. taxpayers may already know what I am talking about here, but for those outside of the country, let me fill you in. The U.S. Internal Revenue Service (IRS) website crashed on April 17 — the infamous tax filing day. And much like death or rush-hour traffic, no one can escape doing his or her taxes. Thanks to the good graces of the IRS, the deadline was extended a day to April 18 so that taxes could be filed by everyone as required by U.S. law. Now you might say, "Ah shucks, it's just a day of downtime. How bad can it be?" In short, bad, and I'm going to crunch some numbers to show you just how bad, because here at Sumo Logic we care a lot about customer experience, and this outage was definitely low on the customer experience scale. The below will help you calculate the financial impact of the total downtime and how the delays affected the U.S. government — and more importantly, the money the government collects from all of us. To calculate the cost of the one-day downtime of the IRS app, we did some digging into tax filing statistics. Here's what we found: On average, over 20 million taxpayers submit their taxes in the last week, according to FiveThirtyEight. Half of these folks — about 10 million people — submitted on the last day, and therefore were affected by the site outage. How did we arrive at this number? Well, we made a conservative assumption that the folks who waited until the last week were the classic procrastinators (you know who you are!). And these procrastinators generally wait until the last day (maybe even the last hour) to file their taxes, which was also backed up by the IRS and FiveThirtyEight data points.
We also make a practical assumption that most last-minute tax filers are "payees," meaning you are waiting until the last day because you are paying money to the government. After all, if you are getting money back, there's more incentive to file early and cash in that check! You with me so far? Now, in order to determine the amount of money the IRS collects on the last day, we need to know what the average filer pays to the IRS. This was a tricky question to answer. The IRS assumes that most folks get a refund of $2,900, but does not specify the average filer payment amount. Since there exists no standard amount, we modeled a few payments ($3K, $10K, $30K) to calculate the amount that got delayed because of the last-minute outage.

Number of filers who pay    Avg. payee amount    IRS revenue ($'s) on last filing day
10M                         $3K                  $30B
10M                         $10K                 $100B
10M                         $30K                 $300B

So the government delayed getting some big money, but they do eventually get it all (didn't we say that taxes and death are inevitable?). It's important to note here that a lot of taxes are collected via W-2 withholding, so filing does not mean the payee will actually pay, and in some instances, there will be refunds granted. So with that in mind, let's now calculate the cost of the one-day delay. To do this, we apply the 1.6 percent* treasury yield to the money for a single day:

Number of filers who pay    Avg. payee amount    IRS revenue ($'s) on last filing day    Lost revenue for one-day delay
10M                         $3K                  $30B                                    $1.3M
10M                         $10K                 $100B                                   $4.3M
10M                         $30K                 $300B                                   $13.1M

What you see is that even one day's cost of downtime for the U.S. government is measured in millions of dollars. And for a government that is trillions of dollars in debt (and at least in theory focused on cost/debt reduction), these numbers add up quickly. So here's a quick PSA for the U.S. government and for the IRS in particular: Your software application matters. And the customer experience matters.
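The back-of-the-envelope math above can be reproduced in a few lines. The 10 million filers and the 1.6 percent treasury yield are the post's own assumptions, and the results match the table to within rounding:

```python
# One day of forgone treasury yield on the delayed tax revenue,
# using the post's assumptions (10M last-day payers, 1.6% annual yield).
FILERS = 10_000_000
YIELD_ANNUAL = 0.016

def one_day_cost(revenue: float) -> float:
    """Cost of delaying `revenue` by one day at the annual yield."""
    return revenue * YIELD_ANNUAL / 365

for avg_payment in (3_000, 10_000, 30_000):
    revenue = FILERS * avg_payment
    print(f"${avg_payment:,} avg payment -> ${revenue / 1e9:.0f}B delayed, "
          f"~${one_day_cost(revenue) / 1e6:.1f}M lost to a one-day delay")
```

For the $3K scenario, $30B × 1.6% ÷ 365 ≈ $1.3M, matching the first row of the table.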
Update your systems, implement the proper controls, and get your act together so that we can rest assured our money is in good hands. If you continue to run your infrastructure on legacy systems, you’ll never be able to scale, stay secure or deliver the ultimate customer experience. With the pace of technological innovation, cloud is the future and you need visibility across the full application stack. But let us not forget the second moral of this story, said best by Abraham Lincoln: “You cannot escape the responsibility of tomorrow by evading it today.” And if you want to see customer experience done right, contact Sumo Logic for more questions on how to build, secure and run your modern applications in the cloud!

Blog

Challenges to Traditional Cloud Computing: Security, Data, Resiliency

Blog

DevSecOps and Log Analysis: Improving Application Security

Blog

RSA CSO Corner: Okta & Sumo Logic Talk MFA, Minimizing Risk in the Cloud

Blog

RSA CSO Corner: Neustar & Sumo Logic Talk GDPR, IP Intelligence

Blog

Survey Data Reveals New Security Approach Needed for Modern IT

New World of Modern Apps and Cloud Creates Complex Security Challenges As the transition to the cloud and modern applications accelerates, the traditional security operations center (SOC) functions of threat correlation and investigation are under enormous pressure to adapt. These functions have always struggled with alert overload, poor signal-to-noise ratio in detection, complex and lengthy workflows, and acute labor churn; cloud and modern applications, however, add the new challenge of integrating previously siloed data and processes while coping with a much larger threat surface. To overcome these challenges, security must continuously collaborate with the rest of IT to acquire and understand essential context. In addition, cloud and application-level insight must be integrated with traditional infrastructure monitoring, and investigation workflows must accelerate many times over in order to keep pace with the exploding threat landscape. Over the past two months we formally surveyed hundreds of companies about their challenges with securing modernizing IT environments in the 2018 Global Security Trends in the Cloud report, conducted by Dimensional Research in March 2018 and sponsored by Sumo Logic. The survey included a total of 316 qualified independent IT security professionals across the U.S. and Europe, the Middle East and Africa (EMEA). In addition, we interviewed a broad cross-section of both current and potential future Sumo Logic customers. According to the survey results, a strong majority of respondents called out the need for a fundamentally new approach to threat assessment and investigation in the cloud, and even the laggard voices conceded these are “not if, but when” transitions that will redraw boundaries in traditional security tools and processes.
In the Customer Trenches: Why Security and IT Must Collaborate Eighty-seven percent of surveyed security pros observed that as they transition to the cloud, there is a corresponding increase in the need for security and IT operations to work together during threat detection and investigation. Customer interviews gave color to this strong majority with many use cases cited. For instance, one SaaS company security team needed end customer billing history to determine the time budget and priority for conclusion/case queuing. Another online business process firm needed close collaboration with the cloud ops teams to identify if slow application access was a security problem or not. A third company needed IT help for deeper behavioral insight from identity and access management (IAM) systems. In all of these examples the heavy dose of cloud and modern applications made it nearly impossible for the already overburdened security team to resolve the issues independently and in a timely manner. They required real-time assistance in getting data and interpreting it from a variety of teams outside the SOC. These examples are just a few of the complex workflows which can no longer be solved by siloed tools and processes that are holding organizations back from fully securing their modern IT environments. These challenges surface in the survey data as well, with 50 percent of respondents specifically looking for new tools to improve cross-team workflows for threat resolution. This group — as you would expect — had plenty of overlap with the over 50 percent of respondents who observed that on-premises security tools and traditional security information and event management (SIEM) solutions can’t effectively assimilate cloud data and threats. 
Unified Visibility is Key: Integrating Cloud and Application Insight Eighty-two percent of those surveyed observed that as their cloud adoption increases, there is a corresponding increase in the need to investigate threats at both the application and infrastructure layers. A clear pattern in this area was best summarized by one SOC manager, who said: “I feel like 90 percent of my exposure is at the application layer but my current defense provides only 10 percent of the insight I need at that layer.” Attackers are moving up the stack as infrastructure defenses solidify for cloud environments, and the attack surface is expanding rapidly with modular software (e.g., microservices) and more externally facing customer services. “Ninety percent of my exposure is at the application layer but my current defense provides only 10 percent of the insight I need” In the survey, 63 percent of security pros reported that broader technical expertise is required when trying to understand threats in the cloud. An industry veteran who spent the past three years consulting on incorporating cloud into SOCs noted a “three strikes, you’re out” pattern for SOC teams: they could not get cloud application data; when they did get it, they could not understand the context in the data; and even when they understood it, they could not figure out how to apply the data to their existing correlation workflows. One CISO described the process as like “blind men feeling an elephant,” a metaphor with a long history describing situations in which partial understanding leads to wildly divergent opinions. Customer interviews provided several examples of this dynamic. One incident response veteran described painstaking work connecting the dots from vulnerabilities identified in DevOps code scans to correlation rules to detect cross-site scripting, a workflow invisible to traditional infrastructure-focused SOCs.
Another enterprise with customer-facing SaaS offerings described a very complex manual mapping from each application microservice to possible IOCs, a process the traditional tools could only complete in disjointed fragments. Many reported the need to assess user activity involving applications in ways standard behavior analytics tools could not. More broadly, these cloud and application blind spots create obvious holes in the security defense layer: missing context, lost trails, unidentified lateral movement and unsolvable cases (e.g., cross-site scripting), to name a few. The diversity of log/API formats and other challenges make moving up the stack a non-trivial integration, but these obstacles must be overcome for the defense to adapt to modern IT. New Approach Needed to Break Down Existing Silos With all of these challenges in the specific areas of threat correlation and investigation, it’s no surprise that, more generally, an aggregate of 93 percent of survey respondents think current security tools are ineffective for the cloud. Two-thirds of those surveyed are looking to consolidate around tools able to plug the holes. A full third say some traditional categories, such as the SIEM, need to be completely rethought for the cloud. At Sumo Logic we’ve lived the imperative to bridge the traditional silos of IT vs. security, application vs. infrastructure, and cloud vs. on-premises to deliver an integrated cloud analytics platform. We’re applying that hard-won insight to new data sources, ecosystems and application architectures to deliver a cloud security analytics solution that meets the demands of modern IT. Stop by the Sumo Logic booth (4516 in North Hall) this week at RSA for a demo of our new cloud security analytics platform features, including privacy and GDPR-focused dashboards, intelligent investigation workflow and enhanced threat intelligence.
To read the full survey, check out the report landing page, or download the infographic for a high-level overview of the key findings.

Blog

Sumo Logic's Dave Frampton Live on theCube at RSA

Blog

Log Analysis on the Microsoft Cloud

The Microsoft Cloud, also known as Microsoft Azure, is a comprehensive collection of cloud services available for developers and IT professionals to deploy and manage applications in data centers around the globe. Managing applications and resources can be challenging, especially when the ecosystem involves many different types of resources, and perhaps multiple instances of each. Being able to view logs from those resources and perform log analysis is critical to effective management of your environment hosted in the Microsoft Cloud. In this article, we’re going to investigate what logging services are available within the Microsoft Cloud environment, and then what tools are available to assist you in analyzing those logs. What Types of Logs are Available? The Microsoft Cloud infrastructure supports different logs depending on the types of resources you are deploying. The two main log types gathered within the ecosystem are:

Activity Logs
Diagnostic Logs

Application logs are also gathered within the Microsoft Cloud. However, these are limited to compute resources and depend on the technology used within the resource and the applications and services deployed with that technology. Activity Logs All resources report their activity within the Microsoft Cloud ecosystem in the form of Activity Logs. These logs are generated by several categories of events:

Administrative – Creation, deletion and updating of the resource.
Alerts – Conditions which may be cause for concern, such as elevated processor or memory usage.
Autoscaling – When the number of resources is adjusted due to autoscale settings.
Service Health – Related to the health of the environment in which the resource is hosted. These logs contain information related to events occurring external to the resource.

Diagnostic Logs Complementary to the activity logs are the diagnostic logs.
Diagnostic logs provide a detailed view into the operations of the resource itself. Some examples of actions which would be included in these logs are:

Accessing a secret vault for a key
Security group rule invocation

Diagnostic logs are invaluable in troubleshooting problems within the resource and gaining additional insight into the interactions with external resources from within the resource being monitored. This information is also valuable in determining the overall function and performance of the resource. Providing this data to an analysis tool can offer important insights, which we’ll discuss more in the next section. Moving Beyond a Single Resource Log viewing tools, including complex search filters, are available within the Microsoft Cloud console. However, these are only useful if you are interested in learning more about the current state of a specific instance. And while there are times when this level of log analysis is valuable and appropriate, sometimes it can’t accomplish the task. If you find yourself managing a vast ecosystem consisting of multiple applications and supporting resources, you will need something more powerful. Log data from the Microsoft Cloud is available for access through a Command Line Interface (CLI), REST API and PowerShell cmdlets. The real power in the logs lies in being able to analyze them to determine trends, identify anomalies and automate monitoring so that engineers can focus on developing additional functionality, improving performance and increasing efficiency. Several companies have developed tools for aggregating and analyzing logs from the Microsoft Cloud, including Sumo Logic. You can learn more about the value Sumo Logic can provide from your log data by visiting their Microsoft Azure Management page. I’d like to touch on some of the benefits here in conclusion.
Centralized aggregation of all your log data, both from the Microsoft Cloud and from other environments, makes it easier to gain a holistic view of your resources. In addition to making this easier for employees to find the information they need quickly, it also enhances your ability to ensure adherence to best practices and maintain compliance with industry and regulatory standards. Use of the Sumo Logic platform also allows you to leverage their tested and proven algorithms for anomaly detection, and allows you to segregate your data by source, user-driven events, and many other categories to gain better insight into which customers are using your services, and how they are using them.
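As noted earlier, Activity Log data is also accessible through the REST API. Here is a minimal sketch in Python that assumes you already have an Azure AD bearer token; the api-version shown (2015-04-01) is the Activity Log list version at the time of writing and may change:

```python
# Sketch: pull Activity Log entries over the Azure REST API.
# Assumes a valid subscription ID and bearer token are available.
import json
import urllib.request

def activity_log_url(subscription_id: str, api_version: str = "2015-04-01") -> str:
    """Build the Activity Log 'list' endpoint for a subscription."""
    return (
        "https://management.azure.com/subscriptions/"
        f"{subscription_id}/providers/Microsoft.Insights"
        f"/eventtypes/management/values?api-version={api_version}"
    )

def fetch_activity_log(subscription_id: str, token: str) -> list:
    """Return the list of Activity Log events for the subscription."""
    req = urllib.request.Request(
        activity_log_url(subscription_id),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp).get("value", [])
```

The same data can be retrieved with the Azure CLI or PowerShell cmdlets mentioned above; the REST route is handy when feeding logs into an external analysis pipeline.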

Blog

RSA CSO Corner: Twistlock & Sumo Logic Talk GDPR, Container Security

Blog

RSA CSO Corner: CloudPassage & Sumo Logic Talk DevSecOps, Cloud Security

Blog

RSA Video: GDPR Flash Q&A with Sumo Logic Execs

Blog

Monitoring AWS Elastic Load Balancing with Cloudwatch

Quick Refresher – What is AWS Elastic Load Balancing? A key part of any modern application is the ability to spread the load of user requests to your application across multiple resources, which makes it much easier to scale as traffic naturally rises and falls throughout the day and week. Amazon Web Services’ answer to load balancing in the cloud is the Elastic Load Balancing (AWS ELB) service, which includes the Classic Load Balancer and the Application Load Balancer. AWS ELB integrates seamlessly with Amazon’s other cloud services, automatically spinning up new ELB instances without manual intervention to meet high-demand periods and scaling them back in off-peak hours to get the most out of your IT budget, while also providing a great experience to your users. AWS provides the ability to monitor your ELB configuration through AWS CloudWatch with detailed metrics about the requests made to your load balancers. There is a wealth of data in these metrics generated by ELB, and it is extremely simple to set up. And best of all, these metrics are included with the service! Understanding AWS CloudWatch metrics for AWS ELB First, you need to understand the concept of a “Namespace”. For every service monitored by AWS CloudWatch, there is a Namespace dimension that tells you where the data is coming from. Each of the three ELB services has a corresponding namespace:

Load balancer type | Namespace
Classic Load Balancers | AWS/ELB
Application Load Balancers | AWS/ApplicationELB
Network Load Balancers | AWS/NetworkELB

One of the most important aspects to understand with CloudWatch metrics are the “dimensions”. Dimensions tell you the identity of what is being monitored – what it is and where it is from. For this type of metric, there are two key dimensions:

Dimension | Description
AvailabilityZone | The Availability Zone the ELB instance is in
LoadBalancerName | The name of the ELB instance

Note: AWS automatically provides rollup metrics over dimensions as well.
So, for example, if you see a measurement with no LoadBalancerName dimension but with an Availability Zone (AZ), that is a rollup over all of the load balancers in that AZ. Another part of the metrics is the “Statistic”. CloudWatch metrics are not raw measurements; they are aggregated up to more digestible data volumes. So, in order not to lose the behavior of the underlying data, CloudWatch provides several statistics you can use depending on what you need:

Statistic | Description
Minimum | The minimum value over the reporting period (typically 1 min)
Maximum | The maximum value over the reporting period (typically 1 min)
Sum | The sum of all values over the reporting period (typically 1 min)
Average | The average value over the reporting period (typically 1 min)
SampleCount | The number of samples over the reporting period (typically 1 min)

What are the key metrics to watch? There are a lot of metrics gathered by CloudWatch, but we can divide them into two main categories: metrics about the load balancer, and metrics about the backend instances. We will show you the key ones to watch, and which statistics are appropriate when analyzing each metric. Key performance indicators for the load balancer These key performance indicators (KPIs) will help you understand how the actual ELB instances are performing and how they are interacting with incoming requests, as opposed to how your backend instances may be responding to the traffic.

Metric | What it means and how to use it | Statistics to use
RequestCount | Tracks the number of requests that the load balancer, or group of load balancers, has received. This is the baseline metric for any kind of traffic analysis, particularly if you don’t have auto-scaling enabled. | Sum (other statistics aren’t useful)
SurgeQueueLength | The number of inbound requests waiting to be accepted and processed by a backend instance. This can tell you if you need to scale out your backend resources. | Maximum is the most useful, but Average and Minimum can be helpful in addition to Maximum.
SpilloverCount | The number of requests rejected because the surge queue is full.
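The namespaces, dimensions, and statistics above map directly onto CloudWatch’s query API. Here is a minimal sketch using boto3; the load balancer name "my-elb" is a placeholder:

```python
# Sketch: query the RequestCount metric (Sum) for a Classic Load
# Balancer over the last hour, using the AWS/ELB namespace and
# LoadBalancerName dimension described above.
from datetime import datetime, timedelta

def request_count_query(elb_name: str, minutes: int = 60) -> dict:
    """Build the get_metric_statistics parameters for RequestCount."""
    now = datetime.utcnow()
    return {
        "Namespace": "AWS/ELB",
        "MetricName": "RequestCount",
        "Dimensions": [{"Name": "LoadBalancerName", "Value": elb_name}],
        "StartTime": now - timedelta(minutes=minutes),
        "EndTime": now,
        "Period": 60,           # one datapoint per minute
        "Statistics": ["Sum"],  # Sum is the useful statistic for counts
    }

# import boto3
# cloudwatch = boto3.client("cloudwatch")
# datapoints = cloudwatch.get_metric_statistics(**request_count_query("my-elb"))["Datapoints"]
```

Swapping in SurgeQueueLength with Statistics=["Maximum"] follows the same pattern, per the table above.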

AWS

April 9, 2018

Blog

The History of Monitoring Tools

Blog

How Log Analysis Has Evolved

Blog

Achieving AWS DevOps Competency Status and What it Means for Customers

Blog

Configuring Your ELB Health Check For Better Health Monitoring

Blog

Choosing Between an ELB and an ALB on AWS

Blog

Optimizing Cloud Visibility and Security with Amazon GuardDuty and Sumo Logic

Blog

Microservices for Startups Explained

Blog

Using AWS Config Rules to Manage Resource Tag Compliance

Blog

Graphite Monitoring for Windows Performance Metrics

For several years now, the tool of choice for collecting performance metrics in a Linux environment has been Graphite. While it is true that other monitoring tools, such as Grafana, have gained traction in the last several years, Graphite remains the go-to monitoring tool for countless organizations. But what about those organizations that run Windows, or a mixture of Windows and Linux? Because Graphite was designed for Linux, it is easy to assume that you will need a native Win32 tool for monitoring Windows systems. After all, the Windows operating system contains a built-in performance monitor, and there are countless supplementary performance monitoring tools available, such as Microsoft System Center. While using Graphite to monitor Linux systems and a different tool to monitor Windows is certainly an option, it probably isn’t the best option. After all, using two separate monitoring tools increases cost, as well as making life a little more difficult for the administrative staff. Fortunately, there is a way to use Graphite to monitor Windows systems. Bringing Graphite to Windows As you have probably already figured out, Graphite does not natively support the monitoring of Windows systems. However, you can use a tool from GitHub to bridge the gap between Windows and Graphite. In order to understand how Graphite monitoring for Windows works, you need to know a little bit about the Graphite architecture. Graphite uses a listener to listen for inbound monitoring data, which is then written to a database called Whisper. Graphite is designed to work with two different types of metrics—host metrics and application metrics. Host metrics (or server metrics) are compiled through a component called Collectd. Application metrics, on the other hand, are compiled through something called StatsD. In the Linux world, the use of Collectd and StatsD means that there is a very clear separation between host and application metrics.
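For reference, Graphite’s listener (the carbon daemon) accepts datapoints over a simple plaintext protocol: one "metric.path value timestamp" line per measurement, sent over TCP (port 2003 by default). A minimal sketch, with a placeholder host and metric path:

```python
# Sketch: send a single datapoint to a Graphite carbon listener
# using the plaintext protocol ("path value timestamp\n" over TCP).
import socket
import time

def graphite_line(path: str, value: float, timestamp: int = None) -> str:
    """Format one datapoint in Graphite's plaintext line protocol."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {timestamp}\n"

def send_metric(host: str, path: str, value: float, port: int = 2003) -> None:
    with socket.create_connection((host, port)) as sock:
        sock.sendall(graphite_line(path, value).encode("ascii"))

# send_metric("graphite.example.com", "servers.web01.cpu.load", 0.42)
```

Tools like Collectd, StatsD, and PerfTap ultimately feed datapoints into Graphite in this fashion, so understanding the line format helps when debugging a pipeline.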
In the case of Windows, however, Graphite monitoring is achieved through a tool called PerfTap. PerfTap does not differentiate as cleanly between host and application monitoring. Instead, the tool is designed to be compatible with StatsD listeners. Although StatsD is normally used for application monitoring, PerfTap can be used to monitor Windows operating system-level performance data, even in the absence of Collectd. An easy way of thinking about this is that StatsD is basically treating Windows as an application. As is the case with the native Windows Performance Monitor, PerfTap is based around the use of counters. These counters are grouped into five different categories:

System Counters – Used for monitoring hardware components such as memory and CPU
Dot Net Counters – Performance counters related to the .NET Framework
ASP Net Counters – Counters that can be used to track requests, sessions, worker processes, and errors for ASP.NET
SQL Server Counters – Most of these counters are directly related to various aspects of Microsoft SQL Server, but there is a degree of overlap with the System Counters as they relate to SQL Server.
Web Service Counters – Counters related to the native web services (IIS), which allow the monitoring of ISAPI extension requests, current connections, total method requests, and more.

PerfTap allows monitoring to be enabled through the use of a relatively simple XML file. This XML file performs four main tasks. First, it sets the sampling interval. Second, it provides the location of the counter definition file. Third, it lists the actual counters that need to be monitored. And finally, it provides connectivity information for the Graphite server by listing the hostname, port number, prefix key, and format. You can find an example of the XML file here.
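To illustrate those four tasks, here is a hedged sketch of what such a configuration looks like. Note that the element and attribute names below are illustrative only and do not necessarily match PerfTap’s actual schema; consult the project’s example file linked above for the real format:

```xml
<!-- Illustrative sketch only: element names are hypothetical.
     It shows the four tasks described in the text: sampling
     interval, counter definition file, counters to collect,
     and the Graphite/StatsD connection details. -->
<perfTapConfiguration sampleInterval="5000">
  <counterDefinitions file="CounterDefinitions\system.counters" />
  <counters>
    <counter name="\Processor(_Total)\% Processor Time" />
    <counter name="\Memory\Available MBytes" />
  </counters>
  <publisher host="graphite.example.com" port="8125"
             prefixKey="servers.web01" format="statsd" />
</perfTapConfiguration>
```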
Graphite and Sumo Logic Although Graphite can be a handy tool for analyzing performance metrics, Graphite unfortunately has trouble maintaining its efficiency as the organization’s operations scale. One possible solution to this problem is to bring your Graphite metrics into the Sumo Logic service. Sumo Logic provides a free video of a webinar in which they demonstrate their platform’s ability to natively ingest, index, and analyze Graphite data. You can find the video at: Bring your Graphite-compatible Metrics into Sumo Logic. Conclusion Although Graphite does not natively support the monitoring of Windows systems, you can use a third-party utility to send Windows monitoring data to a Graphite server. Of course, Graphite is known to have difficulty with monitoring larger environments, so adding Windows monitoring data to your existing Graphite deployment could complicate the management of monitoring data. One way of overcoming these scalability challenges is to bring your Graphite monitoring data into Sumo Logic for analysis.

Blog

Sumo Logic Gives Customers 171 Percent ROI: Forrester TEI Study

Blog

A DPO's Guide to the GDPR Galaxy: Dark Reading

Blog

Tuning Your ELB for Optimal Performance

Blog

Common AWS Security Threats and How to Mitigate Them

AWS security best practices are crucial in an age when AWS dominates the cloud computing market. Although moving workloads to the cloud can make them easier to deploy and manage, you’ll shoot yourself in the foot if you don’t secure cloud workloads well. Toward that end, this article outlines common AWS configuration mistakes that could lead to security vulnerabilities, then discusses strategies for addressing them. IAM Access The biggest threat that any AWS customer will face is user access control, which in AWS-speak is known as Identity and Access Management (IAM). When you sign up for a brand-new AWS account, you are taken through steps that enable you to grant privileged access to people in your company. When access is given to a person who really doesn’t require it, things can go terribly downhill. This is what happened with GitLab, when their production database was partially deleted by mistake! Mitigation Fortunately, IAM access threats can be controlled without too much effort. One of the best ways to improve IAM security is to make sure you are educated about how AWS IAM works and how you can take advantage of it. When creating new identities and access policies for your company, grant the minimal set of privileges that everyone needs. Make sure you get the policies approved by your peers and let them reason out why one would need a particular level of access to your AWS account. And when absolutely needed, provide temporary access to get the job done. Granting access to someone does not just stop with the IAM access control module. You can take advantage of VPCs, which allow administrators to create isolated networks that connect to only some of your instances. This way, you can have separate staging, testing and production instances. Loose Security Group Policies Administrators sometimes create loose security group policies that expose loopholes to attackers.
They do this because group policies are simpler than setting granular permissions on a per-user basis. Unfortunately, anyone with basic knowledge of AWS security policies can easily take advantage of permissive group policy settings to exploit AWS resources. They leave your AWS-hosted workloads at risk of being exploited by bots (which account for about a third of the visitors to websites, according to web security company Imperva). These bots are unmanned scripts that roam the Internet looking for basic security flaws, and misconfigured security groups on AWS servers that leave unwanted ports open are exactly what they look for. Mitigation The easiest way to mitigate this issue is to have all ports closed at the beginning of your account setup. One method is to allow only your own IP address to connect to your servers. You can do this while setting up the security groups for your instances, allowing traffic only from your specific IP address rather than leaving it open with 0.0.0.0/0. Above all, naming your security groups when working in teams is always a good practice; names that are confusing for teams to understand are also a risk. It’s also a good idea to create individual security groups for your instances. This allows you to handle all your instances separately during a threat. Separate security groups allow you to open or close ports for each machine without having to depend on other machines’ policies. Amazon’s documentation on Security Groups can help you tighten your security measures. Protecting Your S3 Data One of the biggest data leaks from Verizon happened not because of a bunch of hackers trying to break their system, but because of a simple misconfiguration in their AWS S3 storage bucket: a policy that allowed anyone to read information from the bucket. This misconfiguration affected anywhere between six million and 14 million Verizon customers.
This is a disaster for any business. Accidental S3 data exposure is not the only risk. A report released by Detectify identifies a vulnerability in AWS servers that allows hackers to identify the names of S3 buckets. Using this information, an attacker can start talking to Amazon’s API. Done correctly, attackers can then read, write and update an S3 bucket without the bucket owner ever noticing. Mitigation According to Amazon, this is not actually an S3 bug. It’s simply a side effect of misconfiguring S3 access policies. This means that as long as you educate yourself about S3 configuration, and avoid careless exposure of S3 data to the public, you can avoid the S3 security risks described above. Conclusion Given AWS’s considerable market share, there is a good chance that you will deploy workloads on AWS in the future, if you do not already. The configuration mistakes described above are easy to make. Fortunately, they’re also easy to avoid, as long as you educate yourself. None of these security vulnerabilities involve sophisticated attacks; they center on basic AWS configuration risks, which can be avoided by following best practices for ensuring that AWS data and access controls are secured.
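Both classes of misconfiguration discussed above can be caught programmatically. Here is a minimal sketch whose helper functions operate on the dict shapes that boto3’s describe_security_groups and get_bucket_acl calls return, so they can be checked without an AWS account; the bucket name is a placeholder:

```python
# Sketch: flag security groups open to the world and S3 ACLs that
# grant access to the AllUsers (public) group.

ALL_USERS = "http://acs.amazonaws.com/groups/global/AllUsers"

def open_to_world(security_group: dict) -> bool:
    """True if any ingress rule allows 0.0.0.0/0."""
    for perm in security_group.get("IpPermissions", []):
        for ip_range in perm.get("IpRanges", []):
            if ip_range.get("CidrIp") == "0.0.0.0/0":
                return True
    return False

def publicly_readable(acl: dict) -> bool:
    """True if the bucket ACL grants anything to the AllUsers group."""
    return any(
        grant.get("Grantee", {}).get("URI") == ALL_USERS
        for grant in acl.get("Grants", [])
    )

# import boto3
# for sg in boto3.client("ec2").describe_security_groups()["SecurityGroups"]:
#     if open_to_world(sg):
#         print("open security group:", sg["GroupId"])
# acl = boto3.client("s3").get_bucket_acl(Bucket="my-bucket")
# print("publicly readable:", publicly_readable(acl))
```

Running checks like these on a schedule is a cheap way to catch the kind of drift that led to the incidents described above.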

Blog

Don't Fly Blind - Use Machine Data Analytics to Provide the Best Customer Experience

Blog

DevSecOps 2.0

Blog

4 Reasons Why I Chose Azure: A Developer's Perspective

Before Azure and AWS, Microsoft development teams would need a server on their local network to manage change control and commits. Once a week, a chosen developer would compile and deploy the application to a production server. Now developers have the option of creating cloud applications in Visual Studio and connecting their projects directly to an Azure cloud server. As a Windows developer, Azure was the choice I made to develop my own software projects and manage them in the cloud, due to the easy integration between my development desktop and Azure cloud instances. Azure has similar services to other cloud providers, but it has some advantages for Microsoft developers who need to deploy applications for the enterprise. The biggest advantage for an enterprise is that it reduces the amount of on-site resources needed to support a developer team. For instance, the typical Microsoft-based enterprise has a team of developers, a staging server, a development server, QA resources, Jira, and some kind of ticketing system (just to name a few). With Azure, the team can set up these resources without the real estate or the hardware on-site. It also has integrated IaaS, so you can create a seamless bond between your internal network and the cloud infrastructure. In other words, your users will never know if they are working on a local server or running applications on Azure. The Dashboard For anyone who has managed Windows servers, the dashboard (Azure calls it your portal) is intuitive. The difficult part of starting an Azure account is understanding all of the options you see in the dashboard. For developers used to local environments where you need to provision multiple resources for one application, it's important to understand that everything you need to build an application is at your fingertips. You no longer need to go out and purchase different services such as SSL certificates and database software and install them.
You just click the service that you want and click "Create" in the Azure portal, and Microsoft builds it into your service agreement. If you pay as you go, then you only pay for the resources that you use. You can build APIs and services, start virtual servers, and even host WordPress sites from your portal. In the image above, I've blacked out the names of my resources for security reasons, but you can see that I have a web application, an SSL certificate, two databases (one MySQL and one MSSQL), email services and a vault (for the SSL cert) in Azure. I have an individual plan and don't use many resources from their servers, so I pay about $100 a month for a small WordPress site. When I had two web applications running, I paid about $150 a month. Integration with Visual Studio The main reason I use Azure is its easy integration with my Visual Studio projects. For Windows projects, integration between Visual Studio and Azure is as simple as creating a web application and copying the connection information into your projects. Once you've created the connection in Visual Studio, you can use Azure for change control and promote code directly from your development computer. For a simple web application, Microsoft offers a web app and a web app with SQL. These options are self-explanatory: if you just want to build a web application and don't need a database, you choose the first option; the second sets up a web app and a SQL Server. You aren't limited to a Windows environment, either. When you create a web app, Azure asks if you want to use Windows or Linux. When you create your application, it sits on a subdomain named <your_app_name>.azurewebsites.net. This will be important when you set up your TLD, and this subdomain behavior is the downside of using Azure over traditional hosting. When you set up your TLD, you set up a CNAME and configure a custom domain in your Azure settings.
When search engines crawl your site, they sometimes index this subdomain instead of only the TLD. This makes it difficult to work with applications such as WordPress. When you install WordPress, you install it on the subdomain, so WordPress stores the subdomain in all of its settings. This causes errors when you promote the site to your TLD, and you must do a global database search and replace to remove the subdomain from your settings. I found this to be one drawback of using Azure for my websites. After you create the web app, you're shown basic information to get started in case you just want to upload files to the new location using FTP. The "Get publish profile" option provides a downloadable text file that you use to connect Visual Studio. You can also connect the web app directly to your own GitHub repository. As a matter of fact, Microsoft will generate pre-defined settings for several repositories. After you download one of these files, you import the settings directly into your profile and you're automatically connected. When I work in Visual Studio with an Azure connection, I check out files, edit them, and check them back in as I code. It feels exactly like the old-school Team Foundation Server environment, except I don't have to buy the expensive equipment and licenses to host an internal Microsoft Windows server.

Easy VM Creation for Testing Third-Party Software

Occasionally, I get a customer with an application that they want me to test. It's usually in beta, or it's their recent stable release but something I wouldn't normally use on my machine. It's dangerous for me to install third-party software on my work machine. Should my customers' software crash my computer, I lose important data. Not only could I lose data, but I don't know what type of software I'm installing on my important developer machine connected to my home network. I don't want anything to have the ability to search my home network and storage.
With an Azure VM, I not only keep the software off of my local machine, but it's shielded from my local network too. I could install a VM on my local desktop, but I like to keep that machine clean of anything other than what I need to develop and code. I use Azure VMs to install the software, and then I can destroy the VM when I'm done with it. What's great about Azure is that it has several pre-installed operating systems and frameworks to choose from. Platforms that can be installed on an Azure VM include Citrix XenDesktop, Kali Linux, Cisco CSR, SQL Server clusters, Kiteworks, SUSE Linux, Jira, Magento, and WordPress. When I'm working with client software, I use the default Windows Server installation, which is currently Windows Server 2016. Microsoft adds the latest operating systems to its list as they are released. This is an advantage Azure has over a traditional VPS service: with traditional service, you must ask the host to upgrade your operating system and change your service, while with Azure, you spin up a new VM with any operating system you choose. I use the VM to install the software, work with it until I've finished the project, and then destroy the VM when I'm done. Because I only use the VM for a short time and don't connect it to any resource-intensive applications, it only costs me a few extra dollars a month. But I keep third-party apps "sandboxed" from my own network without using local resources.

Working with WordPress on Azure

Organizations sometimes run a separate WordPress site for content even with a team of Windows developers, because it's easier for marketing to publish content with WordPress. Instead of using a separate shared host, Azure has the option to create an app with WordPress pre-installed. Most people think running WordPress on a Windows server is senseless, but you get some protection from standard attacks that look for .htaccess or Apache vulnerabilities, and you can keep your web services all in one place.
It only costs a few dollars a month to host the WordPress site, but because you pay for the traffic that hits the service, it can be much more if you have a high-traffic blog. Azure also has its Insights service, so you can monitor WordPress, unlike traditional shared or VPS hosting where you need a third-party application. You also get all the available infrastructure, such as load balancing and SSL, should you need it with the WordPress site. These aren't available with simple shared or VPS hosting. While running WordPress on Windows seems counterintuitive, running it in Azure is beneficial for the enterprise that needs detailed statistics on site usage and protection from the script kiddies who attack any poorly managed public WordPress site.

Is Azure Better than AWS?

Most people will tell you that AWS is a better choice, but I preferred Azure's portal. I found spinning up services more intuitive, but what sold me was the integration with Visual Studio. If you're in an enterprise, one advantage of Azure is that you can integrate Active Directory Services so that your network expands into Azure's cloud IaaS. This is much better than building a separate portal that you must control with individual security settings, and it reduces the possibility of accidentally exposing data through incorrect security settings. If you decide to try Azure, the first 30 days are free. AWS gives you 12 months, so users have longer to figure out the settings. I found Azure beneficial and kept paying for a low-traffic, low-resource account until I could figure out whether I wanted to use it permanently. I've had an account for two years now and don't plan to ever switch.

Blog

Top 5 Metrics to Monitor in IIS Logs

Blog

Auto Subscribing CloudWatch Log Groups to AWS Lambda Function

Blog

Kubernetes Logging Overview

Blog

Best Practices for AWS Config

AWS Config was introduced by Amazon Web Services in 2014 as an auditing tool to help consumers of AWS resources actively track and monitor AWS assets. The tool allows administrators to determine compliance with corporate and security standards. It also helps determine which changes to the cloud ecosystem may have caused performance and functionality problems. In this article, we're going to look at AWS Config in more detail and cover best practices you may consider implementing to make the most of this tool.

Setting Up AWS Config in Your Account

A significant time-saving benefit of AWS Config is that you don't need to install or maintain agents on any of your cloud resources. AWS Config functions inside each region of your account and can be easily enabled through the AWS Management Console. To enable AWS Config, log into the AWS Management Console and navigate to the Config home page.

Fig. 1: AWS Config Home Page

When you enable AWS Config, there are four pieces of information or configuration that you will need to provide:

Resource types to record. You can elect to record all resources within the region (the default selection) and choose to include global resources as well. If your use case only involves monitoring resources of a specific type, you can deselect the record-all-resources option and specify only those resources you would like to track.
Amazon S3 bucket
Amazon SNS topic
AWS Config role. If you choose to have a new role created, AWS automatically builds a role with read-only access to those resources specified for the configuration.

Let's look at some best practices to adopt when using AWS Config for your resource management needs.

1: Centralize Administration

If you oversee multiple AWS accounts, it is wise to invest in some initial planning. Determine where you want to store the information and who will need access to it.
When selecting the Amazon S3 bucket and SNS topic during the initial configuration, you can specify a bucket and topic in another account. By consolidating your information, you'll save a lot of time and headaches when the time comes for auditing and generating reports.

2: Standardize Tagging

Developing a tagging standard will take some effort, and you may experience some pushback from your development team, but the initial investment will pay you back many times over. Develop a tagging standard for your organization that includes information for each resource, such as:

Resource Owner – team, cost center, or other taxonomic information
Environment – production, test, development
Application/Service Role – web, database, etc.

Tag anything you can and include as much information as is feasible. Most AWS resources support custom user tags, and with the help of your DevOps team, you may be able to automate most of the work of applying tags to resources as they are created and deployed into your AWS ecosystem. The investment in defining and applying a tagging standard will be invaluable when you need to identify resources by environment, assign costs to specific teams and owners, or run detailed reports on resource usage within your organization.

3: Automate, Automate, Automate

The second step when enabling AWS Config is the inclusion of rules. In cloud environments, where the number of resources appears to grow exponentially as your organization does, automation is your friend. By setting up automated processes to monitor your account for specific conditions, you'll be able to keep up with configuration changes and be notified when updates to the environment fall outside of your organization's security and configuration guidelines.
Fig. 2: Set Up Rules to Monitor Your Configuration

Some of the rules that you may want to consider including are:

Ensure that required tagging is in place on all relevant resources
Validate that volumes are encrypted appropriately
Send notifications when certificates and keys are set to expire

4: Trust But Verify

Once you have AWS Config enabled on your account, it's a good idea to validate that everything is working as expected. Below are a few checks you can make to ensure that your AWS Config setup is working appropriately:

Validate that you have enabled AWS Config in all regions for your accounts
Confirm that the specified S3 bucket exists and is receiving logs as expected
Confirm that the SNS topic exists and is receiving notifications
Verify that global resources are included in your configuration setup
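These verification checks lend themselves to scripting. As an illustrative sketch (not an official AWS example), the helper below interprets a response shaped like the Config service's DescribeConfigurationRecorderStatus API output, which in practice you would obtain from a boto3 config client, and flags recorders that are switched off or whose last delivery failed:

```python
def unhealthy_recorders(status_response):
    """Return names of configuration recorders that are not recording
    or whose most recent delivery did not succeed."""
    return [
        rec["name"]
        for rec in status_response.get("ConfigurationRecordersStatus", [])
        if not rec.get("recording") or rec.get("lastStatus") != "SUCCESS"
    ]

# Hypothetical response, shaped like DescribeConfigurationRecorderStatus output.
sample = {
    "ConfigurationRecordersStatus": [
        {"name": "default", "recording": True, "lastStatus": "SUCCESS"},
        {"name": "eu-recorder", "recording": False, "lastStatus": "FAILURE"},
    ]
}
print(unhealthy_recorders(sample))  # ['eu-recorder']
```

Run a check like this per region and you have the "validate all regions" step covered with a few lines of automation rather than a manual console tour.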

Blog

How Much Data Comes From The IOT?

Blog

Resolving Issues with Your Application Using IIS Logs

Blog

Biggest AWS Security Breaches of 2017

Blog

Three Dragons to Slay in Threat Discovery and Investigation for the Cloud

Threat correlation and prioritization (what do I pay attention to in an avalanche of highlighted threats?) and threat investigation (how do I decide what happened and what to do, quickly?) are extremely challenging core functions of security defense, with the result that in many cases less than 10% of high-priority threats are fully investigated. The accelerating migration to the cloud and modern application deployment are making these already difficult workflows untenable in traditional models, leading to questions such as: How do I gather and correlate all of the new sources of data at cloud scale? How do I understand and triangulate new dynamic data from many layers in the stack? How do I react with the pace demanded by new models of DevSecOps deployment? And how do I collaborate to connect the dots across evolving boundaries and silos? Last week, a veteran of many cloud migration security projects I know described many SOCs as "groping in the dark" with these challenges and looking for a new approach, despite all of the vendor claims mapped to their pains. The usual crowd of incremental enhancements (e.g., bringing cloud data into the traditional SIEM, automating manual workflows, layering more tools for specialized analytics, leveraging the wisdom of crowds) leaves three dragons roaming the countryside that need to be slain for security to keep pace with the unstoppable, accelerating migration to the cloud.

Dragon #1 – Siloed Security and IT Ops Investigation Workflows

A basic dilemma in cloud security is that the knowledge needed to pursue an investigation to its conclusion is often split between two groups. Security analysts understand the process of investigation and the broad context, but often only IT ops understands the essential specific context – application behavior and customer content, for example – needed to interpret and hypothesize at many steps in a security investigation.
A frequent comment goes something like, "The SOC understands the infrastructure, but they don't know how to interpret app logs or new data sources like container orchestration." This gap in understanding makes real-time collaboration essential to prevent exploding backlogs, partial investigations, and a bias toward more solvable on-prem alerts. Aside from the need to understand unfamiliar, new, and rapidly changing data sources in a single security investigation, cloud deployments generate more frequent "dual ticket" cases in which it is unknown whether a security issue or an IT issue is the root cause (for example: my customer is complaining they can't access our app – network congestion? Cloud provider outage? Server CPU overload? DDoS attack? Malware? Customer issue?). It isn't just that two separate investigations take more time and resources to complete and integrate; often, in cloud cases, neither side can reach a conclusion without the other. Working from common data isn't enough – analytics and workflow need to be common as well to enable the seamless collaboration required. In addition, modern cloud deployments often employ DevSecOps models in which the pace of application update, rollout, and change is measured in days or hours as opposed to months or quarters. One implication for security threat investigation is that processing of the threat resolution backlog must keep the same pace, so that current resources can be applied to current environments without being mired in "old" cases or chasing continuous flux in the data. This is challenging enough, but having to manage this triage across two separate backlogs in both IT and security, with the usual integration taxes, makes operating on the scale of hours and days extremely difficult.
While separate silos for IT ops and security investigations were feasible and logical in classic on-prem IT, modern cloud deployments and application architectures demand a seamless back-and-forth workflow where, at each step, the skills and perspective of both IT and security are needed to properly interpret the results of queries, evidence uncovered, or unfamiliar data. Asking both sides to completely subsume the knowledge of the other is unrealistic in the short term – a much better solution is to converge their workflows so they can collaborate in real time.

Dragon #2 – Traditional Security Bias on Infrastructure vs. Application Insight

Traditional SIEMs have long been exhorted to look up the stack to the application layer, and in several instances new product areas have sprung up when they have not. In the cloud world, this application layer "nice to have" becomes a "must have." Cloud providers have taken on some of the infrastructure defense previously done by individual companies, creating harder targets that cause attackers to seek softer ones. At the same time, much of the traditional infrastructure defense from the on-prem world has not yet been replicated in the cloud, so often application layer assessment is the only investigation method available. In addition to the defensive need to incorporate the application layer, there is clearly additional insight at that layer which is unknown at the infrastructure layer (e.g., customer context, behavioral analytics). This is particularly true when it is unclear whether a security or an IT problem exists. Many point systems specialize in extracting actionable insight from this layer, but holistic correlation and investigation of threats is more difficult, in part because of wide variations in APIs, log formats, and nomenclature. Looking forward, modern application deployment in the cloud also increases the surface area for investigation and threat assessment.
For example, chained microservices create many possible transitions in variables important to investigators. For all of these reasons, adding insight from the application layer is necessary and good for cloud deployments, but integrating this insight quickly with infrastructure insight is better. Many investigation workflows jump back and forth across these layers several times in a single step, so fully integrated workflows will be essential to leverage the assimilation of new insight.

Dragon #3 – Investigation Times Measured in Tens of Minutes and Hours

In cloud and modern application deployment, the sheer volume of incoming data will make yesterday's data avalanche seem like a pleasant snow dusting. Also, dynamic and transient data, entities, and nomenclature make workflows that were straightforward (although still slow and annoying) in the old world (e.g., tracking changing IP addresses for a user or machine) extremely challenging in the cloud. Finally, collaboration will require new models of distributed knowledge transfer, since investigation workflows will be shared across both security and IT ops. Many SOCs are at the breaking point in traditional environments, with growing backlogs of investigations and reactive triage. Achieving investigation times in minutes to keep pace in the cloud, despite these additional challenges, will require breakthrough innovation in extracting rapid insight from huge dynamic data sets and in scaling learning models across both humans and machines. Slaying these dragons will not be easy or quick – new solutions and thinking will collide with comfort zones, entrenched interests, perceived roles of people and process, and more than a few "sacred cows." Despite these headwinds, I'm optimistic looking ahead based on two core beliefs: 1) the massive economic and technological leverage of the cloud has already led to many other transition dragons of comparable ferocity being attacked with zeal (e.g.,
DevSecOps, Data Privacy, Regional Regulation, etc.), and 2) unlike many other transitions, a broad cross-section of the individuals involved in these messy transitions on the front lines have far more to gain from the leap forward in their own skills, learning, and opportunity than they have to lose. Aside from that, the increasingly public scorecard of attackers vs. defenders will help keep us honest about progress along the way.

Blog

Kubernetes Security Best Practices

Blog

Docker Logging Example

Docker is hard. Don't get me wrong: it's not the technology itself that is difficult; it's the learning curve. Committing to a Docker-based infrastructure means committing to a new way of thinking, which can be a harsh adjustment from the traditional thinking behind bare-metal and virtualized servers. Because of Docker's role-based container methodology, simple things like log management can seem like a bear to integrate. Thankfully, as with most things in tech, once you wrap your head around the basics, finding the solution is simply a matter of perspective and experience.

Collecting Logs

When it comes to aggregating Docker logs in Sumo Logic, the process starts much like any other: add a Collector. To do this, open up the Sumo Logic Collection dashboard and open the Setup Wizard. Because we will be aggregating logs from a running Docker container, rather than uploading pre-collected logs, select the Set Up Streaming Data option in the Setup Wizard when prompted. Next up, it is time to select the data type. While Docker images can be based on just about any operating system, the most common base image—and the one used for this demonstration—is Linux-based. After selecting the Linux data type, it's time to get into the meat of things. At this point, the Setup Wizard will present us with a script that can be used to install a Collector on a Linux system.

The Dockerfile

While copying and pasting the above script is generally all that is required for a traditional Linux server, some steps are required to translate it into a Docker-friendly environment.
To accomplish this, let's take a look at the following Dockerfile:

FROM ubuntu
RUN apt-get update
RUN apt-get install -y wget nginx
CMD /etc/init.d/nginx start && tail -f /var/log/nginx/access.log

That Dockerfile creates a new image from the Ubuntu base image, installs NGINX, and then tails the NGINX access log to stdout (which keeps our Docker container long-running). In order to add log aggregation to this image, we need to convert the provided Linux Collector script into Docker-ese. By replacing the sudo and && directives with RUN calls, you'll end up with something like this:

RUN wget "https://collectors.us2.sumologic.com/rest/download/linux/64" -O SumoCollector.sh
RUN chmod +x SumoCollector.sh
RUN ./SumoCollector.sh -q -Vsumo.token_and_url=b2FkZlpQSjhhcm9FMzdiaVhBTHJUQ1ZLaWhTcXVIYjhodHRwczovL2NvbGxlY3RvcnMudXMyLnN1bW9sb2dpYy5jb20=

Additionally, while this installs the Sumo Logic Linux Collector, it does not start the Collector daemon. The reason for this goes back to Docker's "one process per container" methodology, which keeps containers as lightweight and targeted as possible. While that is the "proper" method in larger production environments, in most cases starting the Collector daemon alongside the container's intended process is enough to get the job done in a straightforward way. To do this, all we have to do is prefix the /etc/init.d/nginx start command with /etc/init.d/collector start &&.
When put together, our Dockerfile should look like this:

FROM ubuntu
RUN apt-get update
RUN apt-get install -y wget nginx
RUN wget "https://collectors.us2.sumologic.com/rest/download/linux/64" -O SumoCollector.sh
RUN chmod +x SumoCollector.sh
RUN ./SumoCollector.sh -q -Vsumo.token_and_url=b2FkZlpQSjhhcm9FMzdiaVhBTHJUQ1ZLaWhTcXVIYjhodHRwczovL2NvbGxlY3RvcnMudXMyLnN1bW9sb2dpYy5jb20=
CMD /etc/init.d/collector start && /etc/init.d/nginx start && tail -f /var/log/nginx/access.log

Build It

If you've been following along in real time up to now, you may have noticed that the Set Up Collection page hasn't yet allowed you to continue on to the next page. The reason is that Sumo Logic is waiting for the Collector to be installed. Triggering the "installed" status is as simple as running a standard docker build command:

docker build -t sumologic_demo .

Run It

Next, we need to run our container. This is a crucial step, because the Setup Wizard process will fail unless the Collector is running.

docker run -p 8080:80 sumologic_demo

Configure the Source

With our container running, we can now configure the logging source. In most cases, the logs for the running process are piped to stdout, so unless you take special steps to pipe container logs directly to the syslog, you can generally select any log source here. /var/log/syslog is a safe choice.

Targeted Collection

Now that we have our Linux Collector set up, let's actually send some data up to Sumo Logic with it. In our current example, we've set up a basic NGINX container, so the easiest choice here is to set up an NGINX Collector using the same Setup Wizard as above. When presented with the choice to set up the Collection, choose the existing Collector we just set up in the step above.
Viewing Metrics

Once the Collectors are all set up, all it takes from here is to wait for the data to start trickling in. To view your metrics, head to your Sumo Logic dashboard and click on the Collector you've created. This will open up a real-time graph that displays data as it comes in, allowing you to compare and reduce the data as needed in order to identify trends within your running container.

Next Steps

While this is a relatively simplistic example, it demonstrates the potential for creating incredibly complex workflows for aggregating logs across Docker containers. As I mentioned above, the inline Collector method is great for aggregating logs from fairly basic Docker containers, but it isn't the only—or best—method available. Another, more stable option (out of the scope of this article) would be using a dedicated Sumo Logic Collector container that is available across multiple containers within a cluster. That said, this tutorial hopefully provides the tools necessary to get started with log aggregation and monitoring across existing container infrastructure.

Blog

AWS Config vs. CloudTrail

Blog

What You Need to Know About Meltdown and Spectre

Last week, a security vulnerability was announced involving the exploitation of common features in the microprocessor chips that power computers, tablets, smartphones, and data centers. The vulnerabilities, known as "Meltdown" and "Spectre," are getting a lot of attention in the media, and no doubt people are concerned about their impact on business, customers, partners, and more. Here's what you really need to know about these vulnerabilities.

What are Meltdown and Spectre?

The Meltdown vulnerability, CVE-2017-5754, can potentially allow hackers to bypass the hardware barrier between applications and kernel or host memory. A malicious application could therefore access the memory of other software, as well as the operating system. Any system running on an Intel processor manufactured since 1995 (except Intel Itanium and Intel Atom before 2013) is affected. The Spectre vulnerability has two variants: CVE-2017-5753 and CVE-2017-5715. These vulnerabilities break the isolation between separate applications. An attacker could potentially gain access to data that an application would usually keep safe and inaccessible in memory. Spectre affects all computing devices with modern processors manufactured by Intel or AMD, or designed by ARM. These vulnerabilities could potentially be exploited to steal sensitive data from your computer, such as passwords, financial details, and other information stored in applications. Here is a great primer explaining these security flaws.

What can be compromised?

The core of the system, known as the kernel, stores all types of sensitive information in memory. This means banking records, credit cards, financial data, communications, logins, passwords, and secret information could all be at risk due to Meltdown. Spectre can be used to trick normal applications into giving up sensitive data, which potentially means anything processed by an application can be stolen, including passwords and other data.

Was the Sumo Logic platform affected? Yes.
Practically every computing device is affected by Spectre, including laptops, desktops, tablets, smartphones, and even cloud computing systems. A few lower-power devices, such as certain Internet of Things gadgets, are unaffected.

How is Sumo Logic handling the vulnerabilities?

As of January 4th, 2018, AWS confirmed that all Sumo Logic systems were patched, rebooted, and protected from the recent Meltdown/Spectre vulnerabilities. We worked very closely with our AWS TAM team and verified the updates. Sumo Logic started the OS patching process with the latest Ubuntu release from Canonical on January 9th. Now that AWS has patched, the risk level is low, but we will continue to be diligent in following up and completing the remediation process. We take these vulnerabilities very seriously and are dedicated to ensuring that the Sumo Logic platform is thoroughly patched and continuously monitored for any malicious activity. If you have questions, please reach out to secops@sumologic.com.
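On Linux, kernels that ship the mitigations also report their status under /sys/devices/system/cpu/vulnerabilities. A short sketch for checking patch status on a host (the directory only exists on sufficiently new kernels, so the function degrades to an empty result elsewhere):

```python
from pathlib import Path

def mitigation_status(sysfs_dir="/sys/devices/system/cpu/vulnerabilities"):
    """Map each reported vulnerability (e.g. 'meltdown', 'spectre_v2')
    to the kernel's one-line status, such as 'Mitigation: PTI'."""
    directory = Path(sysfs_dir)
    if not directory.is_dir():
        return {}  # older kernel or non-Linux system
    return {entry.name: entry.read_text().strip()
            for entry in sorted(directory.iterdir()) if entry.is_file()}

for name, status in mitigation_status().items():
    print(f"{name}: {status}")
```

A line starting with "Vulnerable" in that output means the running kernel has not been patched for the corresponding flaw.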

Blog

Kubernetes Development Trends

Blog

Logs and Metrics: What are they, and how do they help me?

Blog

2018 Predictions: ICO Frenzy, Advance of Multi-Cloud, GDPR and More

This is an exciting time to be in enterprise software. With the rise of serverless, the power of hybrid computing, and the endless uses of artificial intelligence (AI), 2017 will likely go down as the most disruptive year ever. But what does 2018 have in store? Sumo Logic executives weighed in for our yearly prediction series. Read on to see what they predict will influence the coming year in technology the most. Keep these in mind and check back in mid-year to see how many come true!

Demand for multi-cloud, multi-platform will drive the need for multi-choice

Over the past few years, there has been much debate within enterprise IT about moving critical infrastructure to the cloud – specifically, around which cloud model is the most cost-effective, secure, and scalable. One thing is for certain – the cloud is the present (and future) of enterprise IT, and legacy companies that continue to predominantly or solely house their infrastructure on-premises to support existing or new modern applications will become increasingly irrelevant a few years from now as their competitors prevail. Moreover, this problem is further exacerbated as cloud users demand choice, which is going to drive massive growth in multi-cloud, multi-platform adoption in 2018. As a result, enterprises will need a unified cloud-native analytics platform that can run across any vendor, whether it's Amazon, Microsoft, or Google, including what's traditionally running on-premises. This agnostic model will serve as the backbone for the new world I refer to as the analytics economy, defined by positive disruption at every layer of the stack. –Ramin Sayar, CEO, Sumo Logic

ICO creates another Wild West

Next year we will begin to see the results of the growth in the initial coin offering (ICO) frenzy from this year. Investors have poured more than $300B into more than 200 ICOs this year, but it's still a very unregulated market with a mix of players attempting to gain early entry.
While legitimate companies are seeking and benefiting from crypto-token funding, there is still a lot of dubious activity from questionable characters trying to make a quick buck. If crypto-token equity begins to take hold and legitimizes its worth to the investors who pursued ICOs this year, then in 2018 the startup market will become the wild freaking west.

AI will not transform the enterprise in the near future

Previous predictions and claims about the direct impact of AI on enterprises have been overblown. There is excessive hype around how AI will lead us to new discoveries and medical breakthroughs. However, those expecting AI to be the ultimate truth conveyor are mistaken. It will be very hard to design a model that can determine unbiased truth, because human bias – whether explicit or implicit – will be coded into these data analytics systems and reinforce existing beliefs and prejudices. With that said, there are certain applications where systems can make better decisions in a shorter amount of time than humans, such as autonomous vehicles. In 2018 we will begin to see real use cases of the power of AI appear in our everyday lives; it just isn't ready to be the shining star for the enterprise quite yet. When you look at the maturity of the enterprise, only half of the Global 2000 offer fully digital products. So, despite all of the buzz around digital transformation, there's a lot of catching up to be done before many of these companies can even consider advanced developments such as AI. — Christian Beedgen, CTO, Sumo Logic

GDPR regulations will turn massive tech companies into walking targets

It won't take long after the May 25 GDPR deadline before the gloves come off and the European Union cracks down with audits of big tech companies. We're talking about Uber, Google, Apple, and so forth.
This will be the EU's effort to reinforce the severity of meeting GDPR regulations and to show that no business – not even the household names – will be immune from complying with GDPR standards. After the EU cracks down on the big tech companies, financial institutions and travel companies will be next, as these are among the most globalized industries, where data flows freely across geographical borders. And regardless of the EU's efforts, the reality is that many companies won't meet the May deadline, whether due to lack of resources, laziness, or apathy. You better believe that those businesses that don't get on board – and get caught – will be crushed, as business comes to a grinding halt.

Government will continue to fall flat with security

If I were a hacker, I would target the path of least resistance, and right now – and into 2018 – that path collides squarely with government agencies. What's scary is that government organizations hold some of our most critical data, such as Social Security numbers, health records, and financial information. It's shocking how the government generally lags in terms of security and technology innovation. Over the past few years the government has been a prime target for bad actors. Take a look at the Office of Personnel Management breach in 2015, and more recently the hacks into the Department of Homeland Security and FBI in 2016. Next year will be no different. Even with all of the panels, hearings, and legislation, such as the Modernizing IT Act and the executive order reaffirming the government's commitment to updating and implementing stronger cybersecurity programs, the government is already 10-15 years behind, and I don't see this improving over the next year.

Millennials will be our security saving grace

Millennials will inspire a societal shift in the way we view security and privacy. If you follow the data, it'll make sense. For instance, Facebook is now most popular among adults age 65 and older.
It’s less appealing to younger generations, who’ve moved on to newer, more secure ways to express themselves, such as disappearing video chats on Snapchat. As social media evolves, privacy, user control/access and multi-factor authentication have become a natural part of protecting online identity, for both users and developers alike. My personal resolution for 2018 is to step up my mentorship of this younger generation. If we can encourage them to channel this “Security First” way of thinking in a professional capacity, we can continue to build a resilient and robust cybersecurity workforce that makes us all more secure. – George Gerchow, VP of Security and Compliance, Sumo Logic

Now that you have read Sumo’s predictions, tell us your predictions for the coming year. Tweet them to us at @SumoLogic

December 15, 2017

Blog

Finding and Debugging Memory Leaks with Sumo

Memory leaks happen when programs allocate more memory than they return. Alongside compute, memory is one of the critical assets of any computer system. If a machine runs out of memory, it cannot provide its service. In the worst case, the entire machine might crash and tear down all running programs. The bugs responsible for this misbehavior are often hard to find. Sumo’s collector enables monitoring memory consumption out of the box. With some additional tooling, it is possible to collect fine-grained logs and metrics that accelerate finding and efficiently debugging memory leaks. Ready to get started? See all the ways the Sumo Logic platform helps monitor and troubleshoot, from seamless ingestion of data to cross-platform versatility and more. You can even get started for free.

Memory Management and Memory Leaks

Memory management is done on multiple levels: the Operating System (OS) keeps track of memory allocated by its programs in kernel space, while in user space, virtual machines like the JVM might implement their own memory management component. At its core, memory management follows a Producer-Consumer pattern: the OS or VM gives away (produces) chunks of memory whenever programs request (consume) memory. Since memory is a finite resource in any computer system, programs have to release allocated memory, which is then returned to the pool of available memory managed by the producer. In some applications the programmer is responsible for releasing memory; in others, like the JVM, a garbage collector thread collects all objects that are no longer used. A healthy system runs through this give-and-take in a perfect circle. In a bad system, the program fails to return unused memory. This happens, for example, if the programmer forgets to call the function free(), or if some objects keep being referenced from a global scope after usage.
In that case, new operations will allocate more memory on top of the already allocated, but unused, memory. This misbehavior is called a memory leak. Depending on the size of the objects, the leak can be as little as a few bytes, kilobytes, or even megabytes if the objects, for example, contain images. Depending on how frequently the erroneous allocation is performed, the free space can fill up within microseconds, or it could take months to exhaust the memory of a server. This long time-to-failure can make memory leaks very tricky to debug, because it is hard to track an application running over a long period. Moreover, if the leak is just a few bytes, this marginal amount gets lost in the noise of common allocation and release operations, and the usual observation period might be too short to recognize a trend. This article describes a particularly interesting instance of a memory leak. The example uses the Akka actor framework, but for simplicity you can think of an actor as an object. The specific operation in this example is downloading a file:

An actor is instantiated when the user invokes a specific operation (download a file)
The actor accumulates memory over its lifetime (keeps adding to the temporary file in memory)
After the operation completes (the file has been saved to disk), the actor is not released

The root cause of the memory leak is that the actor can handle only one request and is useless after saving the content of the file. There are no references to the actor in the application code, but there is still a parent-child relationship defined in the actor system that constitutes a global scope.

From After-the-Fact Analysis to Online Memory Supervision

Usually, when a program runs out of memory it terminates with an “Out of Memory” error or exception. In the case of the JVM, it will create a heap dump on termination. A heap dump is an image of the program’s memory at the instant of termination, saved to disk.
This heap dump file can then be analyzed using tools such as MemoryAnalyzer, YourKit, or VisualVM for the JVM. These tools are very helpful for identifying which objects are consuming what memory. They operate, however, on a snapshot of the memory and cannot keep track of the evolution of memory consumption. Verifying that a patch works is out of the scope of these tools. With a little scripting, we can remediate this and use Sumo to build an “Online Memory Supervisor” that stores and processes this information for us. In addition to keeping track of the memory consumption history of our application, it saves us from juggling heap dump files that can potentially become very large. Here’s how we do it:

1. Interrogate the JVM for current objects and their size

The JVM provides an API for creating full memory dumps during runtime, or for just retrieving a histogram of all current objects and their approximate size in memory. We want the latter, as it is much more lightweight. The jmap tool in the Java SDK makes this interface accessible from the command line:

jmap -histo PID

Getting the PID of the JVM is as easy as grepping for it in the process table. Note that if the JVM runs as a server under an unprivileged user, we need to run the command as that user via su. A bash one-liner to dump the object histogram could look like this (the user name stream and the HOSTID variable are specific to our setup):

sudo su stream -c "jmap -histo $(ps ax | grep '[0-9]* java' | awk '{print $1}') > /tmp/${HOSTID}_jmap-histo-$(date +%s).txt"

2. Turn the result into metrics for Sumo, or just drop it as logs

As a result of the previous operation, we now have a file containing a table with object names, counts, and retained memory. In order to use it in Sumo, we need to submit it for ingestion. Here we have two options: (a) send the raw file as logs, or (b) convert the counts to metrics. Each object’s measurement is part of a time series tracking the evolution of the object’s memory consumption.
Sumo Metrics ingests various time-series input formats; we’ll use Graphite because it’s simple. To perform the conversion of a jmap histogram to Graphite we use bash scripting. The script cuts the beginning and end of the file and then parses the histogram to produce two measurements per class:

<class name, object count, timestamp>
<class name, retained size, timestamp>

Sending these measurements to Sumo can be done through Sumo’s collector, using collectd with the Sumo plugin, or by sending directly to the HTTP endpoint. For simplicity, we’ve used the Graphite format and target the Sumo collector. To be able to differentiate both measurements as well as different hosts, we prepend this information to the classpath:

<count|size>.<host>.classpath

For example, a jmap histogram might contain data in tabular form like:

69: 18 1584 akka.actor.ActorCell
98: 15 720 akka.actor.RepointableActorRef
103: 21 672 akka.actor.ChildActorPath
104: 21 672 akka.actor.Props

Our script turns that into Graphite format and adds some more hierarchy to the package name. In the next section, we will leverage this hierarchy to perform queries on object counts and sizes.

count.memleak1.akka.actor.ActorCell 18 123
count.memleak1.akka.actor.RepointableActorRef 15 123
count.memleak1.akka.actor.ChildActorPath 21 123
count.memleak1.akka.actor.Props 21 123

In our case, we just forward these metrics to the Sumo collector, where we have previously defined a Graphite source for Metrics. Then it’s as easy as: cat histogram-in-graphite | nc -q0 localhost 2003

3. Automate processing via Ansible and StackStorm

So far, we are capable of creating a fine-grained measurement of an application’s memory consumption using a couple of shell commands and scripts. Using the DevOps automation tools Ansible and StackStorm, we can turn this manual workflow into an Online Memory Supervision System. Ansible helps us automate taking the measurement across multiple hosts.
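Our actual conversion used bash; a minimal Python sketch of the same idea might look like the following. It assumes the four-column "num: count size classname" layout shown above, and the host name memleak1 is illustrative.

```python
import time

def histogram_to_graphite(histogram, host):
    """Convert 'jmap -histo' data rows ('num: count size classname')
    into Graphite plaintext lines for object counts and sizes."""
    ts = int(time.time())
    lines = []
    for row in histogram.strip().splitlines():
        parts = row.split()
        if len(parts) != 4:
            continue  # skip header/footer lines of the histogram
        _, count, size, classname = parts
        lines.append("count.%s.%s %s %d" % (host, classname, count, ts))
        lines.append("size.%s.%s %s %d" % (host, classname, size, ts))
    return lines

sample = """
  69:            18           1584  akka.actor.ActorCell
  98:            15            720  akka.actor.RepointableActorRef
"""
for line in histogram_to_graphite(sample, "memleak1"):
    print(line)
```

The resulting lines can then be piped to the collector’s Graphite port exactly as shown above with nc.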
For each individual host, it connects via ssh, runs the jmap command and the Python conversion script, and submits the measurement to Sumo. StackStorm manages this workflow for us: at a given period, it kicks off Ansible and logs the process, and in case something goes wrong, it defines remediation steps. Of course, there are alternatives to the myriad of available tools. Ansible competes with SaltStack, Chef, and Puppet. StackStorm is event-driven automation with all the bells and whistles; for this example, we could have used a shell script with sleep or a simple cron job.

Using Sumo to Troubleshoot Memory Leaks

Now it’s time to use Sumo to analyze our memory. In the previous steps, we submitted and ingested our application’s fine-grained memory consumption data. After this preparation, we can leverage Sumo to query the data and build dashboards. Using queries, we can perform in-depth analysis. This is useful as part of a post-mortem analysis to track down a memory leak, or during development to check whether a memory allocation/deallocation scheme actually works. During runtime, dashboards can monitor critical components of the application. Let’s check this out on a live example. We’ll use a setup of three JVMs simulating an application, plus a StackStorm instance. Each runs in its own Docker container, simulating a distributed system. To make our lives easier, we orchestrate this demo setup using Vagrant:

Figure 1: Memory leak demo setup and control flow

A Memory Measurement node orchestrates the acquisition process. We’ve developed a short Ansible script that connects to several application nodes and retrieves a histogram dump from the JVMs running the faulty program from [1]. It converts the dumps to Graphite metrics and sends them via the collector to Sumo. StackStorm periodically triggers the Ansible workflow. Finally, we use the UI to find and debug memory leaks.
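For the simplest setups, the periodic trigger that StackStorm provides here could indeed be a plain sleep loop, as mentioned above. A hypothetical Python sketch (the playbook name measure-memory.yml is made up for illustration):

```python
import subprocess
import time

def supervise(interval_seconds=300, iterations=3, runner=None):
    """Minimal stand-in for a scheduler: run the measurement playbook
    every interval and count failures (where StackStorm would instead
    kick off a remediation workflow)."""
    if runner is None:
        runner = lambda: subprocess.run(
            ["ansible-playbook", "measure-memory.yml"], check=True)
    failures = 0
    for _ in range(iterations):
        try:
            runner()
        except Exception:
            failures += 1  # a remediation hook would go here
        time.sleep(interval_seconds)
    return failures

# Dry run with a stub runner instead of actually invoking Ansible:
print(supervise(interval_seconds=0, runner=lambda: None))  # -> 0
```

StackStorm adds exactly what this sketch lacks: durable scheduling, logging, and configurable remediation on failure.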
Analyze memory consumption

First, we want to get an overview of what’s going on in memory, so we start by looking at the total memory consumption of a single host. A simple sum over all object sizes yields the application’s memory consumption over time. The steeply increasing curve abruptly comes to an end at a total of about 800 MB. This is the total memory that we dispatched to the JVM (java -Xmx800m -jar memleak-assembly-0.1.jar).

Figure 2: Total memory consumption of host memleak3

Drilling down on top memory consumers often hints at the classes responsible for a memory leak. For that query, we parse out all objects and sum their counts and sizes. Then we display only the top 10 counts. In the size query, we filter out objects above a certain size. These objects are the root objects of the application and do not contain much information.

Figure 3: Top memory consumers on a single node

Figure 4: Top memory consumers by size

We find that a Red-Black Tree dominates the objects. Looking at the Scala manual tells us that HashMaps make extensive use of this data structure: “Scala provides implementations of immutable sets and maps that use a red-black tree internally. Access them under the names TreeSet and TreeMap.” We know that the ActorSystem uses HashMaps to store and maintain actors. Parsing and aggregating queries help monitor entire subsystems of a distributed application. We use that to find out that the ActorSystem accumulates memory not only on a single host but over a set of hosts. This leads us to believe that the increase might not be an individual error, but a systemic issue.

Figure 5: Use query parsing and aggregation operations to display the ActorSystem’s memory consumption

A more detailed view of the Child actor reveals the trend of how it accumulates memory.
The trick in this query is that in the search part we filter on the packages in akka.actor.* in the search expression, and then use the aggregation part to parse out the single hosts and sum the size values of their individual objects. Since all three JVMs started at the same time, their memory usage increases at a similar rate in this picture. We can also split this query into three separate queries, as below, looking at how the Child actors on all three hosts are evolving.

Figure 6: The bad Child actor accumulating memory

Finally, we verify that the patch worked. The latest chart shows that allocation and deallocation are now in balance on all three hosts.

Figure 7: Memory leak removed, all good now

Memory Analysis for Modern Apps

Traditional memory analyzers were born in the era of standalone desktop applications. They work on snapshots and heap dumps and cannot track the dynamics of memory allocation and deallocation patterns. Moreover, they are restricted to working on single images, and it is not easy to adapt them to a distributed system. Modern apps have different requirements: digital businesses provide service 24/7, scale out in the cloud, and compete on feature velocity. To achieve feature velocity, detecting memory issues online is more useful than after the fact. Bugs such as memory leaks need rapid detection, with bugfixes deployed frequently and without stopping services. Pulling heap dumps and starting memory analyzers just won’t work in many cases. Sumo takes memory analysis to the next level. Leveraging Sumo’s Metrics product, we can track memory consumption for classes and objects within an application. We look at aggregations of their counts and sizes to pinpoint the fault. Memory leaks are often hard to find and need superior visibility into an application’s memory stack to become debuggable. Sumo achieves this not only for a single instance of an application but scales memory analysis across the cloud.
Additionally, Sumo’s Unified Logs and Monitoring (ULM) enables correlating logs and metrics and facilitates understanding the root cause of a memory leak.

Bottom Line

In this post, we showed how to turn Sumo into a fine-grained, online memory supervision system using modern DevOps tools. The fun doesn’t stop here: the presented framework can easily be extended to include metrics for threads and other resources of an application. As a result of this integration, developers and operators gain high visibility into the execution of their applications.

References

Always stop unused Akka actors – Blog Post
Acquire object histograms from multiple hosts – Ansible Script
Sumo’s Modern Apps report – BI Report

Blog

Monitor AWS Lambda Functions with Sumo Logic

Blog

Optimizing Cloud Security: Amazon GuardDuty and Sumo Logic

Security concerns and skill shortages continue to impede cloud adoption

Migration to the cloud is still hampered by the security concerns this new frontier poses and by the cybersecurity skills gaps already present in many, if not most, organizations today. This was highlighted in a 2017 survey by Forbes, where 49% of respondents stated that they were delaying cloud deployment due to a cybersecurity skills gap. And even with adequate staffing, organizations that have adopted some facet of cloud express concerns about their ability to monitor and manage these new environments.

Sumo Logic and Amazon GuardDuty to the rescue

Sumo Logic was founded over seven years ago by security industry professionals as a secure, cloud-native, machine data analytics platform that converts machine data into real-time continuous intelligence, providing organizations with the full-stack visibility, analytics and insights they need to build, run and secure their modern applications and cloud infrastructures. The Sumo Logic platform provides security analytics and visibility across the entire AWS environment, with context derived from details such as user access, platform configurations and changes, and with the ability to generate audit trails to demonstrate compliance with industry standards. Sumo Logic also correlates analytics from CrowdStrike threat intelligence to identify risks and threats in the AWS environment, such as communications with malicious IPs, URLs, or domains. At AWS’ annual re:Invent 2017 conference in Las Vegas this week, AWS announced the availability of Amazon GuardDuty, a continuous security monitoring and threat detection service for AWS users.
And due to Sumo Logic’s strong and long-standing relationship with AWS, Sumo Logic was provided early access to the beta version of GuardDuty, which allowed the team to develop, announce and release, in parallel with Amazon, the complementary Sumo Logic Amazon GuardDuty App.

GuardDuty works by gathering log data from three distinct areas of the AWS cloud environment:

AWS Virtual Private Cloud (VPC) “flow logs”
AWS CloudTrail “event logs”
AWS Route 53 DNS “query logs”

Along with the log data above, AWS provides additional sources of context (including threat intel associated with the AWS environment) to identify potential threats in users’ environments. These potential threats are called “findings” by GuardDuty. Each finding provides details about the identified threat so that users can take any necessary action. Finding details include the following information:

Last seen – the time at which the activity took place that prompted the finding.
Count – the number of times the finding was generated.
Severity – the severity level (High, Medium, or Low):
High – take immediate remediation steps.
Medium – investigate the implicated resource at your earliest convenience.
Low – suspicious or malicious activity was blocked; no immediate action needed.
Finding Type – details including:
Threat Purpose (more details available in the GuardDuty User Guide): Backdoor, Behavior, Cryptocurrency, Pentest, Recon, Stealth, Trojan, UnauthorizedAccess
Resource Type Affected – with the initial release of GuardDuty, “only EC2 instances and IAM users (and their credentials) can be identified in findings as affected resources”
Threat Family Name – the overall threat or potential malicious activity detected.
Threat Family Variant – the specific variant of the Threat Family detected.
Artifact – a specific resource owned by a tool used in the attack.
Region – the region in which the finding was generated.
Account ID – the ID of the AWS account in which the activity took place.
Resource ID – the ID of the AWS resource against which the activity took place.
Target – the area of your AWS infrastructure where GuardDuty detected potentially malicious or anomalous activity.
Action – the activity that GuardDuty perceived to be potentially malicious or anomalous.
Actor – the user that engaged in the potentially malicious or unexpected activity.

The Sumo Logic Amazon GuardDuty App Value-Add

Pre-built Sumo Logic GuardDuty dashboards: Sumo Logic provides a single pane of glass to reduce the complexity of managing multiple environments, with pre-configured, user-friendly and customizable dashboards that take GuardDuty’s linear data format and layer on rich graphical reporting and depictions of trends over time.

Click to Fix: The Sumo Logic Amazon GuardDuty App allows users to rapidly and visually identify findings, ranked by their severity levels (high, medium, and low), and simply click on any of them to be automatically routed to their AWS environment to take any necessary remediation actions.

Value-added Context: The Sumo Logic Amazon GuardDuty App adds additional sources of analytics for deeper and wider visibility into the AWS environment and context across the organization, including full-stack visibility into application/infrastructure logs, Application/Elastic Load Balancer (ALB/ELB) performance, and supplemental threat intel provided by CrowdStrike at no additional fee.

The new Amazon GuardDuty offering, along with Sumo Logic’s tightly integrated GuardDuty App, provides organizations with the tools they need to more simply and effectively manage and monitor their AWS cloud environments, with the visibility for more rapid detection and remediation of real and potential threats to mission-critical resources in those environments.
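As a hypothetical illustration of triaging findings by severity, the sketch below buckets GuardDuty-style finding records into the High/Medium/Low bands. The field names are simplified (real GuardDuty findings are much richer JSON documents), and the numeric cut-offs follow the bands described in the GuardDuty User Guide (low below 4.0, medium below 7.0, high at 7.0 and above).

```python
def severity_band(score):
    """Map GuardDuty's numeric severity score to its label."""
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    return "High"

def triage(findings):
    """Group findings by band, highest severity first within each band."""
    bands = {"High": [], "Medium": [], "Low": []}
    for f in sorted(findings, key=lambda f: f["severity"], reverse=True):
        bands[severity_band(f["severity"])].append(f["type"])
    return bands

# Illustrative finding records, not real GuardDuty output.
findings = [
    {"type": "UnauthorizedAccess:EC2/SSHBruteForce", "severity": 5.0},
    {"type": "Recon:EC2/PortProbeUnprotectedPort", "severity": 2.5},
    {"type": "Backdoor:EC2/C&CActivity.B", "severity": 8.0},
]
print(triage(findings))
```

This is essentially what the App’s severity-ranked dashboards do for you automatically, at scale and with the click-to-remediate routing described above.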
Get the Sumo Logic Amazon GuardDuty App Sign up for Sumo Logic instantly and for free Watch the Sumo Logic product overview video.

Blog

Monitoring k8s-powered Apps with Sumo Logic

Blog

The Countdown to AWS re:Invent 2017

I travel to a lot of conferences over the course of the year. But the grand poohbah of them all (and one of my personal favorites) is AWS re:Invent. This has quickly become the must-attend tech event, and with more than 40,000 attendees expected in Las Vegas, this year will no doubt be one for the books. The Sumo Logic team will be there in full force showcasing our real-time machine data analytics platform and how we help businesses get the continuous intelligence needed to build, run and secure modern applications to accelerate digital business transformation. Here’s a rundown of some of our key activities:

Sumo Logic Breakout Presentation: Making the Shift to Practical DevSecOps

Agility is the cornerstone of the DevOps movement, and security best practices and compliance are now the responsibility of everyone in the development lifecycle. Our VP of Security and Compliance, George Gerchow, will be presenting on Tuesday, Nov. 28 at 2:30 pm at the Aria Hotel. Swing by to learn best practices for making the shift to DevSecOps leveraging the CIS AWS Foundation Benchmarks.

Visit us at Booth #1804

Stop by our booth to learn more about the power of real-time machine data analytics and how to centralize your data and turn analytics into business, operational, and security insights for full-stack visibility of your AWS workloads. See live demos, talk to our technical experts and pick up limited edition swag!

Join us at the Modern App Ecosystem Jam

On Wednesday, Nov. 29 we will be hosting a party with our awesome partner ecosystem celebrating today’s new world of building, running and securing modern applications. No presentations, no pitches, just an evening networking with peers. Take a break from AWS re:Invent and join us for yummy Cuban appetizers, specialty mojitos and drinks, cigar rollers, entertainment, swag and more! Space is limited – secure your spot today!
Closed-Loop Security Analytics with CloudPassage

Throughout the week Sumo Logic will be co-presenting with CloudPassage to highlight our joint integration, which gives users a comprehensive, real-time view of security and compliance postures while rapidly detecting and containing attacks. Stop by the CloudPassage booth #913 to learn more.

Follow the Conversations on Social

We will be live tweeting, posting Facebook Live videos and photos throughout the week.
Twitter: @SumoLogic
LinkedIn: www.linkedin.com/SumoLogic
Facebook: https://www.facebook.com/Sumo.Logic/

For a full list of events and news, check out our re:Invent events page. We look forward to seeing you in Las Vegas next week!

Blog

Christian's Musings from Web Summit 2017

I was able to attend my third Web Summit last week. This is the second time for me in Lisbon, as I was lucky enough to be invited to talk on the Binate.io stage again after last year. If you are interested, check out my musings on instinct, intuition, experience and data analytics. Web Summit has grown tremendously since I first attended the Dublin incarnation in 2013. This year, the event was sold out at 60,000 attendees (!) - the Portuguese came out in force, but it was very clear that this event, while of course drawing most attendees from all across Europe, is ultimately an international affair as well. With so many people attending, Web Summit can be rather overwhelming. There is a bit of everything, and an incredible crowd of curious people. Lisbon is a fantastically beautiful city, mostly off the beaten path when it comes to tech conferences, so the local folks really came out in force to take in the spectacle. So, what is Web Summit? Originally started in Dublin in 2009, it has over the years become a massive endeavor highlighting every conceivable aspect of technology. There are four massive conference halls with multiple stages for speakers and podium discussions in each hall.

Christian Beedgen on binate.io stage - Web Summit, Lisbon 2017

Then there is the main arena holding 20,000 people; this is where the most high-profile keynote speakers hit the stage. Web Summit has always brought government officials and politicians to the show as well, in an effort to promote technology. I was actually standing next to Nigel Farage at the speaker cloak room waiting for my coat. There was another guy there who was already berating this unfortunate character, so thankfully I didn’t have to do it myself. I managed to catch a couple of the keynotes in the aforementioned large arena. Three of them left an impression. Firstly, it was great to see Max Tegmark speak.
I am reading his current book, Life 3.0, right now, and it is always a bit of a trip when the author suddenly appears on a stage and you realize you have to throw away your mental image of that voice in your head that has been speaking to you from the pages of the book and adopt reality. In this case, however, this was not a negative, as Max came across as both deeply knowledgeable and quite relaxed. He looked a bit like he plays in the Ramones, with his black leather jacket and black jeans; this I didn’t see coming. In any case, I highly recommend checking out what he has to say. In light of the current, almost bombastically overblown hype around AI, he takes a very pragmatic view, based on many years of his own research. If you can imagine a future of "beneficial AI", check out his book, Life 3.0, for why and how we have a chance to get there. I was also impressed by Margrethe Vestager. She is a Danish politician and currently the European Commissioner for Competition. She captured the audience by simply speaking off a couple of cue cards, no PowerPoint slides at all. Being a politician, she cast a very official appearance, of course - but she wore some sick sneakers with a conservative dress, which I thought was just awesome. Gotta love the Danish! Her talk centered around the reasoning behind the anti-trust investigation she brought against Google (which eventually led to a $2.7 billion fine!). The details are too complicated to be reasonably summarized here, but essentially center around the fact that while nobody in the EU has issues with Google's near-monopoly on search, in the eyes of the competition watchdogs, for Google to use this position to favor their own products in search results creates intolerable fairness issues for other companies. It is very interesting to see how these views are developing outside of the US.
The third and last memorable session had animated AI robots dialoguing with their inventor about Einstein, Artificial General Intelligence, distributed AI models, and the blockchain. It was by and large only missing Taylor Swift. SingularityNET is a new effort to create an open, free and decentralized marketplace for AI technology, enabled by smart contracts. I frankly don't have the slightest clue how that would work, but presenter Ben Goertzel was animatedly excited about the project. The case for needing an AI marketplace, where narrow AIs compose into more general intelligences, was laid out in a strenuous "discussion" with "lifelike" robots from Hanson Robotics. It is lost on me why everybody thinks they need to co-opt Einstein; first Salesforce calls their machine learning features Einstein, and now these robotics guys have an Einstein robot on stage. I guess the path to the future requires still more detours to the past. At least Einstein can't fight back on this anymore, and they are picking an exceptional individual... Now that I am back in the US for only a day, the techno-optimism that's pervasive at Web Summit already feels like a distant memory.

November 14, 2017

Blog

Monitor DynamoDB with Sumo Logic

What is DynamoDB?

DynamoDB is a fast and flexible NoSQL database service provided by AWS. This cloud-based database supports both document and key-value store models. It was internally developed at AWS to address the need for an incrementally scalable, highly available key-value storage system. Due to its auto scaling, high performance, and flexible data schema, it has been widely adopted across industries such as gaming, mobile, ad tech, IoT, and many other applications.

Sumo Logic App for DynamoDB

Sumo Logic recently released an App for Amazon DynamoDB. It’s a unified Logs and Metrics App and collects data from two sources:

CloudTrail API calls
DynamoDB CloudWatch Metrics

The App covers the following high-level use cases:

DynamoDB API calls (Create/Update/Delete Table), geolocation, and user info
How to plan the capacity of DynamoDB (number of reads/writes per second per table)
Read/Write Throttle Events
Successful and throttled requests by table and operation name
Latency and errors: user and system error counts, latency by table name and operation name, and conditional check failed requests by table name
DynamoDB Overview Dashboard

Key DynamoDB Performance Metrics

Percent of Provisioned Write Consumed – the percentage of provisioned write capacity consumed by a table. It should stay below 100%; if it exceeds that, DynamoDB can throttle requests. It’s calculated as: (ConsumedWriteCapacityUnits / ProvisionedWriteCapacityUnits) x 100

Percent of Provisioned Read Consumed – the percentage of provisioned read capacity consumed by a table. It should stay below 100%; if it exceeds that, DynamoDB can throttle your requests. It’s calculated as: (ConsumedReadCapacityUnits / ProvisionedReadCapacityUnits) x 100

Read Throttle Events by Table and GSI – requests to DynamoDB that exceed the provisioned read capacity units for a table or a global secondary index.
Write Throttle Events by Table and GSI – requests to DynamoDB that exceed the provisioned write capacity units for a table or a global secondary index. These Read/Write Throttle Events should be zero at all times; if they are not, your requests are being throttled by DynamoDB and you should re-adjust your capacity. How much to provision for your table depends a lot on your workload. You could start by provisioning to something like 80% of your peaks and then adjust your table capacity depending on how many throttles you receive. Monitoring throttles thus helps you plan your capacity against your workload. Before you decide how much to adjust capacity, consider the best practices in Consider Workload Uniformity When Adjusting Provisioned Throughput.

Throttle Requests by Table and Operation Name – requests to DynamoDB that exceed the provisioned throughput limits on a resource.

Number of Successful Requests by Table and Operation Name – the number of successful requests (SampleCount) during the specified time period.

User Error Count – the number of requests to DynamoDB that generate an HTTP 400 status code during the specified time period. An HTTP 400 usually indicates a client-side error, such as an invalid combination of parameters, an attempt to update a nonexistent table, or an incorrect request signature. To ensure your services are interacting smoothly with DynamoDB, this count should always be zero.

Latency by Table Name – the time taken by DynamoDB, in milliseconds, to complete processing a request. You should monitor this; if it crosses the normal threshold level or keeps increasing, you should get involved, as it can impact the performance of your services. Sumo Logic’s machine-learning-based outlier detector automatically lets you detect any outlier in latency, and also lets you set up a Metrics Monitor for alerting.

System Error Count by Table and Operation Name – the number of requests to DynamoDB that generate an HTTP 500 status code during the specified time period.
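The two capacity formulas above reduce to a simple ratio, which also makes a natural alerting condition. A hypothetical sketch (the 80% threshold mirrors the provisioning rule of thumb mentioned above):

```python
def percent_consumed(consumed_units, provisioned_units):
    """Percent of provisioned capacity consumed:
    (consumed / provisioned) x 100, as in the formulas above."""
    return consumed_units / provisioned_units * 100.0

def needs_attention(consumed_units, provisioned_units, threshold=80.0):
    """Flag a table whose utilization exceeds the target threshold,
    i.e. a candidate for re-adjusting provisioned capacity."""
    return percent_consumed(consumed_units, provisioned_units) > threshold

print(percent_consumed(90, 100))   # -> 90.0
print(needs_attention(90, 100))    # -> True
print(needs_attention(50, 100))    # -> False
```

In practice you would let Sumo Logic compute this from the ConsumedWriteCapacityUnits and ProvisionedWriteCapacityUnits CloudWatch metrics and alert via a Metrics Monitor rather than in application code.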
An HTTP 500 usually indicates an internal service error. This count should always be zero when your services are working correctly; if it is not, you should immediately get involved in debugging the services that are generating 500 error codes.

Conditional Check Failed Request Count: The number of failed attempts to perform conditional writes. The PutItem, UpdateItem, and DeleteItem operations let you provide a logical condition that must evaluate to true before the operation can proceed. If the condition evaluates to false, ConditionalCheckFailedRequests is incremented by one.

Unify DynamoDB API Calls and Metrics

You can use the Log Overlay feature to identify whether there is any correlation between the number of API calls made to a specific table and increased latency on that table.

Metrics Outlier and Alert

The metrics outlier feature automatically identifies data points that fall outside the normal range, and you can configure the available knobs to filter out noise. For example, here Sumo Logic's machine learning algorithm automatically detects an outlier in the ReadThrottleEvents count. For more info, see here. You can also set up alerts that send email or webhook notifications when your metrics cross a certain threshold. For more info, see here.

Secure Your DynamoDB API Calls

You can correlate DynamoDB CloudTrail events with the Sumo Logic - CrowdStrike threat intel feed to protect your infrastructure from malicious activity and users.

Conclusion

With the Sumo Logic App for DynamoDB:

- You can monitor and alert on key DynamoDB metrics.
- You can detect outliers in those metrics.
- You can overlay DynamoDB API calls with metrics to start debugging issues easily.
- You can find malicious activity in your DynamoDB environment by correlating CloudTrail events with Sumo Logic's threat intel offering.

What's Next?

If you already have a Sumo Logic account, the DynamoDB app is available for free. If you are new to Sumo Logic, start by signing up for a free account here.
Questions

Thanks for reading! If you have any questions or comments, feel free to reach out via email (ankit@sumologic.com) or LinkedIn.

AWS

November 9, 2017

Blog

The Path to DevSecOps in 6 Steps

Blog

AWS Security Best Practices: Log Management

Blog

Apache Log Analysis with Sumo Logic

Blog

AWS ELB vs. NGINX Load Balancer

Blog

Apache Logs vs. NGINX Logs

Blog

Introducing the State of Modern Applications in the Cloud Report 2017

Blog

Packer and Sumo Logic - Build Monitoring Into Your Images

Whether you're new to automating your image builds with Packer, new to Sumo Logic, or just new to integrating the two, this post guides you through creating an image with Sumo Logic baked in. We'll use AWS as our cloud provider and show how to create, in one command, custom machine images that let you centralize metrics and logs from applications, OSs, and other workloads on your machines.

Overview

When baking a Sumo Logic collector into any machine image, you'll need to follow three main steps.

First, create your sources.json file and add it to the machine. This file specifies which logs and metrics you'd like to collect. It's usually stored at /etc/sources.json, although you can store it anywhere and point to it.

Next, download, rename, and make the collector file executable. Collector downloads for various operating systems and Sumo Logic deployments can be found here. An example command might look like:

sudo wget 'https://collectors.us2.sumologic.com/rest/download/linux/64' -O SumoCollector.sh && sudo chmod +x SumoCollector.sh

Finally, run the install script and skip registration.
The most important part here is to use the -VskipRegistration=true flag so that the collector doesn't register to the temporary machine you are using to build the image. Other important flags include:

- -q : Run the script in quiet mode
- -Vephemeral=true : Tells Sumo Logic to auto-remove old collectors that are no longer alive, usually applicable for autoscaling use cases where VMs are ephemeral
- -Vsources=/etc/sources.json : Points to the local path of your sources.json file
- -Vsumo.accessid=<id> -Vsumo.accesskey=<key> : Your Sumo Logic access key pair

See all installation options here. An example command might look like:

sudo ./SumoCollector.sh -q -VskipRegistration=true -Vephemeral=true -Vsources=/etc/sources.json -Vsumo.accessid=<id> -Vsumo.accesskey=<key>

Packer and Sumo Logic - Provisioners

Packer provisioners let you run third-party software to automate whatever tasks you need to build your image. Some examples of what you'd use provisioners for are installing packages, patching the kernel, creating users, and downloading application code. In this example, we'll use the Packer shell provisioner, which provisions your machine image via shell scripts.
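As a rough illustration, the provisioners section of a Packer JSON template could tie the commands above together like this. This is a hedged sketch, not the exact template from the repo: the sources.json download URL is a placeholder, and the `sumo_access_id`/`sumo_access_key` user variables are assumed to be defined in the template's variables section.

```json
"provisioners": [
  {
    "type": "shell",
    "inline": [
      "sudo wget 'https://example.com/your/sources.json' -O /tmp/sources.json",
      "sudo mv /tmp/sources.json /etc/sources.json",
      "sudo wget 'https://collectors.us2.sumologic.com/rest/download/linux/64' -O SumoCollector.sh",
      "sudo chmod +x SumoCollector.sh",
      "sudo ./SumoCollector.sh -q -VskipRegistration=true -Vephemeral=true -Vsources=/etc/sources.json -Vsumo.accessid={{user `sumo_access_id`}} -Vsumo.accesskey={{user `sumo_access_key`}}"
    ]
  }
]
```

The key point is the same as in the prose above: the install runs with -VskipRegistration=true so the temporary build machine never registers as a collector; registration happens later, when instances launched from the AMI boot.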
The basic steps that Packer will execute are:

1. Start up an EC2 instance in your AWS account
2. Download your sources.json file locally, which describes the logs and metrics you'd like to collect
3. Download the Sumo Logic collector agent
4. Run the collector setup script to configure the collector while skipping registration (this creates a user.properties config file locally)
5. Create the AMI and shut down the EC2 instance
6. Print out the Amazon Machine Image ID (AMI ID) for your image with Sumo baked in

Instructions: Packer and Sumo Logic Build

Before You Begin

To ensure Packer can access your AWS account resources, make sure you have an AWS authentication method that allows Packer to control AWS resources: Option 1 is a user key pair; Option 2 is to set up the AWS CLI or SDKs in your local environment. I have chosen Option 2 here, so my Packer build command will not need AWS access key information. After setting up your local AWS authentication method, create a Sumo Logic free trial here if you don't already have an account. Then generate a Sumo Logic key pair inside your Sumo Logic account. Copy this key down, as the secret key will only be shown once.

Step 1 - Get Your Files

After downloading Packer, download the packer_sumo_template.json and packer_variables.json files, and place all three in the same directory.

Step 2 - Customize Variables and Test Your Template

Use the command ./packer validate packer_sumo_template.json to validate your Packer template.
This template automatically finds the latest Amazon Linux image in whatever region you use, based on the source_ami_filter in the builders object:

"source_ami_filter": {
  "filters": {
    "virtualization-type": "hvm",
    "name": "amzn-ami-hvm-????.??.?.x86_64-gp2",
    "root-device-type": "ebs"
  },
  "owners": ["amazon"],
  "most_recent": true
}

Customize the region in the packer_variables.json file to the AWS region you want to build your image in. You can also change the Sumo Logic collector download URL if you are in a different deployment. The sources.json file URL can be updated to point to your own sources.json file, or you can update the template to use the Packer file provisioner to upload your sources.json file and any other files.

Step 3 - Build Your Image

Use the command ./packer build -var-file=packer_variables.json -var 'sumo_access_id=<sumo_id>' -var 'sumo_access_key=<sumo_key>' packer_sumo_template.json to build your image. You should see the build start and finish like this: Image Build Start / Image Build Finish. Done! Now that you've integrated Packer and Sumo Logic, you can navigate to the AMI section of the EC2 AWS console and find the image for use in autoscaling launch configurations, or just launch the image manually.

Now What? View Streaming Logs and Metrics!

Install the Sumo Logic applications for Linux and Host Metrics to get pre-built monitoring for your EC2 instance.

What Else Can Sumo Logic Do?

Sumo Logic collects AWS CloudWatch metrics, CloudTrail audit data, and much more. Sumo Logic also offers integrated threat intelligence powered by CrowdStrike, so that you can identify threats in your cloud infrastructure in real time. See below for more documentation: AWS CloudTrail, AWS CloudWatch Metrics, Integrated Threat Intelligence.

What's Next?

In part 3 of this series (it will be linked here when published), I'll cover how to deploy an autoscaling group behind a load balancer in AWS.
We will integrate the Sumo Logic collector into each EC2 instance in the fleet, send the load balancer access logs to an S3 bucket, and then scan that bucket with a Sumo Logic S3 source. If you have any questions or comments, please reach out via my LinkedIn profile or via our Sumo Logic public Slack channel: slack.sumologic.com (@grahamwatts-sumologic). Thanks for reading!

AWS

September 29, 2017

Blog

Transitioning to Cloud: The Three Biggest Decision Hurdles

Written by Rod Trent

Just a couple of years ago, the cloud seemed far enough away that most organizations believed they could take their time migrating from on-premises systems to remote services and infrastructure. But as the cloud wars heated up, cloud capabilities advanced so quickly that cloud adoption now seems like a foregone conclusion. While Amazon continues to be the leader on paper, Microsoft is making serious inroads with its "intelligent cloud," Azure. And we've reached the point, in just a short period of time, where there's almost nothing keeping organizations from initiating full migrations to the cloud. Amazon, Microsoft, Google, and others will continue to clash in the clouds, leapfrogging each other in function, features, and cost savings. And that's good for everyone. Companies can eliminate much of the hardware and network infrastructure costs that have buoyed internal technology services for the last 20 years. A move to the cloud delivers a freedom of work, allowing the business to expand without the historically significant investments that dried up the annual budget just a few months into the fiscal year. It's also a freedom of work for employees, making them happier, more confident, and more productive. For those that have concluded it's time to migrate, and for those still wanting to stick a toe in the water to test, taking that first step isn't as tough as it once was, and there's a "cool factor" about doing so – if for nothing but the cost savings alone. To get started, here are the three biggest hurdles to overcome.

Determining How Much or How Little

For most organizations shifting from on-premises to the cloud, it's not an all-or-nothing scenario. Not everything can or should be run in the cloud – at least not yet. It takes a serious effort and proper due diligence to determine how much can be migrated to operate in the cloud.
In a lot of cases, companies need to wean themselves slowly off reliance on on-premises operations and upgrade old systems before moving them. Many companies that go "all in" quickly realize that a slow and steady pace is more sustainable, and that a hybrid cloud environment produces the most gains and positions the company for future success. Which leads to the next point…

Locating an Experienced Partner

Companies should not approach a migration to the cloud as something that is their sole responsibility. The organizations that have the most success are the ones that invest in partnerships. Experienced partners can help minimize headaches and costs by identifying the company's qualified software and solutions and leading the organization's applications and processes into the cloud. A partner like Sumo Logic can help with application and service migrations; their experience and solutions are designed to eliminate hassle and ensure success. We are happy to have Sumo Logic as a sponsor of IT/Dev Connections this year. Which leads to the next point…

Hire or Educate?

There are new skills required for operating in the cloud, and not every organization has IT staff or developers who are well-versed in this modern environment. Many companies, through the process of determining the level of cloud adoption and identifying the right partnerships, will be able to determine the skills required both for the migration and for continuing management and support. Once those skills have been identified, companies can take an inventory of the skills of current IT and developer staff. In some cases, hiring may be an option. However, in most cases, and because the current staff is already acclimated to the business, it makes the most sense to ensure that the current staff is educated for the new cloud economy. There are several resources available, but one in particular, IT/Dev Connections 2017, happens in just a few short weeks.
IT/Dev Connections 2017 is an intimate, unique conference with a heavy focus on the cloud. With many of today's top instructors onboard delivering over 200 sessions, one week of IT/Dev Connections 2017 provides enough deep-dive instruction to train an entire staff on the cloud and ensure the company's migration is successful and enduring. IT/Dev Connections 2017 runs October 23-26, 2017 in San Francisco. Visit the site to register, identify speakers you know, and view the session catalog.

Rod Trent is the Engagement, Education, and Conference Director for Penton. He has more than 25 years of IT experience, has written many books and thousands of articles, has owned and sold businesses, has run and managed editorial teams, and speaks at various conferences and user groups. He's an avid gadget fan, a die-hard old television show and cartoon buff, a health and exercise freak, and a comic book aficionado. @rodtrent

Azure

September 28, 2017

Blog

Docker Logs vs. VM Logs: The Essential Differences

Blog

AWS Security vs. Azure Security

Blog

How Should We Monitor Docker Now?

Blog

Does IoT Stand for ‘Internet of Threats’?

Blog

SIEM is Not the Same for Cloud Security

Blog

Three Things You Didn’t Know About AWS Elastic Load Balancer

Blog

Cloud maturity, security, and the importance of constant learning - Podcast

Blog

Gratitude

Sumo Logic: what thoughts come to mind when you hear these two words? You might picture a genius sumo wrestler solving math problems, or think it's the title of a sumo wrestler's autobiography, but really, other than being a cool name, it can be a mystery. Well, at least that's what came to my mind when I was told that I would be interning in the marketing department of Sumo Logic. Hello to the beautiful people reading this, my name is Danny Nguyen, and when I first started interning at Sumo Logic, I was nothing more than your average 17-year-old high schooler walking into a company thinking he was in way over his head. Coming into Sumo Logic, I had the exact same panic, nervousness, and fear that anyone would have at their very first job – worried about making a good impression, worried that the work would be too much, and worried that the people I would be working with wouldn't laugh at my jokes. Before starting my internship, I had preconceived notions of the stereotypical intern. I figured that I would just be someone to hand busy work to and someone who would be bossed around. I thought that I wouldn't be taken seriously and would be looked down upon because I was still so young. However, as soon as my very first day at Sumo Logic came to an end, I knew that those worries were nothing but a figment of my imagination, because being an intern at Sumo Logic had become, and still is, one of the best experiences of my life. Sumo Logic completely redefined what I thought being an intern and working at a company meant. I was constantly learning something new and meeting new people every single week. I was treated with the same respect as a full-time employee and coworker. I was able to gain real-life experience working in the marketing department with hands-on learning.
However, the greatest thing at Sumo Logic, the thing I will always remember, is the people who make up its foundation and give it the amazing personality that it has. They made sure that I was getting a great experience in which I made as many connections and learned as many things as I could. I was encouraged and inspired every single day to learn and to keep becoming a better version of myself than I was the previous day. But most importantly, people genuinely wanted to get to know me as a person and become my friend. So when you ask me what Sumo Logic means to me today, I could type up a two-page essay expressing all the different words and adjectives to describe my gratitude, love, and appreciation for Sumo Logic and the people there – and it still would not be enough. – Danny

From Maurina, Danny's Manager: High school intern – I'll let that sit with you a moment. When I agreed to take in a high school intern (gulp) here at Sumo Logic, I was worried to say the least, but our experience with Danny blew any expectations I had out of the water. A coworker here at Sumo introduced me to Genesys Works, whose mission is to transform the lives of disadvantaged high school students. I met with the Bay Area coordinator and realized this was an amazing chance to make a difference in a young person's life. I signed on…then I was terrified. Mentor and her pupil. When Danny's first day rolled around, I was unsure what to expect. From the minute we started talking, however, all my fears were put to rest. Our first conversation was a whirlwind of questions – "What's your favorite type of food?" "What's your degree in?" "Do you have any regrets in life?"…wait, what? From there, I knew that Danny wasn't your average 17-year-old intern.
For the nine months I managed Danny, I watched him grow from a quiet 17-year-old into a vibrant, confident young professional who could turn a scribble into a perfect Salesforce report (hard to do if you know anything about Salesforce reporting), a Post-it drawing into a display ad, and a skeptical manager into a true believer in the power of drive and passion, regardless of age. Thank you Danny! <3

September 7, 2017

Blog

Detecting Insider Threats with Okta and Sumo Logic

Security intelligence for SaaS and AWS workloads is different from your traditional on-prem environment. Based on Okta's latest Business@Work report, organizations are using between 16 and 22 SaaS applications in their environments. In the report, Office 365 comes out as the top business application, followed by Box and G Suite. These business-critical SaaS applications hold sensitive and valuable company information such as financial data, employee records, and customer data. While everyone understands that SaaS applications provide immediate time-to-value and are being adopted at a faster pace than ever before, what many fail to consider is that these SaaS applications also create a new attack surface that represents substantial risk for the company, due to the lack of visibility that security operations teams would typically have with traditional, on-prem applications. If employee credentials are compromised, it creates huge exposure for the company because the attacker is able to access all the applications just as an insider would. In this case, timely detection and containment of an insider threat become extremely important. Sumo Logic's security intelligence allows security operations to address the many challenges related to SaaS and cloud workload security. Organizations that use SaaS applications face many challenges for incident management and security operations teams. First, how do you make sure that users across SaaS applications can be uniquely identified, and how can you track anomalies in user behavior? The attacker's first step after exploiting a vulnerability is to steal an employee's identity and move laterally in the organization; in that process, the attacker's behavior will be considerably different from the normal user's behavior. Second, it is critical that the entire incident response and management process is automated for detection and containment of such attacks, to minimize potential damage or data leakage.
Most organizations moving to the cloud have legacy solutions such as Active Directory and on-prem SIEM products. While traditional SIEM products can integrate with Okta, they cannot integrate effectively with other SaaS applications to provide complete visibility into user activities. Considering there are no collectors to install to get logs from SaaS applications, traditional SIEM vendors cannot provide the required insight into modern SaaS applications and AWS workloads. To solve these specific problems, Okta and Sumo Logic have partnered to provide better visibility and faster detection of insider threats. Okta ensures that every user is uniquely identified across multiple SaaS applications. Sumo Logic can ingest those authentication logs from Okta and correlate them with user activities across multiple SaaS applications such as Salesforce, Box, and Office 365. Sumo Logic has machine learning operators such as multi-dimensional Outlier, LogReduce, and LogCompare to quickly surface anomalies in user activities by correlating identity from Okta with user activities in Salesforce and Office 365. Once abnormal activities have been identified, Sumo Logic can take multiple actions, such as sending a Slack message, creating a ServiceNow ticket, disabling the user in Okta, or triggering actions within a customer's automation platform.

The use case: Okta + Sumo Logic = accurate incident response for cloud workloads and SaaS applications

How many times have you fat-fingered your password and gotten an authentication failure? Don't answer that. Authentication failure is a part of life. You cannot launch an investigation every time there is an authentication failure; that would result in too many false positives and an overload of wasted effort for your security operations team. Okta and Sumo Logic allow you to detect multiple authentication failures followed by a successful authentication.
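The detection pattern described above - a run of failed logins followed by a success - can be sketched in a few lines of Python. This is a hedged illustration of the logic only: the event shape and the threshold of three are hypothetical, and in practice Sumo Logic would express this as a log search over Okta events rather than application code.

```python
# Flag users whose run of consecutive auth failures ended in a success.
# Each event is a hypothetical (user, outcome) pair in time order.

def suspicious_logins(events, threshold=3):
    """Return users with >= threshold consecutive failures followed by a success."""
    flagged = set()
    streak = {}  # user -> current consecutive-failure count
    for user, outcome in events:
        if outcome == "FAILURE":
            streak[user] = streak.get(user, 0) + 1
        else:  # SUCCESS resets the streak, flagging if it was long enough
            if streak.get(user, 0) >= threshold:
                flagged.add(user)
            streak[user] = 0
    return flagged

events = [
    ("alice", "FAILURE"), ("alice", "FAILURE"), ("alice", "SUCCESS"),  # 2 failures: likely a typo
    ("mallory", "FAILURE"), ("mallory", "FAILURE"),
    ("mallory", "FAILURE"), ("mallory", "FAILURE"), ("mallory", "SUCCESS"),  # 4 failures, then in
]
print(suspicious_logins(events))  # {'mallory'}
```

The threshold is what keeps ordinary fat-fingered passwords from paging the security team; only a sustained failure streak that ends in a successful login is surfaced for correlation with downstream SaaS activity.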
That pattern alone may be enough to launch an investigation, but we all know it could also be a user error: Caps Lock is on, the keyboard is misbehaving, or we might have just changed the password and forgotten! To give security operations more intelligent and actionable insight into such events, Sumo Logic can provide additional context by correlating these authentication failure logs from Okta with user activity across multiple SaaS applications. For example, say I changed my password and am now getting authentication failures within Okta. After I realize the mistake and correct it, I get a successful authentication. I log into the Box application to work on a few documents and sign off. Sumo Logic will take this Okta event and correlate it with the Box activities. If an attacker had logged in instead of me, there would be anomalies in behavior: an attacker might download all documents or make ownership changes to them. While this is happening, Sumo Logic will be able to spot these anomalies in near real time and take a variety of automated actions, from creating a ServiceNow ticket to disabling the user in Okta. You can start ingesting your Okta logs and correlating them with user activity logs across multiple SaaS applications now. Sign up for your free Sumo Logic trial that never expires!

Co-author Matt Egan is a Partner Solutions Technical Architect in Business Development at Okta. In this role, he works closely with ISV partners, like Sumo Logic, to develop integrations and joint solutions that increase customer value. Prior to joining Okta, Matt held roles ranging from software development to information security over an 18-year career in technology.

Blog

The Top 5 Reasons to Attend Illuminate

Blog

GDPR Compliance: 3 Steps to Get Started

The General Data Protection Regulation (GDPR) is one of the hottest topics in IT security around the globe. The European Union (EU) regulation gives people more say over what companies can do with their data, while making data protection rules more or less identical throughout the EU. Although the regulation originated in the EU, its impact is global; any organization that does business using EU citizens' data must be compliant. With the May 2018 deadline looming, IT security professionals worldwide are scrambling to ensure they're ready (and to avoid the strict fines for non-compliance and security breaches). In the video below, Sumo Logic VP of Security and Compliance George Gerchow offers three ways to get GDPR-ready in no time.

1. Establish a Privacy Program

Establishing a privacy program allows you to set a baseline for privacy standards. Once you have a privacy program in place, when new regulations like GDPR are released, all you have to do is fill in the gaps between where you are and where you need to be.

2. Designate a Data Protection Officer

This is a critical part of complying with GDPR – and a great way to build sound data security principles into your organization. Under the GDPR requirements, the Data Protection Officer:

- Must report directly to the highest level of management
- Can be a staff member or an external service provider
- Must be appointed on the basis of professional qualities, particularly expert knowledge of data protection law and practices
- Must be provided with appropriate resources to carry out their tasks and maintain their expert knowledge
- Must not carry out any other tasks that could result in a conflict of interest

3. Take Inventory of Customer Data and Protections

Before GDPR compliance becomes mandatory, take a thorough inventory of where your customer data is housed and how it is protected. Make sure you understand the journey of customer data from start to finish.
Keep in mind that the data is only as secure as the systems you use to manage it. As you dissect the flow of data, take note of the critical systems the data depends upon, and make sure the data is secured at every step using proper methodologies like encryption.

Bonus Tip: Arrange Third-Party GDPR Validation

Between now and May 2018, you will start to see contracts coming through that ask whether you are GDPR-compliant. When the deadline rolls around, there will be two groups of organizations out there: companies that have verification of GDPR compliance to share with prospective clients, and companies that say they are GDPR-compliant and want clients to take their word for it. Being in the first group gives your company a head start. Conduct a thorough self-assessment (and document the results) or use a third-party auditor to provide proof of your GDPR compliance.

Learn More About GDPR Compliance

Ready to get started with GDPR? George Gerchow, the Sumo Logic VP of Security and Compliance, shares more tips for cutting through the vendor FUD surrounding GDPR.

Blog

Understanding and Analyzing IIS Logs

Blog

Apache Error Log Files

Blog

Machine Learning and Log Analysis

Blog

Terraform and Sumo Logic - Build Monitoring into your Cloud Infrastructure

Are you using Terraform and looking for a way to easily monitor your cloud infrastructure? Whether you're new to Terraform or you control all of your cloud infrastructure through it, this post provides a few examples of how to integrate Sumo Logic's monitoring platform into Terraform-scripted cloud infrastructure. *This article discusses how to integrate the Sumo Logic collector agent with your EC2 resources. To manage hosted Sumo Logic collection (S3 sources, HTTPS sources, etc.), check out the Sumo Logic Terraform Provider here or read the blog.

Collect Logs and Metrics from Your Terraform Infrastructure

Sumo Logic's ability to unify your logs and metrics can be built into your Terraform code in a few different ways. This post shows how to use a simple user data file to bootstrap an EC2 instance with the Sumo Logic collector agent. After the instance starts up, you can monitor local log files and overlay these events with system metrics using Sumo Logic's Host Metrics functionality. AWS CloudWatch metrics and Graphite-formatted metrics can be collected and analyzed as well. Sumo Logic integrates with Terraform so you can version-control your cloud infrastructure and its monitoring the same way you version and improve your software.

AWS EC2 Instance with Sumo Logic Built In

Before we begin: if you are new to Terraform, I recommend Terraform: Up and Running. This guide originated as a blog and was expanded into a helpful book by Yevgeniy Brikman.

What We'll Make

In this first example, we'll apply the Terraform code in my GitHub repo to launch a Linux AMI in a configurable AWS region, with a configurable Sumo Logic deployment. The resources will be created in your default VPC and will include:

- One t2.micro EC2 instance
- One AWS security group
- A Sumo Logic collector agent and sources.json file

The Approach - User Data vs. Terraform Provisioner vs. Packer

In this example, we'll be using a user data template file to bootstrap our EC2 instance.
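As a rough sketch of how user data gets wired into the instance, the relevant pieces of a main.tf might look like the following. This is a hedged illustration only: the resource and variable names are assumptions (the actual files live in the GitHub repo mentioned above), and the interpolation syntax shown is the Terraform 0.11-era style this post is based on.

```hcl
# Render user_data.sh with the Sumo Logic key pair, then attach it to the instance.
# Names like "sumo_collector" and var.linux_ami are hypothetical placeholders.
data "template_file" "user_data" {
  template = "${file("user_data.sh")}"

  vars {
    sumo_access_id  = "${var.Sumo_Logic_Access_ID}"
    sumo_access_key = "${var.Sumo_Logic_Access_Key}"
  }
}

resource "aws_instance" "sumo_collector" {
  ami           = "${var.linux_ami}" # looked up per-region via a map in vars.tf
  instance_type = "t2.micro"
  user_data     = "${data.template_file.user_data.rendered}"
}
```

The rendered script runs once at first boot, which is what makes this approach simpler than a provisioner: nothing has to connect to the instance after launch, and the same script works in an autoscaling launch configuration.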
Terraform also offers provisioners, which run scripts at the time an instance is created or destroyed. HashiCorp offers Packer to build machine images, but I have chosen to use user data in this example for a few reasons:

- User data is viewable in the AWS console
- Simplicity - my next post will cover an example that uses Packer rather than user data, although user data can be included in an autoscaling group's launch configuration
- For more details, see the Stack Overflow discussion here

If you want to build Sumo Logic collectors into your images with Packer, see my blog with instructions here. The sources.json file will be copied to the instance upon startup, along with the Sumo Logic collector. The sources.json file instructs Sumo Logic to collect various types of logs and metrics from the EC2 instance:

- Linux OS logs (audit logs, messages logs, secure logs)
- Host metrics (CPU, memory, TCP, network, disk)
- Cron logs
- Any application log you need

A Note on Security

This example relies on wget to bootstrap the instance with the Sumo Logic collector and sources.json file, so ports 80 and 443 are open to the world. In my next post, we'll use Packer to build the image so these ports can be closed; we'll do that by deleting them from the security group resource in our main.tf file.

Tutorial - Apply Terraform and Monitor Logs and Metrics Instantly

Prerequisites

First, you'll need a few things:

- Terraform - see the Terraform docs here for setup instructions
- A Sumo Logic account - get a free one here
- Access to an AWS account with AmazonEC2FullAccess permissions - if you don't have access, you can sign up for the free tier here
- An AWS authentication method to allow Terraform to control AWS resources: Option 1 is a user key pair; Option 2 is to set up the AWS CLI or SDKs in your local environment

Instructions

1. First, copy this repo (Example 1. Collector on Linux EC2) somewhere locally.
You'll need all three files: main.tf, vars.tf, and user_data.sh. main.tf will use user_data.sh to bootstrap your EC2 instance, and it will also use vars.tf to perform lookups based on a Linux AMI map, a Sumo Logic collector endpoint map, and some other variables.

2. Then, test out Terraform by opening your shell and running:

/path/to/terraform plan

You can safely enter any string, like 'test', for the var.Sumo_Logic_Access_ID and var.Sumo_Logic_Access_Key inputs while you are testing with the plan command. After Terraform runs the plan command, you should see "Plan: 2 to add, 0 to change, 0 to destroy." if your environment is configured correctly.

3. Next, run Terraform and create your EC2 instance using the terraform apply command. There are some configurable variables built in. For example, the default AWS region this EC2 instance will be launched into is us-east-1, but you can pass in another region like this:

path/to/terraform/terraform apply -var region=us-west-2

If your Sumo Logic deployment is in another region, like DUB or SYD, you can run the command like this:

path/to/terraform/terraform apply -var Sumo_Logic_Region=SYD

4. Then, Terraform will interactively ask you for your Sumo Logic access key pair, because there is no default value specified in the vars.tf file. Get your Sumo Logic access keys from your Sumo Logic account and enter them when Terraform prompts you: first, navigate to the Sumo Logic web application, click your name in the left nav, and open the Preferences page; next, click the blue + icon near My Access Keys to create a key pair. See the official Sumo Logic documentation here for more info. You will see this success message after Terraform creates your EC2 instance and security group: "Apply complete! Resources: 2 added, 0 changed, 0 destroyed."

5. Now you're done!
After about 3-4 minutes, check under Manage Data > Collection in the Sumo Logic UI. You should see your new collector running and scanning the sources we specified in sources.json (Linux OS logs, cron logs, and host metrics).

Cleanup

Make sure to delete your resources using the terraform destroy command. You can enter any string when you are prompted for the Sumo Logic key pair information. The -Vephemeral=true flag in our Sumo Logic user data configuration command instructs Sumo Logic to automatically clean out old collectors that are no longer alive.

/path/to/terraform destroy

Now What? View Streaming Logs and Metrics!

Install the Sumo Logic applications for Linux and Host Metrics to get pre-built monitoring for your EC2 instance.

What Else Can Sumo Logic Do?

Sumo Logic collects AWS CloudWatch metrics, CloudTrail audit data, and much more. Sumo Logic also offers integrated Threat Intelligence powered by CrowdStrike, so that you can identify threats in your cloud infrastructure in real time. See below for more documentation:

- AWS CloudTrail
- AWS CloudWatch Metrics
- Integrated Threat Intelligence

What's Next?

In part 2 of this post, I'll cover how to deploy an Auto Scaling group behind a load balancer in AWS. We will integrate the Sumo Logic collector into each EC2 instance in the fleet, log the load balancer access logs to an S3 bucket, and then scan that bucket with a Sumo Logic S3 source. Thanks for reading!

Graham Watts is an AWS Certified Solutions Architect and Sales Engineer at Sumo Logic
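To make the moving pieces concrete, here is a minimal sketch of how the main.tf wiring might fit together. This is a hypothetical sketch only: the resource names are illustrative, the variable maps (such as var.linux_amis) live in vars.tf, and the real files in the repo are more complete.

```hcl
variable "region" {
  default = "us-east-1"
}

provider "aws" {
  region = "${var.region}"
}

resource "aws_security_group" "collector_sg" {
  name = "sumo-collector-sg"

  # Ports 80/443 are open so wget can fetch the collector and sources.json;
  # a Packer-built image (covered in part 2) lets us remove these rules.
  egress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_instance" "sumo_collector" {
  ami             = "${lookup(var.linux_amis, var.region)}"
  instance_type   = "t2.micro"
  security_groups = ["${aws_security_group.collector_sg.name}"]

  # user_data.sh installs the collector (with -Vephemeral=true) on first boot
  user_data = "${file("user_data.sh")}"
}
```

The important idea is simply that user_data ties the bootstrap script to the instance, so the collector registers itself with Sumo Logic without any manual step.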

Blog

Monitoring and Troubleshooting Using AWS CloudWatch Logs

AWS CloudWatch is a monitoring tool for the AWS platform. CloudWatch Logs are an important resource for monitoring and helping to interpret the data in your AWS cloud. This article covers the essentials of working with AWS CloudWatch and CloudWatch Logs.

What Can You Do with CloudWatch?

As a tool, CloudWatch is quite versatile. IT pros can use it for several different purposes, including tracking performance metrics, setting threshold alarms, and even taking automated action when a monitored resource exceeds a predetermined threshold.

Monitor Amazon EC2 Instances

One of the most common uses of AWS CloudWatch is monitoring EC2 instances, and this functionality is enabled by default. AWS collects performance metrics from EC2 instances every five minutes and stores those metrics for 15 months, so that you can track performance changes over time. For instances that require more timely performance data, AWS provides an option to collect metrics every minute. Doing so requires you to enable detailed monitoring for the instance, which is a simple process but incurs an additional cost.

Monitor Events Logged by CloudTrail

AWS CloudWatch Logs can do far more than simply monitor the performance of EC2 instances. You can also use CloudWatch to gather the events that have been recorded by AWS CloudTrail. For those who might not be familiar with CloudTrail, it is designed to be an auditing mechanism for AWS. AWS is made up of an extremely diverse collection of services, and the one thing all of these services have in common is that they are built around APIs. Any time you interact with an AWS service, an API is at work in the background. This holds true regardless of whether the service is accessed programmatically, through the AWS console, or through the AWS CLI. CloudTrail's job is to capture a record of all API activity that occurs across an AWS account.
A log of the activity is written to an S3 bucket, but it is also possible to deliver the logging data to CloudWatch.

Kinesis Streams and AWS Lambda

AWS Kinesis Streams are designed to help AWS subscribers process or analyze extremely high volumes of streaming data. A Kinesis stream can capture data from hundreds of thousands of sources simultaneously, and can process or analyze multiple terabytes of data every hour. Kinesis is often used in conjunction with AWS Lambda, which allows for the automatic processing of streaming data, and Lambda is designed to log its output through CloudWatch Logs.

Filtering and Searching AWS CloudWatch Logs

AWS CloudWatch Logs can accumulate vast amounts of data, so it is important to be able to filter the log data based on your needs. Filtering is achieved through the use of metric filters. Perhaps the most important thing to understand about metric filters is that they do not support retroactive filtering: only events logged after the filter was created are reported in the filtered results. Log entries that existed prior to the filter's creation are not included.

Creating a Metric Filter

To create a metric filter, log into the AWS console and choose the CloudWatch service. When the CloudWatch dashboard appears, click on the Logs option, and then click on the number of metric filters displayed within your log group. (The number of metric filters will initially be zero.) If no log groups exist, you will have to create one before continuing. Click the Add Metric Filter button, and you will be taken to a screen that asks for a few pieces of information:

- First, provide a filter pattern. A filter pattern specifies what the metric filter will look for within the log. (For instance, entering the word Error will cause the filter to look for occurrences of the word Error.)
- Next, select the log data that you plan to test. Once you have made your selection, click the Test Pattern button to make sure that the results are what you expect, and then click the Assign Metric button.
- The resulting screen requires you to enter a filter name, which is just a friendly name used to identify the metric filter within the log group.
- You will also need to specify a metric namespace, which is nothing more than a group that contains related metrics. By default, AWS uses LogMetrics as the namespace name.
- Finally, specify a metric name: the name of the CloudWatch metric where the log information will be published. AWS also gives you the optional ability to write a metric value to the log when a pattern match occurs.

When you are done, click the Create Filter button, and the metric filter will be created. You can monitor your metrics from the CloudWatch Metrics dashboard.
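To build intuition for how a term-based filter pattern behaves, here is a simplified local stand-in in Python. This is not the actual CloudWatch matching engine (real filter patterns also support JSON and space-delimited patterns); it only illustrates that term matches are case-sensitive and whole-term.

```python
def matches_term_filter(pattern, event_message):
    """Return True if the whole, case-sensitive term appears in the event.
    Simplified stand-in for a CloudWatch Logs term filter pattern."""
    return pattern in event_message.split()

events = [
    "2017-07-27 10:01:02 Error connecting to database",
    "2017-07-27 10:01:03 request completed in 45ms",
    "2017-07-27 10:01:04 error while retrying",  # lowercase: no match
]

# A filter pattern of "Error" matches only the first event
matched = [e for e in events if matches_term_filter("Error", e)]
print(len(matched))  # 1
```

This is why, when you click Test Pattern in the console, it pays to check a sample of real log lines: a pattern of Error silently misses events that log "error" in lowercase.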

AWS

July 27, 2017

Blog

Log Aggregation vs. APM: No, They’re Not the Same Thing

Are you a bit unsure about the difference between log aggregation and Application Performance Monitoring (APM)? If so, you're hardly alone. These are closely related types of operations, and it can be easy to conflate them, or to assume that if you are doing one of them, there's no reason to do the other. In this post, we'll look at log aggregation vs. APM, the relationship between these two data accumulation and analysis domains, and why it is important to address both of them with a suite of domain-appropriate tools rather than a single tool.

Defining APM

First, let's look at Application Performance Monitoring, or APM. Note that APM can stand for both Application Performance Monitoring and Application Performance Management; in most of the important ways, these terms refer to the same thing: monitoring and managing the performance of software under real-world conditions, with emphasis on the user experience and the functional purpose of the software. Since we'll be talking mostly about the monitoring side of APM, we'll treat the acronym as interchangeable with Application Performance Monitoring, with the implicit understanding that it includes the associated performance management functions.

What does APM monitor, and what does it manage? Most of the elements of APM fall into two key areas: user experience and resource-related performance. While these two areas interact (resource use, for example, can have a strong effect on user experience), there are significant differences in the ways in which they are monitored and, to a lesser degree, managed.

APM: User Experience

The most basic way to monitor application performance in terms of user experience is to monitor response time. How long does it take after a user clicks on an application input element for the program to display a response? And more to the point, how long does it take before the program produces a complete response (i.e., a full database record displayed in the correct format, rather than a partial record or a spinning cursor)?

Load Is Important

Response time, however, is highly dependent on load: the conditions under which the application operates, in particular the volume of user requests and other transactions, as well as the demand placed on resources used by the application. To be accurate and complete, user-experience APM should include in-depth monitoring and reporting of response time and related metrics under expected load, under peak load (including unreasonably high peaks, since unreasonable conditions and events are alarmingly common on the Internet), and under continuous high load (an important but all too often neglected element of performance monitoring and stress testing). Much of the peak-level and continuous high-load monitoring will need to be done under test conditions, since it requires applying the appropriate load, but it can also be incorporated into real-time monitoring by means of reasonably sophisticated analytics: report performance (and load) when load peaks above a specified level, or when it remains above a specified level for a given minimum period of time.

APM: Resource Use

Resource-based performance monitoring is the other key element of APM. How is the application using resources such as CPU, memory, storage, and I/O? When analyzing these metrics, the important numbers to look at are generally the percentage of the resource used and the percentage still available. This actually falls within the realm of metrics monitoring more than APM, and requires tools dedicated to metrics monitoring. If the percent used of any resource (such as compute, storage, or memory) approaches the total available, that can (and generally should) be taken as an indication of a potential performance bottleneck. It may then become necessary to allocate a greater share of the resource in question (either on an ongoing basis, or under specified conditions) in order to avoid such bottlenecks. Remember: bottlenecks don't just slow down the affected processes. They may also bring all actions dependent on those processes to a halt.

Once Again, Load

Resource use, like response time, should be monitored and analyzed not only under normal expected load, but also under peak and continuous high loads. Continuous high loads in particular are useful for identifying potential bottlenecks which might not otherwise be detected.

Log Aggregation

It should be obvious from the description of APM that it can make good use of logs, since the various logs associated with a typical Internet-based application provide a considerable amount of performance-related data. Much of the monitoring that goes into APM, however, is not necessarily log-based, and many of the key functions that logs perform are distinct from those required by APM.

Logs as Historical Records

Logs form an ongoing record of the actions and state of the application, its components, and its environment; in many ways, they serve as a historical record for an application. As we indicated, much of this data is at least somewhat relevant to performance (load-level records, for example), but much of it is focused on areas not closely connected with performance:

- Logs are indispensable when it comes to analyzing and tracing many security problems, including attempted break-ins. Log analysis can detect suspicious patterns of user activity, as well as unusual actions on the part of system or application resources.
- Logs are a key element in maintaining compliance records for applications operating in a regulated environment.
- Logs can be important in identifying details of specific transactions and other events when they require verification, or are in dispute.
- Logs can be very important in tracing the history and development of functional problems, at both the application and infrastructure level, as well as in analyzing changes in the volume or nature of user activity over time.

APM tools can also provide historical visibility into your environment, but they do it in a different way and at a different level: they trace performance issues to specific lines of code. This is a different kind of visibility, and it is not a substitute for the insight you gain from using log aggregation with historical data to research or analyze issues after they have occurred.

The Need for Log Aggregation

The two greatest problems associated with logs are the volume of data generated by logging, and the often very large number of different logs generated by the application and its associated resources and infrastructure components. Log aggregation is the process of automatically gathering logs from disparate sources and storing them in a central location. It is generally used in combination with other log management tools, as well as log-based analytics.

It should be clear at this point that APM and log aggregation are not only different; it also does not make sense for a single tool to handle both tasks. It is asking far too much of any one tool to take care of all of the key tasks required by either domain. Each of them requires a full suite of tools, including monitoring, analytics, a flexible dashboard system, and a full-featured API. A suite of tools that can fully serve both domains, such as that offered by Sumo Logic, can provide you with full-stack visibility and search across your network, infrastructure, and application logs.
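The response-time side of APM described above ultimately reduces to statistics over a stream of measurements. A toy sketch of the idea (the samples, the nearest-rank percentile helper, and the 500 ms SLA threshold are all invented for illustration):

```python
def percentile(samples, pct):
    """Nearest-rank style percentile over a list of numbers (simplified)."""
    ordered = sorted(samples)
    index = min(len(ordered) - 1, int(round(pct / 100.0 * len(ordered))))
    return ordered[index]

# Hypothetical response times collected under load, in milliseconds.
# Note how a single slow outlier dominates the tail.
response_times_ms = [120, 95, 110, 130, 2400, 105, 98, 115, 101, 99]

p95 = percentile(response_times_ms, 95)
if p95 > 500:  # hypothetical SLA threshold
    print("p95 response time of %d ms exceeds the 500 ms threshold" % p95)
```

This is exactly why averages are a poor user-experience metric: the mean of these samples looks acceptable, while the tail that real users feel does not.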

July 27, 2017

Blog

How to prevent Cloud Storage Data Leakage

Blog

Jenkins, Continuous Integration and You: How to Develop a CI Pipeline with Jenkins

Continuous Integration, or CI for short, is a development practice wherein developers can make changes to project code and have those changes automatically trigger a process which builds the project, runs any test suites, and deploys the project into an environment. This enables teams to rapidly test and develop ideas, bring innovation to market faster, and detect issues much earlier in the process than with traditional software development approaches.

With its roots in Oracle's Hudson server, Jenkins is an open source integration server written in Java. The server can be extended through the use of plugins and is highly configurable. Automated tasks are defined on the server as jobs, and can be executed manually on the server itself, or triggered by external events, such as merging a new branch of code into a repository. Jobs can also be chained together to form a pipeline, taking a project all the way from code to deployment, and in some cases even monitoring of the deployed solution.

In this article, we're going to look at how to set up a simple build job on a Jenkins server, and at some of the features available natively on the server to monitor and troubleshoot the build process. This article is intended as a primer on Jenkins for those who have not used it before, or have never leveraged it to build a complete CI pipeline.

Before We Get Started

This article assumes that you already have a Jenkins server installed on your local machine or on a server to which you have access. If you have not yet accomplished this, the Jenkins community and documentation are excellent sources of information and resources. Jenkins is published under the MIT License and is available for download from its GitHub repository, or from the Jenkins website. Within the Jenkins documentation, you'll find a Guided Tour, which will walk you through setting up a pipeline on your Jenkins box. One of the advantages of taking this tour is that it shows you how to create a configuration file for your pipeline, which you can store in your code repository, side by side with your project. The one downside of the examples presented is that they are very generic. For a different perspective on Jenkins jobs, let's look at creating a build pipeline manually through the Jenkins console.

Creating a Build Job

For this example, we'll be using a project on GitHub that creates a Lambda function to be deployed on AWS. The project is Gradle-based and will be built with Java 8. The same principles can be applied to other code repositories and build and deployment situations.

Log in to your Jenkins server, and select New Item from the navigation menu. (Image: Jenkins New Item workflow.) Choose a name for your project, select Freestyle project, and then scroll down and click OK. I'll be naming mine Build Example Lambda. When the new project screen appears, follow the steps below. Not all of them are strictly necessary, but they'll make maintaining your project easier.

1. Enter a Description for your project, describing what this pipeline will be doing.
2. Check Discard old builds, and select the Log Rotation strategy with Max # of builds to keep set to 10. These are the settings I use, but you may select different numbers. This option prevents old builds from taking up too much space on your server.
3. Add a parameter for the branch to build, defaulted to master. This will allow you to build and deploy from a different branch if the need arises. Select This project is parameterized, click Add Parameter, and select String Parameter. Then enter:
   - Name: BRANCH
   - Default Value: master
   - Description: The branch from which to pull. Defaults to master.
4. Scroll down to Source Code Management and select Git. Enter the Repository URL; in my case, I entered https://github.com/echovue/Lambda_SQSMessageCreator.git. You may also add credentials if your Git repository is secure, but setting that up is beyond the scope of this article.
5. For the Branch Specifier, we'll use the parameter we set up previously. Parameters are referenced by enclosing the parameter name in curly braces and prefixing it with a dollar sign, so update this field to read */${BRANCH} (Image: Git configuration using parameterized branch.)
6. For now, we'll leave Build Triggers alone.
7. Under Build Environment, select Delete workspace before build starts, to ensure that each build starts with a clean environment.
8. Under Build, select Add build step, and choose Invoke Gradle script. When I build my project locally, I run ./gradlew build fatJar on the command line. To accomplish this as part of the Jenkins job, select Use Gradle Wrapper, check From Root Build Script Dir, and for Tasks, enter: build fatJar
9. Finally, I want to save the fat JAR which is created in the /build/libs folder of my project, as this is what I'll be uploading to AWS in the next step. Under Post-build Actions, select Add post-build action and choose Archive the artifacts. In files to archive, enter build/libs/AWSMessageCreator-all-*

Click on Save, and your job will be created. To run it, simply click on Build with Parameters. If the job completes successfully, you'll have a JAR file which can then be deployed to AWS Lambda. If the job fails, you can click on the job number, and then click on Console Output to troubleshoot.

Next Steps

If your Jenkins server is hosted on a network that can be reached from the network hosting your code repository, you may be able to set up a webhook to trigger the build job when changes are merged into the master branch. The next logical step is to automate the deployment of the new build to your environment if it builds successfully. Install the AWS Lambda Plugin and the Copy Artifact Plugin on your Jenkins server, and use them to create a job that deploys your Lambda to AWS by copying the JAR file we archived as part of the build job above. When the deployment job has been created, open the build job and click on the Configure option. Add a second Post-build Action, Build other projects; enter the name of the deployment project, and select Trigger only if build is stable. At this point, a successful run of the build job will automatically start the deployment job. Congrats! You've now constructed a complete CI pipeline with Jenkins.
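The freestyle job above can also be expressed as a declarative Pipeline stored alongside the code, as the Guided Tour describes. A rough sketch of an equivalent Jenkinsfile follows; the repository URL and Gradle tasks mirror the example job, but the stage layout is illustrative rather than a drop-in file:

```groovy
pipeline {
    agent any
    parameters {
        string(name: 'BRANCH', defaultValue: 'master',
               description: 'The branch from which to pull. Defaults to master.')
    }
    stages {
        stage('Checkout') {
            steps {
                git url: 'https://github.com/echovue/Lambda_SQSMessageCreator.git',
                    branch: "${params.BRANCH}"
            }
        }
        stage('Build') {
            steps {
                // Same Gradle tasks as the freestyle job
                sh './gradlew build fatJar'
            }
        }
    }
    post {
        success {
            // Keep the fat JAR so a downstream job can deploy it to AWS Lambda
            archiveArtifacts artifacts: 'build/libs/AWSMessageCreator-all-*'
        }
    }
}
```

The advantage of this form is that the pipeline definition is versioned with the project, so changes to the build process go through the same review process as code changes.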

Blog

Use Sumo Logic to Collect Raspberry Pi Logs

June 18, 2017

Blog

Integrating Machine Data Analytics in New Relic Insights via Sumo Logic Webhooks

When Sumo Logic and New Relic announced a partnership at AWS re:Invent 2016, we immediately started hearing the excitement from our joint customers. The ability to combine the strengths of two leading SaaS services that offer fast time-to-value for monitoring and troubleshooting modern applications would offer a powerful and complete view of digital businesses, from the client down to the infrastructure. Today, we're pleased to announce another advancement in our partnership: integrated machine data analytics with application and infrastructure performance data in New Relic Insights, via a custom New Relic webhook built directly into Sumo Logic.

Unlocking Insights from Sumo Logic

Scheduled searches in Sumo Logic allow you to monitor and alert on key events occurring in your application and infrastructure. The flexibility of the query language allows you to pull just the information you need while fine-tuning the thresholds to trigger only when necessary. Combined with your New Relic APM and New Relic Infrastructure data in New Relic Insights, you'll now be able to visualize information such as:

- Events: service upgrades, exceptions, or server restarts, for example
- Alerts: more than 10 errors seen in 5 minutes, for example, or failed login attempts exceeding 5 in 15 minutes
- KPIs: count of errors by host, for example, or top 10 IPs by number of requests

Integrating these insights into New Relic provides shared context for faster root cause analysis and reduced Mean Time to Resolution (MTTR), all within a single pane of glass. In just three simple steps, you'll be able to leverage Sumo Logic webhooks to send data to New Relic.

Step 1: Configure the New Relic webhook connection

In New Relic Insights, you will first need to register an API key that will be used by the Sumo Logic webhook. These keys allow you to securely send custom events into New Relic from different data sources. Type in a short description to keep a record of how this API key will be used, then copy the Endpoint and Key for setup in Sumo Logic.

In Sumo Logic, create a New Relic webhook connection and insert the Endpoint and Key into the URL and Insert Key fields. The payload field gives you the flexibility to customize the event for viewing in New Relic. In addition to the actual results, you can optionally specify metadata to provide additional context: for example, the name of the Sumo Logic search, a URL to that particular search, a description, and more. This payload can also be customized later when you schedule the search, and variables from your Sumo Logic search can be included in the payload for additional context in New Relic.

Step 2: Schedule a search to send custom events

After saving your New Relic webhook, you have the option to specify it as the destination for any scheduled search in Sumo Logic. The example below shows a query that looks for "Invalid user" in our Linux logs every 15 minutes. To store and visualize this information in New Relic, we simply schedule a search, select the New Relic webhook that we configured in Step 1, and customize the payload with any additional information we want to include. This payload will send each result row from Sumo Logic as an individual event in New Relic.

Step 3: Visualize events in New Relic Insights

Once the scheduled search has been saved and triggered, we can see the data populating in New Relic Insights and use the New Relic Query Language (NRQL) to create the visualizations we need. NRQL's flexibility lets you tailor the data to your use case, and the visualization options make it seamless to place alongside your own New Relic data. In fact, you might not even notice the difference between the data sources. Can you tell which data is coming from New Relic, and which is coming from Sumo Logic? In our unified view, "Source IPs from Failed Attempts" streams in from Sumo Logic, while "Errors by Class" comes from New Relic.

The ability to visualize application and infrastructure performance issues alongside insights from your logs reduces the need to pivot between tools, which can speed root cause analysis. If you've spotted an issue that requires a deeper analysis of your logs, you can jump right into a linked Sumo Logic dashboard or search to leverage machine learning and advanced analytics capabilities.

Learn More

Head over to Sumo Logic DocHub for more details on how to configure the New Relic webhook, then schedule some searches to send custom events to New Relic Insights. We're excited to continue advancing this partnership, and we look forward to sharing more with you in the future. Stay tuned!
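For reference, a webhook payload might look something like the sketch below. The field names after eventType are illustrative, and the {{...}} placeholders stand in for Sumo Logic webhook variables that are filled in when the scheduled search fires; check the Sumo Logic webhook documentation for the exact variable names supported by your version.

```json
{
  "eventType": "SumoLogicEvent",
  "SearchName": "{{SearchName}}",
  "SearchDescription": "{{SearchDescription}}",
  "SearchQueryUrl": "{{SearchQueryUrl}}",
  "NumRawMessages": "{{NumRawMessages}}",
  "RawResultsJson": "{{RawResultsJson}}"
}
```

Because eventType becomes the event's name in Insights, a NRQL query such as SELECT count(*) FROM SumoLogicEvent can then chart the incoming Sumo Logic data alongside native New Relic events.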

June 8, 2017

Blog

Disrupting the Economics of Machine Data Analytics

The power of modern applications is their ability to leverage the coming together of mobile, social, information, and cloud to drive new and disruptive experiences, enabling companies to be more agile, to accelerate the pace at which they roll out new code, and to adopt DevSecOps methodologies in which the traditional siloed walls between teams are disappearing. But these modern applications are highly complex, with new development and testing processes, new architectures, new tools (containers, microservices, and configuration management tools), SLA requirements, security-in-the-cloud concerns, and an explosion of data sources coming from these new architectures as well as IoT.

In this journey to the cloud with our 1500+ customers, we have learned a few things about their challenges:

- All of this complexity and volume of data creates unprecedented challenges in enabling ubiquitous user access to machine data to drive continuous intelligence across operational and security use cases.
- In this new world of modern applications and cloud infrastructures, customers recognize that not all data is created equal: data differs in importance, life expectancy, the access performance needed, and the types of analytics that need to be run against it. Think IT operations data (high value, short life span, frequent and high-performance access needs) vs. regulatory compliance data (long-term storage, periodic searches, especially at audit times, where slower performance may be acceptable).
- Data ingest in certain verticals, such as retail and travel, fluctuates widely, and provisioning at maximum capacity, with idle capacity for the majority of the year, is unacceptable in this day and age.

So if we step back for a moment and look at the industry as a whole, what is hindering a company's ability to unleash its full data potential? The root of the problem comes from two primary areas:

1. The more data we have, the higher the cost.
2. The pricing models of current solutions are based on the volume of data ingested and are not optimized for the varying use cases we are seeing; it is a "one size fits all" approach.

Unfortunately, organizations are often forced to make a trade-off because of the high cost of current pricing models, something we refer to as the data tax: the cost of moving data into your data analytics solution. They have to decide "What data do I send to my data analytics service?" as well as "Which users do I enable with access?" As organizations build out new digital initiatives or migrate workloads to the cloud, making these kinds of trade-offs will not lead to ultimate success. What is needed is a model that will deliver continuous intelligence across operational and security use cases, one that leverages all kinds of data without compromise. We believe there is a better option: one which leverages our cloud-native machine data analytics platform, shifting from a volume-based approach (fixed, rigid, static) to a value-based pricing model (flexible and dynamic), aligned with the dynamic nature of the modern apps our customers are building. One that moves us to a place where the democratization of machine data is realized!

Introducing Sumo Logic Cloud Flex

As this launch was being conceived, there were four primary goals we set out to accomplish:

- Alignment: alignment between how we price our service and the value customers receive from it
- Flexibility: maximum flexibility in the data usage and consumption controls that best align to the various use cases
- Universal Access: universal access to machine data analytics for all users, not just a select few
- Full Transparency: real-time dashboards on how our service is being used, the kinds of searches people are running, and the performance of the system

And there were four problem areas we were trying to address:

- Data Segmentation: different use cases require different retention durations
- Data Discrimination: not all data sets require the same performance and analytics capabilities; it is not economical to store and analyze low-value data sets, or to store data sets for long periods of time, especially for regulatory compliance mandates
- Data Ubiquity: it is not economical for all users to access machine data analytics
- Data Dynamics: support for seasonal business cycles, aligning revenue with opex

With this Cloud Flex launch, Sumo Logic introduces the following product capabilities to address these four pain points:

- Variable Data Retention
- Analytics Profile
- Unlimited Users
- Seasonal Pricing

If increasing usage flexibility in your data analytics platform is of interest, please reach out to us. If you would like more information on Cloud Flex and democratizing machine data analytics, please read our press release.

June 6, 2017

Blog

Universal Access

"In God we trust; all others must bring data." – W. Edwards Deming

Over the years, we've all internalized the concept that data-driven decision making is key. We've watched as digital businesses like Airbnb, Uber, and Amazon far outpace their competitors in market share and profitability. They do this by tapping into continuous intelligence: they use the data generated as users interact with their applications to provide customized experiences, allowing them to learn and adapt quickly to where their users want to be.

I had always imagined decision-making at these companies to be kind of like the stock photo: a well-heeled executive looking at an interesting chart and experiencing that moment of brilliant insight that leads to a game-changing business decision. The reality, though, is that it was never as simple as that. It was hard work. It was not one key decision made by one key executive. It was hundreds of small everyday decisions, made across all levels of the organization by all kinds of individuals, that slowly but surely inched the company across the finish line of success and sustain it today in its continued growth. The better equipped employees were with the relevant data, the better they could execute in their roles.

At most companies, the decision to be data-driven is simple; actually getting to that state, not so much. Conversations might go something like this: "We should be more data-driven!" "Yeah!" "What data do we need?" "Depends, what are we trying to solve?" "Where's the data?" "How do we get to it?" "What do we do once we have it?" "How can we share all this goodness?"

At Sumo Logic, we've already cracked the hard fundamental problems of getting to the data and being able to ask meaningful questions of that data.
We support a vast and scalable set of collection tools, and at Sumo’s core is a very powerful machine data analytics platform that allows our customers to query their logs and metrics data to quickly derive insights and address impactful operational and security issues. We’re working our way up the problem chain. Now that we can easily get to the data and analyze it – how do we use it as an organization? How can our machine data tell our infrastructure engineers about the health of the service, inform the support team about performance against SLAs, help PMs understand user adoption, and find a way to summarize all of this into a format that can be presented to executives and key stakeholders? To solve this, we recently introduced the concept of public dashboards: data-rich dashboards that can be shared outside of Sumo and across the organization. This helped expose data to users who relied on it to make decisions, but who were far removed from the actual applications and infrastructure that generated it. Now, we’re tackling a deeper issue: how do users and teams collaborate on data analysis within Sumo? How do they learn from each other about the kinds of metrics other teams collect and what best practices look like, and how do they grow as an organization as they become empowered with this data? We plan to solve this, later this year, by allowing users to share their dashboards, log searches and metrics queries with other users and roles in their organization. Teams can collaborate with each other and control with granularity how different users and roles in the organization can edit or view a dashboard or a search. Administrators can efficiently organize and make accessible the content that’s most relevant for a particular group of people. We’ve embraced the concept of Universal Access to mean accessibility to Sumo and, more importantly, the data in Sumo, for all users regardless of their skill or experience levels with Sumo. 
We’ve redesigned Sumo to be contextual and intuitive with the introduction of simpler navigation and workflows. Current users will appreciate the new types of content that can be opened in tabs – such as dashboards, log searches, metrics queries and live tail – and the fact that these tabs persist across login sessions. New users will have a smoother onboarding experience with a personalized homepage. To check out the new UI (beta) and learn more about how Sumo Logic can help your organization be more data-driven, sign up today!

June 6, 2017

Blog

Journey to the Cloud, with Pivotal and Sumo Logic

There is no denying it – the digital business transformation movement is real, and the time for this transformation is now. When, according to a survey from Bain & Company, 48 of the 50 Fortune Global companies have publicly announced plans to adopt public cloud, it is clear that no industry is immune from this disruption. We are seeing traditional industries such as insurance, banking, and healthcare carving out labs and incubators that bring innovative solutions to market, and establishing processes and platforms to help the rest of the organization with their evolution. For large enterprises, it is critical that they manage the challenges of moving to public cloud while satisfying the needs of a diverse set of internal customers. They need to support a variety of development languages, multiple deployment tool chains, and a mix of data centers and multiple public cloud vendors. Because these are long-term strategies that involve considerable investment, they are concerned about long-term vendor lock-in, and are being proactive about developing strategies to mitigate those risks. These organizations are looking toward cloud-neutral commercial vendors to help them migrate to the cloud and have consistency in how they deploy and manage their applications across heterogeneous environments. These enterprises are increasingly turning to Pivotal Cloud Foundry® to help them abstract their app deployments from the deployment specifics of individual cloud platforms, and maintain their ability to move apps and workloads across cloud providers when the time comes. Effective DevOps Analytics for the Modern Application The migration of enterprise workloads to the cloud, and the rise of public cloud competition, is driving the demand for Sumo Logic as a cloud-native platform for monitoring and securing modern applications. Pivotal Cloud Foundry enables users to abstract the underlying plumbing necessary to deploy, manage and scale containerized cloud native applications. 
This benefits developers by greatly increasing their productivity and ability to launch applications quickly. Such an environment also exposes a broader set of operational and security constructs that are useful to track, log and analyze. However, it can also be more complicated to diagnose performance issues with decoupled architectures and composable microservices. Full-stack observability and the ability to trace all the apps and services together are critical to successful cloud deployments. Observability of decoupled architectures with composable services requires the ability to trace all layers of the stack. With Pivotal Cloud Foundry and tools from Sumo Logic, an organization can have an observable, enterprise-class platform for application delivery, operations, and support across multiple public cloud providers and on-premises data centers. Beyond platform operations, Cloud Foundry customers want to enable their app teams to be self-sufficient, and promote an agile culture of DevOps. Often, with legacy monitoring and analytics tools, the operations team will have access to the data, but they can’t scale to support the application teams. Or, the apps team may restrict access to their sensitive data, and therefore not support the needs of the security and compliance team. Sumo Logic believes in democratized analytics. This means that this massive flow of highly valuable data, from across the stack and cloud providers, should be available to everyone who can benefit from it. This requires the right level of scale, security, ubiquity of access, and economics that only Sumo Logic can provide. Sumo Logic & Pivotal Cloud Foundry Partnership Through our collaboration with Pivotal®, Sumo Logic has developed an app for Pivotal Cloud Foundry, as well as an easy-to-deploy integration with Pivotal Cloud Foundry Loggregator. 
A customer-ready beta of the “Sumo Logic Nozzle for PCF” is available now as an Operations Manager Tile for Pivotal Cloud Foundry, available for download from Pivotal Network. Sumo Logic Tile Installed in the PCF Ops Manager If you are already using or evaluating Pivotal Cloud Foundry, you can get started with operational and security analytics in a matter of minutes. With this integration, all of the log and metrics data collected by Cloud Foundry Loggregator will be streamed securely to the Sumo Logic Platform. For deployments with security and compliance requirements, Sumo Logic’s cloud-based service is SOC 2, HIPAA, and PCI-compliant. The Sumo Logic integration for Pivotal Cloud Foundry will be available in the App Library soon. If you would like early access, please contact your account team. Sumo Logic App for Pivotal Cloud Foundry highlights key Pivotal data and KPIs The Sumo Logic App for Pivotal Cloud Foundry highlights key Pivotal data and KPIs. Sumo Logic’s App for Cloud Foundry operationalizes Pivotal Cloud Foundry’s monitoring best practices for you, and provides a platform for you to build upon to address your unique monitoring and diagnostic requirements.

Blog

The Democratization of Machine Data Analytics

Earlier today we announced a revolutionary set of new platform services and capabilities, so I wanted to provide more context around this and our strategy. While this new announcement is very exciting, we have always been pushing the boundaries to continuously innovate in order to remove the complexity and cost associated with getting the most value out of data. Whether it’s build-it-yourself open source toolkits or legacy on-premise commercial software packages, the “data tax” associated with these legacy licensing models, not to mention their technology limitations, has prevented universal access for all types of data sources and, more importantly, users. This strategy and the innovations we announced address the digital transformation taking place industry-wide, led by the mega trends of cloud computing adoption, DevSecOps and the growth of machine data. For example, IDC recently forecasted public cloud spending to reach $203.4 billion by 2020, while Bain & Company’s figure is nearly twice that at $390 billion. Whatever number you believe, the bottom line is that public cloud adoption is officially on a tear. For example, according to Bain, 48 of the 50 Fortune Global companies have publicly announced cloud adoption plans to support a variety of needs. In the world of DevSecOps, our own Modern App Report released last November substantiated the rise of a new modern application stack, replete with new technologies, such as Docker, Kubernetes, NoSQL, S3, Lambda, and CloudFront, that are seriously challenging the status quo of traditional on-premise standards from Microsoft, HP, Oracle, and Akamai. However, the most significant, and arguably the most difficult, digital transformation trend for businesses to get their arms around is the growth of machine data. According to Barclays’ Big Data Handbook, machine data will account for 40 percent of all data created by 2020, reaching approximately 16 zettabytes. 
(To put that number in perspective, 16 zettabytes is equivalent to streaming the entire Netflix catalogue 30 million times!) Since machine data is the digital blueprint of digital business, its rich source of actionable insights either remains locked away or is, at best, difficult to extract because of expensive, outdated, disparate tooling that limits visibility, impedes collaboration and slows down the continuous processes required to build, run, secure and manage modern applications. Seven years ago, Sumo Logic made a big bet: disrupt traditional Big Data models, with their lagging intelligence indicators, by pursuing a different course – a real-time, continuous intelligence platform strategy better equipped to support the continuous innovation models of transformational companies. Now this need is becoming more critical than ever as the laggards make their shift and cloud computing goes mainstream, which will not only drive those market data numbers even higher, but also put the squeeze on the talent necessary to execute the shift. That’s why Sumo Logic’s vision, to “Democratize Machine Data,” now comes to the forefront. To truly enable every company to have the power of real-time machine data analytics, we believe the current licensing, access and delivery models surrounding machine data analytics are also ripe for disruption. Our announcement today provides essential new innovations – ones that are only achievable because of our market-leading, multi-tenant, cloud-native platform – that remove the economic, access and visibility barriers holding companies back from reaching their full data-insight potential. They are: Sumo Cloud Flex: a disruptive data analytics economic model that enables maximum flexibility to align data consumption and use with different use cases, and provides universal access by removing user-based licensing. 
While this was purpose-built and optimized for the massive untapped terabyte-volume data sets, it’s also applicable to highly variable data sets. Unified Machine Data Analytics: new, native PaaS- and IaaS-level integrations to our cloud-native machine data analytics platform to support data ingest from a variety of cloud platforms, apps and infrastructures. These additions will enable complete visibility and holistic management across the entire modern application and infrastructure stack. Universal Access: new experience capabilities, such as a contextual and intuitive user interface to improve user productivity, public dashboards, and improved content sharing for faster collaboration with role-based access controls (RBAC). With this innovation, machine data insights are easier to access for non-technical, support services and business users. Over time, we predict ease-of-use initiatives like this will be one of the drivers that help close the current data scientist/security analyst talent gap. With our new innovations announced today, plus more coming later in the year, Sumo Logic is positioned to become the modern application management platform for digital transformation, delivered to our customers as a low-TCO, scalable, secure service. That’s because machine data analytics will be the layer that provides complete visibility to manage the growing complexity of cloud-based modern applications, which is sorely needed today and in the future. As the leading cloud-native machine data analytics service on AWS, we serve more than 1,500 customers, from born-in-the-cloud companies like Salesforce, Twilio and AirBnB to traditional enterprises such as Marriott, Alaska Airlines and Anheuser-Busch. On average, our platform analyzes 100+ petabytes of data, executes more than 20 million searches, and queries 300+ trillion records every day. 
While these numbers seem massive, they keep growing, and yet we are only at the beginning of this massive opportunity. Other machine data analytics options, such as cobbling a solution together with old technologies or trying to build it on your own, fall short because they don’t address the fundamental problem: machine data will just keep growing. To address this, the data layer must be re-architected – similar to the compute layer – to utilize the power of true distributed computing to address a problem that is never over – the volume, velocity and variety of machine data growth – and to do so in a way that meets the speed, agility and intelligence demands of digital business. You can’t put old enterprise application architectures on top of the cloud and expect to be prepared. Sumo Logic’s ability to flexibly manage and maximize data and compute resources – the power of a multi-tenant, distributed system architecture – across 1,500+ customers means our customers have the ability to maximize their data-insight potential and realize the holy grail of being real-time, data-driven businesses. We invite you to experience the power of Sumo Logic for free. As always, I look forward to your feedback and comments.

June 6, 2017

Blog

Graphite vs. Sumo Logic: Building vs. Buying value

No no no NOOOOO NOOOOOOOO… one could hear Mike almost yelling while staring at his computer screen. He had suddenly lost the SSH connection to one of the core billing servers, in the middle of the manual backup he needed to finish before he could upgrade the system with the latest monkey patch. Mike had gained a lot of visibility, and a promotion, after his last initiative: migrating an ELK Stack to Sumo Logic, a SaaS-based machine data analytics platform. He improved MTTI/MTTR by 90% and the uptime of the log analytics service by 70% in less than a month. With the promotion, he was put in charge of the newly formed site reliability engineering (SRE) team, with four people reporting to him. It was a big deal. This was his first major project after the promotion and he wanted to ensure that everything went well. But just now, something had happened to the billing server, and Mike had a bad feeling about it. He waited a few minutes to see if the billing server would start responding again. It had happened before that the SSH client temporarily lost the connection to the server; the root cause then was the firewall in the corporate headquarters, which had to be upgraded to fix the issue. This time, Mike was convinced it wasn’t the firewall – something else had happened to the billing server – and this time around there was a way to confirm his hunch. To see what happened to the billing server, he ran a query in Sumo Logic: _sourceHost=billingserver AND "shut*" He quickly realized that the server had been rebooted. He broadened the search to a ±5 minute range around the log message and identified that the disk was full. He added more disk to the existing server to ensure that the billing server would not restart again for lack of hard drive space. However, Mike had no visibility into host metrics such as CPU, disk, and memory usage. He needed a solution to gather host and custom metrics. He couldn’t believe the application had been managed without these metrics. 
He knew very well that metrics must be captured to get visibility into system health, so he reprioritized his metrics project over making the ELK stack and the entire infrastructure PCI compliant. Stage 1: Installing Graphite After a quick search, he identified Graphite as one of his options. ELK had left a bad taste in his mouth, costing him an arm and a leg for just a search feature. This time, though, he thought it would be different. Metrics were only 12 bytes in size! How hard could it be to store 12 bytes of data for 200 machines? He chose Graphite as the open-source host metrics system. He downloaded and installed the latest Graphite on an AWS t2.medium; at $0.016 USD per hour, Mike could get 4 GB RAM with 2 vCPUs, and for less than $300 USD he was ready to test his new metrics system. Graphite has three main components: Carbon, Whisper, and Graphite Web. Carbon listens on a TCP port and expects time series metrics. Whisper is a flat-file database, while Graphite Web is a Django application that can query the carbon-cache and Whisper. He installed all of this on one single server. The logical architecture looks something like Figure 1 below. Figure 1: Simple Graphite Logical Architecture on a Single Server Summary: At the end of stage 1, Mike had a working solution with a couple of servers on AWS. Stage 2: New metrics stopped updating – the first issue with Graphite On a busy day, new metrics suddenly stopped showing up in the UI. This was the first time, after a few months of operations, that Graphite was facing issues. After careful analysis, it was clear that metrics were getting written to the Whisper files. Mike thought for a second and remembered that Whisper pre-allocates disk space to its files based on the retention configuration (in storage-schemas.conf). To make it more concrete, 31.1 MB is pre-allocated by Whisper for one metric collected every second for one host and retained for 30 days. 
Total Metric Storage = 1 host × 1 metric/sec × 60 sec × 60 min × 24 hrs × 30 days retention. He realized that he might have run out of disk space, and sure enough, that was the case. He doubled the disk space and restarted the Graphite server, and new data points started showing up. Mike was happy that he was able to resolve the issue before it got escalated. However, his mind started creating “what-if” scenarios. What if the application he is monitoring goes down exactly when Graphite gives up? He parked that scenario in the back of his head and went back to working on other priorities. Summary: At the end of stage 2, Mike had already incurred additional storage cost, ending up with an EBS Provisioned IOPS volume. SSD would have been better, but this was the best he could do with the allocated budget. Stage 3: And again new metrics stopped updating On a Saturday night at 10 PM there was a marketing promotion. It suddenly went viral and a lot of users logged into the application. Engineering had auto-scaling enabled on the front end, and Mike had ensured that new images would automatically enable StatsD. Suddenly the metrics data points per minute (DPM) grew significantly, way above the average DPM. Mike had no idea about this series of events. The only information in the ticket he received was “New metrics are not showing up, AGAIN!” He quickly found out the following: MAX_UPDATES_PER_SECOND, which caps how many Whisper updates happen per second, was being hit as the update rate gradually increased, and MAX_CREATES_PER_MINUTE was at its max. Mike quickly realized the underlying problem: disk I/O pressure was causing the server to crash because the Graphite server was running out of memory. Here is how he connected the dots. Auto-scaling kicked in and suddenly 800 servers started sending metrics to Graphite, four times the load of the average number of hosts running at any given time. This quadrupled the metrics ingested as well. 
The Graphite settings MAX_UPDATES_PER_SECOND and MAX_CREATES_PER_MINUTE reduce the load on disk I/O, but they have an upstream impact: the carbon-cache starts using more and more memory. With MAX_CACHE_SIZE set to infinite, the carbon-cache kept storing in memory the metrics waiting to be written to Whisper on disk. When the carbon-cache process ran out of memory it crashed, and sure enough, metrics stopped getting updated. So Mike added an EBS volume with provisioned I/O and upgraded the server to an M3 Medium instead of a t2. Summary: At the end of stage 3, Mike had already performed two migrations. First, after changing the hard drive he had to transfer the Graphite data. Second, after changing the machine he had to reinstall and repopulate the data – not to mention reconfigure all the clients to send metrics to the new server. Figure 2: Single Graphite M3 Medium Server after Stage 3 Stage 4: Graphite gets resiliency, but at what cost? From his earlier ELK experience, Mike had learned one thing: he could not have any single point of failure in his data ingest pipeline, and at the same time he had to solve for the carbon-relay crash. Before anything else happened he had to resolve the single point of failure in the above architecture and allocate more memory to carbon-relay. He decided to replicate a similar Graphite deployment in a different availability zone. This time he turned on replication in the configuration file and created the architecture below, which ensures replication and adds more memory to the carbon-relay process so that it can hold metrics in memory while Whisper is busy writing them to disk. Summary: At the end of stage 4, Mike has resiliency with replication and more memory for the carbon-relay process. This change has doubled the Graphite cost. Figure 3: Two Graphite M3 Medium Servers with replication after Stage 4 Stage 5: And another one bites the dust… yet another carbon-relay issue. 
Mike was standing in line for a hot breakfast. At this deli, one pays first and then gets the breakfast, and he saw a huge line at the cashier. The cashier seemed to be a new guy; he was slow, and the line was getting longer and longer. It was morning and everyone wanted to get back quickly. Suddenly Mike’s brain started drawing an analogy: the carbon-relay is the cashier, the person serving the breakfast is the carbon-cache, and the chef is Whisper. The chef takes the longest because he has to cook the breakfast. Suddenly he realized the flaw in his earlier design. There is a line port (TCP 2003) and a pickle port (TCP 2004) on carbon-relay, and every host is configured to send metrics to those ports. The moment the carbon-relay gets saturated, there is no way to scale it up without adding new servers, network reconfiguration, and host configuration changes. To avoid that kind of disruption, he quickly came up with a new design he called the relay sandwich: he separated HA Proxy onto its own dedicated server, and carbon-relay also got its own server so that it could scale horizontally without changing the configuration at the host level. Summary: Each Graphite instance has four servers, for a total of eight servers across two Graphite instances. At this point, the system is resilient with headroom to scale carbon-relay. Figure 4: Adding more servers with HA Proxy and Carbon Relay Stage 6: Where is my UI? As you must have noticed, this is just the backend architecture. Mike was the only person running the show, but if he wants more users to have access to this system, he must scale the front end as well. He ended up installing Graphite-Web, and the final architecture became as shown in Figure 5. Summary: Graphite evolved from a single server to a 10-machine Graphite cluster managing metrics for only a fraction of their infrastructure. Figure 5: Adding more servers with HA Proxy and Carbon Relay Conclusion: It was déjà vu for Mike. 
He had seen this movie before with ELK. Twenty servers in with Graphite, he was just getting started: he quickly realized that if he enabled custom metrics he would have to double the size of his Graphite cluster. And there was a deeper issue: Graphite’s metrics indicate “what” is wrong with the system, while the Sumo Logic platform, with correlated logs and metrics, indicates not only “what” is wrong but also “why.” Mike turned on Sumo Logic metrics on the same collectors already collecting logs and got correlated logs and metrics on the Sumo Logic platform. Best part: he is not on the hook to manage the management system.
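The sizing trap from stage 2 is worth making concrete. The sketch below is illustrative arithmetic, not Graphite code: Whisper stores roughly 12 bytes per datapoint and pre-allocates the entire retention window when a metric file is created, so a metric's disk footprint is fixed by the retention policy, not by how much data has actually arrived. The 200-host figure is the scenario from the story; the 100 metrics per host is an assumed, modest metric set to show how quickly the pre-allocation multiplies.

```python
# Illustrative sketch of Whisper's pre-allocation arithmetic (not Graphite code).
# Whisper writes ~12 bytes per datapoint and allocates the whole retention
# window up front, so a metric's disk footprint is fixed at creation time.

BYTES_PER_POINT = 12  # timestamp + value per datapoint

def whisper_file_bytes(points_per_second: float, retention_days: int) -> int:
    """Bytes pre-allocated for ONE metric at the given resolution and retention."""
    points = int(points_per_second * 60 * 60 * 24 * retention_days)
    return points * BYTES_PER_POINT

def fleet_storage_gb(hosts: int, metrics_per_host: int,
                     points_per_second: float, retention_days: int) -> float:
    """Total pre-allocated storage across a fleet, in (decimal) GB."""
    per_metric = whisper_file_bytes(points_per_second, retention_days)
    return hosts * metrics_per_host * per_metric / 1e9

# Mike's example: one metric per second, retained 30 days.
print(f"{whisper_file_bytes(1, 30) / 1e6:.1f} MB per metric")  # 31.1 MB
# The same math across 200 hosts with 100 metrics each:
print(f"{fleet_storage_gb(200, 100, 1, 30):.0f} GB pre-allocated")  # 622 GB
```

This is why "metrics are only 12 bytes" is misleading: the per-point cost is tiny, but retention times hosts times metrics is what fills the disk.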

May 31, 2017

Blog

6 Metrics You Should Monitor During the Application Build Cycle

Monitoring application metrics and other telemetry from production environments is important for keeping your app stable and healthy. That you know. But app telemetry shouldn’t start and end with production. Monitoring telemetry during builds is also important for application quality. It helps you detect problems earlier on, before they reach production. It also allows you to achieve continuous, comprehensive visibility into your app. Below, we’ll take a look at why monitoring app telemetry during builds is important, then discuss the specific types of data you should collect at build time. App Telemetry During Builds By monitoring application telemetry during the build stage of your continuous delivery pipeline, you can achieve the following: Early detection of problems. Telemetry statistics collected during builds can help you to identify issues with your delivery chain early on. For example, if the number of compiler warnings is increasing, it could signal a problem with your coding process. You want to address that before your code gets into production. Environment-specific visibility. Since you usually perform builds for specific types of deployment environments, app telemetry from the builds can help you to gain insight into the way your app will perform within each type of environment. Here again, data from the builds helps you find potential problems before your code gets to production. Code-specific statistics. App telemetry data from a production environment is very different from build telemetry. That’s because the nature of the app being studied is different. Production telemetry focuses on metrics like bandwidth and active connections. Build telemetry gives you more visibility into your app itself—how many internal functions you have, how quickly your code can be compiled, and so on. Continuous visibility. 
Because app telemetry from builds gives you visibility that other types of telemetry can’t provide, it’s an essential ingredient for achieving continuous visibility into your delivery chain. Combined with monitoring metrics from other stages of delivery, build telemetry allows you to understand your app in a comprehensive way, rather than only monitoring it in production. Metrics to Collect If you’ve read this far, you know the why of build telemetry. Now let’s talk about the how. Specifically, let’s take a look at which types of metrics to focus on when monitoring app telemetry during the build stage of your continuous delivery pipeline. Number of environments you’re building for. This might seem so basic that it’s not worth monitoring. But in a complex continuous delivery workflow, it’s possible that the types of environments you target will change frequently. Tracking the total number of environments can help you understand the complexity of your build process. It can also help you measure your efforts to stay agile by maintaining the ability to add or subtract target environments quickly. Total lines of source code. This metric gives you a sense of how quickly your application is growing—and by extension, how many resources it will consume, and how long build times should take. The correlation between lines of source code and these factors is rough, of course. But it’s still a useful metric to track. Build times. Monitoring how long builds take, and how build times vary between different target environments, is another way to get a sense of how quickly your app is growing. It’s also important for keeping your continuous delivery pipeline flowing smoothly. Code builds are often the most time-consuming process in a continuous delivery chain. If build times start increasing substantially, you should address them in order to avoid delays that could break your ability to deliver continuously. Compiler warnings and errors. 
Compiler issues are often an early sign of software architecture or coding issues. Even if you are able to work through the errors and warnings that your compiler throws, monitoring their frequency gives you an early warning sign of problems with your app. Build failure rate. This metric serves as another proxy for potential architecture or coding problems. Code load time. Measuring changes in the time it takes to check out code from the repository where you store it helps you prevent obstacles that could hamper continuous delivery. Monitoring telemetry during the build stage of your pipeline by focusing on the metrics outlined above helps you not only build more reliably, but also gain insights that make it easier to keep your overall continuous delivery chain operating smoothly. Most importantly, they help keep your app stable and efficient by assisting you in detecting problems early and maximizing your understanding of your application.
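As a sketch of how a few of these metrics could be captured, here is a hypothetical build wrapper that times a build and counts compiler warnings and errors in its output. The build command, the warning/error patterns, and the shape of the JSON record are all assumptions to adapt to your own toolchain; the point is simply to emit one structured telemetry record per build that your analytics platform can ingest.

```python
# Hypothetical build-telemetry wrapper (command and patterns are assumptions).
import json
import re
import subprocess
import sys
import time

WARNING_RE = re.compile(r"\bwarning\b", re.IGNORECASE)
ERROR_RE = re.compile(r"\berror\b", re.IGNORECASE)

def run_build(cmd):
    """Run a build command and return one telemetry record for it."""
    start = time.time()
    proc = subprocess.run(cmd, capture_output=True, text=True)
    output = proc.stdout + proc.stderr
    return {
        "build_seconds": round(time.time() - start, 2),
        "warnings": len(WARNING_RE.findall(output)),
        "errors": len(ERROR_RE.findall(output)),
        "failed": proc.returncode != 0,
    }

# Demo with a trivial stand-in "build" so the sketch runs anywhere;
# in practice cmd would be something like ["make", "-j4"].
record = run_build([sys.executable, "-c", "print('warning: demo')"])
print(json.dumps(record))  # ship this line to your log pipeline
```

Emitting the record as one JSON line per build makes it easy to chart build times, warning counts, and failure rates over time alongside your other telemetry.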

Blog

7 Ways the Sumo Logic Redesign Will Change Your Life

We’re excited to announce our biggest user experience overhaul as Sumo Logic enters its 8th year. Here’s a quick list of some amazing things in the new UI. 1. An integrated workspace Everything now opens in tabs. This makes workflows like drilling down into dashboards, switching between log searches, or jumping between Metrics and Log Searching much smoother. The workspace remembers your state, so when you log back into Sumo Logic, it fires up your tabs from the previous session. An Integrated Workspace 2. Quick access to your content Did you know about the Sumo Logic library? It’s front and center now so you can quickly access your saved content and content shared with you. If you find yourself running the same searches and opening the same dashboards over and over again, you can now quickly launch them from the Recent tab. Quick access to content 3. Sumo Home Do you feel like a deer in headlights when you log in and see the blinking cursor on the search page? Not anymore! Sumo Home gives you a starting point full of useful content. Future improvements will let you personalize this page for your unique workflows. 4. A modern, cleaner, more consistent interface A fresh set of icons, an updated content library, and tabbed, browser-like behavior are some of the many visual upgrades we made to get Sumo Logic ready for 2017. A modern interface 5. A beautiful App Catalog The App Catalog was redesigned from the ground up and now gives you a visual way of browsing through prebuilt content to help you get started faster and get more out of your logs. 6. Distraction-free mode Sometimes you need all the space you can get while troubleshooting. You can collapse the left navigation and pop out Dashboard tabs to give more real estate to your data. Distraction Free Mode 7. The Back button works Hitting your browser’s back button will no longer log you out of Sumo Logic, thanks to smart UI routing. We solved one of the biggest user pet peeves. Check out the redesign! 
If you have any feedback on the redesign, please feel free to reach out to us at ux-feedback@sumologic.com or leave us comments directly on the Home Page.

May 24, 2017

Blog

AWS Config: Monitoring Resource Configurations for Compliance

AWS Config is an indispensable service with a bit of an identity problem: It really should be called something like “AWS Monitors Everything And Keeps Your Apps In Compliance,” because it is that important. But since there’s no way to put everything it does in a short, snappy name, “AWS Config” will do. What does AWS Config do? Basically, it monitors the current and past configurations of your AWS resources, compares those configurations to your target configurations, and reports current configurations, changes to configurations, and the ways in which your resources interact (with reference to configuration). Let’s take a closer look at what that means and how it works, starting with the “how it works” part… How AWS Config Works AWS Config continually monitors the configuration of your AWS resources. It records configuration changes in a normalized format, and makes that information available through its API. It also compares current configurations with configuration standards you have established, and makes that information available in dashboard format via its API. AWS Config can also be optionally set to send text alerts regarding both configuration changes and its evaluation of existing configurations vs. your configuration standards. By default, AWS Config tracks the configuration of all of your resources, recording configurations, metadata, attributes, and associated relationships and events. You can, however, tell it to track only specific types of resources. It takes snapshots of resource configurations, and it records an ongoing stream of resource configuration changes, storing this data in configuration histories. These histories can include software (down to the application level), providing you with a comprehensive record of your AWS operation’s configuration. Configuration standards are contained in rules. 
You can use Amazon’s preconfigured set of rules (which may be fully adequate for many operations), customize those rules, or define your own set of rules. In all cases, AWS Config checks configurations against these rules, and reports the current state of compliance with them.

What AWS Config Means to You

What does this mean for your organization’s AWS operations? Monitoring is vital to any Internet or network-based application or service, of course. Without it, you cannot guarantee the functionality or security of your software. Configuration monitoring has a special role, since it provides direct insight into an application’s state, its relationship with its environment, and the rules and conditions under which it is currently operating. Most kinds of software monitoring are symptomatic, recording behavior in one form or another, whether it is I/O, CPU or memory use, calls to other modules or system resources, or error messages. This makes it possible to detect many types of trouble and track performance, but it generally does not directly indicate the cause of most functional or performance problems. Configuration monitoring, on the other hand, can give you a direct view into the possible causes of such problems. How does this work? Since AWS Config allows you to codify configuration rules, let’s start with compliance.

Regulatory Compliance

Many of the online services available today are in regulated industries. This is true of banking and other financial services, of course, but it also applies to such things as health services, insurance, and public utilities. In many cases, failure to comply with regulatory standards for online services can result in significant financial or even legal penalties. These standards (particularly those affecting confidentiality and data security) can be, and often are, reflected in configuration settings.
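The rule-compliance reporting described above can also be pulled programmatically. Here’s a minimal sketch using boto3’s AWS Config client; the rule names in the canned response are hypothetical examples, and the live call is factored out so the extraction logic can run without AWS credentials.

```python
# Sketch: pulling rule compliance from AWS Config with boto3 (assumed installed
# and configured with credentials). The helper is pure, so it can also be run
# against a canned response.

def noncompliant_rule_names(response):
    """Extract names of rules that AWS Config reports as NON_COMPLIANT."""
    return [
        item["ConfigRuleName"]
        for item in response.get("ComplianceByConfigRules", [])
        if item["Compliance"]["ComplianceType"] == "NON_COMPLIANT"
    ]

def fetch_compliance():  # not invoked here; requires live AWS credentials
    import boto3
    client = boto3.client("config")
    return client.describe_compliance_by_config_rule(
        ComplianceTypes=["NON_COMPLIANT"]
    )

# Canned response shaped like describe_compliance_by_config_rule output:
sample = {
    "ComplianceByConfigRules": [
        {"ConfigRuleName": "s3-bucket-public-read-prohibited",
         "Compliance": {"ComplianceType": "NON_COMPLIANT"}},
        {"ConfigRuleName": "root-account-mfa-enabled",
         "Compliance": {"ComplianceType": "COMPLIANT"}},
    ]
}
print(noncompliant_rule_names(sample))  # → ['s3-bucket-public-read-prohibited']
```

A list like this is exactly what you would feed into an alerting channel to notify the engineers responsible for compliance.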
If, for example, you provide online financial services, you may be required to provide a high level of security for both customer and transaction records, to maintain secure records of all activity, and to detect and record anomalous actions. At least some of these requirements may in turn require you to maintain specific configuration settings. If you include the required settings in your customized AWS Config rules, you will have a way to automatically determine whether your site’s configuration has gone out of compliance. You can set AWS Config to automatically send a text alert to the engineers and managers responsible for compliance, so that they can quickly investigate the problem and adjust the configuration to bring your services back into compliance. In-House Standards Even if you do not operate in a regulated industry, you may need to comply with in-house standards within your company, particularly when it comes to things such as security and performance, both of which can require you to maintain specific configuration settings. AWS Config can automatically notify you of any configuration changes which may have an effect on security or performance, so that you remain fully compliant with your company’s standards. Error and Performance Troubleshooting The configuration histories that AWS Config records can also be very valuable in tracing both errors and performance problems. You can look back through the historical record to find out when specific configuration changes took place, and try to correlate them with software failures or performance degradation. AWS Config and Sumo As is often the case with monitoring data, the output from AWS Config becomes considerably more valuable when it is integrated into a comprehensive, analytics-based dashboard system. The Sumo Logic App for AWS Config provides easy integration of AWS Config data into Sumo’s extensive analytics and dashboard system. 
It gives you not only a powerful overview, but also a detailed look at resource modifications, as well as drill-down insight into resource details. Analytics-based features such as these, which turn AWS Config’s raw data into genuine, multidimensional insights, make it possible to use such data for real-time configuration and performance management, security monitoring, and application optimization. Monitoring configuration data gives you greater hands-on control over security, performance, and functionality, and it provides you with insights which are simply not available with conventional, behavior-based application monitoring by itself. By combining AWS Config and the power of Sumo Logic’s analytics, you can turn your team into genuine software-management superheroes. About the Author Michael Churchman is involved in the analysis of software development processes and related engineering management issues. AWS Config: Monitoring Resource Configurations for Compliance is published by the Sumo Logic DevOps Community. If you’d like to learn more or contribute, visit devops.sumologic.com. Also, be sure to check out Sumo Logic Developers for free tools and code that will enable you to monitor and troubleshoot applications from code to production.

Blog

Mesosphere DC/OS Logging with Sumo Logic

Mesosphere DC/OS (Data Center Operating System) lets you manage your data center as if it were a single powerful computer. It separates the infrastructure logic from the application logic, and makes it easy to manage distributed systems. Because DC/OS is meant for highly scalable, distributed systems, logging and monitoring play a key role in day-to-day operations. In this post, we’ll take a look at the goals of logging with DC/OS, and how you can set up DC/OS logging with Sumo Logic. Why Mesosphere DC/OS Clusters Need Logging When you work with Mesosphere DC/OS, you typically have hundreds, if not thousands, of nodes that are grouped in clusters. Tasks are executed on these nodes by Mesos “agent” instances, which are controlled by “master” instances. By grouping the nodes in clusters, DC/OS ensures high availability, so that if any node or cluster fails, its workload is automatically routed to the other clusters. DC/OS uses two scheduling tools—Chronos for scheduled tasks like ETL jobs, and Marathon for long-running tasks like running a web server. Additionally, it includes app services like Docker, Cassandra, and Spark. DC/OS supports hybrid infrastructure, allowing you to manage bare metal servers, VMs on-premises, or cloud instances, all from a single pane of glass. Together, all of these components make for a complex system that needs close monitoring. There are two key purposes for collecting and analyzing DC/OS logs. The first is debugging. As new tasks are executed, DC/OS makes decisions in real time on how to schedule them. While this is automated, it needs supervision. Failover needs logging so you can detect abnormal behavior early on. Also, as you troubleshoot operational issues on a day-to-day basis, you need to monitor resource usage at a granular level, and that requires a robust logging tool. The second purpose, for certain enterprise apps, is compliance, which requires storing historic logs over a long period of time.
You may need to comply with HIPAA or PCI DSS standards.

Viewing raw logs in DC/OS

DC/OS services and tasks write stdout and stderr files in their sandboxes by default. You can access logs via the DC/OS CLI or the console. You can also SSH into a node and run the following command to view its logs:

$ journalctl -u "dcos-*" -b

While this is fine if you’re running just a couple of nodes, once you scale to tens or hundreds of nodes, you need a more robust logging tool. That’s where a log analysis tool like Sumo Logic comes in.

Sharing DC/OS logs with Sumo Logic

DC/OS shares log data via an HTTP endpoint which acts as a source. The first step to share Mesosphere DC/OS logs with Sumo Logic is to configure an HTTP source in Sumo Logic. You can do this from the Sumo Logic console by following these steps. You can edit settings like the timestamp format, and allow multi-line messages like stack traces. Your data is uploaded to a unique source URL. Once uploaded, the data is sent to a Sumo Logic collector. This collector is hosted and managed by Sumo Logic, which makes setup easy and reduces maintenance later. The collector compresses the log data, encrypts it, and sends it to the Sumo Logic cloud in real time. During this setup process, you can optionally create processing rules to filter the data sent to Sumo Logic. Here are some actions you can take on the logs being shared:

Exclude messages
Include messages
Hash messages
Mask messages
Forward messages

These processing rules apply only to data sent to Sumo Logic, not the raw logs in DC/OS. It may take a few minutes for data to start showing in the Sumo Logic dashboard, and once it does, you’re off to the races with state-of-the-art predictive analytics for your log data. You gain deep visibility into DC/OS cluster health. You can set up alerts based on the log data and get notified when failed nodes reach a certain number, when a high-priority task is running too slow, or if there is any suspicious user behavior.
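As a rough sketch of what pushing log lines to an HTTP source looks like: the body is just newline-delimited log lines POSTed to your unique source URL. The endpoint URL here is a placeholder, and the network call is factored out so the payload logic stands on its own.

```python
# Sketch: batching log lines for a Sumo Logic HTTP source. The endpoint URL is
# a hypothetical placeholder -- use the unique source URL generated for your
# collector. Sending is kept separate so the payload logic needs no network.
import urllib.request

def build_payload(lines):
    """Join log lines into a newline-delimited request body."""
    return "\n".join(lines).encode("utf-8")

def send_to_sumo(endpoint_url, lines):  # not invoked here; needs a real endpoint
    req = urllib.request.Request(endpoint_url, data=build_payload(lines),
                                 method="POST")
    return urllib.request.urlopen(req)

payload = build_payload([
    "2017-05-01 10:00:01 -0700 node-3 marathon task started",
    "2017-05-01 10:00:02 -0700 node-3 marathon task healthy",
])
print(payload.decode("utf-8"))
```

In a real deployment you would of course tail the sandbox stdout/stderr files (or journald) and forward them continuously rather than hand-building lines.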
Whether it’s an overview or a deep dive to resolve issues, Sumo Logic provides advanced data analysis that builds on the default metrics of DC/OS. It also has options to archive historic log data for years so you can comply with various security standards like HIPAA or PCI DSS. DC/OS is changing the way we view data centers. It transforms the data center from hardware-centric to software-defined. A comprehensive package, it encourages hybrid infrastructure, prevents vendor lock-in, and provides support for container orchestration. DC/OS is built for modern web-scale apps. However, it comes with a new set of challenges around infrastructure and application monitoring. This is where you need a tool like Sumo Logic, so that you not only view raw log data, but are also able to analyze it and derive insights before incidents happen. About the Author Twain began his career at Google, where, among other things, he was involved in technical support for the AdWords team. Today, as a technology journalist he helps IT magazines and startups change the way teams build and ship applications. Mesosphere DC/OS Logging with Sumo Logic is published by the Sumo Logic DevOps Community. If you’d like to learn more or contribute, visit devops.sumologic.com. Also, be sure to check out Sumo Logic Developers for free tools and code that will enable you to monitor and troubleshoot applications from code to production.

Blog

Improving Your Performance via Method Objects

When Sumo Logic receives metrics data, we put those metrics datapoints into a Kafka queue for processing. To help us distribute the load, that Kafka queue is broken up into multiple Kafka Topic Partitions; we therefore have to decide which partition is appropriate for a given metrics datapoint. Our logic for doing that has evolved over the last year in a way that spread the decision logic out over a few different classes; I thought it was time to put it all in one place. My initial version had an interface like this:

    def partitionFor(metricDefinition: MetricDefinition): TopicPartition

As I started filling out the implementation, though, I began to feel a little bit uncomfortable. The first twinge was when calculating which branch to go down in one of the methods: normally, when writing code, I try to focus on clarity, but when you’re working at the volumes of data that Sumo Logic has to process, you have to keep efficiency in mind when writing code that is evaluated on every single data point. And I couldn’t convince myself that one particular calculation was quite fast enough for me to want to perform it on every data point, given that the inputs for that calculation didn’t actually depend on the specific data point. So I switched over to a batch interface, pulling that potentially expensive branch calculation out to the batch level:

    class KafkaPartitionSelector {
      def partitionForBatch(metricDefinitions: Seq[MetricDefinition]): Seq[TopicPartition] = {
        val perMetric = calculateWhetherToPartitionPerMetric()
        metricDefinitions.map { metric =>
          partitionFor(metric, perMetric)
        }
      }

      private def partitionFor(metricDefinition: MetricDefinition, perMetric: Boolean): TopicPartition = {
        if (perMetric) {
          ...
        } else {
          ...
        }
      }
    }

That reduced the calculation in question from once per data point to once per batch, getting me past that first problem.
But then I ran into a second such calculation that I needed, and a little after that I saw a call that could potentially translate into a network call; I didn’t want to do either of those on every data point, either! (The results of the network call are cached most of the time, but still.) I thought about adding them as arguments to partitionFor() and to methods that partitionFor() calls, but passing around three separate arguments would make the code pretty messy. To solve this, I reached a little further into my bag of tricks: this calls for a Method Object. Method Object is a design pattern that you can use when you have a method that calls a bunch of other methods and needs to pass the same values over and over down the method chain: instead of passing the values as arguments, you create a separate object whose member variables are the values that are needed in lots of places and whose methods are the original methods you want. That way, you can break your implementation up into methods with small, clean signatures, because the values that are needed everywhere are accessed transparently as member variables. In this specific instance, the object I extracted had a slightly different flavor, so I’ll call it a “Batch Method Object”: if you’re performing a calculation over a batch, if every evaluation needs the same data, and if evaluating that data is expensive, then create an object whose member variables are the data that’s shared by every evaluation in the batch. With that, the implementation became:

    class KafkaPartitionSelector {
      def partitionForBatch(metricDefinitions: Seq[MetricDefinition]): Seq[TopicPartition] = {
        val batchPartitionSelector = new BatchPartitionSelector
        metricDefinitions.map(batchPartitionSelector.partitionFor)
      }

      private class BatchPartitionSelector {
        private val perMetric = calculateWhetherToPartitionPerMetric()
        private val nextExpensiveCalculation = ...
        ...

        def partitionFor(metricDefinition: MetricDefinition): TopicPartition = {
          if (perMetric) {
            ...
          } else {
            ...
          }
        }
        ...
      }
    }

One question that came up while doing this transformation was whether every single member variable in BatchPartitionSelector was going to be needed in every batch, no matter what the feature flag settings were. (Which was a potential concern, because they would all be initialized at BatchPartitionSelector creation time, every time this code processes a batch.) I looked at the paths and checked that most were used no matter the feature flag settings, but there was one that only mattered in some of the paths. This gave me a tradeoff: should I wastefully evaluate all of them anyways, or should I mark that last one as lazy? I decided to go the route of evaluating all of them, because lazy variables are a little conceptually messy and they introduce locking behind the scenes which has its own efficiency cost: those downsides seemed to me to outweigh the costs of doing the evaluation in question once per batch. If the potentially-unneeded evaluation had been more expensive (e.g. if it had involved a network call), however, then I would have made them lazy instead. The moral is: keep Method Object (and this Batch Method Object variant) in mind: it’s pretty rare that you need it, but in the right circumstances, it really can make your code a lot cleaner. Or, alternatively: don’t keep it in mind. Because you can actually deduce Method Object from more basic, more fundamental OO principles. Let’s do a thought experiment where I’ve gone down the route of performing shared calculations once at the batch level and then passing them down through various methods in the implementation: what would that look like? The code would have a bunch of methods that share the same three or four parameters (and there would, of course, be additional parameters specific to the individual methods).
But whenever you see the same few pieces of data referenced or passed around together, that’s a smell that suggests that you want to introduce an object that has those pieces of data as member variables. If we follow that route, we’d apply Introduce Parameter Object to create a new class that you pass around, called something like BatchParameters. That helps, because instead of passing the same three arguments everywhere, we’re only passing one argument everywhere. (Incidentally, if you’re looking for rules of thumb: in really well factored code, methods generally only take at most two arguments. It’s not a universal rule, but if you find yourself writing methods with lots of arguments, ask yourself what you could do to shrink the argument lists.) But then that raises another smell: we’re passing the same argument everywhere! And when you have a bunch of methods called in close proximity that all take exactly the same object as one of their parameters (not just an object of the same type, but literally the same object), frequently that’s a sign that the methods in question should actually be methods on the object that’s a parameter. (Another way to think of this: you should still be passing around that same object as a parameter, but the parameter should be called this and should be hidden from you by the compiler!) And if you do that (I guess Move Method is the relevant term here?), moving the methods in question to BatchParameters, then BatchParameters becomes exactly the BatchPartitionSelector class from my example. So yeah, Method Object is great. But more fundamental principles like “group data used together into an object” and “turn repeated function calls with a shared parameter into methods on that shared parameter” are even better. And what’s even better than that is to remember Kent Beck’s four rules of simple design: those latter two principles are both themselves instances of Beck’s “No Duplication” rule. 
You just have to train your eyes to see duplication in its many forms.
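For readers who don’t write Scala, the Batch Method Object shape translates directly into other languages. Here is a minimal Python sketch of the same idea; the partitioning logic and the “expensive” feature-flag check are made-up stand-ins, not Sumo Logic’s actual code.

```python
# Minimal sketch of the "Batch Method Object" pattern discussed above.
# The calculations are hypothetical stand-ins -- the point is that the
# expensive work runs once per batch, not once per datapoint.

class BatchPartitionSelector:
    def __init__(self, num_partitions):
        # Evaluated once per batch, then shared by every per-item call.
        self.per_metric = self._calculate_whether_to_partition_per_metric()
        self.num_partitions = num_partitions

    @staticmethod
    def _calculate_whether_to_partition_per_metric():
        return True  # stand-in for a potentially expensive feature-flag check

    def partition_for(self, metric):
        if self.per_metric:
            return hash(metric) % self.num_partitions
        return 0

def partitions_for_batch(metrics, num_partitions=4):
    selector = BatchPartitionSelector(num_partitions)  # one object per batch
    return [selector.partition_for(m) for m in metrics]

parts = partitions_for_batch(["cpu.user", "mem.free", "disk.io"])
print(len(parts))  # → 3
```

Note that the batch-level state lives in `__init__`, so every per-item method gets it “for free” as `self` — the same effect as the implicit `this` discussed above.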

May 9, 2017

Blog

Building Java Microservices with the DropWizard Framework

Blog

An Introduction to the AWS Application Load Balancer

Blog

The Importance of Logs

Blog

The DockerCon Scoop - Containers, Kubernetes and more!

Ahhh DockerCon, the annual convention for khaki pant enthusiasts. Oh, wait, not that Docker. Last week DockerCon kicked off with 5,500 developers, IT Ops engineers, and enterprise professionals from across the globe. With the announcement of new features like LinuxKit and the Moby project, Docker is doubling down on creating tools that enable mass innovation while simplifying and accelerating the speed of the delivery cycle. Docker is starting to turn a corner, becoming a mature platform for creating mission-critical, enterprise-class applications. Throughout all of this, monitoring and visibility into your infrastructure continue to be critical to success. Current Trends In the world of containers, there are three trends we are seeing here at Sumo Logic. The first is the rapid migration to containers. Containers provide great portability of code and easier deployments. The second is the need for visibility. While migrating to containers has simplified the deployment process, it is definitely a double-edged sword. The ability to monitor your containers’ health, access the container logs, and monitor the cluster on which your containers run is critical to maintaining the health of your application. The last trend is the desire to consolidate tools. You may have numerous tools helping you monitor your applications. Having multiple tools introduces “swivel chair” syndrome, where you have to switch back and forth between different tools to help diagnose issues as they are happening. You may start with a tool showing you some metrics on CPU and memory, indicating something is going wrong. But metrics only give you part of the visibility you need. You need to turn to your logs to figure out why this is happening. Monitoring Your Containers and Environment Sumo Logic’s Unified Logs and Metrics are here to help give you full visibility into your applications. To effectively monitor your applications, you need the whole picture.
Metrics give you insights into what is happening, and logs give you insights into why. The union of these two allow you to perform root cause analysis on production issues to quickly address the problem. Sumo Logic can quickly give you visibility into your Docker containers leveraging our Docker Logs and Docker Stats sources. Our Docker application allows you to gain immediate visibility into the performance of your containers across all of your Docker hosts. Collecting Logs and Metrics From Kubernetes At DockerCon, we saw an increased use of Kubernetes and we received many questions on how to collect data from Kubernetes clusters. We have created a demo environment that is fully monitored by Sumo Logic. This demo environment is a modern application leveraging a micro-services architecture running in containers on Kubernetes. So how do we collect that data? Well, the below diagram helps illustrate that. We created a FluentD plugin to gather the logs from the nodes in the cluster and enrich them with metadata available in Kubernetes. This metadata can be pulled into Sumo Logic giving you increased ability to search and mine your data. We run the FluentD plugin as a Daemonset which ensures we collect all the logs for every node in our cluster. For metrics, we are leveraging Heapster’s ability to output to a Graphite sink and using a Graphite Source on our collector to get the metrics into Sumo Logic. Since Heapster can monitor metrics at the cluster, container and node level, we just need to run it and the collector as a deployment to get access to all the metrics that Heapster has to offer. What's Next What if you are not running in Kubernetes? In a previous post, we discussed multiple ways to collect logs from containers. However, due to the fast-paced growth in the container community, it is time to update that and we will add a post to dive deeper into that.
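As a small illustration of the metrics path described above: Graphite’s plaintext protocol is just “path value timestamp” lines sent over TCP, which is what a Graphite source on a collector ingests. This sketch formats and (optionally) sends such a line; the host and port are the conventional Graphite defaults, shown here as assumptions.

```python
# Sketch: emitting a metric in Graphite's plaintext format ("path value
# timestamp"). The send helper uses the conventional Graphite plaintext port
# (2003) as an assumption and is not invoked here.
import socket
import time

def graphite_line(path, value, timestamp=None):
    ts = int(timestamp if timestamp is not None else time.time())
    return f"{path} {value} {ts}\n"

def send_metric(path, value, host="localhost", port=2003):  # not invoked here
    with socket.create_connection((host, port)) as sock:
        sock.sendall(graphite_line(path, value).encode("ascii"))

line = graphite_line("k8s.node1.cpu.usage", 0.73, timestamp=1493740800)
print(line, end="")  # → k8s.node1.cpu.usage 0.73 1493740800
```

The metric path (`k8s.node1.cpu.usage`) is an invented example; in the setup described above, Heapster chooses the paths when it writes to its Graphite sink.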

Blog

Add Logging to Your Apps with the New Sumo Logic Javascript Logging SDK

April 26, 2017

Blog

Best Practices for Creating Custom Logs - Part I

Overview

When logging information about your operating system, services, network, or anything else, there are usually predefined log structures put in place by the vendor. But sometimes there aren’t predefined logs created by a piece of software, or you have custom application logs from your own software. Without properly planning the log syntax you’ll be using, things can get messy and your data may lose the integrity needed to properly tell the story. These best practices for creating custom logs can be applied to most logging solutions.

The 5 W's

There are 5 critical components of a good log structure*:

When did it happen (timestamp)
What happened (e.g., error codes, impact level, etc.)
Where did it happen (e.g., hostnames, gateways, etc.)
Who was involved (e.g., usernames)
Where he, she, or it came from (e.g., source IP)

Additionally, your custom logs should have a standard syntax that is easy to parse, with distinct delimiters, key-value pairs, or a combination of both. An example of a good custom log is as follows:

2017-04-10 09:50:32 -0700 - dan12345 - 10.0.24.123 - GET - /checkout/flight/ - credit.payments.io - Success - 2 - 241.98

This log message shows when it was performed, what was performed, where it happened in your system, who performed it, and where that user came from. It’s also structured cleanly, with a space-dash-space as the delimiter between fields. Optionally, you can also use key-value pairs to assist with parsing:

timestamp: 2017-04-10 09:50:32 -0700 - username: dan12345 - source_ip: 10.0.24.123 - method: GET - resource: /checkout/flight/ - gateway: credit.payments.io - audit: Success - flights_purchased: 2 - value: 241.98

Once you have your log syntax and know what will be going into the logs, be sure to document it somewhere. You can document it by adding a comment at the top of each log file.
Without documentation, you may forget, or someone else may not know, what something like “2” or “241.98” represents (in this example, 2 flights in the checkout at a value of $241.98). You can document your log syntax as such:

Timestamp - username - user_ip - method - resource - gateway - audit - flights_purchased - value

In the second part of this three-part series, we'll go into deeper detail around timestamps and log content. In the final part, we'll go even deeper into log syntax and documentation.

*Source: Logging and Log Management: The Authoritative Guide to Understanding the Concepts Surrounding Logging and Log Management. Chuvakin, A., Phillips, C., & Schmidt, K. J. (2013).
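A quick sketch of emitting entries in the documented syntax above, with the field order living in one place so code and documentation can’t drift apart (values taken from the article’s example):

```python
# Sketch: emitting a log entry in the documented syntax
# (Timestamp - username - user_ip - method - resource - gateway - audit -
#  flights_purchased - value), using the article's example values.

FIELDS = ["timestamp", "username", "user_ip", "method", "resource",
          "gateway", "audit", "flights_purchased", "value"]

def format_entry(event):
    # Space-dash-space delimiter, fields in the documented order.
    return " - ".join(str(event[f]) for f in FIELDS)

entry = format_entry({
    "timestamp": "2017-04-10 09:50:32 -0700",
    "username": "dan12345",
    "user_ip": "10.0.24.123",
    "method": "GET",
    "resource": "/checkout/flight/",
    "gateway": "credit.payments.io",
    "audit": "Success",
    "flights_purchased": 2,
    "value": 241.98,
})
print(entry)
```

Keeping `FIELDS` next to the formatter is one way to satisfy the “document your syntax” advice above: the list is the documentation.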

April 20, 2017

Blog

Best Practices for Creating Custom Logs - Part II

Diving Deeper

Now that you have an overview of custom logs and what is involved in creating a good logging practice from Part I of the series, it’s time to look further into what you should log in your system, and why. This will be broken up into two parts: the first will cover timestamps and content, and the second will cover syntax and documentation.

Timestamp

The first and most critical component of just about any log syntax is your timestamp - the “when”. A timestamp is important, as it tells you exactly the time an event took place in the system and was logged. Without this component, you’ll be relying on your log analysis solution to stamp it based upon when it came in. Adding a timestamp at the exact point when an entry is logged makes sure you are consistently and accurately placing the entry at the right point in time. RFC 3339 defines the standard time and date format on the internet. Your timestamp should include year, month, day, hour, minute, second, and timezone. Optionally, you may also want to include sub-second precision, depending on how precise your logs need to be for analysis. For Sumo Logic, you can read about the different timestamp formats that are supported here - Timestamps, Time Zones, Time Ranges, and Date Formats.

Log Content

To capture what happened, you can include data such as the severity of the event (e.g., low, medium, high; or 1 through 5), success or failure, status codes, the resource URI, or anything else that will help you or your organization know exactly what happened. You should be able to take a single log message or entry out of a log file and know most or all of the critical information without depending on the log file’s name, storage location, or automatic metadata tagging from your tool. Your logs should tell a story. If they’re complex, they should also be documented, as discussed later on.
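As a sketch, a timestamp with all of those components (year through second, plus a timezone offset) in the shape used throughout these examples can be produced with Python’s standard library:

```python
# Sketch: producing a timestamp with year-through-second precision and a UTC
# offset, matching the "2017-04-10 09:50:32 -0700" shape used in the examples.
from datetime import datetime, timezone, timedelta

def log_timestamp(dt):
    # %z renders the numeric UTC offset, e.g. "-0700".
    return dt.strftime("%Y-%m-%d %H:%M:%S %z")

dt = datetime(2017, 4, 10, 9, 50, 32, tzinfo=timezone(timedelta(hours=-7)))
print(log_timestamp(dt))  # → 2017-04-10 09:50:32 -0700
```

For production use you would pass `datetime.now()` with a real timezone attached, and add `.%f` to the format string if you need sub-second precision.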
Bad Logs

For a bad example, you may have a log entry as such:

2017-04-10 09:50:32 -0700 Success

While you know that on April 10, 2017 at 9:50 AM (UTC-7) an event happened and it was a success, you don’t really know anything else. If you know your system inside and out, you may know exactly what was successful; however, if you handed these logs over to a peer to do some analysis, they may be completely clueless!

Good Logs

Once you add some more details, the picture starts coming together:

2017-04-10 09:50:32 -0700 GET /checkout/flights/ Success

From these changes you know that on April 10th, a GET method was successfully performed on the resource /checkout/flights/. Finally, you may need to know who was involved and from where. While the previous log example can technically provide a decent amount of information, especially if you have a tiny environment, it’s always good to provide as much detail as possible, since you don’t know what you may need in the future. For example, usernames and user IPs are good to log:

2017-04-10 09:50:32 -0700 dan12345 10.0.24.123 GET /checkout/flights/ Success

Telling the Story

Now you have even more details about what happened. A username or IP may individually be enough, but sometimes (especially for security) you’ll want to capture as much as you can about the user, since user accounts can be hacked and/or accessed from other IPs. You have just about enough at this point to really tell a story. To make sure you know whatever you can about the event, you also want to know where things were logged. Again, while your logging tool may automatically do this for you, there are many factors that may affect the integrity, and it’s best to have your raw messages tell as much as possible. To complete this, let’s add the gateway that logged the entry:

2017-04-10 09:50:32 -0700 dan12345 10.0.24.123 GET /checkout/flights/ credit.payments.io Success

Now you know that this was performed on a gateway named credit.payments.io.
If you had multiple gateways or containers, you may come to the point of needing to identify which one to fix. Omitting this data from your log may result in a headache trying to track down exactly where the event occurred. This was just one example of some log basics. You can add as much detail to this entry as you need, to make sure you have whatever insight you need now or in the future. For example, you may want to know other info about this event. How many flights were purchased?

2017-04-10 09:50:32 -0700 dan12345 10.0.24.123 GET /checkout/flights/ credit.payments.io Success 2

Where 2 is the number of flights. What was the total value of the flights purchased?

2017-04-10 09:50:32 -0700 dan12345 10.0.24.123 GET /checkout/flights/ credit.payments.io Success 2 241.98

Where 2 is the number of flights, and they totaled $241.98. Now that you know what to put into your custom logs, you should also decide on a standard syntax to use throughout your logs. This will be covered in the last part of this series on best practices for creating custom logs.

April 20, 2017

Blog

Best Practices for Creating Custom Logs - Part III

Diving Even Deeper In Part I there was a general overview of custom logs, and Part II discussed timestamps and log content. At this point, you have a log that contains a bunch of important data to help you gather useful information about your systems. In this final part of the series, you’ll learn how to organize the data in your logs and how to make sure you properly document it. Log Syntax You may have the most descriptive and helpful data in your logs, but it can be very difficult to analyze them if you don’t have a defined, structured syntax. There are generally two ways to go about structuring your logs. Key-Value When it comes to log analysis and parsing, key-value pairs may be the simplest and most readable format. The previous example is not the most human-readable format, and it may be a little difficult to find anchors to parse against. You can change the message to be easier for humans to read and easier to parse in a tool like Sumo Logic: timestamp: 2017-04-10 09:50:32 -0700, username: dan12345, source_ip: 10.0.24.123, method: GET, resource: /checkout/flights/, gateway: credit.payments.io, audit: Success, flights_purchased: 2, value: 241.98 You can take it a step further and structure your logs in a JSON format: { "timestamp": "2017-04-10 09:50:32 -0700", "username": "dan12345", "source_ip": "10.0.24.123", "method": "GET", "resource": "/checkout/flights/", "gateway": "credit.payments.io", "audit": "Success", "flights_purchased": 2, "value": 241.98 } In Sumo Logic, you have various ways to parse through this type of structure, including the basic Parse operator on predictable patterns or Parse JSON. While it is ideal to use some sort of key-value pairing, it is not always the most efficient, as you’re potentially doubling the size of every entry that gets sent and ingested. 
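As a quick sketch of emitting that JSON structure from an application (the field names follow the example above; the `log_event` helper is illustrative, not the article’s code), one JSON object per line keeps entries trivially machine-parseable:

```python
import json
import logging

logging.basicConfig(format="%(message)s", level=logging.INFO)
logger = logging.getLogger("checkout")

def log_event(**fields):
    """Serialize the event as one JSON object per log line for easy parsing."""
    line = json.dumps(fields)
    logger.info(line)
    return line

log_event(
    timestamp="2017-04-10 09:50:32 -0700",
    username="dan12345",
    source_ip="10.0.24.123",
    method="GET",
    resource="/checkout/flights/",
    gateway="credit.payments.io",
    audit="Success",
    flights_purchased=2,
    value=241.98,
)
```

Any JSON-aware tool can then pull out individual fields without regular expressions, at the cost of repeating every key name in every entry.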
If you have low log volume, this wouldn’t be an issue; however, if you are generating logs at a high rate, it can become very costly to have log entries of that size. This brings us to the other format: delimited logs. Delimited Delimited logs are essentially the type of log you built in the previous examples. This means that your log format has a set structure, and different fields are broken up by some sort of delimiter. 2017-04-10 09:50:32 -0700 dan12345 10.0.24.123 GET /checkout/flights/ credit.payments.io Success 2 241.98 Because of how this example is structured, spaces are the delimiters. To an extent, this is perfectly reasonable. The problem it creates when parsing is figuring out where fields start and end, as you can see with the timestamp, though it may be the most efficient and smallest size you can get for this log. If you need to stick with this format, you’ll probably be stuck using regular expressions to parse your logs. This isn’t a problem for some, but for others, regular expressions can understandably be a challenge. To reduce the need for regular expressions, you’ll want to use a unique delimiter. A space can sometimes work, but it may require excessive parsing of the timestamp. You may want to use a delimiter such as a dash, semicolon, comma, or another character (or character pattern) that you can guarantee will never appear in the data of your fields. 2017-04-10 09:50:32 -0700 - dan12345 - 10.0.24.123 - GET - /checkout/flights/ - credit.payments.io - Success - 2 - 241.98 A syntax like this allows you to parse out the entire message with a space-dash-space ( - ) as your field delimiter. The space-dash-space ensures that the dashes in the timestamp are not counted as delimiters. Finally, to make sure you don’t have an entry that can be improperly parsed, always make sure you have some sort of filler in place of any fields that may not have data. 
For example: 2017-04-10 09:50:32 -0700 - dan12345 - 10.0.24.123 - GET - /checkout/flights/ - credit.payments.io - Failure - x - x From the example, you know that the event was a failure. Because it failed, it didn’t have flight totals or values. To avoid needing additional parsers when those fields are missing, you can simply replace them with something like an ‘x’. Note that if you’re running aggregates or math against a field that is typically a number, you may need to add some additional logic to your search queries. Documentation You may have the greatest log structure possible, but without proper documentation it’s possible to forget why something was part of your logging structure, or what certain fields represented. You should always document what your log syntax represents. Referring back to the previous log example: 2017-04-10 09:50:32 -0700 - dan12345 - 10.0.24.123 - GET - /checkout/flights/ - credit.payments.io - Success - 2 - 241.98 You can document your log syntax as such: timestamp - username - user_ip - method - resource - gateway - audit - flights_purchased - value This log syntax can be placed once at the very start of the log file for future reference if necessary. Conclusion At Sumo Logic, we regularly work with people who are new to logging and have many questions about how to get the most out of their logs. While you can start ingesting your logs and getting insights almost immediately, the information provided by the tool is only as good as the data it receives. Though most vendors do a good job of sticking to standard log structures with great data, it’s up to you to standardize a custom-created log. In this series, I set out to help you create logs that have relevant data so you know as much as you can about your custom applications. 
As long as you stick to the “5 W’s”, you structure your logs in a standard syntax, and you document it, then you’ll be on the right track to getting the most out of Sumo Logic. Be sure to sign up for a free trial of Sumo Logic to see what you can do with your logs!
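To show how the space-dash-space delimiter and the ‘x’ placeholder play together, here is a small illustrative parser (the field names mirror the documented syntax above; the helper name is my own):

```python
FIELDS = ["timestamp", "username", "user_ip", "method", "resource",
          "gateway", "audit", "flights_purchased", "value"]

def parse_entry(line):
    """Split on ' - ' so the dashes inside the timestamp are untouched,
    and turn the 'x' placeholder back into None for numeric fields."""
    values = line.split(" - ")
    record = dict(zip(FIELDS, values))
    for key in ("flights_purchased", "value"):
        record[key] = None if record[key] == "x" else float(record[key])
    return record

ok = parse_entry("2017-04-10 09:50:32 -0700 - dan12345 - 10.0.24.123 - GET - "
                 "/checkout/flights/ - credit.payments.io - Success - 2 - 241.98")
failed = parse_entry("2017-04-10 09:50:32 -0700 - dan12345 - 10.0.24.123 - GET - "
                     "/checkout/flights/ - credit.payments.io - Failure - x - x")
```

Note that `" -0700"` in the timestamp never matches the `" - "` delimiter because the dash is not followed by a space, which is exactly why the space-dash-space convention works. Converting the ‘x’ filler to `None` is one way to keep aggregates from choking on non-numeric values.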

April 20, 2017

Blog

What does it take to implement & maintain a DevSecOps approach in the Cloud

Operational and Security Tips, Tricks and Best Practices In Gartner’s Top 10 Strategic Technology Trends for 2016: Adaptive Security Architecture, they argued that “Security must be more tightly integrated into the DevOps process to deliver a DevSecOps process that builds in security from the earliest stages of application design.” We ultimately need to move to this model if we are going to be successful and continue to reduce the dwell time of cyber criminals who are intent on compromising our applications and data. But how do we get from this: To this? Easier said than done. To answer this question, I sat down with our CISO and IANS Faculty Member George Gerchow about what it means to implement and maintain a DevSecOps approach in the cloud, and what operational and security best practices organizations should follow to ensure success in their move to the cloud. Below is a transcript of the conversation. DevSecOps seems like a buzzword that everyone is using these days. What does DevSecOps really mean? George: It is really about baking security in from day one. When you’re starting to put new workloads in the cloud or have these greenfield opportunities identified, start changing your habits and your behavior to incorporate security from the very beginning. In the past we used to have a hard-shell, soft-center type of approach to security; in the cloud there is no hard shell, and we don’t run as many internal applications anymore. Now we’re releasing these things out into the wild, into a hostile environment, so you have to be secure from day one. Your developers and engineers – you have to have people who think security first when they’re developing code. That is the most important takeaway. What does it really mean when you say baking security in… or the term shifting left, which I am starting to hear out there? George: It is about moving security earlier into the conversation, earlier into the software development lifecycle. 
You need to get developers to do security training. I’m talking about code review, short sprints, understanding what libraries are safe to use, and setting up feature flags that will check code in one piece at a time. The notion of a full release is a thing of the past – individual components are released continually. There also needs to be a QA mindset of testing the code and microservices to break them, and then fixing accordingly through your agile DevSecOps methodologies. Sumo Logic is a cloud-native service that has run in AWS for over 7 years now – why did you decide to build your service in the cloud? Can you describe a bit about that journey – what was it like, what obstacles did you face, how did you overcome them? And lastly, what did you learn along the way? George: Our company founders came from HP ArcSight and knew full well the pain of managing the execution environment – the hardware and software provisioning, the large teams needed, the protracted time to roll out new services. The cloud enabled us to be agile, flexible, and highly elastic, and to do this all securely at scale – at a level that was just not possible if we had chosen an on-prem model. The simplicity and automation capabilities of AWS were hugely attractive. You start setting up load balancers, leveraging tools like Chef to manage machine patching – it gets easier – and then you can start automating things from the very beginning. It’s that idea of starting very simple, leveraging the native services that cloud service providers give you, and then looking for the gaps. The challenge initially was that this was a whole new world, and then the bigger challenge became getting people to buy off on the fact that the cloud is more secure. People just weren’t there yet. What does Sumo Logic’s footprint look like in AWS? George: 100PB+ of data analyzed daily, 10K EC2 instances on any given day, 10M keys under management! 
We have over 1,300 customers and our service is growing by leaps and bounds. At this stage, it is all about logos – you want to bring people in, and you can’t afford to have bad customer service because this is a subscription-based model. When you think about the scale that we have, it’s also the scale at which we have to protect our data. The challenge of quadrupling that number every year is extremely difficult, so you have to take a long-term view when it comes to scalability of security. With 10,000+ instances it’s a very elastic type of environment, and auditors really struggle with this. One of the things that I’m most proud of… if you look at hundreds of petabytes processed and analyzed daily, that’s insane… that’s the value of being in the cloud. 10 million keys under management… that’s huge… really?? George: It’s a very unique way that we do encryption. It makes our customers very comfortable with the dual-control model… they have some ownership over the keys, and then we have the capability to vault the keys for them. We rotate the keys every 24 hours. Customers end up with 730 unique key rings on their keys at the end of the year. It’s a very slick, manageable program. We put the keys into a vault, and that vault is encrypted with a key encryption key (KEK). So what tools and technologies are you using in AWS? George: Elastic load balancers are at the heart of what we do… we set up those load balancers to make sure that nothing threatening gets through… so that’s our first layer of defense, and then we use security groups and firewalls to route traffic to the right places and make sure only the right users can access it. We use file integrity monitoring – we happen to use host sec for that – across every host we manage, and that gives us extreme visibility. We also leverage IDS and Snort signatures across those boxes to detect any kind of signature-based attack. Everything we do in the cloud is agentless or on the host. 
When you’re baking security in, you have it on ALL of your systems, spun up automatically via scripts. We also have a great partnership with CrowdStrike, where threat intelligence is baked into our platform to identify malicious indicators of compromise and match them automatically against our customers’ log data – very powerful. So how are you leveraging Sumo to secure your own service? Can you share some of the tips, tricks and best practices you have gleaned over the years? George: Leveraging apps like CloudTrail, we are now able to see when an event takes place, who the person behind the event is, and then start looking at the impact of the event. I’m constantly looking for authorization-type events (looking at Sumo dashboards). When it comes to compliance, I have to gather evidence of who is in the security groups. Sumo is definitely at the center of everything that we do. We also have applications built for PCI and other things as well, like VPC flow logs, and they give us extreme visibility. We have dashboards that we built internally to manage the logs and data sources. It is extremely valuable once you start correlating patterns of behavior and unique attack patterns across the environment. You need to be able to identify how a change you just made impacts network traffic and latency in your environment, pulling in things like AWS Inspector… How did that change impact my compliance and security posture? You want to have the visibility, but then measure the level of impact when someone makes a change – and even more proactively, I want visibility when something new is added to the environment or deleted from it. Natively in AWS, it is hard to track these things. How does the Sumo Logic technology stack you talked about earlier help you with compliance? George: Being able to do evidence gathering and prove that you’re protecting data is difficult. 
We’re protecting cardholder data, healthcare data and a host of other PII from the customers we serve across dozens of industries. We pursue our own security attestations like PCI, CSA STAR, ISO 27001, SOC 2 Type 2, and more. We do not live vicariously through the security attestations of AWS, like too many organizations do. Also, encryption across the board. All of these controls and attestations give people a level of confidence that we are doing the right things to protect their data and that there’s actual evidence gathering going on. Specifically with respect to PCI, we leverage the Sumo Logic PCI apps for evidence gathering – nonstop – across CloudTrail, Windows and Linux servers. We built out those apps for internal use, but released them to the public at RSA. There are a lot of threat actors out there, from cyber criminals, corporate spies, hacktivists and nation states. How do you see the threat landscape changing with respect to the cloud? Is the risk greater given the massive scale of the attack surface? If someone hacked into an account, could they cause more damage by pointing their attack at Amazon, from within the service, possibly affecting millions of customers? George: It all starts with password hygiene. People sacrifice security for convenience. It’s a great time for us to start leveraging single sign-on and multi-factor authentication and all these different things that need to be involved, but at a minimum, end users should use strong passwords… they should not bring their personal application passwords into the business world… If you practice basic password hygiene from day one, you’re going to follow the best habits in the business world. The people who should be the most responsible are not… I look at admins and developers in this way… all of a sudden you have a developer put their full-blown credentials into a Slack channel. 
So when you look out toward the future – the DevSecOps movement, the phenomenal growth of cloud providers like AWS and Azure, machine learning and artificial intelligence, the rise of security as code – what are your thoughts, where do you see things going, and how should companies respond? George: First off, for the organizations that aren’t moving to the cloud: at one point or another, you’re going to find yourself irrelevant or out of business. Secondly, you’re going to find that the cloud is very secure. You can do a lot using cloud-based security if you bake security in from day one and work with your developers… if you work with your team… you can be very secure. The future will hold a lot of cloud-based attacks. User behavior analytics… I can no longer go through this world of security with hard-coded rules and certain things that I’m constantly looking for, with all these false positives. I have to be able to leverage machine learning algorithms to consume and crunch through that data. The world is getting cloudier, more workloads are moving into the cloud, teams will be coming together… security will be getting more baked into the process. How would you summarize everything? George: You’re developing things – you want to make sure you have the right hygiene and security built in, and that you have visibility into it. That allows you to scale as things get more complex. Where things actually become more complex is when you start adding more humans and you have less trust, but if you have that scalability and visibility from day one, and a simplistic approach, it’s going to do a lot of good for you. Visibility allows you to make quick decisions and to automate the right things, and ultimately you need visibility because it gives you the evidence you need to be compliant, and it helps people feel comfortable that you’re protecting their data in the right way. 
George Gerchow can be reached at https://www.linkedin.com/in/georgegerchow or @georgegerchow

Blog

Top Patterns for Building a Successful Microservices Architecture

Why do you need patterns for building a successful microservices architecture? Shouldn’t the same basic principles apply, whether you’re designing software for a monolithic or microservices architecture? Those principles do largely hold true at the highest and most abstract levels of design (i.e., the systems level), and at the lowest and most concrete levels (such as classes and functions). But most code design is really concerned with the broad range between those two extremes, and it is there that the very nature of microservices architecture requires not only new patterns for design, but also new patterns for reimagining existing monolithic applications. The truth is that there is nothing in monolithic architecture that inherently imposes either structure or discipline in design. Almost all programming languages currently in use are designed to enforce structure and discipline at the level of coding, of course, but at higher levels, good design still requires conscious adherence to methodologies that enforce a set of architectural best practices. Microservices architecture, on the other hand, does impose by its very nature a very definite kind of structural discipline at the level of individual resources. Just as it makes no sense to cut a basic microservice into arbitrary chunks, and separate them, it makes equally little sense to bundle an individual service with another related or unrelated service in an arbitrary package, when the level of packaging that you’re working with is typically one package per container. Microservices Architecture Requires New Patterns In other words, you really do need new patterns in order to successfully design microservices architecture. The need for patterns starts at the top. If you are refactoring a monolithic program into a microservices-based application, the first pattern that you need to consider is the one that you will use for decomposition. 
What pattern will you use as a guide in breaking the program down into microservices? What are the basic decomposition patterns? At the higher levels of decomposition, it makes sense to consider such functional criteria as broad areas of task-based responsibility (subdomains), or large-scale business/revenue-generating responsibilities (business capabilities). In practice, there is considerable overlap between these two general functional patterns, since a business’ internal large-scale organization of tasks is likely to closely match the organization of business responsibilities. In either case, decomposition at this level should follow the actual corporate-level breakdown of basic business activities, such as inventory, delivery, sales, order processing, etc. In the subsequent stages of decomposition, you can define groups of microservices, and ultimately individual microservices. This calls for a different and much more fine-grained pattern of decomposition—one which is based largely on interactions within the application, with individual users, or both. Decomposition Patterns for Microservices Architecture There are several ways to decompose applications at this level, depending in part on the nature of the application, as well as the pattern for deployment. You can combine decomposition patterns, and in many if not most cases, this will be the most practical and natural approach. Among the key microservice-level decomposition patterns are: Decomposition by Use Case In many respects, this pattern is the logical continuation of a large-scale decomposition pattern, since business capabilities and subdomains are both fundamentally use case-based. In this pattern, you first identify use cases: sequences of actions which a user would typically follow in order to perform a task. Note that a user (or actor) does not need to be a person; it can, in fact, be another part of the same application. 
A use case could be something as obvious and common as filling out an online form or retrieving and displaying a database record. It could also include tasks such as processing and saving streaming data from a real-time input device, or polling multiple devices to synchronize data. If it seems fairly natural to model a process as a unified set of interactions between actors with an identifiable purpose, it is probably a good candidate for the use case decomposition pattern. Decomposition by Resources In this pattern, you define microservices based on the resources (storage, peripherals, databases, etc.) that they access or control. This allows you to create a set of microservices which function as channels for access to individual resources (following the basic pattern of OS-based peripheral/resource drivers), so that resource-access code does not need to be duplicated in other parts of the application. Isolating resource interfaces in specific microservices has the added advantage of allowing you to accommodate changes to a resource by updating only the microservice that accesses it directly. Decomposition by Responsibilities/Functions This pattern is likely to be most useful in the case of internal operations which perform a clearly defined set of functions that are likely to be shared by more than one part of the application. Such responsibility domains might include shopping cart checkout, inventory access, or credit authorization. Other microservices could be defined in terms of relatively simple functions (as is the case with many built-in OS-based microservices) rather than more complex domains. Microservices Architecture Deployment Patterns Beyond decomposition, there are other patterns of considerable importance in building a microservices-based architecture. Among the key patterns are those for deployment. 
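As a hypothetical sketch of decomposition by resources, the tiny service below wraps all access to a single inventory store behind an HTTP interface, so no other part of the application touches the store directly. All names (paths, SKUs, the in-memory store standing in for a database) are illustrative, not from this article:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory stand-in for the resource (e.g., an inventory database).
_INVENTORY = {"sku-1001": {"name": "widget", "in_stock": 42}}

def lookup_item(sku):
    """Resource-access logic lives in exactly one place: this microservice."""
    return _INVENTORY.get(sku)

class InventoryHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Routes like /inventory/sku-1001 map requests onto the resource.
        parts = self.path.strip("/").split("/")
        if len(parts) == 2 and parts[0] == "inventory":
            item = lookup_item(parts[1])
            if item is not None:
                body = json.dumps(item).encode()
                self.send_response(200)
                self.send_header("Content-Type", "application/json")
                self.end_headers()
                self.wfile.write(body)
                return
        self.send_response(404)
        self.end_headers()

# To run the service (this call blocks):
# HTTPServer(("0.0.0.0", 8080), InventoryHandler).serve_forever()
```

If the inventory store later changes (new schema, new backend), only this service needs updating, which is the payoff the pattern describes.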
There are three underlying patterns for microservices deployment, along with a few variations: Single Host/Multiple Services In this pattern, you deploy multiple instances of a service on a single host. This reduces deployment overhead, and allows greater efficiency through the use of shared resources. It has, however, greater potential for conflict, and security problems, since services interacting with different clients may be insufficiently isolated from each other. Single Service per Host, Virtual Machine, or Container This pattern deploys each service in its own environment. Typically, this environment will be a virtual machine (VM) or container, although there are times when the host may be defined at a less abstract level. This kind of deployment provides a high degree of flexibility, with little potential for conflict over system resources. Services are either entirely isolated from those used by other clients (as is the case with single-service-per-VM deployment), or can be effectively isolated while sharing some lower-level system resources (i.e., containers with appropriate security features). Deployment overhead may be greater than in the single host/multiple services model, but in practice, this may not represent significant cost in time or resources. Serverless/Abstracted Platform In this pattern, the service runs directly on pre-configured infrastructure made available as a service (which may be priced on a per-request basis); deployment may consist of little more than uploading the code, with a small number of configuration settings on your part. The deployment system places the code in a container or VM, which it manages. All you need to make use of the microservice is its address. Among the most common serverless environments are AWS Lambda, Azure Functions, and Google Cloud Functions. Serverless deployment requires very little overhead. 
It does, however, impose significant limitations, since the uploaded code must be able to meet the (often strict) requirements of the underlying infrastructure. This means that you may have a limited selection of programming languages and interfaces to outside resources. Serverless deployment also typically rules out stateful services. Applying Other Patterns to Microservices Architecture There are a variety of other patterns which apply to one degree or another to microservices deployment. These include patterns for communicating with external applications and services, for managing data, for logging, for testing, and for security. In many cases, these patterns are similar for both monolithic and microservices architecture, although some patterns are more likely to be applicable to microservices than others. Fully automated parallel testing in a virtualized environment, for example, is typically the most appropriate pattern for testing VM/container-based microservices. As is so often the case in software development (as well as more traditional forms of engineering), the key to building a successful microservices architecture lies in finding the patterns that are most suitable to your application, understanding how they work, and adapting them to the particular circumstances of your deployment. Use of the appropriate patterns can provide you with a clear and accurate roadmap to successful microservices architecture refactoring and deployment. About the Author Michael Churchman is involved in the analysis of software development processes and related engineering management issues. Top Patterns for Building a Successful Microservices Architecture is published by the Sumo Logic DevOps Community. If you’d like to learn more or contribute, visit devops.sumologic.com. Also, be sure to check out Sumo Logic Developers for free tools and code that will enable you to monitor and troubleshoot applications from code to production.

Blog

Getting Started with Graphite Monitoring

Graphite is a complete monitoring tool for both simple and complex environments. It can be used to monitor various networked systems—including sites, apps, services and servers in local and cloud environments. This range of possibilities serves companies of diverse segments and sizes. In this post, I’ll explain how Graphite monitoring can help you get greater visibility into your application and infrastructure. A Quick Graphite Monitoring FAQ Graphite can support a company that has a specialized infrastructure, with a dedicated team and complex environments. It can also work well for small companies that have smaller teams and equipment. When it comes to choosing a monitoring tool, an admin usually asks several questions, such as: How long does it take to deploy? Does this tool address our issues? Will my team be able to work with and understand the essence of the tool? What reports are available? An administrator needs to ask these questions so that the choice of tool is as accurate as possible. Below are the answers to these questions for Graphite. How long does Graphite take to set up? Setting up and deploying Graphite is simple thanks to an installation script called Synthesize, and extensive technical documentation on Graphite makes it possible to gather information on the tool quickly. Essentially, Synthesize is a script that installs and configures a range of components to automate the setup of Graphite. Does this tool work in my environment? Graphite can support almost any type of environment you need to run it in. It works in the cloud. It runs on hybrid infrastructure. It works with on-premises servers. And it can run efficiently no matter the size of your environment. Will my team be able to work with and understand the essence of the tool? 
If your team is able to read and interpret documentation, they will understand the essence of the tool. As previously mentioned, Graphite has thorough documentation with step-by-step instructions, and scripts you can adapt to your needs. What reports are available? Graphite reports are inclusive, well-crafted and easy to manipulate. This is useful because reports are often used by people who are not part of the technology team, and they need to be able to understand the information quickly. The reports will most often be used to justify requests for purchases of new equipment, as well as hardware upgrades, and for performance measurement. How Graphite Monitoring Works Now that we’ve discussed what Graphite is and where you can use it, let’s take a look at how it works. Graphite is composed of the following items: Carbon A service that will be installed on the client computer and will listen for TCP/UDP packets sent over the network. Whisper The database that will store the data collected from the machines. Graphite Webapp A Django web app for graph generation. To provide a sense of Graphite in action, I’ll next provide an overview of how it works on different Linux distributions. I’ll discuss my experience installing and configuring Graphite on both Ubuntu and CentOS, so that I cover both sides (the Ubuntu/Debian and CentOS/Red Hat sides of the Linux universe). Installing Graphite on Ubuntu Using Synthesize I installed Graphite on an Ubuntu server hosted in the cloud. Everything went smoothly, but here are a couple of special tweaks I had to perform: After logging into the cloud environment and installing Ubuntu 14.04 to test the Synthesize script, it was necessary to have the cloud platform open a port on the firewall so that the data could be sent to the dashboard (which should be done in any infrastructure, be it cloud or local). I had to be careful to open only the application port and not the full range. 
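Since Carbon is the component that listens for incoming metrics, a quick way to check a fresh install is to send it one line of its plaintext protocol, which is simply `metric.path value timestamp`. A minimal sketch (the host, the metric name, and the default plaintext port 2003 are assumptions to adjust for your setup):

```python
import socket
import time

def format_metric(path, value, timestamp=None):
    """Build one line of Carbon's plaintext protocol: 'path value timestamp'."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {timestamp}\n"

def send_metric(path, value, host="localhost", port=2003):
    """Send a single metric to a Carbon cache over its plaintext TCP port."""
    line = format_metric(path, value)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode())

# Example (requires a running Carbon instance):
# send_metric("servers.web01.cpu.load", 0.75)
print(format_metric("servers.web01.cpu.load", 0.75, 1492000000))
```

If the metric appears in the Graphite web interface a few moments later, Carbon, Whisper and the webapp are all wired up correctly.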
After that, I used a data collection dashboard recommended by Graphite, called Grafana. However, you can also stream Graphite data directly into Sumo Logic. Manual Installation of Graphite on CentOS Now let’s try a manual installation of Graphite, without using Synthesize. In a manual installation, we have to be careful about the dependencies that the operating system requires to run smoothly. To make the job a bit more complex for this example, I decided to use CentOS, and I followed the steps below. The first requirement for every operating system is to upgrade (if you haven’t already) so that it does not cause dependency problems later. sudo yum -y update Install the dependencies: sudo yum -y install httpd gcc gcc-c++ git pycairo mod_wsgi epel-release python-pip python-devel blas-devel lapack-devel libffi-devel Access the local folder to download the sources: cd /usr/local/src Clone the source of Carbon: sudo git clone https://github.com/graphite-project/carbon.git Access the Carbon folder: cd /usr/local/src/carbon/ Install Carbon: sudo python setup.py install Clone the source of Graphite Web: sudo git clone https://github.com/graphite-project/graphite-web.git Access the Graphite Web folder: cd /usr/local/src/graphite-web/ Install Graphite Web: sudo pip install -r /usr/local/src/graphite-web/requirements.txt sudo python setup.py install Copy the Carbon configuration file: sudo cp /opt/graphite/conf/carbon.conf.example /opt/graphite/conf/carbon.conf Copy the storage schemas configuration file: sudo cp /opt/graphite/conf/storage-schemas.conf.example /opt/graphite/conf/storage-schemas.conf Copy the storage aggregation configuration file: sudo cp /opt/graphite/conf/storage-aggregation.conf.example /opt/graphite/conf/storage-aggregation.conf Copy the relay rules configuration file: sudo cp /opt/graphite/conf/relay-rules.conf.example /opt/graphite/conf/relay-rules.conf Copy the local settings file: sudo cp /opt/graphite/webapp/graphite/local_settings.py.example 
/opt/graphite/webapp/graphite/local_settings.py Copy the Graphite WSGI file: sudo cp /opt/graphite/conf/graphite.wsgi.example /opt/graphite/conf/graphite.wsgi Copy the virtual hosts file: sudo cp /opt/graphite/examples/example-graphite-vhost.conf /etc/httpd/conf.d/graphite.conf Copy the init files to /etc/init.d: sudo cp /usr/local/src/carbon/distro/redhat/init.d/carbon-* /etc/init.d/ Give permission to execute the init files: sudo chmod +x /etc/init.d/carbon-* Start the Carbon cache: sudo systemctl start carbon-cache Enable httpd: sudo systemctl enable httpd Start httpd: sudo systemctl start httpd With this configuration, we can access the Graphite web interface at https://localhost:8080 and monitor the local server running CentOS. Graphite and Sumo Logic Now that you have Graphite running, you can stream Graphite-formatted metrics directly into Sumo Logic. All you need to do is set up an installed collector and connect it to a metrics source. This webinar walks you through the steps: Sources are the environments that Sumo Logic Collectors connect to in order to collect data from your site. Each Source is configured to collect files in a specific way, depending on the type of Collector you’re using. The Setup Wizard in Sumo Logic walks you through the process. You’ll find Linux, Mac OS and Windows instructions for installing a new Graphite collector here. Part of this process defines how your data will be tagged for _sourceCategory, as well as the protocol and port you’ll use to stream data. Next, simply configure a Graphite source for the collector to connect to. Here are the steps for configuring your Graphite source. That’s it. Now you can use Sumo Logic’s advanced analytics to search and visualize data streaming from your application and infrastructure. Enjoy! About the Author Brena Monteiro is a software engineer with experience in the analysis and development of systems. She is a free software enthusiast and an apprentice of new technologies. 
Getting Started with Graphite Monitoring is published by the Sumo Logic DevOps Community. If you’d like to learn more or contribute, visit devops.sumologic.com. Also, be sure to check out Sumo Logic Developers for free tools and code that will enable you to monitor and troubleshoot applications from code to production.

Blog

CloudFormation and Sumo Logic - Build Monitoring into your Stack

Curious about Infrastructure as Code (IaC)? Whether you're new to AWS CloudFormation, or you control all of your cloud infrastructure through CloudFormation templates, this post demonstrates how to integrate Sumo Logic's monitoring platform into an AWS CloudFormation stack. Collect Logs and Metrics from your Stack Sumo Logic's ability to Unify your Logs and Metrics can be built into your CloudFormation Templates. Collect operating system logs, web server logs, application logs, and other logs from an EC2 instance. Additionally, Host Metrics, AWS CloudWatch Metrics, and Graphite formatted metrics can be collected and analyzed. With CloudFormation and Sumo Logic, you can achieve version control of your AWS infrastructure and your monitoring platform the same way you version and improve your software. CloudFormation Wordpress Stack with Sumo Logic Built-In Building off of the resources Adrian Cantrill provided in his Advanced CloudFormation course via A Cloud Guru, we will launch a test Wordpress stack with the following components: Linux EC2 instance - you choose the size! RDS instance - again, with a configurable size S3 bucket The Linux EC2 instance is bootstrapped with the following to create a LAMP stack: Apache MySQL PHP MySQL-PHP Libraries We also install Wordpress, and the latest version of the Sumo Logic Linux collector agent. Using the cfn-init script in our template, we rely on the file key of AWS::CloudFormation::Init metadata to install a sources.json file on the instance. 
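As a sketch of what that bootstrap step produces — not the template's exact contents — here is a minimal sources.json of the kind a Sumo Logic installed collector can read. The source names, paths, and categories below are placeholders:

```shell
# Write an illustrative Sumo Logic JSON source configuration.
# Real templates may differ; names, paths, and categories are placeholders.
cat <<'EOF' > sources.json
{
  "api.version": "v1",
  "sources": [
    {
      "sourceType": "LocalFile",
      "name": "apache-access",
      "pathExpression": "/var/log/httpd/access_log",
      "category": "test/apache"
    },
    {
      "sourceType": "LocalFile",
      "name": "linux-secure",
      "pathExpression": "/var/log/secure",
      "category": "test/os/secure"
    }
  ]
}
EOF
echo "Wrote sources.json ($(wc -c < sources.json) bytes)"
```

On the instance, the collector picks a file like this up from its configuration directory and provisions the listed sources automatically, which is why the logs show up in Sumo Logic without any manual clicking.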
This file instructs Sumo Logic to collect various types of logs and metrics from the EC2 instance: Linux OS Logs (Audit logs, Messages logs, Secure logs) Host Metrics (CPU, Memory, TCP, Network, Disk) Apache Access logs cfn-init logs Tutorial - Launch a CloudFormation Stack and Monitor Logs and Metrics Instantly First, you'll need a few things: A Sumo Logic account - Get a free one Here Access to an AWS account - If you don't have access you can sign up for the free tier here A local EC2 Key Pair - if you don't have one you can create one like this After you have access to your Sumo Logic account and an AWS account, navigate to an unused Region if you have one. This will give you a more isolated sandbox to test in so that we can more clearly see what our CloudFormation template creates. Make sure you have an EC2 key pair in that Region; you'll need to add this to the template. *Leveraging pseudo parameters, the template is portable, meaning it can be launched in any Region. First, log into AWS and navigate to CloudFormation. Choose 'Create New Stack' Then, download the example CloudFormation template from GitHub here Next, on line 87, in the EC2 Resources section, make sure to edit the value of the "KeyName" field to whatever your EC2 key is named for your current Region *Make sure the Region you choose to launch the stack in has an EC2 Key Pair, and that you update line 87 with your key's name. If you forget to do this your stack will fail to launch! Select 'Choose File' and upload the template you just downloaded and edited, then click Next Title your stack Log into Sumo Logic, and in the top-right click your email username, then Preferences, then '+' to create a Sumo Logic access key pair Enter the Sumo Logic key pair into the stack details page. You can also select an EC2 and RDS instance size, and enter a test string that we can navigate to later when checking that we can communicate with the instance. 
Click 'Next', name/tag your stack if you'd like, then click 'Next' again, then select 'Create' to launch your stack! Now What? View Streaming Logs and Metrics! You've now launched your stack. In about 10-15 minutes, we can visit our Wordpress server to verify everything is working. We can also search our Apache logs and see any visitors (probably just us) that are interacting with the instance. Follow these steps to explore your new stack, and your Sumo Logic analytics: View the CloudFormation Events log. You should see four CREATE_COMPLETE statuses like so: Check your Sumo Logic account to see the collector and sources that have been automatically provisioned for you: What's Next? Sumo Logic collects AWS CloudWatch metrics, S3 Audit logs, and much more. Below is more information on the integrations for AWS RDS Metrics and also S3 Audit Logs: Amazon RDS Metrics Amazon S3 Audit Explore your logs! Try visiting your web server by navigating to your EC2 instance's public IP address This template uses the default security group of your Region's VPC, so you'll need to temporarily allow inbound HTTP traffic from either your IP, or anywhere (your IP is recommended) To do this, navigate to the EC2 console and select the Linux machine launched via the CloudFormation Template Then, scroll down to the Security Group and click 'default' as shown below Edit the inbound rules to allow HTTP traffic in, either from your IP or anywhere After you've allowed inbound HTTP traffic, navigate in your browser to <your-public-ip>/wordpress (something like 54.149.214.198/wordpress) and you'll see your new Wordpress front end: You can also test the string we entered during setup by navigating to <your-public-ip>/index2.html Search your Sumo Logic account with _sourceCategory=test/apache and view your visits to your new Wordpress web server in the logs Finally, check out the metrics on your instance by installing the Host Metrics App: Cleanup Make sure to delete your stack as shown below, and 
to remove inbound HTTP rules on your default Security Group. If you have any questions or comments, please reach out via my LinkedIn profile, or via our Sumo Logic public Slack Channel: slack.sumologic.com (@grahamwatts-sumologic). Thanks for reading!
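As a postscript for CLI fans: the launch and cleanup steps above can also be scripted with the AWS CLI. This is a sketch under assumptions — the stack name, template filename, and parameter key are illustrative, so match them to the template you actually downloaded:

```shell
STACK_NAME=sumo-wordpress-demo          # placeholder stack name
TEMPLATE=wordpress-sumo.template        # placeholder name for the downloaded template

# Only run if the AWS CLI is installed and credentials are configured
if command -v aws >/dev/null && aws sts get-caller-identity >/dev/null 2>&1; then
  # Launch the stack (parameter keys are illustrative; match your template)
  aws cloudformation create-stack \
    --stack-name "$STACK_NAME" \
    --template-body "file://$TEMPLATE" \
    --parameters ParameterKey=KeyName,ParameterValue=my-ec2-key

  # ...and when you're done exploring, tear everything down
  aws cloudformation delete-stack --stack-name "$STACK_NAME"
else
  echo "aws CLI not available or not configured; commands shown for reference"
fi
```

Deleting the stack removes the EC2 instance, RDS instance, and S3 bucket the template created, which keeps your sandbox Region clean.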

Blog

ELK Stack vs. Sumo Logic: Building or Buying Value?

Blog

The Great Big Wall and Security Analytics

Not long ago I was visiting the CISO of a large agriculture biotechnology company in the Midwest – we’ll call him Ron – and he said to me “Mark, these cyber terrorists are everywhere, trying to hack into our systems from Russia and China, trying to steal our intellectual property. We have the biggest and the brightest people and the most advanced systems working on it, but they are still getting through. We are really challenged in our ability to identify and resolve these cyber threats in a timely manner. Can you help us?” The business issues that CISOs and their security teams face are significant. Customers are now making different decisions based on the trust they have in the companies they do business with. So implementing the right levels of controls and increasing team efficiency to rapidly identify and resolve security incidents become of paramount importance. But despite this big wall that Ron has built, and the SIEM technology they are currently using, threats are still permeating the infrastructure, trying to compromise their applications and data. With over 35 security technologies in play, trying to get holistic visibility was a challenge, and with a small team, managing their SIEM was onerous. Additionally, the hardware and refresh cycles over the years, as their business has grown, have been constrained by flat budget allocations. “Do more with less” was frequently what they heard back from the CIO. Like any company that wants to be relevant in this modern age, they are moving workloads to the cloud, adopting DevOps methodologies to increase the speed of application delivery, creating new and disruptive experiences for their customers, to maintain their competitive edge. But as workloads were moved to the cloud – they chose AWS – the way things were done in the past was no longer going to work. The approach to security needed to change. 
And it was questionable whether the SIEM solution they were using was even going to run in the cloud and support native AWS services at scale. SIEMs are technologies that were architected over 15 years ago, and they were really designed to solve a different kind of problem – traditional on-prem, perimeter-based, Mode 1 type security applications, going after known security threats. But as organizations start to move to the cloud, accelerating the pace at which they roll out new code and adopting DevOps methodologies, they need something different. Something that aligns to the Mode 2 digital initiatives of modern applications. Something that is cloud native, provides elasticity on demand, and delivers rapid time to value, not constrained by fixed rule sets going after known threats but instead leveraging machine learning algorithms to uncover anomalies, deviations and unknown threats in the environment. And lastly, something that integrates threat intelligence out of the box to increase the velocity and accuracy of threat detection – so you can get a handle on threats coming at your environment trying to compromise your applications and data. Is that great big wall working for you? Likely not. To learn more about Sumo Logic’s Security Analytics capabilities, please check out our press release, blog or landing page. Mark Bloom can be reached at https://www.linkedin.com/in/markbloom or @bloom_mark

Blog

Ever wondered how many AWS workloads run on Docker?

Blog

Provide Real-Time Insights To Users Without A Sumo Logic Account

You just finished building some beautiful, real-time Sumo Logic dashboards to monitor your infrastructure and application performance and now you want to show them off to your colleagues. But your boss doesn’t have a Sumo Logic account and your ops team wants this information on TVs around the office. Sound like a familiar situation? We’ve got you covered. You can now share your live dashboards in view-only mode with no login required, all while maintaining the security and transparency that your organization requires. We’ll even kick things off with a live dashboard of our own. Share Information with Colleagues and Customers This new feature enables you to share a dashboard so that anyone with the URL can view your dashboard without logging in. It reduces the friction for sharing information even further so that the right people have the right information when they need it. For example: Colleagues: Share operational and business KPIs with colleagues or executives who do not have a Sumo Logic account. Internal TVs: Display real-time information about your infrastructure and application on monitors throughout your building. Customers: Provide SLA performance or other statistics to your customers. Granular Permissions for Administrators Sharing your sensitive information to users without a login is a serious matter. With great power comes great responsibility, and no matter how much you trust your colleagues that use Sumo Logic, you may not want this power being wielded by all of your team members. If you are an administrator, you can decide which users have this permission and educate them on best practices for sharing information within and outside of your organization. By default, this capability is turned off and can only be enabled by administrators on the account. 
Protect Dashboard URLs with an IP / CIDR Whitelist For those who want even more protection over who can view these dashboards without logging in, you can restrict viewers to only those accessing it from specific IP addresses or CIDRs. This works great when you are placing live dashboards on TVs throughout your building and you want to make sure that this information stays in your building. Similarly, you might want to help your internal ops team troubleshoot a problem quickly without logging in. Send them the URL via email or Slack, for example, and rest assured that the information will remain in the right hands. If you decide to remove an IP address from your whitelist, any users connecting from that IP will no longer be able to view that dashboard. Complete Visibility through Audit Logs As an extra layer of transparency, you can keep track of which dashboards are shared outside of your organization and see which IPs are viewing them through your audit logs. With this information, you can: Configure real-time alerts to get notified anytime a user shares a dashboard Generate daily or weekly reports with a list of users and their shared dashboards Create dashboards of your shared dashboards – see where your dashboards are being viewed from so you can follow up on any suspicious activity. Learn More So go ahead – earn those bonus points with your boss and show off your dashboards today! Check out this webinar for a refresher on creating dashboards, then head over to Sumo Logic DocHub for more information on sharing these with users without an account.

March 10, 2017

Blog

Sumo Logic launches Multi-Factor Authentication for its Platform

The biggest risk to any organization is the end user and their password hygiene. It is an age-old problem, as users want to keep things easy by using the same dictionary-based password for all applications! This problem will continue to exist until we change end user behavior via policy enforcement and apply other layers of protection such as Single Sign-On and Multi-Factor Authentication (MFA). Because of this, MFA is becoming more of a must than a nice-to-have as companies start to adopt a healthier security posture/program. In fact, MFA has become a full-blown requirement for achieving critical compliance certifications that would provide your company with a better security reputation and demonstrate evidence of data protection. As a Cloud Security Engineer, I would love for MFA to be adopted across the board, which is part of the reason we are writing this blog: to provide our insights into the importance of implementing MFA across an enterprise. As some of you may have recently heard, Sumo Logic is now PCI 3.2 DSS compliant, which we could not have achieved without the diligence of our DevSecOps team putting some cycles together to deliver Multi-Factor Authentication to the Sumo Logic base via the platform for another layer of password defense. When logging into the Sumo platform, you can now enable 2-step verification for your entire organization within the security policies section of Sumo, as seen below. When Multi-Factor Authentication is enabled globally for the Org, you will be prompted with the following screen to configure your MFA. Every login from here on out will now prompt the following screen after completed configuration. What does Multi-Factor Authentication provide to the end user? A low-friction way to keep their credentials from being compromised and make it extremely difficult for attackers to take advantage of weak end user passwords. With the emergence of Cloud Computing, password-based security just won’t cut it anymore. 
Applying this extra layer of defense to credentials drastically drops the chance of your account ever being compromised. At Sumo Logic, we are glad to extend this extra layer of defense to our customers as they access our multi-tenant SaaS offering.

Blog

AWS CodePipeline vs. Jenkins CI Server

Blog

OneLogin Integrates with Sumo Logic for Enhanced Visibility and Threat Detection

OneLogin and Sumo Logic are thrilled to announce our new partnership and technology integration (app coming May 2017) between the two companies. We’re alike in many ways: we’re both cloud-first, our customers include both cloud natives and cloud migrators, and we are laser-focused on helping customers implement the best security with the least amount of effort. Today’s integration is a big step forward in making effortless security a reality. What does this integration do? OneLogin’s identity and access management solution allows for the easy enforcement of login policies across all of an organization’s laptops, both Macs and Windows, SaaS applications, and SAML-enabled desktop applications. This new partnership takes things a step further by making it possible to stream application authentication and access events from OneLogin into Sumo Logic. This includes over 200 application-related events, including: Who’s logged into which laptops — including stolen laptops Who’s accessed which applications — e.g., a salesperson accessing a finance app Who’s unsuccessfully logged in — indicating a potential attack in progress Who’s recently changed their password — another potential indicator of an attack Which users have lost their multi-factor authentication device — indicating a potential security weakness Which users have been suspended — to confirm that a compromised account is inactive User provision and de-provision activity – to track that users are removed from systems after leaving the company And finally, which applications are the most popular and which might be underutilized, indicating potential areas of budget waste These capabilities are critical for SecOps teams that need to centralize and correlate machine data across all applications. This, in turn, facilitates early detection of targeted attacks and data breaches, extends audit trails to device and application access, and provides a wider range of user activity monitoring. 
Because OneLogin has over 4,000 applications in its app catalog, and automatically discovers new applications and adds them to the catalog, we can help you extend visibility across a wide range of unsanctioned Shadow IT apps. The integration uses streaming, not polling. This means that events flow from OneLogin into Sumo as soon as they are generated, not after a polling interval. This lets you respond more quickly to attacks in progress. How does the integration work? Since both OneLogin and Sumo Logic are cloud-based, integrating the two is a simple one-screen setup. Once integration is complete, you can use Sumo Logic to query OneLogin events, as well as view the following charts: Visitors heatmap by metro area. Suppose you don’t have any known users in Alaska — that anomaly is quite clear here, and you can investigate further. Logins by country. Suppose you don’t have any known users in China; 80 potentially malicious logins are evident here. Failed logins over time. If this number spikes, it could indicate a hacking attempt. Top users by events. If one user has many events, it could indicate a compromised account that should be deactivated in OneLogin. Events by app. If an app is utilized more than expected, it could indicate anomalous activity, such as large amounts of data downloads by an employee preparing to leave the company. All this visibility helps customers better understand how security threats could have started within their company. 
This is especially helpful when it comes to phishing attacks, which, according to a recent report by Gartner, are “the most common targeted method of cyberattacks, and even typical, consumer-level phishing attacks can have a significant impact on security.” Summing up: Better Threat Detection and Response Sumo Logic’s vice president of business development, Randy Streu, sums it up well: “Combining OneLogin’s critical access and user behavior data with Sumo Logic’s advanced real-time security analytics solution provides unparalleled visibility and control for both Sumo Logic and OneLogin customers.” This deep and wide visibility into laptop and application access helps SecOps teams uncover weak points within their security infrastructures so that they know exactly how to best secure data across users, applications, and devices. Get started for free Even better, OneLogin and Sumo Logic are each offering free versions of their respective products to each other’s customers to help you get started. The OneLogin for Sumo Logic Plan includes free single sign-on and directory integration, providing customers with secure access to Sumo Logic through SAML SSO and multi-factor authentication while eliminating the need for passwords. Deep visibility. Incredibly simple integration. Free editions. We’re very pleased to offer all this to our customers. Click here to learn more. *The Sumo Logic App for OneLogin, for out-of-the-box visualizations and dashboarding, will be available May 2017* This blog was written by John Offenhartz, who is the Lead Product Owner of all of OneLogin’s integration and development programs. John’s previous experiences cover over twenty years in cloud-based development and product management with such companies as Microsoft, Netscape, Oracle and SAP. John can be reached at https://www.linkedin.com/in/johnoffenhartz

February 17, 2017

Blog

Analyze Azure Network Watcher Flow Logs with Sumo Logic

Azure Network Watcher Azure Network Watcher is a network performance and diagnostic service that enables you to monitor your Azure network. This service lets you collect “Network Security Group (NSG) Flow Logs”. NSG flow logs contain 5-tuple information (source IP, destination IP, source port, destination port, and protocol) about ingress and egress IP traffic, along with whether each flow was allowed or denied by the NSG, allowing you to troubleshoot traffic and security issues. NSG flow logs can be enabled via the Portal, PowerShell and the CLI; more info here. Why Integrate and Analyze Azure Network Watcher Flow Logs with Sumo Logic? Using Sumo Logic’s machine learning algorithms and search capabilities, you can monitor your Azure network and alert on key metrics to rapidly identify problems and security issues. The Sumo Logic App for Azure Network Watcher leverages NSG flow logs to provide real-time visibility and analysis of your Azure network. It provides preconfigured Dashboards that allow you to monitor inbound traffic, outliers in traffic flow, and denied flows. Furthermore, this data can be correlated with the Sumo Logic Apps for Azure Web Apps and Azure Audit for more contextual information. Also, the Sumo Logic Threat Intelligence feed can give you an extra layer of security on top of your flow logs. The Sumo Logic App for Azure Network Watcher comes with the following preconfigured dashboards: Network Watcher – Overview This Dashboard provides general information on the NSG flow logs, including Panels that drill down into queries with NIC, tuple and traffic flow information. The Overview Dashboard gives a good starting point for detecting outliers in denied traffic and geographic hotspots for inbound traffic. The Dashboard also allows panels to be filtered by rule name, source/destination IP and port, and other metadata fields. Network Watcher – Overview Source Address Location of Inbound Traffic. Displays geolocation of inbound traffic. Flow Traffic by Rule Name. 
Shows the breakdown of all traffic by security rule name set up at the NSG level. Denied Traffic per Minute. Shows the trend in denied inbound traffic flow per minute. Breakdown of Traffic (Allowed or Denied). Displays the traffic breakdown by allowed or denied flow. Top 10 Destination Ports. Shows the top 10 destination ports in the last 24 hours. Flow Traffic by Protocol. Displays the trend of traffic by protocol (TCP/UDP). Denied Traffic per Hour – Outlier. This panel, using the Sumo Logic machine learning Outlier operator, shows any unexpected sequence in denied traffic. Denied Traffic Comparison (Today vs Yesterday) – Outlier. Compares denied traffic of the last 24 hours with the previous 24 hours and shows any unexpected difference between the two time periods. Get Started with the Sumo Logic App for Azure Network Watcher For more info on the App, please visit Sumo Logic for Azure Network Watcher. To set up the App, follow the Collect Logs for Azure Network Watcher and Install the Azure Network Watcher App sections at the Azure App page
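As noted above, flow logs can be enabled outside the Portal too. Here is a sketch using the Azure CLI; the resource names are placeholders, and the exact subcommand can vary by CLI version (newer releases use `az network watcher flow-log create`), so check `az network watcher flow-log --help` on your install:

```shell
RG=my-resource-group          # placeholder resource group
NSG=my-nsg                    # placeholder network security group
STORAGE=myflowlogstorage      # placeholder storage account for the logs

# Only run if the Azure CLI is installed and you are logged in
if command -v az >/dev/null && az account show >/dev/null 2>&1; then
  # Enable NSG flow logs, retaining 7 days of data in the storage account
  az network watcher flow-log configure \
    --resource-group "$RG" \
    --nsg "$NSG" \
    --storage-account "$STORAGE" \
    --enabled true \
    --retention 7
else
  echo "az CLI not available or not logged in; command shown for reference"
fi
```

Once enabled, the flow logs land in the storage account, from which they can be collected into Sumo Logic for the dashboards described above.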

Blog

New DevOps Site Chronicles the Changing Face of DevOps

Blog

Sumo Logic Delivers Industry's First Multi-Tenant SaaS Security Analytics Solution with Integrated Threat Intelligence

Integrated Threat Intelligence Providing Visibility into Events that Matter to You! You’ve already invested a great deal in your security infrastructure to prevent, detect, and respond to cybersecurity attacks. Yet you may feel as if you’re still constantly putting out fires and are still uncertain about your current cybersecurity posture. You’re looking for ways to be more proactive, more effective, and more strategic about your defenses, without having to “rip and replace” all your existing defense infrastructure. You need the right cyber security intelligence, delivered at the right time, in the right way to help you stop breaches. That is exactly what Sumo Logic's integrated threat intelligence app delivers. Powered by CrowdStrike, Sumo's threat intelligence offering addresses a number of requests we were hearing from customers: Help me increase the velocity & accuracy of threat detection. Enable me to correlate Sumo Logic log data with threat intelligence data to identify and visualize malicious IP addresses, domain names, email addresses, URLs and MD5 hashes. Alert me when there is some penetration or event that maps to a known indicator of compromise (IOC) and tell me where else these IOCs exist in my infrastructure. And above all, make this simple, low friction, and integrated into the platform. And listen we did. Threat intelligence is offered as part of Sumo's Enterprise and Professional Editions, at no extra cost to the customer. Threat Intel Dashboard Supercharge your Threat Defenses: Consume threat intelligence directly into your enterprise systems in real time to increase the velocity & accuracy of threat detection. Be Informed, Not Overwhelmed: Real-time visualizations of IOCs in your environment, with searchable queries via an intuitive web interface. Achieve Proactive Security: Know which adversaries may be targeting your assets and organization, thanks to strategic, operational and technical reporting and alerts. 
We chose to partner with CrowdStrike because they are a leader in cloud-delivered next-generation endpoint protection and adversary analysis. CrowdStrike’s Falcon Intelligence offers security professionals an in-depth and historical understanding of adversaries, their campaigns, and their motivations. CrowdStrike Falcon Intelligence reports provide real-time adversary analysis for effective defense and cybersecurity operations. To learn more about Sumo Logic's Integrated Threat Intelligence Solution, please go to http://www.sumologic.com/application/integrated-threat-intelligence.

AWS

February 6, 2017

Blog

Using Sumo Logic and Trend Micro Deep Security SNS for Event Management

As a principal architect at Trend Micro, focused on AWS, I get all the ‘challenging’ customer projects. Recently a neat use case has popped up with multiple customers, and I found it interesting enough to share (hopefully you readers will agree). The original question came as a result of queries about Deep Security’s SIEM output via syslog and how best to do an integration with Sumo Logic. Sumo has a ton of great guidance for getting a local collector installed and syslog piped through, but I was really hoping for something: a little less heavy at install time; a little more encrypted leaving the Deep Security Manager (DSM); and a LOT more centralized. I’d skimmed an article recently about Sumo’s hosted HTTP collector, which made me wonder – could I leverage Deep Security’s SNS event forwarding along with Sumo’s hosted collector configuration to get events from Deep Security -> SNS -> Sumo? With Deep Security SNS events sending well-formatted JSON, could I get natural language query in Sumo Logic search without defining fields or parsing text? This would be a pretty short post if the answers were no… so let’s see how it’s done. Step 1: Create an AWS IAM account This account will be allowed to submit to the SNS topic (but will have no other rights or role assigned in AWS). NOTE: Grab the access and secret keys during creation, as you’ll need to provide them to Deep Security (DSM) later. You’ll also need the ARN of the user to give to the SNS topic. (I’m going to guess everyone who got past the first paragraph without falling into an acronym coma has seen the IAM console, so I’ll omit the usual screenshots.) Step 2: Create the Sumo Logic Hosted HTTP Collector. Go to Manage -> Collection, then “Add Collector”. Choose a Hosted Collector and pick some descriptive labels. NOTE: Make note of the Category for later Pick some useful labels again, and make note of the Source Category for the Collector (or DataSource if you choose to override the collector value). 
We’ll need that in a little while. Tip: When configuring the DataSource, most defaults are fine except for one: Enable Multiline Processing in the default configuration will split each key:value from the SNS subscription into its own message. We’ll want to keep those together for parsing later, so have the DataSource use a boundary expression to detect message beginning and end, using this string (without the quotes) for the expression: (\{)(\}) Then grab the URL provided by the Sumo console for this collector, which we’ll plug into the SNS subscription shortly. Step 3: Create the SNS topic. Give it a name and grab the Topic ARN. Personally, I like to put some sanity around who can submit to the topic. Hit “Other Topic Actions”, then “Edit topic policy”, and enter the ARN we captured for the new user above as the only AWS user allowed to publish messages to the topic. Step 4: Create the subscription for the HTTP collector. Select type HTTPS for the protocol, and enter the endpoint shown by the Sumo console. Step 5: Go to the search page in the Sumo console and check for events from our new _sourceCategory: And click the URL in the “SubscribeURL” field to confirm the subscription. Step 6: Configure the Deep Security Manager to send events to the topic Now that we’ve got Sumo configured to accept messages from our SNS topic, the last step will be to configure the Deep Security Manager to send events to the topic. Log in to your Deep Security console and head to Administration -> System Settings -> Event Forwarding. Check the box for “Publish Events to Amazon Simple Notification Service”, enter the access and secret keys for the user we created with permission to submit to the topic, then paste in the topic ARN and save. You’ll find quickly that we have a whole ton of data from SNS in each message that we really don’t need associated with our Deep Security events. 
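To see what those SNS-wrapped messages look like, here is an illustrative sketch of the envelope: the SNS notification itself is JSON, and the Deep Security event rides inside its "Message" field as an escaped JSON string. The field names and values below are invented for illustration, not an exact Deep Security payload:

```shell
# An illustrative SNS notification as Sumo might receive it. The inner
# Deep Security event is an escaped JSON string nested inside "Message",
# which is why extracting its fields takes two parsing passes.
cat <<'EOF' > sns_event.json
{
  "Type": "Notification",
  "TopicArn": "arn:aws:sns:us-east-1:123456789012:deep-security-events",
  "Message": "{\"EventType\":\"SystemEvent\",\"EventID\":4000,\"Severity\":3}",
  "Timestamp": "2017-02-06T12:00:00.000Z"
}
EOF

# Show the nested payload that the search query will need to unwrap
sed -n '/"Message"/p' sns_event.json
```

The envelope fields (Type, TopicArn, Timestamp, and so on) are the "whole ton of data" we don't need; only the string inside "Message" matters to us.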
So let’s put together a base query that gets us the Deep Security event fields directly accessible from our search box:

_sourceCategory=Deep_Security_Events | parse “*” as jsonobject | json field=jsonobject “Message” as DSM_Log | json auto field=DSM_Log

Much better. Thanks to Sumo Logic’s automatic JSON parsing, we now have access to directly filter any field included in a Deep Security event. Let your event management begin! Ping us if you have any feedback or questions on this blog, and let us know what kind of dashboards your ops & secops teams are using this for! A big thanks to Saif Chaudhry, Principal Architect at Trend Micro, who wrote this blog.
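For readers curious what that double decode actually does outside of Sumo, here is a minimal Python sketch of the same unwrapping: the SNS notification carries the Deep Security event JSON as a string inside its "Message" field, so it has to be parsed twice. The event fields shown (EventType, EventID, TargetName) are illustrative samples, not a complete Deep Security schema.

```python
import json

# Build a hypothetical SNS envelope the way the topic would deliver it:
# the Deep Security event list is serialized as a string in "Message".
sns_envelope = json.dumps({
    "Type": "Notification",
    "TopicArn": "arn:aws:sns:us-east-1:123456789012:DeepSecurityEvents",
    "Message": json.dumps([{"EventType": "SystemEvent", "EventID": 600,
                            "TargetName": "web-01"}]),
})

def extract_events(raw):
    """Return the Deep Security events nested inside an SNS message."""
    envelope = json.loads(raw)              # outer SNS JSON ("parse * as jsonobject")
    return json.loads(envelope["Message"])  # inner events ("json field=... Message")

events = extract_events(sns_envelope)
print(events[0]["TargetName"])
```

This is exactly the shape of work the base query above pushes into Sumo's parser, which is why no field definitions are needed on the search side.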

February 6, 2017

Blog

ECS Container Monitoring with CloudWatch and Sumo Logic

Blog

How to Analyze NGINX Logs with Sumo Logic

Blog

Chief Architect Stefan Zier on Tips for Optimizing AWS S3

Blog

Overview of AWS Lambda Monitoring

AWS Lambda usage is exploding, and at last week’s re:Invent conference, it was one of the key priorities for AWS. Lambda simplifies infrastructure by removing host management and letting you pay for just the compute and storage you use. This means it’s important to monitor your usage carefully to ensure you are managing your spend well. This post outlines the various options and approaches to AWS Lambda monitoring so you can make an informed decision about how you monitor your Lambda applications.

AWS Lambda Monitoring Basics

AWS provides vital stats for Lambda in a couple of ways. You can view metrics from your Lambda console, the CloudWatch console, via the Lambda or CloudWatch command line, or even through the CloudWatch API. The Lambda console gives you an overview of just the vital stats of your Lambda app. For a more detailed view, you can click through to the CloudWatch console. Here are the key terms you’ll notice in your CloudWatch console:

Invocations: The number of times a Lambda function is triggered in response to an event or API call.

Errors: The number of failed invocations.

Duration: The time it takes to execute a Lambda function, measured in milliseconds.

Throttles: The number of invocation attempts that were not executed because they exceeded concurrency limits.

Dead letter errors: Failed asynchronous invocations are sent to a dead letter queue (DLQ) for further troubleshooting; events that cannot be written to the DLQ are counted as dead letter errors.

Within CloudWatch you can set alarms to notify you of issues, or identify unused resources. You can also view all logs generated by your code in CloudWatch Logs. Archiving these logs will incur additional cost, but you can decide how far back you’d like to store log data.

Troubleshooting Common Errors in Lambda

With AWS Lambda, you don’t have underlying infrastructure to monitor. This means that most errors can be resolved by troubleshooting your code.
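As a concrete example, the Invocations and Errors metrics above combine into an error rate you might alarm on. A minimal Python sketch with made-up datapoints (not real CloudWatch output):

```python
# Hypothetical per-period samples of the two CloudWatch metrics discussed above.
invocations = [1200, 1350, 980, 1500]   # Invocations metric
errors      = [3, 12, 1, 45]            # Errors metric

def error_rate(invocations, errors):
    """Overall percentage of failed invocations across all periods."""
    total_inv = sum(invocations)
    return 100.0 * sum(errors) / total_inv if total_inv else 0.0

rate = error_rate(invocations, errors)
print(f"error rate: {rate:.2f}%")  # flag anything above, say, a 1% threshold
```

The same arithmetic is what a CloudWatch metric-math alarm on Errors/Invocations performs for you.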
Here are the common errors that occur with Lambda:

IAM roles & permissions: For your code to access other AWS services, you need to configure IAM roles correctly. If this isn’t done, you could see a permissions denied error.

Timeout exceeded: Some functions take longer to execute than others, and will need a longer timeout setting.

Memory exceeded: Some jobs, like database operations, require less memory. However, jobs that involve large files, like images, will need more memory. You will need to adjust the MemorySize setting for your function if you see this error.

Advanced Lambda Monitoring

As you start out using Lambda for part of your app, you can get by with AWS’ default monitoring options. Once you start expanding Lambda usage in your applications, and even run entire apps in Lambda, you’ll need the power of a more robust monitoring tool. The Sumo Logic App for AWS Lambda is great for monitoring your Lambda functions and gaining deeper visibility into performance and usage. Here are the benefits:

Track compute & memory usage: The Sumo Logic app tracks compute performance of individual Lambda functions and lets you drill down to the details. This is important because you configure resources in your code, and you need visibility at the individual function level to ensure all processes have adequate resources.

Cost correlation: The Sumo Logic app translates granular performance data to actual billed costs. It lets you track usage of memory, and even excess memory in functions, and prevents overspending. It enables you to split expense across multiple teams if you need to. Based on past behavior, the app can even predict future performance and spend.

Integrated reporting: If you already use Sumo Logic to monitor and manage the rest of your infrastructure stack, you can bring it all into one place, and have realistic benchmarks for your Lambda usage.
Real-time stats: The data in your Sumo Logic dashboards streams in real time from AWS, so you never have to worry about seeing outdated information.

Advanced visualizations: Sumo Logic lets you slice and dice your data with advanced analytics tools. The dashboards contain various chart types, including advanced types like the box plot chart. This kind of analysis is simply not possible within AWS.

As you scale your usage of Lambda, you need deep monitoring that can correlate compute usage with costs. You’re used to having control and visibility over your server-driven apps using traditional monitoring tools. With Lambda, the approach to monitoring has changed, but your goals are the same: to gain visibility and efficiency into your app and the abstracted Lambda infrastructure. You need the right tools to enable this kind of monitoring.

About the Author

Twain began his career at Google, where, among other things, he was involved in technical support for the AdWords team. Today, as a technology journalist, he helps IT magazines and startups change the way teams build and ship applications. Overview of AWS Lambda Monitoring is published by the Sumo Logic DevOps Community. If you’d like to learn more or contribute, visit devops.sumologic.com. Also, be sure to check out Sumo Logic Developers for free tools and code that will enable you to monitor and troubleshoot applications from code to production.

Blog

Overview of MongoDB Performance Monitoring

Monitoring the components of a system, especially for performance, is the best (and really the only) way to ensure a consistent user experience and that all service levels are being achieved. System monitoring is often left to operations to figure out after the system is built and “production ready.” This article highlights the main areas of MongoDB to make sure you are monitoring.

What Is Performance Monitoring?

Performance monitoring combines all the various other types of monitoring of an application, or component, to provide an overall view of system performance. Good performance monitoring also provides the ability to compare current performance levels to what has been set in service level agreements, and to spot trends. The utopia of performance monitoring is to be predictive and proactive: find changes in performance, tell the appropriate group about them, and resolve them before they ever impact a client.

Critical Metrics to Monitor in MongoDB

If this were a web or mobile application, then you’d probably be thinking response time, and you would be correct. Except in MongoDB, the only response it tracks easily is how long it takes to write out to its peers in its replication agreements. This is a valid metric, since every user transaction gets written to at least one peer. Read times aren’t tracked as their own metric inside the platform. The other critical data point to monitor for performance is disk I/O. MongoDB, like any database platform, relies heavily on the speed at which it can read and write to disk. If disk I/O reaches 100%, everything is waiting on disk access and the entire system will slow to a crawl.

Extremely Useful MongoDB Metrics

Knowing the usage pattern in your database is critical to trending performance. So tracking how many reads (selects) and writes (inserts, updates, deletes) are being performed at a regular interval will definitely help you tune (and size) the system accordingly.
In conjunction with tracking the actual reads and writes, MongoDB also exposes the number of clients actively doing reads and the number doing writes. Combining these two metrics allows better cache tuning, and can even help you decide when adding more replicas would make sense. The last extremely useful metric for performance monitoring in MongoDB, in my opinion, is capturing slow queries. This is very useful for developers to find missing indexes, and tracking their number will reveal ways that clients are using the system that weren’t originally envisioned. I’ve seen unexpected client behavior in the past. For instance, once a user base figured out they could do wildcard searches, they stopped typing an exact 12-digit number and just searched on the first digit followed by a %. An index brought performance back in line, but it was not expected. (Remember that people are like water. They will find the fastest way to do things and follow that route, for good or bad.)

Additional MongoDB Metrics to Monitor

Most of the metrics in this section are solid and valuable, but are too coarse-grained to be of the same value as the metrics in the sections above. The following are good overall metrics of system health, and will definitely help in scaling the database system up and down to meet the needs of your applications. MongoDB has internal read and write queues that can be watched; they only really fill up when MongoDB can’t keep up with the number of incoming requests. If these are used often, you will probably need to look into adding capacity to your MongoDB deployment. Another great metric to trend is the number of client connections, and available connections. Then, of course, there are always metrics at the machine level that are important to watch on all of your nodes. These include memory, CPU, and network performance.

Monitoring Tools to Use

More information on the tools that ship with MongoDB for monitoring is available in the MongoDB documentation.
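To make the read/write discussion concrete, here is a small Python sketch that derives the read/write mix from an opcounters-style snapshot, as returned by MongoDB's serverStatus command. The numbers are invented; in practice you would diff two samples taken an interval apart to get a rate.

```python
# Hypothetical opcounters snapshot (serverStatus-style field names).
opcounters = {"query": 8200, "getmore": 300,
              "insert": 900, "update": 600, "delete": 50}

def read_write_ratio(counters):
    """Return (reads, writes, reads-per-write) from an opcounters snapshot."""
    reads = counters["query"] + counters["getmore"]
    writes = counters["insert"] + counters["update"] + counters["delete"]
    return reads, writes, reads / writes if writes else float("inf")

reads, writes, ratio = read_write_ratio(opcounters)
print(f"{reads} reads vs {writes} writes (~{ratio:.1f}:1)")
```

A heavily read-skewed ratio like this one argues for more cache and read replicas; a write-heavy one argues for faster disks or sharding.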
And there are operational support platforms, like Sumo Logic, which provide a much more visual and user-friendly way to review these metrics. Sumo Logic also has the Sumo Logic App for MongoDB, which includes pre-built queries and dashboards allowing you to track overall system health, queries, logins and connections, errors and warnings, replication, and sharding. You can learn more about the app from its documentation. If you don’t already have a Sumo Logic account, sign up for the free trial and take the app for a spin.

About the Author

Vince Power is a Solution Architect with a focus on cloud adoption and technology implementations using open source-based technologies. He has extensive experience with core computing and networking (IaaS), identity and access management (IAM), application platforms (PaaS), and continuous delivery. Overview of MongoDB Performance Monitoring is published by the Sumo Logic DevOps Community. If you’d like to learn more or contribute, visit devops.sumologic.com. Also, be sure to check out Sumo Logic Developers for free tools and code that will enable you to monitor and troubleshoot applications from code to production.

Blog

AWS Well Architected Framework - Security Pillar

Blog

CISO Manifesto: 10 Rules for Vendors

This CISO blog post was contributed by Gary Hayslip, Deputy Director, Chief Information Security Officer (CISO) for the City of San Diego, Calif., and co-author of the book CISO Desk Reference Guide: A Practical Guide for CISOs. As businesses today focus on the new opportunities cybersecurity programs provide them, CISOs like myself have to learn job roles they were not responsible for five years ago. These challenging roles and their required skill sets demonstrate, I believe, that the position of CISO is maturing. The role not only requires a strong technology background, good management skills, and the ability to mentor and lead teams; it now requires soft skills such as business acumen, risk management, innovative thinking, creating human networks, and building cross-organizational relationships. To be effective in this role, I believe the CISO must be able to define their “vision” of cybersecurity to their organization. They must be able to explain the business value of that vision, secure leadership support to execute it, and engage the business in implementing it. So how does this relate to the subject of my manifesto? I am glad you asked. The reason I provided some background is that, for us CISOs, a large portion of our time is spent working with third-party vendors to fix issues. We need these vendors to help us build our security programs, to implement innovative solutions for new services, or just to help us manage risk across sprawling network infrastructures. The truth of the matter is, organizations are looking to their CISO to help solve the hard technology and risk problems they face; this requires CISOs to look at technologies, workflows, new processes, and collaborative projects with peers to reduce risk and protect their enterprise assets.
Of course, this isn’t easy, to say the least. One of the hardest issues CISOs face, I believe, is that time and again when they speak with their technology provider, the vendor truly doesn’t understand how the CISO does their job. The vendor doesn’t understand how the CISO views technology, or what the CISO is really looking for in a solution. To provide some insight, I decided I would list ten rules that I hope technology providers will take to heart, and just possibly make things better for all of us in the cybersecurity community. Now with these rules in mind, let’s get started. I will start with several issues that really turn me off when I speak with a technology provider, and end with some recommendations to help vendors understand what CISOs are thinking when they look at their technology. So here we go, let’s have some fun.

Top Ten Rules for Technology Providers

“Don’t pitch your competition” – I hate it when a vendor knows I have looked at some of their competitors, and then spends their time telling me how bad the competition is and how much better they are. Honestly, I don’t care; I contacted you to see how your technology works and whether it fits the issue I am trying to resolve. If you spend all of your time talking down another vendor, that tells me you are more concerned about your competitor than my requirements. Maybe I called the wrong company for a demonstration.

“Don’t tell me you solve 100% of ANY problem” – For vendors that like to make grand statements: don’t tell me that you do 100% of anything. As the old adage goes, “100% of everything is 0% of anything.” In today’s threat environment, the only thing I believe is 100% certain is that eventually I will have a breach. The rest is all B.S., so don’t waste my time saying you do 100% coverage, or 100% remediation, or 100% capturing of malware traffic. I don’t know of a single CISO who believes that anyone does 100% of anything, so don’t waste your time trying to sell that to me.

Blog

AWS Best Practices - How to Achieve a Well Architected Framework

Blog

A Toddler’s Guide to Data Analytics

Any parent of a two-year-old appreciates the power of speaking a common language. There is nothing more frustrating to my two-year-old son than his inability to communicate what he wants. Learning to say things like “milk” and “applesauce” has transformed the breakfast experience. On the other hand, learning to say “trash” means that any object in reach is in danger of being tossed in the trash bin for the glory of saying “twash” over and over again. I see this in my work world as well. Millions of dollars are spent in the software world translating from one “language” to another. In fact, a whole industry has popped up around shared Application Program Interfaces (APIs) to standardize how systems communicate. Despite this trend, there seems to be more emphasis on the “communication” than on the “data” itself. Data analytics products in particular seem happy to shove all types of data into one mold, since the output is the same. That is where we at Sumo Logic decided to take the road less traveled. We believe that data needs to be treated “natively” – in other words, we don’t want to shove a square data peg into a round systems hole. Just as speaking a language natively changes the experience of travel, speaking the native language of data transforms the analytics experience. When we decided to build our Unified Logs and Metrics (ULM) product, we also decided that it was essential that we become bilingual – speaking the native language of both raw logs and time-series metrics. And here is why it matters – according to my toddler.

Answer my Question – Quickly

Toddlers are well acquainted with the frustration of not being understood. Everything takes too long, and they need it now. And you know what? I get it. I have had occasion to speak with people who don’t speak my language, and it is hard. I once spent 15 minutes in a pharmacy in Geneva trying to order antibacterial cream.
There was a lot of waving of hands and short, obvious words (Arm. Cut. Ow!). We would face the same obstacles if we used a log system to store metrics. Every query needs to be translated from one language to another, and it takes forever. At the end of the day, you can try to optimize a log system – built to search for needles in haystacks – to perform the equivalent of speed reading, but eventually the laws of physics intervene. You can only make it so fast. It takes too long – and like my toddler, you will just stop asking the question. What’s the use in that?

Cleaning up is Hard

I am always amazed at how my two-year-old can turn a nice stack of puzzles or a bucket of toys into a room-sized disaster zone – the same components, but vastly different results. Storage optimization is essential in the world of operational data. There is a natural assumption underneath a true log-analytics system: we assume on some level that each log is a special snowflake. There is, of course, a lot of repetition, but the key is to be flexible and optimize for finding key terms very quickly. Metrics, on the other hand, are repetitive by design. Every record of a measurement is the same – except for the measurement itself. Once you know you are collecting something – say, CPU performance on some server – you don’t need to capture that reference every time. You can optimize heavily for storing and retrieving long lists of numbers. Storing time-series metrics as logs, or events, is extremely wasteful. You can incur anywhere from 3x to 10x more storage cost – and that is without the same performance. To achieve the performance most metrics systems can reach, you are looking at 10-20x in storage costs. This, of course, is the reason why no log-analytics companies are really used for performance metrics at scale – the immense costs involved just don’t justify the benefit of tool reduction.
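A back-of-the-envelope Python sketch of that storage argument. The sizes below are illustrative, not measurements of any particular product: a metrics store can pack one datapoint as a fixed-width (timestamp, value) pair, while a log store keeps the full self-describing text line.

```python
import struct

# One hypothetical datapoint rendered as a log event (self-describing text)...
log_line = ('2017-01-10T02:00:00Z host=web-01 metric=system.cpu.util '
            'value=73.4 unit=percent\n')

# ...and the same datapoint packed the way a time-series store might:
# an 8-byte timestamp plus an 8-byte float.
packed = struct.pack('<qd', 1484013600, 73.4)

print(len(log_line), 'bytes as a log event')
print(len(packed), 'bytes as a packed datapoint')
print(f'~{len(log_line) / len(packed):.0f}x overhead')
```

Real systems complicate this in both directions (compression, indexes, series metadata), but the shape of the gap is why metrics deserve their own storage format.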
I Want to Play with my Cars Anywhere One of the funniest things my son does is how he plays with his toy cars. He has race tracks, roads, and other appropriate surfaces. He rarely uses them. He prefers to race his cars on tables, up walls, and on daddy’s leg. The flexibility of having wheels is essential. He has other “cars” that don’t roll – he doesn’t really play with them. It is the same core truth with data analytics. Once you have high performance with cost effective storage – uses just present themselves. Now you can perform complex analytics without fear of slowing down the system to a crawl. You can compare performance over months and years, rather than minutes and hours – because storage is so much cheaper. Innovative use cases will always fill up the new space created by platform enhancements – just as restricted platforms will always restrict the use cases as well. Choose Wisely So, it’s 2 AM. Your application is down. Your DevOps/Ops/Engineering team is trying to solve the problem. They can either be frustrated that they can’t get their questions answered, or they can breeze through their operational data to get the answers they need. I know what two-year old would tell you to do. Time to put your old approach to data analytics in the twash.

January 10, 2017

Blog

Making the Most of AWS Lambda Logs

How can you get the most out of monitoring your AWS Lambda functions? In this post, we’ll take a look at the monitoring and logging data that Lambda makes available, and the value that it can bring to your AWS operations. You may be thinking, “Why should I even monitor AWS Lambda? Doesn’t AWS take care of all of the system and housekeeping stuff with Lambda? I thought that all the user had to do was write some code and run it!”

A Look at AWS Lambda

If that is what you’re thinking, then for the most part, you’re right. AWS Lambda is designed to be a simple plug-and-play experience from the user’s point of view. Its function is simply to run user-supplied code on request in a standardized environment. You write the code, specifying some basic configuration parameters, and upload the code, the configuration information, and any necessary dependencies to AWS Lambda. This uploaded package is called a Lambda function. To run the function, you invoke it from an application running somewhere in the AWS ecosystem (EC2, S3, or most other AWS services). When Lambda receives the invoke request, it runs your function in a container; the container pops into existence, does its job, and pops back out of existence. Lambda manages the containers—you don’t need to (and can’t) do anything with them. So there it is—Lambda. It’s simple, it’s neat, it’s clean, and it does have some metrics which can be monitored, and which are worth monitoring.

Which Lambda Metrics to Monitor?

So, which Lambda metrics are important, and why would you monitor them? There are two kinds of monitoring information which AWS Lambda provides: metrics displayed in the AWS CloudWatch console, and logging data, which is handled by both CloudWatch and the CloudTrail monitoring service. Both types of data are valuable to the user—the nature of that value, and the best way to make use of it, depend largely on the type of data.
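Before digging into the metrics, the programming model described above can be sketched in a few lines of Python. This is only a local illustration of the handler contract (a function receiving an event and a context), not how Lambda actually executes code.

```python
# A toy sketch of the Lambda model: you supply a handler, and the service
# invokes it with an event and a context object.
def handler(event, context):
    """A user-supplied Lambda function: echo back the caller's name."""
    return {"statusCode": 200, "body": f"hello, {event.get('name', 'world')}"}

# Simulate an invoke request the way a calling service might deliver it.
response = handler({"name": "sumo"}, context=None)
print(response["body"])
```

Everything around that function call (the container lifecycle, scaling, retries) is what Lambda manages for you, and it is exactly that hidden machinery the metrics below report on.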
Monitoring Lambda CloudWatch Console Metrics

Because AWS Lambda is strictly a standardized platform for running user-created code, the metrics that it displays in the CloudWatch console are largely concerned with the state of that code. These metrics include the number of invocation requests that a function receives, the number of failures resulting from errors in the function, the number of failures in user-configured error handling, the function’s duration (or running time), and the number of invocations that were throttled as a result of the user’s concurrency limits. These are useful metrics, and they can tell you a considerable amount about how well the code is working, how well the invocations work, and how the code operates within its environment. They are, however, largely useful in terms of functionality, debugging, and day-to-day (or millisecond-to-millisecond) operations.

Monitoring and Analyzing AWS Lambda Logs

With AWS Lambda, logging data is actually a much richer source of information in many ways. This is because logging provides a cumulative record of actions over time, including all API calls made in connection with AWS Lambda. Since Lambda functions exist for the most part to provide support for applications and websites running on other AWS services, Lambda log data is the main source of data about how a function is doing its job. “Logs,” you say, like Indiana Jones surrounded by hissing cobras. “Why does it always have to be logs? Digging through logs isn’t just un-fun, boring, and time-consuming. More often than not, it’s counter-productive, or just plain impractical!” And once again, you’re right. There isn’t much point in attempting to manually analyze AWS Lambda logs. In fact, you have three basic choices: ignore the logs, write your own script for extracting and analyzing log data, or let a monitoring and analytics service do the work for you.
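If you do choose to write your own script, the raw material is the REPORT line Lambda writes to CloudWatch Logs after each invocation. A minimal Python sketch follows; the sample line is typical of that format, though spacing and fields can vary slightly by runtime.

```python
import re

# A representative Lambda REPORT log line (request ID and values are samples).
report = ('REPORT RequestId: 3604209a-e9a3-11e6-939a-754dd98c7be3\t'
          'Duration: 12.34 ms\tBilled Duration: 100 ms\t'
          'Memory Size: 128 MB\tMax Memory Used: 18 MB')

def parse_report(line):
    """Pull the numeric fields out of a Lambda REPORT log line."""
    fields = dict(re.findall(r'(Duration|Billed Duration|Memory Size|'
                             r'Max Memory Used): ([\d.]+)', line))
    return {k: float(v) for k, v in fields.items()}

stats = parse_report(report)
print(stats['Billed Duration'], 'ms billed,',
      stats['Memory Size'] - stats['Max Memory Used'], 'MB unused')
```

Aggregating these fields across thousands of invocations is precisely the drudge work the third option hands off to a service.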
For the majority of AWS Lambda users, the third option is by far the most practical and the most useful.

Sumo Logic’s Log Analytics Dashboards for Lambda

To get a clearer picture of what can be done with AWS Lambda metrics and logging data, let’s take a look at how the Sumo Logic App for AWS Lambda extracts useful information from the raw data, and how it organizes that data and presents it to the user. On the AWS side, you can use a Lambda function to collect CloudWatch logs and route them to Sumo Logic. Sumo integrates accumulated log and metric information to present a comprehensive picture of your AWS Lambda function’s behavior, condition, and use over time, using three standard dashboards:

The Lambda Overview Dashboard

The Overview dashboard provides a graphic representation of each function’s duration, maximum memory usage, compute usage, and errors. This allows you to quickly see how individual functions perform in comparison with each other. The Overview dashboard also breaks duration, memory, and compute usage down over time, making it possible to correlate Lambda function activity with other AWS-based operations, and it compares the actual values for all three metrics with their predicted values over time. This last set of values (actual vs. predicted) can help you pinpoint performance bottlenecks and allocate system resources more efficiently.

The Lambda Duration and Memory Dashboard

Sumo Logic’s AWS Lambda Duration and Memory dashboard displays duration and maximum memory use for all functions over a 24-hour period in the form of both outlier and trend charts. The Billed Duration by Hour trend chart compares actual billed duration with predicted duration on an hourly basis. In a similar manner, the Unused Memory trend chart shows used, unused, and predicted unused memory size, along with available memory.
These charts, along with the Max Memory Used box plot chart, can be very useful in determining when and how to balance function invocations and avoid excessive memory over- or under-use.

The Lambda Usage Dashboard

The Usage dashboard breaks down requests, duration, and memory usage by function, along with requests by version alias. It includes actual request counts broken down by function and version alias. The Usage dashboard also includes detailed information on each function, including individual request ID, duration, billing, memory, and time information for each request. The breakdown into individual requests makes it easy to identify and examine specific instances of a function’s invocation, in order to analyze what is happening with that function on a case-by-case level. It is integrated, dashboard-based analytics such as those presented by the Sumo Logic App for AWS Lambda that make it not only possible but easy to extract useful data from Lambda, and truly make the most of AWS Lambda monitoring.

About the Author

Michael Churchman started as a scriptwriter, editor, and producer during the anything-goes early years of the game industry. He spent much of the ’90s in the high-pressure bundled software industry, where the move from waterfall to faster release was well under way, and near-continuous release cycles and automated deployment were already de facto standards. During that time he developed a semi-automated system for managing localization in over fifteen languages. For the past ten years, he has been involved in the analysis of software development processes and related engineering management issues.

Blog

Triggering AWS Lambda Functions from Sumo Logic Alerts

Blog

Leveraging Machine Data Analytics for DevOps

Early on, we measured data volume in gigabytes. Then we moved on to terabytes. Now, it’s petabytes. But the scale of data is not the only thing that has changed. We now deal with different types of data as well. In particular, the introduction of large volumes of machine data has created new opportunities for machine data analytics. Leveraging machine data, especially logs and metrics, is a key part of advancing the DevOps workflow. Advanced analytics based on machine data allow DevOps engineers to make sense of petabytes of data by using statistical, indexing, filtering and machine learning techniques. In this post, I explain how to use Sumo Logic’s cloud-native platform to analyze large volumes of machine data to drive actionable insights.

Using Sumo Logic Unified Logs and Metrics for Machine Data

Let’s start by discussing how Sumo Logic allows users to visualize machine logs and metrics. Sumo makes this information available through a single, unified interface—the Sumo Logic Application Status Dashboard. The dashboard shows the DevOps engineer a real-time visualization of the status quo for logs and metrics.

The Sumo Logic Application Status Dashboard

The image above shows the available metrics in this example: latency, customer logins, CPU usage, app log errors, and memory usage and errors. Additional metrics can be visualized in the dashboard as well, depending on which type of data is available. Examples of supported logs include error logs, binary logs, general and slow query logs, and DDL logs. In addition, since the dashboard is connected to those logs, it allows you to drill down to find more details about an issue.

The Sumo Logic Dashboard and DevOps

Using the available logs and metrics, a DevOps engineer can perform a quick root cause analysis on a production issue so the problem can be addressed quickly. That’s essential in DevOps, because quick resolution of problems assures that pipelines can keep flowing continuously.
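Conceptually, that drill-down starts with filtering: narrowing a stream of machine data to the records that matter. A toy Python sketch with made-up log records shows the idea; at petabyte scale a query language does this work, but the operation is the same.

```python
from collections import Counter

# Hypothetical structured app log records (fields and values are invented).
logs = [
    {"service": "checkout", "level": "ERROR", "msg": "timeout calling payments"},
    {"service": "auth",     "level": "INFO",  "msg": "login ok"},
    {"service": "checkout", "level": "ERROR", "msg": "retry exhausted"},
    {"service": "search",   "level": "WARN",  "msg": "slow query"},
]

# Filter to errors, then aggregate by service to see where to drill down.
errors_by_service = Counter(
    rec["service"] for rec in logs if rec["level"] == "ERROR"
)
print(errors_by_service.most_common())  # which service is failing most?
```

The filter-then-aggregate pattern is the backbone of most root cause hunts: it converts a noisy stream into a ranked list of suspects.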
This video demonstrates the Sumo Logic Application Dashboard in action. Notice in particular the use of filtering—one of the analytics techniques Sumo Logic uses to help a DevOps engineer tackle an issue. Other analytics methods include statistical, indexing and machine learning techniques.

Machine Data, Predictive Analytics and Sumo Logic

Sumo Logic lets you do more with machine data than simply find out what happened. You can also use it as a predictive analytics platform to identify trends and understand what is likely to happen next with your infrastructure or DevOps pipeline. Predictive analytics based on machine data is valuable because the vast volume of data coming into an organization daily means that much of it turns into noise and ultimately masks the messages that are most important. With predictive analytics, DevOps teams can make the most of all their data, even if they can’t react to it all in real time. Consider, for example, the case of a CPU usage spike or a memory drop. Predictive analytics techniques could help you predict when such events will occur again so that you can prepare for them. Similarly, predictive analytics delivered via tools like Sumo Logic can help you find patterns in a vast amount of data without having to write your own code. Sumo can identify the trends and help you make sense of them through a convenient interface. That’s a big help to DevOps professionals because it means that, using Sumo Logic, they can make sense of a large volume of information without having to be experts in statistics or data analytics programming. Instead, they can focus on what they know best—whether it is coding, testing or system administration—and rely on Sumo Logic to be the data analytics expert on their team.

LogReduce: Clean Up Your Machine Data

A final feature worth mentioning is LogReduce.
This is a feature in Sumo Logic that, like unified logs and metrics, helps DevOps engineers reduce the noise in their machine data. The following video shows an example of LogReduce. As you can see, a lot of calculation and analysis happens under the hood. All the DevOps engineer had to do was push the LogReduce button. This saves the DevOps engineer from having to worry about machine learning techniques, freeing him or her to focus on the problem to be solved. In my opinion, every DevOps engineer using the LogReduce button should have at least a basic understanding of machine data; otherwise, results could be misinterpreted. Still, LogReduce is a great feature for transforming a baseline knowledge of machine data analytics into expert-level results.

About the Author

Cordny Nederkoorn is a software test engineer with over 10 years of experience in finance, e-commerce and web development. He is also the founder of TestingSaaS, an international community researching cloud applications with a focus on forensics, software testing and security.

Blog

How to Deploy and Manage a Container on Azure Container Service

I’ll admit, I’m new to Microsoft Azure. As a developer for startups and agencies, I am most often exposed to lightweight cloud hosting providers like Heroku and DigitalOcean. Recently, however, I started working with a new client that hosts their entire architecture on Azure, so I’ve had to learn quickly. While my client isn’t using container technology like Docker, it is something that I have been working with a lot recently, so I took it upon myself to explore the Azure Container Service (ACS). I have to say, I was pleasantly surprised by how easy it was to get started.

Getting Started with the Azure Container Service

The first thing that you’ll notice about Microsoft Azure is that it is significantly different from its primary competitor, Amazon AWS. With an application marketplace, Azure seems more focused on being able to quickly spin up resources with as little overhead as possible. This allows for skipping some steps when it comes to setting up open source services. Because of the simplicity of the Marketplace, getting started with the Azure Container Service is as simple as running a search. Find Azure Container Service on the Marketplace. Once you’ve found the Azure Container Service (the one published by Microsoft), click the “Create” link. This will bring up a simple step-by-step setup process for your container cluster.

Step 1 – Azure Container Service Basics

Configuring Basic Settings for an Azure Container. The first step in creating your ACS is to set up your user information, SSH public key, and resource group. Like any good SSH configuration, Azure servers use public key authentication. This is a more secure authentication method that allows you to log into a remote server without having to create (or remember) a password. If SSH keys are a new concept to you, finding (or generating) one is a pretty straightforward process, but I’ve found that GitHub has a far better walkthrough than I could ever write.
In addition to setting up your username and SSH public key, you also need to set up your resource group. The resource group does exactly what its name implies: it groups your resources. You can either create a new one (as shown in the screenshot above), or you can pick an existing one to add the Container Service to.

Step 2 – Container Framework Configuration

Providing Framework Information for Your Container. After setting up authentication and grouping your resources, you’ll need to decide on your container orchestrator. The available options here are Docker Swarm and DC/OS. To be honest, each solution has its own benefits and drawbacks, but if you are new to Docker, I recommend using Swarm because it is Docker’s native clustering solution.

Step 3 – Azure Container Service Settings

Configuring the Azure Container Service. The next step in the process is to set up some basic rules for your cluster. This can seem a bit confusing at first glance, but in reality, all you are doing is telling ACS how many masters and agents to spin up in your cluster. While the master count is pretty straightforward, the agent count might not be immediately clear. In Docker Swarm, this number is the initial number of agents in the agent scale set. The same is true for DC/OS, but there it indicates the initial number of agents in the private scale set. It’s important to note that a public scale set is also created here, depending on the number of masters in the cluster (1 master = 1 agent, 3-5 masters = 2 agents).

Step 4 – Summary and Validation

Reviewing the Azure Container Service Summary. The summary step is pretty self-explanatory, but it is important to discuss the validation process here. My first time around with ACS, I got stuck on this page because of a timed-out validation process. The ultimate problem was that my internet went out, so the asynchronous response from the validator didn’t come through.
Because I was unable to continue to the next step without a successful validation, I made the (wrong) assumption that I was done and closed out of the process. After some clicking around, I realized I needed to pick back up where I left off and wait for validation to pass before moving on to the next step.

Step 5 – Buy Your Azure Deployment

Explore the Azure Resource Group. Once the configuration process is all said and done, all that is left is to purchase your deployment. I didn’t include a screenshot of this step for obvious reasons, but it is important to note that once you purchase your deployment, it might take some time for everything to be provisioned. As shown in the screenshot above, once the deployment is provisioned, all of the necessary information will be accessible through the resource group indicated in the configuration process.

Connecting to the Swarm

Launch Containers Using the Swarm Master Overview. Now that we’ve created our Swarm master, we need to actually launch some containers. To do this, first navigate to your new Swarm master virtual machine and take note of the public IP address (in this case, 104.210.24.237). In order to work with our new Swarm master, we will need to use this public IP address to create an SSH tunnel. The process differs between operating systems, but in a standard Linux-based environment, the command is pretty straightforward:

ssh -L 2375:localhost:2375 -f -N username@ip -p 2200

Next, we need to tell Docker about this tunnel so we can control the swarm using our standard local Docker commands. To do so, we simply need to update the DOCKER_HOST environment variable with the configured port (an empty host portion defaults to localhost):

export DOCKER_HOST=:2375

Now, any Docker command we run will be piped through our SSH tunnel directly into our Azure swarm.
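Putting the tunnel and environment steps together, a minimal session might look like the sketch below. The username "azureuser" is an assumption for illustration; substitute your own admin username and your master's public IP.

```shell
# Forward local port 2375 to the Docker endpoint on the Swarm master.
# -f backgrounds the tunnel, -N runs no remote command, and -p 2200 is
# the SSH port ACS exposes on the master's load balancer.
ssh -fNL 2375:localhost:2375 -p 2200 azureuser@104.210.24.237

# Point the local Docker CLI at the tunnel; an empty host part means localhost.
export DOCKER_HOST=:2375

# Every docker command now runs against the Azure swarm, not the local daemon.
docker info
```

If `docker info` reports your swarm agents rather than your local machine, the tunnel is working.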
It’s important to note that the export command only lasts until your terminal session ends, which means that if you open a new session, you will need to re-export your DOCKER_HOST to control your Azure swarm.

Business as Usual with Your New Azure Swarm

Once you’ve got your SSH tunnel created and your Docker host variable exported, you can interface with your Azure swarm as if it is running on your own machine. Let’s take an example straight from Azure’s own documentation and spin up a new Docker container on our swarm:

docker run -d -p 80:80 yeasy/simple-web

As with any other Docker host, this command spins up a container from the yeasy/simple-web Docker image. Once the container loads, you can see it in action on the DNS name shown in your Swarm master profile. A Simple Web Docker Image. That’s it! Managing your containers in the Azure Container Service is an incredibly simple and straightforward process. By allowing you to interface with the swarm using your standard local Docker commands, the learning curve is very low, which speeds up your workflow and reduces the growing pains typically associated with learning a new technology.

About the Author

Zachary Flower (@zachflower) is a freelance web developer, writer, and polymath. He has an eye for simplicity and usability, and strives to build products with both the end user and business goals in mind. From building projects for the NSA to creating features for companies like Name.com and Buffer, Zach has always taken a strong stand against needlessly reinventing the wheel, often advocating for the use of well-established third-party and open source services and solutions to improve the efficiency and reliability of a development project.

Blog

Building a World-Class Cloud Partner Ecosystem

Wow, what a week last week at AWS re:Invent. I spent the weekend reflecting on (and recovering from) the progress we have made in the market and how our great partners have impacted Sumo Logic’s success. Sumo Logic was born in the cloud and designed to help customers build, run and secure mission-critical cloud applications. If you haven’t already, I encourage you to read our “State of the Modern Applications” report. We found it incredibly enlightening as we developed it, as have many others. The report highlights how wildly different the technology choices are for building these modern applications in the cloud versus the traditional technologies used for on-premises application development. This rise and increased adoption of cloud-native applications has heavily influenced Sumo Logic’s partner ecosystem. We are very proud and honored to be partnered with the leading technologies used to build today’s modern applications. We share a common vision with our partners: to help customers adopt the cloud more quickly and more safely. So, together with MongoDB, Fastly, Pivotal, NGINX, CrowdStrike, Okta and Evident.io, we decided to THROW A PARTY to celebrate the success we shared in 2016 and to kick off 2017 with a bang! We had well over 300 customers, partners, and people wanting to learn more at the event. It was a fantastic time, and I want to thank our partners for sponsoring and everyone who attended. As I look to 2017, I couldn’t be more excited to be at Sumo Logic, with tremendous opportunity ahead for us and our incredible partners. We are just getting started together on this journey, and I am supremely optimistic about our future together. Wishing you all a very happy holiday season and New Year!

AWS

December 8, 2016

Blog

Sumo Logic + New Relic = Comprehensive Application and Infrastructure Operations

At AWS re:Invent 2016 last week, New Relic and Sumo Logic announced a partnership that brings together two leaders in application operations. The integrated solution combines machine data analytics with application and infrastructure performance data, enabling joint customers to use the New Relic Digital Intelligence Platform to visualize the impact of software performance on customer experience and business outcomes. Why is this important, and what does this mean for the industry? New Relic is the leader in Application Performance Management and provides detailed performance metrics across an enterprise’s digital business: customer experience, application performance, and infrastructure. Thousands of enterprises use the New Relic solution to proactively monitor and manage application performance alerts (“latency of application is spiking,” “application is unavailable,” etc.). However, to get to the root cause of issues, IT teams also need complete visibility into the logs associated with the application, transactions and the supporting infrastructure. And that is where the integrated Sumo Logic–New Relic solution comes in. Sumo Logic turns machine data, structured and unstructured, into operational insights, helping organizations more effectively build, run and secure modern applications and infrastructure. The Sumo Logic service enables customers to ingest and analyze logs with patented machine learning algorithms. When integrated with New Relic’s full-stack visibility, it provides joint customers with: a complete picture of what’s happening with users, applications, and instances; proactive identification of application and infrastructure performance issues; and faster troubleshooting and root cause analysis leveraging APM and log data. However, it’s not just the comprehensive and integrated APM and log analytics views that distinguish this partnership. What makes this partnership unique is the “operating model” of the two solutions.
Like Sumo Logic, New Relic also focuses on modern applications and infrastructures that are developed using agile/DevOps methodologies, use microservices-style distributed architectures and typically run in highly elastic and scalable cloud environments. And like Sumo Logic, New Relic is also a SaaS service, providing fast time-to-value and low total cost of ownership for its customers. (Incidentally, both Sumo Logic and New Relic are in the newly announced AWS SaaS subscription marketplace.) At Sumo Logic, we are thrilled to partner with New Relic. We believe that application operations is undergoing a fundamental transformation, and companies like Sumo Logic and New Relic are bringing a new vision to this market. Stay tuned to see more integration details emerge from this partnership.

December 7, 2016

Blog

Customer Blog: OpenX

Blog

Ground Zero for a $30 Trillion Disruption

Yes, that is a “T” as in trillion. In the last 30 days, I had the pleasure of attending two events that reinforced in my mind that Sumo Logic is at the center of the largest market disruption I will most likely experience in my lifetime. First, I traveled with our Sumo Logic CTO, Christian Beedgen, to Lisbon, Portugal, for his presentation at Europe’s largest technology conference, Web Summit 2016. After watching at least 100 thought leaders representing 20+ different industries from around the world, the comment from Saul Klein of LocalGlobe and Index Ventures about the expected reallocation of $30 trillion in market value from existing Fortune 2000 companies to new disruptors and yet-to-be-born companies beautifully summed up my key takeaway from the show. However, this massive transfer of wealth is not just dependent on each country represented (more than 115!) finding and funding the next Google. Klein passionately appealed to global policy makers and influencers to support the 120 million “zebras” on Facebook interested in entrepreneurship in order to have more control over their lives. Self-employed entrepreneurs already represent five percent of the global population, and that share is growing. So, imagine the synchronicity I felt when I found myself a few weeks later at what I consider to be the mecca of the zebra wave: AWS re:Invent, Las Vegas. The energy, activity and dialogue of 32,000+ attendees (or “builders,” in the parlance of AWS CTO Werner Vogels) across a myriad of market categories sharing and demonstrating their cloud-based products, solutions and services was not only exciting, but also gave me a sense of hope. Hope in that, as Gary Vee said at Web Summit, now with the ubiquity of mobile and the accessibility of cloud, “if you have the imagination and willingness, no one can stop you. No one.” This hope is not just in transformative technologies but also in the very way people come together to create.
It’s collaboration that goes beyond the sake of inclusivity in and of itself. The market transition to a sizable, global entrepreneurial workforce will create an ocean of experts who have the opportunity to come together in a much more harmonious way to solve similar problems, bringing solutions to market that benefit people and profits. For businesses adopting this philosophy, silos will dissipate in favor of collaborative structures, processes and decision-making. We see this playing out already in digital businesses that have adopted DevOps as their innovation strategy. DevOps practices scuttle linear, waterfall software development approaches in favor of agile and continuous delivery practices resembling “loops.” Silos fade away, and a cross-expertise (as opposed to “cross-function”) of development, testing, deployment, operations, security and line-of-business professionals comes together to create, monitor, troubleshoot, optimize and then create again — a continuous loop that drives continuous innovation. And these loops won’t be cookie-cutter — they will be unique to the ideas, culture and opportunities of each organization, thereby creating new sources of business competitive advantage, and new sources of hope for unimagined opportunities. We see examples of this today in organizations such as Google, Facebook, Airbnb, Uber, Spotify, Snapchat, Amazon and Netflix, to name a few. So, every company faces a fundamental question — to be part of the policy of continuous innovation (hope), or to maintain a policy of entrenchment to preserve the status quo. In my mind, no group will survive a policy of entrenchment or isolation. A company might keep going for a while, but ultimately it won’t survive. The longer it waits, the more it delays the inevitable. That’s why Sumo Logic is passionate about helping companies make this transition.
Our machine data analytics platform provides the visibility and confidence companies desire to transition to cloud and modern applications. We think of it as “continuous intelligence” to feed the loop of continuous innovation decision-making. And the value is applicable across the organization. For example, as a 20-year marketing and communication veteran, I’m excited to finally be in the position of being a consumer of a B2B technology that’s relevant to my expertise. Our cloud-native, multi-tenant platform service enables us to leverage customer metadata to better understand their usage patterns of our own product, so we can better support and market to them, in addition to optimizing our services. And that reallocated $30T I mentioned earlier? Well, wrap your head around this: one expected outcome will be that 75% of the S&P 500 will be replaced in the next 10 years, at the current S&P churn rate. (Source: Innosight, Richard N. Foster, Standard & Poor’s). So, there’s a sense of urgency in the air, which may explain the astonishing growth rates of Web Summit (from a couple of hundred in Dublin 2010 to more than 50,000 in six years), and AWS re:Invent (5,500 attendees in 2012 to more than 32,000 this year). And that gives me hope too, because it means people are getting it. So, when I look back ten years from now at Web Summit and AWS re:Invent 2016, I’m proud to know that I was there at ground zero for a $30T market disruption. And my forecast for the next ten years? It will be a wild ride for sure, but those who go ‘all-in’ on cloud computing, focus on new ideas and deliver value continuously at speed are likely to come out on top. And at a time when much in the world seems unknown, the future certainly seems bright to me.

Blog

Designing a Data Analytics Strategy for Modern Apps

Yesterday at AWS re:Invent 2016, Sumo Logic Co-Founder and CTO Christian Beedgen presented his vision for machine data analytics in a world where modern apps are disrupting virtually every vertical market in business. Every business is a software business, Marc Andreessen wrote more than five years ago. Today, driven by customer demand, the need to differentiate and the push for agility, digital transformation initiatives are disrupting every industry. “We are still at the very beginning of this wave of Digital Transformation,” Christian said. “By 2020 half of all businesses will have figured out digitally enhanced products and services.” The result is that modern apps are being architected differently than they were just three years ago. Cloud applications are being built on microservices by DevOps teams that automate to deliver new functionality faster. “It used to be that you could take the architecture and put it on a piece of paper with a couple of boxes and a couple of arrows. Our application architecture was really clean.” But with this speed and agility comes complexity, and the need for visibility has become paramount. “Today our applications look like spaghetti. Building microservices, wiring them up, integrating them so they can work with something else, foundational services, SQL databases, NoSQL databases…” You need to be able to see what’s going on, because you can’t fix what you cannot see. Modern apps require Continuous Intelligence to provide insights, continuously and in real time, across the entire application lifecycle.

Designing Your Data Analytics Strategy

Ben Newton, Sumo Logic’s Principal Product Manager of the Metrics team, took the stage to look at the various types of data and what you can do with them. Designing a data analytics strategy begins with understanding the types of machine data produced, then focusing on the activities that data supports.
The primary activities are monitoring, where you detect and notify (or alert), and troubleshooting, where you identify, diagnose, restore and resolve. “What we often find is that users can use that same data to do what we call App Intelligence – the same logs and metrics that allow you to figure out something is broken also tell you what your users are doing. If you know what users are doing, you can make life better for them, because that’s what really matters.” So who really cares about this data? When it comes to monitoring, where the focus is on user-visible functionality, it’s your DevOps and traditional IT Ops teams. Engineering and development also are responsible for monitoring their code. In troubleshooting apps, where the focus is on end-to-end visibility, customer success and technical support teams also become stakeholders. For app intelligence, where the focus is on user activity and visibility, everyone is a stakeholder, including sales, marketing, and product management. “Once you have all of this data, all of these people are going to come knocking on your door,” said Ben. Once you understand the data types you have, where they sit within your stack and the use cases, you can begin to use data to solve real problems. In defining what to monitor and measure, Ben highlighted:

Monitor what’s important to your business and your users.
Measure and monitor user-visible metrics.
Build fewer, higher-impact, real-time monitors.

“Once you get to the troubleshooting side, it gets back to you can’t fix what you can’t measure.” Ben also said:

You can’t improve what you can’t measure.
You need both activity metrics and detailed logs.
Up-to-date data drives better data-driven decisions.
You need data from all parts of your stack.

So what types of data will you be looking at? Ben broke it down into the following categories:

Infrastructure: Rollups vs. detailed. What resolution makes sense? Is real-time necessary?
Platform: Rollups vs. detailed. Coverage of all components. Detailed logs for investigations. Architecture in the metadata.
Custom: How is your service measured? What frustrates users? How does the business measure itself?

“Everything you have produces data. It’s important to ensure you have all of the components covered.” Once you have all of your data, it’s important to think about the metadata. Systems are complex, and the way you make sense of them is through your metadata. You use metadata to describe or tag your data. “For the customer, this is the code you wrote yourself. You are the only people that can figure out how to monitor that. So one of the things you have to think about is the metadata.”

Cloud Cruiser – A Case Study

Cloud Cruiser’s Lead DevOps Engineer, Ben Abrams, took the stage to show how the company collects data and to provide some tips on tagging it with metadata. Cloud Cruiser is a SaaS app that enables you to easily collect, meter, and understand your cloud spend in AWS, Azure, and GCP. Cloud Cruiser’s customers are large enterprises and mid-market players, globally distributed across all verticals, and they manage hundreds of millions in cloud spend. Cloud Cruiser had been using an Elastic (Elasticsearch, Logstash, and Kibana) stack for their log management solution. They discovered that managing their own logging solution was costly and burdensome. Ben cited the following reasons for switching:

Operational burden was a distraction to the core business.
Improved security.
Ability to scale, plus cost.

Cloud Cruiser runs on AWS (300-500 instances) and utilizes microservices written in Java using the Dropwizard framework. Their front-end web app runs on Tomcat and uses AngularJS. Figure 1 shows the breadth of the technology stack. In evaluating a replacement solution, Ben said, “We were spending too much time on our ELK stack.” Sumo Logic’s Unified Logs and Metrics (ULM) was also a distinguishing factor.
The inclusion of metrics meant that they didn’t have to employ yet another tool that would likewise have to be managed. “Logs are what you look at when something goes wrong. But metrics are really cool.” Ben summarized the value and benefits they achieved this way:

Logs: Reduced operational burden. Reduced cost. Increased confidence in log integrity. Able to reduce the number of people needing VPN access. Alerting based on searches did not need ops handholding.
Metrics: Increased visibility into system and application health. Used in an ongoing effort with application and infrastructure changes that substantially reduced their monthly AWS bill.

Ben then moved into a hands-on session, showing how they automate the configuration and installation of Sumo Logic collectors, and how they tag their data using source categories. Cloud Cruiser currently collects data from the following sources:

Chef: automation of config and collector install.
Application Graphite metrics from Dropwizard.
Other Graphite metrics forwarded by Sensu to Sumo Logic.

“When I search for something, I want to know what environment is it, what type of log is it, and which server role did it come from.” One of their decisions was to differentiate log data from metrics data, as shown below. Using this schema allows them to search logs and metrics by environment, type of log data and corresponding Chef role. Ben walked through the Chef cookbook they used for deploying with Chef and shared how they automate the configuration and installation of Sumo Logic collectors. For those interested, I’ll follow up on this in the DevOps Blog. A key point from Ben, though, was “Don’t log secrets.” The access ID and key should be defined elsewhere, out of scope, and stored in an encrypted data bag. Ben also walked through the searches they used to construct the following dashboard.
Through this one dashboard, Cloud Cruiser can utilize both metrics and log data to get an overview of the health of their production deployment.

Key Takeaways

Designing your data analytics strategy is highly dependent on your architecture. Ultimately, it’s no longer just about troubleshooting issues in production environments; it’s also about understanding the experience you provide to your users. Data streams in real time from the application, operating environment and network layers, producing an ever-increasing volume of data every day. Log analytics provides the forensic data you need, and time-series-based metrics give you insights into the real-time changes taking place under the hood. To understand both the health of your deployment and the behavior and experience of your customers, you need to gather machine data from all of its sources, then apply both logs and metrics to give teams from engineering to marketing the insights they need. Download the slides and view the entire presentation below:

Blog

Evident.io: Visualize, Analyze and Report on Security Data From AWS

Evident.io and Sumo Logic team up to provide seamlessly integrated visibility into compliance monitoring and risk attribution. Analyzing and visualizing all your security data in one place can be a tricky undertaking. For any SOC, DevSecOps or DevOps team in heterogeneous environments, the number of tools in place to gain visibility into and monitor compliance can be daunting. The good news is that Evident.io and Sumo Logic have teamed up to bring you a simple-to-implement yet effective integration that allows you to perform additional analytics and visualization of your Evident Security Platform data in the Sumo Logic analytics platform. Evident.io ESP is an agentless, cloud-native platform focused on comprehensive, continuous security assessment of the control plane for AWS cloud infrastructure services. ESP can monitor all AWS services available through the API, ensuring their configurations are in line with AWS best practices for security as well as your organization’s specific compliance requirements. Sumo Logic is a leading SaaS-native machine data analytics service for log management and time-series metrics. Sumo Logic allows you to aggregate, perform statistical analytics on, report on trends in, visualize and alert on all your operational, performance and security-related event log data in one place, from just about any data source.

Why integrate with Sumo Logic?

Both of these platforms are architected for the cloud from the ground up and have a solid DevOps pedigree. This integration allows you to aggregate all the data generated by your AWS cloud infrastructure in the same place as your application-level security and performance event data, which allows you to perform attribution on a number of levels. The Evident.io alert data is rich with configuration state data about your security posture with regard to AWS best practices for security and the CIS Benchmarks for AWS.
As customers adopt CI/CD concepts, being able to quickly visualize, alert on and remediate, in near real time, any vulnerabilities introduced by misconfiguration is critical. Evident.io and Sumo Logic combined can help you do this better and faster. And, best of all, you can get started with Evident.io and Sumo Logic in a matter of minutes.

The Sumo Logic App for Evident.io ESP

The Sumo Logic App for Evident.io ESP enables a user to easily and quickly report on some key metrics from their AWS cloud infrastructure, such as:

Trend analysis of alerts over time (track improving or deteriorating posture over time)
Time to resolve alerts (for SLAs, by tracking the start and end of an alert in one report)
Summary of unresolved alerts/risks
Number of risks found by security signatures over time

Below are some screenshots from the Sumo Logic App for Evident.io ESP. Figure 1 is an overview of the types and severity of risks, alert status, and how long before a risk is resolved and marked as ended on the Evident.io side. This can be an important metric when managing to SLAs. (Fig. 1) Figure 2 provides a detailed view of the risks identified by Evident.io ESP within the configured time range for each of the dashboard panels. The panels present views into:

Which Evident.io ESP signatures triggered the risks
A breakdown of: risks identified by AWS region, risks by AWS account, number of total identified risks, and number of newly identified risks

(Fig. 2) The chart in Fig. 3 below is an interesting one that shows identified risks clearly trending down over 14 days. This indicates that the teams are remediating issues identified in the Evident.io ESP alerts, and you can clearly see an improvement in the security posture of this very large AWS environment, which has thousands of instances. Note: there are almost no high-severity risks in this environment. (Fig. 3)

Is my data secure?
These two platforms do an awesome job of securing your data both in flight and at rest, using TLS 1.2 encryption for in-flight data and customer-specific 256-bit AES encryption keys for at-rest data. You can be confident that this data is securely transported from the Evident Security Platform to Sumo Logic and stored in a secure fashion.

How can I gain access?

This integration relies on the use of AWS SNS (Simple Notification Service) and a Sumo Logic native HTTPS collector. If you are both an Evident.io and Sumo Logic customer, you can enable and start to benefit from the integration using the directions here: http://help.sumologic.com/Special:Search?qid=&fpid=230&fpth=&path=&search=evident.io or http://docs.evident.io/#sumo. Note that you will need access to both Evident.io and Sumo Logic instances. Security and compliance monitoring are no longer a bottleneck in your agile environment. You can start visualizing the data from Evident Security Platform (ESP) in Sumo Logic in a matter of minutes. This blog post was written by Hermann Hesse, Senior Solutions Architect at Evident.io. He can be reached at https://www.linkedin.com/in/hermann-hesse-a040281

AWS

November 30, 2016

Blog

CDN with AWS CloudFront - Tutorial

Blog

AWS – The Biggest Supercomputer in the World

AWS is one of the greatest disruptive forces in the entire enterprise technology market. Who would have thought when it launched in 2006 that it would kick off perhaps the most transformative shift in the history of the $300B data center industry? Over 25,000 people (or 0.0003% of the world’s population) are descending on Vegas this week to learn more about AWS, the biggest supercomputer in the world. As we get ready to eat, drink, network and learn, I wanted to provide some responses to inquiries I often get from prospects, reporters and folks I meet at various conferences around the country.

What advice would you pass on to anyone deciding to use AWS for public cloud storage?

Understand the IaaS provider’s shared security model. In Amazon’s case, AWS is responsible for the infrastructure. The customer is responsible for the security of everything that runs on that infrastructure: the applications, the workloads and the data. Make sure any additional services you use on top of that pursue their own security certifications and attestations to protect data at rest and in motion. This will allay fears and give people comfort in sending data through a SaaS-based service. We find that organizations are making different decisions based on the trust level they have with their partners, and we at Sumo Logic take this very seriously, investing millions to achieve and maintain these competitive differentiators on an ongoing basis. Too many people try to live vicariously through the certifications AWS holds and pass this on as adequate. Understand the benefits you are hoping to achieve before you start (e.g., better pricing / reduced cost; easier budget approvals (CAPEX vs. OPEX); increased business agility; increased flexibility and choice in the programming models, OS, DB and architectures that make sense for the business; increased security; increased workload scalability / elasticity, etc.).

How can we maximize AWS’s value?
Crawl, walk, run: mastering AWS is a learning curve that takes time. Adopt increasing levels of services as your teams get up to speed and understand how to leverage APIs and automate everything through code. Compute as code is now a reality. Understand the pain points you are trying to address, since these will dictate your approach (pricing / cost / budget; internal politics; control of data locality; sovereignty; security; compliance, etc.). Turn on logging within AWS. More specifically, activate Amazon CloudWatch to log all your systems, applications and services, and activate AWS CloudTrail to log all API actions. This will provide visibility into all user actions on AWS. The lack of visibility into cloud operations and controls is the largest security issue we see.

What cautions might there be in terms of how to end up paying more than one should, or not really getting full value out of this type of storage?

Understand that not all data is created equal in terms of importance, frequency of access, life expectancy, retention requirements, and search performance. Compare operational data (high importance, high frequency of access, short life expectancy, high search performance requirements) to audit data (medium importance, lower frequency of access, longer life expectancy and data retention requirements, low performance requirements). Align your storage choices to the value and urgency of the data that you are logging (S3, S3 Infrequent Access, Glacier, EBS, etc.). Look for solutions and tools that are cloud native, so you can avoid unnecessary data exfiltration costs.

Ten years ago, no one was virtualizing mission-critical workloads because of security and compliance concerns, but we ended up there anyway. It is exactly the same with cloud. And in this new world, speed and time to market are everything. Organizations are looking to be more flexible, more agile, and to capitalize on business opportunities, and how you approach security is different.
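The advice above about aligning storage tiers to the value of your data can be sketched as a simple policy function. This is only an illustrative sketch, not official AWS guidance: the tier names mirror real S3 storage classes, but the access and retention thresholds are hypothetical numbers chosen for the example.

```python
def choose_storage_tier(access_per_month: int, retention_days: int) -> str:
    """Pick an S3 storage tier from (hypothetical) access/retention thresholds.

    Hot operational data -> S3 Standard; colder audit data -> Infrequent
    Access; rarely-read, long-retention archives -> Glacier.
    """
    if access_per_month >= 10:
        return "S3 Standard"          # high-frequency operational data
    if retention_days <= 365:
        return "S3 Standard-IA"       # audit data, occasional reads
    return "S3 Glacier"               # rarely read, long retention

# Operational logs queried daily stay hot; 7-year audit archives go cold.
print(choose_storage_tier(30, 90))     # S3 Standard
print(choose_storage_tier(1, 2555))    # S3 Glacier
```

In practice the decision would also weigh retrieval cost and latency, but even a crude mapping like this keeps audit data off your most expensive tier.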
And to support the rapid pace of delivery of these digital initiatives – weekly, even daily – these companies are leveraging modern, advanced IT infrastructures like AWS and Sumo Logic. In this new world, we at Sumo Logic have a tremendous opportunity to help operations and security professionals get the visibility they need as those workloads are moved out to the cloud. We help them become cloud enablers, to help drive the business forward, not being naysayers. Visibility is everything! Come stop by our booth – #604 – and say hi!

Blog

Advanced Security Analytics for AWS

Every company – if it is going to remain relevant – is going through some form of digital transformation today, and software is at the heart of this transformation. According to a report by the Center for Digital Business Transformation, digital disruption will displace approximately 40% of incumbent companies within the next five years. Don’t believe it? According to Forrester Research, between 1973 and 1983, 35% of the top 20 F1000 companies were new. Jump forward 20 years, and this number increases to 70%. According to predictions from IDC’s recent FutureScape for Digital Transformation, two-thirds of Global 2000 companies will have digital transformation at the center of their corporate strategy by next year, and by 2020, 50% of the Global 2000 will see the majority of their business depend on their ability to create digitally enhanced products, services, and experiences.

So what does this all mean? Keeping pace with the evolving digital marketplace requires not only increased innovation, but also updated systems, tools, and teams. Accenture and Forrester Research reported in their Digital Transformation in the Age of the Customer study that only 26% of organizations considered themselves fully operationally ready to execute against their digital strategies. In order to deliver on the promise of digital transformation, organizations must also modernize their infrastructure to support the increased speed, scale, and change that comes with it. We see three characteristics that define these modern applications and digital initiatives: They follow a DevOps or DevSecOps culture, where the traditionally siloed walls between the Dev, Ops and Security teams are becoming blurred, or going away completely. This enables speed, flexibility and agility.
They are generally running on modern infrastructure platforms like AWS (see AWS Modern Apps Report), leveraging APIs and compute as code (see AWS – The Largest Supercomputer in the World). The way you approach security needs to change. You need deep visibility and native integrations across the AWS services in use; you need to understand your risks and security vulnerabilities; and you need to connect the dots between the services used and understand what users are doing, where they are coming from, what they are changing, how those changes relate to one another, and how all of this impacts network flows and security risk. It is also important to be able to match information contained in your AWS log data (IP addresses, ports, user IDs, etc.) from services like CloudTrail and VPC Flow Logs with known Indicators of Compromise (IOCs) out in the wild from premium threat intelligence providers like CrowdStrike. Pulling global threat intelligence into Sumo Logic’s Next-Gen Cloud Security Analytics for AWS accomplishes the following: it increases the velocity and accuracy of threat detection; it adds context to log data, helping to identify and visualize malicious IP addresses, domain names, ports, email addresses, URLs, and more; and it improves your security and operational posture by accelerating the time to identify and resolve security threats (IOCs). Come stop by our booth – #604 – for a demo and say hi!
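The IOC matching described above boils down to intersecting fields extracted from log records with a threat-intelligence feed. Here is a minimal sketch of the idea; the log format and the IOC list are invented for illustration, and a real pipeline would match far more than IP addresses.

```python
# Match IP addresses found in log records against a threat-intel IOC set.
import re

IOC_IPS = {"203.0.113.7", "198.51.100.23"}  # hypothetical threat-intel feed

IP_RE = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")

def flag_iocs(log_lines):
    """Return (line, matched_ip) pairs for lines containing a known-bad IP."""
    hits = []
    for line in log_lines:
        for ip in IP_RE.findall(line):
            if ip in IOC_IPS:
                hits.append((line, ip))
    return hits

logs = [
    "sourceIPAddress=203.0.113.7 eventName=ConsoleLogin",
    "sourceIPAddress=192.0.2.10 eventName=PutObject",
]
print(flag_iocs(logs))  # flags the first line only
```

The value of a managed service here is that the IOC set is curated and refreshed continuously, rather than being a static set baked into your code.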

AWS

November 29, 2016

Blog

Getting Started with AWS EC2 Container Service (ECS)

Amazon Web Services is the leading Infrastructure as a Service (IaaS) provider. They have over 50 groups of services that run the gamut from mobile services to support for the Internet of Things (IoT). They are best known for their EC2 and S3 services, and by leveraging this broad base, they are able to layer additional, more complex services on top. The EC2 Container Service (ECS) leverages EC2 compute instances to provide a quick way to set up and scale a container cluster.

The Basics

The EC2 instances backing ECS use all of the supporting features you are familiar with in AWS. When you configure your initial cluster (or any subsequent cluster, for that matter), ECS configures EC2 instances as cluster hosts and configures security groups, VPCs, subnets, routes, and gateways to support them. The cluster also comes online with a suite of basic metrics on CPU and memory utilization. ECS uses the Docker engine as its container primitive. This is different from, and not to be confused with, any of Docker’s enterprise-geared cluster management offerings like Docker Datacenter. ECS is designed to replace the need for a separate container cluster manager to run these Docker containers in production. Another benefit of using ECS as an AWS-aware cluster manager is that if the cluster needs to increase the number of hosts, it can do so as needed. You can also pair scaling your hosts with scaling your containerized services by leveraging Service Auto Scaling and CloudWatch.

Heads up: here are a couple of items to be aware of when you are first getting started. There is currently no Windows support for the ECS CLI. (You will quickly run into a known bug if you try to follow the Getting Started tutorial from a PC.) Also, during the initial configuration of ECS, you will be prompted to create a build repository. This repository is hosted on S3 and is used as a source for images used to launch services and tasks.
Clusters and Instances

EC2 Container Service, as the name describes, runs on EC2 instances. When starting for the first time, you will use a wizard to build your initial cluster. From there, you can modify or build additional clusters on the Clusters tab. These EC2 instances use the AWS-developed open-source ECS agent to run the containers. Multiple clusters can be accessed through the Management Console or by using different AWS profiles on the command line.

Services and Tasks

The functions of ECS you will engage with the most are services and task definitions. A task definition describes the details of a particular group of containers: what data volumes they should attach at run time, what interfaces should be exposed and how they are addressed, and how they should be operated together. A service is the running state of a task definition. As part of the service, a minimum healthy percentage and a maximum percentage can be set, and the service will be maintained to those specifications: additional containers are launched if needed to keep the service above the minimum healthy count, and shut down if it exceeds the maximum. A desired number of task instances can also be set, duplicating the definition across that many copies; it defaults to one. The task definition can be revised (everything is version controlled), and a new deployment will be spun up to match the newly modified definition; ECS ensures the new deployment is accessible, and then terminates the old one. A recent addition to ECS allows an Elastic Load Balancer (ELB) to be placed in front of a service and dynamic ports to be assigned. This update lets multiple instances of the same task run on the same host, and it also adds an initial framework to support service discovery. The AWS team blog post on the update can be found here.
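The minimum healthy percent / maximum percent behavior described above amounts to simple arithmetic on the desired task count. The sketch below is a toy model of that idea, not a reimplementation of the ECS scheduler; the rounding (lower bound up, upper bound down) reflects my reading of the AWS documentation and should be verified against it.

```python
import math

def deployment_bounds(desired: int, min_healthy_pct: int, max_pct: int):
    """Return (min_running, max_running) tasks allowed during a deployment."""
    lower = math.ceil(desired * min_healthy_pct / 100)   # rounds up
    upper = math.floor(desired * max_pct / 100)          # rounds down
    return lower, upper

# With 4 desired tasks, 50% minimum healthy and 200% maximum, the scheduler
# may run as few as 2 and as many as 8 tasks while rolling out a revision.
print(deployment_bounds(4, 50, 200))  # (2, 8)
```

The headroom between the two bounds is what lets a rolling deployment start new task revisions before the old ones are stopped.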
AWS is great for building and running instances for a cluster, and it is steadily accumulating the features needed to effectively manage a cluster running on those instances. If you are looking to run containers in production and already use AWS, ECS is definitely worth checking out.

About the Author

Over the last 10 years, Sara Jeanes has held numerous program management, engineering, and operations roles. She has led multiple team transformations and knows first-hand the pain of traditional waterfall development. She is a vocal advocate for DevOps, microservices, and the cloud as tools to create better products and services. You can follow Sara on Twitter @sarajeanes. Getting Started with AWS EC2 Container Service (ECS) is published by the Sumo Logic DevOps Community. If you’d like to learn more or contribute, visit devops.sumologic.com. Also, be sure to check out Sumo Logic Developers for free tools and code that will enable you to monitor and troubleshoot applications from code to production.

Blog

Starting Fresh in AWS

Many folks we speak to ask the question: “How do I get started in AWS?” The answer used to be simple. There was a single service for compute, one for storage, and a few other services in early trials. Fast forward 10+ years, and AWS now offers over 50 services. Taking your first steps can be daunting. What follows is my recommended approach if you’re starting fresh in the AWS Cloud and don’t have a lot of legacy applications and deployments weighing you down. If that isn’t you, check out the companion post to this one.

Do Less To Do More

Everything in AWS operates under a Shared Responsibility Model. The model simply states that for each of the areas required for day-to-day operations (physical, infrastructure, virtualization, operating system, application, and data), someone is responsible. That someone is either you (the user) or AWS (the service provider). (In the accompanying diagram, light grey areas are the responsibility of AWS; black areas are the user’s.) The workload shifts towards the service provider as you move away from infrastructure services (like Amazon EC2) towards abstract services (like AWS Lambda). As a user, you want AWS to do more of the work. This should direct your service choice as you start to build in the AWS Cloud: pick more and more of the services that fall under the SaaS or abstract category (abstract being the more accurate term).

Computation

If you need to run your own code as part of your application, you should be making your choice based on doing less work. This means starting with AWS Lambda, a service that runs your functions directly, without your worrying about the underlying frameworks or operating system. If Lambda doesn’t meet your needs, try using a Docker container running on the Amazon EC2 Container Service (ECS). The advantage of this service is that it configures the underlying EC2 instances (the OS, Docker host, scheduling, etc.) and lets you worry only about the application container.
If ECS can’t meet your needs, see if you’re a fit for AWS Elastic Beanstalk. This is a service that takes care of provisioning, capacity management, and application health for you (in other words, you do less work). All of this runs on top of Amazon EC2 (as do Lambda and ECS, for that matter). If all else fails, it’s time to deploy your own instances directly in EC2. The reason you should avoid this as much as possible is the simple fact that you’re responsible for the management of the operating system, any applications you install, and — as always — your data. This means you need to keep on top of patching your systems, hardening them, and configuring them to suit your needs. The best approach here is to automate as much of this operational work as possible (see our theme of “do less” repeating?). AWS offers a number of services and features to help in this area as well (start with EC2 AMIs, AWS CodeDeploy, AWS CodePipeline, and AWS OpsWorks).

Data Storage

When it comes to storing your data, the same principle applies: do less. Try to store your data in services like Amazon DynamoDB, because the entire underlying infrastructure is abstracted away for you. You get to focus purely on your data. If you just need to store simple file objects, Amazon S3 is the place to be. In concert with Amazon Glacier (long-term storage), you get the simplest version of storage possible. Just add an object (key) to a bucket and you’re all set. Under the covers, AWS manages all of the moving parts in order to give you 11 9’s (99.999999999%) of durability, i.e. an expected average annual loss of roughly 0.000000001% of stored objects. That’s a level of quality you simply cannot get on your own. If you need more control or custom configurations, other services like the Amazon Elastic File System or EBS volumes in EC2 are available. Each of these technologies comes with more operational overhead. That’s the price you pay for customization.
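Returning to the compute options above, the Lambda end of the spectrum can be illustrated with a minimal handler. The `handler(event, context)` signature matches the AWS Lambda Python runtime convention, but the event shape here is invented for illustration, and the function can be exercised locally without any AWS account.

```python
# Minimal AWS-Lambda-style handler: the OS, scaling, and patching are all
# AWS's responsibility; you supply only this function.
import json

def handler(event, context=None):
    """Echo a greeting for the caller named in the event."""
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }

# Locally, we can invoke it the way the Lambda runtime would:
print(handler({"name": "AWS"}))
```

Everything outside this function (runtime, fleet, failover) is exactly the work the Shared Responsibility Model says you want to push to the provider.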
Too Many Services

Due to the sheer number of services that AWS provides, it’s hard to get a handle on where to start. Now that you know your guiding principle, it might be worth looking at the AWS Application Architecture Center. This section of the AWS site contains a number of simple reference architectures that provide solutions to common problems. Designs for web application hosting, batch processing, media sharing, and others are all available. These designs give you an idea of how these design patterns are applied in AWS and the services you’ll need to become familiar with. It’s a simple way to find out which services you should start learning first. Pick a design that meets your needs and start learning the services that the design is composed of.

Keep Learning

AWS does a great job of providing a lot of information to help get you up to speed. Their “Getting Started with AWS” page has a few sample projects that you can try under the free tier. Once you start to get your footing, the whitepaper library is a great way to dive deeper on certain topics. In addition, all of the talks from previous Summits (one- to two-day free events) and AWS re:Invent (the major user conference) are available for viewing on the AWS YouTube channel. There are days and days of content for you to watch. Try to start with the most recent material, as a lot of the functionality has changed over the years, but basic, 101-type talks are usually still accurate.

Dive In

There is so much to learn about AWS that it can be paralyzing. The best advice I can give is to simply dive in. Find a simple problem that you need to solve, do some research, and try it out. There is no better way to learn than doing. Which leads me to my last point: the community around AWS is fantastic. AWS hosts a set of very active forums where you can post a question and usually get an answer very quickly. On top of that, the usual social outlets (Twitter, blogs, etc.)
are a great way to engage with others in the community and to find answers to your pressing questions. While this post has provided a glimpse of where to start, be sure to read the official “Getting Started” resources provided by AWS. There’s also a great community of training providers (+ the official AWS training) to help get you up and running. Good luck and happy building! This blog post was contributed by Mark Nunnikhoven, Vice President, Cloud Research at Trend Micro. Mark can be reached at https://ca.linkedin.com/in/marknca.

AWS

November 21, 2016

Blog

Getting Started Under Legacy Constraints in AWS

Getting started in AWS used to be simple. There was a single service for compute, one for storage, and a few other services in early trials. Fast forward 10+ years, and AWS now offers over 50 services. Taking your first steps can be daunting. What follows is my recommended approach if you already have a moderate or large set of existing applications and deployments that you have to deal with and want to migrate to the AWS Cloud. If you’re starting fresh in the AWS Cloud, check out the companion post to this one.

Do Less To Do More

Everything in AWS operates under a Shared Responsibility Model. The model simply states that for each of the areas required for day-to-day operations (physical, infrastructure, virtualization, operating system, application, and data), someone is responsible. That someone is either you (the user) or AWS (the service provider). (In the accompanying diagram, light grey areas are the responsibility of AWS; black areas are the user’s.) The workload shifts towards the service provider as you move away from infrastructure services (like Amazon EC2) towards abstract services (like AWS Lambda). As a user, you want AWS to do more of the work. This should direct your service choice as you start to build in the AWS Cloud. Ideally, you want to pick more and more of the services that fall under the SaaS or abstract category (abstract being the more accurate term). But given your existing constraints, that probably isn’t possible. So you need to start where you can see some immediate value, keeping in mind that future projects should aim to be “cloud native.”

Start Up The Forklift

The simplest way to get started in AWS under legacy constraints is to forklift an existing application from your data centre into the AWS Cloud. For most applications, this means you’re going to configure a VPC, deploy a few EC2 instances, and set up an RDS instance (ideally as a Multi-AZ deployment).
To make sure you can expand this deployment, leverage a tool like AWS OpsWorks to automate the deployment of the application onto your EC2 instances. This will make it a lot easier to repeat your deployments and to manage your Amazon Machine Images (AMIs). Migrating your data is extremely simple now as well: use the AWS Database Migration Service to move the data and the database configuration into RDS.

Second Stage

Now that your application is up and running in the AWS Cloud, it’s time to start taking advantage of some of the key features of AWS. Start exploring the Amazon CloudWatch service to monitor the health of your application. You can set alarms to warn of network bandwidth constraints, high CPU usage, and storage space on your instances getting a little cramped. With monitoring in place, you can then adjust the application’s configuration to support auto scaling and to sit behind a load balancer (either the classic ELB or the new ALB). This is going to provide some much-needed resiliency to your application. It’s automated, so you’re going to start to realize some of the benefits of AWS and reduce the operational burden on your teams at the same time. These few simple steps have started your team down a sustainable path of building in AWS. Even though these features and services are just the tip of the iceberg, they’ve allowed you to accomplish some very real goals: namely, having a production application working well in the AWS Cloud! On top of that, auto scaling and CloudWatch are great tools to help show teams the value you get by leveraging AWS services.

Keep Going

With a win under your belt, it’s a lot easier to convince teams to build natively in AWS. Applications that are built from the ground up to take advantage of abstract services in AWS — like Amazon Redshift, Amazon SQS, Amazon SNS, AWS Lambda, and others — will let you do more for your users with less effort on your part.
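The alarm behavior described above can be sketched as a threshold check over recent datapoints. CloudWatch's real evaluation has many more knobs (periods, statistics, missing-data handling); the code below is only a toy model of the "N out of M datapoints breaching" idea, with made-up metric values.

```python
def alarm_state(datapoints, threshold, datapoints_to_alarm, evaluation_periods):
    """Toy CloudWatch-style alarm: ALARM if enough recent datapoints breach.

    Looks at the last `evaluation_periods` datapoints and fires when at
    least `datapoints_to_alarm` of them exceed `threshold`.
    """
    window = datapoints[-evaluation_periods:]
    breaches = sum(1 for v in window if v > threshold)
    return "ALARM" if breaches >= datapoints_to_alarm else "OK"

cpu = [35, 40, 85, 90, 88]  # CPU utilization %, newest last
print(alarm_state(cpu, threshold=80, datapoints_to_alarm=3, evaluation_periods=3))  # ALARM
print(alarm_state(cpu, threshold=95, datapoints_to_alarm=1, evaluation_periods=3))  # OK
```

An alarm like this is typically what triggers an auto scaling policy, which is how monitoring and scaling end up working together in the second stage.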
Teams with existing constraints usually have a lot of preconceived notions of how to build and deliver IT services. To truly get the most out of AWS, you have to adopt a new approach to building services. Use small wins and a lot of patience to help convince hesitant team members that this is the best way to move forward.

Too Many Services

Due to the sheer number of services that AWS provides, it’s hard to get a handle on where to start. Now that you know your guiding principle, it might be worth looking at the AWS Application Architecture Center. This section of the AWS site contains a number of simple reference architectures that provide solutions to common problems. Designs for web application hosting, batch processing, media sharing, and others are all available. These designs give you an idea of how these design patterns are applied in AWS and the services you’ll need to become familiar with. It’s a simple way to find out which services you should start learning first. Pick a design that meets your needs and start learning the services that the design is composed of.

Keep Learning

AWS does a great job of providing a lot of information to help get you up to speed. Their “Getting Started with AWS” page has a few sample projects that you can try under the free tier. Once you start to get your footing, the whitepaper library is a great way to dive deeper on certain topics. In addition, all of the talks from previous Summits (one- to two-day free events) and AWS re:Invent (the major user conference) are available for viewing on the AWS YouTube channel. There are days and days of content for you to watch. Try to start with the most recent material, as a lot of the functionality has changed over the years, but basic, 101-type talks are usually still accurate.

Dive In

There is so much to learn about AWS that it can be paralyzing. The best advice I can give is to simply dive in. Find a simple problem that you need to solve, do some research, and try it out.
There is no better way to learn than doing. Which leads me to my last point: the community around AWS is fantastic. AWS hosts a set of very active forums where you can post a question and usually get an answer very quickly. On top of that, the usual social outlets (Twitter, blogs, etc.) are a great way to engage with others in the community and to find answers to your pressing questions. While this post has provided a glimpse of where to start, be sure to read the official “Getting Started” resources provided by AWS. There’s also a great community of training providers (plus the official AWS training) to help get you up and running. Good luck and happy building! This blog post was contributed by Mark Nunnikhoven, Vice President, Cloud Research at Trend Micro. Mark can be reached at https://ca.linkedin.com/in/marknca.

AWS

November 21, 2016

Blog

5 Patterns for Better Microservices Architecture

Microservices have become mainstream in building modern architectures. But how do you actually develop an effective microservices architecture? This post explains how to build an optimal microservices environment by adhering to five principles: cultivate a solid foundation; begin with the API; ensure separation of concerns; approve production releases through testing; and automate deployment and everything else.

Principle 1: Great Microservices Architecture is Based on a Solid Foundation

No matter how great the architecture, if it isn’t based on a solid foundation, it won’t stand the test of time. Conway’s law states that “…organizations that design systems … are constrained to produce designs which are copies of the communication structures of these organizations…” Before you can architect and develop a successful microservices environment, it is important that your organization and corporate culture can nurture and sustain it. We’ll come back to this at the end, once we’ve looked at how we want to design our microservices.

Principle 2: The API is King

Unless you’re running a single-person development shop, you’ll want an agreed-upon contract for each service. Actually, even with a single developer, having a specific set of determined inputs and outputs for each microservice will save you a lot of headaches in the long run. Before the first line of code is typed, determine a strategy for developing and managing API documents. Once your strategy is in place, focus your efforts on developing and agreeing on an API for each microservice you want to develop. With an approved API for each microservice, you are ready to start development and begin reaping the benefits of your upfront investment.

Principle 3: Separation of Concerns

Each microservice needs to have and own a single function or purpose. You’ve probably heard of separation of concerns, and microservices are prime examples of the application of that principle.
Additionally, if your microservice is data-based, ensure that it owns that data and exists as the sole access point for it. As additional requirements come to light, it can be very tempting to add an endpoint to your service that kind of does the same thing (but only kind of). Avoid this at all costs. Keep your microservices focused and pure, and you’ll avoid the nightmare of trying to remember which service handled that one obscure piece of functionality.

Principle 4: Test-Driven Approval

Back in the old days, when you were supporting a large monolithic application, you’d schedule your release weeks or months in advance, including an approval meeting which may or may not have included a thumbs up/thumbs down vote, or fists of five to convey approval or confidence in the new release. With a microservices architecture, that changes. You’re going to have a significant number of much smaller applications, and if you follow the same release process, you’ll be spending a whole lot more time in meetings. Therefore, if you’re practicing test-driven development (TDD), writing comprehensive contract and integration tests as you develop each application, you’ll finish each service with a full test suite which you can automate as part of your build pipeline. Use these tests as the basis for your production deployment approval process, rather than relying on the approval meetings of yore.

Principle 5: Automate, Automate, Automate

As developers and engineers, we’re all about writing code which can automate and simplify the lives of others. Yet, too often, we find ourselves trapped in a world of manual deployments, manual testing, manual approval processes and manual change management. Automating these processes when it comes to microservices is less a convenience and more of a necessity, especially as your code base and repertoire of microservices expands and matures.
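A contract test of the kind described under test-driven approval can be as simple as asserting that a service's response matches the agreed-upon shape. Here is a minimal sketch in plain Python; the service, its response, and the contract fields are all invented for illustration.

```python
# Toy contract test: the agreed API says the user service returns
# {"id": int, "name": str}. The fake service stands in for the real
# microservice; a contract test validates shape, not business logic.
CONTRACT = {"id": int, "name": str}

def fake_user_service(user_id):
    """Hypothetical stand-in for a real user microservice."""
    return {"id": user_id, "name": "Ada"}

def meets_contract(response, contract):
    """Check that every contracted field is present with the right type."""
    return all(
        key in response and isinstance(response[key], typ)
        for key, typ in contract.items()
    )

assert meets_contract(fake_user_service(7), CONTRACT)
assert not meets_contract({"id": "7"}, CONTRACT)  # wrong type, missing field
print("contract tests passed")
```

Run in the build pipeline, checks like these become the automated approval gate that replaces the release meeting.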
Automate your build pipelines so that they trigger as soon as code is merged into the master branch. Automate your tests, static code analysis, security scans and any other process which you run your code through, and then on condition of all checks completing successfully, automate the deployment of the updated microservice into your environment. Automate it all! Once your microservice is live, ensure that you have configured a means by which the service can be automatica