Blog › Using Sumo

Sending CloudPassage Halo Event Logs to Sumo Logic

04.23.2013 | Posted by CloudPassage: Cloud Security

The below is a guest post from CloudPassage.

Automating your server security is about more than just one great tool – it’s also about linking together multiple tools to empower you with the information you need to make decisions.  For customers of CloudPassage and Sumo Logic, linking those tools to secure cloud servers is as easy as it is powerful.

The CloudPassage Halo Event Connector enables you to view security event logs from CloudPassage Halo in your Sumo Logic dashboard, including alerts from your configuration, file integrity, and software vulnerability scans. Through this connector, Halo delivers unprecedented visibility of your cloud servers via your log management console. You can track server events such as your server rebooting, shutting down, changing IP addresses, and much more.

The purpose of the Halo Event Connector is to retrieve event data from a CloudPassage Halo account and import it into Sumo Logic for indexing or processing. It is designed to execute repeatedly, keeping the Sumo Collector up-to-date with Halo events as time passes and new events occur.

The Halo Event Connector is free to use, and will work with any Halo subscription. To get started integrating Halo events into Sumo Logic, make sure you have set up accounts for CloudPassage Halo and Sumo Logic.

Then, generate an API key in your CloudPassage Halo portal. Once you have an API key, follow the steps provided in the Halo – Sumo Logic documentation, using the scripts provided on Github. The documentation walks you through the process of testing the Halo Event Connector script.  

Once you have tested the script, you will then add the output as a “Source” by selecting “Script” in Sumo Logic (see below).


 

When you have finished adding the new data source that integrates the Halo Event Connector with Sumo Logic (as detailed in the .pdf documentation), you will be taken back to the “Collectors” tab where the newly added Script source will be listed.

 

Once the Connector runs successfully and is importing event data into Sumo Logic, you will see Halo events such as the following appear in your Sumo Logic searches:

Try it out today – we are eager to hear your feedback! We hope that integrating these two tools makes your server security automation even more powerful.

Universal Collection of Machine Data

04.18.2013 | Posted by Sanjay Sarathy, CMO

Customers love flexibility, especially if that flexibility drives additional business value.  In that vein, today we announced an expansion of our log data collection capabilities with our hosted HTTPS and Amazon S3 collectors that eliminate the need for any local software installation.  There may be a variety of reasons why you don’t want or can’t have local collectors  - for example, not having access to the underlying infrastructure as often happens with Infrastructure-As-A-Service (IaaS) environments.  Or you simply don’t feeling like deploying any local software into your current infrastructure. Defining these hosted collectors is now baked into the set-up process, whether you’re using Sumo Logic Free or our Enterprise product.    

 

 

With these new capabilities, companies can now unify how they collect and analyze log data generated from private clouds, public clouds, and their on-premise infrastructure.  They can then apply our unique analytics capabilities like LogReduce to generate insight across every relevant application and operational tier.

With companies increasingly moving towards the Cloud to power different parts of their business, it’s imperative that they have the necessary means to troubleshoot and monitor their diverse infrastructure.  Sumo Logic provides that flexibility.

Harder, Better, Faster, Stronger – Machine Data Analytics and DevOps

03.28.2013 | Posted by Ben Newton, Corporate Sales Engineering Manager

Work It Harder, Make It Better

Do It Faster, Makes Us Stronger

More Than Ever Hour After

Our Work Is Never Over

     Daft Punk – “Harder, Better, Faster, Stronger”

 

When trying to explain the essence of DevOps to colleagues last week, I found myself unwittingly quoting the kings of electronica, the French duo Daft Punk (and Kanye West, who sampled the song in “Stronger”). So often, I find the “spirit” of DevOps being reduced to mere automation, the takeover of Ops by Dev (or vice versa), or other over-simplications. This is natural for any new, potentially over-hyped, trend. But how do we capture the DevOps “essence” – programmable architecture, agile development, and lean methodology – in a few words? It seems like the short lyrics really sum up the essence of the flexible, agile, constantly improving ideal of a DevOps “team”, and the continuous improvement aspects of lean and agile methodology.

So, what does this have to do with machine data analytics and Sumo Logic? Part of the DevOps revolution is a deep and wrenching re-evaluation of the state of IT Operations tools. As the pace of technological change and ferocity of competition keep increasing for any company daring to make money on the Internet (which is almost everybody at this point), the IT departments are facing a difficult problem. Do they try to adapt the process-heavy, tops-down approaches as exemplified by ITIL, or do they embrace a state of constant change that is DevOps?  In the DevOps model, the explosion of creativity that comes with unleashing your development and operations teams to innovate quickly overwhelms traditional, static tools. More fundamentally, the continuous improvement model of agile development and DevOps is only as good as the metrics used to measure success. So, the most successful DevOps teams are incredibly data hungry. And this is where machine data analytics, and Sumo Logic in particular, really comes into its own, and is fundamentally in tune with the DevOps approach.

 

1.  Let the data speak for itself

Unlike the management tools of the past, Sumo Logic makes only basic assumptions about the data being consumed (time stamped, text-based, etc.). The important patterns are determined by the data itself, and not by pre-judging what patterns are relevant, and which are not. This means that as the application rapidly changes, Sumo Logic can detect new patterns – both good and ill – that would escape the inflexible tools of the past.

2.  Continuous reinterpretation

Sumo Logic never tries to force the machine data into tired old buckets that are forever out of date. The data is stored raw so that it can continually be reinterpreted and re-parsed to reveal new meaning. Fast moving DevOps teams can’t wait for the stodgy software vendor to change their code or send their consultant onsite. They need it now.

3. Any metric you want, any time you want it

The power of the new DevOps approach to management is that the people that know the app the best, the developers, are producing the metrics needed to keep the app humming. This seems obvious in retrospect, yet very few performance management vendors support this kind of flexibility. It is much easier for developers to throw more data at Sumo Logic by outputting more data to the logs than to integrate with management tools. The extra insight that this detailed, highly specific data can provide into your customers’ experience and the operation of your applications is truly groundbreaking. 

4. Set the data free

Free-flow of data is the new norm, and mash-ups provide the most useful metrics. Specifically, pulling business data from outside of the machine data context allows you to put it in the proper perspective. We do this extensively at Sumo Logic with our own APIs, and it allows us to view our customers as more than nameless organization ID numbers. DevOps is driven by the need to keep customers happy.

5. Develop DevOps applications, not DevOps tools

The IT Software industry has fundamentally failed its customers. In general, IT software is badly written, buggy, hard to use, costly to maintain, and inflexible. Is it any wonder that the top DevOps shops overwhelmingly use open source tools and write much of the logic themselves?! Sumo Logic allows DevOps teams the flexibility and access to get the data they need when they need it, without forcing them into a paradigm that has no relevance for them. And why should DevOps teams even be managing the tools they use? It is no longer acceptable to spend months with vendor consultants, and then maintain extra staff and hardware to run a tool. DevOps teams should be able to do what they are good at – developing, releasing, and operating their apps, while the vendors should take the burden of tool management off their shoulders.

 

The IT industry is changing fast, and DevOps teams need tools that can keep up with the pace – and make their job easier, not more difficult. Sumo Logic is excited to be in the forefront of that trend. Sign up for Sumo Logic Free and prove it out for yourself.

The Marriage of Machine Data and Customer Service

03.06.2013 | Posted by Sanjay Sarathy, CMO

Last week we announced how Atchik uses Sumo Logic and our ability to easily analyze machine data to reshape its customer service function.  In fact, there are a variety of ways in which customer service organizations can become best friends with your log management infrastructure to improve your customers’ perception of your product or service.  Specifically, companies can use a log management service to:

  • Pinpoint exactly what the customer did during the course of a transaction or interaction with an application or service, as opposed to relying purely on email threads or phone logs.  This root cause analysis can help in understanding bottlenecks that the customer complained about and, just as importantly, provide guidance to the development team on how customers are using the product or service.  Actually it’s a great reason for the app development teams to use the service as well, but that’s the subject of another post.  
  • Easily correlate that application activity with the impact on other infrastructure elements that affect the consumer experience.  Unfortunately, many companies today only focus on a single application view of the customer experience when, given how integrated applications and services are today, it’s critical to get a full picture of all the different ways in which the customer is affected.
  • Proactively address potential customer-facing issues *before* they hit by receiving real-time alerts when application anomalies are diagnosed by the log management solution
  • Create customer dashboards and reports that provide real-time insights into the customer activity you care most about tracking  

We use Sumo Logic internally to support every function in the organization from application development to QA to customer service and even marketing.  Our co-founder and VP of Engineering, Kumar Saurabh, is hosting a webinar on March 26th to talk about “Sumo and Sumo”. We invite you to attend.

 


Using the transpose operator

02.19.2013 | Posted by Yan Qiao, Software Engineer

Sumo Logic lets you access your logs through a powerful query language.  In addition to searching for individual log messages, you may extract, transform, filter and aggregate data from them using a sequence of operators.  There are currently about two dozen operators available and we are constantly adding new ones.  In this post I want to introduce you to a recent addition to the toolbox, the transpose operator.

Let’s say you work for an online brokerage firm, and your trading server logs lines that look like the following, among other things:

2013-02-14 01:41:36 10.20.11.102 GET /Trade/StockTrade.aspx action=buy&symbol=s:131 80 Cole 219.142.249.227 Mozilla/5.0+(Macintosh;+Intel+Mac+OS+X+10_7_3)+AppleWebKit/536.5+(KHTML,+like+Gecko)+Chrome/19.0.1084.54+Safari/536.5 200 0 0 449

There is a wealth of information in this log line, but to keep it simple, let’s focus on the last number, in this case 449, which is the server response time in milliseconds.   We are interested in finding out the distribution of this number so as to know how quickly individual trades are processed.  One way to do that is to build a histogram of the response time using the following query:

stocktrade |  extract “(?<response_time>\d+$)” | toInt(ceil(response_time/100) * 100) as response_time | count by response_time

Here we start with a search for “stocktrade” to get only the lines we are interested in, extract the response time using a regular expression, round it up to the next 100 millisecond, and count the occurrence of each number.  The result looks like: 

Now, it would also be interesting to see how the distribution changes over time.   That is easy with the timeslice operator:

stocktrade | timeslice 1m | extract “(?<response_time>\d+$)” | toInt(ceil(response_time/100) * 100) as response_time | count by _timeslice, response_time

and the result looks like the following:

This gets the data we want, but it is not presented in a format that is easy to digest.  For example, in the table above, the first five rows give us the distribution of response time at 8:00, the next five rows at 8:01, etc.  Wouldn’t it be nice if we could rearrange the data into the following table?

That is exactly what transpose does:

stocktrade | timeslice 1m | extract “(?<response_time>\d+$)” | toInt(ceil(response_time/100) * 100) as response_time | count by _timeslice, response_time | transpose row _timeslice column response_time

Here we tell the query engine to rearrange the table using time slice values as row labels, and response time as column labels.

This is especially useful when the data is visualized.  The “stacking” option allows you to draw bar charts with values from different columns stacked onto each other, as shown below:

The length of bars represents number of trading requests per minute, and the colored segments represent the distribution of response time.

That’s it!  To find out other interesting ways to analyze your log data, sign up for Sumo Logic Free and try for yourself!

A Few Good Logs

02.15.2013 | Posted by Praveen Rangnath, Product Marketing

 

“I Want The Logs!”

In the midst of this week’s back and forth between Tesla, the New York Times, and various other media outlets and bloggers, Greylock Data Scientist in Residence (and Sumo Logic Advisory Board Member) DJ Patil posted a tweet that caught my eye: “Love that everyone is using data to have a conversation.  It’s about getting to the right answer.”

DJ is 100% correct, and throughout this Tesla/NY Times debate, we at Sumo Logic are thrilled to see the public recognition of the importance of log data — as a source of the truth.  

Yes, log data needs to be properly analyzed and understood (as the debate makes evident), but what clearly emerged from the debate is the truism that that log data holds the absolute and authoritative record of all the events that occurred.  It’s evident; just see how the discussion revolves entirely around understanding the logs.  

The Bigger Picture

There is a bigger picture to this debate, which is that log data is generated everywhere, whether it be from the car you drive, the energy meter beside your home, the device you’re using to read this blog, the server delivering this content, the network delivering this content, the device I’m using to write this post… I could go on and on.  And in the same way log files generated by a car hold the answer to whether it ran out of power or met range estimates, log files generated by applications, servers, network and virtualization infrastructure hold the answer to whether revenue generating applications are up and adequately performing, if customers are utilizing a newly developed feature, or if any part of an IT infrastructure is slow or experiencing downtime.  

It is important to remember — these are all business critical questions.  And just like Tesla needed to analyze their logs to defend their business, every enterprise, large or small, needs to be able to easily analyze and visualize their log data to ensure the health of their business.

Cars, Enterprises, and Terabytes

Before moving on, let’s not forget, enterprises are not cars, and data generated from enterprises is different from data generated by cars, particularly along three dimensions:  volume, variety, and velocity.  You got it… the 3 Vs of Big Data.  Cars do not (or at least do not yet!) generate up to terabytes of unstructured data per day.  Enterprises with large distributed IT environments do.

This is where Sumo Logic comes in.  Sumo Logic is based on the recognition that enterprises need to be able to easily analyze and visualize the massive amounts of amounts of data generated by their infrastructure and business, and that current on-premise tools just can’t scale.  Today, enterprises generate as much data in 10 minutes as they did in the entire year in 2003.  It is therefore not surprising that legacy on-premise solutions just can’t keep up.

Sumo Logic makes it possible for enterprises of all sizes to find the truth from their data.  And we do so without adding any operational overhead for our customers; Sumo Logic is a 100% cloud-based service.   Large enterprises like Netflix and Land O’Lakes use Sumo Logic.  Fast growing enterprises like PagerDuty and Okta do as well.  

You want some answers?  You have some logs?  We can handle the logs.  

Contact us here, or try it out for yourself by signing up for Sumo Logic Free.

 

Mapping machine data (pun intended)

01.25.2013 | Posted by Amanda Saso, Sr. Tech Writer

When you’re talking analytics, who said that an unfair advantage has to be ugly? Our newest feature is drop-dead gorgeous:

 

What you’re seeing is the result of a geo lookup query, which matches extracted IP addresses to their geographical location–another troubleshooting tool from Sumo Logic. (If you’re ready to skip right to the good stuff and start using this feature, see our Knowledge Base article here.)

Geo lookup queries use four Sumo Logic search language components: IP addresses are parsed, then the lookup operator compares the extracted IPs against a hosted IP geolocation table. The count and sort aggregate functions order the data; using these aggregate functions allows you to add a map to a Dashboard. The results are plugged in to the Google Maps API, and in a few seconds you’ve got a map showing the location of IP addresses. The syntax looks like this:

| parse “remote_ip=*]” as ip_address
| lookup latitude, longitude, country_code, country_name, city, postal_code from geo://default on ip = ip_address
| count by latitude, longitude, country_code, country_name, city, postal_code
| sort _count

It’s important to note the flexibility of geolocation fields that you can choose to use in geo lookup queries. Longitude and latitude are required, but the hosted geolocation table includes fields for different levels of granularity, such as country_name, postal_code, and area_code; depending on the area of the world you’re concentrating on, you can pick and choose which fields make sense in your query.

I also like using the familiar Google Maps interface–there’s no learning curve. The zoom slider/control is displayed both in the Search page, and in a Dashboard:

 

In addition, clicking one of the markers on a map immediately zooms down to street level, meaning that you don’t have to worry about zooming on the wrong area:

 

To learn more about using geo lookup queries to build maps, see Mapping IP addresses with geo lookup queries in the Sumo Logic Labs beta feature section of our Support Portal. While you’re there, be sure to drop us a line!

Or, get started now using Sumo Logic Free!  

Beyond LogReduce: Refinement and personalization

01.23.2013 | Posted by David Andrzejewski, Data Sciences Engineer

LogReduce is a powerful feature unique to the Sumo Logic offering. At the click of a single button, the user can apply the Summarize function to their previous search results, distilling hundreds of thousands of unstructured log messages into a discernible set of underlying patterns.

While this capability represents a significant advance in log analysis, we haven’t stopped there. One of the central principles of Sumo Logic is that, as a cloud-based log management service, we are uniquely positioned to deliver a superior service that learns and improves from user interactions with the system. In the case of LogReduce, we’ve added features that allow the system to learn better, more accurate patterns (refinement), and to learn which patterns a given user might find most relevant (personalization).

Refinement

Users have the ability to refine the automatically extracted signatures by splitting overly generalized patterns into finer-grained signatures or editing overly specific signatures to mark fields as wild cards. These modifications will then be remembered by the Sumo Logic system. As a result, all future queries run by users within the organization will be improved by returning higher-quality signatures.

Personalization

Personalized LogReduce helps users uncover the insights most important to them by capturing user feedback and using it to shape the ranking of the returned results. Users can promote or demote signatures to ensure that they do (or do not) appear at the top of Summarize results. Besides obeying this explicit feedback, Sumo Logic also uses this information to compute a relevance score which is used to rank signatures according to their content. These relevance profiles are individually tailored to each Sumo Logic user. For example, consider these Summarize query results:

 Results before feedback 

Since we haven’t given any feedback yet, their relevance scores are all equal to 5 (neutral) and they fall back to being ranked by count.

Promotion

Now, let’s pretend that we are in charge of ensuring that our database systems are functioning properly, so we promote one of the database-related signatures:

Results after promote

We can see that the signature we have promoted has now been moved to the top of the results, with the maximum relevance score of 10. When we do future Summarize queries, that signature will continue to appear at the top of results (unless we later choose to undo its promotion by simply clicking the thumb again).

The scores of the other two database-related signatures have increased as well, improving their rankings. This is because the content of these signatures is similar to the promoted database signature. This boost also will persist to future searches.

Demotion

This functionality works in the opposite direction as well. Continuing our running example, our intense focus on database management may mean that we find log messages about compute jobs to be distracting noise in our search results. We could try to “blacklist” these messages by putting Boolean negations in our original query string (e.g., “!comput*”), but this approach is not very practical or flexible. As we add more and more terms to our our search, it becomes increasingly likely that we will unintentionally filter out messages that are actually important to us. With Personalized LogReduce, we can simply demote one of the computation-related logs:

Results after demote

This signature then drops to the bottom of the results. As with promotion, the relevance and ranking of the other similar computation-related signature has also been lowered, and this behavior will be persisted across other Summarize queries for this user.

Implicit feedback

Besides taking into account explicit user feedback (promotion and demotion), Summarize can also track and leverage the implicit signals present in user behavior. Specifically, when a user does a “View Details” drill-down into a particular signature to view the raw logs, this is also taken to be a weaker form of evidence to increase the relevance scores of related signatures.

Conclusion

The signature refinement and personalized relevance extensions to LogReduce enable the Sumo Logic service to learn from experience as users explore their log data. This kind of virtuous cycle holds great promise for helping users get from raw logs to business-critical insights in the quickest and easiest way possible, and we’re only getting started. Try these features out on your own logs at no cost with Sumo Logic Free and let us know what you think!

Real-time Enterprise Dashboards, Really

11.14.2012 | Posted by Bruno Kurtic, Founding Vice President of Product and Strategy

Today we shipped a highly anticipated new capability with a novel approach, novel not only to Sumo Logic, but also novel within our space: Real-time Enterprise Dashboards.  Dashboard technologies have been around for many years, but not all dashboard technologies are created equal.  Most existing technologies leverage either precomputed summary data sets or recompute the entire data set every time a dashboard is viewed.  As such, they suffer from long load times, stale information, an inability to handle the data volume.

Our customers faced a specific challenge: how to take terabytes of machine data per day, crunch it, transform it into information, and render that information in a way that supports making business and IT decisions in real time.  Now they can.

When machine data is used to troubleshoot and monitor today’s production applications or infrastructure, data volume is the enemy.  Large farms of Apache or IIS servers, SaaS and other applications, or data center infrastructure like VMware farms, Cisco networking gear, or Linux or Microsoft Windows server farms generate volumes of data that obey Moore’s Law: the data volume doubles every two years.  It only makes sense that the volume of machine data would follow Moore’s Law – if machine computing capacity doubles, those machines do twice the work, as a result they generate twice the amount of machine data that describes that work.

This exponential growth has put existing dashboarding technologies under an insurmountable strain. Some of us here at Sumo Logic built previous-generation dashboards in our past lives.  From our experience we realized that an entirely new approach is required to enable real-time monitoring and dashboarding and that realization drove development of a new architecture.

First, we adopted the cloud computing paradigm. That turned a data center into an API with lim(capacity)=∞.  This enabled us to spin up and spin down additional capacity truly on demand with a single API call.  Then we built our Streaming Query Engine that leverages that capacity in an elastic manner.  It continuously takes data off the wire and computes results before the data ever hits its permanent resting place.  This “one-time” computing is more efficient and less costly than traditional recompute methods.   When you view a Sumo Logic Dashboard, you simply attach to the existing state, which is continuously computed by our Stream Query Engine in the background.  What you get is freshest data available instantly enabling real-time visibility into your infrastructure or applications.  And they are beautiful to boot. 

Try it for yourself.

Securing the Enterprise Cloud – SOC 2 Compliance

10.16.2012 | Posted by Bruno Kurtic, Founding Vice President of Product and Strategy

In our earlier post, Cloudy Compliance Part 1, we discuss general standards, regulations and some basic compliance concepts. In Part 2, we further explore the relevance of current standards and regulations, including the brief explanations of the American Institute of Certified Public Accountants (AICPA) and its Service Organization Control (SOC) reports.

Today we officially announced the successful completion of our SOC 2 Type 1 examination. Based on Trust Services Principles and Criteria, SOC 2 relates to enterprise-grade assurance, management and confidentiality capabilities.  It’s a significant validation for Sumo Logic, and further proof of the enterprise readiness of our cloud-based log management and analytics service.

What the announcement means to you
As part of SOC 2 examination, Sumo Logic received evaluations which reviewed control confidentiality and integrity of customer’s log data and other machine data in the following three, key areas:

  • Security – The system is protected against unauthorized access (both physical and logical).
  • Availability – The system is available for operation and use as committed or agreed.
  • Confidentiality – Information designated as confidential is protected as committed or agreed.

… Continue Reading

Twitter