Universal Collection of Machine Data

04.18.2013 | Posted by Sanjay Sarathy, CMO

Customers love flexibility, especially if that flexibility drives additional business value. In that vein, today we announced an expansion of our log data collection capabilities with our hosted HTTPS and Amazon S3 collectors, which eliminate the need for any local software installation. There are a variety of reasons why you might not want, or might not be able to have, local collectors: for example, you may not have access to the underlying infrastructure, as often happens in Infrastructure-as-a-Service (IaaS) environments. Or you may simply not feel like deploying any local software into your current infrastructure. Defining these hosted collectors is now baked into the set-up process, whether you’re using Sumo Logic Free or our Enterprise product.

 

 

With these new capabilities, companies can now unify how they collect and analyze log data generated from private clouds, public clouds, and their on-premise infrastructure.  They can then apply our unique analytics capabilities like LogReduce to generate insight across every relevant application and operational tier.

With companies increasingly moving towards the Cloud to power different parts of their business, it’s imperative that they have the necessary means to troubleshoot and monitor their diverse infrastructure.  Sumo Logic provides that flexibility.

Pardon me, have you got data about machine data?

01.31.2013 | Posted by Bruno Kurtic, Founding Vice President of Product and Strategy

I’m glad you asked; I just might. In fact, we started collecting data about machine data some 9 months ago when we participated in the AWS Big Data conference in Boston. Since then, we have continued collecting the same data at a variety of industry shows and conferences such as VMworld, AWS re:Invent, Velocity, Gluecon, Cloud Slam, Defrag, DataWeek, and others.

The original survey was printed on my home printer, 4 surveys per page, then inexpertly cut with the kitchen scissors the night before the conference – startup style, oh yeah!  The new versions made it onto a shiny new iPad as an iOS app.  The improved method, Apple cachet, and a wider reach gave us more than 300 data points and, incidentally, cost us more than 300 Sumo Logic T-shirts, which we were more than happy to give up in exchange for data.  (btw, if you want one, come to one of our events; the next one coming up is the Strata Conference).

As a data junkie, I’ve been slicing and dicing the responses, and thought that the end of our fiscal year could be the right moment to revisit the data set and reflect on my first blog post about it.

Here is what we asked:

  • Which business problems do you solve by using machine data?
  • Which tools do you use to analyze machine data in order to solve those business problems?
  • What issues do you experience solving those problems with the chosen tools?

The survey was partially designed to help us better understand Sumo Logic’s segment of the IT Operations Management or IT Management markets as defined by Gartner, Forrester, and other analysts.  I think that the sample set is relatively representative.  Responders come from shows with varied audiences, such as developers at Velocity and GlueCon, data center operators at VMworld, and folks investigating a move to the cloud at AWS re:Invent and Cloud Slam.  Answers were actually pretty consistent across the different “cohorts”.  We have a statistically significant number of responses and, finally, the responders were not our customers or direct prospects.  So let’s dive in and see what we’ve got, starting at the top:

Which business problems do you solve by using logs and other machine data?

  • Applications management, monitoring, and troubleshooting (46%)
  • IT operations management, monitoring, and troubleshooting (33%)
  • Security management, monitoring, and alerting (21%)

Does anything in there surprise you?  I guess it depends on what your point of reference is.  Let me compare it to the overall “IT Management” or “IT Operations Management” market.  The consensus (if such a thing exists) is that the size by segment is:

  • IT Infrastructure (servers, networks, etc) is up to 50-60% of the total market
  • Application (internal, external, etc.) is just north of 30-40%
  • Security is around 10%

Source: Sumo Logic analysis of aggregated data from various industry analysts who cover IT Management space.

There are a few things that could explain why our subsegment leans so much more toward Applications than toward IT Infrastructure.

  • (hypothesis #1) analysts measure total product sold to derive the market size, which might not be the same as the effort people apply to these use cases.
  • (hypothesis #2) there is more shelfware in IT Infrastructure, which overrepresents effort.
  • (hypothesis #3) there are more home-grown solutions in Application management, which underrepresents effort.
  • (hypothesis #4) our data is an indicator or a result of a shift in the market (e.g., when enterprises shift toward IaaS, they spend less time managing IT Infrastructure and shift more toward their core competency, their applications).
  • (obnoxious hypothesis #5) intuitively, it’s the software, stupid – nobody buys hardware because they love it, it exists to run software (applications), and we care more about applications, and that’s why it is so.

OK, ok, let’s check the data to see which hypotheses our narrow response set can help test or validate.  I don’t think our data can help us validate hypothesis #1 or hypothesis #2.  I’ll try to come up with additional survey questions that will, in the future, help test these two hypotheses.

Hypothesis #3, on the other hand, might be partially testable.  If we compare responses from users of commercial tools with responses from users of home-grown tools, we are left with the following:

Not a significant difference between responders who use commercial vs. responders who use home grown tools.  Hypothesis #3 explains only a couple of percentage points of difference.  

Hypothesis #4 – I think we can use a proxy to test it.  Let’s assume that responders from VMworld are focused on internal data center and the private cloud.  In this case they would not be relying as much on IaaS providers for IT Infrastructure Operations.  On the other hand, let’s also assume that AWS, and other cloud conference attendees are more likely to rely on IaaS for IT Infrastructure Operations.  Data please:

Interesting, seems to explain some shift between security and infrastructure, but not applications.  So, we’re left with:

  • hypothesis #1 – spend vs. reported effort is skewed – perhaps
  • hypothesis #2 – there is more shelfware in IT infrastructure – unlikely
  • obnoxious hypothesis #5 – it’s the software, stupid – getting warmer

That should do it for one blog post.  I’ve barely scratched the surface by stopping with the responses to the first question.  I will work to see if I can test the outstanding hypotheses and, if successful, will write about the findings.  I will also follow up with another post looking at the rest of the data.  I welcome your comments and thoughts.

While you’re at it, try Sumo Logic for free.

Why I joined Sumo Logic and Moved to Silicon Valley

01.28.2013 | Posted by Ben Newton, Corporate Sales Engineering Manager

Entering StartUP

We make hundreds of decisions every day, mostly small ones, that are just part of life’s ebb and flow. And then there are the big decisions that don’t merely create ripples in the flow of your life - they redirect it entirely. The massive, life-defining decisions like marriage and children; the career-defining decisions like choosing your first job after college. I’ve had my share of career-defining decisions – leaving a physics graduate program to chase after the dot com craze, leaving consulting for sales engineering, etc. The thing about this latest decision is that it combines both. I am joining Sumo Logic, leaving behind a safe job in marketing, and moving to Silicon Valley – away from my friends, family, and community. So, why did I do it? 

 

Now is the time for Start-Ups in Enterprise Software. 

Consumer start-ups get all the press, but the enterprise start-ups are where the real action is. The rash of consolidations in the last five years or so has created an innovation gap that companies like Sumo Logic are primed to exploit.  The perfect storm of cloud computing, SaaS, Big Data, and DevOps/Agile is forcing customers to start looking outside of their comfort zones to find the solutions they need. Sumo Logic brings together all of that innovation in a way that is too good not to be a part of.

The Enterprise SaaS Revolution is Inevitable.

The SaaS business model, combined with Agile development practices, is completely changing the ways companies buy enterprise software. Gartner sees companies replacing legacy software with SaaS more than ever. The antiquated term-licenses of on-premise software with its massive up-front costs, double digit maintenance charges, and “true-ups” seem positively barbaric by comparison to the flexibility of SaaS. And crucially for me, Sumo Logic is also one of the few true SaaS companies that is delving into the final frontier of the previously untouchable data center. 

Big Data is the “Killer App” for the Cloud.

“Big Data” analytics, using highly parallelized architectures like Hadoop or Cassandra, is one of the first innovations in enterprise IT to truly be “born in the cloud”. These new approaches were built to solve problems that just didn’t exist ten, or even five, years ago. The Big Data aspect of Sumo Logic is exciting to me. I am convinced that we are only scratching the surface of what is possible with Sumo Logic’s technology, and I want to be there on the bleeding edge with them.

Management Teams Matter.

When it really comes down to it, I joined Sumo Logic because I have first-hand knowledge of the skills that Sumo Logic’s management team brings to the table. I have complete confidence in Vance Loiselle’s leadership as CEO, and Sumo Logic has an unbeatable combination of know-how and get-it-done people. And clearly some of the top venture capital firms in the world agree with me. This is a winning team, and I like to win!

Silicon Valley is still Nirvana for Geeks and the best place for Start-Ups.

Other cities are catching up, but Silicon Valley is still the best place to start a tech company. The combination of brainpower, money, and critical mass is just hard to beat. On a personal level, I have resisted the siren call of the San Francisco Bay Area for too long. I am strangely excited to be in a place where I can wear my glasses as a badge of honor, and discuss my love for gadgets and science fiction without shame. Luckily for me, I am blessed with a wife who has embraced my geek needs and supports me wholeheartedly (and a 21-month-old who doesn’t care either way).

So, here’s to a great adventure with the Sumo Logic team, to a new life in Silicon Valley, and to living on the edge of innovation. 

P.S.  If you want to see what I am so excited about, get a Sumo Logic Free account and check it out. 

AWS re:Invent – The future is now

12.05.2012 | Posted by Stefan Zier, Cloud Infrastructure Architect

This past week, several of us had the pleasure of attending Amazon Web Services’ inaugural re:Invent conference in Las Vegas. In the weeks leading up to the conference, it wasn’t fully obvious to me just how big this show was going to be (not for lack of information, but mostly because I was focused on other things).

When I picked up my badge in the Venetian and walked through the enormous Expo hall, it struck me: IaaS, the cloud, is no longer a bleeding edge technology used by a few daring early adopters. The economics and flexibilities afforded are too big to be ignored – by anyone.

Attending sessions and talking to customers showed that, more than ever before, application architectures are distributed. The four design principles outlined in Werner Vogels’ excellent keynote on day 2 of the conference (Werner calls them “The Commandments of 21st Century Architectures”) made it obvious that the cloud requires people to build their applications in fundamentally different ways from traditional on-premise applications.

While at the conference, I spent quite some time at the Sumo Logic booth, explaining and demoing our solutions to customers. Most of them run distributed systems in AWS, and it never took more than 2 minutes for them to realize why their lives would be much easier with a log management solution — having a central tool to collect and quickly analyze logs from a large distributed app is essential to troubleshooting, monitoring and optimizing an app.

Once I started to explain how our product is architected, and started relating it to the architecture principles Werner outlined, most people understood why Sumo Logic’s product can scale and perform the way it does, unlike some other “cloud washed” solutions on the market.

In addition to having a highly relevant set of attendees for us, the conference also was a great place to find other vendors who exist in the same ecosystem – we’ve begun several good conversations about integrations and partnerships.

The response from people we talked to was overwhelmingly positive. During the conference, we saw a big increase in sign-ups for our Sumo Logic Free product. We will definitely be back next year.

Pragmatic AWS: 3 Tips to enhance the AWS SDK with Scala

07.12.2012 | Posted by Stefan Zier, Cloud Infrastructure Architect

At Sumo Logic, most backend code is written in Scala. Scala is a newer JVM (Java Virtual Machine) language whose design was started in 2001 by Martin Odersky, who also co-founded our Greylock sister company, Typesafe. Over the past two years at Sumo Logic, we’ve found Scala to be a great way to use the AWS SDK for Java. In this post, I’ll explain some use cases.

1. Tags as fields on AWS model objects

Accessing AWS resource tags can be tedious in Java. For example, to get the value of the “Cluster” tag on a given instance, something like this is usually needed: 

   // Scan the instance's tags for the one named "Cluster".
   String deployment = null;
   for (Tag tag : instance.getTags()) {
     if ("Cluster".equals(tag.getKey())) {
       deployment = tag.getValue();
     }
   }

While this isn’t horrible, it certainly doesn’t make code easy to read. Of course, one could turn this into a utility method to improve readability. The set of tags used by an application is usually known and small in number. For this reason, we found it useful to expose tags with an implicit wrapper around the EC2 SDK’s Instance, Volume, etc. classes. With a little Scala magic, the above code can now be written as:

val deployment = instance.cluster

Here is what it takes to make this magic work:

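import com.amazonaws.services.ec2.model.Instance
import scala.collection.JavaConversions._

object RichAmazonEC2 {
 // Implicitly wraps an SDK Instance so the tag accessors below appear as fields.
 implicit def wrapInstance(i: Instance) = new RichEC2Instance(i)
}

class RichEC2Instance(instance: Instance) {
 // Returns the value of the given tag, or null if the instance does not carry it.
 private def getTagValue(tag: String): String =
   instance.getTags.find(_.getKey == tag).map(_.getValue).orNull

 def cluster = getTagValue("Cluster")
}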

Whenever this functionality is desired, one just has to import RichAmazonEC2._

2. Work with lists of resources

Scala 2.8.0 included a very powerful new set of collections libraries, which are very useful when manipulating lists of AWS resources. Since the AWS SDK uses Java collections, to make this work, one needs to import scala.collection.JavaConversions._, which transparently “converts” (wraps implicitly) the Java collections. Here are a few examples to showcase why this is powerful:

Printing a sorted list of instances, by name:
// name and sortName come from the implicit wrapper, like cluster above.
ec2.describeInstances()             // Get list of instances.
 .getReservations
 .map(_.getInstances)
 .flatten                           // Translate reservations to instances.
 .sortBy(_.sortName)                // Sort the list.
 .map(i => "%-25s (%s)".format(i.name, i.getInstanceId)) // Create String.
 .foreach(println(_))               // Print the string.

Grouping a list of instances in a deployment by cluster (returns a Map from cluster name to list of instances in the cluster):
ec2.describeInstances()                          // Get list of instances.
 .getReservations.map(_.getInstances).flatten   // Translate reservations to instances.
 .filter(_.deployment == "prod")                // Filter the list to prod deployment.
 .groupBy(_.cluster)                            // Group by the cluster.

You get the idea – this makes it trivial to build very rich interactions with EC2 resources.
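For readers who want to try the pattern without an AWS account, here is a minimal, self-contained sketch. The Tag and Instance case classes are hypothetical stand-ins for the SDK’s model classes, and the “Name” and “Deployment” tag keys are assumptions made for illustration; only the “Cluster” tag appears in the code earlier in this post.

```scala
import scala.language.implicitConversions

// Hypothetical stand-ins for the SDK's Tag and Instance model classes.
final case class Tag(key: String, value: String)
final case class Instance(id: String, tags: Seq[Tag])

// Exposes selected tags as plain fields on an instance.
class RichInstance(instance: Instance) {
  private def tagValue(key: String): String =
    instance.tags.find(_.key == key).map(_.value).orNull

  def name: String = tagValue("Name")             // assumed tag key
  def cluster: String = tagValue("Cluster")
  def deployment: String = tagValue("Deployment") // assumed tag key
}

object RichInstance {
  implicit def wrap(i: Instance): RichInstance = new RichInstance(i)
}

object ClusterReport {
  import RichInstance._

  // Maps each cluster in the prod deployment to the sorted names of its instances.
  def prodByCluster(instances: Seq[Instance]): Map[String, Seq[String]] =
    instances
      .filter(_.deployment == "prod")
      .groupBy(_.cluster)
      .map { case (cluster, members) => cluster -> members.map(_.name).sorted }
}
```

Calling ClusterReport.prodByCluster on a list of instances returns a Map keyed by cluster name, with instances from other deployments filtered out.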

3. Add pagination logic to the AWS SDK

When we first started using AWS, we had a utility class to provide some commonly repeated functionality, such as pagination for S3 buckets and retry logic for calls. Instead of embedding functionality in a separate utility class, implicits allow you to pretend that the functionality you want exists in the AWS SDK. Here is an example that extends the AmazonS3 class to allow listing all objects in a bucket: 

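import com.amazonaws.services.s3.AmazonS3
import com.amazonaws.services.s3.model.{ListObjectsRequest, ObjectListing, S3ObjectSummary}
import scala.collection.JavaConversions._

object RichAmazonS3 {
 implicit def wrapAmazonS3(s3: AmazonS3) = new RichAmazonS3(s3)
}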

class RichAmazonS3(s3: AmazonS3) {
 def listAllObjects(bucket: String, cadence: Int = 100): Seq[S3ObjectSummary] = {

   var result = List[S3ObjectSummary]()

   def addObjects(objects: ObjectListing) = result ++= objects.getObjectSummaries

   var objects = s3.listObjects(new ListObjectsRequest().withMaxKeys(cadence).withBucketName(bucket))
   addObjects(objects)

   while (objects.isTruncated) {
     objects = s3.listNextBatchOfObjects(objects)
     addObjects(objects)
   }

   result
 }
}

To use this:

val objects = s3.listAllObjects("mybucket")

There is, of course, a risk of running out of memory, given a large enough number of object summaries, but in many use cases, this is not a big concern.
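If memory does become a concern, one option is to hand back a lazy Iterator instead of a fully built list, so that each page is fetched only when the caller reaches it. The sketch below shows the idea in isolation. Page and the fetch function are hypothetical stand-ins (not AWS SDK types) for an ObjectListing and a call such as listNextBatchOfObjects.

```scala
// Hypothetical stand-in for one page of results plus the marker of the next page.
final case class Page[A](items: Seq[A], nextMarker: Option[String])

object LazyPaging {
  // Walks all pages lazily. fetch(None) is assumed to return the first page,
  // and fetch(Some(marker)) the page that starts at that marker.
  def all[A](fetch: Option[String] => Page[A]): Iterator[A] =
    Iterator
      .iterate(Option(fetch(None))) { previous =>
        // Request the next page only if the previous one announced a marker.
        previous.flatMap(_.nextMarker).map(marker => fetch(Some(marker)))
      }
      .takeWhile(_.isDefined)
      .flatMap(_.get.items.iterator)
}
```

Only the first page is fetched up front. A caller that stops iterating early never triggers the remaining requests, and at most one page needs to be held in memory at a time.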

Summary

Scala enables programmers to implement expressive, rich interactions with AWS and greatly improves readability and developer productivity when using the AWS SDK. It’s been an essential tool to help us succeed with AWS.

Security-Gain without Security-Pain

06.21.2012 | Posted by Stefan Zier, Cloud Infrastructure Architect

As Joan mentioned, we use SaaS products a lot here at Sumo Logic. On an average day, I log into sites on the internet tens or even hundreds of times, supplying a username and password each time. The proliferation of usernames and passwords creates three issues:

  • Password hygiene. In this day and age, it is reckless to reuse passwords across sites. At the same time, it is impossible to remember arbitrarily many unique passwords.
  • Strength. With rainbow tables and the tumbling cost of compute power, passwords need to be increasingly long and complex.
  • Efficiency. I shouldn’t have to spend half my day logging into sites.

What we need are tools that:

  • Encourage you to use different passwords everywhere.
  • Are secure, ideally using two factors of authentication.
  • Require the least number of keystrokes or mouse actions to get past login screens.

Here are a few tools we use and like.

1Password

1Password is the password manager most of us use. It stands out from many other password managers in several ways:

  • It is a native Mac application and has excellent integration. There is a version for Windows.
  • Support for iOS and Android.
  • Well-implemented sync via Dropbox, including for iOS.
  • Plugins for the 3 major browsers (Safari, Chrome, Firefox).
  • Keyboard compatible.

One of the major benefits of 1Password is that it’s designed to stay out of your way. To log into a site:

  • Without 1Password, I enter the URL in the address bar, navigate to the login form. Then, I enter my login, then my password. A lot of typing.
  • With 1Password, I enter 1p into the address bar, start typing the site’s name to select from the list and hit enter. Then, I watch 1Password log me into the site.

Properly used, 1Password can be regarded as a one-and-a-half-factor authentication solution. There’s a great discussion on the AgileBits blog. We’ll share some power user tips on 1Password in the near future.

IronKeys

IronKeys are cool toys. They’re USB sticks with “spook-grade” crypto and self-destruction capabilities. We issue every developer an IronKey for the storage of all key files, such as ssh private keys and AWS credential files. Aside from being geek-chic, the IronKeys offer two benefits:

  • The key files are only exposed while the IronKey is plugged in and mounted. Not when people are at Starbucks browsing the web.
  • If an IronKey is ever lost, we can remote-detonate it. The minute it gets plugged into a USB port, the software on the IronKey phones home and gets a self-destruct signal. This requires an internet connection, but we’ve configured IronKeys to not unlock without one.

OATH (Google Apps and AWS)

Retired Hardware MFA Tokens

Google’s Two-Step Verification and Amazon Web Services’ MFA both use the OATH open architecture, not to be confused with OAuth. OATH is a software replacement for traditional hardware-based two-factor authentication tokens.

Google offers open sourced client applications for iOS and Android that serve as the second factor of authentication. This reduces clutter, since you don’t need to carry any hardware tokens. Having the phone be your token also makes it more likely that you have your token with you most of the time.

Google has also taken several steps to remove friction:

  • To set up your phone, you simply scan a QR code from the screen.
  • After the first two-factor authentication with your phone, you can check a box “Remember me for 30 days”. The browser cookie then serves as your second factor of authentication.

AWS initially only supported classical hardware MFA tokens. To make matters worse, one MFA token couldn’t be shared across multiple AWS accounts. More recently, they’ve also added support for OATH. In fact, the same Google Authenticator apps work for AWS, as well.
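The same app can serve both services because the codes are computed from nothing more than a shared secret and the current time. Below is a rough sketch of that computation, assuming the time-based one-time password scheme (TOTP, RFC 6238, built on HOTP from RFC 4226) with the common defaults of HMAC-SHA1, 30-second steps and 6 digits. The object and method names are made up for illustration, and real secrets are usually exchanged Base32-encoded (for example via the QR code), which this sketch does not handle.

```scala
import java.nio.ByteBuffer
import javax.crypto.Mac
import javax.crypto.spec.SecretKeySpec

object Totp {
  // HOTP (RFC 4226): HMAC-SHA1 over the 8-byte counter, then dynamic truncation.
  def hotp(secret: Array[Byte], counter: Long, digits: Int = 6): String = {
    val mac = Mac.getInstance("HmacSHA1")
    mac.init(new SecretKeySpec(secret, "HmacSHA1"))
    val hash = mac.doFinal(ByteBuffer.allocate(8).putLong(counter).array())

    // The low 4 bits of the last byte select a 4-byte window in the hash.
    val offset = hash(hash.length - 1) & 0x0f
    val binary =
      ((hash(offset) & 0x7f) << 24) |
        ((hash(offset + 1) & 0xff) << 16) |
        ((hash(offset + 2) & 0xff) << 8) |
        (hash(offset + 3) & 0xff)

    // Keep the last `digits` decimal digits, left-padded with zeros.
    val code = binary % math.pow(10, digits).toInt
    s"%0${digits}d".format(code)
  }

  // TOTP (RFC 6238): HOTP with the counter derived from the clock.
  def totp(secret: Array[Byte], unixSeconds: Long,
           stepSeconds: Int = 30, digits: Int = 6): String =
    hotp(secret, unixSeconds / stepSeconds, digits)
}
```

Phone and server each evaluate Totp.totp with the current Unix time. As long as their clocks agree to within one step, they arrive at the same code without exchanging anything at login time.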

Wrapping up

Traditional two-factor authentication approaches based on hardware tokens are painful to use. OATH, 1Password and IronKeys strengthen security without adding too much pain to people’s lives.

Pragmatic AWS: Principle of Least Privilege with IAM

06.12.2012 | Posted by Stefan Zier, Cloud Infrastructure Architect

Lock and Chain - by Martin Magdalene

One of the basic principles in information security is the Principle of Least Privilege. The idea is simple: give every user/process/system the minimal amount of access required to perform its tasks. In this post, I’ll describe how this principle can be applied to applications running in a cluster of EC2 instances that need access to AWS resources. 

What are we protecting?

The AWS Access Key ID and Secret are innocent looking strings. I’ve seen people casually toss them around scripts and bake them into AMIs. When compromised, however, they give the attacker full control over all of our resources in AWS. This goes beyond root access on a single box – it’s “god mode” for your entire AWS world! Needless to say, it is critical to limit both the likelihood of a successful attack and the exposure in case of a successful attack against one part of your application.

Why do we need to expose AWS credentials at all?

Since our applications run on EC2 instances and access other AWS services, such as S3, SQS, SimpleDB, etc., they need AWS credentials to run and perform their functions.

Limiting the likelihood of an attack: Protecting AWS credentials

In an ideal world, we could pass the AWS credentials into applications without ever writing them to disk and encrypt them in application memory. Unfortunately, this would make for a rather fragile system – after a restart, we’d need to pass the credentials into the application again. To enable automated restarts, recovery, etc., most applications store the credentials in a configuration file.

There are many other methods for doing this. Shlomo Swidler compared tradeoffs between different methods for keeping your credentials secure in EC2 instances.

At Sumo Logic, we’ve picked what Shlomo calls the SSH/On Disk method. The concerns around forgetting credentials during AMI creation don’t apply to us. Our AMI creation is fully automated, and AWS credentials never touch those instances. The AWS credentials only come into play after we boot from the AMI. Each application in our stack runs as a separate OS user, and the configuration file holding the AWS credentials for the application can only be read by that user. We also use file system encryption wherever AWS credentials are stored.

To add a twist, we obfuscate the AWS credentials on disk. We encrypt them using a hard-coded, symmetric key. This obfuscation, an additional Defense-in-Depth measure, makes it a little more difficult to get the plain text credentials in the case of instance compromise. It also makes shoulder surfing much more challenging. 
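As an illustration only (this is not our production code), obfuscation along these lines could look roughly like the sketch below. It assumes AES-GCM from the JDK and a made-up hard-coded key; Obfuscator, obfuscate, and reveal are hypothetical names.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.security.SecureRandom
import java.util.Base64
import javax.crypto.Cipher
import javax.crypto.spec.{GCMParameterSpec, SecretKeySpec}

object Obfuscator {
  // Made-up 128-bit key, hard-coded on purpose: this is obfuscation, not key management.
  private val key = new SecretKeySpec("0123456789abcdef".getBytes(UTF_8), "AES")
  private val random = new SecureRandom()
  private val IvLength = 12

  // Encrypts the value and returns Base64(iv ++ ciphertext) for the config file.
  def obfuscate(plain: String): String = {
    val iv = new Array[Byte](IvLength)
    random.nextBytes(iv)
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv))
    Base64.getEncoder.encodeToString(iv ++ cipher.doFinal(plain.getBytes(UTF_8)))
  }

  // Reverses obfuscate: splits off the IV and decrypts the remainder.
  def reveal(obfuscated: String): String = {
    val (iv, body) = Base64.getDecoder.decode(obfuscated).splitAt(IvLength)
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv))
    new String(cipher.doFinal(body), UTF_8)
  }
}
```

Because the key ships with the application, anyone who has both the binary and the file can recover the secret. The value of the step lies in defeating casual inspection, which is why it only complements the file permissions and file system encryption described above.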

Limiting exposure in case of a successful attack: Restricted access AWS credentials

Chances are that most applications only need a very small subset of the AWS portfolio of services, and only a small subset of resources within them. For example, an application using S3 to store data will likely only need access to a few buckets, and only perform a limited set of operations against them.

AWS’s IAM service allows us to set up users with limited permissions, using groups and policies. Using IAM, we can create a separate user for every application in our stack, limiting the policy to the bare minimum of resources/actions required by the application. Fortunately, the actions available in policies directly correspond to AWS API calls, so one can simply analyze which calls an application makes to the AWS API and derive the policy from this list.

For every application-specific user, we create a separate set of AWS credentials and store them in the application’s configuration file.
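To make this concrete, here is a small sketch that renders such a policy as JSON for an application that may only touch the objects in a single bucket. The helper name forBucket, the bucket name in the usage note, and the choice of actions are assumptions for illustration. The action names s3:GetObject and s3:PutObject correspond to the GetObject and PutObject API calls.

```scala
object LeastPrivilegePolicy {
  // Builds a policy document that allows only the listed actions,
  // and only on the objects of the given bucket.
  def forBucket(bucket: String, actions: Seq[String]): String = {
    val actionList = actions.map(a => "\"" + a + "\"").mkString(", ")
    s"""{
       |  "Statement": [{
       |    "Effect": "Allow",
       |    "Action": [$actionList],
       |    "Resource": "arn:aws:s3:::$bucket/*"
       |  }]
       |}""".stripMargin
  }
}
```

For example, LeastPrivilegePolicy.forBucket("my-app-data", Seq("s3:GetObject", "s3:PutObject")) yields a document that can be attached to the application’s IAM user. Any call outside that list, or against another bucket, is then denied by default.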

In Practice – Automate, automate, automate!

If your stack consists of more than one or two applications or instances, the most practical option for configuring IAM users is automation. At Sumo Logic, our deployment tools create a unique set of IAM users: one set of users per deployment and one user per application within the deployment. Each user is assigned a policy that restricts access to only those of the deployment’s resources that are required for the application.

If the policies change, the tools update them automatically. The tools also configure per-application OS level users and restrict file permissions for the configuration files that contain the AWS credentials for the IAM user. The configuration files themselves store the AWS credentials as obfuscated strings.

One wrinkle in this scheme is that the AWS credentials created for the IAM users need to be stored somewhere after their initial creation. After the initial creation of the AWS credentials, they can never be retrieved from AWS again. Since many of our instances are short-lived, we needed to make sure we could use the credentials again later. To solve this particular issue, we encrypt the credentials, then store them in SimpleDB. The key used for this encryption does not live in AWS and is well-protected on hardware tokens.   

Summary

It is critical to treat your AWS credentials as secrets and assign point-of-use specific credentials with minimal privileges. IAM and automation are essential enablers to make this practical.  

Update (6/12/2012): AWS released a feature named IAM Roles for EC2 Instances today. It makes a temporary set of AWS credentials available via instance metadata. The credentials are rotated multiple times a day. IAM Roles add a lot of convenience, especially in conjunction with the AWS SDK for Java.

Unfortunately, this approach has an Achilles heel: any user with access to the instance can now execute a simple HTTP request and get a valid set of AWS credentials. To mitigate some of the risk, a local firewall, such as iptables, can be used to restrict HTTP access to a subset of users on the machine.

Comparing the two approaches 

+ User privileges and obfuscation offer a stronger defense in scenarios where a single (non-root) user is compromised.
+ Per-application (not per-instance) AWS credentials are easier to reason about.
- The rotation of IAM keys performed transparently by IAM roles adds security. An attacker has to maintain access to a compromised machine to maintain access to valid credentials.

Best of Both Worlds

AWS’s approach could be improved upon with a small tweak: Authenticate access to the temporary/rotating credentials T in instance metadata using another pair of credentials A. A itself would not have any privileges other than accessing T from within an instance. This approach would be a “best of both worlds”. Access to A could be restricted using the methods described above, but keys would still be rotated on an ongoing basis.   

Pragmatic AWS: Data Destroying Drones

06.05.2012 | Posted by Stefan Zier, Cloud Infrastructure Architect

As we evolve our service, we occasionally delete EBS (Elastic Block Store) volumes. This releases the disk space back to AWS to be assigned to another customer. As a security precaution, we have decided to perform a secure wipe of the EBS volumes. In this post, I’ll explain how we implemented the wipe.

Caveats

Wiping EBS volumes may be slightly paranoid and not strictly needed, since AWS guarantees to never return a previous user’s data via the hypervisor (as mentioned in their security white paper). We also understand that the secure wipe is not perfect. EBS is able to move our data around in the background and leave behind blocks that we didn’t wipe. Still, we felt that this additional precaution was worth the bit of extra work and cost – better safe than sorry.

Drones

We wanted to make sure secure wiping did not have any performance impact on our production deployment. Therefore, we decided to perform the secure wipe from a different set of AWS instances: Data Destroying Drones. We also wanted them to be fire-and-forget, so we wouldn’t have to manually check up on them.

To accomplish all this, we built a tool that:

  1. Finds to-be-deleted EBS volumes matching a set of tag values (we tag the volumes to mark them for wiping).
  2. Launches one t1.micro instance per EBS volume that needs wiping (using an Ubuntu AMI).
  3. Passes a cloud-init script with Volume ID and (IAM limited) AWS credentials into the instance.

The Gory Details

Ubuntu has a mechanism named cloud-init. It accepts a shell script via EC2’s user data, which is passed in as part of the RunInstances API call to EC2. Here is the script we use for the Data Destroying Drones:

#!/bin/bash
# Abort on the first failing command.
set -e
# Look up this instance's ID from the EC2 instance metadata service.
export INSTANCE_ID=$(wget -q -O - http://169.254.169.254/latest/meta-data/instance-id)
export VOLUME_ID=vol-12345678
export EC2_URL=https://ec2.us-east-1.amazonaws.com
export EC2_ACCESS_KEY=[key id]
export EC2_SECRET_KEY=[key]

# Install scrub non-interactively, attach the volume, and wipe it.
sudo apt-get install -y scrub
euca-attach-volume -i $INSTANCE_ID -d /dev/sdj $VOLUME_ID
sudo scrub -b 50M -p dod /dev/sdj > ~/sdj.scrub.log 2>&1
sleep 30

# Release and delete the wiped volume, then shut the instance down.
euca-detach-volume $VOLUME_ID
euca-delete-volume $VOLUME_ID
halt

This script automates the entire process:

  1. Attach the volume.
  2. Perform a DoD 5220.22-M secure wipe of the volume using scrub.
  3. Detach and delete the volume.
  4. Halt the instance.

The instances are configured to terminate on halt, which causes all involved resources to disappear once the secure wipe completes. The scrub can take hours or even days, depending on the size of the EBS volumes, but the low cost of the t1.micro instances makes this a viable option. Even if the process takes 48 hours, it costs less than $1 to wipe the volume.
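As a back-of-the-envelope check of the “less than $1” figure, the sketch below multiplies the runtime by an hourly price. The rate of $0.02 per instance-hour is an assumption for illustration (this post does not state the exact price), and the calculation covers instance hours only, not EBS storage or I/O charges.

```scala
object WipeCost {
  // Assumed on-demand price per instance-hour in USD; not taken from the post.
  val AssumedHourlyRateUsd: BigDecimal = BigDecimal("0.02")

  // Instance cost in USD of one drone that runs for the given number of hours.
  def instanceCostUsd(hours: Int, hourlyRateUsd: BigDecimal = AssumedHourlyRateUsd): BigDecimal =
    hourlyRateUsd * hours
}
```

Under that assumed rate, WipeCost.instanceCostUsd(48) comes to 0.96, which is consistent with the estimate above.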

Summary

Aside from being a fun project, the Data Destroying Drones have given us additional peace of mind and confidence that we’ve followed best practice and made a best effort to secure our customers’ data by not leaving any of it behind in the cloud.

Sumo Logic at AWS Big Data Boston

05.29.2012 | Posted by Bruno Kurtic, Founding Vice President of Product and Strategy

I recently represented Sumo Logic at the AWS Big Data conference in Boston.  It was a great show, very well-attended.  Sumo Logic was one of the few vendors invited to participate.

During the conference I conducted a survey of the attendees to try to understand how this emerging early-adopter segment of IT professionals manages log data for their infrastructure and applications.

Common characteristics of attendees surveyed:

  • They run their apps and infrastructure in the cloud
  • They deal with large data sets
  • They came to learn how to better exploit/leverage big data and cloud technologies

What I asked:

  • Do you use logs to help you in your daily work, and if so, how?
  • What types of tools do you use for log analysis and management?
  • What are the specific pain points associated with your log management solutions?

The findings were interesting.  Taking each one in turn:  

No major surprises here.  Enterprises buy IaaS in order to run applications, either for burst capacity or because they believe it’s the wave of the future.  The fact that someone else manages the infrastructure does not change the fact that you have to manage and monitor your applications, operating systems, and virtual machines.


A bit of a surprise here.  In my previous analysis, some 45% of enterprises use homegrown solutions, but in this segment it’s 70%.  Big difference with the big data and cloud crowd.  A possible explanation for this is that existing commercial solutions are not easy to deploy and run in the cloud and don’t scale to handle big data.  So, the solution = build it yourself.  Hmm.

Yes, yes, I know, it adds up to more than 100%.  That’s because the question was stated as “select as many as apply” and many respondents have more than one problem.  So, nothing terribly interesting in there.  But let me dig a bit deeper into issues associated with homegrown vs. commercial.

 

This makes a bit more sense.  For homegrown solutions, complexity looks like the biggest pain, which is no surprise.  Assembling huge systems to support big volumes of log data is more difficult than many people anticipate.  Hadoop and other similar solutions are not optimized to simply and easily deliver answers.  This then leads to the next pain point:  if it is not easy to use, then you don’t use it = does not deliver enough value.  

The responses on commercial solutions make sense as well.  Today’s commercial products are expensive and hard to operate.  On top of the sticker price, you have to spend precious employee time to perform frequent software upgrades and implement “duct tape” scaling.  If you don’t have expertise internally you buy it from vendors’ professional services at beaucoup $$$$$.  You have to get your own compute and storage, which grow as your data volume grows.  So, commercial “run yourself” solutions = very high CAPEX (upfront capital expenditures) and OPEX (ongoing operational expenditures).  In the end (as the second pain point highlights), commercial solutions are also complex to operate and hard to use, requiring highly skilled and hard to find personnel.

Pretty bleak – what now?
At Sumo Logic, we think we have a solution.  The pain points associated with home-grown and commercial solutions that were architected in the last decade are exactly what we set out to solve. We started this company after building, selling and supporting the previous generation of log management and analysis solutions.  We’ve incorporated our collective experience and customer feedback into Sumo Logic.

Built for the cloud
The Sumo Solution is fundamentally different from anything else out there.  It is built for big data and is “cloud native”.  All of the complexities associated with deploying, managing, upgrading, and scaling are gone – we do all that for you.  Our customers get a simple-to-use web application, and we do all the rest.

Elastic scalability
Our architecture is true cloud, not a “cloud-washed” adaptation of on-premise single-instance software solutions that are trying to pass themselves off as cloud.  Each of our services is separate and can be scaled independently.  It takes us minutes to triple the capacity of our system.

Insights beyond your wildest dreams
Because of our architecture, we are able to build analytics at scale.  Our LogReduce™ and Push Analytics™ uncover things that you didn’t even know you should be paying attention to.  The whole value proposition is turned on its head – instead of having to do all the work yourself, our algorithms do the work for you while you guide them to get better over time.

Come try it out and see for yourself: https://www.sumologic.com/free-trial/

Pragmatic AWS: 4 Ideas for using EC2 Tags

05.15.2012 | Posted by Stefan Zier, Cloud Infrastructure Architect

At Sumo Logic, we use Amazon Web Services (AWS) for everything. Our product, as well as all our internal infrastructure, lives in AWS. In this series of posts, we’ll share some useful practices around using AWS. In the first installment, I’ll outline some useful things we do with tags.

1. Organize resources

We’ve decided on a hierarchical way of managing our EC2 (Elastic Compute Cloud) resources:

Deployment
 + Cluster
   + Instance/Node

Within an AWS account, we can have multiple “deployments”. A deployment is a complete, independent copy of our product and uses the same architecture as our production service. Besides production, we use several smaller-scale deployments for development, testing and staging. Each deployment consists of a number of clusters, and each cluster of one or more instances.

Instances and their corresponding EBS (Elastic Block Store) volumes are tagged with Deployment, Cluster and NodeNumber tags. As an example, the third frontend node of our production deployment would be tagged like so:

Deployment=prod
Cluster=frontend
NodeNumber=3

There is also a direct mapping to DNS names. The DNS name for this node would be prod-frontend-3.
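
The mapping between tags and DNS names can be expressed in a few lines of shell. This is a sketch with hypothetical function names, and the reverse direction assumes that deployment and cluster names contain no hyphens.

```shell
# Build the DNS name from the Deployment, Cluster and NodeNumber tag values.
dns_name_from_tags() {
    printf '%s-%s-%s\n' "$1" "$2" "$3"
}

# Reverse mapping: split a DNS name back into the three tags.
# Assumption: deployment and cluster names contain no hyphens.
tags_from_dns_name() {
    printf '%s\n' "$1" | {
        IFS=- read -r deployment cluster node
        printf 'Deployment=%s\nCluster=%s\nNodeNumber=%s\n' \
            "$deployment" "$cluster" "$node"
    }
}

dns_name_from_tags prod frontend 3   # prints prod-frontend-3
```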

Combined with the filtering features in AWS Console (you can make any tag a column in the resource listings), this makes it very easy to navigate to a particular set of resources.

2. Display Instance Status

Tags can also be used as an easy way to display status information in the AWS console. Simply update a tag with the current status, whenever it changes.

The code that deploys our instances into EC2 updates a DeployStatus tag whenever it progresses from one step to another. For example, it could read:

2012-05-10 17:53 Installing Cassandra

This allows you to see what’s going on with instances at a glance.
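
One possible way to write such a status tag is sketched below. It is not our deployment code: it assumes the AWS CLI is available, uses UTC timestamps (the time zone is an assumption), and the function names are hypothetical.

```shell
# Build a status value such as "2012-05-10 17:53 Installing Cassandra".
# Assumption: timestamps are in UTC.
deploy_status_value() {
    printf '%s %s\n' "$(date -u '+%Y-%m-%d %H:%M')" "$1"
}

# Write the status to an instance's DeployStatus tag.
# usage: set_deploy_status INSTANCE_ID "status message"
# Note: the message must not contain commas, which the CLI's
# shorthand tag syntax treats as separators.
set_deploy_status() {
    aws ec2 create-tags --resources "$1" \
        --tags "Key=DeployStatus,Value=$(deploy_status_value "$2")"
}
```

A deployment script would call `set_deploy_status` with its own instance ID at the start of each step.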

3. Remember EBS Volume Devices

For EC2 instances that have multiple EBS volumes, when they need to be attached, our tools need to know which volume gets mapped to which device on the instance.

When we first create a volume, for example for /dev/sdj, we add a DeviceName tag to the volume with a value of /dev/sdj to track where it needs to be attached. The next time we attach the volume, we know its “proper place”.
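
A sketch of this round trip using the AWS CLI is shown below. The function names are hypothetical, and this is only one way to do it, not the tooling described in this post.

```shell
# Record the device on the volume when it is first created.
# usage: remember_device VOLUME_ID DEVICE
remember_device() {
    aws ec2 create-tags --resources "$1" --tags "Key=DeviceName,Value=$2"
}

# Look up the device recorded in the volume's DeviceName tag.
# usage: recorded_device VOLUME_ID
recorded_device() {
    aws ec2 describe-tags \
        --filters "Name=resource-id,Values=$1" "Name=key,Values=DeviceName" \
        --query 'Tags[0].Value' --output text
}

# Re-attach a volume to an instance at its recorded device.
# usage: reattach_volume VOLUME_ID INSTANCE_ID
reattach_volume() {
    aws ec2 attach-volume --volume-id "$1" --instance-id "$2" \
        --device "$(recorded_device "$1")"
}
```

A production version would also want to handle the case where the tag is missing before attempting the attach.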

4. Attribute and remind people of costs

All our developers are empowered to create their own AWS resources. This is a huge benefit for full-scale testing, performance evaluations, and many other use cases. Since AWS is not a charity, however, we need to manage costs tightly. In order to do this, we tag all AWS resources with an Owner tag (either by hand, or via our automated deployment tool).

To consume this tag, we have a cron job that runs daily and emails users who have active resources in AWS to remind them to shut down what they no longer require.

The subject line of the email reads “[AWS] Your current burn rate is $725.91/month!”. The body of the email contains a table with a more detailed cost breakdown. In addition, there is also a rollup email that goes out to the entire development team.
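
The burn-rate figure in such a subject line could be derived roughly as follows. The input format, the 720-hour (30-day) month, and the function name are all assumptions for illustration; this is not the actual cron job.

```shell
# Input on stdin: one line per resource, "<owner> <hourly cost in dollars>".
# Output: one "<owner>: <subject line>" per owner, sorted by owner.
# Assumption: a month is counted as 720 hours (30 days).
burn_rate_subjects() {
    awk '{ hourly[$1] += $2 }
         END {
             for (owner in hourly)
                 printf "%s: [AWS] Your current burn rate is $%.2f/month!\n",
                        owner, hourly[owner] * 720
         }' | sort
}

printf 'alice 0.50\nalice 0.25\nbob 0.10\n' | burn_rate_subjects
# alice: [AWS] Your current burn rate is $540.00/month!
# bob: [AWS] Your current burn rate is $72.00/month!
```

The per-owner grouping corresponds to the Owner tag; a rollup for the whole team would be the sum over all owners.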

 

Summary

EC2 tags are extremely useful tools to track state, organize resources and store relationships between resources like instances and EBS volumes. There are a myriad more ways to use them. I hope these tips have been helpful.
