A Secure Cloud-Based Intranet

04.26.2012 | Posted by Joan Pepin, Director of Security

For the last two years, Sumo Logic has been (quietly) building a secure, massively scalable, multi-tenant data management and analytics platform in the cloud. For us at Sumo Logic, the Cloud is a concept we believe in and have internalized deeply into our culture, our processes and infrastructure.  In our office we only own two equipment racks, and they are less than half full. The boxes there are our back-up server, some security gear, a VOIP box, and a single small server to provide network and AAA services for the LAN. We have ‘dogfooded’ not only our own product here (we make extensive use of our own product for troubleshooting and operations, see Stefan Zier’s series “Sumo on Sumo”), but the entire idea of the cloud itself. From our email and build environment, through our CRM and our product itself, we live in the Cloud. Through adopting best practices and developing some of our own we operate there in a way that is designed to be secure, and I’d like to share some of the insights we’ve picked up along the way.

Of course, the “Cloud” is a nebulous term ;) and here at Sumo Logic we use several different types of cloud-based services, which mostly break down into two categories; SaaS and IaaS. On the SaaS side, we have our email and CRM, testing, support and billing and as well as a number of services we use to monitor and alert on our service availability, and on the IaaS side, we use AWS to host our build environment and its associated bug-tracker, wiki and code-repository. One of the many advantages to this model is exemplified by our build environment (Hudson). Hosting this in EC2 provides us with great flexibility in bringing up new build-slaves at peak times, such as before a major release or branch. 

In general, the SaaS providers we use provide excellent security features. For instance, we mandate the use of two-factor authentication and strong passwords for access to Sumo Logic email, and our provider has a rich variety of security controls and features such as the two-factor authentication that we can (and do) leverage. This is much simpler than keeping this level of security would be if we ran the whole mess ourselves :)

On the IaaS side, Stefan Zier has done an amazing job of setting us up in AWS. One example of how IaaS features can be leveraged is the way in which he handled access to our AWS hosted resources. In addition to username and password authentication to our cloud based services (more on that later) Amazon “security groups” are used to limit network-level access to these services to only certain IP addresses on a whitelist. In order to handle the automation of that whitelist, we make use of a dynamic DNS provider that assigns hostnames to authorized systems. Stefan wrote a program which polls for the addresses of those authorized hosts and updates the corresponding security group in AWS. We plan on getting this set up on AWS’ Virtual Private Cloud sometime soon, which will allow us to layer a VPN on top of this already very secure solution.

Another layer of protection we incorporate is anonymity. All of our cloud-based company infrastructure is attached to a domain which is not connected to Sumo Logic in any way. Similarly, we have used anonymized labels for our private git repositories, etc. The public cloud allows for some obscurity and anonymity, and we leverage that.

 Of course, there is still work involved in keeping things secure. Living in the cloud means having a lot of accounts. A LOT of them. Our process for on-boarding and off-boarding employees requires the creation or deletion of a very large number of accounts and the adding or removing of a lot of tags, groups, lists and checkboxes. Having solid documented procedures for this is the only way to keep it straight and running smoothly. We also have to host our own LDAP server for AAA to some of our tools, and we also use this for our VPN authentication. So we have to manage that, and it is a pain. Centralized AAA and policy/group management services exist for cloud-based services, and we’ve looked at some. Unfortunately, none of them also supported hosting or managing an LDAP instance for us, and keeping that synced up and tied-into the rest of the mess would be a killer feature. We certanly feel there is a gap in the market here that we wish somebody would fill.

From an end-user perspective, there are a lot of accounts to keep straight and a lot of passwords to remember. In order to make this both secure and manageable, we provide (and mandate the use of) a password management tool that runs both Mac and Windows (and has a useable web-interface for Linux and others) and also runs on Android and iPhone. It uses a cloud-based file-storage service to sync its encrypted password database between devices. This allows us to mandate that users have extremely strong passwords that are different for every account and it gives our users the tools to actually comply with that rule :)

 Of course, building a secure cloud-based service ourselves requires a lot of thought and engineering well beyond just leveraging our provider’s consoles. We have done a lot of thought about how to build a secure service leveraging IaaS and we have written a paper about some of the design principles and practices we employ. If you are interested you can download it here.

The Precursor Legacy

04.24.2012 | Posted by Christian Beedgen, Co-Founder & CTO

This past week has seen the long-awaited Splunk IPO turn into a reality. After nearly 10 years together at ArcSight, Kumar and I were along for the ride in 2008 when ArcSight went public. We know on a very deep level how hard it is for any company to reach this milestone. Our hats are off to Splunk for their precision in positioning and timing. The resulting positive reaction of the market is more than well deserved. Splunk is now the second public company that has bet the house on logs and unstructured data, and it clearly has managed to do something that ArcSight didn’t: to convince the world that logs are a powerful way to manage not just security, but also IT operations, and applications in general. After all, business has had its share of analytics tools. It’s time for IT to catch up — and we are now seeing this space having reached mainstream momentum and attention.

Another Song to Sing

As part of the press frenzy last week, a number of people have started to look into what’s next in this space. Big Data has many angles, and we firmly believe that logs and unstructured data are a huge part of it. Reuters published an overview along those lines. We also happened to have met with Jonah Kowall from Gartner last week. His thoughts can be found here. Both articles touch on our firmly held belief that evolution cannot and will not stop, and that in fact some of the biggest contributors to application, IT and security management problems contain the keys to tame and solve them.

Big River

It has long been established that the rate at which data is being produced is growing exponentially, and that almost all of that data is basically unstructured. Mapping this back to IT, it is clear that there will never be another unified and standardized set of protocols upon which to build the one and only management and analytics tool to rule them all. With the proliferation of deployment models in today’s highly heterogeneous environments, IT has to adapt to business needs in real-time. To accomplish this, the best and most detailed inputs are the operational logs generated in real-time by the IT infrastructure.

If I Had a Hammer

The key for the next generation of IT analytics products is to understand that any and all data must be considered as grist for the analytics mill. Relying on having to know the semantics of the data by requiring a pre-fabricated parser in order to use the data translates to keeping the door shut for some of the most detailed data. Going up the stack to the application layer, this is even more true. In order to provide more than just troubleshooting capabilities, even data that has never before been seen needs to be an input into the analytics engine. Meaningful aggregation and comprehension can be based on automatically inferring structure, and large-scale refereed structure inference will in turn lead to better semantic understanding of the data.

(There’ll be) Peace in the Valley

Ultimately, the power of any analytics is based on how much we know about the meaning of the data. Otherwise, the data is just that – data. Analytics turn data into information, and ultimately insight. We believe that the best way to accomplish this is by offering application, IT, and security management and analytics as a cloud-based service that can use the power of all the data to constantly improve analytics. Enterprises should embrace Big Data, and ask for analytics as a service, rather than trying to locally reinvent the wheel over and over again.

RAFC: Internet Connectivity for the cloud age

04.17.2012 | Posted by Stefan Zier, Cloud Infrastructure Architect

Sumo Logic's Mountain View Data Center

RAFC - Redundant Array of Flaky Connections

We are big believers in Cloud Computing — 100% of our own infrastructure is in the cloud. In our first office, we learned that reliable and fast internet connectivity is absolutely crucial. When all your infrastructure is in the cloud, all work grinds to a screeching halt whenever connectivity is lost. In that office, we had a single, “business class” symmetric 10MBit link. In short, it sucked.

When we moved to our new 605 Castro Street office last year, we decided to try a different approach. We took design cues from web-scale applications: Pool commodity resources. Distribute load over the pool of resources. Anticipate failures. Scale horizontally. In concrete terms: 

  • Set up multiple consumer grade internet connections.
  • Buy a router that supports multiple-WAN load balancing and failover.
  • Add more consumer grade internet connections when more bandwidth is needed.
So instead of a single symmetric 10 MBit link, we ordered:
  • 100MBit/10MBit cable modem connection (Comcast).
  • 25MBit/5MBit bonded DSL connection (Sonic.net). 

We call it ”RAFC” - or “Redundant Array of Flaky Connections”. Combined, these two connections cost around $530/mo (with free Cable TV!), or about 65% less than our previous connection, which ran $1,500/mo. Instead of 10/10MBit, we now have 125/15MBit. 

The trickiest part was finding the multi-WAN router we liked. After trying a FortiNet Fortigate box, a Cisco ASA, a Netgear “business class” box, we settled a unit made by a company called Peplink.

Peplink’s entire business is built around doing multi-WAN routers right, and it shows: the box is impressive. It’s very easy to set up, supports a rich set of features, doesn’t crash, has great monitoring capabilities (including syslog, which we feed into Sumo Logic). The Peplink still also has a 3rd WAN port for future growth — horizontal scalability. When we discovered that one of our connections is more reliable, while the other one was faster, we adjusted outbound rules on the Peplink accordingly. SSH connections use reliable connection, S3 transfers use fast one. Most other traffic is load balanced in proportion to the uplink/downlink available on each connection. When one connection fails, all traffic fails over to the other one. This took about 5 minutes to set up. 

At this point, this setup supports more than 30 users in our office, and while we have connection outages almost daily, nobody notices. Connectivity has not been an issue in months. 

Do you want to buy a Big Data zoo?

04.12.2012 | Posted by Kumar Saurabh, Co-Founder & VP of Engineering

How exciting can a discussion at 5PM on a Friday be? Very exciting, in fact, if you are talking to industry analyst Vanessa Alvarez (@vanessaalvarez1) from Forrester about Big Data.

Last Friday it turned out that we had tons to talk about together regarding recent developments in the Big Data space. Vanessa has a unique take on Big Data — she thinks “Analytics as a Service” is going to gain a lot of traction soon. And that line of thinking resonates with us a lot.

At Sumo Logic you’ll hear us using terms like Cloud, SaaS, elastic scalability… but the most exciting angle for us has always been the *aaS angle, the fact that our solution is a service. We believe that log analytics should be easy to use, and by lowering the effort it takes to perform log analytics, we can make this kind of technology much more widely accessible. A “Log Analytics as a Service” solution aims to do just that — shorten and democratize the path from data to insights.

So, the real question is not if you are Mac or PC — but rather are you a Mac or Linux guy — when it comes to log management.  The choice is — do you really want to build and tweak and operate and maintain your log management system (the big data zoo in other words), or do you just need a solution that delivers log analytics in the most efficient way possible.

We still find a lot of prospects who think that they need to roll out their own log management system using a lot of new stacks (Hadoop, Cassandra, Solr, Hive…). We use similar technologies under the hood at Sumo, but we handle all the operational overhead that comes with this, and we certainly don’t shy away from fixing and optimizing pieces that don’t work, or don’t deliver the performance we need to deliver.

So, if you do not have extremely specialized requirements, is it worth rolling out your own log management systems? Is it worth all the operational overhead? Or would you rather use a service? Curious to hear your thoughts, please feel free to share your thoughts in comments, or shoot me an email at kumar@sumologic.com

It’s a culture thing – Devopsdays Austin 2012

04.10.2012 | Posted by Christian Beedgen, Co-Founder & CTO

Stefan and I attended Devopsdays last week in Austin. It was a great event, and I am really glad we went — it’s always fun to be able to present your company to the public. We are very comfortable with the development and operations crowd, because it is largely at the core of what we are doing ourselves. There’s not a whole lot of abstractions to overcome! Sumo Logic sponsored the event, and so we had a little table set up in the “vendor” area. There, as well as throughout the conference, we had many interesting discussions, about our product, but also about the larger theme of the conference.

We gave away a lot of T-Shirts, and it turns out that the little Sumo toys we had initially made for the company birthday two weeks ago are a great giveaway. This is the first time we came equipped with swag, and it came across well. As topical as Log Analytics and Application Management are for the crowd attending, it’s still fun to see them all smile at little toys of big naked men!

Maybe my single most favorite moment of the entire conference was when the discussion turned to hiring. We are still struggling with a recovering economy and uncomfortably high unemployment numbers in this country, so it was notable that when the room was asked who’s hiring, pretty much all hands went up. Wow. Then somebody yelled out, “Hey, who needs a job”? And all hands went down. Not a single person in the room was looking for a job. In the words of @wickett on Twitter: “No recession in DevOps world”.

One of the things I personally find fascinating is to observe the formation of trends, communities, maybe even cultures. It is not often that one has the luck to be around when something new is getting born. I was personally lucky to be, albeit somewhat from afar, observing the early days of the Ruby On Rails community, having attended the first conference in Chicago (and then some more in the following years). Rails never really mattered in my day job, and I ultimately was just a bystander. But even so, seeing the thought process in the community evolve was extremely interesting. I feel a little bit similar about the Devops development (pun!!). I actually was attending that mythical gathering in Mountain View in 2010. But at the time, I was more worried about getting Sumo Logic company off the ground, so I actually didn’t pay attention :)

I was trying to listen in a bit more closely this time. A good overall summary of where Devops has come from — and what its main motivational forces are today — is available in a recent post by John Willis. John also presented the keynote kicking off the Austin event. This was a very interesting talk, as it was laying out the basic principles behind Devops as seen through the eyes of one of the main players in the movement.

Based on the keynote, here’s Devops in 5 keywords (buzzwords?): Culture – Lean – Automation – Measurement – Sharing. In that order. This leads to the following insight: Devops is a human problem — it’s a problem of culture, and it’s the cultural aspects that need to be addressed first, before even thinking about the other four principles. In other words, as great as tools such as Puppet, Chef, Github, and yes, Sumo Logic are, they can’t in themselves change a culture that is based on segregation. Or, simply put: as long as you have (process and cultural) walls between development and operations, operations and security, and security and development, you end up with people that say No. And that’s basically the end of agility.

And this leads to something that surprised me (I guess I am a bit late to the party, but hey): I am sensing that Devops is really about the desire on the side of the operations folks to apply the learnings of Agile Development. I consider this as a good thing. We are building more and more software that runs as a service, and so it’s pretty obvious that Agile needs to extend from the construction process into the deployment process (and along the way destroy the distinction). I do think that the Agile approach has won in the development world. It still needs to be applied properly however (see for example “Flaccid Scrum”), and I am sure overeager managers will cause more than one spectacular failure for Devops projects by misunderstanding the process/tools vs culture priorities. And since we are in 2012, Agile rears its head in one of its newer incarnations in this context: Lean – see above, right after Culture. Given that the name “Devops” is still hotly discussed, maybe we will end up with a new label before too long: LeanOps, anyone?

It was also great to see teams within larger companies making the leap – the best example is National Instruments (also the host of the event), who have managed to get more agile by adopting a Devops approach (see also this presentation). So in summary, this event was great fun. A lot of real people with real problems, applying real forward thinking. I felt the crowd was leaning more towards Ops vs Dev, but as I said above, at least in the context of the systems we are building here at Sumo Logic, this distinction has long been jettisoned.

And of course, people need tools. In all our discussions, the ability to manage and analyze the logs of production systems has stood out as a key contributor in allowing teams to troubleshoot and find the root causes of issues in the applications faster, and to manage their applications and infrastructure more proactively such that they can find and fix problems before they impact the customer.

Finally, in an act of shameless self promotion, here’s yours truly being interviewed by Barton George from Dell during the event.

Twitter