08.20.2014 | Posted by Sanjay Sarathy, CMO
Not often have I spent two weeks in August in a “winter” climate, but it was a great opportunity to spend some time with our new team in Australia, visit with prospects, customers and partners, and attend a couple of Amazon Web Services Summits to boot.
Here are some straight-off-the-plane observations.
A Local “Data Center” Presence Matters: We now have production instances in Sydney, Dublin and the United States. In conversations with Australian enterprises and government entities, the fact that we have both a local team and a local production instance carried significant weight when they evaluated whether we were a good match for their needs. This was true whether their use case centered on supporting their security initiatives or enabling their DevOps teams to release applications to market faster. You can now select where your data resides when you sign up for Sumo Logic Free.
Australia is Ready For the Cloud: From the smallest startup to extremely large mining companies, everyone was interested in how we could support their cloud initiatives. The AWS Summits were packed, and the conversations we had revolved not just around machine data analytics but what we could do to support their evolving infrastructure strategy. The fact that we have apps for Amazon S3, CloudFront, CloudTrail and ELB made the conversations even more productive, and we’ve seen significant interest in our special trial for AWS customers.
We’re A Natural Fit for Managed Service Providers: As a multi-tenant service born in the Cloud, we have a slew of advantages for MSPs and MSSPs looking to embed proactive analytics into their service offering, as our work with The Herjavec Group and Medidata shows. We’ve had success with multiple partners in the US, and the many discussions we had in Australia indicate that there’s a very interesting partner opportunity there as well.
Analytics and Time to Insights: In my conversations with dozens of people at the two summits and in one-on-one meetings, two trends stood out. First, while people remain extremely interested in how they can take advantage of real-time dashboards and alerts, one of their bigger concerns is how quickly they can get to that point. “I don’t have time to do a lot of infrastructure management” was the common refrain, and we certainly empathize. Second, we sometimes take our pattern recognition technology, LogReduce, for granted. Having shown it to quite a few people at the booth, the reaction on their faces never gets old, especially once they see the order of magnitude by which we reduce the time it takes to find something interesting in their machine data.
At the end of the day, this is a people business. We have a great team in Australia and look forward to publicizing their many successes over the coming quarters.
08.18.2014 | Posted by Russell
The code for this post, as well as the post itself, is on GitHub.
Until recently, regular expressions seemed magical to me. I never understood how you could determine if a string matched a given regular expression. Now I know! Here’s how I implemented a basic regular expression engine in under 200 lines of code.
Implementing full regular expressions is rather cumbersome, and worse, doesn’t teach you much. The version we’ll implement is just enough to learn without being tedious. Our regular expression language will support:
.: Match any character
+: Match one or more of the previous pattern
*: Match 0 or more of the previous pattern
|: Match either the pattern on the left or the pattern on the right
( and ): Group a pattern
While this is a small set of options, we’ll still be able to make some cute regexes, like m (t|n| ) | b to match Star Wars subtitles without matching Star Trek ones, or (..)* to match the set of all even length strings.
We’ll evaluate regular expressions in 3 phases:
1. Parse the regular expression into a syntax tree
2. Convert the syntax tree into a state machine
3. Evaluate the state machine against our string
We’ll use a state machine called an NFA to evaluate regular expressions (more on that later). At a high level, the NFA will represent our regex. As we consume inputs, we’ll move from state to state in the NFA. If we get to a point where we can’t follow an allowed transition, the regular expression doesn’t match the string.
This approach was originally demonstrated by Ken Thompson, one of the original authors of Unix. In his 1968 CACM paper he outlines the implementation of a text editor and includes this strategy as a regular expression evaluator. The only difference is that his article is written in 7094 machine code. Things used to be way more hard core.
This algorithm is a simplification of how non-backtracking engines like RE2 evaluate regular expressions in provably linear time. It’s notably different from the regex engines found in Python and Java, which use backtracking; when given certain inputs, they’ll run virtually forever on small strings. Ours will run in O(length(input) * length(expression)).
My approach roughly follows the strategy Russ Cox outlines in his excellent blog post.
Let’s step back and think about how to represent a regular expression. Before we can hope to evaluate a regular expression, we need to convert it into a data structure the computer can operate on. While strings have a linear structure, regular expressions have a natural hierarchy.
Let’s consider the string abc|(c|(de)). If you were to leave it as a string, you’d have to backtrack and jump around as you tried to keep track of the different sets of parentheses while evaluating the expression. One solution is converting it to a tree, which a computer can easily traverse. For example, b+a would become a tree with concatenation at the root, a + node (wrapping b) on the left, and the literal a on the right.
To represent the tree, we’ll want to create a hierarchy of classes. For example, our Or class will need two subtrees to represent its two sides. From the spec, there are 4 different regular expression components we’ll need to recognize: +, *, |, and character literals like b. In addition, we’ll also need to be able to represent when one expression follows another. Here are our classes:
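A sketch of those classes (reconstructed to line up with the parser code later in the post; the exact field names are an assumption):

```scala
abstract class RegexExpr

// ., a, b: a single character
case class Literal(c: Char) extends RegexExpr

// a|b: match either side
case class Or(expr1: RegexExpr, expr2: RegexExpr) extends RegexExpr

// ab -> Concat(a, b); abc -> Concat(a, Concat(b, c))
case class Concat(first: RegexExpr, second: RegexExpr) extends RegexExpr

// a*: zero or more
case class Repeat(expr: RegexExpr) extends RegexExpr

// a+: one or more
case class Plus(expr: RegexExpr) extends RegexExpr
```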
To get from a string to a tree representation, we’ll use a conversion process known as “parsing.” I’m not going to talk about building parsers in much detail; rather, I’ll give just enough to point you in the right direction if you want to write your own. Here, I’ll give an overview of how I parsed regular expressions with Scala’s parser combinator library.
Scala’s parser library will let us write a parser just by writing the set of rules that describe our language. Unfortunately, it uses a lot of obtuse symbols, but I hope you’ll be able to look through the noise to see the gist of what’s happening.
When implementing a parser we need to consider order of operations. Just as “PEMDAS” applies to arithmetic, a different set of rules applies to regular expressions. We can express this more formally with the idea of an operator “binding” to the characters near it. Different operators “bind” with different strengths — as an analogy, * binds more tightly than + in expressions like 5+6*4. In regular expressions, * binds more tightly than |. If we were to represent parsing as a tree, the weakest operators end up on top.
It follows that we should parse the weakest operators first, followed by the stronger operators. When parsing, you can imagine it as extracting an operator, adding it to the tree, then recursing on the remaining 2 parts of the string.
In regular expressions, the order of binding strength is:
1. Character literals & parentheses
2. + and *
3. “Concatenation” — a is after b
4. |
Since we have 4 levels of binding strength, we need 4 different types of expressions. We named them (somewhat arbitrarily): lit (character literals and parenthesized expressions), lowExpr (+ and *), midExpr (concatenation), and highExpr (|). Let’s jump into the code. First we’ll make a parser for the most basic level, a single character:
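A minimal sketch, assuming all of the parser definitions in this post live inside a single object extending RegexParsers from Scala’s parser combinator library:

```scala
import scala.util.parsing.combinator.RegexParsers

object RegexParser extends RegexParsers {
  def charLit: Parser[RegexExpr] = ("""\w""".r | ".") ^^ {
    char => Literal(char.head)
  }

  // ... the parsers defined below live here as well ...
}
```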
Let’s take a moment to explain the syntax. This defines a parser that will build a RegexExpr. The right hand side says: “Find something that matches \w (any word character) or a period. If you do, turn it into a Literal.”
Parentheses must be defined at the lowest level of the parser since they are the strongest binding. However, you need to be able to put anything in parentheses. We can accomplish this with the following code:
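A sketch; since these are defs, the forward reference to highExpr (defined at the end) is fine:

```scala
// a parenthesized sub-expression can contain any expression at all
def parenExpr: Parser[RegexExpr] = "(" ~> highExpr <~ ")"

// the strongest-binding level: a character literal or a parenthesized group
def lit: Parser[RegexExpr] = charLit | parenExpr
```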
Here, we’ll define repeat and plus, along with lowExpr, the level of the grammar that contains them:
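Again a sketch, building Repeat and Plus nodes from the tree classes above:

```scala
// "x*": zero or more of the preceding literal or group
def repeat: Parser[RegexExpr] =
  (lit <~ "*") ^^ { l => Repeat(l) }

// "x+": one or more of the preceding literal or group
def plus: Parser[RegexExpr] =
  (lit <~ "+") ^^ { p => Plus(p) }

// this level is a repeat, a plus, or just a bare literal
def lowExpr: Parser[RegexExpr] = repeat | plus | lit
```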
Next, we’ll define concatenation, the next level up:
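A sketch of that level; a midExpr is either a lowExpr followed by more concatenated expressions, or a lone lowExpr:

```scala
def midExpr: Parser[RegexExpr] =
  lowExpr ~ midExpr ^^ { case l ~ m => Concat(l, m) } |
  lowExpr
```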
Finally, we’ll define or:
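A sketch; the "|" in the pattern match just discards the literal pipe character:

```scala
def or: Parser[RegexExpr] =
  midExpr ~ "|" ~ highExpr ^^ { case l ~ "|" ~ r => Or(l, r) }
```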
Lastly, we’ll define highExpr. A highExpr is an or, the weakest binding operator, or, if there isn’t an or, a midExpr.
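Translating that sentence directly into code:

```scala
def highExpr: Parser[RegexExpr] = or | midExpr
```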
Finally, a touch of helper code to finish it off:
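A sketch that lives inside the same RegexParser object; the exact error handling is an assumption:

```scala
def apply(input: String): RegexExpr = {
  parseAll(highExpr, input) match {
    case Success(result, _) => result
    case failure: NoSuccess => scala.sys.error(failure.msg)
  }
}
```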
That’s it! If you take this Scala code you’ll be able to generate parse trees for any regular expression that meets the spec. The resulting data structures will be trees.
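For instance, with the sketch above (output hand-derived, so treat it as illustrative):

```scala
RegexParser("a*|bc")
// => Or(Repeat(Literal(a)), Concat(Literal(b), Literal(c)))

RegexParser("(..)*")
// => Repeat(Concat(Literal(.), Literal(.)))
```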
08.13.2014 | Posted by Garrett Nano
Software systems today contain hundreds of thousands to millions of lines of code, written by anywhere from a few developers at a startup to thousands at today’s software giants. Working with large amounts of code and many developers results in overlapping usage and modification of the APIs the developers rely on. With this comes the danger of a small change breaking large amounts of code. This raises the question of how we as developers, working on projects of this scale, can ensure that the code we write not only works within the context in which we are working, but also doesn’t cause bugs in other parts of the system. (Getting it to compile always seems to be the easy part!)
Here at Sumo Logic we currently have over 60 developers working on a project with over 700k lines of code1 split up into over 150 different modules2. This means that we have to be mindful of the effects that our changes introduce to a module and the effect that the changed module has on other modules. This gives us two options. First, we can try really really hard to find and closely examine all of the places in the code base that our changes affect to make sure that we didn’t break anything. Second, we can write tests for every new functionality of our code. We prefer the second option, not only because option one sounds painful and error prone, but because option two uses developers’ time more efficiently. Why should we have to waste our time checking all the edge cases by hand every time that we make a change to the project?
For this reason, at Sumo Logic we work by using the methods of test driven development. (For more on this see test driven development.) First we plan out what the end functionality of our changes should be for the system, and we write both unit and integration tests for the new functionality. Unit tests offer specific tests of edge cases and core functionality of the code change, while integration tests exercise end-to-end functionality. Since we write the tests before we write the new code, the updated tests will fail the first time we run them. But this is actually what we want! Now that we have intentionally written tests that our code will fail without the addition of our new functionality, we know that if we can pass all of our new tests, as well as all of the pre-existing tests written by other developers, we have succeeded with our code change. The benefits of test driven development are two-fold: we ensure that our new functionality works correctly, and we maintain the previous functionality. Test driven development also excels at refactoring. When it becomes necessary to refactor our code, we can do so with ease, knowing that the large suites of tests we wrote during the initial development can tell us whether we succeeded. Test driven development calls this red-green-refactor, where red means failing tests and green means passing tests. Rejoice: with well-written tests we can write new code and refactor our old code with confidence that our finished work will continue to function without introducing bugs into the system.
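As a tiny, hypothetical illustration of the red phase (ScalaTest-style; RateLimiter doesn’t exist yet, which is exactly the point of writing the test first):

```scala
import org.scalatest.FunSuite

// Written *before* RateLimiter is implemented: this test stays "red"
// until the new functionality actually exists.
class RateLimiterTest extends FunSuite {
  test("rejects requests beyond maxPerSecond") {
    val limiter = new RateLimiter(maxPerSecond = 2)
    assert(limiter.tryAcquire())   // first request allowed
    assert(limiter.tryAcquire())   // second request allowed
    assert(!limiter.tryAcquire())  // third request in the same second rejected
  }
}
```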
Despite all of this testing, it is still possible for bugs to slip through the cracks. To combat these bugs, here at Sumo Logic we have multiple testing environments for our product before it is deemed to be of a high enough standard to be released to our customers. We have four different deployments: three for testing and one for production. (For more on this see deployment infrastructure and practices.) Our QA team performs both manual and additional automated testing on these deployments, including web browser automation tests. Since we need a sizable amount of data to test at scale, we route log files from our production deployment into our pre-production environments, making us one of our own biggest customers! The idea behind this process is that by the time a build passes all of the unit/integration tests and makes it through testing on our three non-production environments, all of the bugs will have been squashed, allowing us to provide our customers with a stable, high-performing product.
1 ( find ./ -name '*.scala' -print0 | xargs -0 cat ) | wc -l
2 ls -l | wc -l
07.28.2014 | Posted by Dwayne Hoover, Senior Sales Engineer
Collecting log data from Amazon RDS instances can be done through a hosted HTTP collector. There is some configuration required to make this happen, but once the foundation is built, this can be a seamless integration from RDS to Sumo Logic.
Install the AWS RDS Command Line Tools and Configure Access:
This tutorial was performed on a Linux-based EC2 machine; for detailed instructions on Windows, please refer to the documentation in the link above.
Obtain the command line tools
Copy the zip file to the desired installation path and unzip
Set up the following environment variables (these might look different on your system; refer to the documentation for additional detail):
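For example (the install path and Java location here are illustrative):

```bash
export AWS_RDS_HOME=/opt/RDSCli    # wherever you unzipped the tools
export JAVA_HOME=/usr/lib/jvm/jre  # wherever your JRE lives
export PATH=$PATH:$AWS_RDS_HOME/bin
```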
Set up the proper credentials for RDS access by entering access keys here:
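One way to do this is the credential file that ships with the tools (the key values are placeholders for your own):

```bash
# Edit the template so it contains your keys
# (AWSAccessKeyId=... and AWSSecretKey=...), then point the tools at it:
export AWS_CREDENTIAL_FILE=$AWS_RDS_HOME/credential-file-path.template
```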
For detailed instructions for RDS access, please see (Providing Credentials for the Tools): http://docs.aws.amazon.com/AmazonRDS/latest/CommandLineReference/StartCLI.html
You must also be sure that the user account interacting with RDS has the proper permissions configured in IAM: http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/UsingWithRDS.IAM.html
Verify by issuing the following command:
$ rds-describe-db-log-files <rds instance name here>
If a list of the available log files is returned, you are ready to push the data into Sumo Logic.
Set Up a Sumo Logic Hosted HTTP Collector and Source:
Log in to Sumo Logic and select Add Collector
Choose Hosted Collector, name it, and select OK when asked if you would like to add a data source.
Give the source a name and fill out relevant metadata. Also configure the options for timestamp parsing and multi-line settings.
Upon saving the new source, you will be provided with a unique URL. This is the endpoint to which you will push the AWS RDS logs.
Collecting Logs from RDS and Pushing them to Sumo Logic:
To list available log files for your RDS instance, issue the following command:
$ rds-describe-db-log-files <db instance name>
You can limit the list by date last written as follows (note: --file-last-written takes a POSIX timestamp in milliseconds):
$ rds-describe-db-log-files <db instance name> --file-last-written 1395341819000
You can manually push logs to your newly configured HTTP endpoint using curl. In the following example, we pull one log file and push it to Sumo Logic:
$ rds-download-db-logfile orasumo --log-file-name trace\/alert_ORASUMO.log | curl -X POST -d @- https://collectors.sumologic.com/receiver/v1/http/redactedKEY
Note: the forward slash in the file name is escaped with a backslash, and the output of rds-download-db-logfile is piped into a curl command that posts the data to Sumo Logic.
Luckily, the RDS command line tools provide an option to continuously monitor log files for activity. To use this feature for an HTTP push, you can do the following:
$ rds-watch-db-logfile sumopostgres --log-file-name error\/postgres.log | ./watch-rds.sh
Note that we are piping the output into a shell script. The contents of our sample script can be seen below:
URL="https://collectors.sumologic.com/receiver/v1/http/<unique URL string>"
while read data;
curl --data "$data" $URL
This script will run until cancelled, so it is best to launch it in the background/nohup.
$ nohup sh -c 'rds-watch-db-logfile <your db instance name> --log-file-name <your db log file name> | ./watch-rds.sh' &
Installed Collector Alternative:
If you already have a Sumo Logic collector installed and can access your RDS logs from the command line utilities, simply piping the results from above to a local file and sending the log messages via the collector will also work.
$ rds-watch-db-logfile sumopostgres --log-file-name error\/postgres.log > /path/to/localfile.log
Here, /path/to/localfile.log is configured as a local file source for the installed collector.
This article originally appeared on DwayneHoover.com
06.25.2014 | Posted by Johnathan Hodge
As we release the Sumo Logic App for PCI Compliance, I’ve been reflecting on how tough PCI compliance is. It’s obviously an essential part of the IT strategy for any organization that handles credit cardholder information, but monitoring compliance across all the requirements is a big undertaking. And a mistake can have disastrous results.
Because of this, I really like the new guidance in v3 of the PCI DSS, released in November 2013, in the new section called “Implementing PCI DSS into Business-as-Usual Processes”. Getting there requires, amongst other things, excellence in monitoring, detection, timely root-cause analysis and well-designed remediation.
Now, the new PCI App from Sumo Logic obviously supports these things. With a broad array of dashboards, reports and searches specifically designed to monitor and detect potential issues across the 12 requirements, hidden within the terabytes of log files that many customers have, our PCI App is strong. But so what? Highlighting that there is an issue is close to useless unless you provide the tools to take effective action in diagnosing root cause – making change happen. And as industry experts remind us, no matter what “us vendors” say, there are always false positives that need to be examined and can get in the way of the underlying issues. We’ve all used analysis tools that highlight an issue but then make it nearly impossible to take that critical step to true root cause identification. There’s little more frustrating than hitting that “So what?” moment.
What makes our PCI App exceptional and different is the fact it’s based on the Sumo Logic platform. Once we alert you to a potential failure, it’s simple to identify which Requirement to focus on, and from there to drill into the details. Our unique features, such as Anomaly Detection and LogReduce, make finding the needles in the remaining haystacks painless – and quick.
So what? So, Sumo Logic’s PCI App will not simply highlight potential PCI infractions, it will dramatically reduce the time to root cause analysis – leaving you no time to even consider the “So what?” question. You will be too busy putting new measures in place to prevent the cause of the failure recurring.
06.20.2014 | Posted by Jana Lass
The Internet of Things. The popularity of this topic seems to be growing just about as rapidly as the amount of machine data generated by these “things” has. A few weeks back, our beloved co-founder and CTO, Christian Beedgen, penned an article, The Internet of Things: More Connectivity Can Mean More Vulnerability, where he discusses some of the security challenges posed by the exponentially growing number of devices connected to the internet.
Most major tech companies have written about or published infographics on IoT, yet when I read these articles or review these infographics, they all seem to be missing a very obvious fact about the Internet… The Internet is made up of cats. To highlight what’s truly representative of our interconnectedness, we present you with an infographic on “The Internet of Cats.” Enjoy!
06.16.2014 | Posted by Amanda Saso, Sr. Tech Writer
I like to fashion myself as a lower-level Cloud evangelist. I’m amazed at the opportunities the Cloud has afforded me both professionally and personally in the past four or five years. I tend to run head-first into any Cloud solution that promises to make my life better, and I’m constantly advocating Cloud adoption to my friends and family.
The consumer-level Cloud services that have developed over the past few years have changed how I relate to technology. Just like everyone else, I struggle with balancing work, mom duties, volunteer activities, and so on. Being able to keep my data handy simplifies my life–having records in the Cloud has saved me in several situations where I could just call up a document on my iPhone or iPad. No matter which Cloud app I’m using, I’m in the loop if I’m sitting at work or watching my kids at gymnastics (so long as I remember to charge my phone–there’s that darn single point of failure).
I respect Sumo for being a Cloud company that behaves like a Cloud company. We might have one physical server rattling around in an otherwise-empty server room, but I don’t know the name of it–I don’t ever need to access it. We run in the Cloud, we scale in the Cloud, we live in the Cloud. To me, that gives Sumo Logic an uncommon brand of Cloud legitimacy.
So what does all this have to do with Sumo Logic’s Help system? I started making noise about moving our online Help into the Cloud because I wanted the ability to dynamically update Help. At the time, my lovingly written files were somewhat brutally checked into the code, meaning that my schedule was tied to the engineering upgrade schedule. That worked for a while, but as we trended towards continuous delivery of our product, it stopped scaling. I knew there had to be a better way, so I looked to the Cloud.
My sense of urgency wasn’t shared by everyone, so I made a fool of myself at a Hack-a-Thon, attempting to make it happen. It was an epic failure, but a great learning experience for me. Knowing that I could spin up an instance of whatever kind of server my little heart desired was a game changer–what was once something that required capital expense (buying a Linux box or a Windows Server) was now available with a few clicks at minimal cost.
Within a month or so, I had convinced my manager of the legitimacy of my project. Eventually our Architect, Stefan Zier, took pity on me. He set up an S3 Bucket in AWS (Sumo runs in AWS, so this is a natural choice), then configured our test and production deployments to point to the URL I chose for our Help system. The last bit of engineering magic was leveraging an internal engineering tool that I use to update the URL for one or more deployments. Within a few days it worked. I now can push updates to Help from my own little S3 Bucket whenever I like. That is some awesome agility.
To those who are not tech writers, this may seem unremarkable, but I don’t know any other organizations with Cloud-based tech pubs delivery systems. I couldn’t find any ideas online when I was trying to do this myself. No blog posts, no tools. It was uncharted. This challenge really lit a fire under me–I couldn’t figure out why nobody seemed to be delivering Help from the Cloud.
The Cloud also improves the quality of my work, and grants me new options. Using an S3 Bucket means that I can potentially set up different Help systems for features that are only accessed by a subset of customers. I can take down anything that contains errors–which very, very rarely happens (yeah, right). I can take feedback from our Support team, Project Managers, Customer Success Team, Sales Engineers, and even from guys sitting around me who mumble about things that are missing when they try to write complicated queries. (Yes, our engineers learn about Sumo Logic operators using the very same Help system as our customers.)
Here’s the best part. As our team of tech writers grows (it’s doubled to two in 2014!), I don’t need an IT guy to configure anything; my solution scales gracefully. The authoring tool we use, MadCap Flare, outputs our Help in HTML5, meaning that the writers don’t need any IT or admin support converting files, nor hosting them in a specific way. (Incidentally, when you check out our Help, everything you see was customized with the tools in Flare, using, of all things, a mobile help template.) Flare has earned a special place in my heart because my deliverables were ready for Cloud deployment; no changes in my process were needed. There are no wasted resources on tasks that the writers are perfectly capable of performing, from generating output to posting new files. That’s the great part about the Cloud. I can do myself what it would take an IT guy to handle using any on-premise server solution.
Funny, that sounds just like Sumo Logic’s product: Instead of wasting time racking servers, people can do their job right out of the gate. That’s value added. That’s the Cloud.
05.27.2014 | Posted by Sanjay Sarathy, CMO
As our growth has accelerated over the past few quarters, we’ve gained additional insights into what customers care about and why they choose us for machine data analytics. In addition, our integrations and partnerships with Akamai, Amazon Web Services and ServiceNow have provided even more context around what customers investing in cloud services want and need. I thought it would be instructive to share one perspective on what we’ve learned.
- Our cloud-native strategy is an asset, not just for traditional TCO and elasticity reasons, but because running a high-volume, cloud-based log management service that automatically detects patterns and anomalies is prohibitively expensive for customers choosing an on-premise alternative. It goes back to a central point that many customers bring up with us – “we want to be users of the system, not administrators of it.”
- Our customers really care about Service Level Agreements. Traditional SLAs focus on uptime/availability. This is essential, but not always sufficient. We’ve found that as a cloud provider in this space it’s also necessary to provide an SLA for query performance. Why? It’s quite simple. Query performance is essential to delivering on the promise of time-to-value, not just around initial setup, but also around ongoing operations.
- My colleagues have previously discussed the rationale behind LogReduce and Anomaly Detection. One of the tenets of our product strategy is that the rate of growth of machine data has far outpaced the ability of human-written rules to capture all the insights in your logs. We thus need to combine machine learning with human knowledge to uncover both known and unknown events in machine data. This combination, and the reason we invest so much in data science, is the underpinning of our analytics strategy.
- Log data is “inherently” chatty and volumes spike when issues arise or seasonality goes beyond the norm. It’s during these periods that the need to instantly burst capacity to meet customer demand is critical. An on-premise environment cannot by definition get this done without having expensive spare capacity sitting around, a situation most organizations don’t typically provision for. It’s why we’ve incorporated elastic bursting to over 5x of your regular volume as part of our regular service.
These and other differentiators are a significant reason why we’ve grown by 500% over the past year. We decided to take these differentiators and our other capabilities and make this part of our website. Enjoy the read and understand where we’re focusing our R&D efforts to create a valuable machine data analytics service.
05.20.2014 | Posted by Vance Loiselle, CEO
I originally envisioned this blog as a way to discuss our recent $30 million funding, led by our latest investor, Sequoia Capital, with full participation from Greylock, Sutter Hill and Accel. I’ve been incredibly impressed with the whole Sequoia team and look forward to our partnership with Pat. Yet despite 300 enterprise customers (9 of the Global 500), lots of recent success against our large competitor, Splunk, and other interesting momentum metrics, I’d rather talk about the ride and lessons learned from my first two years as a CEO.
- It’s Lonely. Accept It and Move On. My mentor, former boss and CEO of my previous company told me this, years ago. But at the time, it applied to him and not me (in hindsight I realize I did not offer much help). But, like being a first time parent, you really can’t fathom it until you face it yourself. I’m sure there’s some psychology about how certain people deal with it and others don’t. I’m constantly thinking about the implications of tactical and strategic decisions. I’ve learned that if you’re too comfortable, you’re not pushing hard enough. The best advice I can give is to find a Board member you can trust, and use him or her as a sounding board early and often.
- Trust Your Gut. There have been many occasions when I have been given good advice on key decisions. One problem with good advice is that you can get too much of it, and it isn’t always aligned. The best leader I ever met, and another long-time mentor, would always ask, ‘what is your gut telling you?’ More often than not, your gut is right. The nice thing about following your instincts is that the only person to blame if it goes awry is yourself.
- Act Like It’s Your Money. I grew up in Maine, where $100,000 can still buy a pretty nice house. When I first moved to California from Boston it took me some time to get accustomed to the labor costs and other expenses. The mentality in most of the top startups in Silicon Valley is “don’t worry, you can always raise OPM (other people’s money)”. Though I understand the need to invest ahead of the curve, especially in a SaaS-based business like ours, I also believe too much funding can cause a lack of discipline. People just expect they can hire or spend their way around a problem.
- Don’t Be Arrogant. Just saying it almost disqualifies you. Trust me, I have come across all kinds. Backed by arguably the four best Venture Capital firms in the business, I have had plenty of opportunities to meet other CEOs, founders and execs. Some are incredible people and leaders. Some, however, act like they and their company are way too valuable and important to treat everyone with respect. Life is too short not to believe in karma.
- Listen Carefully. If a sales rep is having trouble closing deals, put yourself in his shoes and figure out what help he needs. If the engineering team is not meeting objectives fast enough, find out if they really understand the customer requirements. Often the smallest tweaks in communication or expectations can drastically change the results. Lastly, listen to your customer(s). It is very easy to write off a loss or a stalled relationship to some process breakdown, but customers buy from people they trust. Customers trust people who listen.
- It’s a People Business. Software will eat the world, but humans still make the decisions. We’re building a culture that values openness and rapid decision-making while aligning our corporate mission with individual responsibilities. This balance is a constant work in process and I understand that getting this balance right is a key to successfully scaling the Sumo Logic business.
- Find the Right VCs at the Right Time. I can’t take any credit for getting Greylock or Sutter Hill to invest in our A and B rounds, respectively. But I do have them to thank for hiring me and helping me. We partnered with Accel in November of 2012 and now Sequoia has led this recent investment. Do not underestimate the value of getting high quality VCs. Their access to customers, top talent, and strategic partners is invaluable. Not to mention the guidance they give in Board meetings and at times of key decisions. The only advice I can give here is: 1) know your business cold, 2) execute your plan and 3) raise money when you have wind at your back. Venture Capitalists make a living on picking the right markets with the right teams with the right momentum. Markets can swing (check Splunk’s stock price in the last 3 months) and momentum can swing (watch the Bruins in the Stanley Cup – never mind they lost to the Canadiens).
- Believe. It may be cliché, but you have to believe in the mission. If you haven’t watched Twelve O’Clock High, watch it. It’s not politically correct, but it speaks volumes about how to lead and manage. You may choose the wrong strategy or tactics at times. But you’ll never know if you don’t have conviction about the goals.
OK, so I’m no Jack Welch or Steve Jobs, and many of these lessons are common sense. But no matter how much you think you know, there is way more that you don’t. Hopefully at least one person will be a little better informed or prepared by my experience.
05.12.2014 | Posted by Jacek Migdal
The Scala compiler can be brutally slow. The community has a love-hate relationship with it. Love means “Yes, scalac is slow”. Hate means “Scala — 1★ Would Not Program Again”. It’s hard to go a week without reading another rant about the Scala compiler.
Moreover, one of the Typesafe co-founders left the company shouting, “The Scala compiler will never be fast” (17:53). Even Scala inventor Martin Odersky provides a list of fundamental reasons why compiling is slow.
At Sumo Logic, we happily build over 600K lines of Scala code with Maven and find this setup productive. Based on the public perception of the Scala build process, this seems about as plausible as a UFO landing on the roof of our building. Here’s how we do it:
At Sumo Logic, we have more than 120 modules. Each has its own source directory, unit tests, and dependencies. As a result, each of them is reasonably small and well defined. Usually, you just need to modify one or a few of them, which means that you can just build them and fetch binaries of dependencies.
Using this method is a huge win in build time and also makes the IDE and test suites run more quickly. Fewer elements are always easier to handle.
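In plain Maven terms, that per-module workflow can look like this (the module name is borrowed from an example later in this post; -pl selects the project and -am also builds its in-repo dependencies when needed):

```bash
mvn -pl stream-pipeline -am test-compile
```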
We keep all modules in a single GitHub repository. Though we have experimented with a separate repository for each project, keeping track of version dependencies was too complicated.
Parallelism on module level
Although Moore’s law is still at work, single cores have not become much faster since 2004. The Scala compiler has some parallelism, but it’s nowhere close to saturating eight cores in our use case.
Enabling parallel builds in Maven 3 helped a lot. At first, it caused a lot of non-deterministic failures, but it turns out that always forking the Java compiler fixed most of the problems. That allows us to fully saturate all of the CPU cores during most of the build time. Even better, it allows us to overcome other bottlenecks (e.g., fetching dependencies).
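Concretely, that combination is a parallel build (here, one thread per core via -T 1C) plus <fork>true</fork> in the maven-compiler-plugin configuration:

```bash
mvn -T 1C clean install
```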
Incremental builds with Zinc
Zinc brings features from sbt to other build systems, providing two major gains:
- It keeps warmed compilers running, which avoids the startup JVM “warm-up tax”.
- It allows incremental compilation. Usually we don’t compile from a clean state; we just make a small change and recompile. This is a huge gain when doing Test Driven Development.
For a long time we were unable to use Zinc with parallel modules builds. As it turns out, we needed to tell Zinc to fork Java compilers. Luckily, an awesome Typesafe developer, Peter Vlugter, implemented that option and fixed our issue.
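A sketch of the relevant scala-maven-plugin configuration (plugin version elided; a Zinc server must already be running, e.g. via zinc -start):

```xml
<plugin>
  <groupId>net.alchim31.maven</groupId>
  <artifactId>scala-maven-plugin</artifactId>
  <configuration>
    <!-- incremental compilation, delegated to the running Zinc server -->
    <recompileMode>incremental</recompileMode>
    <useZincServer>true</useZincServer>
  </configuration>
</plugin>
```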
The following example shows the typical development workflow of building one module. For this benchmark, we picked the largest one by lines of code (53K LOC).
This next example shows building all modules (674K LOC), the most time-consuming task.
Usually we can skip test compilation, bringing build time down to 12 minutes.
Still, some engineers were not happy, because:
- They often build and test more than needed.
- Computers get slow if you saturate the CPU (e.g., video conference becomes sluggish).
- Passing the correct arguments to Maven is hard.
Educating developers might have helped, but we picked the easier route. We created a simple bash wrapper that:
- Runs every Maven process with lower CPU priority (nice -n 15), so the build process doesn’t slow the browser, IDE, or a video conference.
- Makes sure that Zinc is running. If not, it starts it.
- Allows you to compile all the dependencies (downstream) easily for any module.
- Allows you to compile all the things that depend on a module (upstream).
- Makes it easy to select the kind of tests to run.
Though it is a simple wrapper, it improves usability a lot. For example, if you fixed a library bug for a module called “stream-pipeline” and would like to build and run unit tests for all modules that depend on it, just use this command:
bin/quick-assemble.sh -tu stream-pipeline
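A stripped-down sketch of such a wrapper (ours supports more flags; the option handling here is purely illustrative):

```bash
#!/bin/bash
# usage: quick-assemble.sh <module>
MODULE="${@: -1}"

# make sure a Zinc server is running for incremental compilation
zinc -status >/dev/null 2>&1 || zinc -start

# -amd ("also make dependents") rebuilds everything that depends on MODULE;
# nice -n 15 keeps the build from starving the IDE or a video call
nice -n 15 mvn -T 1C -pl "$MODULE" -amd test-compile
```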
Tricks we learned along the way
- Print the longest chain of module dependency by build time.
That helps identify the “unnecessary or poorly designed dependencies,” which can be removed. This makes the dependency graph much more shallow, which means more parallelism.
- Run a build in a loop until it fails.
As simple as in bash: while bin/quick-assemble.sh; do :; done.
Then leave it overnight. This is very helpful for debugging non-deterministic bugs, which are common in a multithreading environment.
- Analyze the bottlenecks of build time.
CPU? IO? Are all cores used? Network speed? The limiting factor can vary during different phases. iStat Menus proved to be really helpful.
- Read the Maven documentation.
Many things in Maven are not intuitive. The “trial and error” approach can be very tedious for this build system. Reading the documentation carefully is a huge time saver.
Building at scale is usually hard. Scala makes it harder because of its relatively slow compiler, so you will hit these issues much earlier than in other languages. However, the problems are solvable through general development best practices, especially:
- Modular code
- Parallel execution by default
- Invest time in tooling
Then it just rocks!
 ( find ./ -name '*.scala' -print0 | xargs -0 cat ) | wc -l
 All modules are built and tested by Jenkins and the binaries are stored in Nexus.
 The author’s 15-inch MacBook Pro from late 2013 has eight cores.
 We have little Java code. Theoretically, the Java 1.6 compiler is thread-safe, but it has some concurrency bugs. We decided not to dig into that, as forking seems to be an easier solution.
 Benchmark methodology:
- Hardware: MacBook Pro, 15-inch, Late 2013, 2.3 GHz Intel i7, 16 GB RAM.
- All tests were run three times and median time was selected.
- Non-incremental Maven goal: clean test-compile.
- Incremental Maven goal: test-compile. A random change was introduced to trigger some recompilation.