Here at devRant, we use Sumo Logic for a number of things, including log monitoring (triggering alerts based on error log trends), database query analysis, and user behavior analysis + A/B testing analysis. In this article I’m going to focus on a recent A/B test analysis where Sumo Logic was extremely helpful.
About the devRant Community
In March 2016, my friend and I founded devRant (iOS, Android) – a fun community for devs to vent, share, and bond over how they really feel about code, tech, and life as a programmer. devRant is an app with an audience that is demanding and technical by nature since our users are developers. With that in mind, it is one of our goals to launch high-quality features and provide experiences that our community will consistently enjoy.
The format of devRant is pretty simple – members of the community post rants which get displayed in a feed-like format. Right now we have three different methods users can choose from to sort the feed:
- recent – most recent first
- top – highest rated rants first
- algo – an algorithm sort with a number of components (recency, score, and new components which I’ll cover later)
Where Sumo Logic Comes in
As I noted, we use Sumo Logic for error monitoring (triggering alerts based on error log trends), database query analysis (seeing which of our DB queries take the longest, tied back to user cohort info), and most importantly for us right now, user behavior analysis + A/B testing analysis.
In the introduction I mentioned the sorting methods we offer in the devRant mobile apps. Since devRant launched, the default sorting method we’ve offered is our algo sort. This is what all new users see and one of our most popular sorting options. Originally, this sorting method was simply a decaying time algorithm combined with score – so over about 12 hours, the rants shown would change, with the highest rated for that time period showing up on top.
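To make the idea of a decaying time algorithm combined with score concrete, here is a minimal Python sketch. It is not devRant’s actual formula: the exponential decay, the 12-hour half-life (borrowed loosely from the "about 12 hours" above), and the names decayed_score and rank_feed are all assumptions made for illustration.

```python
import math

def decayed_score(score: int, age_hours: float, half_life_hours: float = 12.0) -> float:
    """Weight a rant's score by an exponential time decay.

    Hypothetical formula: the weight halves every `half_life_hours`.
    """
    return score * math.pow(0.5, age_hours / half_life_hours)

def rank_feed(rants: list, now_hours: float) -> list:
    """Sort rants so the highest decayed score comes first.

    Each rant is a dict with 'score' and 'posted_at_hours' (assumed field names).
    """
    return sorted(
        rants,
        key=lambda r: decayed_score(r["score"], now_hours - r["posted_at_hours"]),
        reverse=True,
    )

feed = rank_feed(
    [
        {"id": "old_hit", "score": 100, "posted_at_hours": 0.0},
        {"id": "fresh", "score": 40, "posted_at_hours": 24.0},
    ],
    now_hours=24.0,
)
print([r["id"] for r in feed])
```

With these made-up numbers, the day-old rant’s score of 100 decays to 25, so the fresh rant with a score of 40 ranks above it. That is the kind of turnover a time-decayed sort produces.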
While this algo served us well for a while, it became clear that we could do better; we also started getting a lot of requests from users. For example, one of the most frequent pieces of feedback we got was that instead of seeing much of the same content in the feed for a few hours, users wanted the app to hide the content they had already viewed. Additionally, we had an unproven hypothesis that since developers are a picky bunch (well, we are!), it would be good to tailor the algo sort to the taste of each specific user and mix in some slightly older rants that we think they would enjoy. We decided to base this personalized algo on criteria like users they’ve enjoyed content from before, tags they’ve liked, and foremost, rants that users with similar tastes have enjoyed and that the user hasn’t already seen.
As a small but quickly growing startup, it was important to us to make sure we didn’t cannibalize our most important user experience with a hunch, so we decided we would A/B test the new algorithm and see if the new one actually out-performs the old one.
Using Sumo Logic to Effectively Analyze Our Test
We decided to deploy our new algorithm to 50% of our mobile user-base so we could get a meaningful amount of data fairly quickly. We then decided the main metric we would look at was number of upvotes because a user generally upvotes content only if they are enjoying it.
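The article doesn’t describe how users were assigned to each half, so the following is only one common way to do a stable 50/50 split: hash the user id and bucket on the result. The function name assign_algo_version, the experiment label, and the use of SHA-256 are assumptions for this sketch. The return values 1 and 2 mirror the algov values in the vote logs.

```python
import hashlib

def assign_algo_version(user_id: str, experiment: str = "feed_algo",
                        new_share: float = 0.5) -> int:
    """Deterministically assign a user to algo version 1 (old) or 2 (new).

    Hashing the experiment name together with the user id gives every user
    a stable bucket in [0, 1) without storing any assignment state.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return 2 if bucket < new_share else 1

# The same user always lands in the same group.
print(assign_algo_version("123") == assign_algo_version("123"))
```

Because the assignment is a pure function of the user id, it can be recomputed on any server, and the same user keeps seeing the same algo version for the length of the test.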
The first thing we looked at was the number of +1’s created by users based on what algo feed version they had. The log format for a vote looks like this:
event_type='Vote', user_id='id of user', vote_type='1 for upvote, -1 for downvote', insert_id='a unique id for the vote', platform='iOS, Android or Web', algov='1 if the user is assigned the old algo, 2 if the new algo', post_type='rant or comment'
So an example log message from an upvote on a rant from an iOS user with the new algo would look like this:
event_type='Vote', user_id='123', vote_type='1', insert_id='123456', platform='iOS', algov='2', post_type='rant'
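For readers who want to work with this format outside of Sumo Logic, here is a small Python sketch that turns one of these key='value' messages into a dictionary. It assumes the raw logs use straight single quotes (which is what the queries in this article match on). The parse_event helper is a hypothetical name, not part of any devRant or Sumo Logic tooling.

```python
import re

# Matches key='value' pairs; values may contain spaces but not single quotes.
FIELD_RE = re.compile(r"(\w+)='([^']*)'")

def parse_event(line: str) -> dict:
    """Parse a key='value' log message into a dict of strings."""
    return dict(FIELD_RE.findall(line))

sample = ("event_type='Vote', user_id='123', vote_type='1', insert_id='123456', "
          "platform='iOS', algov='2', post_type='rant'")
event = parse_event(sample)
print(event["algov"], event["post_type"])
```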
For the initial query, I wanted to see a split of votes on rants for each algo version on iOS or Android (since the new algo is currently only available on mobile), so I wrote this simple query (it just uses some string matching, parses out the algo version, and then gets a count of votes for each version):
"event_type='Vote'" "vote_type='1'" "post_type='rant'" ("platform='iOS'" OR "platform='Android'") | parse "algov='*'" as algov | count by algov
I ran this over a short period of time and quickly got some insight into our A/B test. The results were promising:
This meant that based on this query and data, the new algo was resulting in about 50% more +1’s on rants compared to the old algo.
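The "about 50% more" figure is a relative lift between the two counts the query returns. The sketch below shows the arithmetic with placeholder counts. The numbers 1000 and 1500 are made up for illustration and are not the results from this test.

```python
def relative_lift(old_count: int, new_count: int) -> float:
    """Fractional increase of new_count over old_count (0.5 means 50% more)."""
    if old_count <= 0:
        raise ValueError("old_count must be positive")
    return (new_count - old_count) / old_count

# Placeholder counts, NOT the real query results.
old_votes, new_votes = 1000, 1500
print(f"{relative_lift(old_votes, new_votes):.0%} more upvotes with the new algo")
```

Comparing raw counts like this is only fair because the two groups are about the same size (the 50% split). With uneven groups, you would divide each count by the number of users in its group first.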
However, as I thought about this more, I saw an issue with the data. Like I touched on in the beginning of the article, we offer three different sorting methods with algo sort being just one of those. In the log message for a vote, I realized we weren’t logging what sort method the user was actually using. So even if the user had algov=2, they could have easily been using the recent sort instead of the algo sort, making their vote irrelevant for this test.
Without the actual sort method they had used to find the rant they voted on in the log, I was in a bind to somehow make the data work. Luckily, Sumo Logic’s robust query language came to the rescue! While there’s no sort property on the vote message, we have another event we log called “Feed Load” that logs each time the feed gets loaded for a user, and includes what sort method they are using. A feed load log message is very close in format to a vote message and looks like this:
event_type='Feed Load', user_id='123', sort='algo', insert_id='1234', platform='Android', algov='2'
The important property here is sort. Using the powerful Sumo Logic join functionality, I was able to combine vote events with these feed load events in order to ensure the user who placed the vote was using algo sort. The query looked like this:
("event_type='Vote'" OR "event_type='Feed Load'") ("platform='iOS'" OR "platform='Android'") | join (parse "vote_type='*'" as vote_type, "user_id='*'" as user_id, "insert_id='*'" as insert_id, "algov='*'" as algo_ver, "post_type='*'" as post_type) as vote_query, (parse "event_type='Feed *'" as query, "sort='*'" as sort_type, "user_id='*'" as user_id) as feed_query on vote_query.user_id=feed_query.user_id timewindow 500s | WHERE vote_query_vote_type="1" AND vote_query_post_type="rant" AND feed_query_sort_type="algo" | count_distinct(vote_query_insert_id) group by vote_query_algo_ver
Here’s a brief explanation of this join query: it gets all of our vote and feed load events that occurred on iOS or Android over the given time period. Then, for both the vote and feed load event types, it parses out the relevant fields to be used later in the query. The votes and feed loads are joined on the user id. Meanwhile, timewindow 500s makes it so only events that occurred within about 8 minutes of each other are included. In our use case this is important because someone can change their sort method at any time, so the feed load event needs to have occurred close to the time of the vote. We then do some more filtering (only upvotes, only votes on rants, and only votes that were made near the time of an algo-sorted feed load [meaning they most likely originated from content found in that sort]). Lastly, we use count_distinct on the unique vote id to make sure each vote is only counted once (since there can be multiple feed loads around the time of a vote), and group the results by the algo version so we get a nice split of the number of votes for each test group.
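To show the logic of the join outside of Sumo Logic, here is a rough Python equivalent. It is a sketch of the idea rather than a reimplementation of Sumo Logic’s join operator. It assumes each event is a dict of parsed fields plus a 'ts' timestamp in epoch seconds (a field name invented here), and it treats "within the time window" as within 500 seconds in either direction, which is my reading of the description above.

```python
def count_algo_upvotes(votes: list, feed_loads: list, window_s: int = 500) -> dict:
    """Count distinct rant upvotes per algo version that happened near an algo feed load."""
    # Index the timestamps of algo-sorted feed loads by user id.
    algo_loads = {}
    for load in feed_loads:
        if load["sort"] == "algo":
            algo_loads.setdefault(load["user_id"], []).append(load["ts"])

    # Collect vote ids in sets so each vote is counted once, even when
    # several feed loads fall inside its time window.
    vote_ids_by_algov = {}
    for vote in votes:
        if vote["vote_type"] != "1" or vote["post_type"] != "rant":
            continue
        load_times = algo_loads.get(vote["user_id"], [])
        if any(abs(vote["ts"] - t) <= window_s for t in load_times):
            vote_ids_by_algov.setdefault(vote["algov"], set()).add(vote["insert_id"])

    return {algov: len(ids) for algov, ids in vote_ids_by_algov.items()}

votes = [
    {"ts": 1000, "user_id": "a", "vote_type": "1", "post_type": "rant", "algov": "2", "insert_id": "v1"},
    {"ts": 1000, "user_id": "b", "vote_type": "1", "post_type": "rant", "algov": "1", "insert_id": "v2"},
    {"ts": 5000, "user_id": "a", "vote_type": "1", "post_type": "rant", "algov": "2", "insert_id": "v3"},
    {"ts": 1010, "user_id": "c", "vote_type": "-1", "post_type": "rant", "algov": "1", "insert_id": "v4"},
]
feed_loads = [
    {"ts": 900, "user_id": "a", "sort": "algo"},
    {"ts": 950, "user_id": "a", "sort": "algo"},
    {"ts": 990, "user_id": "b", "sort": "recent"},
    {"ts": 1000, "user_id": "c", "sort": "algo"},
]
result = count_algo_upvotes(votes, feed_loads)
print(result)
```

In this toy data only vote v1 survives: v2 came from a user on the recent sort, v3 happened long after the user’s last algo feed load, and v4 is a downvote. The two nearby feed loads for user a still produce a single counted vote, which is the job count_distinct does in the query.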
After running the query over a decently substantial dataset, I got the following results:
Whoa!! Needless to say, this was pretty exciting. It seems to indicate that the new sort algo produced more than double the number of upvotes compared to the old one, over the exact same time period. With this data, we can now comfortably say that the new algo is a very big improvement and creates a much better user experience in our app.
How Sumo Logic Will Continue to Help Us Push devRant Forward
Both my co-founder and I are very data-centric, so having a flexible solution like Sumo Logic on our side is great because it allows us to easily base very important decisions on the data and user behaviors we want to analyze. I hope in this article I’ve provided an example of how valuable A/B testing can be and why it’s important to have a tool that lets you query your data in a number of different ways. Though our new algorithm is performing very well, it is very possible that one of our tests in the future won’t have the same success. It’s better to use data to realize a failure quickly than to look at overall metrics in a few months and wonder why they plummeted. And for us, an unnoticed mistake might result in us becoming the subject of our own app.
As we grow, we look forward to continuing to develop new features and using quantifiable data to measure their success. We believe this approach gives us an opportunity, as a startup that strives for innovation, to take some risks but learn quickly whether a new feature should stay or go.
About the Author
David Fox is the co-founder and engineering lead of devRant. He has over 10 years of experience developing high-performance backend systems and working with a large variety of databases alongside massive datasets. You can follow him on Twitter @dfoxinator or on devRant (@dfox).
How devRant uses Sumo Logic for Log Monitoring and A/B Testing is published by the Sumo Logic DevOps Community. If you’d like to learn more or contribute, visit devops.sumologic.com. Also, be sure to check out the Sumo Logic Developers for free tools and code that will enable you to monitor and troubleshoot applications from code to production.