This year, at Sumo Logic’s third annual user conference, Illuminate 2018, we presented Sumo Logic Notebooks as a way to do data science within the Sumo Logic platform. Sumo Logic Notebooks integrate Sumo Logic data, data science notebooks and common machine learning frameworks. Our vision is to let customers connect to their data in Sumo Logic, whether logs or metrics, from a notebook environment and experiment with machine learning algorithms to derive intelligent insights.
Here we introduce a workflow for interacting with your Sumo Logic data through Sumo Logic Notebooks, and we experiment with a very simple machine learning model, using data from the popular arcade game Killer Queen as our example.
Why Killer Queen?
You might be wondering: why this game? Mostly because there is a Killer Queen arcade machine at Sumo Logic HQ, where it has become a popular post-work competition. Killer Queen is a 10-person, team-based competitive strategy game with two teams, Blue and Gold, of up to five players each. Each team starts off with a “queen” and four “workers.” Workers can later be converted into warriors. As for the rules, there are only three ways to win and many ways to lose.
Throughout the game, the workers can collect berries found in the arena and use them for character transformations or for scoring.
The game has three victory conditions that teams must keep in mind: Military, Economic and Snail. For more information on the official game rules, check out the “how to play” page on the Killer Queen website.
During the game, the game engine’s API emits logs describing the state of the world as the game proceeds. Each state carries a variety of game-related statistics. For this post, our goal is to show how to use these states and Sumo Logic Notebooks to predict which of the two teams will win as the game progresses.
We set up Sumo Logic Notebooks with the required Access ID and Access Key for a Sumo Logic account, as described in the Sumo Logic Notebooks documentation.
We use a custom script to send Killer Queen game data to a Sumo Logic HTTP Source.
Then, in the Zeppelin notebook, we query the game logs using the Sumo query language. Note the %spark.sumo annotation at the top of the query cell.
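The collection script itself lives in the repository linked later in this post. As a rough idea of what "sending data to an HTTP Source" involves, here is a minimal sketch using only the Python standard library: game events are JSON-encoded one per line and POSTed to the source's unique URL. The URL, the event fields and the function names are hypothetical placeholders; X-Sumo-Category is an optional header that overrides the source category configured on the HTTP Source.

```python
import json
import urllib.request

# Hypothetical placeholder: every Sumo Logic HTTP Source has its own unique URL.
SUMO_HTTP_SOURCE_URL = "https://example.invalid/receiver/v1/http/YOUR_TOKEN"


def build_sumo_request(events, url=SUMO_HTTP_SOURCE_URL, category=None):
    """Build a POST request carrying one JSON-encoded game event per line."""
    body = "\n".join(json.dumps(e, sort_keys=True) for e in events).encode("utf-8")
    headers = {}
    if category:
        # Optional: override the source category configured for the HTTP Source.
        headers["X-Sumo-Category"] = category
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def send_events(events, url=SUMO_HTTP_SOURCE_URL, category=None, timeout=10):
    """POST the events to the HTTP Source and return the HTTP status code."""
    req = build_sumo_request(events, url, category)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

For example, send_events([{"event": "berry_deposit", "team": "Gold"}], url=your_source_url) would upload a single (hypothetical) event. Batching several lines into one POST keeps the number of requests low during a busy game.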
Once we have the data, we can convert it from a Scala DataFrame into a Python pandas DataFrame using the Zeppelin context (the z variable) (see code). Note the difference between the two types of cells: the top one is the usual Spark environment, running Scala.
The bottom one, with the %spark.pyspark annotation, uses the Python interpreter and behaves like a regular Python notebook.
From now on, we can proceed as if we were working in a pure Python notebook. Before we preprocess the raw logs, we define the schema we want to convert them into. Here, we use a few simple classes to lay out the data vector (see code). We also add a couple of utility functions (see code).
Then we preprocess the raw logs into a pandas DataFrame.
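A minimal sketch of what such classes and utilities might look like. The post does not list the fields behind the 16 state columns, so the eight per-team counters below are hypothetical, chosen only so that Blue's eight values followed by Gold's eight values form a 16-element vector. The helper names are also our own.

```python
from dataclasses import dataclass, field, fields


@dataclass
class TeamState:
    # Hypothetical per-team counters; the post's actual fields are not shown.
    berries_deposited: int = 0
    queen_kills: int = 0       # times this team has killed the opposing queen
    warrior_kills: int = 0
    worker_kills: int = 0
    gates_converted: int = 0
    warriors: int = 0
    berries_held: int = 0
    snail_progress: float = 0.0

    def to_list(self):
        # Values in the order the fields are declared above.
        return [float(getattr(self, f.name)) for f in fields(self)]


@dataclass
class GameState:
    blue: TeamState = field(default_factory=TeamState)
    gold: TeamState = field(default_factory=TeamState)

    def to_vector(self):
        # 8 Blue features followed by 8 Gold features -> 16 state columns.
        return self.blue.to_list() + self.gold.to_list()


def team_of(name):
    """Hypothetical utility: map a team name to its label (0 = Blue, 1 = Gold)."""
    return {"blue": 0, "gold": 1}[name.strip().lower()]


def state_for(state, team):
    """Hypothetical utility: return the TeamState object for 'Blue' or 'Gold'."""
    return state.blue if team_of(team) == 0 else state.gold
```

While replaying a game's log, a handler for a berry deposit would increment state_for(state, team).berries_deposited and then append state.to_vector() to that game's list of rows, so each row is a snapshot of the game after one event.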
In this pandas DataFrame, columns 0-15 hold the game state at each event, in the data vector format defined earlier, and columns 16 and 17 hold the labels. Column 16 is the eventual victor (0 for Blue, 1 for Gold) and column 17 is the victory type (0 for Economic, 1 for Military, 2 for Snail). Our dataset therefore consists of a series of vectorized events, each labeled with which team eventually won and by which type of victory.
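A minimal sketch of the labeling step, assuming each game has already been reduced to a list of 16-column state vectors and that the third victory type is Snail, the remaining victory condition from the rules above. The function names and string labels are hypothetical; the numeric codes follow the encoding described in the text.

```python
VICTOR = {"blue": 0, "gold": 1}
VICTORY_TYPE = {"economic": 0, "military": 1, "snail": 2}


def label_game(state_vectors, winner, victory):
    """Append the eventual victor (column 16) and victory type (column 17)
    to every 16-column state vector recorded during one game."""
    w = VICTOR[winner.lower()]
    t = VICTORY_TYPE[victory.lower()]
    rows = []
    for vec in state_vectors:
        if len(vec) != 16:
            raise ValueError(f"expected 16 state columns, got {len(vec)}")
        # Every event in the game gets the same labels: how the game ended.
        rows.append(list(vec) + [w, t])
    return rows


def build_dataset(games):
    """Concatenate labeled rows from many games, where each game is a
    (state_vectors, winner, victory) tuple."""
    dataset = []
    for vectors, winner, victory in games:
        dataset.extend(label_game(vectors, winner, victory))
    return dataset
```

Passing the result to pandas.DataFrame gives integer column labels 0-17, matching the layout described above. Note that every event of a game shares its game's labels, which is what lets the model learn how early states relate to the final outcome.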
Modeling the Game
We use the Python machine learning library scikit-learn to build a very simple logistic regression model that predicts the probability of winning as the game proceeds.
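The post fits scikit-learn's LogisticRegression. To keep this sketch dependency-free, it instead fits the same kind of model (a linear score passed through the logistic function) with plain batch gradient descent on the mean log loss. This is a swapped-in training procedure, not what scikit-learn does internally, and scikit-learn also applies L2 regularization by default, so its coefficients will differ. All names here are our own.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on the mean log loss.
    X: list of feature rows; y: list of 0/1 labels (1 = Gold wins)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for row, label in zip(X, y):
            # The log-loss gradient w.r.t. the linear score is (p - y).
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b) - label
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, row):
    """Probability of label 1 (Gold wins) for one state vector."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b)
```

With scikit-learn, the equivalent would be LogisticRegression().fit(X, y) with X taken from columns 0-15 and y from the victor column, followed by predict_proba(X)[:, 1] for the probability of class 1 (Gold).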
After fitting the logistic regression model, we predict the probability of winning at each event as a game proceeds (see code).
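Predicting "as the game proceeds" amounts to scoring each state vector of a single game in event order. A minimal sketch, assuming a fitted weight list and bias are already available (for example coef_[0] and intercept_[0] from a fitted scikit-learn LogisticRegression); the function name is hypothetical.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def win_probability_trajectory(weights, bias, game_states):
    """Return (event_index, p_gold, p_blue) for each state vector of one game,
    in the order the events occurred."""
    trajectory = []
    for i, state in enumerate(game_states):
        score = sum(w * x for w, x in zip(weights, state)) + bias
        p_gold = _sigmoid(score)
        trajectory.append((i, p_gold, 1.0 - p_gold))
    return trajectory
```

The resulting list maps directly onto a line plot, with the event index on the x-axis and p_gold on the y-axis.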
Using matplotlib and seaborn, we can also plot the winning probability as the game proceeds.
Annotating two of those plots to analyze the model’s predictions, we can identify a few indicators of a winning strategy:
- Killing the opponent’s queen definitely improves your chances of winning, but not as much as depositing berries
- Killing soldiers helps a bit less than converting gates, and toward the end of the game converting gates is more useful than killing soldiers
From the plots, we can also see a surprising result: depositing a berry gives you a better chance of winning than killing a queen. This might sound counterintuitive. A queen has only three lives, so each queen kill brings a team a third of the way to a Military victory, whereas an Economic victory requires depositing 12 berries, so a single berry looks like a much smaller step. However, the data show us something different.
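To make the berry-versus-queen comparison concrete, it helps to recall how a logistic regression model turns event counts into a win probability. The sketch below assumes the state columns hold raw per-team counts, such as berries deposited and queen kills; the symbols are ours, and no fitted coefficient values from the post are implied. With p the probability that Gold wins and x_0, ..., x_15 the state columns:

```latex
\log\frac{p}{1-p} = b + \sum_{j=0}^{15} w_j x_j
\qquad\Longrightarrow\qquad
\frac{\mathrm{odds}(x_k + 1)}{\mathrm{odds}(x_k)} = e^{w_k},
\qquad
\Delta p \approx p\,(1-p)\,w_k
```

So one more event of type k multiplies the odds by e^{w_k}, regardless of how many such events a victory condition eventually requires. If Gold's berry-deposit coefficient exceeds Gold's queen-kill coefficient, a single berry moves the predicted odds more than a single queen kill (for Blue's features the signs flip, since they push p toward Blue).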
Want to Collect your Own Killer Queen Game Data?
If you’re interested in collecting game data from your Killer Queen machine and sending it to Sumo Logic, check out this GitHub repository.
Using Sumo Logic Notebooks, we showed how to query data stored in Sumo Logic and work with it as in most data science workflows, using popular tools like pandas, NumPy, scikit-learn and Matplotlib/Seaborn. With Sumo Logic’s Zeppelin-based Notebooks, you too can run your data science workflows on the data you store in Sumo Logic!
We actually demoed this capability at re:Invent 2018, inviting conference attendees to stop by the Sumo Logic booth to play a game of Killer Queen while our machine data analytics platform collected game data in real time. Once the game ended, we invited them to review their performance stats in Sumo Logic dashboards, which revealed key information about the game outcome. This data shows key patterns and actions that can help players keep improving their strategy and performance in future games.
- Want more reads like this? Check out this blog post on how we created our own Sumo Smash Bros game to again show the power of data.
- Read our 2018 State of Modern Applications & DevSecOps in the Cloud report
- Learn how data scientists can use Scala to encode notions of data sensitivity, privacy, or contamination when working with data science notebooks.
Interested in working with the Sumo Logic engineering team? We’re hiring! Check out our open positions here.