June 22, 2015 By Stefan Zier

Using Sleeper Cells to Load Test Microservices

In the good old days of monolithic services, basic load testing was relatively straightforward. You’d start your service on a production-like server, perhaps alongside a database. Then you’d point a load generation tool at it and measure how much load the service could handle. Unfortunately, things are more complicated in microservices architectures, especially in Amazon Web Services (AWS), because of issues like randomly assigned IP addresses and security groups.

For starters, if the service you’re trying to load test isn’t a leaf node in your architecture, it’ll depend on other services. Those, in turn, will depend on more services. Even once you manage to boot up all the services needed to run the test, you still have the challenge of deploying, starting and managing your load testing tool. That gets especially hard once the tool has to run on more than one host.

Distributed load testing setups require the following parts:

  • Remote Control: Manage the fleet of load generators without touching every host.
  • Load Generator: Create the load on the microservice.
  • Measurements: Measure the performance of the tested microservice and summarize it.

I recently faced this problem when I wanted to load test one of our microservices on short notice. Through sheer luck, I came up with a simple solution that turned out to solve all these problems.

Meet sleeper cells

It struck me that we already have all the tooling to deploy, manage and monitor load generators. We just didn’t call them that; we called them clients. In other words, in a deployment of Sumo Logic, we already had clusters of clients for the microservice. We just didn’t have a way to get them to generate synthetic load.

Instead of hacking it and dealing with one-off throwaway builds, I decided to turn those clients into load generators. Every client would now include a sleeper agent, ready to spring into action.

Here is the base class for the sleeper cells:

package com.sumologic.util.scala.bench

import java.util.concurrent.atomic.{AtomicInteger, AtomicLong}

import com.netflix.config.scala.DynamicProperties
import com.sumologic.util.scala.env.Environment
import com.sumologic.util.scala.log.Logging
import com.sumologic.util.scala.rateLimiter.FixedRateLimiter
import com.sumologic.util.scala.time.{TimeConstants, TimeFormats, TimeSource}

import scala.util.control.NonFatal


abstract class SleeperCell(name: String,
                           assemblyName: String)
  extends DynamicProperties
  with Logging
  with TimeSource
  with TimeConstants {

  // API to implement by subclasses.

  // Sends a single request to the target microservice.
  protected def makeRequest(): Unit

  // Logs the current statistics.
  protected def logStats(): Unit

  // Clears any subclass-specific statistics.
  protected def resetStats(): Unit

  // Remote control.

  private val configUpdateCallback = new Runnable() {
    override def run(): Unit = checkForConfigurationUpdate()
  }

  private val activatedAssemblies = dynamicStringListProperty(s"sleeper.cell.$name.assemblies", List[String]())
  activatedAssemblies.addCallback(configUpdateCallback)

  protected val requestsPerSecond = dynamicIntProperty(s"sleeper.cell.$name.rate", Int.MaxValue)
  requestsPerSecond.addCallback(configUpdateCallback)

  protected val agentThreads = dynamicIntProperty(s"sleeper.cell.$name.agents", 64)
  agentThreads.addCallback(configUpdateCallback)

  // Stats.

  protected val lastLog = new AtomicLong(now)
  protected val requestCount = new AtomicInteger(0)
  protected val failedRequestCount = new AtomicInteger(0)

  // State.

  private var activeAgents: Seq[SleeperAgent] = Seq.empty[SleeperAgent]

  checkForConfigurationUpdate()
  prefix(s"$name sleeper cell")
  info("Initialized and awaiting instructions.")

  private def checkForConfigurationUpdate(): Unit = {
    this synchronized {
      // Safety check: the cell can never be activated in production.
      val cellActivated = !Environment().isProd && activatedAssemblies.get().contains(assemblyName)
      if (cellActivated && activeAgents.isEmpty) {
        activateCell()
      } else if (!cellActivated && activeAgents.nonEmpty) {
        goToSleep()
      } else if (cellActivated && activeAgents.size != agentThreads.get()) {
        info(s"Agent count changed from ${activeAgents.size} to ${agentThreads.get()} - restarting.")
        goToSleep()
        activateCell()
      }
    }
  }

  private def activateCell(): Unit = {
    info(s"We have been activated. Activating ${agentThreads.get()} agents.")
    activeAgents = (1 to agentThreads.get()).map(new SleeperAgent(_))
    activeAgents.foreach(_.start())
  }

  private def goToSleep(): Unit = {
    info(s"We have been told to go back to sleep. Shutting down ${activeAgents.size} agents.")
    activeAgents.foreach(_.keepRunning = false)
    activeAgents.foreach(_.join())
    // Forget the stopped agents; otherwise the isEmpty check above would
    // never allow the cell to be activated again.
    activeAgents = Seq.empty[SleeperAgent]

    requestCount.set(0)
    failedRequestCount.set(0)
    resetStats()
  }

  private class SleeperAgent(id: Int)
    extends Thread(s"Sleeper-Agent-$name-$id")
    with TimeConstants
    with TimeFormats {

    // Written by the controlling thread and read by the agent thread,
    // so it must be volatile for the change to become visible.
    @volatile var keepRunning = true

    // The rate is read once, when the agent is created.
    val rateLimiter = new FixedRateLimiter(requestsPerSecond.get(), 1.second)

    override def run(): Unit = {
      while (keepRunning) {

        // Wait until the rate limiter permits another request.
        while (!rateLimiter.isActionAllowed) {
          Thread.sleep(50)
        }

        try {
          rateLimiter.recordAction()
          requestCount.incrementAndGet()
          makeRequest()
        } catch {
          case NonFatal(e) => failedRequestCount.incrementAndGet()
        }

        // Double-checked so that only one agent logs per 15-second interval.
        def timeToLogStats: Boolean = (now - lastLog.get()) > 15.seconds
        if (timeToLogStats) {
          lastLog synchronized {
            if (timeToLogStats) {
              logStats()
              lastLog.set(now)
            }
          }
        }
      }
    }
  }
}

Remote Control

Under normal circumstances, a sleeper cell simply watches a particular property in Archaius (sleeper.cell.<name>.assemblies in the code above). If that property lists the cell’s assembly, the sleeper agents wake up and start attacking the target microservice with requests. For safety, the code includes a check that prevents activation in production. Separate configuration properties control the amount of load generated: the request rate and the number of agent threads.
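To make the activation rules easier to follow, here is a minimal, self-contained sketch of the decision the base class makes on every configuration update. The Action type, the RemoteControl object and the "receiver" assembly name are hypothetical names for illustration only; the real code reads these values from Archaius and Environment rather than taking them as parameters.

```scala
// Hypothetical, simplified model of the remote-control decision.
// It is a sketch for illustration, not the production code.
sealed trait Action
case object Activate extends Action
case object Sleep extends Action
case object Restart extends Action
case object NoChange extends Action

object RemoteControl {
  def decide(isProd: Boolean,
             activatedAssemblies: Seq[String],
             assemblyName: String,
             runningAgents: Int,
             configuredAgents: Int): Action = {
    // Never activate in production, whatever the property says.
    val cellActivated = !isProd && activatedAssemblies.contains(assemblyName)
    if (cellActivated && runningAgents == 0) Activate
    else if (!cellActivated && runningAgents > 0) Sleep
    else if (cellActivated && runningAgents != configuredAgents) Restart
    else NoChange
  }
}
```

For example, RemoteControl.decide(false, Seq("receiver"), "receiver", 0, 64) yields Activate, while the same call with isProd set to true yields NoChange.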

Load Generator

Sleeper agents are threads that call a custom makeRequest() function at a pre-set rate limit. Each cell contains a configurable number of agent threads. The number of threads can be changed at runtime (again, via the remote control).
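The source of FixedRateLimiter is not shown in this post, so the following is only a sketch of how a limiter with the same two calls (isActionAllowed and recordAction) might work. WindowRateLimiter, AgentStep and the injectable clock are assumptions made for illustration. Note that in the base class each agent creates its own limiter, so the configured rate appears to apply per agent thread.

```scala
// Hypothetical stand-in for FixedRateLimiter (an assumption, not the real class).
// Allows at most `maxActions` per window of `windowMillis`, using an injectable clock.
final class WindowRateLimiter(maxActions: Int, windowMillis: Long, clock: () => Long) {
  private var windowStart = clock()
  private var actionsInWindow = 0

  // Start a fresh window once the current one has elapsed.
  private def rollWindow(): Unit = {
    val t = clock()
    if (t - windowStart >= windowMillis) {
      windowStart = t
      actionsInWindow = 0
    }
  }

  def isActionAllowed: Boolean = synchronized {
    rollWindow()
    actionsInWindow < maxActions
  }

  def recordAction(): Unit = synchronized {
    rollWindow()
    actionsInWindow += 1
  }
}

object AgentStep {
  // One iteration of an agent: send a request if the limiter permits it.
  // Returns true when a request was attempted.
  def attempt(limiter: WindowRateLimiter)(makeRequest: () => Unit): Boolean =
    if (limiter.isActionAllowed) {
      limiter.recordAction()
      makeRequest()
      true
    } else false
}
```

With a limit of two actions per second, the third attempt inside the same second is refused, and attempts succeed again once the clock moves into the next window.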

Measurements

Each of the sleeper agents logs a set of measurements every 15 seconds into the logs of its host, which we already collect into Sumo Logic. Based on the logs, we can aggregate and determine how our target behaved. Bonus tip: Log the settings of the load generator alongside the results, so you don’t need to track those externally.

2015-06-11 18:29:17,026 -0700 INFO [logger=scala.config.util.ConfigClientSleeperCell] 
 [settings: 64 threads, 10000 requests/s] 
 5979 requests sent in 15s at 398 requests/sec. 5696 requests failed. 
 (loadById: 4989, loadByUri: 5687, findByUriPattern: 119, failing loadById: 163)
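As a rough illustration of the bonus tip, a logStats() implementation could build its summary along these lines. StatsLine is a hypothetical helper, and its layout is modelled on the sample log line above rather than taken from the actual subclass; the per-operation counters in the last line of the sample are left out.

```scala
// Hypothetical helper, modelled on the sample log line; not the real logStats().
object StatsLine {
  // Puts the load generator settings in front of the results so that
  // every log line is self-describing.
  def format(threads: Int, configuredRate: Int, sent: Int, failed: Int, elapsedSeconds: Int): String = {
    // Integer division, e.g. 5979 requests / 15 s = 398 requests/sec.
    val perSecond = if (elapsedSeconds > 0) sent / elapsedSeconds else 0
    s"[settings: $threads threads, $configuredRate requests/s] " +
      s"$sent requests sent in ${elapsedSeconds}s at $perSecond requests/sec. " +
      s"$failed requests failed."
  }

  // Share of requests that failed, as a percentage.
  def failurePercent(sent: Int, failed: Int): Double =
    if (sent == 0) 0.0 else failed * 100.0 / sent
}
```

Applied to the numbers in the sample, StatsLine.failurePercent(5979, 5696) comes to roughly 95.3 percent, which is the kind of figure that is easy to aggregate once the settings and results live in the same log line.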

Conclusion

This Sleeper Agent pattern was a quick and easy way to get a load test going. We’ve since replicated this a number of times, and all of our environments contain several sleeper cells.

Stefan Zier

Stefan was Sumo’s first engineer and Chief Architect. He enjoys working on cloud plumbing and is plotting to automate his job fully, so he can spend all his time skiing in Tahoe.
