June 22, 2015 By Stefan Zier

Using Sleeper Cells to Load Test Microservices

In the good old days of monolithic services, basic load testing was relatively straightforward. You’d start your service on a production-like server, perhaps alongside a database. Then, you’d point a load generation tool at it and measure how much load the service could push through. Unfortunately, things are a bit more complicated in microservices architectures, especially in Amazon Web Services (AWS), due to issues like randomly assigned IP addresses, security groups, etc.

For starters, if the service you’re trying to load test isn’t a leaf node in your architecture, it’ll depend on other services. Those, in turn, will depend on more services. Besides, even once you manage to boot up all the services needed to run the test, you still have the challenge of deploying, starting and managing your load testing tool. Once you need a load testing tool that runs on more than one host, you have to work pretty hard.

Distributed load testing setups require the following parts:

  • Remote Control: Manage the fleet of load generators without touching every host.
  • Load Generator: Create the load on the microservice.
  • Measurements: Measure the performance of the tested microservice and summarize it.

I recently faced this problem when I wanted to load test one of our microservices on short notice. Through sheer luck, I came up with a simple solution that turned out to solve all these problems.

Meet sleeper cells

It struck me that we already have all the tooling to deploy, manage and monitor load generators. We just didn’t call them that; we called them clients. In other words, a deployment of Sumo Logic already contains clusters of clients of the microservice. We just didn’t have a way to get them to generate synthetic load.

Instead of hacking it and dealing with one-off throwaway builds, I decided to turn those clients into load generators. Every client would now include a sleeper agent, ready to spring into action.

Here is the base class for the sleeper cells:

package com.sumologic.util.scala.bench

import java.util.concurrent.atomic.{AtomicInteger, AtomicLong}

import com.netflix.config.scala.DynamicProperties
import com.sumologic.util.scala.env.Environment
import com.sumologic.util.scala.log.Logging
import com.sumologic.util.scala.rateLimiter.FixedRateLimiter
import com.sumologic.util.scala.time.{TimeConstants, TimeFormats, TimeSource}

import scala.util.control.NonFatal

abstract class SleeperCell(name: String, assemblyName: String)
  extends DynamicProperties
  with Logging
  with TimeSource
  with TimeConstants {

  // API to implement by subclasses.
  protected def makeRequest(): Unit
  protected def logStats(): Unit
  protected def resetStats(): Unit

  // Remote control.
  private val configUpdateCallback = new Runnable() {
    override def run(): Unit = checkForConfigurationUpdate()
  }

  private val activatedAssemblies = dynamicStringListProperty(s"sleeper.cell.$name.assemblies", List[String]())
  activatedAssemblies.addCallback(configUpdateCallback)

  protected val requestsPerSecond = dynamicIntProperty(s"sleeper.cell.$name.rate", Int.MaxValue)
  requestsPerSecond.addCallback(configUpdateCallback)

  protected val agentThreads = dynamicIntProperty(s"sleeper.cell.$name.agents", 64)
  agentThreads.addCallback(configUpdateCallback)

  // Stats.
  protected val lastLog = new AtomicLong(now)
  protected val requestCount = new AtomicInteger(0)
  protected val failedRequestCount = new AtomicInteger(0)

  // State.
  private var activeAgents: Seq[SleeperAgent] = Seq.empty[SleeperAgent]

  checkForConfigurationUpdate()
  prefix(s"$name sleeper cell")
  info("Initialized and awaiting instructions.")

  private def checkForConfigurationUpdate() {
    this synchronized {
      val cellActivated = !Environment().isProd && activatedAssemblies.get().contains(assemblyName)

      if (cellActivated && activeAgents.isEmpty) {
        activateCell()
      } else if (!cellActivated && activeAgents.nonEmpty) {
        goToSleep()
      } else if (cellActivated && activeAgents.size != agentThreads.get()) {
        info(s"Agent count changed from ${activeAgents.size} to ${agentThreads.get()} - restarting.")
        goToSleep()
        activateCell()
      }
    }
  }

  private def activateCell() {
    info(s"We have been activated. Activating ${agentThreads.get()} agents.")
    activeAgents = (1 to agentThreads.get()).map(new SleeperAgent(_))
    activeAgents.foreach(_.start())
  }

  private def goToSleep() {
    info(s"We have been told to go back to sleep. Shutting down ${activeAgents.size} agents.")
    activeAgents.foreach(_.keepRunning = false)
    activeAgents.foreach(_.join())
    requestCount.set(0)
    failedRequestCount.set(0)
    resetStats()
  }

  private class SleeperAgent(id: Int)
    extends Thread(s"Sleeper-Agent-$name-$id")
    with TimeConstants
    with TimeFormats {

    var keepRunning = true
    val rateLimiter = new FixedRateLimiter(requestsPerSecond.get(), 1.second)

    override def run() {
      while (keepRunning) {
        while (!rateLimiter.isActionAllowed) {
          Thread.sleep(50)
        }

        try {
          rateLimiter.recordAction()
          requestCount.incrementAndGet()
          makeRequest()
        } catch {
          case NonFatal(e) => failedRequestCount.incrementAndGet()
        }

        def timeToLogStats: Boolean = (now - lastLog.get()) > 15.seconds

        if (timeToLogStats) {
          lastLog synchronized {
            if (timeToLogStats) {
              logStats()
              lastLog.set(now)
            }
          }
        }
      }
    }
  }
}

Remote Control

Under normal circumstances, the sleeper agents simply watch out for a particular property in Archaius. If that property is set, the sleeper agents wake up and start attacking the target microservice with requests. For safety, the code includes a check to prevent it from being activated in production. A different configuration property controls the amount of load generated.
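To make the activation rules easier to see, here is a minimal sketch of the decision that checkForConfigurationUpdate() makes, pulled out as a pure function. The names CellAction, RemoteControl and decide are hypothetical and not part of the original code, and the assembly name "receiver" in the usage note is only an example. The branches themselves follow the base class above.

```scala
// Hypothetical names for illustration; the branches mirror
// checkForConfigurationUpdate() in the SleeperCell base class.
sealed trait CellAction
case object Activate extends CellAction
case object GoToSleep extends CellAction
case object Restart extends CellAction
case object NoChange extends CellAction

object RemoteControl {
  def decide(
      isProd: Boolean,
      activatedAssemblies: List[String],
      assemblyName: String,
      activeAgents: Int,
      desiredAgents: Int): CellAction = {
    // Safety check: a cell never counts as activated in production.
    val cellActivated = !isProd && activatedAssemblies.contains(assemblyName)

    if (cellActivated && activeAgents == 0) Activate
    else if (!cellActivated && activeAgents > 0) GoToSleep
    else if (cellActivated && activeAgents != desiredAgents) Restart
    else NoChange
  }
}
```

For example, RemoteControl.decide(false, List("receiver"), "receiver", 0, 64) yields Activate, while the same call with isProd set to true yields NoChange.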

Load Generator

Sleeper agents are threads that call a custom makeRequest() function at a pre-set rate limit. Each cell contains a configurable number of agent threads. The number of threads can be changed at runtime (again, via the remote control).
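FixedRateLimiter is an internal class whose implementation is not shown here, so the following is only a sketch of how such a loop could behave, assuming a simple fixed-window limiter. SimpleRateLimiter and LoadGeneratorSketch are hypothetical names, and the clock is injected so the loop can be simulated without real sleeping.

```scala
// Hypothetical stand-in for the internal FixedRateLimiter: allows at most
// `maxActions` per window of `windowMillis`, using an injected clock.
class SimpleRateLimiter(maxActions: Int, windowMillis: Long, clock: () => Long) {
  private var windowStart = clock()
  private var actionsInWindow = 0

  // Start a fresh window once the current one has fully elapsed.
  private def rollWindow(): Unit = {
    val t = clock()
    if (t - windowStart >= windowMillis) {
      windowStart = t
      actionsInWindow = 0
    }
  }

  def isActionAllowed: Boolean = synchronized {
    rollWindow()
    actionsInWindow < maxActions
  }

  def recordAction(): Unit = synchronized {
    rollWindow()
    actionsInWindow += 1
  }
}

object LoadGeneratorSketch {
  // Simulates one agent's loop for `seconds` of virtual time and returns
  // the number of requests it would have sent.
  def simulate(ratePerSecond: Int, seconds: Int): Int = {
    var now = 0L
    val limiter = new SimpleRateLimiter(ratePerSecond, 1000L, () => now)
    var sent = 0
    while (now < seconds * 1000L) {
      if (limiter.isActionAllowed) {
        limiter.recordAction()
        sent += 1 // makeRequest() would be called here
      } else {
        now += 50 // stands in for Thread.sleep(50)
      }
    }
    sent
  }
}
```

In the base class, each agent constructs its own limiter from the rate property, so the configured rate appears to apply per agent thread rather than to the cell as a whole.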

Measurements

Each of the sleeper agents logs a set of measurements every 15 seconds into the logs of its host, which we already collect into Sumo Logic. Based on the logs, we can aggregate the measurements and determine how our target behaved. Bonus tip: Log the settings of the load generator alongside the results, so you don’t need to track those externally.

2015-06-11 18:29:17,026 -0700 INFO [logger=scala.config.util.ConfigClientSleeperCell]  [settings: 64 threads, 10000 requests/s]  5979 requests sent in 15s at 398 requests/sec. 5696 requests failed.  (loadById: 4989, loadByUri: 5687, findByUriPattern: 119, failing loadById: 163)
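As a rough illustration of the bonus tip, a logStats() implementation could build its message along these lines. StatsLine and its parameters are hypothetical; the layout loosely follows the sample log line above, without the per-operation breakdown.

```scala
object StatsLine {
  // Hypothetical helper: one measurement line that carries the load
  // generator settings together with the results.
  def format(threads: Int, targetRate: Int, sent: Int, failed: Int, windowSeconds: Int): String = {
    require(windowSeconds > 0, "windowSeconds must be positive")
    val perSecond = sent / windowSeconds // integer division, rounds down
    s"[settings: $threads threads, $targetRate requests/s] " +
      s"$sent requests sent in ${windowSeconds}s at $perSecond requests/sec. " +
      s"$failed requests failed."
  }
}
```

Because the settings travel with every measurement, a single log query is enough to compare runs made with different thread counts or rates.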

Conclusion

This Sleeper Agent pattern was a quick and easy way to get a load test going. We’ve since replicated this a number of times, and all of our environments contain several sleeper cells.

Stefan Zier

Stefan was Sumo’s first engineer and Chief Architect. He enjoys working on cloud plumbing and is plotting to automate his job fully, so he can spend all his time skiing in Tahoe.