"The Legend of Zelda: Tears of the Kingdom" (TOTK) is an award-winning game that was released on May 12, 2023. Fans were so excited that some took vacation or sick leave to play uninterrupted. The game sold more than 10 million units in three days! Zelda TOTK presents an open world where players, as Link (the protagonist), can explore, build, and solve puzzles. An interesting phenomenon emerged in which players found they could use logs (as in "long wooden objects") to solve challenges in the game.
Let’s look at parallels between the world of the game and the world of DevOps—where logs (as in “timestamped records of events”) are used to solve many puzzles as well. Some concepts hold true in both worlds, such as why logs are preferable to other tools, the limitations of logs, and how combining multiple logs is extremely useful.
If you’re only here because you read all things Zelda, let’s give you a primer on DevOps, observability, and logs.
DevOps is the practice that empowers developers to deliver applications, features, and services at high velocity.
An organization that embraces a DevOps approach relies heavily on observability. The state of the infrastructure, the CI/CD pipeline, and the code must be known and accessible, and observability provides this. An effective observability stack depends on data that comes from logs, metrics, and distributed traces. The focus of this article, of course, is on logs.
Of the different types of observability data, logs are often considered the most versatile and informative. Metrics can give you an overall picture, such as the number of errors encountered or the amount of CPU currently in use. But if you suddenly detect that the number of errors has increased, you still need to figure out what went wrong. Similarly, distributed tracing can tell you the path a request takes and the amount of time it spends along that path. But if the time spent in a particular component suddenly triples, you still need to dig deeper and figure out what's wrong.
Logs give you that detailed information. When logging is done diligently, you can extract rich analytics and augment them with metrics and distributed tracing.
In Zelda, the same sentiment applies. Sure, you have multiple tools and ways to solve problems and advance in the game. However, logs are the universal solution. This is so much the case that one player even tweeted: "I refuse to solve anything any other way than 'more logs.'"
Before we can use logs, we need to collect logs. In Zelda, you collect logs by chopping trees with your axe.
In the DevOps world, applications and system components typically write logs to files. In Kubernetes, containers write to standard output, and the container runtime captures the logs.
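For example, a service can emit one JSON object per line to standard output, which most log collectors can parse without extra configuration. Here is a minimal sketch using Python's standard logging module (the field names are illustrative, not a required schema):

```python
import json
import logging
import sys
from datetime import datetime, timezone

class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON line, easy for collectors to parse."""
    def format(self, record):
        return json.dumps({
            "timestamp": datetime.fromtimestamp(record.created, timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })

# Containers log to stdout; the runtime captures and stores the stream.
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("payments")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("charge processed")  # emits one JSON line to stdout
```

One record per line keeps downstream parsing trivial, even when many containers interleave their output.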
However, just having logs lying around is not enough. To benefit from logs, you need to transfer them to a central location so that you can later combine them to solve problems.
In Zelda, you can use Link's Ultrahand ability to grab logs and move them around. This is super useful when you need a lot of logs for a project, but trees are sparse. You can chop down trees in different areas and transfer all the logs you chopped to a central location. There, you can assemble them and build magnificent creations.
In the DevOps world, the equivalent is transferring, or ingesting, logs from the machines they are recorded on to a centralized logging solution. This is often called log aggregation. When you manage more than a handful of machines, log aggregation is a best practice; trying to review or extract log files from individual machines in an ad hoc manner doesn't scale.
Log aggregation is typically done by installing an agent on each machine. The agent watches the log files and sends them to the centralized logging solution. For example, Sumo Logic offers industry-standard OpenTelemetry collectors as well as installed collectors that perform this task.
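At its core, such an agent tails each log file: it remembers how far it has read and ships only newly appended, complete lines. Here is a toy sketch of that tailing logic in Python; real collectors also handle rotation, multiline records, and delivery retries, and `read_new_lines` is a hypothetical helper, not any collector's actual API:

```python
def read_new_lines(path, offset):
    """Return (complete_lines, new_offset) for content appended since `offset`.

    A real agent persists the offset so a restart doesn't re-ship old lines.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        chunk = f.read()
    # Ship only complete lines; a partially written last line
    # waits for the next poll.
    end = chunk.rfind(b"\n") + 1
    lines = chunk[:end].decode("utf-8").splitlines()
    return lines, offset + end
```

Each poll, the agent calls this with the last offset it saw, batches the returned lines, and forwards them to the aggregation backend.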
You may want to collect additional log information beyond local log files. Sumo Logic offers hosted collectors that can do the job without having to install anything on your organization’s machines.
We’ve collected our logs and aggregated them in one place. Next, we can combine them to solve puzzles and problems.
In Zelda, you combine logs by manipulating them with the handy Ultrahand ability (see what we did there?) and fusing them together. You can chain logs together to create huge bridges, put them next to each other to build rafts, or use them for anything else you can imagine.
With logs working together, you can cross bodies of water, climb slippery surfaces, or deliver items.
You can attach other objects to logs to create even more sophisticated solutions and solve more difficult problems. For example, you may attach a propeller to a few logs, yielding a fully functional raft.
In the DevOps world, combining data from different log files to get useful analytics is at the heart of investigating incidents and finding the root cause of problems. Within the domain of security, forensic analysis relies heavily on audit and access logs to identify breaches and culprits.
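A common pattern is joining log streams from different services on a shared correlation key, such as a request ID. A small illustrative sketch in Python (the field names and the `correlate` helper are invented for this example):

```python
def correlate(app_logs, db_logs):
    """Join two log streams on request_id to trace an error across services."""
    db_by_request = {}
    for entry in db_logs:
        db_by_request.setdefault(entry["request_id"], []).append(entry)
    # For every failed app request, pull the matching database-side entries.
    return {
        entry["request_id"]: db_by_request.get(entry["request_id"], [])
        for entry in app_logs
        if entry["level"] == "ERROR"
    }

app_logs = [
    {"request_id": "r1", "level": "INFO", "message": "ok"},
    {"request_id": "r2", "level": "ERROR", "message": "checkout failed"},
]
db_logs = [
    {"request_id": "r2", "level": "ERROR", "message": "deadlock detected"},
]
# correlate(...) links request r2's checkout failure to the database deadlock.
```

Centralized logging platforms do this join at query time over far larger volumes, but the principle is the same: a shared key turns isolated log lines into a story.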
Just like in Zelda—where you can combine logs with other objects like sails—in the DevOps world, you often combine logs and log analytics with other forms of observability data, like metrics and distributed traces. Augmenting and enriching the information you extract from logs gives you deeper insight and helps you solve more challenging problems.
Logs are great, but they are not a panacea. You must understand how to utilize logs efficiently to achieve gains with minimal pain. Let's look at some logging gotchas.
In Zelda, if you chop too many trees, your axe might get damaged and eventually break.
You can also lose logs if you drop them into the abyss.
Another problem occurs if you're overzealous when striking a log. You destroy it and end up with a pile of chopped wood, which can't be used as a log.
One other limitation of logs in Zelda is that you can't fuse more than 21 logs together.
Finally, some areas simply don't have trees that you can chop down and use as logs. Check out this dreary, icy landscape.
Let's turn our attention to the DevOps world, which has its own logging-related challenges.
In the DevOps world, you can lose and/or destroy logs just like in Zelda. There is a little bit more complexity to this, so let's unpack it.
You can store your logs in cloud storage, such as AWS S3 buckets. This mostly sidesteps the limits of local disk space, but it brings a host of other challenges, such as how to organize your logs (one bucket for all logs, one bucket per log, some partitioning inside some buckets, etc.) and how to hydrate them later for analysis. In addition, applications need to know where they are supposed to write their logs and must manage cloud API keys. For third-party open-source applications, changing the logging implementation is impossible without forking.
Logs on machines are usually stored on disks (local or network disks). Disks have limited space. If you're not vigilant and just keep writing more and more logs, you will eventually run out of disk space. This might be detrimental to your applications, which may also need to write to the disk. But it will definitely be a disaster for your logs. In many cases, failure to write logs will also cause your applications to fail, as logging might be treated as a critical function.
The best practice is to control your log file sizes. This is typically accomplished by log rotation, in which you periodically start writing to a new log file while old log files are deleted. Log rotation addresses the disk space issue, but if you want to inspect the deleted log files, you're out of luck. This is where log aggregation comes into play. You can ship your logs to a centralized logging solution, which likely has elastic storage. However, if there is any problem with log transfer (such as a misconfigured log agent or an unreachable log aggregator), then you might lose your logs again.
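On Linux, rotation is often handled by logrotate; many logging libraries can also rotate on their own. For instance, Python's standard library ships a size-based rotating handler:

```python
import logging
from logging.handlers import RotatingFileHandler

# Roll over at ~1 MB and keep 5 old files (app.log.1 ... app.log.5).
# Anything older is deleted, which is exactly why you want logs shipped
# to a central store before rotation discards them.
handler = RotatingFileHandler("app.log", maxBytes=1_000_000, backupCount=5)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))

logger = logging.getLogger("app")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("service started")  # written to app.log; rotated when it grows
```

When `app.log` exceeds `maxBytes`, it is renamed to `app.log.1` and a fresh file is started; once the backup count is exceeded, the oldest file is deleted.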
Let’s assume that you’ve set up everything correctly. All your logs are flowing smoothly, stored properly, and are available for query in your centralized logging solution. The next thing you know, the CFO is on the line asking why 30% of the IT budget is dedicated to logging. Yes, engineers want to record everything and keep it forever. However, in the real world, you need to assess the business value of comprehensively logging everything at microsecond granularity.
So, you hunker down and start reducing the retention period of some logs, the types of events that you capture, and the frequency and amount of captured data for some events. You’ve trimmed down how much you log. The logs you capture are still sufficient to diagnose most problems, but controlled enough to prevent gratuitous spending. The accountants are happy.
Not so fast! The phone rings again. This time, it's a conference call with the head of security and the head of legal. As it turns out, your logs contain credential information like passwords and API keys. In addition, you’re logging sensitive data like customer names, addresses, and email addresses. Your logging practices are violating data privacy policies and compliance requirements.
You grab your team and quickly implement a configuration and redaction system to keep your logs clean of sensitive data. All is well in the world once again.
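One way to implement redaction is at the source, before logs ever leave the process. Here is a sketch using a Python logging filter; the regex patterns are purely illustrative, and real redaction needs patterns tuned to your own secrets and PII:

```python
import logging
import re

# Illustrative patterns only; production redaction must cover API keys,
# tokens, names, addresses, and whatever else your policies require.
PATTERNS = [
    (re.compile(r"password=\S+"), "password=[REDACTED]"),
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[REDACTED_EMAIL]"),
]

class RedactionFilter(logging.Filter):
    """Scrub sensitive values from a record before any handler sees it."""
    def filter(self, record):
        msg = record.getMessage()
        for pattern, replacement in PATTERNS:
            msg = pattern.sub(replacement, msg)
        record.msg, record.args = msg, None
        return True

logger = logging.getLogger("audit")
logger.addFilter(RedactionFilter())
```

Filtering at the source is safer than scrubbing after ingestion, because the sensitive values never reach disk or the network in the first place.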
Logging and log analytics are useful and valuable practices in both the Zelda and DevOps worlds. Logs can be used on their own but benefit from aggregation and integration with other capabilities. In the Zelda world, the Ultrahand is your primary tool. In the DevOps world, a centralized observability platform is an essential tool for managing large-scale, distributed systems.
Learn more about how log analytics can help you troubleshoot issues fast so you can get back to playing more TOTK.