ICYMI, this year’s Monitorama marked the return of the in-person event after a pandemic hiatus. In Part 1 of this series, I shared what it was like to navigate a tech conference in the post-pandemic world and the most engaging themes of the conference, including tracing, SLIs, OpenTelemetry, and more. Now for Part 2, let’s dive into the talks.
As usual, the Monitorama talk selection team did a bang-up job. Every talk was interesting, but a few jumped out at me for some very specific reasons. I’ll broadly categorize these as the best talks (talks that gave me hope and inspiration), the worst talks (talks that terrified or depressed me), and the beautiful talks (talks that captured something beautiful).
The best talks
These talks gave me hope and inspiration.
Sophia Russell (@girlnamedsophia) - The little SLI that could
This talk struck me as very genuine as well as very pragmatic. Sophia described her journey to drive SLI adoption within her company using a code-first methodology and, like The Little Engine That Could, doing what you can, when you can, and continuing to try even when the end goal seems too far away to see.
Jonathan Perry (@yonchco) - Practical eBPF: How to build service-aware network monitoring
I’ll be honest and say that I’ve been looking forward to this talk since last August when I learned about the advancements in eBPF and observability during Jaana Dogan’s talk at the eBPF Summit 2021. The energy in the eBPF community is at 11 right now and this technology will unlock an entirely new level of galaxy-brain in our industry. The most profound thing Jonathan said was his poignant appeal to us, as a community, to build a better database: “we need databases that can handle [...] 100x more data points at 100x less cost per data point”. Of course, the problematic database he was using was Prometheus. Ahem.
Steve Flanders (@smflanders) - Open Telemetry
Much can be said about OpenTelemetry, and much was said. I’m just happy that it got a spotlight talk. The future is open, y’all, and we will get nowhere fast without an open standard built into the core of all observability pipelines. Let’s make that the big OKR for Q1 2023, if it isn’t already.
Suman Karumuri (@mansu) - KalDB: A k8s native log search platform
This one was more of a personal victory for me than, I think, it was for most of the folks at Monitorama (or most of the folks reading this blog). See, I worked at Twitter for many years and used internal products like Viz, Zipkin, and Loglens, all of which Suman had a hand in building. Also, I’m a search engine nerd. Seeing Suman on stage talking about building the database/search engine at Slack that we 100% should have built at Twitter (but couldn’t for various organizational reasons) was vindicating and exciting, to say the least.
But even without the post-Twitter angst that Suman and I share, you will be excited by the incredible engineering going into KalDB, a next-level distributed search engine built on top of Apache Lucene. Suman is building something deeply important to anyone who cares about logs.
The worst talks
These talks terrified or depressed me.
Note that “the worst” doesn’t mean it was a bad talk, or that the speaker wasn’t amazing, but that the specific contents of the talk left me feeling… disturbed.
Adrian Cockroft (@adrianco) - Monitoring Carbon
Closing out the first day of the conference was a talk from Adrian Cockroft. If you don’t know his name, you absolutely should. He has one of the most brilliant minds in computing of the past few decades. We had the honor of seeing him speak just days after he announced his retirement.
Most recently, Adrian was VP of Sustainability Architecture at AWS. His talk covered a growing concern: calculating the carbon impact of a business. This is already something that companies in the EU need to report on, and soon every public company in the US will too. It’s as serious as taxes and, in fact, works a lot like taxes. If your company is responsible for too much carbon, you’ll need to pay it down by buying carbon offsets. Adrian’s talk covered all of this, dove deep into the methods used to calculate this debt, and examined some of the big challenges surrounding our industry’s role in climate change.
To me, this talk was terrifying because, the way I see it, we are doing too little, too late to address climate change. We can’t legislate and litigate our way out of this. Large companies like AWS are in fact doing a lot to offset their carbon impact, but I was still disappointed: I was hoping to hear that they were taking a much more aggressive approach, and that government entities were going to set much more rigorous standards. If climate change doesn’t scare the pants off you, then you aren’t paying attention.
Alex Hidalgo (@ahidalgosre) - Meaningful Measurements: Lessons from Outside of Tech
Alex’s talk was a little less species-level-existential-crisis terrifying than Adrian’s, but it covered a wide range of human tragedies caused by monitoring and control system failures: a litany of high-body-count train crashes, plane crashes, and more. What I loved about this talk was the way it made what we do very real, and how it gently intertwined the concepts of control theory and observability (aka cybernetics) with real-world examples. I think the Next Big Word™ is going to be cybernetics (joke’s on you, it’s already the big word, because Kubernetes and Cybernetics are the same word).
Clint Sharp (@clintsharp) - Universal Observability Requires Universal Instrumentation
This talk covered an interesting new tool called AppScope. And by interesting I mean absolutely horrifying. Why? Because AppScope wears the grayest of gray hats. It is a hacker’s approach to instrumentation. The tool, which describes itself as “like strace meets tcpdump”, can automatically instrument any piece of code (including curl) by “loading a library into the address space of a process and establishing a number of interposed functions”. Doing this requires root access and uses ptrace to accomplish said wizardry.
Why is this horrifying? Because AppScope makes it easy and accessible to expose a massive security hole. As my co-worker Jef said, “this is basically what SELinux was designed to prevent”.
Harish Dixit - Detecting silent errors in the wild
Now, for the pinnacle of existential techno-psycho-horror, we have a representative from Meta (aka Facebook). No, I’m not talking about the extremely creepy future that they are pushing, one in which, without legs, we live out our lives in a pixel-starved 3D meeting room, doing exactly what we’ve always done, but somehow worse, somehow less human, and more deeply alienating than our current world of Zoom-fatigued isolation… No, instead Harish brought us a new reality-shaking truth that will deeply upset anyone who has ever used a computer or like, shown their work on a 5th grade math test.
Sometimes 1+1 does not equal 2. Why? Because sometimes the processors that we so deeply rely on, and so fervently trust, have minor manufacturing flaws in the silicon. Sometimes they return wrong results without signaling any error at all. Sometimes, and only under certain conditions, with certain math problems, at certain temperatures, they will do math wrong. And most of the time, that goes entirely unnoticed. Unless you happen to be building a product like Facebook, which runs racks and racks of what should be identical servers that sometimes turn out not to be identical and, strangely, produce unexpected incorrect results.
I’m used to bugs being part of code. That’s human error. We know and love this fallibility. It’s one of the things that sets us apart from our beloved soon-to-be-overlord machines. But I am not used to living in a world where my calculator can’t be trusted to do basic math correctly all the time, every time. Harish walked us through this unfortunate truth, and showed how finding and remedying this is extremely, extremely difficult. Harish is a master of debugging and his description of the process used to find this problem was truly fascinating… and left me deeply shaken.
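To make the detection idea a bit more concrete, here is a rough Python sketch of the general “known-answer test” shape: run a deterministic computation whose correct result is known in advance, and flag any run that disagrees. This is not Meta’s tooling, and every name in it (`reference_workload`, `expected_result`, `flaky_workload`, `screen`) is hypothetical; the “faulty CPU” is simulated in software by flipping a bit.

```python
from typing import Callable


def reference_workload(n: int) -> int:
    # Deterministic integer arithmetic: the sum of i*i for i in [0, n).
    total = 0
    for i in range(n):
        total += i * i
    return total


def expected_result(n: int) -> int:
    # Known-good answer computed independently via the closed form
    # (n-1) * n * (2n-1) / 6.
    return (n - 1) * n * (2 * n - 1) // 6


def flaky_workload(n: int) -> int:
    # Hypothetical stand-in for a defective core: returns an answer with
    # its lowest bit flipped and raises no error, i.e. a *silent* error.
    return reference_workload(n) ^ 1


def screen(workload: Callable[[int], int], n: int = 10_000, runs: int = 5) -> list[int]:
    """Run the workload repeatedly and return the indices of the runs
    whose result disagrees with the known-good answer."""
    expected = expected_result(n)
    return [run for run in range(runs) if workload(n) != expected]


print(screen(reference_workload))  # []
print(screen(flaky_workload))      # [0, 1, 2, 3, 4]
```

Real screening is far harder than this: tests have to be pinned to individual cores and exercise specific instructions, temperatures, and data patterns, none of which plain Python controls. The sketch only shows why having a known-good answer is what makes a silent error visible in the first place.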
The beautiful talks
These talks had moments of beauty and humor that I really appreciated.
Pete Cheslock (@petecheslock) & Jason Dixon (@obfuscurity) - An Industry Retrospective: Through the lens of Monitorama sponsor logos
Due to an unfortunate last-minute cancellation by one of the scheduled speakers, there was a talk slot to fill on Tuesday afternoon. The two main organizers of the conference, Jason and Pete, decided to use that time to give a funny, beautiful, and heart-warming presentation about the history of Monitorama, told through the logos of their sponsors.
I was full of pride seeing this, because Sensu has been a part of Monitorama since the beginning, offering not just sponsorship, but also lots of practical volunteer support (including at least two volunteers this year!).
Another theme in the talk was how many of the companies from the early days had been acquired by other companies. As the years progressed, you’d see their logo get an addition of “by XYZ” tacked on post-acquisition. This was funny for us because Sumo Logic and Sensu have both been sponsors in the past, but this year was the first Monitorama where our logo carried the “by Sumo Logic” addition.
Also, regular conference-goers will recognize Caleb Hailey (@calebhailey), Sensu’s co-founder, from his many moments on stage in previous years. This year, he took the stage representing Sumo Logic for the first time and, after Corey’s talk, was the first to challenge the notion that vendors are not embracing OpenTelemetry.
Sensu’s past is deeply intertwined with Monitorama’s, and Sensu’s future with Sumo Logic is deeply intertwined with OpenTelemetry, which is at the core of all of our future product plans.
Honorable Mention: On the topic of “recently acquired companies”, Austin Parker (@austinlparker) of recently acquired Lightstep said, “you might have heard of our glorious corporate acquirers, ServiceNow. They’re great. I love them. <clears throat>”... in perhaps the most unconvincingly flat monotone I have ever heard. Responding to the crowd’s laughter, Austin countered with “No really, they’re good people to work for. Everyone thinks that’s a bit. It’s not a bit! Okay...” Austin, also an OpenTelemetry maintainer, was disappointed with Corey’s hot take on OTel, saying “He’s officially off my Christmas card list!”
Joy Scharmen (@peculiaire) - Starting smart and planning for growth; what I wish I knew
I really enjoyed Joy’s story of how she came to work in the tech industry. In her own words, “I have been working in systems and network engineering since 1996, when I dropped out of my art degree and just kept working in the computer lab at my college”... Sooo relatable. I wonder how many of us are frustrated artists working in tech, vs frustrated engineers working in art?
Also, in terms of beauty, Joy’s slides were very colorful and engagingly simple. I loved the visual aesthetic she used for them. Putting that unfinished art degree to good use!
Honorable Mention: Since I’m talking about slides, Adrian Cockroft’s slides were also very beautiful. They featured a background of the night sky with the moon slowly moving across it as the slide deck progressed. Moon-itorama?
Leon Adato (@leonadato) - Technical Empathy
There was so much about Leon’s talk to discuss that it’s hard to know where to start. His talk was funny, irreverent, engaging, and full of important insights. One of the quotes that really stood out to me was when he was discussing the need for accessible design in all our products:
“If you’re thinking that you’re going to go back to the office and somebody there is going to say ‘Well, I don’t know. There’s just not enough demand’... F*** YOU, THERE IS!”
That’s right, Leon. That’s the kind of empathy we’re looking for. It’s worth a few F-bombs to encourage folks to be more inclusive and build accessibility in from the start. There’s a huge demand.
Fred Moyer (@phredmoyer) - SLIs, SLOs, and Error Budgets at Scale
I loved Fred’s talk, especially the discussion of error budgets. But the moment that made it beautiful to me was when he flipped to a slide reading only “Latency AND Availability”, which captures the essence of our observability work in the simplest, most boiled-down terms possible.
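Here is a minimal Python sketch of one way to read “Latency AND Availability” as a single SLI: a request only counts as good if it succeeded and was fast enough, and the error budget is whatever share of allowed bad requests is left. The `Request` type, the 300 ms threshold, treating 5xx responses as unavailable, and the 99.9% SLO are all hypothetical choices for illustration, not values from Fred’s talk.

```python
from dataclasses import dataclass


@dataclass
class Request:
    status: int        # HTTP status code returned to the client
    latency_ms: float  # end-to-end latency in milliseconds


def is_good(req: Request, threshold_ms: float = 300.0) -> bool:
    # Good only if the request was available (no 5xx) AND fast enough.
    return req.status < 500 and req.latency_ms <= threshold_ms


def sli(requests: list[Request], threshold_ms: float = 300.0) -> float:
    # Fraction of good requests; an empty window counts as fully good.
    if not requests:
        return 1.0
    good = sum(is_good(r, threshold_ms) for r in requests)
    return good / len(requests)


def error_budget_remaining(requests: list[Request], slo: float = 0.999,
                           threshold_ms: float = 300.0) -> float:
    # Share of the error budget (1 - slo) not yet consumed in this window:
    # 1.0 means untouched, 0.0 or below means the budget is spent.
    if not 0 < slo < 1:
        raise ValueError("slo must be strictly between 0 and 1")
    if not requests:
        return 1.0
    allowed_bad = (1 - slo) * len(requests)
    bad = sum(not is_good(r, threshold_ms) for r in requests)
    return 1 - bad / allowed_bad


reqs = [Request(200, 120), Request(200, 450), Request(503, 80), Request(200, 90)]
print(sli(reqs))  # 0.5
```

In that example, one request was too slow and one failed, so neither counts as good, even though each would pass a latency-only or an availability-only check. That is the point of the AND.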
I’m so glad that Monitorama 2022 happened, and that I was able to attend without the need to travel to another city. It was awesome to finally meet some of my coworkers and favorite industry peers in person, and it was inspiring to see how much is still happening in our space.
Bringing people together is very important, and these kinds of events are sorely missed. I hope we can find a way to keep doing this safely in the future, ideally with better ventilation, more observing, more OpenTelemetry, and less Prometheus.