Running infrastructure at Swiggy’s scale requires deep insight into what the systems are doing and how they are performing, so that service delivery stays seamless. This means being able to look at the metrics, logs, and events on each system and server, examine them, discover early warning signals of something going wrong, and intervene quickly.
To do this, Swiggy needed infrastructure that let them collect, store, and analyze logs as they came in, and be alerted on conditions that emerged from that analysis. They started off with their own in-house infrastructure, but began seeing problems as they hit scale. This prompted them to consider which core technological innovation they wanted to focus on. Was building logging as an in-house capability that important to them, or would it be better served by an appropriate off-the-shelf solution?
Swiggy was looking for a solution that would eliminate their operational overhead, scale with their requirements, be easy to use and operate, and give them visibility into their infrastructure: logs, metrics, monitoring, and alerting. With these requirements, they realized that a powerful off-the-shelf solution was the answer. It was from this perspective that they selected Sumo Logic.
Swiggy’s primary use of Sumo Logic is to ensure that their systems are always up and always serving customers, and that they are in constant control of what is happening. Every engineer at Swiggy has access to Sumo Logic logs. Any traces that the system emits allow them to detect something going wrong, especially as an early warning. The observability lens that Sumo Logic gives them provides a critical overview, day in and day out, both for routine monitoring and for debugging when an emergent problem presents itself. The observability lens also helps Swiggy with long-term planning and optimization, by showing where an operation could be better.