04.17.2012

RAFC: Internet Connectivity for the cloud age

Sumo Logic's Mountain View Data Center

RAFC - Redundant Array of Flaky Connections

We are big believers in Cloud Computing — 100% of our own infrastructure is in the cloud. In our first office, we learned that reliable and fast internet connectivity is absolutely crucial. When all your infrastructure is in the cloud, all work grinds to a screeching halt whenever connectivity is lost. In that office, we had a single, “business class” symmetric 10MBit link. In short, it sucked.

When we moved to our new 605 Castro Street office last year, we decided to try a different approach. We took design cues from web-scale applications: Pool commodity resources. Distribute load over the pool of resources. Anticipate failures. Scale horizontally. In concrete terms: 

  • Set up multiple consumer grade internet connections.
  • Buy a router that supports multiple-WAN load balancing and failover.
  • Add more consumer grade internet connections when more bandwidth is needed.

So instead of a single symmetric 10MBit link, we ordered:
  • 100MBit/10MBit cable modem connection (Comcast).
  • 25MBit/5MBit bonded DSL connection (Sonic.net). 

We call it “RAFC”, or “Redundant Array of Flaky Connections”. Combined, these two connections cost around $530/mo (with free Cable TV!), or about 65% less than our previous connection, which ran $1,500/mo. Instead of 10/10MBit, we now have 125/15MBit.
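For anyone who wants to check the arithmetic, here is a minimal sketch in Python that adds up the two links and works out the savings. The numbers are the ones quoted above; the variable names are just for illustration.

```python
# Figures quoted in the post; names are illustrative only.
old_cost_usd = 1500   # previous symmetric 10MBit link, per month
new_cost_usd = 530    # Comcast + Sonic.net combined, per month

# (downstream, upstream) in MBit for each new connection
links = {
    "comcast_cable": (100, 10),
    "sonic_bonded_dsl": (25, 5),
}

total_down = sum(down for down, _ in links.values())
total_up = sum(up for _, up in links.values())
savings_pct = (old_cost_usd - new_cost_usd) / old_cost_usd * 100

print(f"{total_down}/{total_up}MBit, {savings_pct:.1f}% cheaper")
```

Note that the combined figure is aggregate capacity across the pool. A single connection still rides on one link, so it tops out at that link's speed.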

The trickiest part was finding a multi-WAN router we liked. After trying a FortiNet Fortigate box, a Cisco ASA, and a Netgear “business class” box, we settled on a unit made by a company called Peplink.

Peplink’s entire business is built around doing multi-WAN routers right, and it shows: the box is impressive. It’s very easy to set up, supports a rich set of features, doesn’t crash, and has great monitoring capabilities (including syslog, which we feed into Sumo Logic). The Peplink also still has a 3rd WAN port for future growth, which gives us horizontal scalability. When we discovered that one of our connections was more reliable, while the other one was faster, we adjusted the outbound rules on the Peplink accordingly: SSH connections use the reliable connection, and S3 transfers use the fast one. Most other traffic is load balanced in proportion to the uplink/downlink available on each connection. When one connection fails, all traffic fails over to the other one. This took about 5 minutes to set up.
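To make the outbound rules concrete, here is a rough sketch of the decision logic in Python. This is not Peplink's configuration format or API (the real rules are set up on the box itself); it only models the behavior described above. Several things in it are assumptions: the `Wan` class and `pick_wan` function are hypothetical names, S3 traffic is recognized by a hostname ending in `amazonaws.com`, and each link's weight is simply its downstream plus upstream bandwidth.

```python
import random
from dataclasses import dataclass


@dataclass
class Wan:
    """One uplink in the pool (hypothetical model, not a Peplink object)."""
    name: str
    down_mbit: int
    up_mbit: int
    healthy: bool = True


def pick_wan(dest_host, dest_port, wans, reliable, fast, rng=random):
    """Choose an uplink for a new outbound connection."""
    alive = [w for w in wans if w.healthy]
    if not alive:
        raise RuntimeError("all WAN links are down")

    # Rule 1: SSH sessions are long-lived, so prefer the reliable link.
    if dest_port == 22 and reliable.healthy:
        return reliable

    # Rule 2: bulk S3 transfers prefer the fast link.
    # Assumption: S3 is identified by hostname suffix.
    if dest_host.endswith("amazonaws.com") and fast.healthy:
        return fast

    # Default: weighted random choice over healthy links only.
    # Assumption: weight = downstream + upstream bandwidth.
    # If a preferred link above is down, traffic also lands here,
    # which is the failover case.
    weights = [w.down_mbit + w.up_mbit for w in alive]
    return rng.choices(alive, weights=weights, k=1)[0]


comcast = Wan("comcast", down_mbit=100, up_mbit=10)
sonic = Wan("sonic", down_mbit=25, up_mbit=5)
pool = [comcast, sonic]
```

With both links up, `pick_wan("example.com", 22, pool, reliable=sonic, fast=comcast)` returns the Sonic link, and an S3 hostname returns the Comcast link. Set `comcast.healthy = False` and everything goes out over Sonic. Which link plays which role here is our reading of the bandwidth figures; swap the `reliable` and `fast` arguments if your links behave differently.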

At this point, this setup supports more than 30 users in our office, and while we have connection outages almost daily, nobody notices. Connectivity has not been an issue in months. 

  1. R says:

    Just purchased a Peplink Balance 580 and 3 x Peplink Balance 310′s for our business also… Am completely content and happy with the devices not just living up to the expectations, but surpassing them.

    Love the “RAFC” Acronym too…

  2. Sumo says:

    > we now have 125/25MBit

    10 +5 = 25 upstream?

    • Stefan Zier, Cloud Infrastructure Architect says:

      Haha. Glad somebody’s paying attention :) Fixed! Thanks for pointing it out!

  3. Glenn Beeson says:

    RAFC. I cannot believe how unbelievably accurate that is. New favorite acronym at work. Yes – I have favorite acronyms for other places as well.

  4. lex ein says:

    Good bandwidth for the buck, and good decision picking Annex M for the Sonic lines.
    How bad, really, were the FortiNet Fortigate box, the Cisco ASA, and the Netgear “business class” box?
    In your neighborhood, how often does the Sonic bonded DSL (ADSL2+, right?) fallback to single channel or none, compared with how often the Comcast cable slows/blacks out? It would of course be great to see a 1 week graph showing their comparative availability, but that’s extra work…

    • Stefan Zier, Cloud Infrastructure Architect says:

      The other boxes were really ok for the most part, but all fell short when it came to the outbound load balancing. One big issue with most of them was that you couldn’t “stick” outbound SSL traffic to one of the outgoing WAN connections. Many sites break as a result (when your IP address keeps flip/flopping).

      In terms of failures, the Comcast link is the clear “winner”. The initial hardware they gave us (made by SMC) would crash almost daily. After months we convinced them to let us use our own Motorola SB6120 modem, and that’s been working much better. Still, it locks up about every other week or so. Not a big deal, though, the Peplink emails me, and when I go downstairs anyways I power cycle it on my way to the espresso machine :) Nobody usually notices.

      In fact, every once in a while we go without the Comcast link for a few days until I remember to turn it back on :)
