Monday, 29 October 2012

Monitoring Data Centers in the path of Hurricane Sandy

Using The NOC (www.onms.net), I was able to monitor a number of data centers in the path of Hurricane Sandy, and the results highlighted something I had never really considered...

When the charts indicated a data center had become unreachable during Sandy, I would try to access it from my local machine and check its status updates on Twitter or its status page when available. In most cases where the charts showed a data center as offline, I could find a status update confirming it. However, NYI (www.nyi.net), which was actually online the entire time, showed a clear drop-off in my charts...
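For what it's worth, that manual cross-check is easy to script. Here is a minimal sketch in Python, assuming a plain TCP connect to port 80 is a good enough proxy for "reachable"; the host is just an example:

    import socket

    def is_reachable(host, port=80, timeout=5.0):
        """Return True if a TCP connection to host:port succeeds within timeout."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:  # covers refused connections, timeouts and DNS failures
            return False

    # Compare this against what the monitoring charts claim:
    print(is_reachable("www.nyi.net"))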

After a quick investigation using BGPlay (http://bgplay.routeviews.org/), it seems that various links between my monitoring system in Kansas and the data centers were offline or flapping due to Sandy, and this was clearly influencing my access to NYI, even though other routes to NYI stayed online the entire time (clearly the reason for Netcraft's article on NYI's 100% uptime). NYI definitely did an amazing job; my main points, from a pure monitoring perspective:

1. Have more than one monitoring location if possible, especially as one location may give you the all clear while others indicate problems (not that NYI could do much about this in any case); a rough sketch of this idea follows the list.

2. Keep in mind your BGP peers are uber important. Similar to what happened during Hurricane Katrina, a data center may remain online while its peers fail, especially due to fuel shortages; during Katrina one data center even delivered fuel to a peer to keep them online. Perhaps something to consider in planning (the second sketch below shows one way to keep an eye on route visibility).

3. If your data center offers monitoring of your servers/services, it should by no means be your only monitoring service; if the entire data center goes down, chances are you won't be receiving any notifications.
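As promised in point 1, here is a rough sketch of the multi-location idea: only treat a target as down when every vantage point agrees, and flag disagreement as a likely routing problem rather than a real outage. The per-location results below are hypothetical; in practice each location would be an agent in a different network reporting its own check:

    from collections import Counter

    def classify(results):
        """Map per-location reachability results to an overall status."""
        counts = Counter(results.values())
        if counts[False] == len(results):
            return "down"     # every location agrees: likely a real outage
        if counts[True] == len(results):
            return "up"
        return "partial"      # mixed results: suspect routing, not the data center

    # Hypothetical results from three vantage points during the storm:
    print(classify({"kansas": False, "london": True, "frankfurt": True}))  # -> "partial"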

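And for point 2 (and the BGPlay digging above), route visibility can be scripted too. This sketch queries the RIPEstat Data API's routing-status endpoint; the endpoint and the shape of its JSON response are assumptions about that public API rather than anything from my setup, and the prefix is a placeholder:

    import json
    import urllib.request

    PREFIX = "198.51.100.0/24"  # placeholder: substitute the prefix you announce

    # routing-status reports (among other things) how many RIS route collectors
    # currently see the prefix; a drop here while your own checks still pass
    # points at a routing problem rather than a true outage.
    url = "https://stat.ripe.net/data/routing-status/data.json?resource=" + PREFIX
    with urllib.request.urlopen(url, timeout=30) as resp:
        payload = json.load(resp)

    # The field names are assumptions about the API's JSON layout, hence the .get()s.
    print(json.dumps(payload.get("data", {}).get("visibility", {}), indent=2))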