Is your data center ready for the coming zombie apocalypse? Data center designers generally do a good job preparing for conventional risks, like earthquakes, fires, floods and hurricanes, but if your disaster recovery plan doesn’t include provisions for dealing with the undead, your risk mitigation strategy has a gaping hole. Data centers are a natural refuge from zombie hordes, but only if you prepare in advance.
Unlike conventional disaster recovery (DR)/business continuity planning (BCP), zombie preparedness has a unique set of goals beyond data protection and business resumption. Recovery point and recovery time objectives (RPO/RTO) go out the window when there’s a zombie chewing on your skull. I generally recommend hiring a zombie specialist to develop your zombie survival plan (ZSP), but there are steps you can take on your own.
Start by establishing the goals for your ZSP. For most organizations, ZSP goals will fall into 5 categories:
Containment – Keep the zombies out
Endurance – Stay alive until the zombies are gone
Sustenance – Don’t go hungry
Eradication – Kill every zombie you find
Repopulation – Breed new humans for the continuation of the race
A good ZSP is measurable and testable. Data centers are used to measuring availability and power usage effectiveness (PUE). Your ZSP needs a similar metrics program. A best practice is to assign weighted values to your ZSP goals, measure them quarterly, and report to executive management on your composite zombie protection effectiveness (ZPE) score.
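As a sketch of what that metrics program might look like, here is one way to compute a weighted composite ZPE score in Python. The weights below are my own illustrative assumptions, not a standard; adjust them to your organization's risk appetite.

```python
# Illustrative weights per ZSP goal (assumed values, not a standard);
# they must sum to 1.0 so the composite stays on a 0-100 scale.
ZSP_WEIGHTS = {
    "containment": 0.30,
    "endurance": 0.25,
    "sustenance": 0.15,
    "eradication": 0.15,
    "repopulation": 0.15,
}

def zpe_score(goal_scores):
    """Composite ZPE: weighted sum of per-goal scores (each 0-100)."""
    return sum(ZSP_WEIGHTS[goal] * score for goal, score in goal_scores.items())

# Example quarterly report: strong fences, weak breeding program.
quarterly = {
    "containment": 95,
    "endurance": 80,
    "sustenance": 70,
    "eradication": 60,
    "repopulation": 40,
}
print(f"ZPE score: {zpe_score(quarterly):.1f}")
```

A perfect quarter (100 on every goal) scores 100; anything under your executive threshold triggers remediation before, not after, the outbreak.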
Choosing a data center is a big decision for most companies. Your IT infrastructure represents a critical asset for your company, and unless you are an uber-dot com company like Google or Facebook (which spread their gear around the country in tens of locations), you probably only have one or two data centers. Changing data centers is expensive and time consuming, so choosing the right data center partner is incredibly important.
Unfortunately, data centers don’t make it easy for you to differentiate between them. Everyone says they are “secure,” “highly available,” and “high density.” They all show you their generator farms, their battery rooms, and their security vestibules with bulletproof glass. Tour any three data centers and you’ll be left scratching your head trying to figure out what the difference is. As a result, many people end up using price and proximity as the primary decision points. Or even worse, they treat non-material amenities like free sodas and Xboxes in the break room as the deciding factor.
There are critical differences, however, between data centers. Failing to recognize them can cost you more in the long run than any savings you might glean by choosing the low-cost provider. Having purchased services from a multitude of data centers over the last two decades, and having dealt with even more as an IT consultant, I’ve learned to recognize some of the hard-to-spot differences that can make or break a long-term data center relationship. For simplicity (so you can copy/paste into your next RFP), I’ve listed the 10 questions you should ask your next data center below. A detailed explanation of each question follows, so you know what you should look for. I hope you find this list informative.
10 questions to ask your next data center provider
Which components of the data center facility are both fault tolerant and concurrently maintainable?
How are cooling zones provisioned to maintain operating temperatures during maintenance or failures of CRAC/CRAH units?
What are the average and maximum power densities of the facility on a watts per square foot and watts per cabinet basis?
How often does the data center load test its generators?
What are the highest risk natural disasters for the area, and what has the data center done to mitigate their impact?
What are the minimum skill sets of the remote hands and eyes staff?
Does the data center maintain multiple redundant sources of fuel and water?
What certifications has the data center earned, and do they undergo annual audits to maintain them?
How does the data center track SLA compliance, and what is their historical track record? Can they provide their last 5 failure reports?
What is the profile of their top 5 clients, and what percentage of total revenue for the facility do they represent?
It was the shot heard round the hosting world. Last month, my good friends at RagingWire announced their latest offering, IronScale, which has the potential to fundamentally change the hosting business. At least, that’s what the press release and the voice mail I received from Doug Adams, their head of sales, claimed. Now, I’ve been doing business with RagingWire for almost 8 years, and I often tell people they have the best designed/built/run data center in Northern California, so I know they offer great services. I’m one of their only three-peat customers (I’ve put three different companies into their facility) and I’ve never been disappointed. Still, I tend to discount terms like “game-changing” as marketing fluff. I’m a “show-me” kind of guy. So they did.
Today I had the pleasure of an on-site demonstration and walk-through of the IronScale service. I am impressed. On the surface, it is a typical managed server hosting offering. You rent one or more dedicated servers in their data center and they provide the operating system, network, internet bandwidth, security, etc. Pretty common stuff, and pretty boring. Why did I drive to Sacramento on one of the hottest days of the year (110°F) for this? Well, you have to look beneath the surface, which I did, to see what they are really offering. And what I saw was awesome.
I’ve spent the past few days trying to develop a simple mathematical model to predict the expected availability of complex systems. In IT, we are often asked to develop and commit to service level agreements (SLAs). If the points of failure of the system are not analyzed, and then the system availability calculated, the SLA is flawed from the beginning. To complicate matters further, different people have different definitions of availability. For instance, does scheduled downtime for maintenance count against your system availability calculation?
Common Availability Definitions:
Availability = MTBF/(MTTR+MTBF) (Mean Time Between Failure, Mean Time To Recover). This is a classic definition of availability and is often used by hardware manufacturers when they publish an availability metric for a given server.
Availability = (Uptime + Scheduled Maintenance)/(Unscheduled Downtime + Uptime + Scheduled Maintenance). This is an IT centric availability metric where the business can support scheduled downtime after hours. This model works for some types of systems, such as a file server that isn’t needed at night, but it doesn’t work as well for websites, even though many web companies still use this for their SLAs.
Availability = Uptime/(Uptime + Downtime). This metric best applies to systems that are needed 24×7 such as e-commerce sites.
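To make the differences between these definitions concrete, here is a minimal Python sketch of all three. The function names are my own; inputs just need to be in consistent time units (hours, below).

```python
def availability_mtbf(mtbf_hours, mttr_hours):
    """Classic hardware definition: MTBF / (MTTR + MTBF)."""
    return mtbf_hours / (mttr_hours + mtbf_hours)

def availability_with_maintenance(uptime, scheduled_maint, unscheduled_down):
    """IT-centric definition: scheduled maintenance does not count against you."""
    return (uptime + scheduled_maint) / (unscheduled_down + uptime + scheduled_maint)

def availability_24x7(uptime, downtime):
    """Strict 24x7 definition: every minute of downtime counts."""
    return uptime / (uptime + downtime)

# A server with an MTBF of one year (8,760 h) and a 4-hour MTTR:
print(f"{availability_mtbf(8760, 4):.5%}")
```

Note how the same outage history yields different numbers under each definition, which is exactly why an SLA must state which formula it uses.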
Availability is most often expressed as a percentage. Sometimes, people will refer to “four nines” (99.99%) or “five nines” (99.999%). To simplify things, the following table shows the minutes of downtime allowed per year for a given availability level:
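The arithmetic behind such a table is simple: minutes of downtime per year = (1 − availability) × 525,600. A quick sketch to generate the figures:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 (non-leap year)

for nines in ("99%", "99.9%", "99.99%", "99.999%"):
    availability = float(nines.rstrip("%")) / 100
    downtime_min = (1 - availability) * MINUTES_PER_YEAR
    print(f"{nines:>8} -> {downtime_min:,.2f} minutes of downtime/year")
# "four nines" allows about 52.56 minutes/year; "five nines" only about 5.26
```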