It was the shot heard round the hosting world. Last month, my good friends at RagingWire announced their latest offering, IronScale, which has the potential to fundamentally change the hosting business. At least, that’s what the press release and the voice mail I received from Doug Adams, their head of sales, claimed. Now, I’ve been doing business with RagingWire for almost 8 years, and I often tell people they have the best designed/built/run data center in Northern California, so I know they offer great services. I’m one of their only three-peat customers (I’ve put three different companies into their facility) and I’ve never been disappointed. Still, I tend to discount terms like “game-changing” as marketing fluff. I’m a “show-me” kind of guy. So they did.
Today I had the pleasure of an on-site demonstration and walkthrough of the IronScale service. I am impressed. On the surface, it is a typical managed server hosting offering. You rent one or more dedicated servers in their data center and they provide the operating system, network, internet bandwidth, security, etc. Pretty common stuff, and pretty boring. Why did I drive to Sacramento on one of the hottest days of the year for this (110F)? Well, you have to look beneath the surface, which I did, to see what they are really offering. And what I saw was awesome.
I’ve spent the past few days trying to develop a simple mathematical model to predict the expected availability of complex systems. In IT, we are often asked to develop and commit to service level agreements (SLAs). If the system’s points of failure are not analyzed and its availability is not calculated from them, the SLA is flawed from the beginning. To complicate matters further, different people have different definitions of availability. For instance, does scheduled downtime for maintenance count against your system availability calculation?
Common Availability Definitions:
Availability = MTBF/(MTTR+MTBF), where MTBF is the Mean Time Between Failures and MTTR is the Mean Time To Recover. This is a classic definition of availability and is often used by hardware manufacturers when they publish an availability metric for a given server.
Availability = (Uptime + Scheduled Maintenance)/(Unscheduled Downtime + Uptime + Scheduled Maintenance). This is an IT-centric availability metric in which scheduled maintenance does not count against availability, on the assumption that the business can tolerate scheduled downtime after hours. This model works for some types of systems, such as a file server that isn’t needed at night, but it doesn’t work as well for websites, even though many web companies still use it for their SLAs.
Availability = Uptime/(Uptime + Downtime). This metric best applies to systems that are needed 24×7 such as e-commerce sites.
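To see how much the choice of definition matters, here is a minimal Python sketch of the three formulas above. The function names and the sample month (720 hours, with 4 hours of scheduled maintenance and 2 hours of unscheduled outage) are my own illustrative assumptions, not measurements from any real system.

```python
def availability_mtbf(mtbf, mttr):
    """Classic definition: MTBF / (MTTR + MTBF). Both in the same time unit."""
    return mtbf / (mttr + mtbf)


def availability_it_centric(uptime, scheduled_maintenance, unscheduled_downtime):
    """IT-centric definition: scheduled maintenance counts as available time."""
    available = uptime + scheduled_maintenance
    return available / (unscheduled_downtime + available)


def availability_24x7(uptime, downtime):
    """Strict definition: any downtime, scheduled or not, counts against you."""
    return uptime / (uptime + downtime)


# Hypothetical 30-day month, all values in hours (illustrative numbers only).
total_hours = 720.0
scheduled = 4.0      # planned maintenance window
unscheduled = 2.0    # unplanned outage
uptime = total_hours - scheduled - unscheduled  # 714 hours

it_centric = availability_it_centric(uptime, scheduled, unscheduled)
strict = availability_24x7(uptime, scheduled + unscheduled)

print(f"IT-centric: {it_centric:.4%}")  # 99.7222%
print(f"24x7:       {strict:.4%}")      # 99.1667%
```

The same month of operation yields two noticeably different numbers, which is why an SLA should state which definition it uses.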
Availability is most often expressed as a percentage. Sometimes, people will refer to “four nines” (99.99%) or “five nines” (99.999%). To simplify things, the following table shows the minutes of downtime allowed per year for a given availability level: