October 29, 2007

In Search of Five 9s – Calculating Availability of Complex Systems

I’ve spent the past few days trying to develop a simple mathematical model to predict the expected availability of complex systems. In IT, we are often asked to develop and commit to service level agreements (SLAs). If the system’s points of failure are not analyzed and the system availability calculated from them, the SLA is flawed from the beginning. To complicate matters further, different people have different definitions of availability. For instance, does scheduled downtime for maintenance count against your system availability calculation?

Common Availability Definitions:

  1. Availability = MTBF/(MTTR+MTBF), where MTBF is Mean Time Between Failures and MTTR is Mean Time To Recover. This is a classic definition of availability and is often used by hardware manufacturers when they publish an availability metric for a given server.
  2. Availability = (Uptime + Scheduled Maintenance)/(Unscheduled Downtime + Uptime + Scheduled Maintenance). This is an IT centric availability metric where the business can support scheduled downtime after hours. This model works for some types of systems, such as a file server that isn’t needed at night, but it doesn’t work as well for websites, even though many web companies still use this for their SLAs.
  3. Availability = Uptime/(Uptime + Downtime). This metric best applies to systems that are needed 24×7 such as e-commerce sites.
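
To see how much the choice of definition matters, here is a minimal Python sketch of the three formulas above. The function names and the sample figures (a 720-hour month, 4 hours of planned maintenance, 1 hour of unplanned outage, a server with a 10,000-hour MTBF) are hypothetical and chosen only for illustration.

```python
def availability_mtbf(mtbf_hours, mttr_hours):
    """Definition 1: MTBF / (MTTR + MTBF)."""
    return mtbf_hours / (mttr_hours + mtbf_hours)


def availability_excluding_maintenance(uptime, scheduled, unscheduled):
    """Definition 2: scheduled maintenance is counted as available time."""
    return (uptime + scheduled) / (unscheduled + uptime + scheduled)


def availability_strict(uptime, downtime):
    """Definition 3: all downtime counts, planned or not."""
    return uptime / (uptime + downtime)


# Hypothetical month: 715 hours up, 4 hours planned, 1 hour unplanned.
uptime, scheduled, unscheduled = 715.0, 4.0, 1.0
print(f"Definition 1: {availability_mtbf(10_000, 4):.4%}")
print(f"Definition 2: {availability_excluding_maintenance(uptime, scheduled, unscheduled):.4%}")
print(f"Definition 3: {availability_strict(uptime, scheduled + unscheduled):.4%}")
```

With these made-up numbers, the same month scores about 99.86% under definition 2 but only about 99.31% under definition 3, which is why an SLA should state which definition it uses.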

Availability is most often expressed as a percentage. Sometimes, people will refer to “four nines” (99.99%) or “five nines” (99.999%). To simplify things, the following table shows the minutes and hours of downtime allowed per year for a given availability level:


Availability   Min Downtime/Year   Hours Downtime/Year
95.000%        26,298              438
98.000%        10,519              175
98.500%        7,889               131
99.000%        5,260               88
99.500%        2,630               44
99.900%        526                 8.8
99.990%        52.6                0.88
99.999%        5.26                0.088
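
The downtime figures can be reproduced with a few lines of Python. This is a sketch that assumes a 365.25-day year (525,960 minutes), which is the basis the numbers above appear to use; the function name is my own.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960; assumes a 365.25-day year


def downtime_minutes_per_year(availability_percent):
    """Minutes of downtime allowed per year at a given availability level."""
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


for level in (95.0, 98.0, 98.5, 99.0, 99.5, 99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(level)
    print(f"{level:7.3f}%  {minutes:10,.2f} min  {minutes / 60:8.3f} h")
```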

(read more…)

          Comments (20)

October 15, 2007

When good security goes bad

My new job with StubHub came with a host of excellent benefits, including a shiny, new 401K with Charles Schwab. Schwab is generally known as a good, stable company with a strong online presence, so I was shocked by what arrived in the mail today. About a week after signing up for my 401K, I received a letter from Schwab titled “Confirmation of Personal Identification Number Change,” and right below the subject line was the password I had chosen for the website! To make matters worse, the letter came in an envelope from Charles Schwab labeled “Personal and Confidential,” i.e., “STEAL ME.”

This letter got me thinking about all the supposedly strong security mechanisms employed by various online companies that I deal with that just make matters worse. The schwabplan.com PIN # confirmation is just one example. I used one of my common passwords expecting Schwab would treat it with the utmost care. To me, this would mean storing it in an encrypted, non-human readable form. Ideally, the password itself would not be stored at all. Instead, a hash of the password would be stored, and any time I entered my password, the hash of what I entered would be compared to the stored hash. This would protect my password from unscrupulous Schwab insiders, since statistics show that approximately 70% of security breaches occur from the inside. (read more…)
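
The store-a-hash-and-compare approach described above can be sketched in Python with the standard library. This is an illustration only, not how any particular site does it. The per-user salt, the PBKDF2 key-stretching, and the iteration count are my assumptions layered on top of the basic idea, and the function names are hypothetical.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; tune for your hardware


def hash_password(password, salt=None):
    """Return (salt, digest). Only these are stored, never the password."""
    if salt is None:
        salt = os.urandom(16)  # random per-user salt (an assumption, see above)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(candidate, salt, stored_digest):
    """Hash the candidate with the stored salt and compare in constant time."""
    _, candidate_digest = hash_password(candidate, salt)
    return hmac.compare_digest(candidate_digest, stored_digest)


salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))
print(verify_password("guess", salt, stored))
```

Because only the salt and digest are kept, there is no password to print in a confirmation letter in the first place.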

          Comments (5)

October 4, 2007

Hannah Montana is my new best friend!

Two months ago, I had no idea who Hannah Montana was. My daughter is too young, thankfully, to care about Hannah, and my nieces had not yet introduced me to the phenom. Now, she is my best friend. I love her! She rocks!

I should probably admit that I still really don’t have any idea who she is, and have never heard her music. The reason for my new found respect for Hannah is that two months ago, I changed my day job. I am now running technical operations for www.stubhub.com, a subsidiary of eBay.

If you’ve never heard of StubHub, click the banner on the left. StubHub is the leading secondary marketplace for concerts, sports events, and theater. If you want tickets to the World Series, the Super Bowl, or a sold-out concert, there is no better place than StubHub. And right now, Hannah Montana and Baseball Playoffs are the hot tickets.

People are going nuts for Hannah. As I write this, floor seats in Oakland, right in front of the stage are going for $1,500. There is also a luxury box with 20 tickets for over $11,000! This is the gotta have, must see, take me PLEEEAAASSSEEE!!!!! concert of the year. I love it!

I joined StubHub because it is truly my kind of company. First, it is a company with a solid foundation in the bricks-and-mortar world. People have been “scalping” tickets for a long time. By creating a neutral online marketplace, and backing it up with solid logistics and world class customer service, StubHub became the dominant player in the secondary ticket market. Second, it is a company that values its technology and its technologists. As such, it is a great place for an IT guy to work. Lastly, it is growing exponentially. The opportunity to design and build a highly-scalable, highly-available technical architecture was one I could not pass up.

Conceptually, I also love the free-market approach to ticket sales. StubHub does not take inventory of the tickets. We offer a secure place where fans can buy and sell tickets, and let the free market, not the ticket promoters, set the market price. Hannah is a great example. The news is full of articles this week on parents complaining of being “gouged” by the ticket brokers. The Attorney General of Arkansas is investigating! What the people crying about the price seem to forget is the old law of supply and demand. If they weren’t so desperate for the tickets, the price would fall.

I’ve neglected the blog lately, trying to get up to speed with the new gig. In the coming months, I have a bunch of articles planned based on the scalability challenges I am now facing. They should be worth the wait. In the meantime, visit StubHub, buy some tickets and go see Hannah. Let me know how you like the show.

PS> If you simply must see Hannah (in other words, you have a daughter), follow the StubHub advice for buying Hannah tickets. It may save you some money.

          Comments (0)

June 5, 2007

It’s Still the Latency, Stupid…pt.2

In part 1 of this series, I established the problem latency can cause in high-speed networks, what one reader correctly referred to as “big long pipes.” To summarize, in high-bandwidth networks that span long distances, network latency becomes the bottleneck that retards performance. The reason for this is the impact of network delays on TCP windowing. In part 2, I will discuss what to do about it.
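
As a quick illustration of the windowing problem, a single TCP connection can have at most one window of data in flight per round trip, so its throughput is capped at window size divided by round-trip time. The Python sketch below works that out. The function names, the 100ms round-trip time, and the 1 Gbps link are hypothetical values; 65,535 bytes is the largest window available without TCP window scaling.

```python
def max_tcp_throughput_bps(window_bytes, rtt_seconds):
    """Upper bound for one TCP connection: one full window per round trip."""
    return window_bytes * 8 / rtt_seconds


def required_window_bytes(bandwidth_bps, rtt_seconds):
    """Bandwidth-delay product: bytes that must be in flight to fill the pipe."""
    return bandwidth_bps * rtt_seconds / 8


rtt = 0.100  # hypothetical 100ms round trip on a long-haul link
ceiling = max_tcp_throughput_bps(65_535, rtt)
print(f"65,535-byte window at {rtt * 1000:.0f}ms RTT: {ceiling / 1e6:.2f} Mbps ceiling")

needed = required_window_bytes(1e9, rtt)  # hypothetical 1 Gbps circuit
print(f"Window needed to fill 1 Gbps at the same RTT: {needed / 1e6:.1f} MB")
```

The ceiling does not depend on the size of the circuit at all, which is why adding bandwidth to a long pipe often changes nothing.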

Dealing with latency can be tricky business. The methods used to mitigate the impact of distance depend on many factors, including the services being accessed, the protocols being used, and the amount of money you want to spend. What works for a home user does not work for a multi-national corporation. In general, there are 4 approaches one can take to deal with latency:

  1. Tweak the host TCP settings
  2. Change the protocol
  3. Move the service closer to the user
  4. Use a network accelerator

The first and least effective method is to tweak the TCP settings on your hosts. I say least effective for several reasons: it is hard to determine the correct TCP window size; not all operating systems support the RFC 1323 extensions; you may not have control of all the hosts; and available bandwidth may change due to network congestion. Most importantly, some time-sensitive applications such as VOIP will still exhibit problems in high-latency networks, even if you tweak TCP. Still, if you are a home user on a big long pipe, this is the only option for you. TCP tuning is OS-specific. Slaptijack.com has an excellent series on TCP tuning for various operating systems. Below are links to his specific guides as well as other sources: (read more…)

          Comments (18)

May 31, 2007

It’s Still the Latency, Stupid…pt.1

One concept that continues to elude many IT managers is the impact of latency on network design. Eleven years ago, Stuart Cheshire wrote a detailed analysis of the difference between bandwidth and latency on ISP links. Over a decade later, his writings are still relevant. Latency, not bandwidth, is often the key to network speed (or lack thereof).

I was reminded of Cheshire’s article and the underlying principles recently when working on an international WAN design. What Cheshire noted was that light signals pass through fibre optics at roughly 66% of the speed of light, or 200*10^6 m/s. Regardless of the equipment or protocols you use, your data cannot exceed that theoretical limit. This limit sets the minimum delay between when a packet is sent and when it is received, aka latency.

In the US, we tend to focus on bandwidth and carrier technology when ordering circuits, completely ignoring latency. For instance, when choosing between cable and DSL for your house, do you ever ask the carrier for its latency SLA? Maybe you should. Using a cable connection, a ping to www.google.com in Mountain View, CA from my house (137 km) yields an average ping time (aka round-trip time or RTT) of 73ms. The theoretical round-trip latency for this distance is 1.37ms, meaning my cable connection is roughly 50 times worse than the theoretical limit. No surprise that Comcast focuses on bandwidth and not latency in its marketing. (read more…)
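
The arithmetic behind the 1.37ms figure is easy to check. The Python sketch below uses the 200*10^6 m/s propagation speed, the 137 km distance, and the 73ms measured ping from the paragraphs above; the function and constant names are my own, and it assumes the fibre path is exactly as long as the straight-line distance, which a real route never is.

```python
FIBER_SPEED_M_PER_S = 200e6  # roughly 66% of the speed of light in a vacuum


def theoretical_rtt_ms(distance_km):
    """Best-case round-trip time over fibre, ignoring all equipment delays."""
    round_trip_m = 2 * distance_km * 1000
    return round_trip_m / FIBER_SPEED_M_PER_S * 1000


theoretical = theoretical_rtt_ms(137)  # distance to Mountain View
measured = 73.0                        # average ping time in ms
print(f"Theoretical RTT: {theoretical:.2f}ms")
print(f"Measured RTT is {measured / theoretical:.0f}x the theoretical limit")
```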

          Comments (50)

May 30, 2007

500GB/Month of bandwidth. How fast is that, really?

Recently, I was evaluating ISPs for my hosting requirements. If you take a gander at 1-and-1, or most of the providers on the Personal Colocation site (and almost every other hosting provider in the world), they apportion your bandwidth in GB per month. Exactly what does this mean to people who are more familiar with buying bandwidth by the circuit? Exactly how much bandwidth is 500GB/Month? Is that equivalent to T1 internet (DS1, or E1 for you euros)? (read more…)
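
One rough way to frame the question is to convert the monthly transfer allowance into the average line rate needed to use it all. The Python sketch below does that under two assumptions that are mine: decimal gigabytes (10^9 bytes) and a 30-day month. The function name is hypothetical, and 1.544 Mbps is the standard T1 line rate.

```python
T1_MBPS = 1.544  # standard T1/DS1 line rate


def average_mbps(gb_per_month, days_in_month=30):
    """Average sustained rate needed to transfer the full monthly allowance."""
    bits = gb_per_month * 1e9 * 8  # assumes decimal gigabytes
    seconds = days_in_month * 24 * 60 * 60
    return bits / seconds / 1e6


rate = average_mbps(500)
print(f"500GB/Month averages {rate:.3f} Mbps ({rate / T1_MBPS:.0%} of a T1)")
```

Keep in mind that this is an average over the whole month. It says nothing about how fast the provider lets you burst, which is what a circuit speed actually describes.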

          Comments (5)

May 25, 2007

Web Proxies – Surf the Net Anonymously

Today we launched our own anonymous web proxy: http://www.edgeproxy.net. Like most security tools, anonymous proxies are incredibly useful but also controversial. Web proxies mask your activities on the net in two ways: first, they allow you to access one web site through another, hiding your IP address from the target; second, they encode the target URL, hiding it from any local firewalls or proxies you might be sitting behind. They are great for pen testing where you want to hide your activities, especially if you want to mask your location. They are a nightmare if you are trying to manage a web filter and your users are able to bypass your filters.

Web proxies are very popular among students whose schools block access to MySpace and Facebook. We launched ours because we needed a reliable proxy we control for testing. We debated whether it was wise to provide a public vehicle for bypassing someone else’s security controls, but felt in the end that adding one more proxy on the net will not increase the web’s threat profile. Our TOCs state that we will cooperate with law enforcement if we determine that our site is being used for nefarious purposes. Hopefully, that will be enough to scare away those who hide behind proxies to abuse the web.

          Comments (3)

March 17, 2007

Lockdown Windows 2003 & XP with Simple Scripts

Now that DST 2007 is over, we are going to start a series of articles on securing systems and networks. I have built a lot of systems for various companies over the years. The challenge is to create repeatable processes that work in a variety of operating environments. Having a strong scripting toolkit can make all the difference, especially when you are under deadline.

The first script in the series is a Windows Services lockdown script for Windows XP & 2003. Disabling services is generally a good idea to reduce the threat profile of your computer, and to improve its performance. Every security guide out there tells you to disable unnecessary services. A few of them also give some guidance as to which services are unnecessary. Few of them tell you how to disable them consistently.

There are three ways to disable services: 1) Use the Services MMC GUI. This is a time consuming process and is prone to mistakes. 2) Use Group Policy. This works well for environments that use Group Policy, but is harder to implement for stand-alone servers, such as web servers. 3) Use the sc.exe command line utility.

If you do not know the sc command, learn it! sc is a powerful utility for controlling services on local or remote hosts. sc will let you configure how services start, change the user account and password they run under, and start/stop/pause the services. The basic syntax of sc is:

sc <server> [command] [service name] <option1> <option2>

We are going to use 2 different sc commands in our service lockdown script: config & stop. These should be self explanatory, but config will allow us to disable the service, and stop will stop the service. To make this work, we need three files: 1) The script batch file; 2) a list of servers by name called hosts.txt; 3) a list of services we want to disable called services.txt. The two text files must be in the same directory as the batch file. The code is fairly simple: (read more…)
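
For readers who want the shape of the approach without the batch file, here is a rough Python sketch of the same idea. It is not the script from this post. It only builds and prints the sc command lines as a dry run, the helper names are hypothetical, and the host and service names in the demo are placeholders rather than a recommended list.

```python
from pathlib import Path


def read_names(path):
    """Return the non-empty lines of a text file such as hosts.txt."""
    return [line.strip() for line in Path(path).read_text().splitlines() if line.strip()]


def lockdown_commands(hosts, services):
    """Build sc command lines that disable, then stop, each service on each host."""
    commands = []
    for host in hosts:
        for service in services:
            # sc requires the space after "start=", so it is its own argument.
            commands.append(["sc", rf"\\{host}", "config", service, "start=", "disabled"])
            commands.append(["sc", rf"\\{host}", "stop", service])
    return commands


# Placeholder names for the dry run. With the real files in place, use
# read_names("hosts.txt") and read_names("services.txt") instead.
demo_hosts = ["WEB01", "WEB02"]
demo_services = ["Messenger", "Alerter"]

for command in lockdown_commands(demo_hosts, demo_services):
    print(" ".join(command))
```

To actually apply the changes, each command list could be passed to subprocess.run on a Windows machine with administrative rights on the target hosts. Review the printed output first, since disabling the wrong service remotely can lock you out.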

          Comments (6)

March 7, 2007

Microsoft Releases Updated Mobile DST Fix

Microsoft has released an updated daylight saving time fix for Windows Mobile. Nice of them to wait until 5 days before the change! I am recommending everyone use the official patch found here: http://www.microsoft.com/windowsmobile/daylightsaving/default.mspx, but I will leave my unofficial patch online.

I’m noticing a trend that many vendors are releasing last minute patches to fix DST issues with their 1st round of patches. If you have patched your systems already, I HIGHLY recommend you recheck with all your vendors to make sure they haven’t released an update. Good luck to all for this weekend.


          Comments (2)

February 10, 2007

Cingular BlackJack For Free!!!

Amazon is now selling the BlackJack for FREE!!! CLICK HERE. Amazon changes its specials frequently, so I would not expect this deal to last. As we’ve discussed, this is a great phone.

With a 100% rebate, how can you lose? Order today.

          Comments Off on Cingular BlackJack For Free!!!