You also can’t understand how to apply this to new concepts. For example, this same article could be applied to a RAID 1 drive configuration, but not directly to a RAID 5. However, only a slight modification would be needed to do so if you understood what’s going on.
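Here is a minimal Python sketch of the RAID point. It assumes each drive fails independently with the same steady-state availability, and it ignores rebuild windows and controller faults. A two-drive RAID 1 mirror is up if at least one drive is up, which is the plain parallel calculation. An n-drive RAID 5 array is up only if no more than one drive is down, and that is the "slight modification". The function names and the 0.99 per-drive figure are hypothetical.

```python
def raid1_availability(a: float) -> float:
    """Two-drive mirror: data is available if at least one drive is up."""
    return 1 - (1 - a) ** 2


def raid5_availability(a: float, n: int) -> float:
    """n-drive RAID 5: survives at most one failed drive.

    Availability = P(all n drives up) + P(exactly one drive down),
    assuming independent drive failures.
    """
    if n < 3:
        raise ValueError("RAID 5 needs at least 3 drives")
    all_up = a ** n
    one_down = n * a ** (n - 1) * (1 - a)
    return all_up + one_down


if __name__ == "__main__":
    drive = 0.99  # hypothetical per-drive availability
    print(f"RAID 1 (2 drives): {raid1_availability(drive):.6f}")
    for n in (3, 4, 6):
        print(f"RAID 5 ({n} drives): {raid5_availability(drive, n):.6f}")
```

In this model the RAID 5 figure drops as drives are added. Each extra drive is one more chance for a second concurrent failure.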

The most realistic model would need to include Bayesian probability incorporating each type of fault that could cause downtime. A code bug has a high probability of taking all servers down, but a motherboard damaged by ESD would most likely affect only a single server. All the things everyone is mentioning can be accounted for by other, much more complex models.

Take this for what it is. It’s a good engineering approximation to demonstrate that adding servers to create redundant systems eventually has diminishing returns.
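You can show the difference between a code bug and an ESD-damaged motherboard without a full Bayesian model. The Python below uses a much simpler, non-Bayesian decomposition with only two fault classes. The first is a common-mode class (such as a shared software bug) that takes every server down at once. The second is an independent hardware class that hits servers one at a time. The sketch also assumes the two classes are independent of each other. The availability figures are hypothetical. Adding servers does nothing to improve the common-mode term, so the result levels off as n grows. That is the diminishing-returns effect in miniature.

```python
def system_availability(n: int, a_independent: float, a_common: float) -> float:
    """Availability of n redundant servers where only one must be up.

    a_independent: per-server availability w.r.t. independent faults (e.g. hardware).
    a_common: availability w.r.t. faults that hit all servers at once (e.g. a code bug).
    """
    # At least one server survives the independent faults...
    redundant_part = 1 - (1 - a_independent) ** n
    # ...and no common-mode fault is active at the same time.
    return a_common * redundant_part


if __name__ == "__main__":
    a_hw, a_sw = 0.99, 0.999  # hypothetical figures
    for n in range(1, 6):
        print(n, f"{system_availability(n, a_hw, a_sw):.6f}")
```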

I think you are correct on both points. The equation is still correct:

As = Ac(n-1) + ((1 - Ac(n-1)) * Acn)

What you are doing is producing a more accurate calculation for Ac(n-1) during peak periods. My paragraph was in fact too simplistic. If you have good data for your traffic patterns and loads, you could refine it further, for example by calculating it over 30-minute or 1-hour time slices. It depends on how sophisticated you want your model to be. Thanks for pointing that out!
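The equation adds one server at a time. Ac(n-1) is the combined availability of the first n-1 servers, Acn is the availability of the nth server, and the result becomes Ac(n-1) for the next step. Below is a minimal Python sketch of that recursion. It assumes independent failures and that any single server can carry the load. For identical servers it reduces to 1 - (1 - A)^n. The time-slice function is a hypothetical extension of the idea in this reply. Each slice gets its own availability and the results are weighted by slice length. The 16/8-hour split is a made-up placeholder. The two slice availabilities are the 1-of-3 and 2-of-3 figures for 85% servers discussed in this thread.

```python
from functools import reduce


def combine(a_prev: float, a_next: float) -> float:
    """As = Ac(n-1) + (1 - Ac(n-1)) * Acn"""
    return a_prev + (1 - a_prev) * a_next


def parallel_availability(availabilities: list[float]) -> float:
    """Fold the equation over the servers, one server at a time."""
    return reduce(combine, availabilities)


def time_weighted_availability(slices: list[tuple[float, float]]) -> float:
    """slices: (duration_hours, availability_in_that_slice) pairs."""
    total = sum(d for d, _ in slices)
    return sum(d * a for d, a in slices) / total


if __name__ == "__main__":
    # Same value as 1 - 0.15**3, up to floating-point rounding.
    print(parallel_availability([0.85, 0.85, 0.85]))
    # Hypothetical day: 16 off-peak hours (1 of 3 servers needed)
    # and 8 peak hours (2 of 3 servers needed).
    day = [(16.0, 0.996625), (8.0, 0.93925)]
    print(time_weighted_availability(day))
```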

You might be right in both cases, but I need to spend some time thinking about it.

To make sure I understand this correctly, here are some questions about the paragraph on load and availability:

1. “… but under peak, we need three servers? … but under peak load, the availability would drop back to 85%. …”. Shouldn’t the availability be 85% * 85% * 85% in this case?

2. “… What if our peak load required 2 servers? In this case, the availability under peak would be 97.75%…”. In this case, we need at least two of the three servers to be up. In my calculation, it is 85% * 97.75% + (1 – 85%) * 85% * 85% = 93.93%: when the first server is up, we just need at least one of the other two up; when the first server is down, we need both of the other two up.
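Both questions are k-out-of-n calculations: the system is up when at least k of n identical, independent servers are up. The minimal Python sketch below reproduces the figures in these questions. It assumes independent failures and the same per-server availability p for every server, and the function name is hypothetical.

```python
from math import comb


def k_of_n_availability(k: int, n: int, p: float) -> float:
    """P(at least k of n servers up), each up independently with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    p = 0.85
    print(f"3 of 3 needed: {k_of_n_availability(3, 3, p):.4f}")
    print(f"2 of 3 needed: {k_of_n_availability(2, 3, p):.4f}")
    print(f"1 of 2 needed: {k_of_n_availability(1, 2, p):.4f}")
```

For 2 of 3, the binomial sum 3p²(1 − p) + p³ gives the same 0.93925 as the conditional breakdown above (0.85 × 0.9775 + 0.15 × 0.85 × 0.85). For 3 of 3 it gives p³ = 0.614125.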

Also, in consulting, customers always expect 97%+ availability at minimal cost when we make recommendations. Having a table such as the one you have above will make it easier for me to convince them to either increase their H/W budget or expect less in terms of availability.

Many thanks for sharing with us.

I don’t think I understand your question. It depends on your SLA. If your SLA is a next-business-day response and you closed the ticket before 17:00 on the next business day, you were within your SLA, so that is 100%. Are you asking if I have a spreadsheet that tells you, for any given day, what the next business day is?
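For a next-business-day SLA as described in this reply, the check comes down to two steps. Find the next business day after the ticket was opened, then compare the close time with 17:00 on that day. The minimal Python sketch below assumes Monday to Friday business days, a 17:00 cutoff, and an optional caller-supplied holiday set. The function names are hypothetical, and real SLAs often define these rules differently.

```python
from datetime import date, datetime, time, timedelta


def next_business_day(d: date, holidays: frozenset[date] = frozenset()) -> date:
    """First weekday after d that is not in holidays."""
    nxt = d + timedelta(days=1)
    while nxt.weekday() >= 5 or nxt in holidays:  # 5 = Saturday, 6 = Sunday
        nxt += timedelta(days=1)
    return nxt


def within_nbd_sla(opened: datetime, closed: datetime,
                   cutoff: time = time(17, 0),
                   holidays: frozenset[date] = frozenset()) -> bool:
    """True if the ticket was closed by the cutoff on the next business day."""
    deadline = datetime.combine(next_business_day(opened.date(), holidays), cutoff)
    return closed <= deadline


if __name__ == "__main__":
    opened = datetime(2024, 3, 8, 16, 59)  # a Friday
    closed = datetime(2024, 3, 11, 10, 0)  # the following Monday
    print(within_nbd_sla(opened, closed))
```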

How are you?

Sorry, but do you have a spreadsheet to calculate SLA? For example: I opened the ticket yesterday at 16:59 and it was closed today at 10:00, and my SLA is working days (08:00 to 17:00)… By the way, if you calculate this, you find 3 hours and 1 minute. All right?! So can you help me with this, please? If you don’t understand my problem, please contact me 😀

Have a nice day! See you later!!!

Best Regards,

Raphael Teixeira
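The minimal Python sketch below counts only time inside a 08:00 to 17:00 working window on weekdays, which is the calculation asked about in this comment. It assumes Monday to Friday working days, a single daily window, and no holidays, and the function name is hypothetical. With the window as stated, the example in the comment (opened at 16:59, closed at 10:00 the next working day) comes to 2 hours and 1 minute: 1 minute on the first day plus 2 hours on the second.

```python
from datetime import datetime, time, timedelta


def business_time(start: datetime, end: datetime,
                  day_start: time = time(8, 0),
                  day_end: time = time(17, 0)) -> timedelta:
    """Working time between start and end, counting only weekday time
    inside [day_start, day_end). Holidays are not handled."""
    total = timedelta()
    day = start.date()
    while day <= end.date():
        if day.weekday() < 5:  # Monday..Friday
            window_open = datetime.combine(day, day_start)
            window_close = datetime.combine(day, day_end)
            # Overlap of [start, end] with this day's working window.
            lo = max(start, window_open)
            hi = min(end, window_close)
            if hi > lo:
                total += hi - lo
        day += timedelta(days=1)
    return total


if __name__ == "__main__":
    opened = datetime(2024, 3, 7, 16, 59)  # Thursday
    closed = datetime(2024, 3, 8, 10, 0)   # Friday
    print(business_time(opened, closed))
```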

Excellent read for anyone embarking on system availability calculation.

Thanks for sharing.

I’m not sure I understand your question, although I’d really like to. I’d consider all IT systems non-deterministic, but there is enough determinism in the constituent components that, when they are designed properly, you have a reasonable chance of making accurate predictions. You used the example of a private communication system deployed with SCADA. In that scenario, you would estimate an expected availability (not necessarily the SLA contracted for) and use it to derive its impact on the total availability of the system. If its expected availability is not sufficient for the desired end result, you would then need to add redundancy, possibly with a second, independent communication system.

So, let’s posit that we have a multi-site SCADA system that uses MPLS to communicate under nominal operations. We have an SLA of 99.999% from the carrier, but we believe it is only likely to be 99% in the real world. We could then choose an alternate, such as a cellular system, that would take over if MPLS is down. It would be important to make sure the cellular system doesn’t rely on the same MPLS network, i.e., that it’s with another carrier and uses different backhaul fiber. We would then also estimate its availability at 99%. This means there is only a small likelihood that both systems will be down at the same time. This is the same calculation as equation #2 above and would give us 99.99% expected availability. If this isn’t enough, we might choose a third option. It depends on the cost of downtime versus the cost to mitigate the expected risk. I hope this answers your question. If not, please restate it. Thanks for commenting!
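The minimal Python sketch below covers the MPLS-plus-cellular arithmetic. It assumes the links fail independently (which is the point of using a separate carrier and backhaul) and that either link alone is enough. Expressing availability as expected downtime per year makes the cost-of-downtime comparison more concrete. The 8,760-hour year is a simplification, the 99% figure for a third link is hypothetical, and the function names are illustrative.

```python
HOURS_PER_YEAR = 8760  # 365 days, ignoring leap years


def redundant_availability(links: list[float]) -> float:
    """The system is up if any one of the independent links is up."""
    all_down = 1.0
    for a in links:
        all_down *= 1 - a
    return 1 - all_down


def expected_downtime_hours(availability: float) -> float:
    """Expected hours per year that the system is unavailable."""
    return (1 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    scenarios = {
        "MPLS only": [0.99],
        "MPLS + cellular": [0.99, 0.99],
        "MPLS + cellular + 3rd link": [0.99, 0.99, 0.99],
    }
    for name, links in scenarios.items():
        a = redundant_availability(links)
        print(f"{name}: {a:.6%} available, "
              f"~{expected_downtime_hours(a):.3f} h/yr down")
```

Multiply the downtime hours by an estimated cost per hour of outage to get a rough figure to weigh against the cost of each extra link.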

Availability calculations in deterministic systems may give a logical prediction, while the same calculations might not hold in non-deterministic systems.

For example, for a private communication system deployed in a real-time SCADA system, such availability calculations make engineering sense….

Utilizing public communication systems (MPLS, 3G, etc.), however, would imply an undefined risk, even if the operator signs a contract or agreement guaranteeing service continuity. In this case, it will not be that easy to guarantee the expected availability figures….

I think availability is not that magical an engineering term for validating designs….

I would like your comments and suggestions on a realistic availability calculation for a typical SCADA system implementation.

Kind regards

mohamed eltahan
