Failure and recovery: MTTF and MTTR
1. Failure and recovery: MTTF and MTTR
There are two ways to detect and handle errors.
Maximize the MTTF (Mean Time to Failure)
Minimize the MTTR (Mean Time to Recovery)
2. Mean time between failure

The MTTF (Mean Time To Failure) is a measure of the average time until a failure occurs.
What is the MTTF of a light bulb?
What is the MTTF of a system of 100 light bulbs?
The MTTF of a (real or practical) system is less than the MTTF of the individual components.
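As a rough sketch of the light-bulb question, assume each bulb fails independently with an exponentially distributed lifetime, and that the system counts as failed as soon as any one bulb fails. Under those assumptions (illustrative, not from these notes) the failure rates add, so the system MTTF is the bulb MTTF divided by the number of bulbs. The 1,000-hour figure is made up.

```python
def series_system_mttf(component_mttf_hours: float, n_components: int) -> float:
    """MTTF of a system that fails when any one of n identical components fails.

    Assumes independent, exponentially distributed lifetimes, so the
    failure rates (1 / MTTF) simply add up.
    """
    if n_components < 1:
        raise ValueError("need at least one component")
    component_rate = 1.0 / component_mttf_hours
    system_rate = n_components * component_rate
    return 1.0 / system_rate


bulb_mttf = 1000.0  # hypothetical: one bulb lasts 1,000 hours on average
print(series_system_mttf(bulb_mttf, 1))    # a single bulb
print(series_system_mttf(bulb_mttf, 100))  # 100 bulbs: the first failure comes much sooner
```

With 100 bulbs the expected time to the first failure is one hundredth of the single-bulb value, which is why a large system fails more often than any one of its parts.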
3. MTTF
What is the goal for MTTF? Sometimes this is referred to as MTBF (Mean Time Between Failures).
Do it right the first time.
The goal for MTTF is infinity (i.e., forever). That is, there will not be a failure.
Increasing MTTF without bound can be very expensive.
One approach to avoiding failure: never do anything. What is another approach?
4. MTTR
What is the goal for MTTR (Mean Time To Recovery)?
The goal for MTTR is zero. That is, recover as soon as possible such that the cost of failure is minimal.
What are some examples of offsetting the cost of increasing the MTTF by lowering the MTTR instead?
5. Error detection and recovery
Unless program proofs and verification are done, software bugs are inevitable. There are at least two ways to handle such bugs.
Maximize the MTTF - make sure it never happens
Minimize the MTTR - make it easier to fix and go on
In the limit, maximizing the MTTF requires program proofs and verification. Since testing cannot accomplish this and, barring proofs, bugs are inevitable, the cost can become very high in the limit.
Minimizing MTTR means designing the system such that when inevitable bugs happen, the recovery can be made quickly, efficiently, and effectively.
A good testing philosophy makes judicious trade-offs between MTTF and MTTR.
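One common way to put numbers on this trade-off is steady-state availability, MTTF / (MTTF + MTTR). The formula is standard reliability practice rather than something stated in these notes, and the hour values below are hypothetical.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is up: MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)


baseline = availability(mttf_hours=1000.0, mttr_hours=10.0)
# Option A: make failures ten times rarer.
higher_mttf = availability(mttf_hours=10000.0, mttr_hours=10.0)
# Option B: make recovery ten times faster.
lower_mttr = availability(mttf_hours=1000.0, mttr_hours=1.0)

print(f"baseline:    {baseline:.4%}")
print(f"higher MTTF: {higher_mttf:.4%}")
print(f"lower MTTR:  {lower_mttr:.4%}")
```

Both options give the same availability here, so the choice comes down to which one is cheaper to achieve.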
6. Software today
What do real software companies do today?
Make sure the software works pretty well.
Ship the software (even if not completely ready).
Update the software with fixes on a periodic basis.
What helps in this process?
Minimize all redundant code to be able to make changes quickly
Such concepts can be introduced in a beginning programming course.
7. My systems
I use this technique in my programming, web systems, etc.
minimize non-computer-checked redundancy
find an issue, fix it quickly and update
8. Old CS 101 web site
Old CS 101 web site: (static web pages)
Change a zip file for work download: manually go through and update all of them.
Change an instruction for how to do something: go through entire web site and update every place it appears.
... and so on ...
My CS 101 web site: (dynamic pages)
Change a zip file for work download: all zips get re-created and updated automatically
Change an instruction for how to do something: make the change once, and all affected pages are regenerated and updated automatically.
... and so on ...
Redundancy is minimized, and changes and updates are controlled by the computer.
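A minimal sketch of the idea, assuming a hypothetical site where each fact is stored once and every page is generated from it. The names `SITE_DATA`, `PAGE_TEMPLATES`, and `build_pages`, and the file names, are invented for illustration and are not the actual system.

```python
# Hypothetical single source of truth: each fact is stored exactly once.
SITE_DATA = {
    "zip_name": "work-2020-04.zip",
    "submit_instruction": "Upload your zip file on the submission page.",
}

# Hypothetical page templates that refer to the data by name instead of copying it.
PAGE_TEMPLATES = {
    "syllabus": "Download {zip_name}. {submit_instruction}",
    "lab1": "Lab 1 starts from {zip_name}. {submit_instruction}",
}


def build_pages(data: dict, templates: dict) -> dict:
    """Regenerate every page from the shared data."""
    return {name: text.format(**data) for name, text in templates.items()}


pages = build_pages(SITE_DATA, PAGE_TEMPLATES)

# One change in one place; every page that uses it is updated by the computer.
SITE_DATA["zip_name"] = "work-2020-05.zip"
pages = build_pages(SITE_DATA, PAGE_TEMPLATES)
print(pages["lab1"])
```

With static pages, the same change would mean finding and editing every copy by hand.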
9. System scale
The web system has this scale (as of 2020-04-23).
600+ pages (at 5 printed pages each, 3,000 pages or 6 reams of paper)
100,000+ lines of formatter and update code and 300,000+ lines of classroom management code
1000+ images (many custom-programmed)
600+ program examples
1000+ submissions (Spring 2020)
1000+ on-line exams (Spring 2020, 60,000 questions answered, including practice exams)
This web system did not exist 9 months ago (though parts of code and data existed and were adapted).
The following helps in the process of reacting quickly to the present (minimize MTTR).
macro formatting system of the entire web site
quick update of the remote web site from local web site
quick download of submitted content
All this helps in minimizing MTTR instead of maximizing MTTF.
10. Bank example
For example, if a bank, in avoiding bank robberies, attempts to maximize MTTF, the cost can be high, people can get hurt, etc. Instead, when faced with a robbery, banks just let it happen: they hand over the (marked) money, trigger a silent alarm, and are then, hopefully, back in business in a short while.
11. Risk
Risk avoidance (maximize MTTF )
Risk mitigation (minimize MTTR )
Avoidance means it will not happen.
Mitigation means it will not cost too much if it happens.
12. Credit card fraud
It would cost a lot to avoid credit card fraud. Can the cost of the fraud be kept manageable?
You can afford to lose some money if, in effect, you then make more money than you lose.
Risk avoidance is very expensive.
Risk mitigation balances risk with cost.
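The arithmetic behind this can be sketched with made-up numbers. All figures below are hypothetical and in the same currency unit; the point is only the comparison.

```python
def net_profit(revenue: float, fraud_losses: float, prevention_cost: float) -> float:
    """What is left after paying for fraud and for the measures against it."""
    return revenue - fraud_losses - prevention_cost


# Avoidance: spend heavily so that (almost) no fraud gets through.
avoidance = net_profit(revenue=1000.0, fraud_losses=0.0, prevention_cost=300.0)
# Mitigation: accept some fraud, spend modestly to keep it manageable.
mitigation = net_profit(revenue=1000.0, fraud_losses=50.0, prevention_cost=20.0)

print(f"avoidance:  {avoidance}")
print(f"mitigation: {mitigation}")
```

In this made-up case, losing some money to fraud still leaves more profit than paying to prevent all of it.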
13. Drive configurations
RAID drive configurations (redundancy, hot-swappable) in networking systems.
Search engine companies:
Google approach: buy up low-cost drives and find a way to sufficiently recover from errors and failed drives.
Other approaches (many of these search companies are not in business now): buy the most reliable and expensive drives so they seldom fail (they do anyway).
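A rough sketch of why cheap drives plus recovery can work: if copies of the data sit on several drives and the drives fail independently, data is lost only when every copy fails before it is replaced. The failure probabilities and the independence assumption are hypothetical, not measurements.

```python
def loss_probability(p_drive_fails: float, copies: int) -> float:
    """Chance that every copy is lost, assuming independent drive failures."""
    if copies < 1:
        raise ValueError("need at least one copy")
    return p_drive_fails ** copies


# Hypothetical per-period failure probabilities.
cheap_replicated = loss_probability(p_drive_fails=0.05, copies=3)
reliable_single = loss_probability(p_drive_fails=0.01, copies=1)

print(f"three cheap drives:  {cheap_replicated:.6f}")
print(f"one reliable drive:  {reliable_single:.6f}")
```

Under these assumptions, three unreliable copies lose data less often than one reliable copy, provided failed drives are detected and replaced quickly (low MTTR).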
14. Operating systems
Memory access in operating systems.
Control every page to ensure it is valid (maximize MTTF).
Let page faults happen and then recover (minimize MTTR).
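The same two strategies show up at the program level. As a loose analogy only (this is ordinary Python, not operating-system code, and the names `backing_store` and `resident` are hypothetical), the first function checks before every access, while the second lets the fault happen and recovers by loading the missing page.

```python
backing_store = {0: "page zero", 1: "page one", 2: "page two"}  # stand-in for disk
resident = {0: "page zero"}  # stand-in for pages currently in memory


def read_with_check(page_number: int) -> str:
    """Check validity before every access (the 'maximize MTTF' style)."""
    if page_number not in resident:
        resident[page_number] = backing_store[page_number]
    return resident[page_number]


def read_with_fault(page_number: int) -> str:
    """Access directly; on a fault, recover and continue (the 'minimize MTTR' style)."""
    try:
        return resident[page_number]
    except KeyError:
        # Simulated page fault: bring the page in, then continue.
        resident[page_number] = backing_store[page_number]
        return resident[page_number]


print(read_with_fault(2))
```

The second version pays nothing extra on the common path where the page is already present, and pays the recovery cost only when a fault actually occurs.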
15. University example
MTU (Mythical Typical University): MTTF to MTTR
University puts strict procedures in place, with lots of paperwork and coordination, to ensure that students do not sign up for courses that they should not be taking. This tries to increase the MTTF.
University allows you to sign up for any course you want, but immediately disallows any invalid choice after checking the database, leaving you free to pick other options. This decreases the MTTR.
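The second policy can be sketched as an immediate check that fails fast and offers other options. The course names, the `PREREQUISITES` table, and the `sign_up` function are hypothetical.

```python
# Hypothetical course database: course -> set of prerequisite courses.
PREREQUISITES = {
    "CS 101": set(),
    "CS 201": {"CS 101"},
    "CS 301": {"CS 201"},
}


def sign_up(completed: set, requested: str) -> tuple:
    """Return (accepted, alternatives). A rejection is immediate and lists other options."""
    if PREREQUISITES[requested] <= completed:
        return True, []
    # Rejected: offer every course the student is eligible for and has not taken.
    alternatives = sorted(
        course
        for course, needed in PREREQUISITES.items()
        if needed <= completed and course not in completed
    )
    return False, alternatives


print(sign_up({"CS 101"}, "CS 301"))
```

The mistake is allowed to happen, but it is caught and corrected in seconds rather than prevented by paperwork.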
16. Forecasting
Michael Hammer:
Perhaps the most startling notion that arises from process-centered planning is the suggestion that long-range forecasting is a waste of time.
Hammer, M. (1996). Beyond reengineering. New York: Harper Business, p. 203.
17. McCarthy: Decisions
Jim McCarthy:
The goal on a software development project is not to have the correct plan in advance but to make the right decisions every day as things that were unknown become known.
McCarthy, J. (1995). Dynamics of Software Development. Redmond, WA: Microsoft Press, p. 101.
There are crucial elements to systems that cannot be known in advance.
18. Microsoft: Specifications
Nonetheless, the basic idea shared by these approaches is that users' needs for many types of software are so difficult to understand and that changes in hardware and software technologies are so continuous and rapid, it is unwise to attempt to design a software system completely in advance.
Cusumano, M., & Selby, R. (June 1997). "How Microsoft builds software", Communications of the ACM, 40:6, p. 55, 56.
19. Initial specifications
... But the initial specification document does not try to cover all the details of each feature or lock the project into the original set of features. ... Experience at Microsoft suggests that the feature set in a specification document may change by 30% or more.
Cusumano, M., & Selby, R. (June 1997). "How Microsoft builds software", Communications of the ACM, 40:6, p. 55, 56.
20. Customers and market surveys
It can be hard to use market surveys to make certain types of decisions, especially when it involves something new - whether that something is a software product, an engineering project, etc.
21. Marketing surveys
Michael Hammer:
... this fundamental precept - that marketing research done for a product that does not yet exist is useless.
Hammer, M., & Champy, J. (1993). Reengineering the corporation. New York: HarperBusiness, p. 88.
Yet, market surveys of this type continue to be done.
22. Sony Walkman
The example used by Hammer was/is the Sony Walkman. A market survey would not have been of much use because the product was revolutionary, a completely new product.
23. Food products
What do many companies (e.g., fast-food companies) really do?
Ask customers which product they would buy.
Test market the new product in a selected/limited area.
Most people will try new things at least once, if the cost is not too high.
24. McCarthy: Customers
Jim McCarthy:
Customers often won't tell you what they really want, particularly if it goes against conventional wisdom. Because they're insecure, they'll tell you instead what they think they're supposed to say they want.
McCarthy, J. (1995). Dynamics of Software Development. Redmond, WA: Microsoft Press, p. 74.
25. Tickets
Why are tickets sold for events when the event could be done for free?
Selling tickets is one way to get a more accurate count of who will actually attend the event.
26. Capacity planning
Jon Bentley says that users will ask for a certain amount of capacity but then, once the system exists, use much more, as much as 10 times what they asked for.
His classic example is the Pennsylvania Turnpike.
27. Pennsylvania Turnpike
Before building the PA turnpike in the late 1930s, extensive surveys were done in order to predict customer demand and usage for the new turnpike.
The next milestone occurred in 1937 when Representative Clifford S. Patterson sponsored the Pennsylvania Turnpike authorization, launching the nation's first superhighway. http://www.legis.state.pa.us/WU01/VC/visitor_info/hello_pennsylvania/powering.htm [as of February 16, 2005]
What happened?
Once they introduced the turnpike, people started using it for things they never imagined, like visiting relatives many hours away.
So, even though they built it to handle about 10 times the traffic from the survey, it was used about 10 times as much as was planned for.
Customers' expectations changed.
28. Steve Jobs: Customers
Some people say, "Give the customers what they want." But that's not my approach. Our job is to figure out what they're going to want before they do. I think Henry Ford once said, "If I'd asked customers what they wanted, they would have told me, 'A faster horse!'" People don't know what they want until you show it to them. That's why I never rely on market research. Our task is to read things that are not yet on the page.
Steve Jobs (1955-2011)
29. Steve Jobs
Started Apple computer. Changed the computer industry.
Introduced the Macintosh computer: Changed the computer industry again.
Introduced the Apple LaserWriter: Changed the publishing industry.
Started NeXT computer: failure.
Started Pixar: Changed the movie and animation industry.
Started iTunes: Changed the music industry.
Started iPad: Changed the tablet industry.
Restarted Apple: Used NeXT technology to change the computer industry again.
30. Programming
The same problems arise in programming computers and/or analyzing system behavior.
31. Measure twice, cut once
Computers can be very useful.
Have you heard the following saying?
Do it right the first time.
Can you actually do this? Why or why not?
32. Do it
If you believe in always "doing it right the first time", then try writing a paper with a manual typewriter, from beginning to end, in one try.
What is a more realistic philosophy?
33. More realistic
A more realistic philosophy is as follows.
Try to do it right the first time, while making it easier to do over the second time.
This is where the storage aspect of computers is critical.
You can work on a document, save what you were doing, and then, later, pick up where you left off.
34. Saying
Traditional adage: Measure twice, cut once.
In programming terms: Think about what will happen before changing your program. Be ready to go back to the previous version.
Generalization: Keep doing it until the result is acceptable or until a fixed point is reached.
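The generalization can be sketched as a loop that keeps revising until a pass changes nothing. The helper names are hypothetical, and the "revision" here is a deliberately simple one (collapsing doubled spaces), chosen because one pass is not always enough.

```python
from typing import Callable


def iterate_until_stable(step: Callable[[str], str], start: str, max_rounds: int = 100) -> str:
    """Apply step repeatedly until the result stops changing (a fixed point)."""
    current = start
    for _ in range(max_rounds):
        revised = step(current)
        if revised == current:
            return current  # fixed point reached: another pass changes nothing
        current = revised
    return current  # give up after max_rounds and accept the latest result


def collapse_double_spaces(text: str) -> str:
    """One editing pass: replace each pair of spaces with a single space."""
    return text.replace("  ", " ")


result = iterate_until_stable(collapse_double_spaces, "measure    twice,  cut   once")
print(result)  # prints "measure twice, cut once"
```

The first pass still leaves some doubled spaces; the second pass removes them; the third pass changes nothing, so the loop stops.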
35. Software goal
Software design/implementation goal: Minimize non-computer-checked redundancy (repetition).
Recognize the repetition.
Remove/control the repetition.