Re: Predictive failures

Subject: Re: Predictive failures
From: boB (at) *nospam* K7IQ.com (boB)
Newsgroups: sci.electronics.design
Date: 19 Apr 2024, 20:16:02
Message-ID: <v4d52jtf67qnie0d7kk7evfjakmcvtoo07@4ax.com>
User-Agent: ForteAgent/8.00.32.1272

On Thu, 18 Apr 2024 15:05:07 -0700, Don Y
<blockedofcourse@foo.invalid> wrote:

> On 4/18/2024 10:18 AM, Buzz McCool wrote:
>> On 4/15/2024 10:13 AM, Don Y wrote:
>>> Is there a general rule of thumb for signalling the likelihood of
>>> an "imminent" (for some value of "imminent") hardware failure?
>>
>> This reminded me of some past efforts in this area. It was never demonstrated
>> to me (given ample opportunity) that this technology actually worked on
>> intermittently failing hardware I had, so be cautious in applying it in any
>> future endeavors.
>
> Intermittent failures are the bane of all designers.  Until something
> is reliably observable, trying to address the problem is largely
> whack-a-mole.
>

The problem I have with troubleshooting intermittent failures is that
they are only intermittent sometimes.


>> https://radlab.cs.berkeley.edu/classes/cs444a/KGross_CSTH_Stanford.pdf
>
> Thanks for that.  I didn't find it in my collection so its addition will
> be welcome.

Yes, neat paper.

boB


>
> Sun has historically been aggressive in trying to increase availability,
> especially on big iron.  In fact, such a "prediction" led me to discard
> a small server, yesterday (no time to dick with failing hardware!).
>
> I am now seeing similar features in Dell servers.  But, the *actual*
> implementation details are always shrouded in mystery.
>
> But, it is obvious (for "always on" systems) that there are many things
> that can silently fail that will only manifest some time later -- if at
> all and possibly complicated by other failures that may have been
> precipitated by it.
>
> Sorting out WHAT to monitor is the tricky part.  Then, having the
> ability to watch for trends can give you an inkling that something is
> headed in the wrong direction -- before it actually exceeds some
> baked in "hard limit".
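
As a rough illustration of watching a trend rather than waiting for a
hard limit, here is a minimal Python sketch. It assumes evenly spaced
readings of some monitored quantity (a temperature, an error count)
and an upper limit only. The names slope_per_sample,
samples_until_limit, readings and hard_limit are hypothetical and are
not taken from any vendor's implementation, which, as noted, is
generally undocumented.

```python
from typing import Optional, Sequence


def slope_per_sample(readings: Sequence[float]) -> float:
    """Least-squares slope of the readings against their sample index."""
    n = len(readings)
    if n < 2:
        raise ValueError("need at least two readings")
    mean_x = (n - 1) / 2
    mean_y = sum(readings) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(readings))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def samples_until_limit(readings: Sequence[float],
                        hard_limit: float) -> Optional[float]:
    """Estimate how many more samples until the trend reaches hard_limit.

    Returns 0.0 if the latest reading is already at or above the limit,
    and None if the readings are flat or moving away from the limit.
    """
    latest = readings[-1]
    if latest >= hard_limit:
        return 0.0
    slope = slope_per_sample(readings)
    if slope <= 0:
        return None
    # Linear extrapolation from the latest reading (an assumption; real
    # degradation is often not linear).
    return (hard_limit - latest) / slope
```

With readings of 40, 41, 42, 43, 44 and a limit of 50, the estimate is
six more samples. A straight-line fit is only one possible choice and
does nothing to answer the harder question of WHAT to monitor.
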
>
> E.g., only the memory that you actively REFERENCE in a product is ever
> checked for errors!  Bit rot may not be detected until some time after it
> has occurred -- when you eventually access that memory (and the memory
> controller throws an error).
>
> This is paradoxically amusing; code to HANDLE errors is likely the least
> accessed code in a product.  So, bit rot IN that code is more likely
> to go unnoticed -- until it is referenced (by some error condition)
> and the error event complicated by the attendant error in the handler!
> The more reliable your code (fewer faults), the more uncertain you
> will be of the handlers' abilities to address faults that DO manifest!
>
> The same applies to secondary storage media.  How will you know if
> some-rarely-accessed-file is intact and ready to be referenced
> WHEN NEEDED -- if you aren't doing patrol reads/scrubbing to
> verify that it is intact, NOW?
>
> [One common flaw with RAID implementations and naive reliance on that
> technology]
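
A file-level analogue of scrubbing can be sketched in a few lines of
Python. This is not how a RAID controller's patrol read works (that
happens at the block level); it only illustrates the idea of
periodically re-reading rarely accessed data and comparing it with a
checksum recorded while the data was known to be good. The names
file_digest, build_manifest and scrub are hypothetical. One assumption
worth stating: reads served from the OS page cache may never touch the
medium, so a real scrub would have to account for that.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large files fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Record a digest for every regular file under root (known-good state)."""
    return {
        str(p.relative_to(root)): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def scrub(root: Path, manifest: Dict[str, str]) -> List[str]:
    """Re-read every file in the manifest; return those that fail the check.

    A file is reported if it is missing, unreadable, or its digest differs.
    """
    damaged = []
    for rel, expected in manifest.items():
        try:
            actual = file_digest(root / rel)
        except OSError:
            damaged.append(rel)
            continue
        if actual != expected:
            damaged.append(rel)
    return damaged
```

Run from a scheduler, scrub() reports a damaged file while there is
still time to restore it, rather than at the moment it is needed.
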
