Re: Predictive failures

Subject: Re: Predictive failures
From: blockedofcourse (at) *nospam* foo.invalid (Don Y)
Newsgroups: sci.electronics.design
Date: 16 Apr 2024, 07:40:29
Organization: A noiseless patient Spider
Message-ID: <uvl30j$phap$3@dont-email.me>
References: 1 2 3 4
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.2.2
On 4/15/2024 9:14 PM, Edward Rawde wrote:
It always puzzled me how HAL could know that the AE-35 would fail in the
near future, but maybe HAL had a motive for lying.
>
Why does your PC retry failed disk operations?
 Because the software designer didn't understand hardware.
Actually, he DID understand the hardware which is why he retried
it instead of ASSUMING every operation would proceed correctly.
[Why bother testing the result code if you never expect a failure?]

The correct approach is to mark that part of the disk as unusable and, if
possible, move any data from it elsewhere quickly.
That only makes sense if the error is *persistent*.  "Shit
happens" and you can get an occasional failed operation when
nothing is truly "broken".
(how do you know the HBA isn't the culprit?)
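A minimal Python sketch of the distinction drawn above. It assumes a hypothetical `read_fn(lba)` that raises `ReadError` on a failed transfer. The routine makes a bounded number of retries. A failure that clears on retry is tallied as transient, and one that survives every attempt flags the LBA as a candidate for remapping. The names and the limit of 3 attempts are illustrative and are not taken from any real driver.

```python
from dataclasses import dataclass, field


class ReadError(Exception):
    """Raised by the hypothetical low-level read when a transfer fails."""


@dataclass
class RetryStats:
    transient: int = 0                             # failed, then succeeded on retry
    persistent: set = field(default_factory=set)   # LBAs that never read back


def read_with_retry(read_fn, lba, stats, max_attempts=3):
    """Read `lba`, retrying a bounded number of times.

    Only a failure that persists across every attempt marks the LBA as
    suspect; a failure that clears on retry is merely counted.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            data = read_fn(lba)
        except ReadError:
            continue                  # occasional glitch: try again
        if attempt > 1:
            stats.transient += 1      # worked, but only after a retry
        return data
    stats.persistent.add(lba)         # persistent: candidate for remapping
    return None
```

The `transient` counter is kept even after the read succeeds. That is what makes the next question answerable at all: is the drive retrying more than it used to?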

If I ask the drive to give me LBA 1234, shouldn't it ALWAYS give me
LBA 1234?  Without any data corruption (CRC error) AND within the normal
access time limits defined by the location of those magnetic domains on
the rotating medium?
>
Why should it attempt to retry this MORE than once?
>
Now, if you knew your disk drive was repeatedly retrying operations,
would your confidence in it be unchanged from times when it did not
exhibit such behavior?
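One way to turn that loss of confidence into a number is to track the fraction of recent reads that were suspicious. A read counts as suspicious if it needed a retry or took longer than the mechanical worst case (seek plus rotation) should allow. This is a hypothetical sketch. The window size and alarm ratio are placeholders that would have to be tuned against the drive's own baseline.

```python
from collections import deque


class DriveHealth:
    """Sliding-window ratio of 'suspicious' reads (hypothetical health metric)."""

    def __init__(self, window=1000, alarm_ratio=0.01):
        self.recent = deque(maxlen=window)   # True = this read was suspicious
        self.alarm_ratio = alarm_ratio

    def record(self, retries, latency_ms, expected_max_ms):
        # Suspicious if it needed a retry, or exceeded the access time the
        # sector's physical location should allow.
        self.recent.append(retries > 0 or latency_ms > expected_max_ms)

    def retry_ratio(self):
        return sum(self.recent) / len(self.recent) if self.recent else 0.0

    def degraded(self):
        # Every read may still "succeed"; confidence drops anyway.
        return self.retry_ratio() > self.alarm_ratio
```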
 I'd have put an SSD in by now, along with an off-site backup of the same
data :)
So, any problems you have with your SSD, today, should be solved by using the
technology that will be invented 10 years hence!  Ah, that's a sound strategy!

Assuming you have properly configured an EIA-232 interface, why would you
ever get a parity error?  (OVERRUN errors can be the result of an i/f
that is running too fast for the system on the receiving end)  How would
you even KNOW this was happening?
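On a 16550-compatible UART, overrun, parity, framing and break conditions appear as bits in the Line Status Register (LSR). Software only "knows" about them if it reads and tallies those bits. The minimal sketch below assumes the LSR value has already been read by whatever platform-specific means applies. The counter class is hypothetical.

```python
# Error bits in a 16550-compatible Line Status Register (LSR).
LSR_OE = 0x02   # overrun: receive buffer was full, a character was lost
LSR_PE = 0x04   # parity error
LSR_FE = 0x08   # framing error (invalid stop bit)
LSR_BI = 0x10   # break condition


class UartErrorCounters:
    """Running tallies of line errors, so they can be reported rather than lost."""

    _FLAGS = ((LSR_OE, "overrun"), (LSR_PE, "parity"),
              (LSR_FE, "framing"), (LSR_BI, "break"))

    def __init__(self):
        self.counts = {name: 0 for _, name in self._FLAGS}

    def note_lsr(self, lsr):
        """Tally every error flag present in one LSR reading."""
        for mask, name in self._FLAGS:
            if lsr & mask:
                self.counts[name] += 1
```

A parity count that keeps climbing on a correctly configured link is evidence that something is degrading before anything is outright broken.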
>
I suspect everyone who has owned a DVD/CD drive has encountered a
"slow tray" as the mechanism aged.  Or, a tray that wouldn't
open (of its own accord) as soon/quickly as it used to.
 If it hasn't been used for some time then I'm ready with a tiny screwdriver
blade to help it open.
Why don't they ship such drives with tiny screwdrivers to make it
easier for EVERY customer to address this problem?

But I forget when I last used an optical drive.
When the firmware in your SSD corrupts your data, what remedy will
you use?
You're missing the forest for the trees.

[Turns out, there was a city-wide gas shortage so there was enough
gas available to light the furnace but not enough to bring it up to
temperature as quickly as the designers had expected]
 That's why the furnace designers couldn't have anticipated it.
Really?  You can't anticipate the "gas shutoff" not being in the ON
position?  (which would yield the same endless retry cycle)

They did not know that such a condition might occur, so never tested for it.
If they planned on ENDLESSLY retrying, then they must have imagined
some condition COULD occur that would lead to such an outcome.
Else, why not just retry *once* and then give up?  Or, not
retry at all?
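The alternative to endless retrying is a bounded retry followed by a lockout that records why each attempt failed. The sketch assumes a hypothetical `try_ignite()` hook that reports one of three outcomes:

- "ok"
- "no_flame" (e.g., the gas shutoff is closed)
- "slow_rise" (the flame lit but the temperature did not rise in the expected time, as in the gas-shortage anecdote)

The retry limit is a placeholder.

```python
from collections import Counter


def ignition_sequence(try_ignite, max_tries=3):
    """Bounded ignition retries, then lockout with a diagnosis.

    try_ignite() is a hypothetical hook returning "ok", "no_flame" or
    "slow_rise". Returns (state, Counter of failure reasons).
    """
    outcomes = Counter()
    for _ in range(max_tries):
        result = try_ignite()
        if result == "ok":
            return "RUNNING", outcomes
        outcomes[result] += 1
    # Give up instead of cycling forever, and keep the reasons so a
    # technician (or a status display) can tell low gas from no gas.
    return "LOCKOUT", outcomes
```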

A component could fail suddenly, such as a short-circuited diode, and
everything would work fine after replacing it.
The cause could perhaps have been a manufacturing defect, such as
insufficient cooling due to poor-quality assembly, but the exact real
cause would never be known.
>
You don't care about the real cause.  Or, even the failure mode.
You (as user) just don't want to be inconvenienced by the sudden
loss of the functionality/convenience that the device provided.
 There will always be sudden unexpected loss of functionality for reasons
which could not easily be predicted.
And if they CAN'T be predicted, then they aren't germane to this
discussion, eh?
My concern is for the set of failure modes that can realistically
be anticipated.
I *know* the inverters in my monitors are going to fail.  It
would be nice if I knew before I was actively using one when
it went dark!
[But, most users would only use this indication to tell them
to purchase another monitor; "You have been warned!"]

People who service lawn mowers in the area where I live are very busy right
now.
 
A component could fail suddenly as a side effect of another failure.
One output transistor could short-circuit, and several other components
could burn up along with it.
>
So, if you could predict the OTHER failure...
Or, that such a failure might occur and lead to the followup failure...
>
A component could fail slowly and only become apparent when it got to the
stage of causing an audible or visible effect.
>
But, likely, there was something observable *in* the circuit that
just hadn't made it to the level of human perception.
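Supply ripple is one such observable. In the hypothetical sketch below, peak-to-peak ripple is computed from a block of rail samples and compared with the value recorded when the unit was new. It assumes the ADC samples fast enough, and over a long enough block, to capture the ripple peaks. The factor of 2 is an arbitrary placeholder.

```python
def ripple_pp(samples):
    """Peak-to-peak ripple (volts) of one block of rail samples."""
    return max(samples) - min(samples)


def ripple_warning(samples, baseline_pp, factor=2.0):
    """Flag the rail when ripple exceeds `factor` times its as-new value,
    e.g., as filter capacitors dry out and lose capacitance."""
    return ripple_pp(samples) > factor * baseline_pp
```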
 Yes, a power-supply ripple detection circuit could have turned on a warning
LED but that never happened for at least two reasons.
1. The detection circuit would have increased the cost of the equipment and
thus diminished the profit of the manufacturer.
That would depend on the market, right?  Most of my computers have redundant
"smart" (i.e., internal monitoring and reporting) power supplies.  Because
they were marketed to folks who wanted that sort of reliability.  Because
a manufacturer who didn't provide that level of AVAILABILITY would quickly
lose market share.  The cost of the added components and "handling" is
small compared to the cost of lost opportunity (sales).

2. The user would not have understood and would have ignored the warning
anyway.
That makes assumptions about the market AND the user.
If one of my machines signals a fault, I look to see what it is complaining
about:  is it a power supply failure (in which case, I'm now reliant on
a single power supply)?  is it a memory failure (in which case, a bank
of memory may have been disabled which means the machine will thrash
more and throughput will drop)?  is it a link aggregation error (and
network traffic will suffer)?
If I can't understand these errors, then I either don't buy a product
with that level of reliability *or* have someone on hand who CAN
understand the errors and provide remedies/advice.
Consumers will replace a PC because of malware, trashed registry,
creeping cruft, etc.  That's a problem with the consumer buying the
"wrong" sort of computing equipment for his likely method of use.
(buy a Mac?)

My home wireless Internet system doesn't care if one access point fails,
and I would not expect to be able to do anything to predict a time of
failure.
Experience says a dead unit has power supply issues. Usually external but
could be internal.
>
Again, the goal isn't to predict "time of failure".  But, rather, to be
able to know that "this isn't going to end well" -- with some advance
notice that allows for preemptive action to be taken (but not SO much
advance notice that the user ends up replacing items prematurely).
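Expressed as arithmetic, "advance notice, but not too much" becomes a lead-time window on a projected threshold crossing. The minimal sketch below assumes the health metric rises roughly linearly as the part wears. That is a simplification, since many wear-out curves are not linear. The failure threshold and lead time are hypothetical parameters.

```python
def projected_crossing(times, values, threshold):
    """Fit a least-squares line to (time, metric) history and return the time
    at which it reaches `threshold`, or None if it is not heading there.

    Assumes the metric *rises* as the part degrades (e.g., retry ratio,
    ripple, tray-open time).
    """
    n = len(times)
    if n < 2:
        return None
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        return None
    slope = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values)) / sxx
    if slope <= 0:
        return None                   # flat or improving: no projected failure
    intercept = mean_v - slope * mean_t
    return (threshold - intercept) / slope


def should_warn(times, values, threshold, now, lead_time):
    """Warn once the projected crossing is within `lead_time` of `now`:
    early enough to act, late enough not to prompt premature replacement."""
    t_cross = projected_crossing(times, values, threshold)
    return t_cross is not None and t_cross - now <= lead_time
```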
 Get feedback from the people who use your equipment.
Users often don't understand when a device is malfunctioning.
Or, how to report the conditions and symptoms in a meaningful way.
I recall a woman I worked with ~45 years ago sitting, patiently,
waiting for her computer to boot.  As I walked past, she asked me how
long it takes for that to happen (floppy based systems).  Alarmed
(I had designed the workstations), I asked "How long have you been
waiting?"
Turns out, she had inserted the (8") floppy rotated 90 degrees from
its proper orientation.
How much longer would she have waited had I not walked past?

I don't think it would be possible to "watch" everything because it's
rare that you can properly test a component while it's part of a working
system.
>
You don't have to -- as long as you can observe its effects on other
parts of the system.  E.g., there's no easy/inexpensive way to
check to see how much the belt on that CD/DVD player has stretched.
But, you can notice that it HAS stretched (or, some less likely
change has occurred that similarly interferes with the tray's actions)
by noting how the activity that it is used for has changed.
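That indirect observation can be sketched as follows. The monitor learns how long the tray normally takes to open, then flags openings well outside that baseline. The class name, the 20-sample learning period and the 4-sigma limit are all hypothetical. The floor of 5% of the mean keeps a very consistent baseline from producing false alarms.

```python
from statistics import mean, stdev


class ActuationMonitor:
    """Learn a normal actuation time, then flag slow actuations (hypothetical)."""

    def __init__(self, learn_n=20, k=4.0):
        self.history = []
        self.learn_n = learn_n
        self.k = k
        self.baseline = None          # (mean, stdev) once learned

    def observe(self, seconds):
        """Record one open/close time; return True if it looks degraded."""
        if self.baseline is None:
            self.history.append(seconds)
            if len(self.history) >= self.learn_n:
                self.baseline = (mean(self.history), stdev(self.history))
            return False              # still learning what "normal" is
        mu, sigma = self.baseline
        return seconds > mu + self.k * max(sigma, 0.05 * mu)
```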
 Sure but you have to be the operator for that.
So you can be ready to help the tray open when needed.
One wouldn't bother with a CD/DVD player -- they are too disposable
and reporting errors won't help the user (even though you have a
big ATTACHED display at your disposal!)
     "For your continued video enjoyment, replace me, now!"
OTOH, if a CNC machine tries to "home" a mechanism and doesn't
get (electronic) confirmation of that event having been completed,
would you expect *it* to just sit there endlessly waiting?
Possibly causing damage to itself in the process?
Would you expect it to "notice" if the drive motor APPEARED to
be connected and was drawing the EXPECTED amount of current?
Or, would you expect an electrician to come along and start
troubleshooting (taking the machine out of production in the process)?
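The behaviour asked about can be sketched as a homing routine that refuses to wait forever. It uses motor current to tell the likely failures apart:

- Too little current suggests a disconnected motor.
- Too much current suggests a jam.
- Normal current with no home signal points at the switch.

This is a hypothetical sketch. The callbacks, current limits and timeout are placeholders for a real machine's figures.

```python
import time


class HomingFault(Exception):
    """Raised instead of waiting forever for a home signal that never comes."""


def home_axis(start_motor, stop_motor, home_switch, motor_current,
              timeout_s=10.0, i_min=0.2, i_max=2.0, poll_s=0.01,
              clock=time.monotonic, sleep=time.sleep):
    """Drive an axis toward its home switch with a timeout and a current check.

    All four callbacks are hypothetical hardware hooks; limits are in amps.
    """
    start_motor()
    t0 = clock()
    try:
        while not home_switch():
            amps = motor_current()
            if amps < i_min:
                raise HomingFault(f"{amps:.2f} A: motor or drive disconnected?")
            if amps > i_max:
                raise HomingFault(f"{amps:.2f} A: axis jammed or stalled?")
            if clock() - t0 > timeout_s:
                # Motor looks healthy but the switch never closed:
                # suspect the switch, its wiring, or the input.
                raise HomingFault("normal current but no home signal: check switch")
            sleep(poll_s)
    finally:
        stop_motor()                  # never leave the motor driving after a fault
```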

These days I would expect to have fun with management asking for software
to be able to diagnose and report any hardware failure.
Not very easy if the power supply has died.
>
What if the power supply HASN'T died?  What if you are diagnosing the
likely upcoming failure *of* the power supply?
 Then I probably can't, because the power supply may be just a bought-in
power supply which was never designed with upcoming failure detection in
mind.
You wouldn't pick such a power supply if that was an important
failure mode to guard against!  (that's why smart power supplies
are so common -- and redundant!)

You have ECC memory in most (larger) machines.  Do you silently
expect it to just fix all the errors?  Does it have a way of telling you
how many such errors it HAS corrected?  Can you infer the number of
errors that it *hasn't*?
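To make the ECC questions concrete, here is a toy single-error-correct, double-error-detect (SECDED) code, extended Hamming(8,4). Its decoder counts what it fixes and what it could only detect. Real DRAM ECC typically applies the same idea to wider words (e.g., 64 data bits plus 8 check bits). On Linux, the EDAC subsystem exposes corrected and uncorrected counts (`ce_count`, `ue_count` under `/sys/devices/system/edac/mc/`). This sketch is illustrative only and is not that implementation. Errors of three or more bits can still be miscorrected silently, so such counts are best read as lower bounds.

```python
DATA_POS = (3, 5, 6, 7)   # Hamming positions carrying the 4 data bits


def encode(nibble):
    """Extended Hamming(8,4): bits[1..7] Hamming code, bits[0] overall parity."""
    bits = [0] * 8
    for i, pos in enumerate(DATA_POS):
        bits[pos] = (nibble >> i) & 1
    for p in (1, 2, 4):
        # parity bit p covers every other position whose index has bit p set
        bits[p] = sum(bits[pos] for pos in range(1, 8) if pos & p and pos != p) % 2
    bits[0] = sum(bits[1:]) % 2       # makes total parity even
    return bits


class SecdedDecoder:
    def __init__(self):
        self.corrected = 0            # single-bit errors silently fixed
        self.uncorrectable = 0        # double-bit errors detected, not fixable

    def decode(self, codeword):
        bits = list(codeword)
        syndrome = 0
        for pos in range(1, 8):
            if bits[pos]:
                syndrome ^= pos       # XOR of set positions locates a single flip
        if sum(bits) % 2:             # odd number of flips: assume exactly one
            bits[syndrome] ^= 1       # syndrome 0 => the overall parity bit flipped
            self.corrected += 1
        elif syndrome:                # even flips, nonzero syndrome: two errors
            self.uncorrectable += 1
            return None
        return sum(bits[pos] << i for i, pos in enumerate(DATA_POS))
```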
>
[Why have ECC at all?]
 Things are sometimes done the way they've always been done.
Then, we should all be using machines with MEGAbytes of memory...

I used to notice a missing chip in the 9th position but now you mention it
the RAM I just looked at has 9 chips each side.
Much consumer kit has non-ECC RAM.  I'd wager many of the
devices designed by folks *here* use non-ECC RAM (because
support for ECC in embedded products is less common).
Is this ignorance?  Or, willful naïveté?
