Subject: Re: Diagnostics
From: david.brown (at) *nospam* hesbynett.no (David Brown)
Newsgroups: comp.arch.embedded
Date: 19 Oct 2024, 13:57:30
Organization: A noiseless patient Spider
Message-ID: <vf06ra$3s5is$1@dont-email.me>
User-Agent: Mozilla Thunderbird

On 18/10/2024 23:42, George Neuner wrote:
> On Fri, 18 Oct 2024 20:30:06 -0000 (UTC), antispam@fricas.org (Waldek
> Hebisch) wrote:
>> Don Y <blockedofcourse@foo.invalid> wrote:
>>> Typically, one performs some limited "confidence tests"
>>> at POST to catch gross failures.  As this activity is
>>> "in series" with normal operation, it tends to be brief
>>> and not very thorough.
>>>
>>> Many products offer a BIST capability that the user can invoke
>>> for more thorough testing.  This allows the user to decide
>>> when he can afford to live without the normal functioning of the
>>> device.
>>>
>>> And, if you are a "robust" designer, you often include invariants
>>> that verify hardware operations (esp to I/Os) are actually doing
>>> what they should -- e.g., verifying battery voltage increases
>>> when you activate the charging circuit, loopbacks on DIOs, etc.
>>>
>>> But, for 24/7/365 boxes, POST is a "once-in-a-lifetime" activity.
>>> And, BIST might not always be convenient (as well as requiring the
>>> user's consent and participation).
>>>
>>> There, runtime diagnostics are the only alternative for hardware
>>> revalidation, PFA and diagnostics.
>>>
>>> How commonly are such mechanisms implemented?  And, how thoroughly?
>>
>> This is a strange question.  AFAIK automatically run diagnostics/checks
>> are part of safety regulations.  Even if some safety critical software
>> does not contain them, nobody is going to admit violating regulations.
>> And things like PLCs are "dual use", they may be used in a non-safety
>> role, but vendors claim compliance to safety standards.
>
> However, only a minor percentage of all devices must comply with such
> safety regulations.
>
> As I understand it, Don is working on tech for "smart home"
> implementations ... devices that may be expected to run nearly
> constantly (though perhaps not 365/24 with 6 9's reliability), but
> which, for the most part, are /not/ safety critical.
>
> WRT Don's question, I don't know the answer, but I suspect runtime
> diagnostics are /not/ routinely implemented for devices that are not
> safety critical.  Reason: diagnostics interfere with operation of
> <whatever> they happen to be testing.  Even if the test is at low(est)
> priority and is interruptible by any other activity, it still might
> cause an unacceptable delay in a real time situation.  To ensure 100%
> functionality at all times effectively requires use of redundant
> hardware - which generally is too expensive for a non safety critical
> device.

That brings up one of the critical points about any kind of runtime diagnostics - what do you do if there is a failure? Until you can answer that question, any effort on diagnostics is not just pointless, but worse than useless because you are adding more stuff that could go wrong.
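
As a rough sketch of that point (not code from any real product): one way to force the "what do you do if it fails?" question to be answered is to make the failure response a mandatory part of registering a diagnostic at all. Every name here (diag_register, diag_run_all, check_charger, the simulated battery variables) is made up for illustration, and the "hardware" is faked with plain variables. It borrows Don's example of checking that the battery voltage rises when the charger is enabled; a real check would also need a settling delay and a noise margin.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef bool (*diag_check_fn)(void);  /* returns true if the check passed */
typedef void (*diag_fail_fn)(void);   /* what the device does on failure  */

typedef struct {
    const char   *name;
    diag_check_fn check;
    diag_fail_fn  on_fail;
} diag_t;

#define MAX_DIAGS 8
static diag_t diags[MAX_DIAGS];
static size_t diag_count = 0;

/* Refuse any diagnostic that has no defined response to a failure. */
bool diag_register(const char *name, diag_check_fn check, diag_fail_fn on_fail)
{
    if (check == NULL || on_fail == NULL || diag_count >= MAX_DIAGS)
        return false;
    diags[diag_count++] = (diag_t){ name, check, on_fail };
    return true;
}

/* Run every registered check once; returns the number of failures. */
size_t diag_run_all(void)
{
    size_t failures = 0;
    for (size_t i = 0; i < diag_count; i++) {
        /* Guaranteed by diag_register(). */
        assert(diags[i].check != NULL && diags[i].on_fail != NULL);
        if (!diags[i].check()) {
            diags[i].on_fail();
            failures++;
        }
    }
    return failures;
}

/* --- Simulated hardware (assumption: stands in for ADC/GPIO access) --- */
int  sim_battery_mv    = 3700;
bool sim_charger_works = true;
bool charging_disabled = false;

int read_battery_mv(void) { return sim_battery_mv; }

void charger_enable(bool on)
{
    if (on && sim_charger_works)
        sim_battery_mv += 50;   /* fake an immediate voltage rise */
}

/* Invariant: enabling the charger should raise the battery voltage. */
bool check_charger(void)
{
    int before = read_battery_mv();
    charger_enable(true);
    return read_battery_mv() > before;
}

/* The defined response: stop trying to charge. */
void charger_failed(void) { charging_disabled = true; }
```

With this shape, diag_register("charger", check_charger, charger_failed) is accepted, while the same call with NULL in place of charger_failed is rejected. It does not make the chosen response sensible, of course; it only stops a check from being added with no response at all.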
I think bad or useless diagnostics are a more common problem than missing diagnostics. People feel pressured into having them even when they can't measure anything useful and can't do anything sensible with the results.
I have seen first-hand how the insistence on having all sorts of diagnostics added to a product so that it could be "safety" certified actually resulted in a less reliable and less safe product. The only "safety" they provided was legal safety, so that people could claim it wasn't their fault if it failed, because they had added all the self-tests required by the so-called safety experts.