Subject: Re: Diagnostics
From: blockedofcourse (at) *nospam* foo.invalid (Don Y)
Groups: comp.arch.embedded
Date: 19 Oct 2024, 06:17:21
Organization: A noiseless patient Spider
Message-ID : <vevbss$3mr5m$2@dont-email.me>
References : 1 2 3 4 5 6 7
User-Agent : Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.2.2
On 10/18/2024 8:53 PM, Waldek Hebisch wrote:
One of the FETs that controls the shifting of the automatic
transmission has failed open. How do you detect that /and recover
from it/?
Detecting such a thing looks easy. Recovery is tricky, because
if you have a spare FET and activate it, there is a good chance that
it will fail for the same reason that the first FET failed.
OTOH, if you have a properly designed circuit around the FET, a
disturbance strong enough to kill the FET is likely to kill
the controller too.
The immediate goal is to *detect* that a problem exists.
If you can't detect, then attempting to recover is a moot point.
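
As a minimal sketch of what "detect" could mean for the failed-open FET: compare the commanded state against a sensed load current. The names (`fet_check`, `ON_MIN_MA`, `OFF_MAX_MA`) and the thresholds are hypothetical, and the sketch assumes the driver circuit provides a current-sense reading at all.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical thresholds, in milliamps; real values depend on the load. */
#define ON_MIN_MA   200u  /* below this while commanded ON: suspect open */
#define OFF_MAX_MA   20u  /* above this while commanded OFF: suspect short */

typedef enum { FET_OK, FET_FAILED_OPEN, FET_FAILED_SHORT } fet_status_t;

/* Compare what the controller asked for with what the current sense reports. */
fet_status_t fet_check(bool commanded_on, uint32_t sensed_ma)
{
    if (commanded_on && sensed_ma < ON_MIN_MA)
        return FET_FAILED_OPEN;
    if (!commanded_on && sensed_ma > OFF_MAX_MA)
        return FET_FAILED_SHORT;
    return FET_OK;
}
```

A low reading while commanded ON could equally be a broken wire or an open load, so the result is a suspicion to be reported, not a diagnosis.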
The camera/LIDAR that the self-drive feature uses is providing
incorrect data... etc.
Use 3 (or more) and voting. Of course, this increases cost and one
has to judge whether the increase in cost is worth the increase in safety
As well as the reliability of the additional "voting logic".
If not a set of binary signals, determining what the *correct*
signal may be can be problematic.
(in a self-driving car, using multiple sensors looks like a no-brainer,
but if this is just an assist to increase driver comfort then
the result may be different).
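
The voting idea, and the difficulty with non-binary signals, can be sketched roughly as follows. For binary signals a 2-of-3 majority is enough; for analog readings this sketch swaps in a median-of-three, which is one common choice and not something the thread prescribes. All function names and the tolerance parameter are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* 2-of-3 majority for binary signals. */
bool vote3_bool(bool a, bool b, bool c)
{
    return (a && b) || (a && c) || (b && c);
}

/* Median of three for analog readings: one wild value cannot win. */
int32_t vote3_median(int32_t a, int32_t b, int32_t c)
{
    if (a > b) { int32_t t = a; a = b; b = t; }  /* now a <= b */
    if (b > c) { b = c; }                        /* b = min(b, c) */
    return (a > b) ? a : b;                      /* max(a, min(b, c)) */
}

/* True if any channel is further than tol from the median: a detectable fault. */
bool vote3_disagree(int32_t a, int32_t b, int32_t c, int32_t tol)
{
    int64_t m = vote3_median(a, b, c);
    int64_t v[3] = { a, b, c };
    for (int i = 0; i < 3; i++) {
        int64_t d = v[i] - m;
        if (d < 0) d = -d;
        if (d > tol) return true;
    }
    return false;
}
```

The voted value masks a single bad channel; the disagreement flag is what makes the fault visible. The voter itself remains a single point of failure.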
It is different only in the sense of liability and exposure to
loss. I am not assigning values to those consequences but,
rather, looking to address the issue of run-time testing, in
general.
Even if NONE of the failures can result in injury or loss,
it is unlikely that a user WANTS to have a defective product.
If the user is technically unable to determine when the
product is "at fault" (vs. his own misunderstanding of how it
is *supposed* to work), then those failures contribute to
the users' frustrations with the product.
There are innumerable failures that can occur to compromise
the "system" and no *easy*/inexpensive/reliable way to detect
and recover from *all* of them.
Sure. But for common failures, or serious failures having non-negligible
probability, redundancy may offer a cheap way to increase reliability.
For critical functions a car could have 3 processors with
voting circuitry. With separate chips this would be more expensive
than a single processor, but the increase in cost would probably be
negligible compared to the cost of the whole car. And when integrated
on a single chip the cost difference would be tiny.
IIUC a car controller may "reboot" during a ride. Instead of
rebooting, it could hand the work over to a backup controller.
How do you know the circuitry (and other mechanisms) that
implement this hand-over are operational?
It does not matter whether handover _always_ works. What matters is
whether a system with handover has a lower chance of failure than
a system without handover. Given statistics on actual failures
(which I do not have but manufacturers should have) and
some testing, one can estimate the failure probability of
different designs and possibly decide to use handover.
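
As a rough worked estimate of that comparison, under assumptions that are not from the thread: failures are independent, p is the probability that the primary controller fails during a trip, h the probability that the handover mechanism fails when needed, and b the probability that the backup fails after taking over. Spurious handovers are ignored.

```latex
P_{\text{no handover}} = p
\qquad
P_{\text{handover}} = p \,\bigl(1 - (1-h)(1-b)\bigr) = p\,(h + b - hb)
```

With, say, h = 0.1 and b = 0.01 the factor is 0.109, so even a handover that fails one time in ten cuts the failure probability by roughly a factor of nine. The weak point is the independence assumption: a common cause that takes out both controllers invalidates the estimate.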
Again, I am not interested in "recovery" as that varies with
the application and risk assessment. What I want to concentrate
on is reliably *detecting* faults before they lead to product
failures.
I contend that the hardware in many devices has that capability
(to some extent) but that it is underutilized; that the issue
of detecting faults *after* POST is one that doesn't see much
attention. The likely thinking being that POST will flag it the
next time the device is restarted.
And, that's not acceptable in long-running devices.
It is VERY difficult to design reliable systems. I am not
attempting that. Rather, I am trying to address the fact that
the reassurances of POST (and, at the user's prerogative, BIST)
are not guaranteed when a device runs "for long periods of time".
You may have tests essentially as part of normal operation.
I suspect most folks have designed devices with UARTs. And,
having written a driver for it, have noted that framing, parity
and overrun errors are possible.
Ask yourself how many of those systems ever *use* that information!
Is there even a means of propagating it up out of the driver?
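
One way a driver might propagate that information, as a minimal sketch: count the errors per category and give upper layers something to poll. The bit positions, the struct, and both function names are hypothetical; real status flags are device specific.

```c
#include <stdint.h>

/* Hypothetical status-register bits; real positions are device specific. */
#define UART_ERR_FRAMING  (1u << 0)
#define UART_ERR_PARITY   (1u << 1)
#define UART_ERR_OVERRUN  (1u << 2)

typedef struct {
    uint32_t framing;
    uint32_t parity;
    uint32_t overrun;
    uint32_t received;   /* characters seen, good or bad */
} uart_stats_t;

/* Called by the receive path with the status flags read for each character. */
void uart_note_status(uart_stats_t *s, uint32_t status)
{
    s->received++;
    if (status & UART_ERR_FRAMING) s->framing++;
    if (status & UART_ERR_PARITY)  s->parity++;
    if (status & UART_ERR_OVERRUN) s->overrun++;
}

/* Upper layers poll this; nonzero when errors exceed 1 in `per` characters. */
int uart_link_suspect(const uart_stats_t *s, uint32_t per)
{
    uint64_t errors = (uint64_t)s->framing + s->parity + s->overrun;
    return errors * per > s->received;
}
```

A rising framing or parity count on a link that used to be clean is exactly the kind of run-time evidence that POST never sees.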
Of course, if you have a single-tasked design with a task which
must be "always" ready to respond, then running tests becomes
more complicated. But in most designs you can spare enough
time slots to run tests during normal operation. Tests may
interfere with normal operation, but here we are in domain
specific territory: sometimes the result of an operation gives enough
assurance that the device is operating correctly. And if testing
for correct operation is impossible, then there is nothing to
do; I certainly do not promise to deliver the impossible.
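
A minimal sketch of a test that fits into spare time slots: a CRC-32 sweep over a region that should never change (a code image, say), processing only a few bytes per call. The types and function names are hypothetical, and the bitwise CRC is chosen for brevity, not speed.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { BG_IN_PROGRESS, BG_PASS, BG_FAIL } bg_result_t;

typedef struct {
    const uint8_t *base;   /* start of the region that should never change */
    size_t len;
    size_t pos;            /* next byte to examine */
    uint32_t crc;          /* running CRC-32 state */
    uint32_t expected;     /* reference CRC computed at build time */
} bg_check_t;

void bg_check_init(bg_check_t *c, const uint8_t *base, size_t len,
                   uint32_t expected)
{
    c->base = base;
    c->len = len;
    c->expected = expected;
    c->pos = 0;
    c->crc = 0xFFFFFFFFu;
}

/* Process at most `budget` bytes, then return so normal work can continue. */
bg_result_t bg_check_step(bg_check_t *c, size_t budget)
{
    while (budget-- > 0 && c->pos < c->len) {
        c->crc ^= c->base[c->pos++];
        for (int bit = 0; bit < 8; bit++)
            c->crc = (c->crc >> 1) ^ (0xEDB88320u & (0u - (c->crc & 1u)));
    }
    if (c->pos < c->len)
        return BG_IN_PROGRESS;

    bg_result_t r = ((c->crc ^ 0xFFFFFFFFu) == c->expected) ? BG_PASS : BG_FAIL;
    c->pos = 0;              /* restart the sweep on the next call */
    c->crc = 0xFFFFFFFFu;
    return r;
}
```

Calling `bg_check_step` from an idle hook with a small budget bounds the interference with normal operation; the budget trades detection latency against the time stolen per slot.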