Sujet : Re: Cray style vectors
De : tr.17687 (at) *nospam* z991.linuxsc.com (Tim Rentsch)
Groupes : comp.archDate : 11. Mar 2024, 17:02:56
Autres entêtes
Organisation : A noiseless patient Spider
Message-ID : <864jdcsqmn.fsf@linuxsc.com>
References : 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
User-Agent : Gnus/5.11 (Gnus v5.11) Emacs/22.4 (gnu/linux)
Terje Mathisen <
terje.mathisen@tmsw.no> writes:
Tim Rentsch wrote:
>
Thomas Koenig <tkoenig@netcologne.de> writes:
>
Tim Rentsch <tr.17687@z991.linuxsc.com> schrieb:
>
Thomas Koenig <tkoenig@netcologne.de> writes:
>
Signed integer overflow is undefined behavior in C and prohibited
in Fortran. Yet, there is no straightforward, standard-compliant
way to check for signed overflow (and handle this appropriately)
in either language. [...]
>
It isn't hard to write standard C code to determine whether a
proposed addition or subtraction would overflow, and does so
safely and reliably.
>
Also efficiently and without resorting to implementation-
defined or undefined behavior (and without needing a bigger
type)?
>
Heavens to Betsy! Are you impugning the quality and excellence
of my code? Of *my* code? I can only hope that you are suitably
chagrined and contrite. ;)
>
It's a little bit tedious perhaps but not
difficult. Checking code can be wrapped in an inline function
and invoke whatever handling is desired, within reason.
>
Maybe you could share such code?
>
Rather that do that I will explain.
>
An addition overflows if the two operands have the same sign and
the sign of an operand is the opposite of the sign of the sum
(taken mod the width of the operands). Convert the signed
operands to their unsigned counterparts, and form the sum of the
unsigned values. The sign is just the high-order bit in each
case. Thus the overflow condition can be detected with a few
bitwise xors and ands.
>
Subtraction is similar except now overflow can occur only when
the operands have different signs and the sign of the sum is
the opposite of the sign of the first operand.
>
The above description works for two's complement hardware where
unsigned types have the same width as their corresponding signed
types. I think for most people that's all they need. The three
other possibilities are all doable with minor adjustments, and
code appropriate to each particular implementation can be
selected using C preprocessor conditional, as for example
>
#if UINT_MAX > INT_MAX && INT_MIN == -INT_MAX - 1
// this case is the one outlined above
>
#elif UINT_MAX > INT_MAX && INT_MIN == -INT_MAX
>
#elif UINT_MAX == INT_MAX && INT_MIN == -INT_MAX - 1
>
#elif UINT_MAX == INT_MAX && INT_MIN == -INT_MAX
>
Does that all make sense?
>
>
The next question would be how to do the same for multiplication....
>
Multiplication is a whole other ball game. First we need to
consider only the widest types, because narrower types can be
carried out in a wider type and the resulting product value
checked. Off the top of my head, for the widest types I would
try converting to float or double, do a floating-point multiply,
and do some trivial accepts and trivial rejects based on the
exponent of the result. Any remaining cases would need more
care, but probably (we hope!) there aren't many of those and they
don't happen very often. So for what it's worth there is my
first idea. Second idea is to compute a double-width product,
or at least part of one, using standard multiple-precision
arithmetic, and speed compare against the floating-point method.
I better stop now or the ideas will probably get worse rather
than better. :/
>
Your floating point method is pretty bad, imho, since it can give
you both false negatives and false positives, with no way to know
for sure, except doing it all over again.
I think you misunderstood what I was suggesting. The initial
tests using floating point don't produce any false positives or
false negatives. They may give a lot of "not sure" cases, but
none of the other cases is ever wrong. It is only the "not sure"
cases that need further investigation. If the FP multiplication
is done using long double (10 byte IEEE), I'm pretty sure the
results are essentially perfect with respect to a 64x64 multiply;
that is, there are very few "not sure" cases, perhaps even zero.
If I really had to write a 64x64->128 MUL, with no widening MUL or
MULH which returns the high half, then I would punt and do it using
32-bit parts (all variables are u64): [...]
I wrote some code along the same lines. A difference is you
are considering unsigned multiplication, and I am considering
signed multiplication.