Subject: Re: Microarch Club
From: terje.mathisen (at) *nospam* tmsw.no (Terje Mathisen)
Newsgroups: comp.arch
Date: 28 Mar 2024, 09:31:11
Organization: A noiseless patient Spider
Message-ID : <uu39sg$3fb7n$1@dont-email.me>
User-Agent : Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0 SeaMonkey/2.53.18.1
Scott Lurndal wrote:
mitchalsup@aol.com (MitchAlsup1) writes:
BGB wrote:
>
On 3/26/2024 5:27 PM, Michael S wrote:
>
>
For slightly less than 20 years ARM managed OK without integer divide.
Then in 2004 they added an integer divide instruction in ARMv7 (including
the ARMv7-M variant intended for small microcontroller cores like
Cortex-M3), and for the following 20 years, instead of merely OK, they are
doing great :-)
>
>
OK.
>
The point is they are doing better now after adding IDIV and FDIV.
>
I think both modern ARM and AMD Zen went over to "actually fast" integer
divide.
>
I think for a long time, the de-facto integer divide was ~ 36-40 cycles
for 32-bit, and 68-72 cycles for 64-bit. This is also on par with what I
can get from a shift-add unit.
>
Those numbers are acceptable for shift-subtract division (including
SRT variants).
>
What I don't get is the reluctance to use the FP multiplier as a fast
divider (IBM 360/91). AMD Opteron used this means to achieve 17-cycle
FDIV and 22-cycle SQRT in 1998. Why should IDIV not be under 20 cycles,
and with special casing of leading 1s and 0s average around 10 cycles?
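As a rough illustration of dividing with a multiplier, here is a minimal sketch of a Goldschmidt-style iteration (the multiply-based scheme usually associated with the 360/91) applied to unsigned 32-bit integers. The function name udiv32_goldschmidt, the use of double as a stand-in for the FP multiplier, the fixed count of six iterations and the software fix-up loops are all assumptions made for the example; real hardware would use a table-seeded first estimate and a single rounding step instead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: unsigned 32-bit divide built from FP multiplies only. */
uint32_t udiv32_goldschmidt(uint32_t n, uint32_t d)
{
    assert(d != 0);

    /* Build scale = 2^-(bit length of d) with exact multiplies by 0.5,
       so that D = d * scale lies in [0.5, 1). */
    double scale = 1.0;
    for (uint32_t t = d; t != 0; t >>= 1)
        scale *= 0.5;

    double N = (double)n * scale;   /* numerator, scaled the same way */
    double D = (double)d * scale;   /* denominator, now in [0.5, 1)   */

    /* Each step multiplies both by F = 2 - D, driving D towards 1 and
       N towards n/d. The error in D squares on every step, so six steps
       take an initial error of at most 0.5 below double precision. */
    for (int i = 0; i < 6; i++) {
        double F = 2.0 - D;
        N *= F;
        D *= F;
    }

    /* N is only an approximation of n/d: truncate, then correct the
       quotient so that q*d <= n < (q+1)*d holds exactly. */
    uint64_t q = (uint64_t)N;
    while (q * d > n)
        q--;
    while ((q + 1) * d <= n)
        q++;
    return (uint32_t)q;
}
```

The multiplies for N and D in each step are independent of each other, which is what lets a pipelined multiplier overlap them.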
Empirically, the ARM Cortex-M7 udiv instruction requires 3+[s/2] cycles
(where s is the number of significant digits in the quotient).
https://www.quinapalus.com/cm7cycles.html
That looks a lot like an SRT divider with early out?
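To make the quoted cycle formula concrete, here is a small sketch that evaluates 3+[s/2] for a given pair of operands. It assumes that "significant digits" means significant bits of the quotient and that the bracket denotes rounding down; neither is confirmed by the text above, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of significant bits in q (0 for q == 0). */
unsigned significant_bits(uint32_t q)
{
    unsigned s = 0;
    while (q != 0) {
        s++;
        q >>= 1;
    }
    return s;
}

/* Hypothetical cycle model for udiv: 3 + floor(s/2), where s is the
   bit length of the quotient. Floor and "bits" are assumptions. */
unsigned udiv_cycles_model(uint32_t n, uint32_t d)
{
    assert(d != 0);
    unsigned s = significant_bits(n / d);
    return 3u + s / 2u;
}
```

Under these assumptions 100/7 (quotient 14, four bits) costs 5 cycles, while 0xFFFFFFFF/1 (32 bits) costs 19, so the latency depends directly on the magnitude of the quotient.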
Having variable-timing DIV means that for any crypto operation (including hashes?) that uses modulo operations, the modulus _must_ be a known constant; otherwise information about it will leak from the timings, right?
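For the known-constant case, the divide can be replaced by a multiply with a precomputed reciprocal plus a branch-free correction (a Barrett-style reduction). The following is a minimal sketch, assuming a hypothetical public modulus of 3329 and a hypothetical function name mod_const. Whether it really runs in constant time still depends on the multiplier having data-independent latency on the target core and on the compiler not introducing a branch.

```c
#include <assert.h>
#include <stdint.h>

#define MODULUS 3329u                        /* hypothetical public constant */
#define MU (((uint64_t)1 << 32) / MODULUS)   /* floor(2^32 / MODULUS), folded at compile time */

static_assert(MODULUS != 0, "modulus must be nonzero");

/* x mod MODULUS without a divide instruction at run time. */
uint32_t mod_const(uint32_t x)
{
    /* Quotient estimate: equals floor(x / MODULUS) or one less. */
    uint64_t q = ((uint64_t)x * MU) >> 32;

    /* Candidate remainder, guaranteed to be in [0, 2*MODULUS). */
    uint64_t r = (uint64_t)x - q * MODULUS;

    /* Branch-free conditional subtract: t wraps (top bit set) when
       r < MODULUS, so mask is all ones only when r >= MODULUS. */
    uint64_t t = r - MODULUS;
    uint64_t mask = (t >> 63) - 1;
    return (uint32_t)(r - (MODULUS & mask));
}
```

The only division is in the definition of MU, which involves constants only and never touches secret data.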
>
I submit that at 10 cycles of average latency, the need to invent screwy
forms of even faster division falls by the wayside {accurate or not}.
I agree.
Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"