Re: Microarch Club

Subject : Re: Microarch Club
From : terje.mathisen (at) *nospam* tmsw.no (Terje Mathisen)
Newsgroups : comp.arch
Date : 29. Mar 2024, 14:38:55
Organization : A noiseless patient Spider
Message-ID : <uu6cp0$9s5h$1@dont-email.me>
User-Agent : Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0 SeaMonkey/2.53.18.1
Michael S wrote:
On Thu, 28 Mar 2024 09:31:11 +0100
Terje Mathisen <terje.mathisen@tmsw.no> wrote:
 
Scott Lurndal wrote:
mitchalsup@aol.com (MitchAlsup1) writes:
BGB wrote:
 
On 3/26/2024 5:27 PM, Michael S wrote:
>
>
For slightly less than 20 years ARM managed OK without integer
divide. Then in 2004 they added an integer divide instruction in
ARMv7 (including the ARMv7-M variant intended for small
microcontroller cores like Cortex-M3), and for the following 20
years, instead of merely OK, they are doing great :-)
 
 
OK.
>
The point is they are doing better now after adding IDIV and FDIV.
 
I think both modern ARM and AMD Zen went over to "actually fast"
integer divide.
 
I think for a long time, the de-facto integer divide was ~ 36-40
cycles for 32-bit, and 68-72 cycles for 64-bit. This is also
on-par with what I can get from a shift-add unit.
>
While those numbers are acceptable for shift-subtract division
(including SRT variants).
>
What I don't get is the reluctance to use the FP multiplier as
a fast divider (IBM 360/91). AMD Opteron used this means to
achieve 17-cycle FDIV and 22-cycle SQRT in 1998. Why should IDIV
not be under 20 cycles ?? and with special casing of leading 1s
and 0s average around 10 cycles ???
>
Empirically, the ARM Cortex-M7 udiv instruction requires 3+[s/2]
cycles (where s is the number of significant digits in the
quotient).
>
https://www.quinapalus.com/cm7cycles.html
>
That looks a lot like an SRT divisor with early out?
>
Having variable-timing DIV means that in any crypto operation (including
hashes?) where you use modulo operations, said modulus _must_ be a
known constant; otherwise information about it will leak from the
timings, right?
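
To make the concern concrete, here is a toy model of the early-out timing quoted above. It rests on two assumptions the thread does not confirm: that [s/2] means s/2 rounded up, and that "significant digits" means significant bits of the quotient. `udiv_cycles` is a hypothetical helper, not a measurement.

```python
def udiv_cycles(dividend, divisor):
    """Modelled cycle count 3 + ceil(s / 2), with s = bit length of the quotient."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    s = (dividend // divisor).bit_length()   # assumed meaning of "significant digits"
    return 3 + (s + 1) // 2                  # assumed rounding: ceiling

if __name__ == "__main__":
    x = 0xFFFFFFFF
    for secret_modulus in (3, 1000, 1 << 20, 1 << 31):
        print(secret_modulus, udiv_cycles(x, secret_modulus))
```

Under this model the count depends only on the size of the quotient, so for a known dividend an observer who can time the operation learns roughly how large the divisor is.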
Are you aware of any professional crypto algorithm, including hashes,
that uses modulo operations by a modulus that is neither a power of two nor
at least 192 bits wide?

I was involved with the optimization of DFC, the AES candidate from CERN:
It uses a fixed prime just above 2^64 as the modulus (2^64+13 afair), and that resulted in a very simple reciprocal, i.e. no need for a DIV opcode.
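
One way to see why such a modulus needs no DIV: with P = 2^64+13 we have 2^64 ≡ -13 (mod P), so the high half of a 128-bit value can be folded into the low half with a small multiply. The sketch below shows that folding in Python. It is an illustration under my own assumptions, not necessarily the reciprocal trick used in the actual DFC work; `reduce_mod_p` is a hypothetical name, and the value of P is taken from the "afair" above.

```python
P = (1 << 64) + 13            # modulus as recalled in the post
MASK64 = (1 << 64) - 1

def reduce_mod_p(x):
    """Reduce 0 <= x < 2**128 modulo P without any division."""
    if not 0 <= x < (1 << 128):
        raise ValueError("x must fit in 128 bits")
    hi, lo = x >> 64, x & MASK64
    # x = hi*2**64 + lo and 2**64 == -13 (mod P), so x == lo - 13*hi.
    t = 13 * hi                       # at most 68 bits
    t_hi, t_lo = t >> 64, t & MASK64  # t_hi <= 12
    # Fold once more: t == t_lo - 13*t_hi (mod P).
    r = lo - t_lo + 13 * t_hi
    # r now lies in (-2**64, 2**64 + 156], so one correction suffices.
    if r < 0:
        r += P
    if r >= P:
        r -= P
    return r

if __name__ == "__main__":
    print(reduce_mod_p((1 << 128) - 1) == ((1 << 128) - 1) % P)
```

The two `if` corrections are branches here for readability; an implementation that cares about timing would presumably replace them with masked adds and subtracts.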
Terje
--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

