Re: Microarch Club

Liste des GroupesRevenir à c arch 
Sujet : Re: Microarch Club
De : mitchalsup (at) *nospam* aol.com (MitchAlsup1)
Groupes : comp.arch
Date : 28. Mar 2024, 01:11:34
Autres entêtes
Organisation : Rocksolid Light
Message-ID : <4f63a339527a85e67bcd85c6f5388bfa@www.novabbs.org>
References : 1 2 3 4 5 6 7 8 9 10
User-Agent : Rocksolid Light
Scott Lurndal wrote:

mitchalsup@aol.com (MitchAlsup1) writes:
BGB wrote:
>
On 3/26/2024 5:27 PM, Michael S wrote:
  For slightly less then 20 years ARM managed OK without integer divide.
Then in 2004 they added integer divide instruction in ARMv7 (including
ARMv7-M variant intended for small microcontroller cores like
Cortex-M3) and for the following 20 years instead of merely OK they are
doing great :-)
 
>
OK.
>
The point is they are doing better now after adding IDIV and FDIV.
>
I think both modern ARM and AMD Zen went over to "actually fast" integer divide.
>
I think for a long time, the de-facto integer divide was ~ 36-40 cycles for 32-bit, and 68-72 cycles for 64-bit. This is also on-par with what I can get from a shift-add unit.
>
While those numbers are acceptable for shift-subtract division (including
SRT variants).
>
What I don't get is the reluctance for using the FP multiplier as a fast
divisor (IBM 360/91). AMD Opteron used this means to achieve 17-cycle
FDIS and 22-cycle SQRT in 1998. Why should IDIV not be under 20-cycles ??
and with special casing of leading 1s and 0s average around 10-cycles ???

Empirically, the ARM CortexM7 udiv instruction requires 3+[s/2] cycles
(where s is the number of significant digits in the quotient).
I submit that a 5+2×ln8(s) is faster still.
32-bits = 15 cycles <not so much faster>
64-bits = 17 cycles <A lot faster>
{Log base 8, where one uses Newton-Raphson or Goldschmidt to get 8 significant
digits (9.2 bits are correct) and double the significant bits each iteration (2-cycles). }
5 comes from looking at numerator and denominator to find the first bit of
significance, and then shifting numerator and denominator so that the FDIV
algorithm can work.

https://www.quinapalus.com/cm7cycles.html

>
I submit that at 10-cycles for average latency, the need to invent screwy
forms of even faster division fall by the wayside {accurate or not}.

Date Sujet#  Auteur
21 Mar 24 * Microarch Club22George Musk
25 Mar 24 `* Re: Microarch Club21BGB-Alt
26 Mar 24  `* Re: Microarch Club20MitchAlsup1
26 Mar 24   `* Re: Microarch Club19BGB
26 Mar 24    `* Re: Microarch Club18MitchAlsup1
26 Mar 24     `* Re: Microarch Club17BGB-Alt
27 Mar 24      +* Re: Microarch Club12Michael S
27 Mar 24      i`* Re: Microarch Club11BGB
27 Mar 24      i `* Re: Microarch Club10MitchAlsup1
28 Mar 24      i  +* Re: Microarch Club4Michael S
2 Apr 24      i  i`* Re: Microarch Club3BGB-Alt
5 Apr 24      i  i `* Re: Microarch Club2MitchAlsup1
6 Apr 24      i  i  `- Re: Microarch Club1BGB
28 Mar 24      i  +- Re: Microarch Club1MitchAlsup1
28 Mar 24      i  `* Re: Microarch Club4Terje Mathisen
28 Mar 24      i   `* Re: Microarch Club3Michael S
29 Mar 24      i    `* Re: Microarch Club2Terje Mathisen
29 Mar 24      i     `- Re: Microarch Club1Michael S
27 Mar 24      `* Re: Microarch Club4MitchAlsup1
27 Mar 24       `* Re: Microarch Club3BGB
27 Mar 24        `* Re: Microarch Club2MitchAlsup1
1 Apr 24         `- Re: Microarch Club1BGB

Haut de la page

Les messages affichés proviennent d'usenet.

NewsPortal