Re: Making Lemonade (Floating-point format changes)

Subject: Re: Making Lemonade (Floating-point format changes)
From: cr88192 (at) *nospam* gmail.com (BGB)
Newsgroups: comp.arch
Date : 20. May 2024, 18:05:25
Organization: A noiseless patient Spider
Message-ID : <v2fvso$3e3f$1@dont-email.me>
User-Agent : Mozilla Thunderbird
On 5/19/2024 7:10 PM, MitchAlsup1 wrote:
BGB wrote:
 
On 5/19/2024 4:16 PM, MitchAlsup1 wrote:
BGB wrote:
>
On 5/19/2024 11:37 AM, Terje Mathisen wrote:
Thomas Koenig wrote:
So, I did some more measurements on the POWER9 machine, and it came
to around 18 cycles per FMA.  Compared to the 13 cycles for the
FMA instruction, this actually sounds reasonable.
>
The big problem appears to be that, in this particular
implementation, multiplication is not pipelined, but done
piecewise by addition.  This can be explained by the fact that
this is mostly a decimal unit, with the 128-bit QP just added as
an afterthought, and decimal multiplication does not happen all
that often.
>
A fully pipelined FMA unit capable of 128-bit arithmetic would be
an entirely different beast, I would expect a throughput of 1 per
cycle and a latency of (maybe) one cycle more than 64-bit FMA.
>
The FMA normalizer has to handle a maximally bad cancellation, so it
needs to be around 350 bits wide. Mitch knows of course but I'm
guessing that this could at least be close to needing an extra cycle
on its own and/or heroic hardware?
>
>
This sort of thing is part of what makes proper FMA hopelessly
expensive.
>
Getting the LoB correctly rounded showed up the generation prior to
FMAC showing up.
>
 
Well, in this case, I have neither in a proper sense.
 
FMAC operators were sorta faked; they mostly exist because they were needed for RV64G, but they are double-rounded (and not able to expose anything that exists below the ULP, unlike proper FMA).
 But FMAC can expose the bits below LoB.
 
Granted, but exposing these bits is a feature of proper FMA, and not really what one gets if just gluing together a multiplier and an adder (and effectively doubling the latency).
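
As a rough illustration of the difference (a sketch, not a model of any particular FPU): the snippet below emulates a single-rounded FMA using exact rational arithmetic and compares it with a multiply followed by a separately rounded add. The names `fma_fused` and `fma_double_rounded` are made up for this example.

```python
from fractions import Fraction

def fma_fused(a, b, c):
    """a*b + c computed exactly, then rounded once to binary64."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

def fma_double_rounded(a, b, c):
    """a*b rounded to binary64, then the add rounded a second time."""
    return a * b + c

a = b = 1.0 + 2.0**-30
p = a * b                       # rounded product; the 2**-60 term is lost
err_fused = fma_fused(a, b, -p)           # recovers the lost low bits
err_glued = fma_double_rounded(a, b, -p)  # sees only the rounded product

print(err_fused, err_glued)
```

With the fused version, the part of the product below the ULP comes back as a second result; with the glued-together version it is simply gone, which is why tricks like double-double arithmetic need a real FMA.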

 
Granted, full FMA also allows faking higher precision using
SIMD vector operations, with math that does not work with
double-rounded FMA instructions.
>
It also enabled error free floating point calculations, but no existing
FP implementation allows exact FP calculations that do not ALSO SET the
inexact flag !?!? {Whereas My 66000 gets this right}
>
 
Dunno.
 
It seems like the existence of anything below the ULP justifies setting
the inexact flag...
 You misunderstand !!
 When one computes 2 Operands that are single wide, and can deliver a single result twice as wide or a pair of results each single wide,
you are delivering all the bits, so there is no inexact. However, if you use more than 1 instruction to perform the calculation, then, you
HAVE to set an inexact bit even though the delivery of the second
result makes the first setting of the inexact bit in error !!
 My ISA is expressive enough to do this, just like IEEE 754-2019
requires on augmented addition and augmented subtraction.
 
OK.
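
For reference, the multi-instruction software version of "deliver all the bits" looks roughly like the sketch below. It uses Knuth's TwoSum, which is an assumption chosen for illustration and is not what My 66000 or IEEE 754-2019 augmentedAddition does internally (the augmented operations use a different tie-breaking rule, roundTiesTowardZero). The function name `two_sum` is made up here.

```python
from fractions import Fraction

def two_sum(a, b):
    """Knuth's TwoSum: returns (s, e) with s = fl(a + b) and a + b == s + e exactly."""
    s = a + b                       # this add alone is inexact in general
    bb = s - a
    e = (a - (s - bb)) + (b - bb)   # the rounding error of the first add
    return s, e

s, e = two_sum(1.0, 2.0**-60)
exact = Fraction(1.0) + Fraction(2.0**-60)
print(s, e, Fraction(s) + Fraction(e) == exact)
```

The pair (s, e) carries the exact sum, yet on ordinary hardware the first add would already have raised the inexact flag, which is the situation described above.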

Well, and also an issue if one can "just barely" afford to have a
single double-precision unit.
>
This is NOT an architectural issue, but an implementation choice issue.
>
 
Absent things like microcode or traps, architectural and implementation
choices are closely tied together. Can't have instructions for things which one can't afford the hardware cost to implement.
 I understand your limitations--the problem I have is that you express
your limitations AS-IF others should make the same choices you had to make. And that is patently FALSE !!
 
I have seen some amount of low-end chips which make the choice as:
   Binary32 only;
   No FPU.
This seems to cover both soft-processors and hard-processors, albeit usually in the microcontroller space.

Defending an indefensible position under the illusion that "That's all I
 got to work with" is an insufficient defense against someone who has
more.
 
Well, and the usefulness of an FPU is dependent on performance. An inaccurate FPU can still be useful, but a slow FPU is not.
 Kahan has several lectures about this....
 
There have been apparently more things killed off by slow performance than by lack of FPU accuracy.
Say, at the time, performance apparently killed off:
   Amiga (killed off by its slow graphics)
     Bit planar graphics rather sucking if one wants fast screen redraws;
   M68K, killed off for being too slow vs x86;
   Cyrix, because its Pentium equivalent was slow at running Quake;
   ...
Granted, an FPGA soft processor isn't really winning in terms of objective performance against much of anything, but would be more left to compete in the microcontroller space, where one needs highly configurable IO, which depends mostly on being able to fit on affordable FPGAs.
Though, sadly, the more affordable FPGAs (like XC7S20 or XC7S50) have a harder time fitting a full-featured BJX2 core. OTOH, an XC7A100T is significantly more expensive than a microcontroller.
Downside of microcontrollers is that they tend to have slow IO and are not so great at computational workloads, but if one can beat a Cortex-M in terms of performance per clock to offset the difference in MHz, this could be something.
But, objectively, still hard, as one can get OK Quake performance on an ESP32 or Cortex-M3. But, they tend to fail miserably at Software OpenGL (so, eg, no Quake 3 Arena on a Cortex-M3), ...
But, this mostly just leaves more highly programmable IO or ISA configurability.

Though, the trick of possibly having four 27-bit multiplies which combine into a virtual 54 bit multiplier seems like an interesting possibility, though not great as DSPs don't natively handle this size (and would be too expensive to stretch it out with LUTs). Likely, one would need to build it from 34*34->68 bit multipliers (each costing 4 DSPs).
>
This is your implementation choice coloring what you take as
architectural decisions.
>
In terms of DSP cost, it would be higher than the current solution:
   16 vs 6+4 (10).
But, possibly lower LUT cost (in both the Binary32 and Binary64 multipliers, the shortfall is made up using smaller LUT-based multipliers).
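
To make the partial-product counting concrete, here is a small sketch of the schoolbook decomposition. It assumes one narrow unsigned multiply per partial product (17-bit limbs standing in for one DSP each); the function name `mul_from_limbs` is invented for the example, and the sketch says nothing about how the adders would actually be arranged in hardware.

```python
def mul_from_limbs(a, b, limb_bits, limbs):
    """Schoolbook product of two unsigned (limb_bits * limbs)-bit values.

    Returns (product, number of narrow multiplies used).
    """
    mask = (1 << limb_bits) - 1
    total = 0
    count = 0
    for i in range(limbs):
        ai = (a >> (i * limb_bits)) & mask
        for j in range(limbs):
            bj = (b >> (j * limb_bits)) & mask
            # Each partial product is shifted to its weight and summed.
            total += (ai * bj) << ((i + j) * limb_bits)
            count += 1
    return total, count

x = (1 << 54) - 3
y = (1 << 53) + 12345
p54, n54 = mul_from_limbs(x, y, 27, 2)  # four 27x27 partial products
p68, n68 = mul_from_limbs(x, y, 17, 4)  # sixteen 17x17 partial products
print(n54, n68, p54 == p68 == x * y)
```

The sixteen 17x17 products correspond to the "16" figure above: four 34*34 blocks at four DSPs each.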
>
We can now fit (5nm) hundreds of GBOoO cores on a single die. The
difference between a 53×53 tree and a 64×64 tree (makes all problems
vanish) is not visible at this level (100+ cores on a die).
>
This is your implementation choice coloring your thoughts.
>
 
I can afford FPGAs...
I can't afford to get an ASIC made.
 I am not asking you to spend big money--I am merely asking you to quit
defending "doing the wrong thing" when others have to follow standards.
{{If you properly caveated all your defense statements--I would not complain.}}
 
I am not claiming any of this "doesn't kinda suck", so what is the issue?...

So, implementation choices here are:
   FPGA;
   Nothing.
 I have been wondering for a while--are the DSP things you build your
multiplier out of synthesized by Verilog compilation, or hard coded
into the gates themselves ?? Because if they are synthesized, you could
create Verilog that builds the multiplier tree of whatever size you
need without all the DSP overhead.
 
They are hard-logic in the FPGAs (similar to the Block-RAM).
Mostly they give an 18*18-bit signed widening multiply (or 17*17 unsigned), with the ability to add the result into a 48-bit accumulator, with something like an 8ns latency IIRC.
The exact number of them in an FPGA varies.
Like with Block-RAM, they are used by invoking magic patterns in the Verilog.
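
A behavioural sketch of such a block, using only the figures quoted above (18-bit signed inputs, 48-bit accumulator). This is a toy model in Python, not a vendor primitive; the wrap-around on accumulator overflow and the names `to_signed` and `dsp_mac` are assumptions made for the example.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def dsp_mac(acc, a, b):
    """acc + a*b with 18-bit signed inputs and a 48-bit wrapping accumulator."""
    product = to_signed(a, 18) * to_signed(b, 18)  # fits in 36 bits
    return to_signed(acc + product, 48)

acc = 0
for a, b in [(3, 5), (-7, 11), (0x1FFFF, 0x1FFFF)]:
    acc = dsp_mac(acc, a, b)
print(acc)
```

Note that 17-bit unsigned values fit in the non-negative half of the 18-bit signed range, which is where the "17*17 unsigned" figure comes from.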
LUTs can also be used to build multipliers, but are a lot slower and eat a larger amount of FPGA resources, unless one does Shift-Add, but this is slow.
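
For a sense of why Shift-Add is slow, a minimal software model follows, assuming one multiplier bit is handled per clock cycle (the usual textbook arrangement, not necessarily what any given core does). The name `shift_add_mul` is made up for the sketch.

```python
def shift_add_mul(a, b, width):
    """Unsigned shift-and-add multiply; one loop iteration models one clock cycle."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    acc = 0
    cycles = 0
    for bit in range(width):
        if (b >> bit) & 1:
            acc += a << bit   # conditionally add the shifted multiplicand
        cycles += 1
    return acc, cycles

product, cycles = shift_add_mul((1 << 53) - 1, (1 << 52) + 1, 53)
print(product, cycles)
```

A Binary64 mantissa multiply done this way would take on the order of 53 cycles, versus a handful for a DSP-based multiplier.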
Some lower-end FPGAs, like the Lattice ICE40 line, don't have DSPs, but generally only 5k LUTs and they are LUT4. These are more limited, but some people have managed to shoehorn RV32I cores into them.
ECP5 is more capable, generally with more LUTs per $ than the Xilinx chips, but they are LUT4s, these are seemingly more popular with people who are having their own custom PCBs made though.
Cyclone-V is also semi-popular, but is more directly comparable to Zynq (smaller FPGA coupled to ARM based hard processors). The whole MiSTer (retro games emulation) thing is built mostly around the Cyclone-V and DE10 boards, generally doing hybrid devices which will run the CPU itself in the FPGA part, but then use the ARM side of things mostly for all of the peripheral hardware.
Part of the appeal had usually been that one can emulate 6502/65C816/... faster on an FPGA than on a RasPi. Apparently, some people had gotten 386/486 class CPUs working on the thing well enough to run Doom.

>
What kind of car do you drive ??
>
 
I don't drive a car...
I tend to fairly rapidly get tired out if trying to drive.
 I was going to ask if your car had hand rolled windows, a manual
transmission, ... in the early 1980s all of us were similarly constrained, computer architecture grew out of the fast-and-dirty
modus operandi and into the follow-standards Operandi.
Apparently more modern cars have mostly gone over to replacing all of the manual controls with a touchscreen.
The cars my parents have are from the 2000s though, so still have manually operated controls. Still generally power windows and automatic transmission though.
I guess there is some controversy with the more modern cars, as many people prefer the manual aspects of having a steering column and throttle/brake pedals that connect directly to the engine and brakes, rather than what is effectively a permanently mounted force-feedback racing-wheel controller.
Also apparently the whole thing about police having access to the full driving history of the cars and the ability to remotely disable them, etc.
...
I am more neutral on some of this.
I would also be more accepting of the self-driving thing, if it were more reliable, mostly as my own driving skills are almost non-existent.
But, I guess things like the whole issue of mistaking anything blue-colored for sky, and tending to follow Looney Tunes logic for things like painted murals, isn't great.
Like, if there is a painted depiction of a road on a wall and the car will try to drive into it, not ideal...
Relatedly though, getting basic computer vision tasks working on an FPGA soft processor hasn't been so easy either. Seems to require a fair bit more in terms of processing power to run non-trivial neural nets.
Would probably be OK if I were trying to run OCR or something, but alas.
Then again, trying to do similar in software on a RasPi wasn't working so great either. But, this would be for more simple tasks, like trying to make a small self-driving robot that would be able to navigate and not run into stuff (well, and combined with the limitation of it being difficult to fit the CPU configuration I wanted into an XC7S50, such as to use an Arty-S7 board as the controller...).
Arguably, this leaves using a RasPi or similar...
Had also wanted to try to experiment with customized lenses to try to gain depth information from optical effects, but this hasn't gone anywhere as of yet (always too busy with other stuff than to try to machine specialized camera lens add-ons).
...
