Newsportal USENET - Re: Computer architects leaving Intel...

On Sun, 22 Sep 2024 20:43:38 +0000, Paul A. Clayton wrote:

On 9/19/24 11:07 AM, EricP wrote:
[snip]
If the multiplier is pipelined with a latency of 5 and throughput of 1,
then MULL takes 5 cycles and MULL,MULH takes 6.
>
But those two multiplies still are tossing away 50% of their work.
>
I do not remember how multipliers are actually implemented — and
am not motivated to refresh my memory at the moment — but I
thought a multiply low would not need to generate the upper bits,
so I do not understand where your "50% of their work" is coming
from.

     +-----------+   +------------+
     \  mplier  /     \   mcand  /        Big input mux
      +--------+       +--------+
           |                |
           |      +--------------+
           |     /              /
           |    /              /
           +-- /              /
              /     Tree     /
             /              /--+
            /              /   |
           /              /    |
          +---------------+-----------+
                hi             low        Products
two n-bit operands are multiplied into a 2×n-bit result.
{{All the rest is HOW not what}}
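
A rough sketch of the "what" in Python, for anyone who wants to poke at it: both halves are slices of one 2n-bit product, and the low half depends only on the partial-product bits in columns below n. The function names and the 64-bit width are my assumptions for illustration; this is plain integer arithmetic, not a model of any real multiplier array.

```python
N = 64
MASK = (1 << N) - 1

def mul_full(a, b):
    """Return (hi, lo) halves of the 2N-bit unsigned product."""
    p = (a & MASK) * (b & MASK)
    return p >> N, p & MASK

def mul_low_only(a, b):
    """Low N bits, summing only partial-product bits in columns < N."""
    acc = 0
    for i in range(N):
        if (b >> i) & 1:
            # Row i is a << i; keep only the bits that land below column N.
            acc += ((a & MASK) << i) & MASK
    return acc & MASK

def low_column_fraction(n=N):
    """Fraction of the n*n partial-product bits that sit in columns < n."""
    low_bits = sum(n - i for i in range(n))  # row i contributes n - i bits
    return low_bits / (n * n)
```

For n = 64 the fraction works out to 65/128, so under this counting a low-only multiply touches roughly half of the array.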

The high result needs the low result carry-out but not the rest of
the result. (An approximate multiply high for multiply by
reciprocal might be useful, avoiding the low result work. There
might also be ways that a multiplier could be configured to also
provide bit mixing similar to middle result for generating a
hash?)
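
To make the carry-out remark concrete, here is a small sketch that splits each 64-bit operand into 32-bit halves. The exact high half needs only the upper bits of the low-by-low product as a carry-in; the approximate version skips that product entirely and can come out low by at most 1. The names `mulh_exact` and `mulh_approx` and the 32-bit split are assumptions for illustration, not anybody's actual hardware.

```python
MASK32 = (1 << 32) - 1

def mulh_exact(a, b):
    """High 64 bits of the unsigned 64x64 product, built from 32-bit pieces."""
    ah, al = a >> 32, a & MASK32
    bh, bl = b >> 32, b & MASK32
    mid = ah * bl + al * bh
    carry_in = (al * bl) >> 32   # all the high half needs from low x low
    return ah * bh + ((mid + carry_in) >> 32)

def mulh_approx(a, b):
    """Same as mulh_exact but without the low x low product (no carry-in)."""
    ah, al = a >> 32, a & MASK32
    bh, bl = b >> 32, b & MASK32
    return ah * bh + ((ah * bl + al * bh) >> 32)
```

Whether an off-by-one high result is acceptable for multiply-by-reciprocal depends on how the surrounding division sequence corrects its quotient.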
>
I seem to recall a PowerPC implementation did semi-pipelined
32-bit multiplication 16 bits at a time. This presumably saved area
and power

You save 1/2 of the tree area, but ultimately consume more power.

while also facilitating early out for small
multiplicands,

Dadda showed that doubling the size of the tree only adds one
4-2 compressor delay to the whole calculation.

at the cost of some latency and substantial
throughput compared to a fully pipelined multiplication.

Throughput that the rest of the engine could not use.
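
The Dadda remark above can be sanity-checked with an idealized count of reduction levels, assuming each level of 4-2 compressors halves the number of partial-product rows until two remain for the final adder. The function name is hypothetical, and the model ignores wiring, Booth recoding, and the final carry-propagate add.

```python
def compressor_levels(rows):
    """Levels of 4-2 compressors needed to reduce `rows` partial products to 2."""
    levels = 0
    while rows > 2:
        rows = (rows + 1) // 2   # each 4-2 level turns 4 rows into 2
        levels += 1
    return levels
```

Under this model 32 rows need 4 levels and 64 rows need 5, i.e. doubling the tree adds one compressor delay.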

If I
remember correctly, this produced a result for 16-bit by 32-bit
multiplication, which is different from generating a low or high
result.
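
A minimal sketch of the "16 bits at a time with early out" idea, assuming unsigned operands and that the early out is taken on the operand being consumed in 16-bit chunks. This is not a model of any particular PowerPC implementation; the function name and the step counting are illustrative.

```python
def mul32_in_16bit_steps(a, b):
    """Multiply 32-bit a by 32-bit b, consuming b 16 bits per step.
    Returns (64-bit product, number of steps taken)."""
    a &= 0xFFFFFFFF
    b &= 0xFFFFFFFF
    product = 0
    steps = 0
    shift = 0
    while True:
        chunk = b & 0xFFFF
        product += (a * chunk) << shift   # one 32x16 multiply per step
        steps += 1
        b >>= 16
        shift += 16
        if b == 0:        # early out: no significant bits remain
            break
    return product, steps
```

Each step is a 32-bit by 16-bit multiply, which matches the recollection above that the hardware produced a 16-by-32 result rather than a separate low or high half.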
>
And if it does fuse them then the internal uArch cost is the same as if
you had designed it optimally from the start, except now you have
to pay for a fuser.
>
<sound of soap box being dragged out>
This idea that macro-op fusion is some magic solution is bullshit.

The argument is, at best, of Academic Quality, made by a student
at the time as a way to justify RISC-V not having certain
calculations that are easy for HW to perform.

1) It's not free.
>
Neither is increasing the number of opcodes or providing extender
prefixes. If one wants binary compatibility, non-fusing
implementations would work.

I did neither and avoided both.

(I tend to favor providing a translation layer between software
distribution format and instruction cache format, which reduces
the binary compatibility constraint.)
>
2) It only works where Decode can see *all* the required lookahead
instructions, which means you have to pay for an N-lane decoder
but only get 1 lane.
>
Most fusion is for two adjacent instructions, which significantly
limits the complexity.

To quadratic {BigO( instruction-OpCode-bits ** 2)}

The fusable patterns are also a subset of
all pairs of two instructions, so complete two-way decoding may
not be needed.
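
As a toy illustration of adjacent-pair fusion driven by a small pattern table: the opcode pairs, the fused names, and the register condition below are all hypothetical choices of mine, and real decoders check far more than this.

```python
# Hypothetical fusable pairs: (first opcode, second opcode) -> fused name.
FUSABLE = {
    ("lui", "addi"): "li32",
    ("auipc", "ld"): "ld_pcrel",
    ("slli", "add"): "shadd",
}

def fuse_adjacent(instrs):
    """instrs: list of (opcode, rd, rs1) tuples. Fuse a pair only when the
    second instruction reads and overwrites the first one's destination."""
    out = []
    i = 0
    while i < len(instrs):
        if i + 1 < len(instrs):
            (op1, rd1, _), (op2, rd2, rs2) = instrs[i], instrs[i + 1]
            fused = FUSABLE.get((op1, op2))
            if fused and rd1 == rs2 and rd1 == rd2:
                out.append((fused, rd2))
                i += 2
                continue
        out.append((instrs[i][0], instrs[i][1]))
        i += 1
    return out
```

Only pairs listed in the table are ever compared against, which is the sense in which a full two-way decode may not be needed.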
>
There may also be optimization opportunities from looking ahead.
Mitch Alsup proposed such for branch handling in a scalar
implementation.

I use this, to be clear, as a means to eliminate any need of the
branch delay slot in smaller narrow machines.

Apart from fusion, there might be advantages for
avoiding bank conflicts in a banked register file. I.e., the cost
of lookahead might be shared by multiple techniques/optimizations.
>
I tend to agree that fusion tends to be a workaround for
suboptimal instruction encoding, but it seems that encoding involves
a lot of tradeoffs.
>
3) It's probabilistic as it depends on how the fetch buffers get loaded.
    E.g., if the fetch buffer contains a valid instruction but does not have
    a next instruction, do you stall Decode to see if a fuser might arrive
    or dispatch it anyway?
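
The alignment dependence in point 3 can be shown with a small count, assuming a decoder that never waits for the next fetch group, so a pair straddling a group boundary is simply not fused. The function and its parameters are hypothetical.

```python
def count_fused(ops, fusable, group_size, start_offset=0):
    """Count pairs fused when decode never waits for the next fetch group.
    ops: list of opcode names; fusable: set of (first, second) pairs."""
    fused = 0
    i = 0
    while i + 1 < len(ops):
        same_group = ((i + start_offset) // group_size
                      == (i + 1 + start_offset) // group_size)
        if same_group and (ops[i], ops[i + 1]) in fusable:
            fused += 1
            i += 2
        else:
            i += 1
    return fused
```

With eight instructions forming four fusable pairs and a group size of 4, shifting the code by one instruction slot drops the fused count from 4 to 2 in this model.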
>
This is also somewhat true for variable length encodings that
cross fetch boundaries.

In My 1-wide machine, the only time this comes up is when a
long instruction crosses into a new cache line (or page) and
the cache (or TLB) takes a miss.

In general a boundary-crossing instruction
would probably stall even if such was not strictly necessary
(e.g., if the missing information is opcode refinement — not
related to instruction routing — or an immediate or even a
register source identifier specifying a value that can have
delayed use (e.g., value of a store, addend of an FMADD)).

In my case, immediate data for a ST is not needed until the ST
has retired, so a) it is placed last, and b) a delay as long as the
pipeline depth can be tolerated.

>
This does seem a weakness, but fusion does not consist entirely of
negative factors.
>
4) It gets exponentially expensive if you start doing multiple instruction
lanes because decode has to deal with all the permutations of
fusion possibilities.

Fusion in an already variable length RISC ISA is already exponential.
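
One way to see the growth: count the ways a window of n consecutive instructions can be split into singles and adjacent fused pairs, under the pessimistic assumption that every adjacent pair is fusable. The count follows the Fibonacci recurrence, so it grows by roughly 1.6x per added instruction. The function name is hypothetical, and this is only a counting argument, not a description of how a decoder is built.

```python
def fusion_groupings(n):
    """Number of ways to split n consecutive instructions into singles and
    adjacent fused pairs (every adjacent pair assumed fusable)."""
    ways_prev, ways = 1, 1      # windows of 0 and 1 instructions
    for _ in range(2, n + 1):
        # The last instruction is either single or the tail of a fused pair.
        ways_prev, ways = ways, ways + ways_prev
    return ways
```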

>
This is also a factor in mere superscalar decode/execute.
Detecting that an instruction is dependent on another would
normally stall the execution of that instruction.
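
For the dependency check itself, a brute-force sketch: every source of every instruction in a decode group is compared against the destination of every earlier instruction in the same group. The tuple layout and names are assumptions; the point is only that the comparator count grows with the square of the group width.

```python
def intra_group_dependencies(group):
    """group: list of (dest, sources) in program order.
    Returns (list of (consumer_idx, producer_idx), comparator count)."""
    deps = []
    comparisons = 0
    for j, (_, sources) in enumerate(group):
        for i in range(j):
            dest_i = group[i][0]
            for src in sources:
                comparisons += 1          # one register-number comparator
                if src == dest_i:
                    deps.append((j, i))
    return deps, comparisons
```

With two sources per instruction, a group of N needs N*(N-1) comparisons in this model.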
>
(I feel that encoding some of the dependency information could
be useful to avoid some of this work. In theory, common
dependency detection could also be more broadly useful; e.g.,
operand availability detection and execution/operand routing.)

So useful that it is encoded directly in My 66000 ISA.

5) Any fused instructions leave (multiple) bubbles that should be
compacted out or there wasn't much point to doing the fusion.
>
Even with reduced operations per cycle, fusion could still provide
a net energy benefit.

Here I disagree:: but for a different reason::
In order for RISC-V to use a 64-bit constant as an operand, it has
to execute either:: AUIPC-LD to an area of memory containing the
64-bit constant, or a 6-7 instruction stream to build the constant
inline, while an ISA that directly supports 64-bit constants
does not execute any of those.
Thus, while fusion may save power when seen at the "it's my ISA"
level, when seen from the perspective of "it is directly supported
in my ISA" it wastes power.
There is NO less power expensive way to deliver a constant into
execution than from the instruction stream directly to the function
unit performing the calculation.
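
For reference, the address arithmetic behind the AUIPC-LD pair mentioned above can be sketched as follows. The split into a 20-bit upper part and a signed 12-bit lower part, with the +0x800 rounding, is the standard RISC-V convention; the function names are mine, and the sketch models neither the load itself nor 64-bit wraparound.

```python
def split_pcrel(offset):
    """Split a PC-relative byte offset into (hi20, lo12) for an AUIPC + load pair."""
    hi20 = (offset + 0x800) >> 12     # round so that lo12 fits a signed 12-bit field
    lo12 = offset - (hi20 << 12)
    if not -(1 << 19) <= hi20 < (1 << 19):
        raise ValueError("offset out of range for a single AUIPC + load pair")
    return hi20, lo12

def effective_address(pc, hi20, lo12):
    """AUIPC adds hi20 << 12 to pc; the load adds the sign-extended lo12."""
    return pc + (hi20 << 12) + lo12
```

Two instructions and a data-cache access are spent here just to fetch a constant, which is the overhead the paragraph above is objecting to.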

In my opinion it is better to have an ISA that is optimal by design
rather than being patched up by fusion later.
>
Fusion is mostly presented for "patching up", but there are also
considerations of diverse microarchitectures. With pre-fused
instructions, an implementation might need to crack some of those
instructions. Software optimized for such an implementation might
also prefer more flexible compile-time scheduling of pre-cracked
operations.

Agreed:: there is a cost of implementing a means by which large
constants can be used in the instruction. I argue that this cost is
a) only apparent in the smallest implementations, and b) smaller
than the cost in cycles and power that fusion requires.

A load-op instruction is perhaps particularly difficult because
one needs frequent stalls, a skewed (or second chance) pipeline to
hide the load latency, out-of-order execution, or some other stall
avoidance mechanism.
>
There are also constraints in encoding granularity.
>
Some of this inefficiency is caused by clinging to now 40-year-old
RISC design *guidelines* (i.e., not even rules) that:
- instructions have at most 1 dest and 2 source registers
>
FMADD seems to have mostly killed the 2-source limit. AArch64's
paired load removes the 2 destination limit. (Paired destinations
were common for early double precision implementations.)

FMAD also provides the operand bussing to support the::
mem rd,[Rbase+Rindex<<scale+disp]
addressing mode.
But this was already possible since "disp" always comes from the
instruction, and only goes to the AGEN unit.
FMAD just got rid of all the other excuses not to do the right
thing.
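
The addressing mode above is a three-input add, with two inputs from registers and one from the instruction. A minimal sketch, assuming 64-bit wraparound and a small shift count for the scale (both assumptions on my part):

```python
MASK64 = (1 << 64) - 1

def agen(rbase, rindex, scale, disp):
    """Effective address for [Rbase + Rindex<<scale + disp], wrapped to 64 bits."""
    # rbase and rindex come from the register file; disp comes straight
    # from the instruction and only ever goes to the AGEN unit.
    return (rbase + (rindex << scale) + disp) & MASK64
```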

>
- register specifier fields are either source or dest, never both
>
This seems mostly a code density consideration. I think using a
single name for both a source and a destination is not so
horrible, but I am not a hardware guy.

All we HW guys want is that wherever a register is specified,
it is specified in exactly 1 field position in the instruction. So, if
field<a..b> is used to specify Rd in one instruction, then no
other field<!a..!b> specifies the Rd register in any other
instruction. RISC-V blew this "requirement".
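
One plausible reading of that remark, assuming it refers to the compressed (C) extension: in the 32-bit base formats rd always sits in bits 11:7, but the 16-bit formats put it in different places. The sketch below only extracts the field; the function names are hypothetical and the test values are synthetic bit patterns, not complete instructions, apart from the one 32-bit ADDI.

```python
def rd_base32(insn):
    """rd in the 32-bit base formats that have one: bits 11:7."""
    return (insn >> 7) & 0x1F

def rd_compressed_ci(half):
    """rd in the compressed CR/CI formats: bits 11:7, full 5-bit specifier."""
    return (half >> 7) & 0x1F

def rd_compressed_ciw(half):
    """rd' in the compressed CIW/CL formats: 3 bits at 4:2, mapping to x8..x15."""
    return 8 + ((half >> 2) & 0x7)
```

So a decoder has to know the format before it knows where to find the destination register, which is the extra mux the "1 field" rule is meant to avoid.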

- instructions should take at most 1 clock (they never did)
>
That was clearly overconstraining.
>
These self imposed design restrictions cause ISA designers to miss
some possible more optimal solutions. The result is things like
RISC-V's memory reference linkage structures taking 6 instructions
to build a 64-bit PC-relative address. And I'm pretty sure we won't
see any 6 instruction fusers for quite some time.
>
I very much doubt a compiler would generate such outside of some
real-time application where the time constancy might justify the
code bloat.
>
<sound of soap box being dragged back to cupboard>
>
I do not mean my response to be heckling. Your points are very
true. However, I think fusion is a technique — like cracking —
that is a natural part of an architect's toolbox.
