On Sun, 22 Sep 2024 20:43:38 +0000, Paul A. Clayton wrote:

> On 9/19/24 11:07 AM, EricP wrote:
> [snip]
>> If the multiplier is pipelined with a latency of 5 and throughput
>> of 1, then MULL takes 5 cycles and MULL,MULH takes 6.
>>
>> But those two multiplies still are tossing away 50% of their work.
>
> I do not remember how multipliers are actually implemented -- and
> am not motivated to refresh my memory at the moment -- but I
> thought a multiply low would not need to generate the upper bits,
> so I do not understand where your "50% of their work" is coming
> from. So are you saying the high bits come for free? This seems
> [...]

   +----------+    +----------+
    \ mplier /      \ mcand  /        Big input mux
     +-------+       +-------+
         |               |
         |      +--------+
         |     /        /
         |    /        /
         +---/        /
            /  Tree  /
           /        /--+
          /        /   |
         /        /    |
        +--------+-----+
            hi     low                Products

Two n-bit operands are multiplied into a 2×n-bit result.
{{All the rest is HOW not what}}
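
To make the hi/low outputs in the figure concrete, here is a minimal
C sketch (using the GCC/Clang unsigned __int128 extension; my
illustration, not anything from the posts above). One trip through
the tree yields the full 2*n-bit product, and MUL/MULH merely select
halves of it:

#include <stdint.h>

/* One pass over a 64x64 partial-product array produces the whole
   128-bit product; MUL returns the low half, MULHU the high half,
   of the same value. */
static void mul_full(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
    unsigned __int128 p = (unsigned __int128)a * b;
    *lo = (uint64_t)p;          /* what RISC-V MUL   returns */
    *hi = (uint64_t)(p >> 64);  /* what RISC-V MULHU returns */
}

On x86-64 this compiles to a single widening MUL that leaves both
halves in registers; on RISC-V it becomes exactly the MUL/MULHU pair
under discussion.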

> The power consumption would seem to depend on how frequently both
> [halves are needed]. The high result needs the low result
> carry-out but not the rest of the result. (An approximate multiply
> high for multiply by reciprocal might be useful, avoiding the low
> result work. There might also be ways that a multiplier could be
> configured to also provide bit mixing similar to a middle result
> for generating a hash?)

You save 1/2 of the tree area, but ultimately consume more power.
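
On the "multiply by reciprocal" case mentioned above: that is the
standard compiler trick for dividing by a constant, where only the
high part of a product is kept and the low part is never used. A
minimal C example (the magic constant is the usual one for unsigned
division by 10; my illustration, not something from the thread):

#include <stdint.h>

/* x / 10 computed reciprocal-style: multiply by ceil(2^35 / 10) and
   keep only the upper bits of the 64-bit product.  The low bits are
   dead, which is what makes a multiply-high-only unit attractive. */
static uint32_t div10(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}

Note that an exact quotient still needs the true high part, including
the carry out of the discarded low partial products; an approximate
multiply high would need a correction step or a use that tolerates
being off by one.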

> I seem to recall a PowerPC implementation did semi-pipelined
> 32-bit multiplication 16-bits at a time. This presumably saved
> area and power while also facilitating early out for small
> multiplicands, [...]

Interesting. Dadda showed that doubling the size of the tree only
adds one 4-2 compressor delay to the whole calculation.
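
To pin down what such a semi-pipelined, 16-bits-at-a-time scheme
amounts to, here is a rough C sketch of the idea (my illustration,
not the actual PowerPC design): a half-width array is used twice,
and the second pass can be skipped when the upper half of the sliced
operand is zero.

#include <stdint.h>

/* 32x32 -> 64 multiply run through a 16-bit-wide array twice, as a
   stand-in for a half-width, double-pumped multiplier.  If the upper
   16 bits of the sliced operand are zero, the second pass (and its
   power) is skipped. */
static uint64_t mul32_two_pass(uint32_t mcand, uint32_t mplier)
{
    uint64_t p  = (uint64_t)(mcand & 0xFFFFu) * mplier;   /* pass 1    */
    uint32_t hi = mcand >> 16;
    if (hi != 0)                                          /* early out */
        p += ((uint64_t)hi * mplier) << 16;               /* pass 2    */
    return p;
}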

>> <sound of soap box being dragged out>
>>
>> This idea that macro-op fusion is some magic solution is bullshit.
>
> The RISC-V published argument for fusion is not great, but fusion
> [...]

The argument is, at best, of Academic Quality, made by a student at
the time, as a way to justify RISC-V not having certain calculations
that are easy for HW to perform.

>> 1) It's not free.
>
> Neither is increasing the number of opcodes or providing extender
> prefixes. If one wants binary compatibility, non-fusing
> implementations would work. My 66000's CARRY and PRED are
> "extender prefixes", admittedly [...]

I did neither and avoided both.

> [...] be useful to avoid some of this work. (In theory, common
> dependency detection could also be more broadly useful; e.g.,
> operand availability detection and execution/operand routing.)

How so? My 66000 does not provide any explicit declaration what [...]

So useful that it is encoded directly in My 66000 ISA.

>> 5) Any fused instructions leave (multiple) bubbles that should be
>> compacted out or there wasn't much point to doing the fusion.
>
> Yes, but "computing" large immediates is obviously less efficient
> [...] Even with reduced operations per cycle, fusion could still
> provide a net energy benefit.

Here I disagree:: but for a different reason::

In order for RISC-V to use a 64-bit constant as an operand, it has
to execute either:: AUIPC-LD to an area of memory containing the
64-bit constant, or a 6-7 instruction stream to build the constant
inline; whereas an ISA that directly supports 64-bit constants does
not execute any of those.

Thus, while fusion may save power when seen at the "it's my ISA"
level, when seen from the perspective of "it is directly supported
in my ISA" it wastes power.
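
For concreteness, here is roughly what the "6-7 instruction stream"
amounts to, written as C arithmetic with the corresponding
RISC-V-style op noted per step. This is a sketch of the general
lui/addi/slli pattern only; the exact sequence a real assembler
emits depends on the constant and on sign tricks ignored here.

#include <stdint.h>

/* Materializing 0x123456789ABCDEF0 piecewise, the way an ISA without
   wide immediates must: every step below is a dependent ALU op. */
static uint64_t build_imm64(void)
{
    uint64_t x;
    x  = (uint64_t)0x12345 << 12;   /* lui-style: top 20 bits   */
    x += 0x678;                     /* addi-style: next 12 bits */
    x <<= 12;  x += 0x9AB;          /* slli + addi              */
    x <<= 12;  x += 0xCDE;          /* slli + addi              */
    x <<= 8;   x += 0xF0;           /* slli + addi: last 8 bits */
    return x;                       /* = 0x123456789ABCDEF0     */
}

Every step depends on the previous one, so the sequence is serial as
well as long; with the constant carried in the instruction stream it
is zero extra operations.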

>> - register specifier fields are either source or dest, never both
>
> Only with the Compressed extension, I think. The Compressed [...]
> This seems mostly a code density consideration. I think using a
> single name for both a source and a destination is not so
> horrible, but I am not a hardware guy.

All we HW guys want is that wherever a register is specified, it is
specified in exactly 1 field in the instruction. So, if field<a..b>
is used to specify Rd in one instruction, no other field<!a..!b>
specifies the Rd register. RISC-V blew this "requirement".
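
As a concrete illustration of the "one field, one meaning" point
(the bit positions below are the published RV32I/RV64I base-format
ones; the helper names are mine): in the 32-bit formats a decoder
can pull every register specifier from a fixed place before it even
knows the opcode.

#include <stdint.h>

/* Base 32-bit RISC-V formats: rd, rs1 and rs2, when present, always
   sit in the same bit positions, so specifier extraction needs no
   opcode decode at all. */
static inline uint32_t rd (uint32_t insn) { return (insn >>  7) & 0x1F; }
static inline uint32_t rs1(uint32_t insn) { return (insn >> 15) & 0x1F; }
static inline uint32_t rs2(uint32_t insn) { return (insn >> 20) & 0x1F; }

The 16-bit Compressed formats break this: C.ADD, for instance, reads
and writes the register named in bits 11:7, and other C formats put
3-bit register numbers in yet other positions, so the simple
extractors above no longer apply.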