On 9/22/24 6:19 PM, MitchAlsup1 wrote:
> On Sun, 22 Sep 2024 20:43:38 +0000, Paul A. Clayton wrote:
>> On 9/19/24 11:07 AM, EricP wrote:
[snip]
>>> If the multiplier is pipelined with a latency of 5 and throughput
>>> of 1, then MULL takes 5 cycles and MULL,MULH takes 6.
>
But those two multiplies still are tossing away 50% of their work.
I do not remember how multipliers are actually implemented — and
am not motivated to refresh my memory at the moment — but I
thought a multiply low would not need to generate the upper bits,
so I do not understand where your "50% of their work" is coming
from.
  +-----------+      +------------+
   \  mplier /        \  mcand  /        Big input mux
    +--------+         +--------+
        |                  |
        |      +-----------+
        |     /           /
        |    /           /
        +---/           /
           /    Tree   /
          /           /--+
         /           /   |
        /           /    |
       +---------------+-----------+
             hi             low         Products
>
Two n-bit operands are multiplied into a 2×n-bit result.
{{All the rest is HOW not what}}
So are you saying the high bits come for free? This seems
contrary to the conception of sums of partial products, where
some of the partial products are only needed for the upper bits
and so could (it seems to me) be left uncalculated if one only
wanted the lower bits.
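To make my partial-product hand-waving concrete, here is a toy
shift-and-add model in C. It is purely illustrative (real multipliers
use Booth recoding and compression trees, and this is not any
particular hardware design); it just counts how many of the n^2
single-bit AND terms of an 8x8 multiply land entirely in the upper
half of the 16-bit product. A multiply-low could in principle skip
those terms, though the low-half terms still generate carries into
the high half.

  #include <stdint.h>
  #include <stdio.h>

  /* Toy shift-and-add model of an 8x8 multiply: count how many of the
   * single-bit partial-product terms a[i]&b[j] fall entirely in the
   * upper half of the 16-bit result (bit position i+j >= 8). */
  int main(void)
  {
      const int n = 8;
      uint8_t a = 0xB7, b = 0x5D;          /* arbitrary example operands */
      uint16_t product = 0;
      int high_only = 0;

      for (int i = 0; i < n; i++)
          for (int j = 0; j < n; j++) {
              unsigned bit = ((a >> i) & 1u) & ((b >> j) & 1u);
              if (i + j >= n)
                  high_only++;             /* feeds only the high half  */
              product = (uint16_t)(product + (bit << (i + j)));
          }

      printf("%u * %u = %u\n", (unsigned)a, (unsigned)b, (unsigned)product);
      printf("terms feeding only the high half: %d of %d\n",
             high_only, n * n);            /* 28 of 64 for n = 8        */
      return 0;
  }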
> The high result needs the low result carry-out but not the rest
> of the result. (An approximate multiply high for multiply by
> reciprocal might be useful, avoiding the low result work. There
> might also be ways that a multiplier could be configured to also
> provide bit mixing similar to a middle result for generating a
> hash?)
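As a concrete instance of the multiply-by-reciprocal case I
mentioned, the standard divide-by-a-constant trick consumes only the
high half of a widening multiply (this is just the textbook
magic-number method, nothing specific to the ISAs being discussed):

  #include <stdint.h>

  /* Unsigned n/10 via multiply-by-reciprocal: only the HIGH half of the
   * 32x32->64 product is consumed.  0xCCCCCCCD = ceil(2^35 / 10), and
   * (n * magic) >> 35 equals n / 10 for every 32-bit n. */
  static uint32_t div10(uint32_t n)
  {
      uint64_t product = (uint64_t)n * 0xCCCCCCCDu;
      return (uint32_t)(product >> 35);
  }

The low 32 bits of the product are never looked at, yet an exact
answer still depends on carries propagating out of the low partial
products, which is where the "approximate" caveat above comes in.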
>
I seem to recall a PowerPC implementation did semi-pipelined 32-
bit multiplication 16 bits at a time. This presumably saved area
and power.
You save 1/2 of the tree area, but ultimately consume more power.
The power consumption would seem to depend on how frequently both
multiplier and multiplicand are larger than 16 bits. (However, I
seem to recall that the mentioned implementation only checked one
operand.) I suspect that for a lot of code, small values are
common.
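For what it is worth, the decomposition being discussed looks
roughly like the following sketch (the general idea only, not the
actual PowerPC datapath, and the function name is mine): a 32x32
multiply built from 16x16 pieces, where the cross and high terms can
be skipped when both operands happen to fit in 16 bits.

  #include <stdint.h>

  /* 32x32->64 multiply from 16x16 pieces.  When both operands fit in
   * 16 bits, only the low*low piece is non-zero, so the other three
   * passes through a half-width array could be skipped. */
  uint64_t mul32_via_16(uint32_t a, uint32_t b)
  {
      uint32_t al = a & 0xFFFF, ah = a >> 16;
      uint32_t bl = b & 0xFFFF, bh = b >> 16;

      uint64_t p = (uint64_t)al * bl;           /* always needed      */
      if (ah | bh) {                            /* small-operand skip */
          p += ((uint64_t)al * bh) << 16;
          p += ((uint64_t)ah * bl) << 16;
          p += ((uint64_t)ah * bh) << 32;
      }
      return p;
  }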
>Since they cast extra bits over a number of instructions, and
My 66000's CARRY and PRED are "extender prefixes", admittedly
included in the original architecture and thus compensating for
encoding constraints (e.g., not having 36-bit instruction parcels)
rather than for microarchitectural or architectural variation.
[snip]
>> (I feel that encoding some of the dependency information could
>> be useful to avoid some of this work. In theory, common
>> dependency detection could also be more broadly useful; e.g.,
>> operand availability detection and execution/operand routing.)
So useful that it is encoded directly in My 66000 ISA.
How so? My 66000 does not provide any explicit declaration of
which operation will be using a result (or where an operand is
being sourced from). Register names express the dependencies, so
the dataflow graph is implicit.
I was speculating that _knowing_ when an operand will be available
and where a result should be sent (rather than broadcasting) could
be useful information.
> Even with reduced operations per cycle, fusion could still
> provide a net energy benefit.
Here I disagree:: but for a different reason::
>
In order for RISC-V to use a 64-bit constant as an operand, it has
to execute either:: AUIPC-LD to an area of memory containing the
64-bit constant, or a 6-7 instruction stream to build the constant
inline. An ISA that directly supports 64-bit constants does not
execute any of those.
>
Thus, while it may save power when seen at the "it's my ISA"
level, when seen from the perspective of "it is directly
supported in my ISA" it wastes power.
Yes, but "computing" large immediates is obviously less efficient
(except for compression); the computation part is known to be
unnecessary. Fusing a comparison and a branch may be a consequence
of bad ISA design in not properly estimating how much work an
instruction can do (and be encoded in the available space), and
there is excess decode overhead with separate instructions, but the
individual operations seem to be doing actual work. (A sketch of
the constant-building sequence follows below.)
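To make the immediate-building cost concrete, here is a rough C
model of a lui/addi/slli style sequence, where each statement stands
in for roughly one instruction. The split is illustrative only; the
exact count and chunking depend on the constant, sign-extension
details, and the compiler.

  #include <stdint.h>

  /* Rough model of materializing an arbitrary 64-bit constant from
   * small immediates, in the spirit of a RISC-V lui/addi/slli
   * sequence.  An ISA with 64-bit immediates simply encodes k in the
   * instruction stream instead. */
  uint64_t materialize(void)
  {
      const uint64_t k = 0x123456789ABCDEF0ull;   /* example constant */
      uint64_t r;
      r  = (k >> 44) << 12;        /* ~ lui  : bits 63..44            */
      r += (k >> 32) & 0xFFF;      /* ~ addi : bits 43..32            */
      r <<= 12;                    /* ~ slli                          */
      r += (k >> 20) & 0xFFF;      /* ~ addi : bits 31..20            */
      r <<= 12;                    /* ~ slli                          */
      r += (k >> 8) & 0xFFF;       /* ~ addi : bits 19..8             */
      r <<= 8;                     /* ~ slli                          */
      r += k & 0xFF;               /* ~ addi : bits 7..0              */
      return r;                    /* r == k                          */
  }

Eight or so dependent operations versus a constant delivered
straight from the instruction stream is the power gap being
described.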
>
I suspect there can be cases where different microarchitectures
would benefit from different amounts of instruction/operation
complexity such that cracking and/or fusion may be useful even in
an optimally designed generic ISA.
>
[snip]
>> - register specifier fields are either source or dest, never both
This seems mostly a code density consideration. I think using a
single name for both a source and a destination is not so
horrible, but I am not a hardware guy.
All we HW guys want is that wherever the field is specified,
it is specified in exactly 1 field in the instruction. So, if
field<a..b> is used to specify Rd in one instruction, there is
no other field<!a..!b> that specifies the Rd register. RISC-V
blew this "requirement".
Only with the Compressed extension, I think. The Compressed
extension was somewhat rushed and, in my opinion, philosophically
flawed by being redundant (i.e., every C instruction can be
expanded to a non-C instruction). Things like My 66000's ENTER
provide code density benefits but are contrary to the simplicity
emphasis. Perhaps a Rho (density) extension would have been
better.☺ (The extension letter idea was interesting for an
academic ISA but has been clearly shown to be seriously flawed.)
16-bit instructions could have kept the same register field
placements with masking/truncation for two-register-field
instructions. Even a non-destructive form might be provided by
different masking or bit inversion for the destination. However,
providing three register fields seems to require significant
irregularity in extracting register names. (Another technique
would be using opcode bits for specifying part or all of a
register name. Some special purpose registers or groups of
registers may not be horrible for compiler register allocation,
but such seems rather funky/clunky.)
>
It is interesting that RISC-V chose to split the immediate field
for store instructions so that source register names would be in
the same place for all (non-C) instructions.
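The field-placement point is easy to see in a decoder. A minimal
extraction sketch for the base 32-bit formats: whenever a format has
rd, rs1, or rs2, the field sits at the same bit positions, and the
S-type store immediate is split in two precisely so the register
fields do not have to move.

  #include <stdint.h>

  /* Whenever a base 32-bit RISC-V format has these fields, they sit
   * at the same bit positions. */
  static inline uint32_t rd (uint32_t insn) { return (insn >>  7) & 0x1F; }
  static inline uint32_t rs1(uint32_t insn) { return (insn >> 15) & 0x1F; }
  static inline uint32_t rs2(uint32_t insn) { return (insn >> 20) & 0x1F; }

  /* S-type (store) immediate: imm[11:5] in bits 31:25, imm[4:0] in
   * bits 11:7 -- split so the register fields above stay put. */
  static inline int32_t s_imm(uint32_t insn)
  {
      uint32_t imm = ((insn >> 25) << 5) | ((insn >> 7) & 0x1F);
      return (imm & 0x800) ? (int32_t)imm - 4096   /* sign-extend */
                           : (int32_t)imm;         /* 12 bits     */
  }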
Comparing an ISA design to RISC-V is not exactly the same as
comparing to "best in class".