On 8/31/24 4:56 PM, BGB wrote:
[snip]
> I was mostly doing dual-issue with a 4R2W design.
>
> Initially, 6R3W won out mostly because 4R2W disallows an indexed store to be run in parallel with another op; but 6R3W did allow this.

Stores and MADD allow one register read to be delayed by at least
one cycle. If the following cycle had a free read port, that could
be stolen to complete the store/MADD. This could be viewed as
cracking a three-source operation into a two-source operation and
a one-source operation that reads source operands in a following
cycle except that this operation never uses a result from the
previous cycle.

This wouldn't map well to my existing decoder/pipeline, which requires all the ports (and all the registers) to be available at the time an instruction enters EX1, and currently has no support for "cracking" an instruction over multiple cycles, but may spread a single instruction across multiple lanes.

In a VLIW, one could even imagine the register name for the
delayed read being in the next instruction word if the available
read port was always from using an immediate or having fewer
source operands. This would add complexity for exceptions,
branches, and even instruction cache misses. With a small
buffer, a VLIW could also borrow from a previous cycle; an
operation with one register source could include a "load into
buffer" operation. (I do not recall ever reading about cross-
cycle/-instruction-word register fields in any VLIW. While it
seems to fit the VLIW model of static resource management, it
breaks the "atomic" view of an instruction word and of the
operation components; even borrowing within an instruction
word seems not to have been considered.)

Yeah, I wouldn't want to do this.

Relying on forwarding or stealing from a future surplus would
result in variable performance unless the opportunities were
guaranteed (at least for enough cases that performance glitches
would not be significant).
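
To make the delayed-read idea and its performance variability concrete, here is a minimal Python sketch of a toy model. It is not anyone's actual pipeline. The names `Op`, `simulate` and `READ_PORTS` are hypothetical. The model assumes fixed in-order bundles, four read ports per cycle, a deferred read that must be served in the very next cycle, and a one-cycle bubble whenever that next cycle has no spare port.

```python
from dataclasses import dataclass

READ_PORTS = 4  # assumption: 4R2W register file, only the read side is modelled


@dataclass(frozen=True)
class Op:
    name: str
    reads: int           # register source operands
    deferrable: int = 0  # reads that may happen one cycle late (store data, MADD addend)


def simulate(bundles, read_ports=READ_PORTS):
    """Return (issue_cycles, stalls) for a list of bundles issued in order."""
    cycles = 0
    stalls = 0
    owed = 0  # deferred reads carried over from the previous bundle
    for bundle in bundles:
        must = sum(op.reads - op.deferrable for op in bundle)  # needed at issue
        may = sum(op.deferrable for op in bundle)              # can slip one cycle
        if must > read_ports or may > read_ports:
            raise ValueError("bundle can never issue with this port count")
        if owed + must > read_ports:
            # No spare port to steal: spend a bubble cycle on the owed reads.
            cycles += 1
            stalls += 1
            owed = 0
        free = read_ports - owed - must
        owed = max(0, may - free)  # whatever does not fit now slips to the next cycle
        cycles += 1
    return cycles, stalls


store = Op("store [r1+r2], r3", reads=3, deferrable=1)  # data read can be late
add = Op("add r4, r5, r6", reads=2)
addi = Op("addi r7, r8, 1", reads=1)

# Same store+add bundle, different neighbours in the following cycle.
print(simulate([[store, add], [addi, addi]]))  # (2, 0): a port was free to steal
print(simulate([[store, add], [add, add]]))    # (3, 1): no free port, one bubble
```

In this toy model the same store-plus-add bundle costs either nothing or one bubble, depending only on how many read ports the following bundle needs. That is the variable-performance concern raised above.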