On 9/1/2024 4:02 PM, Paul A. Clayton wrote:
Your pipeline is amateur at best.
On 8/31/24 4:56 PM, BGB wrote:
[snip]
I was mostly doing dual-issue with a 4R2W design.
>
Initially, 6R3W won out mostly because 4R2W disallows an indexed store
to be run in parallel with another op; but 6R3W did allow this.
Stores and MADD allow one register read to be delayed by at least
one cycle. If the following cycle had a free read port, that could
be stolen to complete the store/MADD. This could be viewed as
cracking a three-source operation into a two-source operation and
a one-source operation that reads source operands in a following
cycle except that this operation never uses a result from the
previous cycle.
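
To make the port-stealing idea concrete, here is a rough sketch of how it might play out with 4 read ports. Everything in it is hypothetical: `Op`, `simulate`, and the `deferrable` field are names made up for illustration, and the policy of inserting a bubble when the next cycle has no free port is an assumption, not something described for any actual pipeline in this thread.

```python
from dataclasses import dataclass

@dataclass
class Op:
    name: str
    reads: int           # register source operands the op needs in total
    deferrable: int = 0  # how many of those reads may slip to the next cycle

def simulate(bundles, read_ports=4):
    """Return the number of read ports consumed in each cycle.

    Assumed policy: a deferrable read (store data, MADD addend) that does
    not fit is carried into the next cycle; if that cycle has no free
    port, a bubble cycle is spent completing the carried read.
    """
    used = []     # read ports consumed per cycle
    carried = 0   # deferred reads owed from the previous bundle
    for bundle in bundles:
        need = sum(op.reads for op in bundle)
        slack = sum(op.deferrable for op in bundle)
        if need - slack > read_ports:
            raise ValueError("bundle does not fit even with deferral")
        if carried and carried + need - slack > read_ports:
            # No free port to steal: spend a bubble cycle on the owed reads.
            used.append(carried)
            carried = 0
        deferred = max(0, carried + need - read_ports)
        used.append(carried + need - deferred)
        carried = deferred
    if carried:
        used.append(carried)  # drain the last deferred read
    return used

store = Op("st_idx", reads=3, deferrable=1)  # base + index + data
add = Op("add", reads=2)
mov = Op("mov", reads=1)

# Next bundle leaves one port free, so the store data read is stolen there.
print(simulate([[store, add], [add, mov]]))   # [4, 4]
# Next bundle needs all four ports, so a bubble is inserted.
print(simulate([[store, add], [add, add]]))   # [4, 1, 4]
```

The sketch does not model the point that the delayed read never consumes a result from the previous cycle; it only counts ports.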
>
This wouldn't map well to my existing decoder/pipeline, which requires
all the ports (and all the registers) to be available at the time an
instruction enters EX1, and currently has no support for "cracking" an
instruction over multiple cycles, but may spread a single instruction
across multiple lanes.
Delaying ST.data only delays LDs which alias that ST.
But, yeah, if the restriction only applied to indexed store (in the
current implementation, it applies to all stores), it would still be
around 4% of the total instruction stream.
>
As-is, it is closer to 12%, and causing an extra penalty for 12% of the
total-executed instructions was undesirable (but, IMHO, still better
than needing to use multiple instructions).
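
As a back-of-the-envelope check on the 12% vs 4% figures, the added cost can be estimated as (fraction of affected instructions) x (penalty cycles). The one-cycle penalty and the base CPI below are assumptions for illustration only; neither number is given in the post, and the function names are made up.

```python
def added_cpi(affected_fraction, penalty_cycles=1.0):
    """Extra cycles per instruction caused by the penalty (assumed model)."""
    return affected_fraction * penalty_cycles

def slowdown(affected_fraction, base_cpi=1.0, penalty_cycles=1.0):
    """Relative slowdown versus a baseline with no penalty.

    base_cpi and penalty_cycles are assumed values, not measurements.
    """
    return added_cpi(affected_fraction, penalty_cycles) / base_cpi

# All stores penalized (about 12% of executed instructions) versus
# only indexed stores penalized (about 4%).
for label, frac in (("all stores", 0.12), ("indexed stores only", 0.04)):
    print(f"{label}: +{added_cpi(frac):.2f} CPI, "
          f"{slowdown(frac):.0%} slower at base CPI 1.0")
```

With a higher base CPI the relative cost shrinks proportionally, so the real impact depends on how close the core already runs to one instruction per cycle.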