On Thu, 12 Sep 2024 3:37:22 +0000, Robert Finch wrote:
That sounds like a good idea. The fetch typically idles for a few cycles, as it can fetch more instructions than can be consumed in a single cycle. So, while it’s idling, it could be fetching down an alternate path. Part of the pipeline would need to be replicated, doubling up on the size. Then an A/B switch selects the right pipeline. I would not want to queue to the reorder buffer from the alternate path, as there is a bit of a bottleneck at queue. Now wondering what to do about multiple branches. Multiple pipelines and more switches? The front-end would look like a pipeline tree to handle multiple outstanding branches.
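To put a rough number on the "pipeline tree" idea, here is a minimal sketch. It assumes, purely for illustration, that the front-end follows both directions of every unresolved branch, so each outstanding branch doubles the number of paths being fetched. The function name `front_end_paths` is hypothetical and is not taken from Q+ or any real design.

```python
def front_end_paths(outstanding_branches: int) -> int:
    """Number of fetch paths (leaves of the pipeline tree).

    Assumption: both the taken and not-taken side of every
    unresolved branch are followed, so the tree is a full
    binary tree with one level per outstanding branch.
    """
    if outstanding_branches < 0:
        raise ValueError("outstanding_branches must be >= 0")
    return 2 ** outstanding_branches


if __name__ == "__main__":
    for n in range(4):
        print(n, "outstanding branches ->", front_end_paths(n), "paths")
```

Under that assumption the replication cost grows exponentially, which is one reason a real front-end would probably follow the alternate path only for a small number of branches.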
On 2024-09-11 11:48 a.m., Stephen Fuld wrote:
On 9/11/2024 6:54 AM, Robert Finch wrote:
>
snip
>
I have found that there can be a lot of registers available if they
are implemented in BRAMs. BRAMs have lots of depth compared to LUT
RAMs. BRAMs have a one cycle latency but that is just part of the
pipeline. In Q+ about 40k LUTs are being used just to keep track of
registers. (rename mappings and checkpoints).
>
Given a lot of available registers I keep considering trying a VLIW
design similar to the Itanium, rotating register and all. But I have a
lot invested in OoO.
>
Q+ has seven in-order pipeline stages before things get to the
reorder buffer.
>
Does each of these take a clock cycle? If so, that seems excessive.
What is your cost for a mis-predicted branch?
>
Each stage takes one clock cycle. Unconditional branches are detected at
the second stage and taken then so they do not consume as many clocks.
There are two extra stages to handle vector instructions. Those two
stages could be removed if vectors are not needed.
>
Mis-predicted branches are really expensive. They take about six clocks,
plus the seven clocks to refill the pipeline, so it is about 13 clocks.
Seems like it should be possible to reduce the number of clocks of
processing during the miss, but I have not got around to it yet. There
is a branch miss state machine that restores the checkpoint. Branches
need a lot of work yet.
>
In a machine I did in 1990-2 we would fetch down the alternate path
and put the recovery instructions in a buffer, so when a branch was
mispredicted, the instructions were already present.
So, you can't help the 6 cycles of branch verification latency,
but you can fix the pipeline refill latency.
We got 2.05 i/c on XLISP SPECint 89 mostly because of the low backup
overhead.
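
The penalty arithmetic in this thread (about six clocks to verify the branch plus seven to refill the pipeline) can be sketched as below. This is a simplified model, not a description of either machine. It assumes that a buffered alternate path hides the entire refill, and the branch fraction and mispredict rate in the usage example are made-up illustrative values. The function names are hypothetical.

```python
def mispredict_penalty(verify_clocks: int, refill_clocks: int,
                       alternate_path_buffered: bool) -> int:
    """Clocks lost per mispredicted branch in this simplified model.

    Assumption: if the alternate path was already fetched into a
    recovery buffer, the refill cost drops to zero and only the
    branch verification latency remains.
    """
    refill = 0 if alternate_path_buffered else refill_clocks
    return verify_clocks + refill


def cycles_per_instruction(base_cpi: float, branch_fraction: float,
                           mispredict_rate: float, penalty: int) -> float:
    """Average CPI once the mispredict penalty is spread over all instructions."""
    return base_cpi + branch_fraction * mispredict_rate * penalty


if __name__ == "__main__":
    baseline = mispredict_penalty(6, 7, alternate_path_buffered=False)
    buffered = mispredict_penalty(6, 7, alternate_path_buffered=True)
    print("penalty without buffer:", baseline)
    print("penalty with buffer:   ", buffered)
    # Illustrative values only: 20% branches, 10% of them mispredicted.
    print("CPI without buffer:", cycles_per_instruction(1.0, 0.2, 0.1, baseline))
    print("CPI with buffer:   ", cycles_per_instruction(1.0, 0.2, 0.1, buffered))
```

In this model the six verification clocks remain in both cases; only the seven refill clocks are recovered.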