> On 9/11/2024 6:54 AM, Robert Finch wrote:
>> [snip]
>> I have found that there can be a lot of registers available if they are implemented in BRAMs. BRAMs have much more depth than LUT RAMs. BRAMs have a one-cycle latency, but that is just part of the pipeline. In Q+, about 40k LUTs are used just to keep track of registers (rename mappings and checkpoints).
>>
>> Given a lot of available registers, I keep considering trying a VLIW design similar to the Itanium, rotating registers and all. But I have a lot invested in OoO.
>>
>> Q+ has seven in-order pipeline stages before things get to the reorder buffer.
>
> Does each of these take a clock cycle? If so, that seems excessive. What is your cost for a mispredicted branch?

Each stage takes one clock cycle. Unconditional branches are detected at the second stage and taken then, so they do not consume as many clocks. There are two extra stages to handle vector instructions; those two stages could be removed if vectors are not needed.
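For readers wondering why rename tracking costs so much logic: an out-of-order core keeps a map from architectural to physical registers and snapshots that map at each branch, so a mispredict can be undone in one step. The Python below is a generic sketch of that idea, not Q+'s design. The class RenameMap, the helper checkpoint_bits and the sizes used (64 architectural registers, 256 physical, 16 checkpoints) are hypothetical; the post gives none of these numbers.

```python
# Generic sketch of a rename map (register alias table) with branch
# checkpoints. All names and sizes are hypothetical, not taken from Q+.


class RenameMap:
    def __init__(self, arch_regs, phys_regs):
        # At reset, architectural register i maps to physical register i
        # and the remaining physical registers are free.
        self.map = list(range(arch_regs))
        self.free = list(range(arch_regs, phys_regs))
        self.checkpoints = []

    def rename_dest(self, arch_reg):
        """Give a destination a fresh physical register.

        Returns (new_phys, old_phys); old_phys would be freed when the
        instruction commits (commit is not modelled here)."""
        new_phys = self.free.pop(0)
        old_phys = self.map[arch_reg]
        self.map[arch_reg] = new_phys
        return new_phys, old_phys

    def checkpoint(self):
        """Snapshot the whole map (and free list) at a branch."""
        self.checkpoints.append((self.map.copy(), self.free.copy()))
        return len(self.checkpoints) - 1

    def restore(self, cp_id):
        """Mispredict recovery: roll back to a snapshot in one step.

        The restored checkpoint and all younger ones are discarded.
        Simplification: a real design must also keep registers freed by
        commits that happened after the snapshot was taken."""
        saved_map, saved_free = self.checkpoints[cp_id]
        self.map = saved_map.copy()
        self.free = saved_free.copy()
        del self.checkpoints[cp_id:]


def checkpoint_bits(arch_regs, phys_regs, checkpoints):
    """Bits of map state: one physical-register tag per architectural
    register, for the live map plus every checkpoint."""
    tag_bits = (phys_regs - 1).bit_length()
    return arch_regs * tag_bits * (checkpoints + 1)


rm = RenameMap(arch_regs=4, phys_regs=8)
rm.rename_dest(1)        # r1 -> p4
cp = rm.checkpoint()     # branch: snapshot the map
rm.rename_dest(1)        # wrong-path write: r1 -> p5
rm.restore(cp)           # mispredict: r1 maps to p4 again
print(rm.map, rm.free)   # [0, 4, 2, 3] [5, 6, 7]
print(checkpoint_bits(64, 256, 16))  # 8704
```

With those made-up sizes the maps alone hold 8704 bits of state, and every additional checkpoint adds another full copy of the map.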
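Itanium-style rotating registers can be pictured as a modulo offset applied to register numbers inside a rotating window. This sketch assumes the usual description of Itanium (the rotating general registers start at r32, and a loop branch such as br.ctop decrements the rotating register base, rrb) and ignores the register stack frame entirely. The function name rotate and the window size of 8 are illustrative choices, not part of the post.

```python
ROT_BASE = 32  # first rotating general register (Itanium r32)


def rotate(reg, rrb, rot_size):
    """Map an architectural register number to a physical index.

    Registers in [ROT_BASE, ROT_BASE + rot_size) are offset by rrb,
    modulo rot_size; registers outside that window are not rotated."""
    if ROT_BASE <= reg < ROT_BASE + rot_size:
        return ROT_BASE + (reg - ROT_BASE + rrb) % rot_size
    return reg


# Software-pipelined loop: each iteration writes r32, then the loop
# branch decrements rrb (simplified model of br.ctop), so the value
# written as r32 is seen as r33 in the next iteration.
ROT_SIZE = 8
phys = [None] * (ROT_BASE + ROT_SIZE)
rrb = 0
for i in range(3):
    phys[rotate(32, rrb, ROT_SIZE)] = f"value{i}"
    rrb -= 1

# After the loop, r33 holds the newest value and r34 the one before it.
print(phys[rotate(33, rrb, ROT_SIZE)])  # value2
print(phys[rotate(34, rrb, ROT_SIZE)])  # value1
```

The attraction with plentiful BRAM registers is that the renaming is a fixed add-and-wrap per loop iteration rather than a map lookup per instruction.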
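The reply about stage timing can be turned into rough front-end arithmetic. Assuming one fetch group enters per cycle and there is no fetch buffer, a branch that redirects fetch from stage k squashes the k - 1 younger groups behind it. The function redirect_penalty, and the assumption that a conditional branch resolves somewhere after the seventh stage, are mine; the post does not state the actual mispredict cost.

```python
FRONT_END_STAGES = 7  # in-order stages before the reorder buffer (from the post)
VECTOR_STAGES = 2     # stages that could be dropped without vectors (from the post)


def redirect_penalty(redirect_stage):
    """Cycles of wrong-path fetch lost when a branch redirects fetch from
    pipeline stage redirect_stage (1-based), assuming one fetch group
    per cycle and no fetch buffer."""
    return redirect_stage - 1


# Unconditional branch taken at stage 2: one bubble.
print(redirect_penalty(2))  # 1
# A branch resolved only after the in-order front end (stage 8 or later)
# loses at least the whole front end.
print(redirect_penalty(FRONT_END_STAGES + 1))  # 7
# Lower bound if the two vector stages were removed (assuming they sit
# ahead of the point where the branch resolves).
print(redirect_penalty(FRONT_END_STAGES - VECTOR_STAGES + 1))  # 5
```

Any issue or execute stages between the reorder buffer and branch resolution add to that lower bound.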