I have found that a lot of registers can be made available if they are implemented in BRAMs. BRAMs have far more depth than LUT RAMs. BRAMs have a one-cycle latency, but that is just part of the pipeline. In Q+ about 40k LUTs are being used just to keep track of registers (rename mappings and checkpoints).
Does each of these take a clock cycle? If so, that seems excessive. What is your cost for a mis-predicted branch?
Given a lot of available registers, I keep considering trying a VLIW design similar to the Itanium, rotating registers and all. But I have a lot invested in OoO.
Q+ has seven in-order pipeline stages before things get to the re-order buffer.