Subject: Re: Tonights Tradeoff
From: sfuld (at) *nospam* alumni.cmu.edu.invalid (Stephen Fuld)
Newsgroups: comp.arch
Date: 11 Sep 2024, 17:48:03
Organization: A noiseless patient Spider
Message-ID : <vbse3j$f01n$2@dont-email.me>
References : 1 2 3 4 5 6 7 8 9
User-Agent : Mozilla Thunderbird
On 9/11/2024 6:54 AM, Robert Finch wrote:
snip
> I have found that there can be a lot of registers available if they are implemented in BRAMs. BRAMs have lots of depth compared to LUT RAMs. BRAMs have a one cycle latency but that is just part of the pipeline. In Q+ about 40k LUTs are being used just to keep track of registers. (rename mappings and checkpoints).
> Given a lot of available registers I keep considering trying a VLIW design similar to the Itanium, rotating register and all. But I have a lot invested in OoO.
> Q+ has seven in-order pipeline stages before things get to the re-order buffer.
Does each of these take a clock cycle? If so, that seems excessive. What is your cost for a mis-predicted branch?
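To frame why the stage count matters: if each of the seven in-order stages takes one cycle, then after a mispredict the corrected path needs roughly that many cycles just to refill the front end, on top of however long the branch took to resolve. A back-of-the-envelope sketch follows. The function name, the 20% branch fraction and the 5% mispredict rate are illustrative assumptions, not Q+ measurements.

```python
def mispredict_cpi_overhead(refill_cycles, branch_fraction, mispredict_rate):
    """Average extra cycles per instruction lost to branch mispredicts.

    refill_cycles   -- cycles to refill the front end after a mispredict
                       (assumed here: one cycle per in-order stage)
    branch_fraction -- fraction of instructions that are branches (assumed)
    mispredict_rate -- fraction of branches that are mispredicted (assumed)
    """
    return refill_cycles * branch_fraction * mispredict_rate


# Seven front-end stages comes from the quoted post; the other two
# numbers are placeholders chosen only to make the arithmetic concrete.
overhead = mispredict_cpi_overhead(refill_cycles=7,
                                   branch_fraction=0.20,
                                   mispredict_rate=0.05)
print(f"extra CPI from mispredicts: {overhead:.3f}")
```

With those placeholder numbers the overhead is 7 * 0.20 * 0.05 = 0.07 cycles per instruction; the real figure depends on where in the pipeline branches actually resolve.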
-- 
 - Stephen Fuld
(e-mail address disguised to prevent spam)