Re: Banked register files

Subject: Re: Banked register files
From: ggtgp (at) *nospam* yahoo.com (Brett)
Newsgroups: comp.arch
Date : 30. Aug 2024, 01:31:56
Organization: A noiseless patient Spider
Message-ID : <var0db$5t51$1@dont-email.me>
User-Agent : NewsTap/5.5 (iPad)
MitchAlsup1 <mitchalsup@aol.com> wrote:
On Tue, 27 Aug 2024 23:51:59 +0000, Brett wrote:
 
MitchAlsup1 <mitchalsup@aol.com> wrote:
On Mon, 26 Aug 2024 21:10:48 +0000, Brett wrote:
 
Brett <ggtgp@yahoo.com> wrote:
Robert Finch <robfi680@gmail.com> wrote:
On 2024-08-22 5:58 p.m., Brett wrote:
Brett <ggtgp@yahoo.com> wrote:
MitchAlsup1 <mitchalsup@aol.com> wrote:
 
I saw a design where there was an attempt to process basic blocks in
parallel silos feeding functional units. It made use of fewer registers
by holding data in pipeline registers instead of GPRs which it could do
since some of the data for a basic block never goes outside the block.
 
No replies, so I figure y’all are under NDA. ;)
 
It has been well known since the mid-1990s that most loops end up with a
single or dual stream of self-dependent instructions and few loop
dependencies {mostly the loop index itself}. This leads to instruction
dependency graphs (and execution times) that look like::
 
| LD  |
| LD  |
|    FMUL    |
|    FADD    |
| STA |                         | STD |
| ADD |
| CMP |
| BV  |
------------------------------------------------------------
| LD  |
| LD  |
|    FMUL   |
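
A minimal sketch of how the execution times in a graph like the one above fall out of the dependencies. The latencies, the dependency edges, and the names LATENCY, LOOP_BODY, schedule and critical_path are all assumptions made for illustration (the post gives no cycle counts, and the store is modeled as a single ST rather than STA/STD); issue width is taken as unlimited.

```python
# Assumed latencies in cycles; illustrative values, not from the post.
LATENCY = {"LD": 3, "FMUL": 4, "FADD": 4, "ST": 1, "ADD": 1, "CMP": 1, "BV": 1}

# name -> (opcode, producers it waits on). Edges are one reading of the graph.
LOOP_BODY = {
    "ld_a": ("LD", []),
    "ld_b": ("LD", []),
    "fmul": ("FMUL", ["ld_a", "ld_b"]),
    "fadd": ("FADD", ["fmul"]),
    "st":   ("ST", ["fadd"]),
    "add":  ("ADD", []),        # loop index update, independent of the FP stream
    "cmp":  ("CMP", ["add"]),
    "bv":   ("BV", ["cmp"]),
}

def schedule(body, latency):
    """Return {name: (start, finish)}, assuming unlimited issue width.

    Relies on the dict listing producers before their consumers.
    """
    times = {}
    for name, (op, deps) in body.items():
        start = max((times[d][1] for d in deps), default=0)
        times[name] = (start, start + latency[op])
    return times

def critical_path(times):
    """Cycle at which the last instruction of the body finishes."""
    return max(finish for _, finish in times.values())

times = schedule(LOOP_BODY, LATENCY)
print(critical_path(times))  # 12 with the assumed latencies
print(times["bv"])           # (2, 3): the loop-index stream finishes early
```

With these assumed numbers the loop-index stream (ADD, CMP, BV) is done by cycle 3 while the floating-point stream runs to cycle 12, which is why the next iteration's loads can start long before the current iteration's store.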
 
 
To even out the cluster load you would want the compiler to unroll once,
first bank second, then second bank first.
 
Can also be done without compiler by mapping the links on the second
pass of a loop.
 
The above is done with simple reservation stations and no compiler work.
 
I am assuming clusters or banks as naming and issue width continue
growing.
 
Once you start doing reservation station machines, your 72-entry banked
register file needs to have the RSs watch 72 results instead of just 32.
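
One way to put a rough number on that cost is to count the tag-comparator bits in the reservation stations. This is only a sketch: the station count, operands per entry, and number of result buses below are made-up parameters, and tag_bits and comparator_bits are hypothetical helper names. The only figures taken from the post are 32 and 72.

```python
def tag_bits(num_results: int) -> int:
    """Bits needed to name one of num_results distinct result tags."""
    return max(1, (num_results - 1).bit_length())

def comparator_bits(rs_entries: int, operands_per_entry: int,
                    result_buses: int, num_results: int) -> int:
    """Total tag-comparator bits if every waiting operand snoops every result bus."""
    return rs_entries * operands_per_entry * result_buses * tag_bits(num_results)

# Assumed machine shape: 16 RS entries, 2 source operands each, 4 result buses.
print(tag_bits(32), tag_bits(72))          # 5 7
print(comparator_bits(16, 2, 4, 32))       # 640
print(comparator_bits(16, 2, 4, 72))       # 896
```

Going from 32 to 72 names widens each tag from 5 to 7 bits, so under these assumptions the comparator array grows by 40% before any extra result buses are added.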


ALUs are cheap, so each bank has its own set.
You can forward and complete twice as many results.

The traditional problem with banking is a one-cycle delay when crossing
banks; a compiler can fix that, but a CPU cannot on the first pass.
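
A toy model of that cross-bank delay, as a sketch only. The one-cycle penalty is from the text; the three-instruction chain, the unit latencies, and the names CROSS_BANK_PENALTY and finish_times are assumptions for illustration.

```python
CROSS_BANK_PENALTY = 1  # cycles added when a result crosses to the other bank

def finish_times(ops, bank_of):
    """ops: list of (name, latency, deps), producers listed before consumers.

    bank_of maps each name to the bank that executes it.
    Returns {name: finish_cycle}.
    """
    done = {}
    for name, latency, deps in ops:
        ready = 0
        for d in deps:
            penalty = CROSS_BANK_PENALTY if bank_of[d] != bank_of[name] else 0
            ready = max(ready, done[d] + penalty)
        done[name] = ready + latency
    return done

# A dependent chain a -> b -> c with assumed unit latencies.
chain = [("a", 1, []), ("b", 1, ["a"]), ("c", 1, ["b"])]

same_bank = finish_times(chain, {"a": 0, "b": 0, "c": 0})
ping_pong = finish_times(chain, {"a": 0, "b": 1, "c": 0})
print(same_bank["c"])  # 3
print(ping_pong["c"])  # 5: two bank crossings, one extra cycle each
```

Keeping a dependent chain inside one bank avoids the penalty entirely, which is the placement a compiler can choose up front, for example by giving each unrolled iteration its own bank.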


