Re: Banked register files

Subject: Re: Banked register files
From: ggtgp (at) *nospam* yahoo.com (Brett)
Newsgroups: comp.arch
Date: 26. Aug 2024, 23:10:48
Organization: A noiseless patient Spider
Message-ID: <vair0o$2k32g$1@dont-email.me>
References: 1 2 3 4 5 6 7 8
User-Agent: NewsTap/5.5 (iPad)
Brett <ggtgp@yahoo.com> wrote:
Robert Finch <robfi680@gmail.com> wrote:
On 2024-08-22 5:58 p.m., Brett wrote:
Brett <ggtgp@yahoo.com> wrote:
MitchAlsup1 <mitchalsup@aol.com> wrote:
On Mon, 19 Aug 2024 23:23:11 +0000, Brett wrote:
 
MitchAlsup1 <mitchalsup@aol.com> wrote:
On Mon, 19 Aug 2024 21:46:07 +0000, Brett wrote:
 
Banked register files, a mental exercise at expanding the register file.
 
 
Four Banked register files, a mental exercise at expanding the register
file.
 
With three operand RISC you have three 5 bit register specifiers
using 15 bits.
 
If instead you have four banks of sixteen registers you have a 2 bit bank
specifier and three 4 bit register specifiers with one override bit for the
destination for 15 total bits, the same as a 32 register RISC chip. The
override bit specifies bank zero for destination, 64 total registers.
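
A minimal sketch of that 15 bit field, in Python. The post only fixes the widths (2 + 1 + 3x4 = 15); the field order, the function names, the flat numbering bank*16 + index, and the reading that the override bit redirects only the destination (sources stay in the selected bank) are all assumptions made for illustration.

```python
BANKS = 4
REGS_PER_BANK = 16

def encode(bank, override, rd, rs1, rs2):
    """Pack into 15 bits, assumed layout: bank[14:13] ovr[12] rd[11:8] rs1[7:4] rs2[3:0]."""
    assert 0 <= bank < BANKS and override in (0, 1)
    assert all(0 <= r < REGS_PER_BANK for r in (rd, rs1, rs2))
    return (bank << 13) | (override << 12) | (rd << 8) | (rs1 << 4) | rs2

def decode(field):
    """Return flat register numbers (0..63) for destination, source 1, source 2."""
    bank = (field >> 13) & 0x3
    override = (field >> 12) & 0x1
    rd, rs1, rs2 = (field >> 8) & 0xF, (field >> 4) & 0xF, field & 0xF
    dest_bank = 0 if override else bank  # override sends the result to bank zero
    return (dest_bank * REGS_PER_BANK + rd,
            bank * REGS_PER_BANK + rs1,
            bank * REGS_PER_BANK + rs2)
```

With these assumptions, encode(2, 1, 3, 4, 5) decodes to destination register 3 in bank zero with sources 36 and 37 in bank two, and the largest encoding still fits in 15 bits.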
 
Two operand plus 16 bit offset instructions would need to sacrifice one bit
of offset. Four operand instructions would save two bits, quite useful.
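
The bit budget can be checked with a few lines of Python. This is only a counting sketch; the function names are made up here, and it assumes one shared 2 bit bank field per instruction plus an optional override bit.

```python
def flat_bits(operands, reg_bits=5):
    """Specifier bits for a flat 32 register file."""
    return operands * reg_bits

def banked_bits(operands, override=True, bank_bits=2, reg_bits=4):
    """Specifier bits for four banks of 16: one bank field, optional override bit."""
    return bank_bits + operands * reg_bits + (1 if override else 0)

for n in (2, 3, 4):
    # columns: operands, flat, banked with override, banked without override
    print(n, flat_bits(n), banked_bits(n), banked_bits(n, override=False))
# 2 10 11 10
# 3 15 15 14
# 4 20 19 18
```

Under this counting, two operands cost one extra bit (11 vs 10), which is the offset bit given up. Four operands come to 19 bits with the override bit and 18 without it, so the two bit saving matches the count without the override bit.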
 
Addressing can be done from any bank.
 
The compiler can handle large banks easily with simple dependency grouping,
and if you need more than 16 registers for a single calculation you use the
base override flag to total into the base registers. So you have two chains
that total in two banks, and both write to the base registers, where the
final totals of the two chains are added.
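
The two chain pattern can be sketched as a toy register file model in Python. The add() helper, the register numbers, and the values are hypothetical; the only thing taken from the post is that each chain works inside its own bank and the last add of each chain uses the override to land in the base bank (bank 0).

```python
regs = [[0] * 16 for _ in range(4)]  # regs[bank][index]; bank 0 is the base bank

def add(bank, rd, rs1, rs2, override=False):
    """Sources come from `bank`; the result goes to bank 0 when override is set."""
    dest_bank = 0 if override else bank
    regs[dest_bank][rd] = regs[bank][rs1] + regs[bank][rs2]

# Chain A lives in bank 1, chain B in bank 2 (made-up input values).
regs[1][0:4] = [1, 2, 3, 4]
regs[2][0:4] = [10, 20, 30, 40]

for bank, base_reg in ((1, 0), (2, 1)):
    add(bank, 4, 0, 1)                        # partial sum stays in the bank
    add(bank, 5, 2, 3)                        # partial sum stays in the bank
    add(bank, base_reg, 4, 5, override=True)  # chain total written to the base bank

add(0, 2, 0, 1)  # the two chain totals are added in the base bank
```

The two loops are independent until the final add, which is the point: each bank could feed its own ALU cluster, and only the totals cross over.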
 
Call and return parameters are in the base registers.
Simple code only uses base registers, or base plus one bank.
 
On the mental status, having multiple banks means you can have multiple ALU
clusters and rename. You are no longer limited to 9 way rename and 12ish
way issue, but a multiple of that. The limits are load and store bandwidth,
and some added latency to coordinate.  Lots of money would get piled into
compilers to maximize even bank use.
 
Is this a good idea? I think so, but this is a mental exercise, it proves I
am mental. ;)
 
How does banked compare to high registers? Slightly better.
Intel could pull off something like this to one up ARM. A new fixed width
instruction set with a nice patent moat, and fits the x86 mindset.
 
Yes you can do
Rd,[Rbase+Rindex<<scale+LargeDisplacement]
Large displacements would be in extension words like My 66000.
Nothing stops you from doing add from memory, besides being costly in
opcode bits and die size.
 
 
 
 
Using BRAMs usually allows for a lot more registers than make sense in
an architecture. Makes one wonder what to do with the extra registers.
The MOV instruction can be made to use more bits for the register spec
allowing transfers between banks of registers. Since MOV needs only two
register specs instead of three, there are more bits available.
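
A rough count of what that buys, as a Python sketch. The 15 bit budget is borrowed from the three operand format earlier in the thread and is an assumption here, as are the function names.

```python
FIELD_BITS = 15  # assumed: register field budget of a three operand format

def mov_spec_bits(registers):
    """Bits for one source plus one destination specifier over `registers` registers."""
    per_spec = (registers - 1).bit_length()
    return 2 * per_spec

def max_mov_registers(field_bits=FIELD_BITS):
    """Largest power of two register count addressable by two equal width specifiers."""
    return 2 ** (field_bits // 2)
```

Under that assumption, a MOV across all 64 banked registers needs 12 of the 15 bits, and two 7 bit specifiers would reach 128 registers.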
 
FPGA Block Memory, had to look it up, not a hardware or embedded guy.
 
There is a popular embedded CPU with dual register files and dual
operations.
Not what I was going for, but there are possibilities there. On the high
end you crack the dual instructions and let them execute out of order in
the different bank pipes. This gives you 128 bit vector ops on a 64 bit cpu
with multiple banks. This would be a completely different instruction set
from what I was proposing, but fits in the same encoding. You just need two
types of load pair, etc.
 
Personally I would do the MIPS thing and make all registers 128 bits, but
this gives you 256 bit vectors of a sort with the banks.
 
I have been experimenting with the idea of having a smaller register
file so fewer encoding bits, and then making up for the small file by
having more dedicated registers. For instance, 16 regs with 2
independent link registers, eight condition code registers, and a stack
pointer. That really gives over 20 registers, which might be enough for
reasonable compiles.
 
I saw a design where there was an attempt to process basic blocks in
parallel silos feeding functional units. It made use of fewer registers
by holding data in pipeline registers instead of GPRs which it could do
since some of the data for a basic block never goes outside the block.

No replies, so I figure y’all are under NDA. ;)

So I posted over on Real World Tech my prediction that Intel APX is not 32
general registers, but two separate banks of 16 registers with their own
pipelines. ;)

Hilarious post. ;)


Date      Subject                                  #   Author
19 Aug 24 * Banked register files                  14  Brett
20 Aug 24 `* Re: Banked register files             13  MitchAlsup1
20 Aug 24  `* Re: Banked register files            12  Brett
20 Aug 24   `* Re: Banked register files           11  MitchAlsup1
20 Aug 24    +* Re: Banked register files          9   Brett
22 Aug 24    |`* Re: Banked register files         8   Brett
24 Aug 24    | `* Re: Banked register files        7   Robert Finch
24 Aug 24    |  `* Re: Banked register files       6   Brett
26 Aug 24    |   `* Re: Banked register files      5   Brett
27 Aug 24    |    `* Re: Banked register files     4   MitchAlsup1
28 Aug 24    |     `* Re: Banked register files    3   Brett
28 Aug 24    |      `* Re: Banked register files   2   MitchAlsup1
30 Aug 24    |       `- Re: Banked register files  1   Brett
22 Aug 24    `- Re: Banked register files          1   mac
