Re: 88xxx or PPC

Subject : Re: 88xxx or PPC
From : mitchalsup@aol.com (MitchAlsup1)
Newsgroups : comp.arch
Date : 09. Mar 2024, 06:17:04
Organization : Rocksolid Light
Message-ID : <f40fa64b4d719b47fb3ab79ca334ebc3@www.novabbs.org>
User-Agent : Rocksolid Light
Paul A. Clayton wrote:

> On 3/6/24 3:00 PM, MitchAlsup1 wrote:
>> Paul A. Clayton wrote:
> [snip]
>>> It seems that 64-bit stack-pointer-relative accesses could be
>>> roughly as fast by using the offset as the index. Each stack
>>> frame would be comparable to a different thread register
>>> context; the tradeoff is extra storage for multiple stack frames
>>> ("multithreading", where alternating between indexing up and
>>> indexing down would provide some utilization flexibility with
>>> low indexing overhead) versus pushing out early frames (a normal
>>> "context switch"). Such a cache would probably be limited in
>>> the frame size cached.
>>
>> Smells too much like register windows, which never outperformed
>> the flat RF from MIPS. In any event, 50% of subroutines need no
>> stack <accesses> and those that do typically only store 3 registers
>> (for restore later).
>
> Register windows were intended to avoid save/restore overhead by
> retaining values in registers with renaming. A stack cache is
> meant to reduce the overhead of loads and stores to the stack,
> not just of preserving and restoring registers. A direct-mapped
> stack cache is not entirely insane. A partial stack frame cache
> might cache up to 256 bytes (e.g.), with alternating frames
> indexing with inverted bits (to reduce interference). One could
> even reserve a chunk (e.g., 64 bytes) of each frame that is never
> overlapped, by limiting the cached offset range to be smaller
> than the cache.
>
> Such might be more useful than register windows, but that does
> not mean that it is actually a good option.

If it is such a good option, why has it not reached production?
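
For readers trying to picture the indexing scheme being debated, here is a minimal sketch in C of one way it could work. It is not taken from either poster. The 8-byte slot size, the 192-byte cacheable limit (256 minus the 64-byte reserved chunk), the use of the low bit of the frame depth to pick the indexing direction, and the name stack_cache_index are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SLOT_BYTES   8u    /* assumed: one 64-bit stack slot per entry */
#define CACHE_BYTES  256u  /* example capacity mentioned in the post */
#define LIMIT_BYTES  192u  /* assumed cacheable limit: 256 - 64 reserved */
#define NUM_SLOTS    (CACHE_BYTES / SLOT_BYTES)

/*
 * Hypothetical helper: map an SP-relative offset to a slot in a
 * direct-mapped stack cache. Even-depth frames index upward from
 * slot 0; odd-depth frames invert the index bits and so index
 * downward from the last slot. Returns false when the offset is
 * beyond the cacheable limit.
 */
bool stack_cache_index(uint32_t frame_depth, uint32_t sp_offset,
                       uint32_t *slot)
{
    assert(slot != NULL);
    if (sp_offset >= LIMIT_BYTES)
        return false;                      /* not cached here */
    uint32_t idx = sp_offset / SLOT_BYTES; /* the offset is the index */
    if (frame_depth & 1u)
        idx = ~idx & (NUM_SLOTS - 1u);     /* inverted bits for odd frames */
    *slot = idx;
    return true;
}
```

Under these assumptions an even frame occupies slots 0 to 23 and the adjacent odd frame occupies slots 8 to 31, so the first 64 bytes of each frame (8 slots) are never evicted by its neighbour, while larger offsets in the two frames compete for the shared middle slots.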

>>> An L2 register set that can only be accessed for one operand
>>> might be somewhat similar to LD-OP.
>>
>> In high-speed designs, there are at least 2 cycles of delay from
>> AGEN to the L2 and 2 cycles of delay back. Even a zero-cycle
>> access sees at least 4 cycles of latency, 5 if you count AGEN.
>
> Presumably this is related to the storage technology used as well
> as the capacity.

Purely wire delay due to the size of the L2 cache.
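
The latency figures quoted above can be written out as a small worked calculation. This is only a sketch in C: the struct, its field names, and the function l2_latency are hypothetical, and the 1-cycle AGEN is inferred from the "4 cycles, 5 if you count AGEN" statement rather than stated directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Cycle counts for one L2 round trip; field names are illustrative. */
struct l2_path {
    unsigned agen;     /* address generation */
    unsigned to_l2;    /* wire delay from AGEN to the L2 */
    unsigned access;   /* time spent inside the L2 array itself */
    unsigned from_l2;  /* wire delay back to the consumer */
};

/* Sum the round trip, optionally including the AGEN cycle. */
unsigned l2_latency(const struct l2_path *p, bool count_agen)
{
    assert(p != NULL);
    unsigned cycles = p->to_l2 + p->access + p->from_l2;
    return count_agen ? cycles + p->agen : cycles;
}
```

With agen = 1, to_l2 = 2, access = 0 and from_l2 = 2, this gives 4 cycles without AGEN and 5 with it: the wire delay alone sets the floor even if the array access were free.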

