Subject : Re: Arguments for a sane ISA 6-years later
From : paaronclayton (at) *nospam* gmail.com (Paul A. Clayton)
Newsgroups : comp.arch
Date : 28. Jul 2024, 02:01:59
Organisation : A noiseless patient Spider
Message-ID : <v845oc$3kaup$2@dont-email.me>
References : 1 2 3
User-Agent : Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.0
On 7/25/24 6:07 PM, MitchAlsup1 wrote:
> On Thu, 25 Jul 2024 20:09:06 +0000, BGB wrote:
>> On 7/24/2024 3:37 PM, MitchAlsup1 wrote:
[snip]
>>> D) exception and interrupt control transfer should take no more
>>> ..than 1 cache line read followed by 4 cache line reads to the
>>> ..same page in DRAM/L3/L2 that are dependent on the first cache
>>> ..line read. Control transfer back to the suspended thread should
>>> ..be no longer than the control transfer to the exception handler.
[snip]
>> A fast, but more expensive, option would be to have multiple copies of
>> the register file which is then bank-switched on an interrupt.
> Under My 66000 a low end implementation can choose the write back cache
> version, while the GBOoO implementation can choose the bank switcher.
> In both cases, the same model is presented to executing SW.
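
A minimal sketch of the point quoted above, that a bank-switched register file and a save/restore one can present the same model to software. The class names, the 32-register size, and the `words_moved` counter are assumptions made for illustration. Nothing here is taken from the My 66000 definition, and the "write back" class is only a crude stand-in for a cache-backed context save.

```python
# Illustrative model only: class names and NUM_REGS are assumptions,
# not taken from the My 66000 definition.
NUM_REGS = 32


class BankSwitchedRegFile:
    """One full register bank per context; a switch just selects a bank."""

    def __init__(self, num_contexts):
        self.banks = [[0] * NUM_REGS for _ in range(num_contexts)]
        self.active = 0
        self.words_moved = 0  # stays 0: switching moves no data

    def read(self, reg):
        return self.banks[self.active][reg]

    def write(self, reg, value):
        self.banks[self.active][reg] = value

    def switch_to(self, context):
        self.active = context


class WriteBackRegFile:
    """One register bank; a switch saves it to backing store and loads another."""

    def __init__(self, num_contexts):
        self.regs = [0] * NUM_REGS
        self.backing = [[0] * NUM_REGS for _ in range(num_contexts)]
        self.active = 0
        self.words_moved = 0

    def read(self, reg):
        return self.regs[reg]

    def write(self, reg, value):
        self.regs[reg] = value

    def switch_to(self, context):
        self.backing[self.active] = list(self.regs)  # write old context back
        self.regs = list(self.backing[context])      # read new context in
        self.words_moved += 2 * NUM_REGS
        self.active = context


def run(regfile):
    """Same 'software' on either implementation: write, interrupt, return."""
    regfile.write(3, 111)   # state of the interrupted thread
    regfile.switch_to(1)    # interrupt: enter the handler context
    regfile.write(3, 222)   # handler reuses the same register number
    regfile.switch_to(0)    # return to the suspended thread
    return regfile.read(3)
```

Calling `run()` on either class returns the interrupted thread's value unchanged. Only `words_moved` differs, which is where the cost trade-off between the two implementations would show up.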
I do not know at what port count a "3D register file" (temporal
banking where extra storage "hides" under the wires) makes sense.
I suspect that for the 3-read, 1-write register file of a low end
My 66000 implementation the overhead would be too great unless
lower-overhead context switching were extremely important.
Another technique for reducing storage overhead in highly ported
register files is to use checkpoint registers that connect to the
highly ported cells. (The paper that proposed this, which I am not
certain I can find again, did not swap the values; it allowed only
push and pop, and only to a depth of one. IIRC, this was
proposed for speculatively dead values, so more storage would be
available for in-use values. For short interrupts this might be
usable, with a save to cache/memory if the context is still alive
when a different context is scheduled to run.) I do not know if
a ring buffer could be designed that might allow coarse-grained
barrel-like processing (or even a racetrack memory for slower
arbitrary context switching), but that seems unlikely to be
useful (except possibly in some rather special purpose processor).
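
A rough sketch of the depth-one push/pop behavior described above, applied to the short-interrupt case rather than to speculatively dead values. The class, its method names, and the `spill` helper are hypothetical. They do not reproduce the paper's design, which I have not been able to check.

```python
class CheckpointedRegFile:
    """Highly ported 'live' cells plus one low-cost checkpoint copy.

    Hypothetical model: values are never swapped, only pushed to or
    popped from a checkpoint that is exactly one level deep.
    """

    def __init__(self, num_regs=32):
        self.live = [0] * num_regs
        self.checkpoint = None  # depth of one: a single copy or nothing

    def push(self):
        """Copy the live registers into the checkpoint (e.g. on interrupt entry)."""
        if self.checkpoint is not None:
            raise RuntimeError("checkpoint slot already in use (depth is one)")
        self.checkpoint = list(self.live)

    def pop(self):
        """Restore the live registers from the checkpoint (e.g. on interrupt return)."""
        if self.checkpoint is None:
            raise RuntimeError("nothing to pop")
        self.live = self.checkpoint
        self.checkpoint = None

    def spill(self, memory, context_id):
        """Save the checkpointed context to memory and free the slot."""
        if self.checkpoint is None:
            raise RuntimeError("nothing to spill")
        memory[context_id] = self.checkpoint
        self.checkpoint = None
```

A short handler would call `push()`, run, then `pop()`. If a different context is scheduled while the checkpoint is still occupied, `spill()` models the save to cache/memory that frees the slot.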
In general it seems that one would only want contexts to be
"cached" in registers (even with clever storage cost reductions)
if switches are frequent or the latency is especially critical.
A small core something like the CDC 6600 Peripheral Processor
might justify multithreading at finer granularity than through
cache-based context swapping.
I also *feel* that reduced contexts could have some utility.
Some threads have low ILP and do not benefit as much from
extra register state; even moderately coarse grained
hardware multithreading might bring performance benefits
and reduced contexts could reduce the switch overhead. Of
course, this implies two (slightly) different ISAs.
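
As a back-of-the-envelope illustration of how a reduced context might shrink the switch cost, the sketch below counts the cache lines per context. It assumes 64-bit registers, 64-byte cache lines, and one header line of non-register state. These values are chosen to line up with the "1 cache line read followed by 4" requirement quoted earlier and are not confirmed specifics. The 16-register reduced context is purely hypothetical.

```python
def context_lines(num_regs, reg_bytes=8, line_bytes=64, header_lines=1):
    """Cache lines needed to hold one context (header plus registers)."""
    data_bytes = num_regs * reg_bytes
    register_lines = -(-data_bytes // line_bytes)  # ceiling division
    return header_lines + register_lines


full = context_lines(32)      # assumed full context: 1 header + 4 register lines
reduced = context_lines(16)   # hypothetical reduced context: 1 header + 2 lines
```

Under these assumptions halving the register count removes two of the five line transfers per switch direction, so the header line limits how much a reduced context can save.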