Robert Finch <robfi680@gmail.com> writes:
[Context: carry and overflow in GPRs
<https://www.complang.tuwien.ac.at/anton/tmp/carry.pdf>]
Been thinking some about the carry and overflow and what to do about
register spills and reloads during expression processing. My thought was
that on the machine with 256 registers, simply allocate a ridiculous
number of registers for expression processing, for example 25 or even
50. Then if the expression is too complex, have the compiler spit out an
error message to the programmer to simplify the expression. Reminiscent of
the ‘expression too complex’ error in BASIC. So, there are no spills or
reloads during expression processing.
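As a rough illustration of the "error instead of spill" idea, here is a
minimal sketch in C that counts the registers a pure expression tree
needs, using Sethi-Ullman style labelling, and checks the count against a
fixed budget. The node type, the function names and the budget of 50 are
assumptions made up for this sketch, not anything from Q+ or a real
compiler.

```c
#include <stddef.h>

/* Hypothetical expression-tree node: a leaf has both children NULL,
   a unary operator uses only left. */
struct expr {
    const struct expr *left;
    const struct expr *right;
};

/* Hypothetical budget: registers reserved for expression evaluation. */
enum { EXPR_REG_BUDGET = 50 };

/* Sethi-Ullman style count of the registers needed to evaluate e
   without spilling, assuming every leaf is first loaded into a register. */
int regs_needed(const struct expr *e)
{
    if (e->left == NULL && e->right == NULL)
        return 1;                       /* leaf */
    if (e->right == NULL)
        return regs_needed(e->left);    /* unary operator */
    int l = regs_needed(e->left);
    int r = regs_needed(e->right);
    if (l == r)
        return l + 1;                   /* one result must be held */
    return l > r ? l : r;               /* evaluate the bigger side first */
}

/* Returns 1 if the expression fits the budget, 0 if the compiler
   would report "expression too complex". */
int expression_fits(const struct expr *e)
{
    return regs_needed(e) <= EXPR_REG_BUDGET;
}
```

Under this simple model a tree that needs k registers has at least
2^(k-1) leaves, so a budget of 25 or 50 is very hard to exceed with a
plain tree. Common subexpressions and instruction scheduling are not
modelled here and can raise the real register pressure.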
The first question is how carry and overflow are represented in the
programming language.
Currently there are programming languages with growable integers, and
overflow is needed short-term for that, so spilling the overflow bit
is probably not necessary for that (and indeed, the one overflow bit
of AMD64 or ARM A64 that is not preserved across calls is good enough
for that).
For dealing with multiple-precision integers (e.g., when the growable
integers actually grow to more than one word), typically library
routines are used, but sure, one could also have a programming
language that computes with multi-precision integers and then is
compiled into either loops over the individual words of these numbers,
or it unrolls these loops (if the length is known in advance). Yes,
if you run out of registers there, you may want to spill and refill a
register, including its carry bit. But that should be rare, so if
it's an expensive operation, we can live with it.
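The "loop over the individual words" can be sketched in GNU C as
follows. This is only an illustration: the name mp_add and the
least-significant-word-first layout are assumptions of the sketch, and
here the carry lives in an ordinary variable rather than in an extra bit
of a GPR.

```c
#include <stddef.h>
#include <stdint.h>

/* Adds the n-word numbers a and b (least significant word first) into
   sum.  Returns the carry out of the most significant word (0 or 1). */
unsigned mp_add(uint64_t *sum, const uint64_t *a, const uint64_t *b,
                size_t n)
{
    unsigned carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t t;
        /* For unsigned operands the builtin's return value is the carry. */
        unsigned c1 = __builtin_add_overflow(a[i], b[i], &t);
        unsigned c2 = __builtin_add_overflow(t, (uint64_t)carry, &sum[i]);
        carry = c1 | c2;   /* at most one of the two additions can carry */
    }
    return carry;
}
```

If n is known at compile time, the compiler can unroll this loop, and
then the carry chain is the value that would have to survive a spill.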
What we have now is things like the GNU C extension
bool __builtin_add_overflow (type1 a, type2 b, type3 *res);
This produces two different results, the return value, and res. With
the kind of architecture I have in mind, these two results could be
allocated into the same register. If at some point the register has
to be spilled, the two results can be stored into different memory
locations, and on refill they will land in different GPRs unless the
compiler writer really puts a lot more work in than is merited (I
don't expect many spills and refills).
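To make the two results concrete, here is a small GNU C sketch. The
struct add_result and the function checked_add are hypothetical names
for this example only; they just bundle the two results the way they
could share one register on the kind of architecture discussed here.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical bundle of the two results of one addition. */
struct add_result {
    int value;      /* the sum, wrapped to the width of int */
    bool overflow;  /* true if the mathematical sum did not fit in int */
};

struct add_result checked_add(int a, int b)
{
    struct add_result r;
    /* The builtin stores the wrapped sum through the pointer and
       returns the overflow flag. */
    r.overflow = __builtin_add_overflow(a, b, &r.value);
    return r;
}
```

For example, checked_add(INT_MAX, 1) gives overflow set and the value
INT_MIN, while checked_add(1, 2) gives 3 with overflow clear.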
I think the storeextra / loadextra
registers used during context switching would work okay. But in Q+ there
are 256 regs which require eight storeextra / loadextra registers. I
think the storeextra / loadextra registers could be hidden in the
context save and restore hardware. Not even requiring access via CSRs or
whatever.
Yes. In my paper I wanted to spell out an implementation, so that it
does not look like I am ignoring some hard problems and shoving them
over to the implementor. If a computer architect wants to pick my idea up,
they are welcome to implement context-switching in any way they deem
appropriate.
I suppose context loads and stores could be done in blocks of
32 registers. An issue is that the load extra needs to be done before
registers are loaded.
Maybe, with 256 GPRs, you would use 8 storeextra and 8 loadextra
registers, each one associated with 32 registers. This avoids having
to make the whole process a sequential operation working on 32-GPR
blocks. Just store all 256 GPRs, sync (to get the storeextra
registers up-to-date), then store the 8 storeextra registers. For
context load, load the 8 loadextra registers, sync (so the loads of
the loadextra registers are finished), then load the 256 GPRs.
Or alternatively just have 8 extra registers that are used for both
context stores and context loads. Then you cannot use the same sync
for both storing and loading, but you may prefer a little more
context-switch overhead to needing 16 extra registers.
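The ordering of the first variant (separate storeextra and loadextra
registers) can be modelled in plain C as below. This is a software
sketch under assumptions, not a hardware description: it assumes each
GPR has exactly two extra bits (carry and overflow), that extra register
k holds the bits of GPRs 32k..32k+31 at two bits per GPR, and the struct
and function names are made up for the illustration.

```c
#include <stdbool.h>
#include <stdint.h>

enum { NUM_GPRS = 256, BLOCK = 32, NUM_EXTRA = NUM_GPRS / BLOCK };

/* Hypothetical model of one GPR: 64 data bits plus carry and overflow. */
struct gpr {
    uint64_t value;
    bool carry;
    bool overflow;
};

/* Hypothetical layout of the context save area. */
struct context {
    uint64_t value[NUM_GPRS];
    uint64_t extra[NUM_EXTRA];
};

void context_save(struct context *ctx, const struct gpr regs[NUM_GPRS])
{
    uint64_t storeextra[NUM_EXTRA] = {0};

    /* Storing a GPR also deposits its two extra bits in the
       storeextra register of its 32-register block. */
    for (int i = 0; i < NUM_GPRS; i++) {
        uint64_t bits = (uint64_t)regs[i].carry
                      | ((uint64_t)regs[i].overflow << 1);
        ctx->value[i] = regs[i].value;
        storeextra[i / BLOCK] |= bits << (2 * (i % BLOCK));
    }
    /* The sync would go here: storeextra must be up to date. */
    for (int k = 0; k < NUM_EXTRA; k++)
        ctx->extra[k] = storeextra[k];
}

void context_load(struct gpr regs[NUM_GPRS], const struct context *ctx)
{
    uint64_t loadextra[NUM_EXTRA];

    for (int k = 0; k < NUM_EXTRA; k++)
        loadextra[k] = ctx->extra[k];
    /* The sync would go here: loadextra must be loaded before the GPRs. */
    for (int i = 0; i < NUM_GPRS; i++) {
        uint64_t bits = (loadextra[i / BLOCK] >> (2 * (i % BLOCK))) & 3;
        regs[i].value = ctx->value[i];
        regs[i].carry = (bits & 1) != 0;
        regs[i].overflow = (bits & 2) != 0;
    }
}
```

With two extra bits per GPR, 32 GPRs fill one 64-bit extra register
exactly, which matches the count of eight for 256 GPRs.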
Another thought is to store additional info such as a CRC check of the
register file on context save and restore.
Typically ECC memory and something similar in bus protocols achieve
what I guess you want to achieve with the CRC checks.
- anton
-- 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.' Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>