Stephen Fuld <sfuld@alumni.cmu.edu.invalid> writes:
The idea is to add 32 bits to the processor state, one per register
(though probably not physically part of the register file) as a tag. If
set, the bit indicates that the corresponding register contains a
floating-point value. Clear indicates not floating point (integer,
address, etc.). There would be two additional instructions, load single
floating and load double floating, which work the same as the other 32-
and 64-bit loads, but in addition to loading the value, set the tag bit
for the destination register. Non-floating-point loads would clear the
tag bit. As I show below, I don’t think you need any special "store
tag" instructions.
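
A minimal sketch of the tag-bit scheme described above, written in Python purely as an illustration. The class and method names (RegFile, load_int, load_fp, is_fp) are hypothetical and not part of any My 66000 specification; the only things taken from the text are the 32 tag bits, the FP loads that set the destination's tag, and the non-FP loads that clear it.

```python
NUM_REGS = 32  # one tag bit per architectural register, as in the proposal


class RegFile:
    """Toy register file with a separate 32-bit tag word (hypothetical model)."""

    def __init__(self):
        self.regs = [0] * NUM_REGS
        self.tags = 0  # bit i set => register i holds a floating-point value

    def load_int(self, rd, value):
        # Non-floating-point load: write the value and clear the tag bit.
        self.regs[rd] = value
        self.tags &= ~(1 << rd)

    def load_fp(self, rd, value):
        # "Load single/double floating": write the value and set the tag bit.
        self.regs[rd] = value
        self.tags |= 1 << rd

    def is_fp(self, r):
        return bool((self.tags >> r) & 1)
```

Keeping the tags in one word, apart from the register values, mirrors the remark that the bits are probably not physically part of the register file.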
...
But we can go further. There are some opcodes that only make sense for
FP operands, e.g. the transcendental instructions. And there are some
operations that probably only make sense for non-FP operands, e.g. POP,
FF1, probably shifts. Given the tag bit, these could share the same
op-code. There may be several more of these.
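
As a rough illustration of the opcode sharing described above, the following Python sketch lets one hypothetical opcode act as a sine instruction when the source tag is set and as POP (population count) when it is clear. The pairing of sine with POP, the function name exec_shared_op, and the rule that the result's tag follows the kind of operation performed are all assumptions made for the example; the text only says that such instructions could share an opcode.

```python
import math


def exec_shared_op(regs, tags, rd, rs):
    """Execute one hypothetical shared opcode and return the updated tag word.

    regs is a list of register values, tags an integer bit mask
    (bit i set => register i holds a floating-point value).
    """
    if (tags >> rs) & 1:
        # Source is tagged FP: behave as a transcendental (sine here).
        regs[rd] = math.sin(regs[rs])
        return tags | (1 << rd)  # assumption: FP result sets the tag
    # Source is not FP: behave as POP (count of set bits).
    # Assumes a non-negative integer register value.
    regs[rd] = bin(regs[rs]).count("1")
    return tags & ~(1 << rd)  # assumption: integer result clears the tag
```

The branch on the source tag is the step that has to happen before a functional unit can be chosen, which is the cost questioned later in the thread.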
Certainly makes reading disassembler output fun (or writing the
disassembler). This reminds me of the work on SafeTSA [amme+01] where
they encode only programs that are correct (according to some notion
of correctness).
I think this all works fine for a single compilation unit, as the
compiler certainly knows the type of the data. But what happens with
separate compilations? The called function probably doesn’t know the
tag value for callee saved registers. Fortunately, the My 66000
architecture comes to the rescue here. You would modify the Enter and
Exit instructions to save/restore the tag bits of the registers they are
saving or restoring in the same data structure it uses for the registers
(yes, it adds 32 bits to that structure – minimal cost).
That's expensive in an OoO CPU. There you want each tag to be stored
alongside the other 64 bits of the register, because they should
be renamed at the same time. So the ENTER instruction would depend on
all the registers that it saves (or maybe on all registers). And upon
EXIT the restored registers have to be reassembled (which is not that
expensive).
I have a similar problem for the carry and overflow bits in
<http://www.complang.tuwien.ac.at/anton/tmp/carry.pdf>, and chose to
let those bits not survive across calls; if there was a cheap solution
for the problem, it would eliminate this drawback of my idea.
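
To make the quoted Enter/Exit proposal concrete, here is a small Python sketch of saving and restoring the tag bits together with a contiguous range of registers. The function names, the frame layout (a tuple pushed on a list), and the first/last register-range interface are assumptions for illustration only and do not describe the actual My 66000 ENTER and EXIT encodings.

```python
def enter(regs, tags, first, last, stack):
    """Save regs[first..last] and their tag bits in one frame (hypothetical layout)."""
    count = last - first + 1
    saved_tags = (tags >> first) & ((1 << count) - 1)
    stack.append((first, list(regs[first:last + 1]), saved_tags))


def exit_(regs, tags, stack):
    """Restore the most recent frame; returns the updated tag word."""
    first, values, saved_tags = stack.pop()
    count = len(values)
    regs[first:first + count] = values
    mask = ((1 << count) - 1) << first
    # Replace only the tag bits of the restored registers.
    return (tags & ~mask) | (saved_tags << first)
```

Note that exit_ needs the saved tag word from memory before it can produce the new tags, which is the dependence that matters for the out-of-order discussion in this thread.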
The same mechanism works for interrupts that take control away from a
running process.
For context switches one cannot get around the problem, but they are
much rarer than calls and returns, so requiring a pipeline drain for
them is not so bad.
Concerning interrupts, as long as nesting is limited, one could just
treat the physical registers of the interrupted program as taken, and
execute the interrupt with the remaining physical registers. No need
to save any architectural registers or their tag, carry, or overflow
bits.
That is as far as I got. I think you could net save perhaps 8-12 op
codes, which is about 10% of the existing op codes - not bad. Is it
worth it? To me, a major question is the effect on performance. What
is the cost of having to decode the source registers and reading their
respective tag bits before knowing which FU to use?
In an OoO CPU, that's pretty heavy.
But actually, your idea does not need any computation results for
determining the tag bits of registers (except during EXIT), so you
probably can handle the tags in the front end (decoder and renamer).
Then the tags are really separate and not part of the registers that
have to be renamed, and you don't need to perform any waiting on
ENTER.
However, in EXIT the front end would have to wait for the result of
the load/store unit loading the 32 bits, unless you add a special
mechanism for that. So EXIT would become expensive, one way or the
other.
@InProceedings{amme+01,
author = {Wolfram Amme and Niall Dalton and Jeffery von Ronne
and Michael Franz},
title = {Safe{TSA}: A Type Safe and Referentially Secure
Mobile-Code Representation Based on Static Single
Assignment Form},
crossref = {sigplan01},
pages = {137--147},
annote = {The basic ideas in this representation are:
variables are named as the pair (distance in the
dominator tree, assignment within basic block);
variables are separated by type, with operations
referring only to variables of the right type (like
integer and FP instructions and registers in
assemblers); memory references use types to encode
that a null-pointer check and/or a range check has
already occurred, allowing optimizing these
operations; the resulting code is encoded (using
text compression methods) in a way that supports
only correct code. These ideas are discussed mostly
in a general way, with some Java-specifics, but the
representation supposedly also supports Fortran95
and Ada95. The representation supports some CSE, but
not for address computation operations. The paper
also gives numbers on size (usually a little smaller
than Java bytecode), and some other static metrics,
especially wrt. the effect of optimizations.}
}
- anton
-- 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.' Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>