On 4/3/2024 11:43 AM, Stephen Fuld wrote:
There has been discussion here about the benefits of reducing the number of op codes. One reason not mentioned before: if you have fixed-length instructions, you may want to leave as many codes as possible available for future use. And if you are doing a 16-bit instruction design, where instruction bits are especially tight, you may save enough op codes to free up a bit, perhaps allowing a larger register-specifier field, or allowing more instructions in the smaller subset.
It is in this spirit that I had an idea, partially inspired by Mill’s use of tags in registers, but not memory. I worked through this idea using the My 66000 as an example “substrate” for two reasons. First, it has several features that are “friendly” to the idea. Second, I know Mitch cares about keeping the number of op codes low.
Please bear in mind that this is just the germ of an idea. It is certainly not fully worked out. I present it here to stimulate discussions, and because it has been fun to think about.
The idea is to add 32 bits to the processor state, one per register (though probably not physically part of the register file) as a tag. If set, the bit indicates that the corresponding register contains a floating-point value. Clear indicates not floating point (integer, address, etc.). There would be two additional instructions, load single floating and load double floating, which work the same as the other 32- and 64-bit loads, but in addition to loading the value, set the tag bit for the destination register. Non-floating-point loads would clear the tag bit. As I show below, I don’t think you need any special "store tag" instructions.
When executing arithmetic instructions, if the tag bits of both sources of an instruction are the same, do the appropriate operation (floating or integer), and set the tag bit of the result register appropriately.
If the tag bits of the two sources are different, I see several possibilities.
1. Generate an exception.
2. Use the sense of source 1 for the arithmetic operation, but first perform the appropriate conversion on the second operand, potentially saving an instruction.
3. Always do the operation in floating point and convert the integer operand prior to the operation. (Or, if you prefer, change floating point to integer in the above description.)
4. Same as 2 or 3 above, but don’t do the conversions. I suspect this is the least useful choice.
I am not sure which of these is the best option.
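To make the dispatch rule concrete, here is a small C model of the proposal (a software sketch, not My 66000 code; the reg_t layout and helper names are mine, and I assume option 3 for mixed tags):

```c
#include <stdint.h>
#include <string.h>

/* Software model of the tag proposal: one FP tag bit per register. */
typedef struct {
    uint64_t bits;  /* raw 64-bit register contents */
    int      fp;    /* tag: 1 = holds a floating-point value */
} reg_t;

static double   as_double(uint64_t b) { double d; memcpy(&d, &b, 8); return d; }
static uint64_t from_double(double d) { uint64_t b; memcpy(&b, &d, 8); return b; }

/* One shared ADD op code; the tags pick the operation.  Mixed tags
   follow option 3: convert the integer side and add in floating point. */
static reg_t tagged_add(reg_t a, reg_t b) {
    reg_t r;
    if (!a.fp && !b.fp) {                       /* both integer */
        r.bits = a.bits + b.bits;
        r.fp   = 0;
    } else {                                    /* at least one side FP */
        double x = a.fp ? as_double(a.bits) : (double)(int64_t)a.bits;
        double y = b.fp ? as_double(b.bits) : (double)(int64_t)b.bits;
        r.bits = from_double(x + y);
        r.fp   = 1;
    }
    return r;
}
```

Option 2 would differ only in the mixed-tag branch, which would follow source 1's tag instead of always going to floating point.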
Given that, the floating-point and fixed-point versions of the same operation can share a single op code. That saves eight op codes: the four arithmetic operations, max, min, abs, and compare. Since we added two load instructions, the net savings so far is six op codes.
But we can go further. There are some opcodes that only make sense for FP operands, e.g. the transcendental instructions. And there are some operations that probably only make sense for non-FP operands, e.g. POP, FF1, probably shifts. Given the tag bit, these could share the same op-code. There may be several more of these.
I think this all works fine for a single compilation unit, as the compiler certainly knows the type of the data. But what happens with separate compilations? The called function probably doesn’t know the tag values for callee-saved registers. Fortunately, the My 66000 architecture comes to the rescue here. You would modify the Enter and Exit instructions to save/restore the tag bits of the registers they are saving or restoring in the same data structure used for the registers (yes, it adds 32 bits to that structure – minimal cost). The same mechanism works for interrupts that take control away from a running process.
I don’t think you need to set or clear the tag bits without doing anything else, but if you do, I think you could “repurpose” some other instructions to do this, without requiring another op code. For example, ORing a register with itself could be used to set the tag bit, and ORing a register with zero could clear it. These should be pretty rare.
That is as far as I got. I think you could net save perhaps 8-12 op codes, which is about 10% of the existing op codes - not bad. Is it worth it? To me, a major question is the effect on performance. What is the cost of having to decode the source registers and reading their respective tag bits before knowing which FU to use? If it causes an extra cycle per instruction, then it is almost certainly not worth it. IANAHG, so I don’t know. But even if it doesn’t cost any performance, I think the overall gains are pretty small, and probably not worth it unless the op-code space is really tight (which, for My 66000 it isn’t).
Anyway, it has been fun thinking about this, so I hope you don’t mind the, probably too long, post.
Any comments are welcome.
FWIW:
This doesn't seem too far off from what would be involved with dynamic typing at the ISA level, but with many of the same sorts of drawbacks...
Say, for example, top 2 bits of a register:
  00: Object Reference
    Next 2 bits:
      00: Pointer (with type-tag)
      01: ?
      1z: Bounded Array
  01: Fixnum (route to ALU)
  10: Flonum (route to FPU)
  11: Other types
    00: Smaller value types
      Say: int/uint, short/ushort, ...
    ...
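The routing decision could be sketched roughly as follows (the exact field packing is my guess, just following the nesting above; the function names are made up):

```c
#include <stdint.h>

/* Decoding the nested tags above; the packing (tags in the topmost
   bits of a 64-bit register) is an assumption, not a fixed format. */
enum { TAG_OBJREF = 0, TAG_FIXNUM = 1, TAG_FLONUM = 2, TAG_OTHER = 3 };

static unsigned get_major_tag(uint64_t r) { return (unsigned)(r >> 62); }      /* top 2 bits  */
static unsigned get_minor_tag(uint64_t r) { return (unsigned)(r >> 60) & 3u; } /* next 2 bits */

/* Route-to-ALU check: the ALU accepts the op only if both are fixnums. */
static int both_fixnum(uint64_t a, uint64_t b) {
    return get_major_tag(a) == TAG_FIXNUM && get_major_tag(b) == TAG_FIXNUM;
}
```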
One issue:
Decoding based on register tags would mean needing to know the register tag bits at the same time the instruction is being decoded. In this case, one is likely to need two clock-cycles to fully decode the opcode.
ID1: Unpack instruction to figure out register fields, etc.
ID2: Fetch registers, specialize variable instructions based on tag bits.
For timing though, one ideally doesn't want to do anything with the register values until the EX stages (since ID2 might already be tied up with the comparably expensive register-forwarding logic), but asking for 3 cycles for decode is a bit much.
Otherwise, if one does not know which FU should handle the operation until EX1, this has its own issues. Or, possibly, the FUs decide whether to accept the operation:
  ALU: accepts the operation if both operands are fixnums; FPU: if both are flonums.
But, a proper dynamic language allows mixing fixnum and flonum with the result being implicitly converted to flonum, but from the FPU's POV, this would effectively require two chained FADD operations (one for the Fixnum to Flonum conversion, one for the FADD itself).
Many other cases could get hairy, but to have any real benefit, the CPU would need to be able to deal with them. In cases where the compiler deals with everything, the type-tags become mostly moot (or potentially detrimental).
But, then, there is another issue:
C code expects C type semantics to be respected, say:
  Signed int overflow wraps at 32 bits (sign extending);
  Unsigned int overflow wraps at 32 bits (zero extending);
  Variables may not hold values out-of-range for that type;
  The 'long long' and 'unsigned long long' types are exactly 64-bit;
  ...
If one has tagged 64-bit registers, then a fixnum might not hold the entire range of 'long long'. If one has 66- or 68-bit registers, then memory storage is a problem.
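For example, with 2 tag bits carved out of a 64-bit register, a fixnum keeps only 62 payload bits, so a runtime needs a range check before boxing (a sketch; FIXNUM_BITS = 62 is assumed from the tag layout above):

```c
#include <stdint.h>

/* With 2 tag bits taken from a 64-bit register, a fixnum keeps 62
   payload bits, so it cannot hold the full range of a 64-bit
   'long long'.  (The 62 is assumed from the earlier tag sketch.) */
#define FIXNUM_BITS 62
#define FIXNUM_MAX  ((int64_t)((UINT64_C(1) << (FIXNUM_BITS - 1)) - 1))
#define FIXNUM_MIN  (-FIXNUM_MAX - 1)

/* A dynamic runtime would check this before boxing, falling back to
   some heap-allocated integer representation when the check fails. */
static int fits_in_fixnum(int64_t v) {
    return v >= FIXNUM_MIN && v <= FIXNUM_MAX;
}
```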
If one has untagged registers for cases where they are needed, one has not saved any encoding space.
And, if one type-tags statically-typed variables, there is no real "value-added" here (saving a little encoding space at the cost of making the rest of the CPU more complicated and expensive isn't much of a win).
Better, as I see it, to leave the CPU itself mostly working with raw untagged values.
It can make sense to have helper-ops for type-tags, but these don't save any encoding space; rather, they make dealing with type-tagged data a little faster.
Say:
  Sign-extending a fixnum to 64 bits;
  Setting the tag bits for a fixnum;
  Doing the twiddling to convert between flonum and double;
  Setting the tag for various bit patterns;
  Checking the tag(s) against various bit patterns;
  ...
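As a rough sketch of what such helper-ops would compute, here is the same twiddling in plain C (assumptions are mine: a 2-bit tag in the top of the register, fixnum tag 01, flonum tag 10, and a flonum payload holding the top 62 bits of an IEEE double, i.e. with 2 low mantissa bits dropped; all names are made up):

```c
#include <stdint.h>
#include <string.h>

#define TAG_FIXNUM   UINT64_C(1)
#define TAG_FLONUM   UINT64_C(2)
#define PAYLOAD_MASK ((UINT64_C(1) << 62) - 1)

static unsigned get_tag(uint64_t r) { return (unsigned)(r >> 62); }

static uint64_t box_fixnum(int64_t v) {
    return (TAG_FIXNUM << 62) | ((uint64_t)v & PAYLOAD_MASK);
}
/* Sign-extend the 62-bit payload back to 64 bits. */
static int64_t unbox_fixnum(uint64_t r) {
    return (int64_t)(r << 2) >> 2;
}

/* Flonum <-> double: keep the top 62 bits of the double's encoding,
   dropping 2 low mantissa bits to make room for the tag. */
static uint64_t box_flonum(double d) {
    uint64_t b; memcpy(&b, &d, 8);
    return (TAG_FLONUM << 62) | (b >> 2);
}
static double unbox_flonum(uint64_t r) {
    uint64_t b = (r & PAYLOAD_MASK) << 2;
    double d; memcpy(&d, &b, 8);
    return d;
}
```

On an ISA without helper-ops, each of these is a few shift/mask/compare instructions, which is exactly the per-operation overhead being discussed.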
Where, on a more traditional ISA, the bit-twiddling logic for type-checking and tag modification is a significant part of the runtime cost of a dynamically typed language.
With luck, one can have dynamic typing that isn't horribly slow.
But, one still isn't likely to see serious use of dynamic typing in systems-level programming (if anything, Haskell-style type systems seem to be more in fashion in this space at present, where trying to get the code to be accepted by the compiler is itself an exercise in pain).
Well, and those of us who would prefer a more ActionScript- or Haxe-like approach in a systems-level language (at least as an option for when it is useful to do so) are likely kind of a minority.
Well, and having a C dialect where one can be like, say:
  __variant obj;
  obj = __var { .x=3, .y=4 };  //ex-nihilo object
  obj.z = 5;  //implicitly creates a new 'z' field in the object
is, not exactly standard...
And, I end up limiting its use, as any code which touches this stuff can only be compiled in BGBCC (for example, getting the TestKern core to build in GCC ended up needing to disable things like the BASIC interpreter and similar, as I had used some of these features to implement the interpreter).
Though, personally I still prefer being a little more strict than JS/ES in some areas: having any of ""==null, 0==null, 0=="", or null=="null" evaluate to true is a misfeature as far as I am concerned (my original scripting language had quietly dropped this feature, despite otherwise resembling JS/ES).
Though, in my case, the ISA is not tagged, so all of this stuff is built on top of implicit runtime calls. There is not currently a garbage collector, but adding stuff to support precise GC could be possible in theory (and in my current project it would be a tempting alternative to conservative GC, though likely neither could be used in a real-time context).
In one of my own languages, I had instead defined rules to try to allow the compiler to figure out object lifetimes in various cases (how an object is created implicitly also gave some information about its semantics and lifetime).
...