John Levine <johnl@taugh.com> writes:
>Nearly all opcodes were one byte other than the extended format floating point
>instructions so it's hard to see how they could have made that much smaller
>without making it a lot more complicated.
One can look at IA-32 and check how long the encodings of frequent
instructions like "add %reg1,%reg2", "add const,%reg", "mov (%reg1),
%reg2", and "mov %reg1, (%reg2)" are. I expect that they are shorter
than on the VAX (exception: if the constant fits in 16 bits, but not
in 8). Of course there is a difference: VAX has 16 GPRs and IA-32 has
only 8. AMD64 has 16 GPRs, and needs a REX prefix byte, but only if
one of the additional registers is used (or 64-bit operation is
needed), so for the frequent cases it probably still has shorter
encodings on average than the VAX, especially if compilers prefer
using the first 8 registers. For three-operand addition, IA-32/AMD64
has lea, and other three-operand instructions are not that common.
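
To make the lengths concrete, here is a small sketch that tabulates the
IA-32/AMD64 encodings of the instructions mentioned above. The register
choices (%ecx, %edx, %r9) and the constants are arbitrary examples, not
taken from any measurement. For comparison, a VAX register-to-register
ADDL2 needs one opcode byte plus one operand-specifier byte per
operand, i.e., 3 bytes.

```python
# IA-32/AMD64 machine-code bytes for a few frequent instructions
# (AT&T syntax). Registers and constants are arbitrary examples.
ENCODINGS = {
    "add %ecx,%edx":   bytes([0x01, 0xCA]),                          # opcode + ModRM
    "add $1,%edx":     bytes([0x83, 0xC2, 0x01]),                    # 8-bit immediate
    "add $1000,%edx":  bytes([0x81, 0xC2, 0xE8, 0x03, 0x00, 0x00]),  # 32-bit immediate
    "mov (%ecx),%edx": bytes([0x8B, 0x11]),                          # load
    "mov %ecx,(%edx)": bytes([0x89, 0x0A]),                          # store
    "add %r9,%rdx":    bytes([0x4C, 0x01, 0xCA]),                    # AMD64: REX prefix + opcode + ModRM
}


def lengths(table):
    """Map each instruction to the length of its encoding in bytes."""
    return {insn: len(code) for insn, code in table.items()}


if __name__ == "__main__":
    for insn, n in lengths(ENCODINGS).items():
        print(f"{insn:16} {n} bytes")
```

The last entry shows the REX prefix cost on AMD64: the same add becomes
one byte longer as soon as 64-bit width or one of the additional
registers is involved.
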
IA-32 has even shorter encodings for some operations on %eax (stemming
from the need for compactness on the 8080 and the 8086, and the fact
that assembly-language programmers are good at exploiting such
things). One can use this to make the code even shorter by trying to
get the compiler to use %eax for instructions where such encodings
exist. Alternatively, one could reassign this encoding space for some
other purpose, e.g., avoiding the REX prefix in some cases.
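
As an illustration of the %eax special case, the following sketch
picks the shortest IA-32 encoding of "add $imm,%reg" with 32-bit
operand size. The function name encode_add_imm is made up for this
example; the opcodes used are 0x83 /0 (sign-extended 8-bit immediate),
0x05 (32-bit immediate to %eax, no ModRM byte), and 0x81 /0 (32-bit
immediate to any register).

```python
import struct

# Register numbers as used in the ModRM byte.
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def encode_add_imm(imm, reg):
    """Return the shortest encoding of `add $imm,%reg` (hypothetical helper)."""
    r = REGS.index(reg)
    if -128 <= imm <= 127:
        # 83 /0 ib: immediate is one sign-extended byte
        return bytes([0x83, 0xC0 | r]) + struct.pack("<b", imm)
    if reg == "eax":
        # 05 id: short form for %eax, saves the ModRM byte
        return bytes([0x05]) + struct.pack("<i", imm)
    # 81 /0 id: general form with a 32-bit immediate
    return bytes([0x81, 0xC0 | r]) + struct.pack("<i", imm)


if __name__ == "__main__":
    for imm, reg in [(1, "edx"), (1000, "eax"), (1000, "edx")]:
        code = encode_add_imm(imm, reg)
        print(f"add ${imm},%{reg}: {code.hex(' ')} ({len(code)} bytes)")
```

With a constant that does not fit in 8 bits, the %eax form is one byte
shorter than the general form; with a small constant the 0x83 form is
shortest for every register.
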
Another opportunity for shorter instructions is that IA-32/AMD64
supports byte-width register-to-register operations. These encodings
are unnecessary and can be reused for better purposes.
Another opportunity for making code shorter is that IA-32/AMD64 has
redundant encodings for register-to-register operations: e.g., "sub
%ecx, %edx" can be encoded with the first byte being 0x29 or 0x2b (the
two opcodes differ only if one of the operands is in memory). These
encodings can be reused; one possibility would be to support only
load-and-op instructions, not read-modify-write instructions; then the
first byte 0x29 (for sub, and similarly for the other operations) can be
used for a different purpose, e.g., avoiding the need for a REX
prefix.
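
The redundancy can be shown with a few lines of code. This sketch
builds both encodings of a register-to-register sub; the helper names
(modrm, encode_sub_reg_reg) are made up for the example. Opcode 0x29 is
"SUB r/m32, r32" (destination in the r/m field), opcode 0x2b is
"SUB r32, r/m32" (destination in the reg field).

```python
# Register numbers as used in the ModRM byte.
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def modrm(mod, reg, rm):
    """Assemble a ModRM byte from its three fields."""
    return (mod << 6) | (reg << 3) | rm


def encode_sub_reg_reg(src, dst):
    """Return both encodings of `sub %src,%dst` (AT&T operand order)."""
    s, d = REGS.index(src), REGS.index(dst)
    form_29 = bytes([0x29, modrm(3, s, d)])  # destination in the r/m field
    form_2b = bytes([0x2B, modrm(3, d, s)])  # destination in the reg field
    return form_29, form_2b


if __name__ == "__main__":
    for code in encode_sub_reg_reg("ecx", "edx"):
        print(code.hex(" "))
```

Both byte sequences perform the same operation, so with mod=3 one of
the two first bytes is redundant for every such operation.
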
One idea I have had is that many instructions encode for a source
register the same register as the target register of the previous
instruction. One could just refer to the target of the previous
instruction and thus save encoding space. The downside is that such
instructions no longer are complete, but need the previous instruction
to be decoded, which complicates interrupts and various tools.
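
How often this pattern occurs could be estimated by scanning
instruction traces. The following is only a sketch of such a count;
the representation (a list of (dest, sources) pairs) and the example
sequence are invented for illustration and are not measurements.

```python
def count_prev_target_sources(insns):
    """Count instructions that read the register written by the
    immediately preceding instruction.

    insns: list of (dest, sources) pairs; dest is None if the
    instruction writes no register (e.g., a store).
    """
    hits = 0
    prev_dest = None
    for dest, sources in insns:
        if prev_dest is not None and prev_dest in sources:
            hits += 1
        prev_dest = dest
    return hits


# Invented example sequence (AT&T syntax in the comments).
EXAMPLE = [
    ("eax", ("ecx",)),        # mov (%ecx),%eax
    ("eax", ("eax", "edx")),  # add %edx,%eax
    (None,  ("eax", "ebx")),  # mov %eax,(%ebx)
    ("ecx", ("ecx",)),        # add $4,%ecx
]

if __name__ == "__main__":
    print(count_prev_target_sources(EXAMPLE), "of", len(EXAMPLE))
```

In this made-up sequence the second and third instructions could refer
to "the previous target" instead of naming %eax explicitly.
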
Bottom line: IA-32 is probably more compact than VAX, and even for
IA-32 one can think of various ways to possibly make it even more
compact.
And looking at my latest code size measurements
<2024Jan4.101941@mips.complang.tuwien.ac.at>, both armhf (ARM T32) and
riscv64 (RV64GC) result in shorter code than IA-32 and AMD64:
   bash    grep   gzip
 595204  107636  46744  armhf
 599832  101102  46898  riscv64
 796501  144926  57729  amd64
 853892  152068  61124  i386
Apparently the additional registers of AMD64 (or maybe the different
calling convention) result in smaller code than IA-32 despite having
to use REX prefixes not only if the additional registers are used, but
also if 64-bit width is required.
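
To make the size differences easier to compare, this sketch
normalizes the numbers from the table above to a chosen baseline. The
numbers are copied from the table; the function name relative_to is
made up for this example.

```python
# Code sizes copied from the table above.
SIZES = {
    "armhf":   {"bash": 595204, "grep": 107636, "gzip": 46744},
    "riscv64": {"bash": 599832, "grep": 101102, "gzip": 46898},
    "amd64":   {"bash": 796501, "grep": 144926, "gzip": 57729},
    "i386":    {"bash": 853892, "grep": 152068, "gzip": 61124},
}


def relative_to(baseline, sizes=SIZES):
    """Return every size divided by the baseline architecture's size."""
    base = sizes[baseline]
    return {
        arch: {prog: n / base[prog] for prog, n in progs.items()}
        for arch, progs in sizes.items()
    }


if __name__ == "__main__":
    for arch, progs in relative_to("armhf").items():
        row = "  ".join(f"{prog} {ratio:.2f}" for prog, ratio in progs.items())
        print(f"{arch:8} {row}")
```

For bash, for example, the amd64 code is roughly a third larger than
the armhf code, and the i386 code is larger still.
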
The 16-bit wide encodings of ARM T32 and the RISC-V C extension
apparently catch many common cases. These load/store architectures
avoid the encoding waste of having several operation widths[1], and
redundant encodings for register-to-register operations. However, in
those cases where load-and-op instructions are useful, they need to
encode an intermediate register, twice. In those cases where
read-modify-write instructions are useful, they need to encode an
intermediate register, 4 times, and the memory operand a second time;
but obviously on balance these instruction sets are more compact.
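
The register counts above can be checked on small example sequences.
This sketch uses RISC-V-style three-operand assembly with invented
register choices (t0 as the intermediate register, 0(a0) as the memory
operand) and simply counts how often an operand is written out.

```python
def operands(insn):
    """Split 'mnemonic op1,op2,...' into its list of operands."""
    _mnemonic, _, rest = insn.partition(" ")
    return [op.strip() for op in rest.split(",")]


def count_operand(seq, operand):
    """Count how often an operand appears in an instruction sequence."""
    return sum(operands(insn).count(operand) for insn in seq)


# Equivalent of IA-32 load-and-op "add (%edx),%ecx".
LOAD_OP = ["lw t0,0(a0)", "add a1,a1,t0"]

# Equivalent of IA-32 read-modify-write "add %ecx,(%edx)".
READ_MODIFY_WRITE = ["lw t0,0(a0)", "add t0,t0,a1", "sw t0,0(a0)"]

if __name__ == "__main__":
    print("load-and-op: t0 x", count_operand(LOAD_OP, "t0"))
    print("rmw:         t0 x", count_operand(READ_MODIFY_WRITE, "t0"))
    print("rmw:      0(a0) x", count_operand(READ_MODIFY_WRITE, "0(a0)"))
```
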
[1] RV64G includes 32-bit wide register-register ops that I consider
unnecessary: Usually the top 32 bits of 32-bit operations are not
used, and then one can just use the 64-bit version. In the few cases
where they are used, a 64-bit operation followed by a sign extension
will produce the same result. But maybe the RISC-V architects have
data that shows that the top 32 bits are used more often than I
expect; maybe in C code with int variables that are used for indexing
arrays (in that case we can thank the people who decided to go with
I32LP64 (rather than ILP64) for that).
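
The claim in [1] can be checked for add, sub, and mul with a small
model. This is a sketch, not a model of the full RV64G semantics: op32
stands in for the 32-bit-wide instructions (ADDW, SUBW, MULW), and the
helper names are made up. Right shifts and division are left out on
purpose, because their results depend on the upper bits of the inputs.

```python
import operator
import random

M32 = (1 << 32) - 1
M64 = (1 << 64) - 1


def sext32(x):
    """Sign-extend the low 32 bits of x to a 64-bit register value."""
    x &= M32
    return x | (M64 ^ M32) if x >> 31 else x


def op32(op, a, b):
    """32-bit-wide operation: uses only the low 32 bits of the inputs."""
    return sext32(op(a & M32, b & M32))


def op64_then_sext(op, a, b):
    """64-bit operation followed by a separate sign extension."""
    return sext32(op(a, b) & M64)


def check(trials=1000, seed=0):
    """Compare both variants on random 64-bit register contents."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b = rng.getrandbits(64), rng.getrandbits(64)
        for op in (operator.add, operator.sub, operator.mul):
            if op32(op, a, b) != op64_then_sext(op, a, b):
                return False
    return True


if __name__ == "__main__":
    print(check())
```

The two variants agree because the low 32 bits of a sum, difference,
or product depend only on the low 32 bits of the inputs.
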
- anton
-- 'Anyone trying for "industrial quality" ISA should avoid undefined behavior.' Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>