On 7/6/2024 5:41 PM, Ben Bacarisse wrote:
BGB <cr88192@gmail.com> writes:
On 7/5/2024 5:40 PM, Ben Bacarisse wrote:
BGB <cr88192@gmail.com> writes:
On 7/5/2024 6:20 AM, Ben Bacarisse wrote:
BGB <cr88192@gmail.com> writes:
While eliminating structs could also simplify things; structs also tend to
be a lot more useful.
Indeed. And I'd have to use them for this!
Errm, the strategy I would assume is, as noted:
int a[4][4];
...
l=a[j][k];
Becomes:
int a[16];
...
l=a[j*4+k];
That's what you want to force me to write, but I can use an array of
arrays despite your arbitrary ban on them by simply putting the array in
a struct.
...
In most contexts, I don't really see how a struct is preferable to a
multiply, but either way...
And I can't see how an array of arrays is harder for your compiler than
an array of structs. C's indexing requires the compiler to know that
size of the items pointed to.
I suspect that there is something amiss with your design if you are
considering this limiting in order to simplify the compiler. A simple
compiler should not care what kind of thing p points to in
p[i]
only what size of object p points to.
When I designed the compiler code, the initial approach for the internal type layout was to bit-pack it into 32 bits, say (a lot of this is from memory, so it may be wrong):
Basic1
(31:28): Layout of Type (0=Basic)
(27:16): Array Size
(15:12): Pointer Level Count
(11: 0): Base Type
Basic2
(31:28): Layout of Type (1=Basic2)
(27: 8): Array Size
( 7: 6): Pointer Level Count
( 5: 0): Base Type
Basic3
(31:28): Layout of Type (2=Basic3)
(27:24): Array Size
(23:20): Pointer Level Count
(19: 0): Base Type
Overflow
(31:28): Layout of Type (3=Overflow)
(27:24): MBZ
(23: 0): Index into Type-Overflow Table
And, a few other cases...
Basic1 was the default, able to express arrays of 0..4095 elements, with 0..7 levels of pointer indirection, and 0..4095 for the base type.
Where, 0=T, 1=T*, 2=T**, ..., 7=T*******
8=T[], 9=T[][], A=T*[], B=T*[*], C=&T, ...
Note that at present, there is no way to express more than 7 levels of pointer indirection, but this issue hasn't come up in practice.
Basic2 is for big arrays of a primitive type, 0..3 pointer levels. May only encode low-numbered primitive types.
Basic3 is the opposite, able to express a wider range of types, but only small arrays.
There is another variant of Basic1 that splits the Array Size field in half, with a smaller array limit, but able to encode const/volatile/restrict/etc (but only in certain combinations).
Overflow would be used if the type couldn't fit into one of the above; the type is then expressed in a table. It is avoided when possible, as overflow-table entries are comparatively expensive.
Type numbering space:
      0 ..      63: Primitive Types, Higher priority
     64 ..     255: Primitive Types, Lower priority
    256 ..    4095: Complex Types, Index into Literals Table
   4096 .. 1048575: Complex Types, Index into Literals Table
Small numbered base types were higher priority:
00=Int, 01=Long(64bit), 02=Float, 03=Double,
04=Ptr(void*), 05=Void, 06=Struct(Abstract), 07=NativeLong
08=SByte, 09=UByte, 0A=Short, 0B=UShort,
0C=UInt, 0D=ULong, 0E=UNativeLong, 0F=ImplicitInt
Followed by, say:
10=Int128, 11=UInt128, 12=Float128/LongDouble, 13=Float16,
...
Where, Type Number 256 would map to index 0 in the Literal Table.
An index into the literals table will generally be used to encode a Struct or Function Pointer or similar. This table will hold a structure describing the fields of a struct, or the arguments and return value of a function pointer (in my BS2 language, it may also define class members, a superclass, implemented interfaces, ...).
It could also be used to encode another type, which was needed for things like multidimensional arrays and some other complex types. But, this seemed like an ugly hack... (And was at odds with how I imagined types working, but seemed like a technical necessity).
These would often be packed inside of a 64-bit register/variable descriptor.
Local Variables:
(63:56): Descriptor Type
(55:24): Variable Type
(23:12): Sequence Number
(11: 0): Identity Number
Global Variables:
(63:56): Descriptor Type
(55:24): Variable Type
(23: 0): Index into Global Table
Integer Literal:
(63:56): Descriptor Type
(55:32): Compact Type
(31: 0): Value
String Literal:
(63:56): Descriptor Type
(55:32): Compact Type
(31: 0): Offset into String Table
There were various other types, representing larger integer and floating point types:
Long and Double literals, representing the value as 56 bits
Low 8 bits cut off for Double
An index into a table of raw 64-bit values (if it can't be expressed directly as one of the other options).
Values for 128-bit types were expressed as an index pair into the table of 64-bit values:
(63:56): Descriptor Type
(55:48): Primitive Type
(47:24): Index into Value Table (High 64 bits)
(23: 0): Index into Value Table (Low 64 bits)
One downside, as-is: if a given variable is assigned more than 4096 times in a given function, it can no longer be given a unique sequence number. Though uncommon, this is not entirely implausible in sufficiently large functions, and there isn't currently any good way to deal with it (apart from raising a compiler error).
Taking bits away from the identity number isn't good either, as functions with 1000+ local variables aren't entirely implausible (though, thus far, I haven't seen any with much more than around 400 local variables, but still...).
Can note though, that generally, there seems to be an upper limit of 18 to 23 for the maximum number of function arguments in the code I have tested.
Sequence numbers don't currently apply to global variables: only one instance of a global variable can exist at a time, and global variables are always synchronized at the end of a basic block.
Thus far, I have not dealt with any programs that push the current hard limit of 16M global declarations (at present, ~30-60k seems more typical).
Where, there would be multiple such fields in each 3AC op, say (copy pasting):
struct BGBCC_CCXL_VirtOp_s {
  byte opn;           //opcode number
  byte opr;           //operator
  byte prd;           //predication mode
  byte immty;         //immediate type
  byte tgt_mult;      //branch target multiplier
  byte llvl;          //loop level
  ccxl_type type;     //destination type of operation
  ccxl_type stype;    //source type of operation
  ccxl_register dst;  //destination
  ccxl_register srca; //first source operand
  ccxl_register srcb; //second source operand
  ccxl_register srcc; //third source operand
  ccxl_register srcd; //fourth source operand
  ccxl_value imm;     //operation immediate (union)
};
These operations are held in arrays, with spans within this array used to define the various basic-blocks within a function.
The code for dealing with a lot of this got rather hairy...
But, as noted, the 3AC IR only exists in memory.
In the IR, the operations are expressed as a sort of linear bytecode operating on a virtual stack machine; with types expressed as ASCII strings.
Logically, the stack holds "ccxl_register" values, and the number of 3AC ops is typically less than the number of stack-machine operations (those which exist simply to shuffle registers around essentially disappear in the translation process).
Say, for example:
LOAD x
LOAD y
ADD
STORE z
Might turn into a single 3AC operation.
Note that (unlike in a language like Forth or similar) control flow generally may not be used to convey varying sets of items on the stack; in most cases, the stack is empty at any control-flow operation.
Whole language design is still a hypothetical at this point anyways (and my
actual compiler does support multidimensional arrays).
Ah. I think that when you've written most of it, you will see that
ruling out arrays of arrays will have no simplifying effect.
I guess it is possible...
Well, and/or there might just be enough hair that something like this will disappear into the noise.