On 11/27/2024 4:52 AM, Thiago Adams wrote:
On 27/11/2024 06:53, David Brown wrote:
On 26/11/2024 20:42, Thiago Adams wrote:
On 26/11/2024 16:25, Bart wrote:
On 26/11/2024 19:11, Thiago Adams wrote:
On 26/11/2024 15:35, Thiago Adams wrote:
>
>
Then adding:
>
int strcmp();
>
int main() {
    strcmp("a", "b");
}
>
it works in C99 / C11
>
I think in C23 empty parameter list means no args, while in the previous versions (void) means no args.
>
Considering that in previous versions of C we could call a function without its signature, I think the compiler only needs the caller side (of course, I am not considering programmer mistakes).
>
So, I think one extra simplification for small compilers is to ignore function parameters.
>
I don't think so. But you are welcome to look at godbolt.org and see for yourself. Try this for example:
>
>
>
Yes, I realize now that I was wrong. Since function calls use registers, I think the old C model works only when everything is passed on the stack.
>
>
>
No, it should work for other calling conventions too. Passing everything on the stack has not been common practice for many decades, for most processor architectures.
>
What you have to consider here is the "default argument promotions". If a function is defined to take a "double", and you call it using an int expression without using a function prototype, the result is UB (in a real and practical sense, not just hypothetically). It doesn't matter if arguments are passed on the stack or registers, or if you have 32-bit or 64-bit or any other size of cpu (I don't know why Bart thinks that matters). It is still a disaster.
>
If you want to write or generate code that calls a function, you need to know /exactly/ what type the parameters are. And you need to call it with parameters of those types. You can do that by having a function prototype and letting the compiler make the appropriate implicit conversions (assuming they are allowed by the language), or you can manually add any required conversions (such as casts) before the call, or you can rely on the default argument promotions if you know the result will be the correct type.
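As a rough illustration of the three options above (this is not code from the thread), the sketch below uses a prototype for one call and a variadic function for the other two, since variadic arguments go through the same default argument promotions as arguments to an unprototyped function. The names half, sum_promoted and the demo_* helpers are made up for the example.

```c
#include <stdarg.h>

/* Prototype: the compiler knows the parameter is a double. */
double half(double x) { return x / 2.0; }

/* Variadic arguments receive the default argument promotions
   (float -> double, char/short -> int), so every va_arg type
   below must name the promoted type, here double. */
double sum_promoted(int count, ...)
{
    va_list ap;
    double total = 0.0;
    int i;

    va_start(ap, count);
    for (i = 0; i < count; i++)
        total += va_arg(ap, double);
    va_end(ap);
    return total;
}

/* Option 1: the prototype makes the compiler convert the int 3 to 3.0. */
double demo_prototype(void) { return half(3); }

/* Option 2: an explicit cast, because a bare int would not be promoted
   to double and reading it with va_arg(ap, double) would be UB. */
double demo_cast(void) { int n = 3; return sum_promoted(1, (double)n); }

/* Option 3: rely on the promotion; the float f arrives as a double. */
double demo_promotion(void) { float f = 1.5f; return sum_promoted(2, f, 2.5); }
```

Dropping the cast in demo_cast is the kind of mismatch described above: it compiles, but the callee reads an int as if it were a double.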
>
Thanks for the comments. Very useful.
There is - to my knowledge - never a good reason for omitting a function prototype. Implicit function declaration was IMHO one of the biggest design flaws in pre-standard C, and allowing it to continue in C90, after prototypes were added to the language, was a serious mistake. Compilers should complain loudly if you try to call a function without a prototype declaration. (I believe Bart's compiler treats it as a fatal error - it is a non-conformity of which I approve.) And in C23 - some thirty years late - the standard finally requires proper prototypes.
>
I am not at all sure why you are generating C90 code here. I don't think anyone much cares about using strict C90 other than a couple of people in this newsgroup.
The objective is to leave all complications to a front end, and emit simpler C89 code that is used by the backend.
The advantage, compared with a custom IL, is that the code will work with any C compiler.
Another option could be:
Front-end processes whatever language, producing an IR;
IR happens to support C as one of the targets.
If you want to produce C as output, then the IL/IR stage can be hidden within the compiler (though, less useful in my case, as C was the input language, and the target was to produce machine code).
The choice of IR design is an open issue.
For backend code generation, it is usually better to have some sort of Three-Address-Code, with SSA (Static Single Assignment) being a popular variant (my compiler uses a sort of "pseudo SSA" in the backend, *1).
*1: There is a split between "variable identity" and "sequence number", where all assignments to a given named variable (or temporary) have the same identity, but a different sequence number, where the sequence number distinguishes different "versions" of the variable (and phi operations are semi-implicit).
Conceptually, a phi operation would exist every time a variable is referenced within a basic block without first having been assigned within that basic block, with a list of all sequence numbers for the variable which "flow into" this basic block.
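A minimal sketch of the identity / sequence-number split described in the footnote, under the assumption that each variable simply carries a counter that is bumped on every assignment. The type and function names (VarRef, VersionTable, var_assign, var_use) are hypothetical and are not taken from BGBCC.

```c
/* A reference to one "version" of a variable. */
typedef struct {
    unsigned id;    /* variable identity: same for every assignment */
    unsigned seq;   /* sequence number: distinguishes versions      */
} VarRef;

/* Current sequence number per variable; the fixed size is only
   to keep the sketch short. */
typedef struct {
    unsigned cur_seq[64];
} VersionTable;

/* An assignment creates a new version of the same variable. */
VarRef var_assign(VersionTable *t, unsigned id)
{
    VarRef r;
    r.id = id;
    r.seq = ++t->cur_seq[id];
    return r;
}

/* A use refers to the most recent version. */
VarRef var_use(const VersionTable *t, unsigned id)
{
    VarRef r;
    r.id = id;
    r.seq = t->cur_seq[id];
    return r;
}

int same_variable(VarRef a, VarRef b) { return a.id == b.id; }
int same_version(VarRef a, VarRef b)  { return a.id == b.id && a.seq == b.seq; }
```

Since every version keeps the identity of its variable, a use at the top of a basic block can be resolved by looking up the incoming sequence numbers for that identity, which is roughly what makes the phi semi-implicit.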
In BGBCC, I ended up with a Stack-IL for the main IL stage:
  I did this first;
    So, inertia.
  Easier to produce from a language frontend;
  Easier to serialize and reload;
    SSA IRs are generally more complicated to work with here.
  Fairly easy to convert into SSA form on decoding;
    Backend works using 3AC/SSA.
...
Unlike free-form stack-languages (like Forth), generally in such a stack IR, the stack is always empty when branching or landing on a label (the .NET IL has a similar restriction). The stack doesn't actually push "values" so much as "the identity of the variable in question".
Stack manipulation operations like swapping stack items are essentially free.
Though, operations like DUP have two variants:
One duplicates the identity of the variable on the stack;
The other pushes a temporary holding a copy of the value.
Typically, stack operations do not encode type, as type information is carried along with the stack items (when converted to 3AC, each 3AC operator does have an associated type though).
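To make the "stack of variable identities" idea a bit more concrete, here is a small sketch of decoding stack operations into 3AC. It is only an illustration under my own assumptions: the opcode names, the Tac record layout, and the il_* helpers are invented here and do not reflect the actual BGBCC format.

```c
/* Hypothetical 3AC opcodes and record layout. */
enum { TAC_MOV, TAC_ADD };
typedef struct { int op, dst, a, b; } Tac;

typedef struct {
    int stack[32]; int sp;      /* operand stack: variable ids, not values */
    Tac out[32];   int n_out;   /* 3AC emitted so far */
    int next_temp;              /* id of the next fresh temporary */
} Conv;

/* No bounds checking anywhere, to keep the sketch short. */
void conv_init(Conv *c, int first_temp)
{
    c->sp = 0;
    c->n_out = 0;
    c->next_temp = first_temp;
}

void tac_emit(Conv *c, int op, int dst, int a, int b)
{
    Tac t;
    t.op = op; t.dst = dst; t.a = a; t.b = b;
    c->out[c->n_out++] = t;
}

/* LOAD pushes the identity of the variable; no 3AC is emitted. */
void il_load(Conv *c, int var) { c->stack[c->sp++] = var; }

/* SWAP only reorders identities, so it costs nothing in the output. */
void il_swap(Conv *c)
{
    int t = c->stack[c->sp - 1];
    c->stack[c->sp - 1] = c->stack[c->sp - 2];
    c->stack[c->sp - 2] = t;
}

/* DUP, variant 1: duplicate the identity on the stack. */
void il_dup_ref(Conv *c)
{
    c->stack[c->sp] = c->stack[c->sp - 1];
    c->sp++;
}

/* DUP, variant 2: push a temporary holding a copy of the value. */
void il_dup_copy(Conv *c)
{
    int t = c->next_temp++;
    tac_emit(c, TAC_MOV, t, c->stack[c->sp - 1], -1);
    c->stack[c->sp++] = t;
}

/* ADD pops two identities and pushes a temporary for the result. */
void il_add(Conv *c)
{
    int b = c->stack[--c->sp];
    int a = c->stack[--c->sp];
    int t = c->next_temp++;
    tac_emit(c, TAC_ADD, t, a, b);
    c->stack[c->sp++] = t;
}

/* STORE pops an identity and emits a move into the named variable. */
void il_store(Conv *c, int var)
{
    tac_emit(c, TAC_MOV, var, c->stack[--c->sp], -1);
}
```

With variables a=0, b=1, c=2 and temporaries starting at 100, the sequence LOAD a; LOAD b; ADD; STORE c yields two 3AC operations: ADD t100, a, b and MOV c, t100. A real decoder would likely fold the move away and attach types to each operation.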
One partial downside (vs a fully 3AC IL), is that the stack may impose extra awkwardness in some cases, and optimizations like "eliminating common sub-expressions" are harder to express in the front-end as then it needs to manage temporary values to some extent anyways (partially diminishing the simplicity advantage of a Stack-IL for the front-end).
Still, at the IL stage it is better to use if-goto and labels as the primary abstraction, as structured constructs (like for or while loops) would make things much more of a pain.
...
There are a few differences, from JVM or .NET:
Function calls also involve pushing a "mark" onto the stack, which expresses the total argument count;
Arguments are evaluated/pushed in right-to-left order (partly for consistency with MS-DOS era compilers).
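A possible way to picture the mark: the call operation pops items until it reaches the mark, and the number of items popped is the argument count. The sketch below assumes a sentinel value for the mark and non-negative variable ids; IL_MARK, IlStack and the il_* functions are names made up for this example.

```c
#define IL_MARK (-1)    /* sentinel; assumes variable ids are >= 0 */

typedef struct { int items[32]; int sp; } IlStack;

void il_push(IlStack *s, int v) { s->items[s->sp++] = v; }
void il_mark(IlStack *s)        { il_push(s, IL_MARK); }

/* Pops arguments down to the nearest mark, then drops the mark.
   Because arguments were pushed right-to-left, the first item
   popped is Arg0. Returns the argument count. */
int il_collect_args(IlStack *s, int *args, int max_args)
{
    int n = 0;

    while (s->sp > 0 && s->items[s->sp - 1] != IL_MARK && n < max_args)
        args[n++] = s->items[--s->sp];
    if (s->sp > 0 && s->items[s->sp - 1] == IL_MARK)
        s->sp--;
    return n;
}
```

For a call like func(a, b), the front end would emit: mark, push b, push a, call. The call then recovers the arguments in order (a, b) and knows there are two of them without the count being encoded in the call operation itself.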
My existing format is also kind of crufty, but inertia has kept it as it is (and, in my case, is also mostly used for static libraries).
...
If designing an IR, it could possibly also make sense to make a simplified BASIC-like language:
c = a + b
c = func ( a , b )
...
While still leaving things like type propagation and sequence numbering as internal (regenerated when the IR is loaded), in contrast to something like LLVM IR, where making these explicit mildly simplifies the IR parser at the cost of making everything else more complicated.
Also, ideally, the binary representation of the IR should be defined in terms of a representation of the constructs in the IR, and *not* some scheme for binary serializing all of the structures that make up the IR internally.
Maybe also resist the temptation to be overly concerned with space used for the IR. Can probably use bytes and VLNs (variable-length numbers), but avoid making the encoding overly complicated.
Say, if c=a+b takes 4 or 5 bytes to encode when it could have been bit-twiddled down to 3 bytes, it is maybe better to stay with 4 or 5:
Opcode C A B
Opcode SubOp C A B
ADD C A B
SUB C A B
...
CALL C N Arg0 Arg1 .. ArgN-1
LABEL LBLID
GOTO LBLID
BEQ LBLID A B //if(a==b)goto LBLID;
...
With each using a VLN scheme, say:
00000000..0000007F: 1 byte
00000080..00003FFF: 2 bytes
00004000..001FFFFF: 3 bytes
...
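One way the ranges above could map onto bytes is a prefix-bit layout (0xxxxxxx, 10xxxxxx + 1 byte, 110xxxxx + 2 bytes). That layout is my assumption; the post only gives the value ranges. The sketch covers just the three listed sizes, and vln_encode, vln_decode and encode_op3 are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes v to out and returns the number of bytes used,
   or 0 if v is outside the three ranges handled here. */
size_t vln_encode(uint32_t v, uint8_t *out)
{
    if (v <= 0x7F) {
        out[0] = (uint8_t)v;
        return 1;
    }
    if (v <= 0x3FFF) {
        out[0] = (uint8_t)(0x80 | (v >> 8));
        out[1] = (uint8_t)(v & 0xFF);
        return 2;
    }
    if (v <= 0x1FFFFF) {
        out[0] = (uint8_t)(0xC0 | (v >> 16));
        out[1] = (uint8_t)((v >> 8) & 0xFF);
        out[2] = (uint8_t)(v & 0xFF);
        return 3;
    }
    return 0;
}

/* Reads one value from in and returns the number of bytes consumed,
   or 0 for a prefix this sketch does not handle. */
size_t vln_decode(const uint8_t *in, uint32_t *v)
{
    if ((in[0] & 0x80) == 0x00) {
        *v = in[0];
        return 1;
    }
    if ((in[0] & 0xC0) == 0x80) {
        *v = ((uint32_t)(in[0] & 0x3F) << 8) | in[1];
        return 2;
    }
    if ((in[0] & 0xE0) == 0xC0) {
        *v = ((uint32_t)(in[0] & 0x1F) << 16) | ((uint32_t)in[1] << 8) | in[2];
        return 3;
    }
    return 0;
}

/* Encodes "Opcode C A B" as four consecutive VLNs. */
size_t encode_op3(uint32_t op, uint32_t c, uint32_t a, uint32_t b, uint8_t *out)
{
    uint32_t fields[4];
    size_t n = 0, k;
    int i;

    fields[0] = op; fields[1] = c; fields[2] = a; fields[3] = b;
    for (i = 0; i < 4; i++) {
        k = vln_encode(fields[i], out + n);
        if (k == 0)
            return 0;
        n += k;
    }
    return n;
}
```

With small opcode and variable indices, c=a+b comes out at 4 bytes; it grows to 5 only when one field exceeds 0x7F, which matches the "4 or 5 bytes" estimate above.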
Though, since opcodes are usually small numbers, one can maybe indulge:
0000..00EF: 1 byte
00F0..0FFF: 2 bytes
1000..7FFF: 3 bytes
This increases the range of single-byte opcodes at the expense of larger-valued numbers (where 2-byte encodings that overlap the 1-byte range are reused to encode the 3-byte range; and/or one does not bother, reasoning that there are not going to be that many opcodes).
And, for signed values, say:
uvl=(val<<1)^(val>>63); // fold sign into LSB
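The one-liner above relies on left-shifting a possibly negative value and on an arithmetic right shift, which strict C does not guarantee. A sketch of the same fold done on unsigned values, together with its inverse, might look like this (zigzag_encode and zigzag_decode are names chosen for the example):

```c
#include <stdint.h>

/* Fold the sign into the LSB: 0, -1, 1, -2, ... map to 0, 1, 2, 3, ... */
uint64_t zigzag_encode(int64_t val)
{
    uint64_t sign_mask = (val < 0) ? UINT64_MAX : 0;
    return ((uint64_t)val << 1) ^ sign_mask;
}

/* Inverse of zigzag_encode. The final conversion back to int64_t
   assumes the usual two's complement behaviour. */
int64_t zigzag_decode(uint64_t uvl)
{
    uint64_t sign_mask = (uvl & 1) ? UINT64_MAX : 0;
    return (int64_t)((uvl >> 1) ^ sign_mask);
}
```

Small magnitudes of either sign then stay small as unsigned numbers, so they still fit the short VLN forms.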
...
Variables are encoded as an index into an array of variables, then, say:
VAR  A T     //creates local variable A with a type given in T
VARI A T V   //creates and initializes a variable
LITI A T V   //declares a local constant or literal value
VARG A T GV  //declares a reference to a Global Variable (Index|Name)
ARG  A T V   //declares a function argument (V = index in arg list)
V is the value given as a VLN
How V is interpreted depends on the type.
GV Global Variable
T, Type:
Types and string literals could be given as an offset into a strings table. Within a function, the variables list would be declared before the first non-variable in the function.
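A loose sketch of how a loader might hold these declarations in memory, assuming A is an index into a per-function array and T is a byte offset into a strings table. The struct layout, the DECL_* names, the var_declare helper and the one-letter type strings in the usage note are all invented for illustration.

```c
#include <stdint.h>

#define MAX_VARS 64     /* fixed capacity, for the sketch only */

enum DeclKind { DECL_VAR, DECL_VARI, DECL_LITI, DECL_VARG, DECL_ARG };

typedef struct {
    enum DeclKind kind;
    uint32_t type_ofs;  /* T: offset of the type in the strings table   */
    int64_t  value;     /* V or GV; meaning depends on kind and on type */
} VarDecl;

typedef struct {
    VarDecl  vars[MAX_VARS];
    uint32_t count;     /* one past the highest index declared so far */
} VarTable;

/* Records declaration A; later operands refer to it by index.
   Returns 0 on success, -1 if the index is out of range. */
int var_declare(VarTable *t, uint32_t a, enum DeclKind kind,
                uint32_t type_ofs, int64_t value)
{
    if (a >= MAX_VARS)
        return -1;
    t->vars[a].kind = kind;
    t->vars[a].type_ofs = type_ofs;
    t->vars[a].value = value;
    if (a + 1 > t->count)
        t->count = a + 1;
    return 0;
}

/* Looks up the type string for a declared variable. */
const char *var_type(const VarTable *t, uint32_t a, const char *strtab)
{
    return strtab + t->vars[a].type_ofs;
}
```

For example, with a strings table holding "i" at offset 0 and "d" at offset 2, a function taking one int argument and using one initialized double local would declare ARG 0 with T=0, V=0 and VARI 1 with T=2 before its first non-variable operation.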
...