Re: transpiling to low level C

Subject: Re: transpiling to low level C
From: cr88192 (at) *nospam* gmail.com (BGB)
Newsgroups: comp.lang.c
Date: 18 Dec 2024, 19:50:20
Organization: A noiseless patient Spider
Message-ID: <vjv5hg$2ds8r$1@dont-email.me>
References: 1 2 3 4 5 6 7 8
User-Agent: Mozilla Thunderbird
On 12/18/2024 6:08 AM, bart wrote:
On 17/12/2024 18:51, BGB wrote:
On 12/17/2024 6:04 AM, bart wrote:
 
C can apparently compile to WASM via Clang, so I tried this program:
>
  void F(void) {
     int i=0;
     while (i<10000) ++i;
  }
>
which compiled to 128 lines of WASM (technically, some form of 'WAT', as WASM is a binary format). The 60 lines corresponding to F are shown below, and below that is my own stack IL code.
 I'm not even sure what format that code is in, as WAT is supposed to use S-expressions. The generated code is flat. It differs in other ways from examples of WAT.
 
Dunno there...
It looks like WASM has changed slightly from what I remember when I originally looked at it, so it could be "possible" if it could be made to support separate compilation and similar.

Hmm... It looks like the WASM example is already trying to follow SSA rules, then mapped to a stack IL... Not necessarily the best way to do it IMO.
 I hadn't considered that SSA could be represented in stack form.
 But couldn't each push be converted to an assignment to a fresh variable, and the same with pop?
 As for Phi functions, the only similar thing I encounter (but I could be mistaken) is when there is a choice of paths to yield a value (such as (c ? a : b) in C; my language has several such constructs).
 
I was mostly noting that it appeared that every operation was creating a new variable and only assigning to it once.
I didn't look too much more closely than this, only to note that it was different.
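bart's question (can each push become an assignment to a fresh variable, and each pop a use of one?) has a simple answer for straight-line code. Below is a minimal C sketch using a hypothetical five-opcode stack IL: the converter keeps a compile-time stack of temporary numbers, so each push defines a new temporary exactly once and each pop only names an existing one. Locals such as `i` are still stored to repeatedly, and branches and phi nodes are not handled, so this is SSA only for the temporaries.

```c
/* Hypothetical stack-IL opcodes, for illustration only. */
typedef enum { S_PUSHK, S_PUSHL, S_ADD, S_LT, S_STOREL } StackOp;
typedef struct { StackOp op; int arg; } StackInsn;

/* 3AC record: dst = a OP b. For T_STORE, dst is a local index. */
typedef enum { T_CONST, T_LOAD, T_ADD, T_LT, T_STORE } TacOp;
typedef struct { TacOp op; int dst, a, b; } TacInsn;

/* Converts straight-line stack code to 3AC. Every push defines a fresh
   temporary (written once); every pop takes a temporary number off a
   compile-time stack, so no runtime stack is left.
   Assumes well-formed input (no underflow, depth < 64).
   Returns the number of 3AC instructions written. */
int stack_to_tac(const StackInsn *in, int n, TacInsn *out)
{
    int vstack[64], sp = 0;     /* virtual stack of temp numbers */
    int next_temp = 0, m = 0;
    for (int i = 0; i < n; i++) {
        switch (in[i].op) {
        case S_PUSHK:           /* tN = constant */
            out[m++] = (TacInsn){ T_CONST, next_temp, in[i].arg, 0 };
            vstack[sp++] = next_temp++;
            break;
        case S_PUSHL:           /* tN = local */
            out[m++] = (TacInsn){ T_LOAD, next_temp, in[i].arg, 0 };
            vstack[sp++] = next_temp++;
            break;
        case S_ADD:
        case S_LT: {            /* tN = tA op tB */
            int b = vstack[--sp], a = vstack[--sp];
            TacOp op = (in[i].op == S_ADD) ? T_ADD : T_LT;
            out[m++] = (TacInsn){ op, next_temp, a, b };
            vstack[sp++] = next_temp++;
            break;
        }
        case S_STOREL:          /* local = tA */
            out[m++] = (TacInsn){ T_STORE, in[i].arg, vstack[--sp], 0 };
            break;
        }
    }
    return m;
}
```

For `++i` (local 0), the sequence PUSHL 0, PUSHK 1, ADD, STOREL 0 becomes t0 = i; t1 = 1; t2 = t0 + t1; i = t2.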

With stack code, the result conveniently ends up on top of the stack whichever path is taken, which is a big advantage. Unless you then have to convert that to register code, and need to ensure the values end up in the same register when the control paths join up again.
 
With JVM, the rule was that all paths landing at the same label need to have the same stack depth and same types.
With .NET, the rule was that the stack was always empty at branches; any merging would need to be done using variables.
BGBCC is sorta mixed:
In most cases, it follows the .NET rule;
A special-case exception exists mostly for implementing the ?: operation (which in turn has special stack operations to signal its use).
BEGINU  // start a ?: operator
L0:
...  //one case
SETU
JMP L2
L1:
... //other case
SETU
JMP L2
ENDU
L2:
This is a bit wonky; if I were designing it now, I would likely do it the same as .NET and use temporary variables.
Actually, I might be tempted to use a 3AC IR as well (though probably non-SSA), and would probably design things a bit differently.
In this case, if I did a 3AC IR, I might design a textual syntax along similar lines to BASIC or FORTRAN 77 (albeit probably without the fixed-column formatting or line numbers).
Though, the nominal format for use in the compiler would remain binary.
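The .NET-style alternative, merging the two arms of ?: through a temporary variable so that nothing is left on the stack at the join, can be pictured as plain C with gotos. This is an illustrative lowering written by hand, not BGBCC output; the label names only mirror the listing above.

```c
/* c ? a : b lowered with a temporary: both arms store into t, so the
   evaluation stack is empty when control reaches the join label. */
int select_lowered(int c, int a, int b)
{
    int t;              /* replaces the merged stack slot */
    if (!c) goto L1;
    t = a;              /* one case (L0 in the listing) */
    goto L2;
L1:
    t = b;              /* other case */
L2:
    return t;           /* join point: value comes from t */
}
```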

>
But, yeah, in BGBCC I am also using a stack-based IL (RIL), which follows rules closer to those of .NET CIL (in that stack items carry types, and the stack is generally fully emptied on branch).
>
>
In my IL, labels are identified with a LABEL opcode (with an immediate), and things like branches work by having the branch target and label having the same immediate (label ID).
 So, you jump to label L123, and the label looks like:
    L123:
 
Yeah, in textual form.
Though, the label is internally represented as, say:
   LABEL 123
IIRC, usually numbering starts over from 0 for each function, though in the backend IR all labels get a unique number within a 24-bit numbering space.
The labels are then split into several categories:
Global labels, used to identify functions/variables, with an associated name;
IL labels, which were mapped over from the RIL bytecode;
Temporary labels, which exist solely in the backend;
Line numbers, not true labels, mostly exist to convey line-number info (associated with a file-name and line number);
Special/Architectural, used as placeholders for things like CPU registers (for variable load/store).
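Since a label is an ordinary IL instruction whose immediate matches the branch's immediate, turning label IDs into positions takes two passes. A minimal sketch with hypothetical opcode names, assuming per-function label IDs in 0..max_label: the first pass records where each LABEL n sits, and the second rewrites each branch's immediate into that instruction index. The LABEL instructions themselves stay in the stream.

```c
#include <stdlib.h>

/* Hypothetical IL opcodes for this sketch. */
typedef enum { OP_NOP, OP_LABEL, OP_JMP, OP_JCC } IlOp;
typedef struct { IlOp op; int imm; } IlInsn;

/* Returns 0 on success, -1 on allocation failure or if a branch names
   a label that is never defined. Label IDs must be <= max_label. */
int resolve_labels(IlInsn *code, int n, int max_label)
{
    int *where = malloc((size_t)(max_label + 1) * sizeof *where);
    if (!where) return -1;
    for (int i = 0; i <= max_label; i++) where[i] = -1;

    /* Pass 1: position of each "LABEL n". */
    for (int i = 0; i < n; i++)
        if (code[i].op == OP_LABEL) where[code[i].imm] = i;

    /* Pass 2: branch immediates become instruction indices. */
    for (int i = 0; i < n; i++) {
        if (code[i].op == OP_JMP || code[i].op == OP_JCC) {
            int target = where[code[i].imm];
            if (target < 0) { free(where); return -1; }
            code[i].imm = target;
        }
    }
    free(where);
    return 0;
}
```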

I think that is pretty standard! But it sounds like you use a very tight encoding for bytecode, while mine uses a 32-byte descriptor for each IL instruction.
 (One quibble with labels is whether a label definition occupies an actual IL instruction. With my IL used as a backend for static languages, it does. And there can be clusters of labels at the same spot.
 With dynamic bytecode designed for interpretation, it doesn't. It uses a different structure. This means labels don't need to be 'executed' when encountered.)
 
In my interpreters, a label always occupies a bytecode operation.
However, apart from my very early interpreters, typically the stack IL is not used directly.
So, a personal timeline was like:
   2003/2004: BGBScript came into existence
     First version used DOM and directly walked the DOM tree.
     Used a GC, generated lots of garbage objects;
     Syntax was based on JavaScript with some wonk;
     Was horridly slow.
   2006:
     BGBScript VM (BS-VM) was rewritten to S-Expressions internally;
       Dropped some of the original wonk, moving to a cleaner JS syntax;
     Went to a bytecode interpreter.
   2007:
     BGBCC was written using the frontend from the 2003 VM as a base;
     The IL design was based on 2006 BS-VM;
     Replaced the original DOM with a custom stand-in;
       Used parts of the 2006 VM as well.
   2009:
     The BS-VM was modified to turn the stack IL into 3AC and run this;
     Also had a JIT and similar by this point;
     Using 3AC and JIT made things significantly faster;
     Also tended to leak a lot less garbage,
       operating mostly at "steady state".
     Syntactically, it had become more like ActionScript3 or HaXE.
   2013: Created BGBScript2 (BS2)
     This mostly resembled a Java/C#/AS3 hybrid;
       Eliminated the GC in favor of primarily static + manual MM.
   2015/2016: Created the BGBTech2 3D engine
     Partly written in a mix of C and BGBScript2
     Was my biggest project to use BS2
Then:
   2017: Started on my BJX1 project
     Revived BGBCC, used it as the compiler.
   2019: Rebooted the project to BJX2.
     BJX1 quickly turned into a huge mess
       which was non-viable to implement in an FPGA.
   Since then, the BJX2 project has continued.
Some stuff following the design of the BS2 VM was back-ported onto BGBCC, but in many ways, BGBCC has a lot more cruft.
In the BS2 VM, the image format is a TLV container.
   There is a string table, data area for functions/etc;
   Index tables;
   ...
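The post doesn't give the TLV layout, so the following is only a sketch under an assumed one: each lump is a 4-byte ASCII tag, a 32-bit little-endian length, then the payload. `tlv_find` (a hypothetical name) walks the lumps to locate one section, say a string table, by tag, with bounds checks so that a truncated image is rejected rather than over-read.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reads a 32-bit little-endian value (assumed byte order). */
static uint32_t rd32le(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Assumed lump layout: tag[4], length[4], payload[length].
   Returns a pointer to the payload of the first lump whose tag matches
   and stores its length in *len_out, or returns NULL. */
const uint8_t *tlv_find(const uint8_t *buf, size_t size,
                        const char tag[4], uint32_t *len_out)
{
    size_t pos = 0;
    while (size - pos >= 8) {
        uint32_t len = rd32le(buf + pos + 4);
        if (len > size - pos - 8) return NULL;   /* truncated lump */
        if (memcmp(buf + pos, tag, 4) == 0) {
            *len_out = len;
            return buf + pos + 8;
        }
        pos += 8 + (size_t)len;                  /* skip to next lump */
    }
    return NULL;
}
```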
Generally, functions could be loaded and converted to 3AC on demand.
The IL in the BS2 VM was not a pure stack machine, but more like:
   OP with 2 stack args, stack dest (common with BGBCC)
   OP with 2 stack args, local dest (common with BGBCC)
   OP with 2 local args, stack dest
   OP with 2 local args, local dest (like in 3AC)
   OP with local and immediate, stack dest
   OP with local and immediate, local dest
   OP with local and stack, stack dest
   OP with local and stack, local dest
This was more complicated, but reduced the number of IL operations. Internally, it all converted to 3AC for the backend interpreter.
The incentive to do this for BGBCC was smaller, as folding the local-variable or constant loads into the operator is less immediately beneficial to a compiler, but it does make the bytecode loader more complicated. Folding the destination register into the bytecode ops is still relevant in many cases, as it is comparatively harder to fold the destination store into the 3AC op than to fold a source load.
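A sketch of how such folded forms can expand into 3AC, under assumed names and field layout (the BS2 VM encoding is not given): stack operands come from a compile-time stack of temporaries, locals and immediates are used directly, and a stack destination becomes a fresh temporary pushed back onto that stack. For the "local and stack" forms, the local is assumed to be the left operand.

```c
/* Operand sources and destination of a folded IL op (hypothetical). */
typedef enum { SRC_STACK_STACK, SRC_LOCAL_LOCAL,
               SRC_LOCAL_IMM, SRC_LOCAL_STACK } SrcForm;
typedef enum { DST_STACK, DST_LOCAL } DstForm;

typedef struct {
    int op;        /* operator code (ADD, SUB, ...), opaque here */
    SrcForm src;
    DstForm dst;
    int x, y;      /* local index / immediate, depending on src */
    int d;         /* destination local when dst == DST_LOCAL */
} FoldedOp;

/* 3AC operand: a local, a compiler temporary, or an immediate. */
typedef enum { K_LOCAL, K_TEMP, K_IMM } Kind;
typedef struct { Kind k; int v; } Operand;
typedef struct { int op; Operand dst, a, b; } Tac;

/* Compile-time stand-in for the runtime stack (no overflow checks). */
typedef struct { Operand slot[64]; int sp; int next_temp; } VStack;

static Operand vs_pop(VStack *vs) { return vs->slot[--vs->sp]; }

/* Expands one folded op into exactly one 3AC instruction. */
Tac expand(const FoldedOp *f, VStack *vs)
{
    Tac t = { f->op };
    switch (f->src) {
    case SRC_STACK_STACK:
        t.b = vs_pop(vs);               /* right operand was pushed last */
        t.a = vs_pop(vs);
        break;
    case SRC_LOCAL_LOCAL:
        t.a = (Operand){ K_LOCAL, f->x };
        t.b = (Operand){ K_LOCAL, f->y };
        break;
    case SRC_LOCAL_IMM:
        t.a = (Operand){ K_LOCAL, f->x };
        t.b = (Operand){ K_IMM, f->y };
        break;
    case SRC_LOCAL_STACK:
        t.a = (Operand){ K_LOCAL, f->x };
        t.b = vs_pop(vs);
        break;
    }
    if (f->dst == DST_LOCAL) {
        t.dst = (Operand){ K_LOCAL, f->d };
    } else {
        t.dst = (Operand){ K_TEMP, vs->next_temp++ };
        vs->slot[vs->sp++] = t.dst;     /* result stays "on the stack" */
    }
    return t;
}
```

For example, a local-and-immediate op with a stack destination followed by a local-and-stack op with a local destination yields t0 = local0 op 1, then local3 = local2 op t0.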
Generally, bytecode ops and operands were encoded with VLNs (variable length numbers).
Generally (numeric VLN):
   00..7F: 0..127
   80..BF XX: 128..16383
   C0..DF XX XX: 16384..2M
   ...
These values were encoded in MSB first order, and could directly represent values up to 64 bits (in both the BS2VM and BGBCC, 128-bit values tend to be represented as pairs of 64-bit values).
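A minimal C sketch of the numeric VLN scheme as listed, assuming the 2-byte form uses lead bytes 80..BF (00..7F is already the 1-byte form) and that the payload bits follow the lead byte MSB-first with no bias. The longer forms are elided in the post, so this sketch rejects values above 2M rather than guessing their layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Encodes v MSB-first:
     0xxxxxxx                 0..127
     10xxxxxx xxxxxxxx        128..16383   (14 bits)
     110xxxxx xxxxxxxx x8     16384..2M    (21 bits)
   Returns bytes written, or 0 if v needs a longer (unsketched) form. */
size_t vln_encode(uint64_t v, uint8_t *out)
{
    if (v < 0x80) { out[0] = (uint8_t)v; return 1; }
    if (v < 0x4000) {
        out[0] = (uint8_t)(0x80 | (v >> 8));
        out[1] = (uint8_t)v;
        return 2;
    }
    if (v < 0x200000) {
        out[0] = (uint8_t)(0xC0 | (v >> 16));
        out[1] = (uint8_t)(v >> 8);
        out[2] = (uint8_t)v;
        return 3;
    }
    return 0;
}

/* Decodes one value; returns bytes consumed, or 0 for a lead byte
   belonging to a longer form this sketch does not cover. */
size_t vln_decode(const uint8_t *in, uint64_t *v)
{
    uint8_t b = in[0];
    if (b < 0x80) { *v = b; return 1; }
    if (b < 0xC0) { *v = (uint64_t)(b & 0x3F) << 8 | in[1]; return 2; }
    if (b < 0xE0) {
        *v = (uint64_t)(b & 0x1F) << 16 | (uint64_t)in[1] << 8 | in[2];
        return 3;
    }
    return 0;
}
```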
For signed integer values, the sign was folded into the LSB.
   Floating point values were represented as a base/exponent VLN pair.
   Basically, an integer value scaled by a power-of-2 exponent.
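The two signed conversions can be sketched as follows. "Sign folded into the LSB" is assumed here to mean zigzag encoding (a common way to do it, not confirmed as the exact BS2VM rule). The base/exponent split reads the IEEE 754 double fields directly so that `x == base * 2^exp` holds exactly, and strips trailing zero bits so typical constants get small bases. The base and exponent would each then be sign-folded and written with the unsigned VLN coder; infinities and NaNs are not handled.

```c
#include <stdint.h>
#include <string.h>

/* Zigzag: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4 (sign ends up in the LSB). */
uint64_t fold_sign(int64_t v)
{
    return ((uint64_t)v << 1) ^ (v < 0 ? UINT64_MAX : 0);
}

int64_t unfold_sign(uint64_t u)
{
    return (int64_t)(u >> 1) ^ -(int64_t)(u & 1);
}

/* Splits a finite double into base * 2^exp, exactly. */
void float_split(double x, int64_t *base, int *exp)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    int biased = (int)(bits >> 52 & 0x7FF);
    int64_t mant = (int64_t)(bits & 0xFFFFFFFFFFFFFULL);
    int e;
    if (biased == 0) {
        e = -1074;                      /* zero or subnormal */
    } else {
        mant |= (int64_t)1 << 52;       /* implicit leading 1 */
        e = biased - 1075;
    }
    if (mant == 0) { *base = 0; *exp = 0; return; }
    while ((mant & 1) == 0) { mant >>= 1; e++; }   /* shorten base */
    *base = (bits >> 63) ? -mant : mant;
    *exp = e;
}

/* Rebuilds the double; each step is exact if the result is representable. */
double float_join(int64_t base, int exp)
{
    double x = (double)base;            /* |base| < 2^53, so exact */
    for (; exp > 0; exp--) x *= 2.0;
    for (; exp < 0; exp++) x *= 0.5;
    return x;
}
```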
Opcodes were different, IIRC:
   00..DF: Single Byte
   E0..EF: Two Byte (224..4095)
   F0..F7: Three Byte
   ...
But, generally, only the 1- and 2-byte cases were used.
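A sketch of the opcode numbering, assuming the 2-byte form is `(lead & 0x0F) << 8 | next` (which gives the stated 224..4095 range) and, by analogy only, that the 3-byte form carries 3 bits from an F0..F7 lead byte plus two more bytes. Only the 1- and 2-byte forms are written, since those were the ones used in practice.

```c
#include <stddef.h>
#include <stdint.h>

/* Reads one opcode number; returns bytes consumed, or 0 if the lead
   byte belongs to an elided longer form. */
size_t read_opcode(const uint8_t *p, uint32_t *op)
{
    if (p[0] < 0xE0) { *op = p[0]; return 1; }
    if (p[0] < 0xF0) { *op = (uint32_t)(p[0] & 0x0F) << 8 | p[1]; return 2; }
    if (p[0] < 0xF8) {   /* assumed 3-byte layout */
        *op = (uint32_t)(p[0] & 0x07) << 16 | (uint32_t)p[1] << 8 | p[2];
        return 3;
    }
    return 0;
}

/* Writes an opcode in the shortest of the two forms in actual use. */
size_t write_opcode(uint32_t op, uint8_t *p)
{
    if (op < 0xE0) { p[0] = (uint8_t)op; return 1; }
    if (op < 0x1000) {
        p[0] = (uint8_t)(0xE0 | op >> 8);
        p[1] = (uint8_t)op;
        return 2;
    }
    return 0;
}
```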
IIRC, I did not define a textual notation for the BS2VM's ASM.
Local variables, labels, etc, were all identified as numeric indices.
   Typically a single byte.
Like JVM, and unlike BGBCC, in the BS2VM, all the variables (including arguments) were held in an array of local variables (BGBCC has locals, arguments, and temporaries, as 3 separate spaces).
IIRC, BS2VM had still used variable type-tagging (like BGBCC and .NET), rather than the untyped variables with typed operators scheme (what JVM had used).
But typed operators make more sense if you intend to interpret the stack bytecode directly, which was generally not done in my VMs (except in very early versions). Otherwise, implicitly typed operators probably make more sense.
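With type-tagged stack items, an untyped operator can be resolved once, when the bytecode is converted to 3AC, rather than at run time. A minimal sketch under hypothetical names: the loader tracks a type tag for each virtual-stack entry and picks a typed ADD from the two operand tags, with the wider type winning, loosely following C's usual arithmetic conversions (a simplification; the actual promotion rules in BGBCC or the BS2VM are not described in the post).

```c
/* Type tags carried by virtual-stack items, ordered narrow to wide
   (the ordering is relied on by promote). */
typedef enum { TY_INT, TY_LONG, TY_DOUBLE } Ty;

/* Typed 3AC opcodes the loader can emit for a generic ADD. */
typedef enum { ADD_I, ADD_L, ADD_D } TypedAdd;

static Ty promote(Ty a, Ty b) { return a > b ? a : b; }

/* Picks the typed op from the operand tags and reports the tag to push
   for the result; the 3AC interpreter never inspects tags afterwards. */
TypedAdd resolve_add(Ty a, Ty b, Ty *result)
{
    Ty t = promote(a, b);
    *result = t;
    switch (t) {
    case TY_INT:  return ADD_I;
    case TY_LONG: return ADD_L;
    default:      return ADD_D;
    }
}
```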
...



The messages displayed come from Usenet.

NewsPortal