On 7/5/2024 7:24 PM, Janis Papanagnou wrote:
> On 05.07.2024 15:28, BGB wrote:
>> It is not so much a dislike of multidimensional arrays as a concept,
>> but rather, the hair they add to the compiler and typesystem.
>>
>> Granted, one would still have other complex types, like structs and
>> function pointers, so potentially the complexity savings would be
>> limited.
>>
>> [...]
>>
>> Though, the major goal for this sort of thing is mostly to try to
>> limit the complexity required to write a compiler (as opposed to
>> programmer convenience).
>>
>> Like, for example, I had tried (but failed) to write a usable C
>> compiler in less than 30k lines (and also ideally needing less than
>> 4MB of RAM). But, if the language design is simplified some, this
>> might be a little closer. Might still be doable, but a C compiler in
>> 50-75k lines is much less impressive.
>>
>> [...]
> Well, when reading that, there are immediately faint memories forming
> in my mind about an (only 90 page) booklet of Niklaus Wirth with the
> title "Compilerbau" (don't know about English translations). It
> illustrates how to construct a simple (Pascal-like) compiler based on
> a Recursive Descent parser. Its development also implies a bootstrap
> mechanism, IIRC. Given the large numbers of lines of code you mention
> above this might be a sensible alternative. Maybe you can find some
> stimulus for your project even if you started your own project
> differently. Your above stated requirement to limit the complexity of
> writing a compiler is certainly addressed there in several ways:
> simple LL(1) parser, bootstrapping, table-driven approaches, a
> complete but extensible language as an example, etc.
Yeah.
My first attempt had some missteps.
I have generally used hand-written recursive descent parsers.
Early origin of my compiler:
Originally it started as an interpreter for a language I called BGBScript, which was more or less a clone of JavaScript (with some differences).
The first version of this VM (~ 2003-2005) was based around using an XML DOM implementation for an AST (inspired by XML-RPC), with an AST walking interpreter. This VM was horribly slow.
There was then an attempt to rewrite this VM to use a bytecode interpreter in place of the AST walker. At this stage, BGBCC was forked off into its own thing (~ 2007).
BGBCC was modified to parse C, initially with the idea of being a C interpreter (this part didn't work out).
On the BGBScript side of things, around the same time (~ 2007), the VM was rewritten to reuse parts of an older Scheme interpreter I had written as its core, but partly migrating that core over to static typing and 3AC. Its syntax migrated towards being closer to ActionScript3 and Haxe.
The scheme core was written back when I was in high-school (at around the 2001/2002 timeframe); initially using a 32-bit tag-pointer format with tags in the low-order bits, something like:
x000: Object
x100: CONS Cell
xx01: Fixnum (30 bit)
xx10: Flonum (30 bit)
yy11: Various small value types
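Decoding such tags is just shifts and masks; something like the following (a rough sketch of the pattern, with made-up macro names, not the VM's actual code):

  #include <stdint.h>

  typedef uint32_t tagval;  /* 32-bit tagged value */

  #define TAG_IS_OBJECT(v)  (((v) & 7) == 0)  /* x000: Object    */
  #define TAG_IS_CONS(v)    (((v) & 7) == 4)  /* x100: CONS Cell */
  #define TAG_IS_FIXNUM(v)  (((v) & 3) == 1)  /* xx01: Fixnum    */
  #define TAG_IS_FLONUM(v)  (((v) & 3) == 2)  /* xx10: Flonum    */

  /* 30-bit fixnum: payload in the high 30 bits, tag in the low 2. */
  #define FIXNUM_MAKE(i)  ((tagval)(((uint32_t)(i) << 2) | 1))
  #define FIXNUM_GET(v)   ((int32_t)(v) >> 2)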
Its AST format went over to being based on CONS-cell lists (conventionally expressed using S-Expression syntax), with an internal structure very similar to a mutilated version of Scheme.
It ended up translating this to a stack bytecode, which was then unpacked into 3AC. Similarly, it ended up going from a 32-bit tagref format to a 64-bit tagref format (very similar to the one I am still using, with the tags generally held in the high order bits).
My BGBScript language ended up used in my projects up until around the 2012/2013 timeframe.
I later started work on a new language, which I called BGBScript2 (BS2); it borrowed more from Java and C#. It was intended to be higher performance than its predecessor. In this language, I had abandoned the use of a garbage collector (instead going over to a mix of automatic lifetimes, new/delete, and zone allocation).
Initially (~ 2014 timeframe), I started prototyping the BS2 language on BGBCC; but soon ended up abandoning this for writing a new and more specialized VM. In the BS2 VM, I had ended up going over to using "objects" to represent the AST nodes (conceptually, the AST for this language would resemble JSON).
In this VM, also, the ASTs were first compiled to a stack-based bytecode (with a design fairly similar to JVM bytecode), which was then stored into images. When the images were loaded, functions would be unpacked from the stack-based bytecode into an internal 3AC format (which was used for the interpreter).
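The stack -> 3AC unpacking is conceptually simple: walk the bytecode while tracking a virtual stack of temporary numbers, emitting one 3AC op per stack op. Roughly like this (a simplified sketch with toy opcodes, not the actual VM code):

  /* Simplified sketch of unpacking stack bytecode into 3AC: walk the
     code while tracking a virtual stack of temporaries (toy opcodes). */
  typedef struct { int op; int dst, srca, srcb; } Tac3;

  enum { OP_LOAD, OP_ADD, OP_STORE };    /* toy stack opcodes */
  enum { TAC_LOAD, TAC_ADD, TAC_STORE }; /* toy 3AC opcodes   */

  int unpack_to_3ac(const int *bc, int n, Tac3 *out)
  {
      int vstk[256], sp = 0;    /* virtual stack of temp IDs */
      int ntmp = 0, nout = 0;

      for (int i = 0; i < n; )
      {
          switch (bc[i++])
          {
          case OP_LOAD:    /* push local (index follows opcode) */
              out[nout++] = (Tac3){ TAC_LOAD, ntmp, bc[i++], 0 };
              vstk[sp++] = ntmp++;
              break;
          case OP_ADD:     /* pop two temps, push result temp */
              sp -= 2;
              out[nout++] = (Tac3){ TAC_ADD, ntmp, vstk[sp], vstk[sp+1] };
              vstk[sp++] = ntmp++;
              break;
          case OP_STORE:   /* pop temp into local (index follows) */
              out[nout++] = (Tac3){ TAC_STORE, bc[i++], vstk[--sp], 0 };
              break;
          }
      }
      return nout;
  }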
The BS2 VM was used for a lot of the game logic in a 3D engine I wrote in the 2015/2016 timeframe (with the core 3D engine and renderer still written in C). Goal was to do like Doom 3 and use a custom language for most of the game logic and UI.
In the "BS2 in BGBCC" prototyping stage, I had experimented with doing a 3AC VM IR (partly inspired by Android's Dalvik VM). It sort of worked, and the VM was rather fast for an interpreter, but the VM code ended up suffering from "combinatorial explosion" issues (and the VM backend itself was too heavyweight). This VM effort was called "FRVM" (Fast Register VM).
Sometime around this era, I started floating the idea of a language I had called "C-Aux" which would have been essentially a C / BS2 hybrid (also targeting FRVM). This idea ended up fizzling out (and it also became obvious that FRVM was a dead end).
Around 2016, I started messing around with making BGBCC target the Hitachi SuperH / SH-4 ISA (most famously used in the Sega Dreamcast). I ended up creating a heavily modified version of SH-4, with an emulator sort of lazily copying the hardware interfaces from the Dreamcast (but, it could also run an SH-2 variant).
Initially, chunks of the former FRVM effort were repurposed for the SuperH backend (and middle end). Initial idea was to go directly from 3AC to machine code; and SH4 had used a fairly simple encoding with fixed-length 16-bit instructions.
Approach was then to write giant "switch()" blocks to emit every possible instruction. The results would then be dumped into a PE/COFF.
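For fixed 16-bit units this at least stays simple; a toy version of such an emitter (the two encodings shown are real SH ones, but the helper names are made up):

  #include <stdint.h>

  /* Toy sketch of a switch()-based emitter working in 16-bit units. */
  void emit_word16(uint16_t w);   /* append one unit to the section */

  enum { I_ADD_RR, I_MOV_IMM8 };

  void emit_ins(int op, int rm, int rn, int imm)
  {
      switch (op)
      {
      case I_ADD_RR:    /* ADD Rm, Rn    -> 0011nnnnmmmm1100 */
          emit_word16(0x300C | ((rn & 15) << 8) | ((rm & 15) << 4));
          break;
      case I_MOV_IMM8:  /* MOV #imm8, Rn -> 1110nnnniiiiiiii */
          emit_word16(0xE000 | ((rn & 15) << 8) | (imm & 255));
          break;
      }
  }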
I had initially intended to eliminate the stack IR stage, but soon ended up reversing course on this after realizing I still needed static libraries.
Loading in static libraries as C source code would have been stupid, and I didn't have a COFF linker, so most obvious choice at the time was to keep using the stack-based bytecode IR.
Soon enough, it also became obvious I still needed an assembler, so this part was written and just sort of hacked onto the side of the SH backend.
Around 2018, the hacked up SH-4 variant (BJX1) had become an ugly mess, and it soon became obvious that this would not be viable to implement in an FPGA.
The effort had internally fragmented:
BJX1-32: Basically SH4 with stuff glued on.
It was hacked to support 32 GPRs, but only on certain instructions.
It used variable length 16/32 instructions.
BJX1-64A: BJX1-32 but with GPRs expanded to 64 bits;
There were modal flag bits to bank in/out parts of the ISA;
This sucked, as there was too much mode-twiddling.
BJX1-64C:
Dropped some parts of the SH-4 ISA to make space for
64-bit instructions (eliminating the mode twiddling).
Essentially, the relation between SH4 and BJX1-64C could be compared with the difference between 32-bit x86 and x86-64: basically the same instruction encodings, but with some instructions dropped/replaced.
Some encodings had been borrowed from SH-2A/SH-2E, others were new. Most of the 32-bit instructions were just sort of shoved into odd corners of the 16-bit encoding space.
Types of instructions:
Load/Store with a displacement (from SH-2A/2E);
Load with a 20-bit immediate (MOVI20S, also from SH-2A);
ALU instructions for 3R and 3RI forms (BJX1):
ADD Rm, Rt, Rn
ADD Rm, Imm8u, Rn
ADD Rm, Imm8n, Rn
(Pretty much all normal SH instructions were 2R or 2RI).
...
I did a partial reboot of the ISA design (which became known as BJX2):
Most of the high-level aspects of the ISA remained similar (to BJX1-64C), but the encoding scheme was entirely redesigned (and a lot of "cleanup" was made).
The BJX2 backend in BGBCC started out as a copy/paste of the BJX1 backend.
With the various changes to the instruction formats, I ended up adding layers of cruft to repack the instructions from the older format to the newer formats (still using the giant "switch()" blocks to emit instructions, and starting the encoding in terms of 16-bit instruction units). This part sucks, and is an ugly mess...
Though, in this process, BGBCC's original DOM-based AST format was replaced by a partially object-based AST format. It continues to mimic XML DOM in some regards, but is generally faster and doesn't eat as much memory (things like node tags and attribute numbers are expressed as 16-bit tags rather than as strings, etc.). Also, most strings are interned rather than allocated/freed individually (except for big ones).
Dumping the ASTs is still generally done as XML syntax, though.
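Structurally, a node in this sort of format looks something like the following (a hypothetical sketch; the field names are made up, but the 16-bit interned IDs are as described):

  #include <stdint.h>

  /* Hypothetical sketch of an object-based AST node that mimics XML
     DOM, with interned 16-bit tag/attribute IDs instead of strings. */
  typedef struct AstAttr AstAttr;
  typedef struct AstNode AstNode;

  struct AstAttr {
      uint16_t attr_id;           /* interned attribute name */
      uint16_t kind;              /* string / int / node, ... */
      union { const char *s; int64_t i; AstNode *n; } val;
      AstAttr *next;
  };

  struct AstNode {
      uint16_t tag_id;            /* interned element name */
      AstAttr *attrs;             /* attribute list */
      AstNode *first, *next;      /* children and siblings, DOM style */
  };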
The BJX2 project has continued until now.
The design has shifted some over the years:
Initial:
16/32/48 bit instructions, 32 GPRs, 16 FPRs
Dropped FPRs, FPU moved into GPR space
Made CPU cheaper
Made various instruction sequences shorter
Gained predicated ops;
Instructions may conditionally execute based on the SR.T flag.
Replaced the 48-bit ops with Jumbo prefixes:
Instructions were then 16/32/64/96 bits.
Gained an extension to support 64 GPRs
"Baseline" form only for a subset of the ISA.
Added a newer "XG2" encoding mode.
Drops 16-bit encodings in this mode;
All of the instructions now have access to all 64 GPRs;
For many instructions, the immediate values gain 1 bit.
Added a RISC-V decoder mode.
The CPU can also run 64-bit RISC-V (currently RV64G).
X0..X31 mostly map to R0..R31, but some map to CRs;
F0..F31 map to R32..R63.
As I can note: despite GCC's cleverness, "GCC -O3" targeting RV64G seemingly can't win in terms of performance.
Though, in most areas, my ISA is a superset of RISC-V; the main area where RISC-V holds an advantage is (ironically) that its basic immediate fields tend to be bigger (though this is counterbalanced by its lack of good fallbacks). Something like a "LI Xd, Imm17s" instruction could help a fair bit here.
In my ill-fated new compiler, things were changed:
There would now be a "proper" assembler and linker;
Albeit, the plan had been to use a hacked version of the WAD format for object files, rather than COFF, as this could allow for a simpler object format.
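For reference, the classic WAD container is just a small header plus a flat lump directory (this is the Doom-era layout; a hacked object-file variant would differ in the details):

  #include <stdint.h>

  /* Layout of a classic WAD file (as in Doom): a small header plus a
     flat lump directory. Simple enough to adapt as an object format. */
  typedef struct {
      char    magic[4];       /* "IWAD" or "PWAD" */
      int32_t numlumps;       /* number of directory entries */
      int32_t infotableofs;   /* file offset of the directory */
  } WadHeader;

  typedef struct {
      int32_t filepos;        /* offset of this lump's data */
      int32_t size;           /* size of the lump in bytes */
      char    name[8];        /* lump name, NUL-padded */
  } WadLump;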
In the new design, the code generator would have first emitted instructions into a "binary ASM" format, which would then be assembled into machine code.
So, pipeline was generally:
Preprocess;
Parse (into ASTs);
Translate to IR;
Translate IR to Binary ASM;
Assemble into machine code (in Object Modules);
Link;
Emit EXE or DLL.
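In rough code terms, the intended pipeline would have been driven something like this (every function and type name here is hypothetical):

  /* Hypothetical sketch of the intended pipeline driver; none of
     these names are actual compiler API. */
  typedef struct Ast Ast;  typedef struct Ir Ir;
  typedef struct Asm Asm;  typedef struct Obj Obj;
  typedef struct Image Image;

  const char *preprocess(const char *src);
  Ast   *parse(const char *pp);
  Ir    *lower_to_ir(Ast *ast);
  Asm   *ir_to_binasm(Ir *ir);
  Obj   *assemble(Asm *ba);
  Image *link_objs(Obj **objs, int n);       /* emits EXE or DLL  */

  Image *compile_all(const char **srcs, int n_srcs)
  {
      Obj *objs[256];
      for (int i = 0; i < n_srcs; i++) {
          const char *pp = preprocess(srcs[i]);  /* Preprocess        */
          Ast *ast = parse(pp);                  /* Parse into ASTs   */
          Ir  *ir  = lower_to_ir(ast);           /* Translate to IR   */
          Asm *ba  = ir_to_binasm(ir);           /* IR -> Binary ASM  */
          objs[i]  = assemble(ba);               /* -> Object Module  */
      }
      return link_objs(objs, n_srcs);            /* Link, emit image  */
  }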
To save memory, one idea was to interleave the parser and frontend compiler, effectively working one function at a time. This could avoid needing to parse the entire translation unit into an AST, or needing to keep IR around for the entire program.
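So, rather than parse-everything-then-compile, the main loop would look more like (again, a hypothetical sketch):

  typedef struct Lexer Lexer;  typedef struct Decl Decl;
  typedef struct Ir Ir;

  Decl *parse_top_level_decl(Lexer *lex);  /* NULL at end of unit */
  Ir   *lower_decl_to_ir(Decl *decl);
  void  emit_decl(Ir *ir);
  void  free_decl_ir(Decl *decl, Ir *ir);

  /* Compile one function/declaration at a time, discarding the AST
     and IR before moving on, so the whole unit is never in memory. */
  void compile_unit(Lexer *lex)
  {
      Decl *decl;
      while ((decl = parse_top_level_decl(lex)) != NULL) {
          Ir *ir = lower_decl_to_ir(decl);
          emit_decl(ir);
          free_decl_ir(decl, ir);
      }
  }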
Though, my current compiler does some optimizations (such as culling unreachable parts of the program), which likely need the ability to keep the entire program's declaration and IR space in RAM.
> Janis
>
> PS: Google found a PDF online[*] (which is obviously more extensive
> than my 1st ed. book), so you can inspect it and inform yourself to
> see whether it's of any use to you.
>
> PPS: Note also that even if the book uses a Pascal-like design the
> described mechanisms are not bound to that language. Use it for a
> "C"-like language if you think that's a good idea. :-)
>
> [*]
> http://pascal.hansotten.com/uploads/books/Compiler%20Bau%20Nwirth%201986.pdf
May need to look into this...