On 4/5/2025 10:34 AM, David Brown wrote:
On 04/04/2025 21:18, Keith Thompson wrote:
David Brown <david.brown@hesbynett.no> writes:
[...]
It is easy to write code that is valid C23, using a new feature copied
from C++, but which is not valid C++ :
>
constexpr size_t N = sizeof(int);
int * p = malloc(N);
>
It's much easier than that.
>
int class;
>
Every C compiler will accept that. Every C++ compiler will reject
it. (I think the standard only requires a diagnostic, which can
be non-fatal, but I'd be surprised to see a C or C++ compiler that
generates an object file after encountering a syntax error).
>
Muttley seems to think that because, for example, "gcc -c foo.c"
will compile C code and "gcc -c foo.cpp" will compile C++ code,
the C and C++ compilers are the same compiler. In fact they're
distinct frontends with shared backend code, invoked differently
based on the source file suffix. (And "g++" is recommended for C++
code, but let's not get into that.)
>
For the same compiler to compile both C and C++, assuming you don't
unreasonably stretch the meaning of "same compiler", you'd have to
have a parser that conditionally recognizes "class" as a keyword or
as an identifier, among a huge number of other differences between
the two grammars. As far as I know, nobody does that.
Mr. Flibble's universal compiler? :-)
FWIW, BGBCC does actually handle multiple languages in the same parser (*1), but it is a hand-written recursive descent parser, so I don't need to care as much about minor syntactic differences (it can check which language is being compiled and enable/disable features as needed).
Though, language-specific behavior is a little trickier in the tokenizer, as that tends to be more performance-sensitive, but luckily the languages have mostly similar tokenization rules.
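As a rough illustration of the "check which language and enable/disable features" part (a minimal sketch of the idea, not actual BGBCC internals; the names here are made up):

  /* One parser, multiple languages: gate keyword recognition on a lang flag. */
  #include <string.h>

  typedef enum { LANG_C, LANG_CXX, LANG_BS, LANG_BS2, LANG_JAVA, LANG_CS } lang_t;

  typedef struct {
      lang_t lang;          /* which language this translation unit is in */
      const char *tok;      /* current token text (tokenizer not shown) */
  } parser_t;

  static int is_keyword_class(parser_t *p)
  {
      if (strcmp(p->tok, "class") != 0)
          return 0;
      /* In C, "class" stays an ordinary identifier ("int class;" is fine);
         in the class-based languages it starts a class declaration. */
      return p->lang != LANG_C;
  }

  static void parse_declaration(parser_t *p)
  {
      if (is_keyword_class(p)) {
          /* parse_class_decl(p);    hypothetical language-specific path */
      } else {
          /* parse_c_style_decl(p);  shared C-family declaration path */
      }
  }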
*1: Namely:
  C (main language used in current projects):
    C89, C99 (mostly), assorted newer features.
  BGBScript (not used as much now, kinda resembles ActionScript):
    It is more or less in the JavaScript/ECMAScript family.
    Most directly tied back to ECMAScript3 and ActionScript3.
      Later JS went in different directions.
    No GC, may need to "delete" stuff to not leak.
      Supports automatic and zoned memory lifetimes.
    Defaults to dynamic/auto typing, but supports static types:
      "var i:int;" or "var i:Integer;" or similar.
  BGBScript2:
    Switched to a more Java-like syntax, primarily static-typed.
    Also has some features in common with C#.
    Retains the ability to use dynamic types or similar if desired.
    No GC.
    Uses a mix of manual, automatic, and zoned memory management.
  Java (sorta, not really used):
    Mostly covers Java 1.1 syntax;
      e.g., it predates the addition of generics, etc.
    Lacks a garbage collector though.
      Will need an Object.delete() to not leak.
    Had previously written a JavaME-style library for it.
  C# (also sorta, also not used):
    Also limited to early C# syntax (no generics, ...).
    Also no GC.
    Nor any of the runtime library.
  C++ (also sorta):
    It is similar to Embedded C++, but with namespaces.
    Lacks proper multiple inheritance.
      Single inheritance with abstract base classes
      (the abstract classes are understood as interfaces).
    Mostly doesn't support templates (partly added, mostly untested).
    Roughly early-90s level features.
    Pretty much none of the standard library.
In the case of BS2 / Java / C# / EC++, there was enough overlap that most of the same mechanisms could handle the languages with minor syntactic reskinning. Most of the rest is common among C-family languages.
Going further up the ladder (for Java, C#, or C++) would be a big uphill battle, and not really worth it.
It also doesn't make much sense to advertise support for these languages when only very limited forms are handled.
Can note that the original ancestor language (what BGBCC was before it was a C compiler) was BGBScript, which, as noted above, was originally inspired by JavaScript. JavaScript, along with XML, was a hot new thing back when I was in high school, which is where the origins of what became BGBCC got started, though BGBCC proper didn't get started until I was taking college classes. However, it didn't initially live up to my hopes (*2), so I had partly shelved it until my current project got started (nearly a decade ago now), where BGBCC has ended up playing a much more prominent role.
*2: Early on, I had wanted to use it like a dynamic C compiler for scripting inside my 3D engine (partly inspired by Quake 3 and Doom 3; before pivoting over to copying Minecraft). I soon realized that C wasn't a great language for this sort of scripting, and at the time mostly returned to using my JS-based language (which had since forked off into a separate VM), which then started pulling in some features from ActionScript.
I later stopped working on that 3D engine and wrote another, shorter-lived 3D engine with the goal of being smaller, faster, and less complicated. For this I created the BS2 language. In some ways BS2 was less good as a scripting language than its predecessor, but it was better at things like "implementing stuff", so a lot of the higher-level parts of the engine ended up using it (mostly the renderer, VFS, VM, and similar remained in C).
Then I switched to doing CPU ISA design and FPGA stuff.
Wrote another, smaller, 3rd Minecraft-like engine, mostly aiming to be lightweight enough to run on a 50 MHz CPU in under 60 MB of RAM (and using raycasts for visibility determination rather than drawing everything within a given radius).
This one was plain C, and used a similar chunk format to the prior engine, mostly just switching to a smaller region size (and switching from a Rice-coded LZ scheme to a byte-oriented LZ).
Where:
  Chunks were 16x16x16 with an index into a table of blocks
  (a rough sketch of this layout is given after these lists).
    We can assume fewer than 256 unique blocks per chunk.
    Each block was 64 bits in the 2nd engine, 32 bits in the 3rd engine.
  In storage, these are LZ compressed and stored into a "region file".
    My 2nd engine used 16x16x16 chunks per region (256x256x256 cube).
      Chunk LZ: AdRice+STF based.
    My 3rd engine used 8x8x8 chunks per region (128x128x128 cube).
      Chunk LZ: RP2 (byte based, sorta like LZ4).
    Regions are theoretically in a 3D grid, but 2D in practice.
  2nd engine: generated/drew meshes for every chunk in-radius.
    Every in-radius chunk was also fully loaded into RAM.
  3rd engine: used raycasts, only built meshes for visible terrain.
    Mostly only visible/active chunks get fully loaded.
    If no mobs are there, and no raycast reaches it, a chunk is not loaded.
  Data Storage:
    2nd engine: EXPAK (custom, like simplified ZIP, Deflate based).
      Images: BMP, custom codecs.
      Audio: WAV (modified IMA ADPCM).
    3rd engine: WAD4 (like WAD2, but 32-byte names, and directories).
      Images: DDS (mostly DXT1), BMP.
      Audio: WAV (A-Law, IMA ADPCM).
By contrast, the 1st engine:
  Chunks were 16x16x128, regions 32x32, mostly like older Minecraft.
  Chunks were compressed with RLEW (same algo as Wolf3D and ROTT).
  Data was stored in ZIP;
    Textures were mostly stored as a modified T.81 JPEG
    (extended with an optional alpha channel).
    Animated textures used repurposed AVI (sometimes MJPG).
    Audio: also ADPCM? (I forget now.)
  Memory use and performance for the first engine were terrible though.
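For reference, the chunk layout mentioned above is basically a small palette scheme; a rough sketch (my reconstruction for illustration, not the engines' actual structs, using the 3rd engine's 32-bit blocks):

  #include <stdint.h>

  #define CHUNK_DIM    16
  #define CHUNK_CELLS  (CHUNK_DIM * CHUNK_DIM * CHUNK_DIM)
  #define CHUNK_MAXBLK 256   /* assume fewer than 256 unique blocks per chunk */

  typedef struct {
      uint32_t blocks[CHUNK_MAXBLK];  /* per-chunk block table (64-bit in the 2nd engine) */
      uint8_t  cells[CHUNK_CELLS];    /* per-voxel index into blocks[] */
      uint16_t num_blocks;            /* entries of blocks[] actually in use */
  } chunk_t;

  /* Look up the block at (x, y, z) within the chunk. */
  static uint32_t chunk_get_block(const chunk_t *ch, int x, int y, int z)
  {
      int idx = (z * CHUNK_DIM + y) * CHUNK_DIM + x;
      return ch->blocks[ch->cells[idx]];
  }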
Though, it can be noted that similar tech does tend to carry over from one project to the next.
Note that, technically, the C mode supports dynamic types and similar, but with non-standard syntax, e.g.:
  __variant obj = (__variant) { .x=3, .y=4 };
For a JavaScript-style ex-nihilo object.
Though, there is an implicit downside:
Using dynamic types comes with a performance penalty, as there is basically no way to avoid this stuff being slow (static types are faster).
In most scenarios where people tend to use generics, I had used dynamic types instead; my languages have not had generics.
The native class/instance objects are also by-reference by default.
In the C++ case, they are faked with automatic-scoped objects.
POD types may decay into C style structs.
Can note that the general handling of automatic scoping is effectively:
  Objects are heap-allocated and added to a linked list;
  When the owner frame exits, everything in the linked list is deleted;
  Backing memory generally comes from "malloc()".
    Though, in this case, the malloc implementation also supports type-tags and zones.
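In rough C terms, the mechanism is something like this (simplified sketch for illustration, not the actual runtime code; the names are made up):

  #include <stdlib.h>

  typedef struct auto_obj_s {
      struct auto_obj_s *next;   /* next object owned by the same frame */
      /* object payload follows this header */
  } auto_obj_t;

  typedef struct {
      auto_obj_t *objs;          /* head of this frame's ownership list */
  } frame_ctx_t;

  /* Allocate an automatic-lifetime object and link it into the frame. */
  static void *frame_alloc(frame_ctx_t *frm, size_t sz)
  {
      auto_obj_t *obj = malloc(sizeof(auto_obj_t) + sz);
      if (!obj) return NULL;
      obj->next = frm->objs;
      frm->objs = obj;
      return obj + 1;            /* payload starts just past the header */
  }

  /* Called when the owning frame exits: delete everything it still owns. */
  static void frame_exit(frame_ctx_t *frm)
  {
      auto_obj_t *obj = frm->objs, *next;
      for (; obj; obj = next) {
          next = obj->next;
          free(obj);
      }
      frm->objs = NULL;
  }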
Note that things like C99 VLAs sorta exist, but are kinda playing with fire here (effectively, they are dynamic heap-allocated arrays).
So, in C syntax:
  int arr[n];
is the equivalent of (BS2 syntax):
  int[] arr=new! int[n];
Technically, "alloca()" also uses the same mechanism (alloca is heap backed). The same path is generally used for any large arrays or structs as well, mostly because my runtime uses 128K as the default stack size (so, if you try to create a local array of 16K or more, or the frame size exceeds 16K, the compiler gives a warning and the allocation goes to the heap).
Usually though, 128K is sufficient for the stack (unless recursion gets out of control or the program uses lots of stack arrays).
Note that, while "malloc()" backed memory isn't the fastest possible option, it isn't that slow either (in most cases, the allocation size is quantized and then served from a free-list).
My general heap allocator strategy looks sorta like:
  Small objects (under 1K or so):
    Rounded up to a multiple of 16 bytes;
    If the corresponding free-list is empty, allocated using a cell bitmap.
  Medium objects (up to 128K):
    Size is padded up and turned into an E5.M2 (IIRC) micro-float;
      This is also used as the free-list index;
    If this fails, uses a linked-list / memory-block allocator.
      This part resembles a more traditional malloc().
  Large objects:
    Generally allocated as pages (falls back to mmap or similar).
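As a concrete-ish example of the size-to-free-list mapping (an illustrative sketch; the exact rounding and index packing here are my guess at the micro-float scheme, not the actual allocator code):

  #include <stddef.h>

  #define SMALL_MAX   1024        /* "under 1K or so" */
  #define MEDIUM_MAX  (128*1024)  /* up to 128K */

  /* Map a request size to a free-list index; -1 means "large", which is
     handled with page allocation instead. */
  static int size_to_class(size_t sz)
  {
      if (sz <= SMALL_MAX) {
          /* small: 16-byte cells, index is just the cell count */
          return (int)((sz + 15) / 16);
      }
      if (sz <= MEDIUM_MAX) {
          /* medium: pad up to the next value of the form (4+m)<<e, m=0..3,
             i.e. a tiny float with a 2-bit mantissa; buckets grow
             geometrically so only a handful of lists are needed. */
          size_t s = sz;
          int e = 0;
          while (s > 7) { s = (s + 1) >> 1; e++; }  /* round up while halving */
          int m = (int)(s - 4);                     /* s is in 4..7 here */
          return 64 + e * 4 + m;                    /* offset past the small classes */
      }
      return -1;
  }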
When doing an allocator on a more modern system (with multithreading and similar), it generally makes sense to have per-thread free-lists, allowing allocations to be served without needing a mutex/critical-section; the lock is mostly reserved for cases where the free-list misses.
I generally don't do eager block coalescing, as most often, if a chunk of a given size is freed, it will soon be needed again. Instead, it makes sense to delay this step until there is a potential need to expand the heap (at which point a pass reclaims everything from the corresponding free-lists and coalesces free space, and only if this fails to find something does the heap get expanded).
Though, some care is needed with multithreading when dealing with targets that have a weak memory model (and multiple cores). You can't just null out the per-thread free-lists, as the threads will not necessarily see the changed memory (until locking/unlocking a mutex, which may involve flushing the L1 caches). Generally, it is necessary to set a flag/semaphore in a way the thread can see, and the thread then evicts its own free-lists (on its next alloc/free call). Though, the L1 flush and similar can be skipped if everything is running on a single core, ...
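A minimal sketch of the per-thread free-list part, assuming C11 atomics are available (the slow-path helpers are hypothetical; the real thing would also need the coalescing/reclaim side):

  #include <stdatomic.h>
  #include <stddef.h>

  #define NUM_SIZE_CLASSES 64

  typedef struct freenode_s { struct freenode_s *next; } freenode_t;

  typedef struct {
      freenode_t *lists[NUM_SIZE_CLASSES]; /* touched only by the owning thread */
      atomic_int  evict_requested;         /* set when the heap wants memory back */
  } thread_cache_t;

  static _Thread_local thread_cache_t tl_cache;

  /* Hypothetical locked slow-path helpers (shared across threads): */
  void  global_reclaim(freenode_t **lists, int count);
  void *global_alloc(int sizeclass);

  /* Checked on each alloc/free: if asked, push everything back to the
     shared lists under the lock, rather than having another thread poke
     at this thread's lists directly. */
  static void maybe_evict(void)
  {
      if (atomic_load_explicit(&tl_cache.evict_requested, memory_order_acquire)) {
          global_reclaim(tl_cache.lists, NUM_SIZE_CLASSES);
          for (int i = 0; i < NUM_SIZE_CLASSES; i++)
              tl_cache.lists[i] = NULL;
          atomic_store_explicit(&tl_cache.evict_requested, 0, memory_order_release);
      }
  }

  void *fast_alloc(int sizeclass)
  {
      maybe_evict();
      freenode_t *n = tl_cache.lists[sizeclass];
      if (n) {                         /* fast path: no lock needed */
          tl_cache.lists[sizeclass] = n->next;
          return n;
      }
      return global_alloc(sizeclass);  /* miss: fall back to the locked path */
  }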
My compiler mostly targets my own ISA (well, along with RISC-V and some extended RISC-V variants), and as noted, it generally assumes a weak memory model (my FPGA implementation uses a weak model).
My ISAs tend to be faster than RISC-V, but with a bunch of extensions RV64 can be made more competitive (the main heavyweight features being: register-indexed load/store ops, large-immediate prefixes, and load/store pair).
Can also note that, regarding performance on RISC-V, my compiler seems to be mostly competitive with GCC (though GCC still holds a performance advantage if both are targeting plain RV64G). There is also a practical difference in that GCC targets ELF whereas mine targets PE/COFF.
The attempt to add Verilog support is a little harder, as Verilog's execution model is very different from the C family languages (it is organized into modules and mostly driven by clock-edges and similar).
Implicitly, some aspects of Verilog have leaked over into the other languages (things tend to bleed over unless there is a specific reason to exclude them).
Though, some of the features are useful, as explicit bit-manipulation notation is easier to optimize than shifts and masks (though, granted, it is limited to constant bit positions; if non-constant positions were supported, the advantage would effectively disintegrate).
Also, performance is helped some here by having added special instructions to my ISA for bitfield operations (code using Verilog-style bit manipulation benefits notably from these). Though, in my case, I came up with a single instruction that can do extract/insert and bitfield move (vs, say, ARM having separate instructions for each scenario).
Though, one limiting factor is that currently it can only do one bitfield move at a time (and Verilog style code will quickly saturate things with bitfield moves).
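For reference, the shift/mask sequences that Verilog-style slices otherwise lower to look roughly like this (illustrative helpers of my own, not compiler output); a combined extract/insert instruction can replace each such sequence with one op when hi/lo are constant:

  #include <stdint.h>

  /* Verilog-style x[hi:lo]: extract bits lo..hi of x. */
  static inline uint64_t bits_get(uint64_t x, int hi, int lo)
  {
      uint64_t mask = (hi - lo >= 63) ? ~0ull : ((1ull << (hi - lo + 1)) - 1);
      return (x >> lo) & mask;
  }

  /* x with bits lo..hi replaced by the low bits of v (insert). */
  static inline uint64_t bits_set(uint64_t x, int hi, int lo, uint64_t v)
  {
      uint64_t mask = (hi - lo >= 63) ? ~0ull : ((1ull << (hi - lo + 1)) - 1);
      return (x & ~(mask << lo)) | ((v & mask) << lo);
  }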
Though, for the sake of being able to do Verilog sims on it, this may require putting effort into getting my emulator's JIT back into working order (so it can be used as a semi-fast VM). It broke a while ago and I didn't bother fixing it, as the interpreter was still fast enough to keep up with real-time emulation of the 50 MHz FPGA version on my PC.
Likely, running Verilog will still be painfully slow in any case...
I guess, at least if it isn't orders of magnitude slower than Verilator (which, for a partial sim of the CPU core, runs at between 0.1 and 0.2 MHz; a full SoC simulation being more around 0.02 MHz), it is possibly good enough.
The main reason this is tempting is that the ability to debug stuff generated by Verilator is basically non-existent (even when compiling the generated C++ with "-g" and similar and using GDB, one doesn't get that far, as the original Verilog has basically been turned into confetti).
At least, if done well, there could be some hope in theory of being able to source-level debug this stuff.
Where, it doesn't take much to start running into the limitations of trying to debug stuff with "$display()" statements and similar (especially when the bug is something like "something somewhere in the CPU is misbehaving").
One starts really wishing for the ability to set a breakpoint at a certain instruction encoding or similar and then step line by line and inspect registers, like one can do in something like Visual Studio.
Granted, this also means likely needing to implement a Visual Studio style debugger (my existing implementation has a dump on VM exit, and at runtime, an internal GDB style debugger triggered by special keyboard shortcuts).
>
You and I know he's wrong. Arguing with him is a waste of everyone's
time.
>
Yes, it seems that way. Sometimes he makes posts that are worth answering or correcting, but the threads with him inevitably go downhill.
This is the sense I am getting as well...
I can only hope I come off a little better, but I don't know sometimes.
...