Newsportal USENET - Re: transpiling to low level C

On 12/19/2024 2:36 PM, BGB wrote:

On 12/19/2024 5:27 AM, bart wrote:
On 19/12/2024 05:46, BGB wrote:
On 12/18/2024 6:35 PM, bart wrote:
On 19/12/2024 00:27, BGB wrote:
>
By-Value Structs smaller than 16 bytes are passed as-if they were a 64 or 128 bit integer type (as a single register or as a register pair, with a layout matching their in-memory representation).
>
...
>
>
But, yeah, at the IL level, one could potentially eliminate structs and arrays as a separate construct, and instead have bare pointers and a generic "reserve a blob of bytes in the frame and initialize this pointer to point to it" operator (with the business end of this operator happening in the function prolog).
>
The problem with this, that I mentioned elsewhere, is how well it would work with SYS V ABI, since the rules for structs are complex, and apparently recursive.
>
Having just a block of bytes might not be enough.
>
In my case, I am not bothering with the SysV style ABIs (well, along with there not being any x86 or x86-64 target...).
>
I'd imagine it's worse with ARM targets as there are so many more registers to try and deconstruct structs into.
>
Not messed much with the ARM64 ABI or similar, but I will draw the line in the sand somewhere.
Struct passing/return is enough of an edge case that one can just sort of declare it "no go" between compilers with "mostly but not strictly compatible" ABIs.

>
For my ISA, it is a custom ABI, but follows mostly similar rules to some of the other "Microsoft style" ABIs (where, I have noted that across multiple targets, MS tools have tended to use similar ABI designs).
>
When you do your own thing, it's easy.
>
In the 1980s, I didn't need to worry about call conventions used for other software, since there /was/ no other software! I had to write everything, save for the odd calls to DOS which used some form of SYSCALL.
>
Then, arrays and structs were actually passed and returned by value (not via hidden references), by copying the data to and from the stack.
>
However, I don't recall ever using the feature, as I considered it inefficient. I always used explicit references in my code.
>
Most of the time, one is passing/returning structures as pointers, and not by value.
By value structures are usually small.
When a structure is not small, it is both simpler to implement, and usually faster, to internally pass it by reference.
If you pass a large structure to a function by value, via an on-stack copy, and the function assigns it to another location (say, a global variable):
   Pass by reference: Only a single copy operation is needed;
   Pass by value on-stack: At least two copy operations are needed.
One also needs to reserve enough space in the function arguments list to hold any structures passed, which could be bad if they are potentially large.
   But, on my ISA, ABI is sort of like:
   R4 ..R7 : Arg0 ..Arg3
   R20..R23: Arg4 ..Arg7
   R36..R39: Arg8 ..Arg11 (optional)
   R52..R55: Arg12..Arg15 (optional)
Return Value:
   R2, R3:R2 (128 bit)
   R2 is also used to pass in the return value pointer.
'this':
   Generally passed in either R3 or R18, depending on ABI variant.
Where, callee-save:
   R8 ..R14, R24..R31,
   R40..R47, R56..R63
   R15=SP
Non-saved scratch:
   R2 ..R7 , R16..R23,
   R32..R39, R48..R55
Arguments beyond the first 8/16 register arguments are passed on stack. In this case, a spill space for the first 8/16 arguments (64 or 128 bytes) is provided on stack before the first non-register argument.
If the function accepts a fixed number of arguments and the number of argument registers is 8 or less, spill space need only be provided for the first 8 arguments (calling vararg functions will always reserve space for 16 registers in the 16-register ABI). This spill space effectively belongs to the callee rather than the caller.
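As a rough sketch of the register assignment listed above (this is not code from BGBCC; the names arg_register and spill_bytes and the "-1 means on stack" convention are made up here for illustration), the mapping from argument index to register number works out to:

```c
#include <assert.h>

/* Hypothetical helper: map an argument index to a GPR number using the
 * layout listed above (R4..R7, R20..R23, R36..R39, R52..R55).
 * max_reg_args is 8 or 16 depending on the ABI variant.
 * Returns -1 if the argument is passed on the stack instead. */
int arg_register(int arg_index, int max_reg_args)
{
    assert(arg_index >= 0);
    assert(max_reg_args == 8 || max_reg_args == 16);
    if (arg_index >= max_reg_args)
        return -1;
    /* Each group of 4 arguments starts 16 registers after the last. */
    return 4 + 16 * (arg_index / 4) + (arg_index % 4);
}

/* Size of the register spill area: 8 bytes per argument register,
 * giving the 64 or 128 bytes mentioned above. */
int spill_bytes(int max_reg_args)
{
    assert(max_reg_args == 8 || max_reg_args == 16);
    return max_reg_args * 8;
}
```

This ignores 128-bit arguments, which take a register pair, and only covers plain 64-bit arguments.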
Structures (by value):
   1.. 8 bytes: Passed in a single register
   9..16 bytes: Passed in a pair, padded to the next even pair
   17+: Pass as a reference.
Things like 128-bit types are also passed/returned in register pairs.
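The size rule above can be sketched as a small classifier. The enum and function names are hypothetical, and align_pair_slot reflects one reading of "padded to the next even pair" (a pair starts at an even argument slot), which is an assumption rather than something stated outright:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the three cases in the list above. */
typedef enum {
    PASS_ONE_REG,   /* 1..8 bytes  */
    PASS_REG_PAIR,  /* 9..16 bytes */
    PASS_BY_REF     /* 17+ bytes   */
} StructPassMode;

StructPassMode classify_struct(size_t size_bytes)
{
    assert(size_bytes >= 1);
    if (size_bytes <= 8)
        return PASS_ONE_REG;
    if (size_bytes <= 16)
        return PASS_REG_PAIR;
    return PASS_BY_REF;
}

/* Assumption: a register pair must begin at an even argument slot,
 * so an odd slot index is rounded up and one slot is skipped. */
int align_pair_slot(int next_slot)
{
    assert(next_slot >= 0);
    return (next_slot + 1) & ~1;
}
```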
   Contrast, RV ABI:
   X10..X17 are used for arguments;
   No spill space is provided;
   ...
My variant uses similar rules to my own ABI for passing/returning structures, with:
   X28, structure return pointer
   X29, 'this'
Normal return values go into X10 or X11:X10.
   Note that in both ABIs, passing 'this' in a register would mean that class instances and COM objects are not equivalent (COM object methods always pass 'this' as the first argument).
The 'this' register is implicitly also used by lambdas to pass in the pointer to the captured bindings area (which mostly resembles a structure containing each variable captured by the lambda).
Can note though that in this case, capturing a binding by reference means the lambda is limited to automatic lifetime (non-automatic lambdas may only capture by value). In this case, capture by value is the default.

For my compiler targeting RISC-V, it uses a variation of RV's ABI rules.
Argument passing is basically similar, but struct pass/return is different; and it passes floating-point values in GPRs (and, in my own ISA, all floating-point values use GPRs, as there are no FPU registers; though FPU registers do exist for RISC-V).
>
Supporting C's variadic functions, which is needed for many languages when calling C across an FFI, usually requires different rules. On Win64 ABI for example, by passing low variadic arguments in both GPRs and FPU registers.
>
I simplified things by assuming only GPRs are used.

/Implementing/ variadic functions (which only occurs if implementing C) is another headache if it has to work with the ABI (which can be assumed for a non-static function).
>
I barely have a working solution for Win64 ABI, which needs to be done via stdarg.h, but wouldn't have a clue how to do it for SYS V.
>
(Even Win64 has problems, as it assumes a downward-growing stack; in my IL interpreter, the stack grows upwards!)
>
Most targets use a downward growing stack.
Mine is no exception here...

Not likely a huge issue as one is unlikely to use ELF and PE/COFF in the same program.
>
>
For the "OS" that runs on my CPU core, it is natively using PE/COFF, but
>
That's interesting: you deliberately used one of the most complex file formats around, when you could have devised your own?
>
For what I wanted, I would have mostly needed to recreate most of the same functionality as PE/COFF anyways.
When one considers the entire loading process (including DLLs/SOs), then PE/COFF loading is actually simpler than ELF loading (ELF subjects the loader to needing to deal with symbol and relocation tables), similar to PIE loading.

My wording there sucked...
PIE loading is the same as the case for ELF shared object loading, so is fairly complex.
For normal loading, they try to make it simpler for the kernel loader by having a special "interpreter" program deal with it. The process it then uses to bootstrap itself is rather convoluted.

Things like the MZ stub are optional in my case, and mostly ignored if present (in my LZ compressed PE variants, the MZ stub is omitted entirely).

My loader will accept multiple sub-variants:
   With MZ stub (original format);
   Without MZ stub (but uncompressed);
   With LZ4 compression (no MZ stub allowed).
The format for the no-stub case is basically the same as the with-stub case, except that the stub is absent and thus the 'PE' sig is still present.
Note that in my variants, omitting the MZ stub does cause it to change to a different checksum algorithm (the original PE/COFF checksum being unacceptably weak).

I had at one point considered doing a custom format resembling LZ compressed MachO, but ended up not bothering, as it wouldn't have really saved anything over LZ compressed PE/COFF.

The core process is still:
Read stuff into memory;
Apply post-load fixups.
This part of the process was essentially unavoidable.

Some "unneeded cruft" like the Resource Section was discarded, mostly replaced by an embedded WAD2 image. The header was modified some to allow for backwards compatibility with the Windows format (mostly creating a dummy header in the original format that points to the WAD2 directory).

Note that the change of resource section format was more because the original approach to the resource section made little sense to me.
Identifying things with short names made a lot more sense than magic numbers.
The WAD approach worked for Doom and similar, and is probably sufficient for things like inline bitmap images and icons.

Idea is that icons, bitmaps, and other things, would mostly be held in WAD lumps. Though, resources which may be accessed via symbols in the EXE/DLL need to be stored uncompressed (where "__rsrc_lumpname" may be used to access the contents of resource-section lumps as an extern symbol).

Note that it can also load blobs of text or binary data.
Though, BGBCC provides less in terms of format converters for arbitrary data.
A special text format is used both to define files to pull into the resource section (and what lump name to use), as well as format conversions to apply.

Say, for example:
   extern byte __rsrc_mybitmap[]; //resolves to a DIB/BMP or similar
For now, resource formats:
   Images:
     BMP (various settings)
       4, 8, and 16 bpp typical
       Supports a non-standard 16-bpp alpha-blended mode (*1).
       Supports non-standard 16 color and 256 color with transparent.
       Supports CRAM BMP as well (2 bpp)
     QOI (assumes RGBA32, nominally lossless)
       QOI is a semi-simplistic non-entropy-coded format.
       Can give PNG-like compression in some cases.
       Reasonably fast/cheap to decode.
     LCIF, custom lossy format, color-cell compression.
       OK Q/bpp but mostly only on the low-end.
       Resembles a QOI+CRAM hybrid.
     UPIC, lossy or lossless, JPEG-like (*2)
*1:
   0rrrrrgggggbbbbb Normal/Opaque
   1rrrraggggabbbba With 3 bit alpha (4b/ch RGB).
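To make the two layouts concrete, here is a minimal sketch of unpacking such a pixel to 8 bits per channel. The Rgba8 type and decode_px16 name are invented for the example, and the ordering of the three alpha bits (bit 10 taken as most significant, bit 0 as least) is an assumption, since the layout only shows where the bits sit:

```c
#include <stdint.h>

typedef struct { uint8_t r, g, b, a; } Rgba8;  /* hypothetical output type */

Rgba8 decode_px16(uint16_t px)
{
    Rgba8 out;
    if (!(px & 0x8000)) {
        /* 0rrrrrgggggbbbbb: plain RGB555, fully opaque. */
        unsigned r = (px >> 10) & 31, g = (px >> 5) & 31, b = px & 31;
        out.r = (uint8_t)((r << 3) | (r >> 2));
        out.g = (uint8_t)((g << 3) | (g >> 2));
        out.b = (uint8_t)((b << 3) | (b >> 2));
        out.a = 255;
    } else {
        /* 1rrrraggggabbbba: 4 bits per channel, alpha bits interleaved. */
        unsigned r = (px >> 11) & 15, g = (px >> 6) & 15, b = (px >> 1) & 15;
        /* Assumed alpha bit order: bit 10 = MSB, bit 5, bit 0 = LSB. */
        unsigned a = (((px >> 10) & 1u) << 2) | (((px >> 5) & 1u) << 1) | (px & 1u);
        out.r = (uint8_t)(r * 17);   /* 0..15 -> 0..255 */
        out.g = (uint8_t)(g * 17);
        out.b = (uint8_t)(b * 17);
        out.a = (uint8_t)((a << 5) | (a << 2) | (a >> 1));  /* 0..7 -> 0..255 */
    }
    return out;
}
```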
For 16 and 256 color, a variant is supported with a transparent color.
Generally the high intensity magenta is reused as the transparent color. This is encoded in the color palette (if all colors apart from one have the alpha bits set to FF, and one color has 00, then that color is assumed to be a transparent color).
CRAM 2 bpp: Uses a limited form of the 8-bit CRAM format:
   16 bits, 4x4 pixels, 1 bit per pixel
   2x 8 bits: Color Endpoints
The rest of the format being unsupported, so it can simply assume a fixed 32-bits per 4x4 pixel cell.
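A fixed 32-bit cell like this is simple to unpack. The sketch below assumes a byte order (selector bits first, little-endian, then the two endpoints) and a bit-to-pixel mapping (bit 0 is the first pixel, row-major, 1 selects the second endpoint); none of that is specified above, so treat those details as placeholders:

```c
#include <stdint.h>

/* Decode one 4x4 cell (4 bytes) into 16 palette indices, row-major.
 * Assumed layout (not confirmed by the post):
 *   cell[0..1]: 16 selector bits, little-endian, bit 0 = first pixel
 *   cell[2]:    endpoint used where the selector bit is 0
 *   cell[3]:    endpoint used where the selector bit is 1 */
void cram2_decode_cell(const uint8_t cell[4], uint8_t out[16])
{
    unsigned bits = cell[0] | ((unsigned)cell[1] << 8);
    for (int i = 0; i < 16; i++)
        out[i] = ((bits >> i) & 1u) ? cell[3] : cell[2];
}
```

At 4 bytes per 16 pixels this comes to the 2 bpp mentioned, before counting the palette.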

There being cases where one may want this...
If an image doesn't have more than 2 colors per 4x4 cell, it may give an acceptable image (and is often less space than 16-color).
Though, for small images, 16 color may use less space due to a smaller color palette (but, in theory, could add a special case to allow omitting the color palette when it is the default palette).
Say:
biBitCount=8, biClrUsed=0, biClrImportant=256
Encoding a special "palette is absent, use fixed OS palette" case.
As the BMP format burns 1K just to encode a 256-color palette.

*2: The UPIC format is structurally similar to JPEG, but:
   Uses TLV packaging (vs FF-escape tagging);
   Uses Rice coding (vs Huffman)
   Uses Z3.V5 VLC, vs Z4.V4
   Uses Block-Haar and RCT
     Vs DCT and YCbCr.
   Supports an alpha channel.
     Y    1       (*2A)
     YA   1:1     (*2A)
     YUV 4:2:0
     YUV 4:4:4   (*2A)
     YUVA 4:2:0:4
     YUVA 4:4:4:4 (*2A)
   *2A: May be used in the lossless modes, depending on image.
VLC coding resembles Deflate's match distance encoding, with sign-folded values. Runs of zero coefficients have a shorter limit, but similar. Like with JPEG, an 0x00 symbol encodes an early EOB.

Also, UPIC is a custom format.
Add context:
Actually, it is using an entropy coding scheme I call STF+AdRice:
   Swap towards front, with Adaptive Rice Coding.
The Rice coding parameter (k) is adapted based on Q:
   0: k--;
   1: no change;
   2..7: k++
   8: k++; Symbol index encoded as a raw 8 bits.
Symbols are encoded as indices into a table. Whenever an index is encoded, the symbol swaps places with the symbol at (I*15)/16, causing more commonly used symbols to migrate towards 0.
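A minimal sketch of the modeling side of this (swap-towards-front table plus the k update), leaving out the actual bitstream. The struct and function names are invented for the example. It assumes Q is the length of the Rice unary prefix and that k is clamped to 0..7; neither detail is spelled out above:

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint8_t sym[256];  /* index -> symbol (what the decoder needs) */
    uint8_t idx[256];  /* symbol -> index (what the encoder needs) */
} StfTable;

void stf_init(StfTable *t)
{
    for (int i = 0; i < 256; i++) {
        t->sym[i] = (uint8_t)i;
        t->idx[i] = (uint8_t)i;
    }
}

/* Swap the entry at index i with the entry at (i*15)/16. */
static void stf_swap(StfTable *t, int i)
{
    int j = (i * 15) / 16;
    uint8_t a = t->sym[i], b = t->sym[j];
    t->sym[i] = b; t->sym[j] = a;
    t->idx[a] = (uint8_t)j; t->idx[b] = (uint8_t)i;
}

/* Encoder side: returns the index to be Rice-coded, then updates. */
int stf_encode(StfTable *t, uint8_t symbol)
{
    int i = t->idx[symbol];
    stf_swap(t, i);
    return i;
}

/* Decoder side: maps a decoded index back to its symbol, then updates. */
uint8_t stf_decode(StfTable *t, int i)
{
    assert(i >= 0 && i < 256);
    uint8_t s = t->sym[i];
    stf_swap(t, i);
    return s;
}

/* Adapt k from the unary prefix length q (0..8), per the table above.
 * q == 8 is the escape case, where the index follows as a raw 8 bits.
 * The clamp to 0..7 is an assumption. */
int adrice_adapt(int k, int q)
{
    assert(q >= 0 && q <= 8);
    if (q == 0) { if (k > 0) k--; }
    else if (q >= 2) { if (k < 7) k++; }
    return k;
}
```

Since both sides apply the same swap after every symbol, the encoder and decoder tables stay in sync without any table being stored in the stream.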
Theoretically, the decoding process is more complex than a table-driven static Huffman decoder (as well as worse compression), but:
   Less memory is needed;
   Faster to initialize;
   On average, it is speed competitive.
   Lookup table initialization for static Huffman is expensive;
   Decode speed hindered by high L1 miss rates.
With a 15-bit symbol-length limit, Huffman has a very high L1 miss rate. Generally, to be fast, one needs to impose a 12 or 13 bit symbol length limit, reducing compression, but greatly reducing the number of L1 misses. Though, 12 bits is a lower limit in practice (going much less than this, and Huffman coding becomes ineffective).

In tests, on my main PC:
   Vs JPEG: It is a little faster
     Q/bpp is similar, better/worse depends on image.
       Slightly worse on photos, but "similar".
       Generally somewhat better on artificial images.
   Vs PNG:
     Faster to decode (with less memory overhead);
     Better compression on many images (particularly photo-like).
Note that UPIC was designed to not require any large intermediate buffers, so will decode directly to an RGB555 or RGBA32 output buffer (decoding happens in terms of individual 16x16 pixel macroblocks).
It was designed to be moderately fast and to try to minimize memory overhead for decoding (vs either PNG or JPEG, which need a more significant chunk of working memory to decode).
Block-Haar is a Haar transform made to fit the same 8x8 pixel blocks as DCT, where Haar maps (A,B)->(C,D):
   C=(A+B)/2 (*: X/2 here being defined as (X>>1))
   D=A-B
But, can be reversed exactly, IIRC:
   B=C-(D/2)
   A=B+D
By doing multiple stages of Haar transform, one can build an 8-pixel version, and then use horizontal and vertical transforms for an 8x8 block. It is computationally fairly cheap, and lossless.
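The exact-reversibility claim checks out: since A+B = 2B+D, flooring gives C = B+(D>>1), so B = C-(D>>1). A small sketch of the pair transform and one way to stack it into an 8-point version follows. The stage layout used here (averages to the low half, differences to the high half) is the usual Haar pyramid and is an assumption; the ordering UPIC actually uses may differ. floor_half stands in for X>>1 so the result does not depend on how the compiler shifts negative values:

```c
#include <assert.h>

/* floor(x / 2), i.e. what an arithmetic x >> 1 gives. */
static int floor_half(int x) { return (x - (x < 0 ? 1 : 0)) / 2; }

void haar_fwd(int a, int b, int *c, int *d)
{
    *c = floor_half(a + b);
    *d = a - b;
}

void haar_inv(int c, int d, int *a, int *b)
{
    *b = c - floor_half(d);
    *a = *b + d;
}

/* 8-point forward transform, in place: 3 stages over 8, 4, 2 values. */
void haar8_fwd(int v[8])
{
    int tmp[8];
    for (int n = 8; n > 1; n /= 2) {
        for (int i = 0; i < n / 2; i++)
            haar_fwd(v[2 * i], v[2 * i + 1], &tmp[i], &tmp[n / 2 + i]);
        for (int i = 0; i < n; i++)
            v[i] = tmp[i];
    }
}

/* Inverse: undo the stages in the opposite order. */
void haar8_inv(int v[8])
{
    int tmp[8];
    for (int n = 2; n <= 8; n *= 2) {
        for (int i = 0; i < n / 2; i++)
            haar_inv(v[i], v[n / 2 + i], &tmp[2 * i], &tmp[2 * i + 1]);
        for (int i = 0; i < n; i++)
            v[i] = tmp[i];
    }
}
```

For an 8x8 block, one would run this over each row and then each column (and the reverse for decoding).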
The Walsh-Hadamard transform can give similar properties, but generally involves a few extra steps that make it more computationally expensive.
It is possible to use a lifting transform to make a Reversible DCT, but it is slow...

Also, the code-size footprint for UPIC is smaller than a JPEG decoder.

BGBCC accepts JPEG and PNG for input and can convert them to BMP/QOI/UPIC as needed.
For audio storage, generally using the RIFF WAV format. For bulk audio, both A-Law and IMA ADPCM work OK. Granted, IMA ADPCM is not space efficient for stereo, but mostly OK for mono (most common use-case for sound effects).

This isn't used much yet in this project.
In general, for other cases where I use audio, 16kHz is a typical default.
Where:
   8 and 11 kHz sound poor.
   Also 8-bit linear PCM sounds poor.
I am less a fan of MP3:
   Very complex decoder;
   Much under 96 or 128 kbps, has very obvious audio distortions...
   At lower bitrates, the audio quality is decidedly unpleasant.
   IMHO: 16 kHz ADPCM sounds better than 64 kbps MP3.
Not sure why it is so popular, when, as noted, at lower bitrates it sounds pretty broken (but, then again, it mostly sounds fine at 128 kbps or beyond, so dunno).
ADPCM's property of sounding tinny is still preferable to sounding like one is rattling a steel can full of broken glass, IMHO.
Did experimentally create an MP3-like audio codec (but much simpler), also using Block-Haar (rather than MDCT), and reused some amount of code from UPIC, which seems to avoid some of MP3's more obvious artifacts. But, the design did have a few of its own issues (might need to revisit later).
Mostly, it uses a half-cubic spline to approximate the low-frequency components (and try to reduce blocking artifacts; the spline is subtracted out so only higher frequency components use the Block-Haar), but seemingly the spline was too coarse (one sample per block), and I would likely need a higher effective sampling rate for the spline to avoid blocking artifacts in some cases (mostly, with sounds at roughly the same frequency as the block size effectively resulting in square waves, which sound bad).

I did exactly that at a period when my generated DLLs were buggy for some reason (it turned out to be two reasons). I created a simple dynamic library format of my own. Then I found the same format worked also for executables.
>
But I needed a loader program to run them, as Windows obviously didn't understand the format. Such a program can be written in 800 lines of C, and can dynamically load libraries in both my format, and proper DLLs (not the buggy ones I generated!).
>
A hello-world program is under 300 bytes compared with 2 or 2.5KB of EXE. And the format is portable to Linux, so no need to generate ELF (but I haven't tried). Plus the format might be transparent to AV software (haven't tried that either).
>
OK.
By design, my PEL format (PE+LZ) isn't going to get under 2K (1K for headers, 1K for LZ'ed sections).
But, usually this is not a problem.
