On 3/30/2025 3:14 PM, MitchAlsup1 wrote:
On Sun, 30 Mar 2025 17:47:59 +0000, BGB wrote:
On 3/30/2025 7:16 AM, Robert Finch wrote:
Just got to thinking about stack canaries. I was going to have a special
purpose register holding the canary value for testing while the program
was running. But I just realized today that it may not be needed. Canary
values could be handled by the program loader as constants, eliminating
the need for a register. Since the value is not changing while the
program is running, it could easily be a constant. This may require a
fixup record handled by the assembler / linker to indicate to the loader
to place a canary value.
>
Prolog code would just store an immediate to the stack. On return a TRAP
instruction could check for the immediate value and trap if not present.
But the process seems to require assembler / linker support.
>
>
They are mostly just a normal compiler feature IME:
Prolog stores the value;
Epilog loads it and verifies that the value is intact.
Agreed.
Using a magic number
Remove excess words.
It is possible that the magic number could have been generated by the CPU itself, or specified on the command-line by the user, or, ...
Rather than, say, the compiler coming up with a magic number for each function (say, based on a hash function or "rand()" or something).
Nothing fancy needed in the assembler or linker stages.
They remain blissfully ignorant--at most they generate the magic
number, possibly at random, possibly per link-module.
Yes.
In my case, canary behavior is one of:
Use them in functions with arrays or similar (default);
Use them everywhere (optional);
Disable them entirely (also optional).
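To illustrate (a minimal sketch, not BGBCC's actual codegen; the __stack_chk_fail name is borrowed from the usual GCC/Clang convention), the compiler-inserted checks boil down to roughly:

extern void __stack_chk_fail(void);   /* trap / abort handler */

#define STACK_CANARY 0x7C3A    /* 16-bit magic, e.g. fixed per module */

void some_function(void)
{
    /* prolog: store the magic; in generated code it sits between the
     * local buffers and the saved return state */
    volatile unsigned short canary = STACK_CANARY;
    char buf[64];

    /* ... function body that might overflow buf ... */
    (void)buf;

    /* epilog: verify the magic before returning */
    if (canary != STACK_CANARY)
        __stack_chk_fail();
}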
>
In my case, it is only checking 16-bit magic numbers, but mostly because
a 16-bit constant is cheaper to load into a register in this case
(single 32-bit instruction, vs a larger encoding needed for larger
values).
>
....
( Well, anyways, going off on a tangent here... )
Meanwhile, in my own goings on... It took way too much effort to figure out the specific quirks in the RIFF/WAVE headers needed to get Audacity to accept IMA-ADPCM output from BGBCC's resource converter.
It was like:
Media Player Classic: Yeah, fine.
VLC Media Player: Yeah, fine.
Audacity: "I have no idea what this is...".
Turns out Audacity is not happy unless:
The 'fmt ' chunk is 20 bytes, with cbSize set to 2
and an additional 16-bit member giving the samples per block;
And there is a 'fact' chunk, giving the overall length of the WAV in samples.
Pretty much everything else accepted the 16-byte PCMWAVEFORMAT with no 'fact' chunk (calculating the samples per block from nBlockAlign).
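For reference, a sketch of the chunk layout that seems to make Audacity happy (field names follow the usual Microsoft conventions; 0x0011 is the IMA/DVI ADPCM format tag):

#include <stdint.h>

#pragma pack(push, 1)
typedef struct {
    uint16_t wFormatTag;        /* 0x0011 = IMA/DVI ADPCM */
    uint16_t nChannels;
    uint32_t nSamplesPerSec;
    uint32_t nAvgBytesPerSec;
    uint16_t nBlockAlign;       /* bytes per ADPCM block */
    uint16_t wBitsPerSample;    /* 4, or 2 for the 2-bit variant */
    uint16_t cbSize;            /* 2: two extension bytes follow */
    uint16_t wSamplesPerBlock;  /* samples decoded from each block */
} ImaAdpcmFmt;                  /* 20-byte 'fmt ' payload, as Audacity wants */

typedef struct {
    char     ckID[4];           /* "fact" */
    uint32_t ckSize;            /* 4 */
    uint32_t dwSampleLength;    /* total length of the file, in samples */
} FactChunk;
#pragma pack(pop)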
...
Though, in this case, I am mostly poking at stuff for "Resource WADs", typically images/etc that are intended to be hidden inside EXE or DLL files (where size matters more than quality, and any sound effects are likely to be limited to under 1 second).
Say one has a sound effect that is:
0.5 seconds;
8kHz;
2 bits/sample.
This is roughly 1kB of audio data (0.5 * 8000 * 2 bits = 1000 bytes).
I also defined a 2-bit ADPCM variant (ADLQ), and ended up using a simplified custom header for it (similar in structure to the BMP header), since the full RIFF format adds unnecessary overhead (though the savings here are debatable).
Say:
Full RIFF in this case:
60 bytes of header.
Simplified format:
32 bytes of header.
So, saving roughly 28 bytes of overhead vs RIFF/WAVE.
Though the saving drops to around 12 bytes if the 'fact' chunk is omitted
and the 16-byte PCMWAVEFORMAT structure is used instead of WAVEFORMATEX.
While theoretically 2-bit IMA ADPCM already exists for WAV, seemingly not much supports it. I also implemented support for this, as it does at least "exist in the wild".
As for the 2-bit version of IMA ADPCM:
Media Player Classic: Opens it and shows the correct length,
but sounds broken.
Sounds like it is trying to play it with the 4-bit decoder.
VLC Media Player:
Basically works, though the progress bar and time display are wonky.
Does figure out mostly the correct length, at least.
Audacity: Claims to not understand it.
I had discovered the "adpcm-xq" library, and looked at this as a reference for the 2-bit IMA format. Since VLC plays it, I will assume my code is probably generating "mostly correct" output (at least WRT the 2b ADPCM part; possible wonk may remain in the WAVEFORMATEX header, and/or VLC is just a little buggy here).
So, thus far:
ADLQ:
Slightly higher quality;
Needs a slightly more complicated encoder for good results;
Decoder needs to ensure values don't go out of range.
Software support: Basically non-existent.
Could in theory allow a cheap-ish hardware decoder.
2-bit IMA ADPCM:
Slightly simpler encoder;
More work is needed on the decoder side;
Requires a multiply and range clamping.
Slightly worse audio quality ATM.
Around 0.8% bigger for mono due to header differences.
Block Headers:
ADLQ:
( 7: 0): Initial Sample, A-Law
(11: 8): Initial Step Index
( 12): Interpolation Hint
(15:13): Block Size (Log2)
IMA, 2b:
(15: 0): Initial Sample, PCM16
(23:16): Step Index
(31:24): Zero
ADLQ is 1016 samples in 256 bytes, IMA is 1008 (a 256-byte block is 2048 bits; minus ADLQ's 16-bit header that is 1016 2-bit samples, vs 1008 with IMA's 32-bit header).
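A rough sketch of unpacking the two block headers as laid out above (alaw_decode() is a placeholder for an A-Law expander):

#include <stdint.h>

extern int alaw_decode(uint8_t alaw);   /* placeholder: A-Law -> PCM16 */

/* ADLQ: 16-bit block header */
static void adlq_parse_header(uint16_t hdr, int *init_sample,
                              int *step_idx, int *interp_hint, int *blksz_log2)
{
    *init_sample = alaw_decode(hdr & 0xFF);  /* ( 7: 0) initial sample, A-Law */
    *step_idx    = (hdr >>  8) & 0x0F;       /* (11: 8) initial step index    */
    *interp_hint = (hdr >> 12) & 0x01;       /* (   12) interpolation hint    */
    *blksz_log2  = (hdr >> 13) & 0x07;       /* (15:13) block size (log2)     */
}

/* 2-bit IMA: 32-bit block header */
static void ima2b_parse_header(uint32_t hdr, int *init_sample, int *step_idx)
{
    *init_sample = (int16_t)(hdr & 0xFFFF);  /* (15: 0) initial sample, PCM16 */
    *step_idx    = (hdr >> 16) & 0xFF;       /* (23:16) step index            */
                                             /* (31:24) zero                  */
}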
Sample Format is common:
00: Small Positive
01: Large Positive
10: Small Negative
11: Large Negative
Both have a scale-ratio of 1 or 3 (if normalized).
ADLQ has a narrower range of steps, with stepping of -1/+1.
Each step in ADLQ is 1/2 bit, so each 2 steps is a power of 2.
So, curve of around 1.414214**n
IMA has more steps, with a per-sample step of -1/+2.
Doesn't map cleanly to power of 2,
but around 8 steps per power of 2.
Seems to be built around a curve of 1.1**n.
But, more aggressive stepping makes sense with 2-bit samples IMO...
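As an illustrative sketch of the common 2-bit decode step (the real step tables and exact scaling are not reproduced here; this just shows the 1:3 magnitude ratio and the differing index updates):

/* 'steps' is the step table (nsteps entries); 'ima_like' selects the
 * -1/+2 index stepping and the output clamping of the 2-bit IMA variant,
 * vs the -1/+1 stepping of ADLQ (which leaves clamping to the encoder). */
static int decode_2bit_sample(int code, int *pred, int *idx,
                              const int *steps, int nsteps, int ima_like)
{
    int step = steps[*idx];
    int mag  = (code & 1) ? 3 * step : step;     /* small : large = 1 : 3 */
    int next = (code & 2) ? *pred - mag : *pred + mag;

    *idx += (code & 1) ? (ima_like ? 2 : 1) : -1;
    if (*idx < 0)       *idx = 0;
    if (*idx >= nsteps) *idx = nsteps - 1;

    if (ima_like) {                              /* clamp to 16-bit PCM */
        if (next < -32768) next = -32768;
        if (next >  32767) next =  32767;
    }
    *pred = next;
    return next;
}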
I went with not doing any range clamping in the decoder, so the encoder is responsible for ensuring values don't go out of range. This increases encoder complexity somewhat (it needs to evaluate possible paths multiple samples in advance, to make sure the chosen path can't go out of range).
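A very rough sketch of that look-ahead idea (the step table and index rule below are placeholders loosely following the descriptions above, not the real ADLQ tables): for each candidate code, simulate the non-clamping decoder a few samples ahead, and reject any path whose predictor would leave 16-bit range.

#define LOOKAHEAD 4
#define BAD_PATH  (1LL << 62)

static const int adlq_steps[16] =   /* placeholder, roughly 1.414**n */
    { 4, 6, 8, 11, 16, 23, 32, 45, 64, 91, 128, 181, 256, 362, 512, 724 };

/* Returns the best accumulated squared error reachable from this state,
 * or BAD_PATH if every continuation eventually clips.  Typical use:
 *   int code; best_path(src, n, pred, idx, LOOKAHEAD, &code); */
static long long best_path(const short *src, int n,
                           int pred, int idx, int depth, int *out_code)
{
    if (depth == 0 || n == 0)
        return 0;

    long long best = BAD_PATH;
    for (int code = 0; code < 4; code++) {
        int mag  = (code & 1) ? 3 * adlq_steps[idx] : adlq_steps[idx];
        int next = (code & 2) ? pred - mag : pred + mag;

        if (next < -32768 || next > 32767)
            continue;                            /* would clip in the decoder */

        int nidx = idx + ((code & 1) ? 1 : -1);  /* ADLQ-style -1/+1 stepping */
        if (nidx < 0)  nidx = 0;
        if (nidx > 15) nidx = 15;

        long long err  = (long long)(src[0] - next) * (src[0] - next);
        long long tail = best_path(src + 1, n - 1, next, nidx, depth - 1, 0);
        if (tail >= BAD_PATH)
            continue;                            /* dead end further along */

        if (err + tail < best) {
            best = err + tail;
            if (out_code) *out_code = code;
        }
    }
    return best;
}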
Potentially, 1/4-bit step with -1/+2 could have made sense. Would need a 5-bit index though to have enough dynamic range.
Both use a different strategy for stereo:
ADLQ:
Splits center and side, encoding side at 1/4 sample rate;
So, stereo increases bitrate by 25%.
2b IMA:
Encodes both the left and right channel independently.
So, stereo doubles the bitrate.
As for why 2b:
Where one cares more about size than audio quality...
8kHz : 16 kbps
11kHz: 24 kbps
16kHz: 32 kbps
Also IMHO, 16kHz at 2 bits/sample sounds better than 8kHz at 4 bits/sample.
At least speech is mostly still intelligible at 16 kHz.
Basic sound effects still mostly work at 8kHz though.
Like, if one needs a ding or chime or similar.
Not really any good/obvious way here to reach or go below 1 bit/sample while still preserving passable quality (2 bits/sample is the practical lower limit for ADPCM; the only real way to go lower would be to match blocks of 4 or 8 samples against a pattern table).
Had previously been making some use of A-Law, but as noted, A-Law requires 8 bits per sample.
Though, ending up back at poking around with ADPCM is similar territory to my projects from a decade ago...
But, OTOH: ADPCM is/was an effective format for sound effects, even if not given much credit (and seemingly most people see it as obsolescent).
As for image formats, I have a few options for low bpp, while also being cheap to decode:
BMP+CRAM: 4x4x1, limited variant of CRAM ("MS Video 1")
Roughly 2 bpp (and repurposed as a graphics format...).
BMP+CQ: 8x8x1, similar design to CRAM.
Roughly 1.25 bpp
These can work well for images with no more than 2 colors per 4x4 or 8x8 pixel block (otherwise, YMMV). As it happens, lots of UI graphics fit this pattern, and/or are essentially monochrome. CQ can deal well with monochrome or almost-monochrome graphics without too much space overhead.
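As a sketch of why the numbers work out this way (the exact bit/byte order in BGBCC's variants is an assumption here): a 4x4x1 block is presumably a 16-bit pixel mask plus two 8-bit palette indices, so 32 bits for 16 pixels = 2 bpp; the 8x8x1 CQ case would be a 64-bit mask plus two indices, 80 bits for 64 pixels = 1.25 bpp.

#include <stdint.h>

/* Decode one hypothetical 4x4x1 block (4 bytes) into an 8bpp indexed image. */
static void decode_block_4x4x1(const uint8_t *blk, uint8_t *dst, int dst_stride)
{
    uint16_t mask = blk[0] | (blk[1] << 8);  /* 1 bit per pixel, 16 pixels */
    uint8_t  c0   = blk[2];                  /* palette index for '0' bits */
    uint8_t  c1   = blk[3];                  /* palette index for '1' bits */

    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++)
            dst[y * dst_stride + x] = (mask & (1u << (y * 4 + x))) ? c1 : c0;
}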
Though, in some other cases, monochrome or 4-color images could be a better fit. These default to black/white or black/white/cyan/magenta, but don't necessarily need to be limited to this (but, may need to add options in BGBCC for 2/4/16 color dynamic-palette).
Say, for example, if an image is only black/white/red/blue or similar, 4-color could make sense (vs using CRAM or CQ and picking from the 256 color palette; but not being able to have different sets of colors in close proximity). Often, 16-color works, but 16-color is rather bulky if compared with CRAM or CQ.
For the CRAM and CQ formats, I ended up adding an option by which the color palette can be skipped (it is replaced by a palette hash value; OS can use the color palette associated with the corresponding hash number).
Mostly this was because, say, for 32x32 or 64x64 CRAM images, the 256-color palette (on the order of 1K) was bigger than the image itself (a 32x32 CRAM image is roughly 256 bytes).
Note that much below 32x32, it is more compact to use hi-color BMP images than 256-color due to the palette overhead (which makes the optional palette omission desirable for small images).
Though, generally, these are generated with BGBCC, which can include the palette in the generated resource WAD (though the best format is TBD). For the kernel, it is stored as a 256x256 indexed-color bitmap (which also encodes a set of dither-aware RGB555 lookup tables).
For normal EXE/DLL files, the palette could be stored either as a dummy 16x16 256-color image, or more compactly as a 16x16 hi-color image (with no dither table), since it may make sense for EXEs/DLLs to use a different default color palette than the OS kernel.
Note that neither PNG, JPEG, nor even QOI, are a good fit for these use cases. Wonky BMP variants are a better fit.
For SDF font images, I had also used BMP, say a 256x256 8bpp image covering CP-1252, with a specialized color palette (X/Y distances are encoded in the pixels). Needed a full 8bpp here, as CRAM doesn't work for this.
PNG compresses them, but the overhead is too high, and QOI is not so effective for this scenario. Though, as 8bpp images, they do LZ compress pretty OK.
But, would not be reasonable to specially address every scenario.
...