On 5/20/2024 2:58 PM, MitchAlsup1 wrote:
BGB wrote:
On 5/20/2024 7:36 AM, Michael S wrote:
>
For subnormal x subnormal you don't need result of multiplication at
all. All you need to know is if it's zero or not and what sign.
Even that is needed only in non-default rounding modes and for inexact
flag in default mode.
>
For most non-tiny formats, the advantage of subnormal numbers seems small in any case.
There is, it is called Posit (or UNUM depending).
No subnormals, wider range than IEEE, more precision than IEEE (most of the time). Whether it is better overall is still a matter of debate. It is harder to implement than IEEE but just barely.
Not interchange compatible though, so that is a drawback.
Granted, might make sense for small-format converter ops, with the working data in a larger format.
Granted, they are not necessarily the option one would go with if one wanted the "cheapest possible FPU that is still good enough to be usable".
Though, the point at which an FPU manages to suck badly enough that one needs to resort to software emulation to make software work, is probably a lower limit.
Luckily, "uses 754 formats, but with aggressive cost cutting" can be "good enough", and so long as they more-or-less deliver a full width mantissa, and can exactly compute exact-value calculations, most software is generally going to work.
But OTOH, if 1.0+2.0 gives 2.999999, that is not good enough, so there is a lower limit here.
But, yeah, in any case I would almost prefer if there could be a separate/cheaper standard, probably mostly aimed at embedded/microcontroller style use-cases (rather than "general purpose"), and would likely relax the requirements a fair bit.
Say, a likely target might be:
FADD/FSUB/FMUL;
Binary16 and Binary32 as high-priority formats;
Binary64 as optional (but nice to have);
Probably DAZ/FTZ;
Potentially allow for truncate-only rounding.
Assumption being that larger or higher precision cases would fall back to software emulation.
Though, truncate-only has pros and cons: while it is easier to make deterministic, it is also prone to numerical drift.
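As a rough illustration of the drift, here is a minimal C sketch (not modeled on any particular FPU). It emulates truncate-only Binary32 addition by doing the add in double, where the sum of two floats of similar magnitude is exact, and then chopping the result toward zero. The helper names trunc_to_float, sum_truncate and sum_nearest are made up for this example, and it assumes the host converts double to float with the default round-to-nearest mode.

```c
#include <stdint.h>
#include <string.h>

/* Round a double to Binary32 toward zero (truncate), emulated in software. */
static float trunc_to_float(double d)
{
    float f = (float)d;                       /* host conversion: round to nearest (assumed) */
    double mag_f = (f < 0) ? -(double)f : (double)f;
    double mag_d = (d < 0) ? -d : d;

    if (mag_f > mag_d) {                      /* rounded away from zero: step back one ulp */
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);
        bits -= 1;                            /* decrementing the magnitude bits moves toward zero */
        memcpy(&f, &bits, sizeof f);
    }
    return f;
}

/* Running sum with round-to-nearest after every add. */
static float sum_nearest(float x, int n)
{
    float s = 0.0f;
    for (int i = 0; i < n; i++)
        s = (float)((double)s + (double)x);
    return s;
}

/* Running sum with truncation after every add. */
static float sum_truncate(float x, int n)
{
    float s = 0.0f;
    for (int i = 0; i < n; i++)
        s = trunc_to_float((double)s + (double)x);
    return s;
}
```

Calling sum_truncate(0.1f, 1000000) and sum_nearest(0.1f, 1000000) side by side shows the effect: every truncated step errs in the same direction, so for positive inputs the truncating sum can only fall behind the exact total.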
Apparently it was popular (along with DAZ+FTZ) on the N64, as its FPU would operate entirely in hardware if set to DAZ+FTZ with truncate, but (if it was enabled) would invoke emulation traps to deal with subnormals and rounding.
Then again, this is still a possibility: could add a "Full IEEE" flag, where, say:
Denormal inputs, underflow, or rounding carry propagation outside the low 8 bits, could be dealt with by a trap.
Then have programmers probably leave it disabled, as it isn't likely to be worth the overhead.
Could optionally have some 8-bit FP formats, but 8-bit FP is a little bit too limited for general-purpose use.
Likely main candidates being:
S.E4.F3 (Bias=7)
S.E3.F4 (Bias=7|8, ~ Unit Range)
More or less A-Law without the XOR.
Though, A-Law can also be interpreted as a ~ 12 bit integer value.
Annoyingly, exact bias depends on context for this one
(eg: 8/7/3/0)...
I had also used:
E4.F4
E4.F3.S
But, this is wonky (and the possible merit of E4.F3.S is defeated once one also needs S.E4.F3 or S.E3.F4, as these are the "actually used in the wild" formats, so may have been a mistake).
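For concreteness, a minimal C sketch of what a converter op for the S.E4.F3 (Bias=7) layout might compute when widening to Binary32. The function name is hypothetical, and two details are assumptions rather than anything stated above: an exponent field of 0 decodes as zero (DAZ-style), and all other exponent codes are ordinary finite values (no Inf/NaN).

```c
#include <stdint.h>
#include <string.h>

/* Decode an 8-bit S.E4.F3 microfloat (bias 7) to Binary32.
 * Assumed: exponent field 0 means zero; there are no Inf/NaN encodings. */
static float fp8_s_e4_f3_to_float(uint8_t v)
{
    uint32_t sign = (uint32_t)(v >> 7) & 1u;
    uint32_t exp  = (uint32_t)(v >> 3) & 15u;
    uint32_t frac = (uint32_t)v & 7u;
    uint32_t bits = sign << 31;
    float out;

    if (exp != 0) {
        bits |= (exp - 7u + 127u) << 23;   /* rebias from 7 to Binary32's 127 */
        bits |= frac << 20;                /* 3 fraction bits go at the top of the 23 */
    }
    memcpy(&out, &bits, sizeof out);       /* reinterpret the bit pattern as a float */
    return out;
}
```

Under these assumptions 0x38 decodes to 1.0 and the largest magnitude is 0x7F, i.e. 1.875 * 2^8 = 480. Widening is just a rebias and a shift, which is why this sort of format is cheap to support as a converter op.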
I spent some of my youth trying to push against immovable objects
(i.e., standards). Don't do it; it is a waste of effort and time,
similar to putting lipstick on a pig.
It is debatable...
On the FP8 side of things, I went and added the wonky formats, but then mostly ended up just (ab)using the A-Law format instead, mostly as it is "slightly less useless" (slightly more accurate, also used a fair bit for audio processing, *, ...).
*: Though, within "RIFF WAVE" files, it is typically stored XOR'ed with 0x55, but this is easy enough to resolve (the version used by my audio hardware does not use the XOR).
But, A-Law does suffer from a lack of dynamic range.
If anything, there seems to be a need for an "Add/Subtract bias from SIMD exponents" instruction:
PADJEXP.H Rm, Imm5s, Rn //Add Imm5s to 4x Binary16 exponents
Say, the audio pathway goes PCM16 -> Binary16 -> A-Law (or, alternatively, A-Law -> Binary16 -> PCM16 for WAV loading) as this was generally the fastest way to get between them.
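A software model of what such an instruction might do, as a hedged C sketch. The semantics here are assumptions, since only the mnemonic and the one-line description are given above: lanes with exponent 0 or 31 pass through unchanged, underflow flushes to signed zero, and overflow clamps to the largest finite value. The function name padjexp_h is likewise just for illustration.

```c
#include <stdint.h>

/* Hypothetical model of "PADJEXP.H Rm, Imm5s, Rn":
 * add a signed 5-bit immediate to the exponent of 4 packed Binary16 lanes. */
static uint64_t padjexp_h(uint64_t rm, int imm5s)
{
    uint64_t rn = 0;

    for (int lane = 0; lane < 4; lane++) {
        uint16_t h    = (uint16_t)(rm >> (16 * lane));
        uint16_t sign = (uint16_t)(h & 0x8000u);
        int      exp  = (h >> 10) & 31;
        uint16_t out;

        if (exp == 0 || exp == 31) {
            out = h;                              /* zero/subnormal or Inf/NaN: unchanged (assumed) */
        } else {
            exp += imm5s;
            if (exp <= 0)
                out = sign;                       /* underflow: signed zero (assumed) */
            else if (exp >= 31)
                out = (uint16_t)(sign | 0x7BFFu); /* overflow: largest finite (assumed) */
            else
                out = (uint16_t)(sign | (exp << 10) | (h & 0x03FFu));
        }
        rn |= (uint64_t)out << (16 * lane);
    }
    return rn;
}
```

For example, padjexp_h(x, 1) doubles each nonzero finite lane and padjexp_h(x, -1) halves it, which is the sort of rescaling needed to fit data into a format with as little dynamic range as A-Law.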
Well, also using A-Law in some of the AVI tests as:
Faster to deal with than ADPCM;
Less bulky than 16-bit PCM;
Better audio quality than 8-bit PCM.
Also, unlike Mu-Law, it is effectively an otherwise straightforward microfloat format.
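To show the microfloat reading, a minimal C sketch of decoding A-Law to PCM16, following the usual G.711 reference decoder. The function names are made up for this example. The sign convention (bit 7 set means positive after undoing the XOR) is the G.711 one and is an assumption here; it may not match the un-XORed hardware variant mentioned above.

```c
#include <stdint.h>

/* Undo the 0x55 XOR applied to A-Law bytes as stored in RIFF WAVE files. */
static uint8_t alaw_wav_to_raw(uint8_t b)
{
    return (uint8_t)(b ^ 0x55u);
}

/* Decode a raw (un-XORed) A-Law byte, read as S.E3.F4, to 16-bit PCM. */
static int16_t alaw_raw_to_pcm16(uint8_t a)
{
    int exp  = (a >> 4) & 7;
    int frac = a & 15;
    int mag  = (frac << 4) + 8;             /* fraction plus a half-step offset */

    if (exp != 0)                           /* exponent 0 is subnormal-like: no hidden bit */
        mag = (mag + 0x100) << (exp - 1);   /* add the hidden bit, then scale */

    return (int16_t)((a & 0x80) ? mag : -mag);  /* bit 7 set = positive (G.711 convention) */
}
```

A WAV-file byte is decoded as alaw_raw_to_pcm16(alaw_wav_to_raw(b)). The magnitudes run from 8 to 32256, so the ratio between the largest and smallest steps is only about 2^12, which is the lack of dynamic range noted earlier.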
Similarly, I guess one could argue for using ELF (and RV64G) because it is more standard than a "hacked to crap PE/COFF" variant and a custom ISA. But all the pain I have been dealing with recently, trying to get RV64G PIE binaries generated and then trying to figure out how to get my ELF loader working with them, doesn't really lead to warm fuzzies...
So, PE/COFF is flatter:
Up-front the static-image case is slightly more involved than with ELF;
But, the base relocs and DLL import/export mechanism are less pain to deal with than full relocation and symbol tables.
But, still don't have stuff working, and don't know if the relocs are mostly correct (for example: the R_RISCV_JUMP_SLOT reloc seemingly does not use the "r_addend" field from what information I can find, but the addend field tends to be non-zero for these relocs, so like, what exactly is going on here? ...).
Though, do at least have it to where the binary manages to print something before crashing (while trying to access an invalid memory address), at least implying that the system call mechanism is seemingly working (and enough is working that it manages a debug "puts()" call).
Still using my own C library. Theoretically, GCC could use GLIBC, but not exactly sure of the specifics of the system call mechanism.
Looking it up, it seems both my stuff and Linux are using the EBREAK instruction, but Linux was using "magic NOP" instructions around the EBREAK to encode the system call number (whereas TestKern was passing/returning values in registers and using normal NOPs). I guess it could be possible to detect and emulate the Linux style syscalls via the magic NOPs (would assume the syscall numbering, etc, is probably the same as for other Linux variants).
Probably faking the Linux syscalls might be an easier option though for trying to get userland software ported (albeit PIC/PIE builds would still be mandated in this case).
Well, assuming at least I manage to get the ELF loader fully working.
...