On 8/30/2024 4:38 AM, Thomas Koenig wrote:
John Dallman <jgd@cix.co.uk> schrieb:
In article <vaqgtl$3526$1@dont-email.me>, cr88192@gmail.com (BGB) wrote:
>
On 8/29/2024 11:23 AM, MitchAlsup1 wrote:
>
With differing instructions, how does a software vendor write
software such that it can run near optimally on any implementation?
>
They presumably target whatever is common, or the least common
denominator (such as RV64G or RV64GC), and settle with "probably
good enough"...
>
ISVs can be proactive or passive about adopting a new ISA.
What is an ISV? I assume "SV" is for "software vendor", but what
does the I stand for?
[...]
For the most part, in a project like this, one could largely ignore that commercial software exists as a thing.
On a PC, one needs to care, since the OS and games and similar are generally not FOSS. The FOSS community tends not to produce "good" games, mostly just lazy clones of more popular commercial games (often built on forks of the Quake 1/2/3 engines).
FOSS ends up winning for compilers and tools, but Linux has yet to beat Windows in terms of general usability as a main OS (though MS seems to be trying pretty hard right now to convince me to jump ship...).
Variant ISAs create fear, uncertainty and doubt, and that means delay.
ISA promotors fear delay, because their investors will run out of
patience.
Which makes me wonder why companies such as Intel introduce new
instructions all the time. For people who compile their own code
(scientists and engineers) that can be OK, they can just use
-march=native (or equivalent), and it can even make sense to have
architecture-optimized core libraries such as BLAS, or switch on
availability of features such as AVX512 (but that again has many
sub-features and highly different performance characteristics,
depending on the micro-arch).
Feature fragmentation has always been a thing.
Now we have AVX512, which may or may not exist;
Or AVX, which may be fast, or annoyingly slow.
On the Zen+ I am running on my main PC, AVX isn't particularly fast.
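For the "may or may not exist" part, about the best one can do from user code is runtime dispatch. A minimal sketch (my own illustration, not tied to any particular library) using GCC/Clang's __builtin_cpu_supports(); note this only tells you a feature is present, not whether it is actually fast on the given core:

  #include <stdio.h>

  int main(void)
  {
      __builtin_cpu_init();

      if (__builtin_cpu_supports("avx512f"))
          printf("AVX512F present: could select 512-bit kernels\n");
      else if (__builtin_cpu_supports("avx2"))
          printf("AVX2 present: fall back to 256-bit kernels\n");
      else
          printf("baseline x86-64: SSE2 kernels only\n");

      return 0;
  }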
In the past, it was "MMX" vs "SSE" vs "3DNow!", but SSE2 basically "fixed" this mess (though it is still annoying if one sometimes has to use a system that lacks SSE2).
And pretty much any software that actually used "3DNow!" is no longer usable, since Intel never adopted it and AMD dropped it starting with Bulldozer (it is also absent in Zen).
...
The main problem is more when feature fragmentation disrupts the core ISA in a way that hinders people from writing portable software or having a common subset to target.
But say, for example, a person were to take RISC-V and declare that JAL and JALR are only valid when the destination register is ZERO, RA, or X5/T0, with the remaining destination-register encodings reclaimed as more opcode space, etc.
This is unlikely to go over well...
Even though pretty much no software actually uses any of the other destination registers, and spending more than 1 or 2 bits to encode the choice is arguably a waste (the JAL instruction already eats a significant chunk of the 32-bit encoding space).
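For reference, a quick sketch of what I mean about the encoding (my own illustration; the rd-field layout is from the base spec, the "which rd values actually get used" part is just typical compiler output):

  /* JAL is opcode 0x6F with rd in bits [11:7]; in practice compilers
     emit rd = x0 (plain jump), x1/ra (call), or x5/t0 (alternate link),
     so the full 5-bit rd field mostly encodes ~2 bits' worth of cases. */
  #include <stdint.h>
  #include <stdio.h>

  static void classify_jal(uint32_t insn)
  {
      if ((insn & 0x7F) != 0x6F)      /* not a JAL */
          return;

      uint32_t rd = (insn >> 7) & 0x1F;
      if (rd == 0)
          printf("JAL x0  -> plain unconditional jump\n");
      else if (rd == 1)
          printf("JAL ra  -> normal call\n");
      else if (rd == 5)
          printf("JAL t0  -> alternate link register\n");
      else
          printf("JAL x%u -> rarely, if ever, emitted\n", (unsigned)rd);
  }

  int main(void)
  {
      classify_jal(0x008000EF);   /* "jal ra, 8" */
      return 0;
  }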
But standard software (office applications, browsers...) should
just run everywhere, and there it gets hard to justify.
Yeah.
I think in this category, one could just assume basic RV64G or RV64GC or similar.
Just, it still sort of remains an issue that:
This form is ~40% slower than it could be;
Most of the extensions are ignoring the core issues;
They are trying to address pretty much everything else...
...
Seemingly the only "major party" to try to address this issue is Qualcomm, but it appears they were also getting a lot of hostility over the way they went about it (and the general consensus was that RV64GC would remain the baseline).
Similarly, the main authorities on RISC-V seemingly remain opposed to things like register-indexed load/store, or load/store pair, etc...
Yet, they were OK with the 'A' extension adding the ability to do ALU operations directly on memory, ...
The 'Zba' extension helps, partly...
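For example (a rough sketch, not output from any specific compiler version), a plain indexed load like:

  #include <stdint.h>

  int32_t get_elem(const int32_t *arr, long i)
  {
      return arr[i];
  }

  /* Base RV64G typically needs three instructions:
         slli   a1, a1, 2       # scale the index
         add    a1, a0, a1      # form the address
         lw     a0, 0(a1)       # load
     With Zba's sh2add it drops to two:
         sh2add a1, a1, a0
         lw     a0, 0(a1)
     A register-indexed load would be one (hypothetical syntax, not
     part of standard RISC-V):
         lw     a0, a0, a1<<2
  */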
Still sort of annoying that a theoretically open ecosystem needs authority/leaders, vs everyone being free to just do their own thing.
Granted, on the other side, if one just ends up with a bunch of mutually incompatible ISAs, this isn't really ideal either...
Though, as I see it, it could have been possible to allow both SiFive and friends, and the Qualcomm proposal (of dropping C and reusing the space for more 32-bit ops), to coexist:
Just have them as different operating modes.
Like, I am mostly having luck allowing RV64 and BJX2 to coexist in the same CPU.
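Roughly (a hand-wavy sketch with made-up names, not the actual core): a per-context mode flag selects which decoder front-end is used, while the fetch path and register file stay shared:

  #include <stdint.h>
  #include <stdio.h>

  typedef enum { ISA_MODE_NATIVE, ISA_MODE_RV64 } isa_mode_t;

  typedef struct {
      uint64_t   pc;
      isa_mode_t mode;   /* flipped on inter-ISA branches / mode changes */
  } cpu_context_t;

  /* Stub decoders; a real core would produce a decoded-op structure. */
  static void decode_native(uint32_t w) { printf("native decode %08x\n", (unsigned)w); }
  static void decode_rv64  (uint32_t w) { printf("RV64 decode   %08x\n", (unsigned)w); }

  static void step(cpu_context_t *ctx, uint32_t fetched_word)
  {
      if (ctx->mode == ISA_MODE_RV64)
          decode_rv64(fetched_word);
      else
          decode_native(fetched_word);
  }

  int main(void)
  {
      cpu_context_t ctx = { .pc = 0x8000, .mode = ISA_MODE_RV64 };
      step(&ctx, 0x008000EF);   /* decodes as a RISC-V JAL in RV64 mode */
      return 0;
  }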
I was having an issue for a while with the RV64 + Virtual Memory thing, but seem to have figured it out:
It was an issue I had seemingly already encountered and hacked around in my PE loader, but it was showing up for RV64 mostly because the ELF loader lacked the same workaround...
In general, it seems loading a program into virtual memory was unstable. I had added a hack to "memset()" the allocated memory, to make sure the pages were paged in and "committed" before loading the program image into them, which seemed to "fix" whatever was going wrong here.
I remember now that there was an issue with PE where (without the memset) trying to load the PE images would result in a checksum failure. In the case of ELF loading, there is no checksum verification (AFAICT, there is no checksum in ELF).
So, this could potentially be some other sort of virtual memory bug, rather than something specific to RV Mode (or multi-ISA operation).
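The hack itself is simple enough; something like this (illustrative names, not the actual loader code):

  #include <stdint.h>
  #include <stdlib.h>
  #include <string.h>

  void *load_segment(const void *image_data, size_t image_size, size_t mem_size)
  {
      /* stand-in for whatever virtual-memory allocation the loader uses */
      uint8_t *seg = malloc(mem_size);
      if (!seg)
          return NULL;

      /* The workaround: touching every byte via memset() forces all the
         pages to be faulted in / committed before the image is copied,
         rather than copying into freshly allocated, not-yet-faulted pages. */
      memset(seg, 0, mem_size);

      memcpy(seg, image_data, image_size);
      return seg;
  }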
Otherwise, went and implemented support for the Linux system-call mechanism, but still need to fill in a lot of the syscalls (and Linux has several hundred syscalls).
Ended up slightly tweaking the TestKern mechanism to make it easier to tell which was being used.
Say, TestKern mechanism:
A0 = call object (COM-like calls, NULL for plain syscalls)
A1 = syscall / method number
A2 = return value pointer
A3 = argument list pointer
A4..A6 = Unused, set to 0
A7 = -1 (new, to distinguish from Linux mechanism)
And, Linux mechanism:
A0 = first arg / return value
A1 = second arg
A2..A5 = arguments
A6 = unused / zero
A7 = system call number (>=0 for Linux calls)
Where A0..A7 = X10..X17
Where, Linux syscalls are seemingly limited to 6 arguments and behave more like bare C function calls rather than COM-style objects. I would have preferred to leave 0 as meaning the existing mechanism, but it seems 0 is a valid syscall number on Linux ("io_setup", apparently part of AIO).
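In the trap handler, telling them apart then just means looking at A7 first; a rough sketch (hypothetical names, not the actual TestKern code):

  #include <stdint.h>

  #define REG_A0 10            /* A0..A7 are X10..X17 */
  #define REG_A7 17

  typedef struct { uint64_t x[32]; } trapframe_t;

  /* Placeholder back-ends standing in for the real handlers. */
  static void testkern_syscall(void *obj, uint64_t method, void *ret, void *args)
  {
      (void)obj; (void)method; (void)ret; (void)args;
  }
  static uint64_t linux_syscall(uint64_t nr, uint64_t a0, uint64_t a1,
                                uint64_t a2, uint64_t a3, uint64_t a4, uint64_t a5)
  {
      (void)nr; (void)a0; (void)a1; (void)a2; (void)a3; (void)a4; (void)a5;
      return (uint64_t)-38;    /* -ENOSYS for anything not filled in yet */
  }

  void handle_ecall(trapframe_t *tf)
  {
      int64_t a7 = (int64_t)tf->x[REG_A7];

      if (a7 == -1) {
          /* TestKern: A0=call object (NULL for plain syscalls),
             A1=syscall/method number, A2=return-value pointer,
             A3=argument-list pointer */
          testkern_syscall((void *)(uintptr_t)tf->x[REG_A0 + 0], tf->x[REG_A0 + 1],
                           (void *)(uintptr_t)tf->x[REG_A0 + 2],
                           (void *)(uintptr_t)tf->x[REG_A0 + 3]);
      } else if (a7 >= 0) {
          /* Linux-style: number in A7, args in A0..A5, result back in A0 */
          tf->x[REG_A0] = linux_syscall((uint64_t)a7,
                                        tf->x[REG_A0 + 0], tf->x[REG_A0 + 1],
                                        tf->x[REG_A0 + 2], tf->x[REG_A0 + 3],
                                        tf->x[REG_A0 + 4], tf->x[REG_A0 + 5]);
      }
  }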
I guess the next step would be to try to get some "closer to native" binaries working (IOW: ones built using GLIBC or Newlib or similar; not entirely sure which one GCC would be using in this case).
...