On 5/24/2024 7:40 PM, Lawrence D'Oliveiro wrote:
On Fri, 24 May 2024 17:57:35 +0200, David Brown wrote:
Why would anyone want a variable that exists for /all/ threads in a
program, but independently per thread? The only use I can think of is
for errno (which is, IMHO, a horror unto itself) but since that is
defined by the implementation, it does not need to use _Thread_local.
errno is indeed the example that immediately comes to mind for the use of
this feature. It is supposed to have the semantics of an assignable
variable, so how else would you implement it, if not by the _Thread_local
mechanism (or some implementation-specific or special-case equivalent
of it)?
I am in two minds over whether errno is a hack or not. On the one hand, it
makes more sense for system calls (and library ones, too) to return an
error status directly; on the other hand, sometimes maybe you want to
“accumulate” an error status after a series of calls, and errno is a
convenient way of doing this.
As for other uses of thread-local, I think most of them have to do with
optimizations, like threading itself. For example, imagine a bunch of
threads all contributing increments to a common counter: instead of
continually blocking on access to that counter, they could each have their
own thread-local counter, which periodically has its current value added
to the global counter and then zeroed.
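The per-thread counter idea in the quoted text could be sketched roughly as follows. This is only an illustration, assuming C11 atomics and POSIX threads; the names (counter_inc, counter_flush, run_workers) and the FLUSH_EVERY batch size are made up for the example.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define FLUSH_EVERY 1024   /* assumed batch size, not from the post */

static atomic_long global_count;         /* shared total */
static _Thread_local long local_count;   /* one independent copy per thread */

/* Move this thread's pending increments into the shared total. */
static void counter_flush(void)
{
    if (local_count != 0) {
        atomic_fetch_add(&global_count, local_count);
        local_count = 0;
    }
}

/* Cheap increment: usually touches only thread-local storage. */
static void counter_inc(void)
{
    if (++local_count >= FLUSH_EVERY)
        counter_flush();
}

/* Hypothetical worker: performs n increments, then flushes the remainder. */
static void *worker(void *arg)
{
    long n = *(long *)arg;
    for (long i = 0; i < n; i++)
        counter_inc();
    counter_flush();
    return NULL;
}

/* Run up to 16 workers of n increments each and return the shared total. */
static long run_workers(int nthreads, long n)
{
    pthread_t tid[16];
    if (nthreads > 16)
        nthreads = 16;
    atomic_store(&global_count, 0);
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tid[i], NULL, worker, &n);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tid[i], NULL);
    return atomic_load(&global_count);
}
```

The shared counter is only touched once per batch (and once at thread exit), so contention drops roughly in proportion to the batch size, at the cost of the global value lagging behind.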
In my case (a niche/hobby custom ISA project), it is something like:
int *__get_errno(void);
#define errno (*__get_errno())
Where, say, errno is internally mapped to a thread-local variable.
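A minimal sketch of that pattern, with hypothetical names (my_errno, my_get_errno, my_op) so it does not collide with the real <errno.h>, and assuming a compiler that supports C11 _Thread_local:

```c
/* Per-thread storage behind the accessor (hypothetical name). */
static _Thread_local int my_errno_storage;

/* Returns the address of the calling thread's error slot. */
int *my_get_errno(void)
{
    return &my_errno_storage;
}

/* The macro expands to an lvalue, so it can be read and assigned
   like an ordinary variable. */
#define my_errno (*my_get_errno())

/* Hypothetical call that reports failure through my_errno. */
int my_op(int ok)
{
    if (!ok) {
        my_errno = 22;   /* arbitrary error code for the example */
        return -1;
    }
    return 0;
}
```

Since callers only ever see the function call, the storage behind it can be changed (thread-local, per-process, shared via some interface) without touching user code.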
For vaguely related reasons, stdin/stdout/stderr are also implemented with a similar mechanism, albeit without being thread-local.
Though the original purpose was more that this allows sharing these across DLL boundaries in an implementation where:
* Global variables can't be shared directly across DLL boundaries;
* __declspec(dllimport) / __declspec(dllexport) does not work on global variables.
Note that this is more a limitation of my own stuff (Windows proper does allow sharing global variables via the declspecs; but non-shared is the default, at least with MSVC). But MSVC does not generally allow sharing malloc/free pairs across DLL boundaries (at least when each module links its own copy of the C runtime), or "FILE *" pointers, both of which work in my case.
Though in my case the C library linkage works differently:
* The C library is statically linked to the main EXE;
* The DLLs work internally as satellites of the main EXE, sharing relevant parts of the C library via a COM-style interface.
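As a rough illustration of the idea (not the actual interface, which is not shown here), the EXE could hand each DLL a table of function pointers into its own copy of the C library. All names below (CLibVtbl, exe_get_clib, dll_init, dll_make_greeting) are hypothetical, and a real COM-style interface would carry more than two entries:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical table of C-library entry points owned by the main EXE. */
typedef struct CLibVtbl {
    void *(*alloc)(size_t size);
    void  (*release)(void *ptr);
} CLibVtbl;

/* EXE side: the one real allocator, exposed through the table. */
static const CLibVtbl exe_clib = { malloc, free };

const CLibVtbl *exe_get_clib(void)
{
    return &exe_clib;
}

/* DLL side: remembers the table it was handed at load time. */
static const CLibVtbl *dll_clib;

void dll_init(const CLibVtbl *clib)
{
    dll_clib = clib;
}

/* DLL function that allocates through the EXE's allocator, so the
   result can safely be freed on the EXE side. */
char *dll_make_greeting(void)
{
    static const char msg[] = "hello from the DLL";
    char *p = dll_clib->alloc(sizeof msg);
    if (p)
        memcpy(p, msg, sizeof msg);
    return p;
}
```

Because every module ends up calling the same malloc/free, memory allocated in a DLL can be released by the EXE (or another DLL) without heap mismatches.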
While possibly seeming a little backwards, this structure has the advantage of still being usable for bare-metal booting (so it can still be used for kernel modules; whereas having the C library as a DLL would not work for bare-metal booting). Similarly, the use of a COM interface (rather than DLL imports) allows loading DLLs after the fact without mandating that the main binary had been loaded with a DLL-aware loader (the loader I have in the Boot ROM is not aware of or capable of dealing with dynamic linking).
Well, also it avoids mandating two separate versions of the binary format (and runtime libraries) for bare-metal and hosted operation (though, a hosted-only C library exists, which is significantly smaller).
Where, I can note that my project was natively using a modified PE/COFF variant (no MZ stub, optional LZ4 compression, ...) for a custom ISA.
I can make a contrast here with ELF, say:
My stuff allows loading RV64 ELF binaries at boot, or (very recently) as application binaries, but:
* Boot-time operation requires statically linked ET_EXEC binaries;
* Hosted operation requires PIE / ET_DYN.
The way I am doing the address space excludes using ET_EXEC here.
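The rule above amounts to a check on the ELF header's e_type field. A small sketch, where the LoadMode enum and elf_type_allowed are names invented for the example (the ET_EXEC / ET_DYN values are the standard ones); note that e_type alone does not prove an image is statically linked:

```c
#include <stdint.h>

/* e_type values from the ELF specification. */
#define ET_EXEC 2   /* fixed-address executable */
#define ET_DYN  3   /* shared object or position-independent executable */

/* Hypothetical load contexts for this sketch. */
typedef enum { LOAD_BOOT, LOAD_HOSTED } LoadMode;

/* Returns 1 if an image with this e_type is acceptable in the given mode. */
int elf_type_allowed(uint16_t e_type, LoadMode mode)
{
    switch (mode) {
    case LOAD_BOOT:   return e_type == ET_EXEC;  /* fixed load address */
    case LOAD_HOSTED: return e_type == ET_DYN;   /* must be relocatable */
    }
    return 0;
}
```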
So, for example, a bootable kernel-like binary could not be launched on top of the kernel as an application, nor could an application be made directly bootable.
Can note that my CPU design is capable of running both my own ISA, and RV64G / RV64imafd (though not implementing the Privileged Spec; differs in terms of the hardware-level interfaces, so will not boot existing RV64 OS's, etc). However, it allows running application level code in both ISA's at the same time (sort of like how x86-64 can also run 32-bit code).
Between them, RV64 tends to be a little bit slower, and the loader also ends up needing more memory mostly due to ELF's less efficient metadata (needs significantly more space for symbol tables and relocs, etc, vs what is needed for the PE/COFF variant).
Technically it is more RV64imfd, but the 'A' just happens to be "there", and on its own GCC doesn't seem to generate any 'A' instructions. No 'C' yet, but mostly because the 'C' encodings were dog-chewed enough to diminish my willingness to bother with them for now. Both could be mapped fairly closely to the same logic in the CPU (though with differences as to ISA-level presentation, for example, my BJX2 ISA is 64 GPRs and no FPRs, whereas RV64G is 32 GPRs + 32 FPRs, so F0..F31 map to R32..R63 in the CPU, along with some other registers being shuffled around, ...). Things are not strictly 1:1 though (as functionality may exist in one ISA that lacks a direct equivalent in the other; but my own ISA is mostly a direct superset of RV64 in terms of functionality).
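For the FPR part of that mapping, the translation is just an offset. A tiny sketch (rv_fpr_to_internal is a made-up name, and the shuffling of the other registers is deliberately left out, since it is not spelled out here):

```c
/* Map an RV64 FPR number (F0..F31) to the internal register number
   (R32..R63). Returns -1 for an out-of-range input. */
int rv_fpr_to_internal(int f)
{
    if (f < 0 || f > 31)
        return -1;
    return 32 + f;
}
```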
I ended up choosing to add an RV64 decoder mostly as it was both open and (initially) happened to map pretty close to 1:1 (unlike some other ISA's, which would not map over so cleanly).
Granted, ELF does theoretically allow sharing global variables directly across Shared Object boundaries.
As of yet, it is not possible to do mixed processes (say, an ELF binary using DLLs or PE binary using SO's; across ISA boundaries). The CPU could support this, in theory, but the ABI's are incompatible so there would need to be some sort of thunking layer, and there is no way to autogenerate the thunks absent also knowing the function signatures (or full struct layouts, in the case of any by-value passing).
But, it would be possible in theory if a person writes the thunks manually (in assembly code).
Though, to a more limited extent, the ability to call across ISA boundaries is being used by the system-call mechanism.
There is another hybrid mode, which uses my ISA's encoding but RV64's register space, intended as a possible way to do mixed-ISA (with RV64) programs without thunks, but support in my compiler is still lacking (would effectively need to make my compiler able to use RV64's ABI, which is admittedly a harder part of the challenge, as it is somewhat different in many areas from my own ABI, which loosely descended from the WinCE SH4 ABI).
...