On 12/23/2024 3:18 PM, Tim Rentsch wrote:
Michael S <already5chosen@yahoo.com> writes:
On Mon, 23 Dec 2024 09:46:46 +0100
David Brown <david.brown@hesbynett.no> wrote:
>
And Tim did not rule out using the standard library,
>
Are you sure?
I explicitly called out setjmp and longjmp as being excluded.
Based on that, it's reasonable to infer the rest of the
standard library is allowed.
Furthermore I don't think it matters. Except for a very small
set of functions -- eg, fopen, fgetc, fputc, malloc, free --
everything else in the standard library either isn't important
for Turing Completeness or can be synthesized from the base
set. The functionality of fprintf(), for example, can be
implemented on top of fputc and non-library language features.
If I were to choose a set of primitive functions, probably:
malloc/free and/or realloc
could define, say:
malloc(sz) => realloc(NULL, sz)
free(ptr) => realloc(ptr, 0)
Maybe _msize and _mtag/..., but this is non-standard.
With _msize, can implement realloc on top of malloc/free.
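As a rough sketch of that last point: since _msize is non-standard, the version below fakes it with a size header stored in front of each block. The names xmalloc/xfree/xmsize/xrealloc and the header layout are made up for illustration, not taken from any real library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header standing in for the non-standard _msize: the
   requested size is stored just in front of the user pointer. The
   union pads the header so the user area stays suitably aligned. */
typedef union {
    size_t      size;
    max_align_t align;
} xhdr;

static void *xmalloc(size_t sz)
{
    xhdr *h;
    if (sz > (size_t)-1 - sizeof *h)
        return NULL;                    /* size computation would overflow */
    h = malloc(sizeof *h + sz);
    if (h == NULL)
        return NULL;
    h->size = sz;
    return h + 1;
}

static void xfree(void *p)
{
    if (p != NULL)
        free((xhdr *)p - 1);
}

/* Stand-in for _msize: reports the size that was requested. */
static size_t xmsize(const void *p)
{
    assert(p != NULL);
    return ((const xhdr *)p - 1)->size;
}

/* realloc built only from the malloc/free/msize primitives above. */
static void *xrealloc(void *p, size_t sz)
{
    void *q;
    size_t old;
    if (p == NULL)
        return xmalloc(sz);
    q = xmalloc(sz);
    if (q == NULL)
        return NULL;                    /* old block stays valid, as with realloc */
    old = xmsize(p);
    memcpy(q, p, old < sz ? old : sz);
    xfree(p);
    return q;
}
```

Going the other way (free as realloc(ptr, 0)) leans on behavior that older C standards leave implementation-defined and that C23 makes undefined, so it would only work as a convention of one's own library.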
For basic IO:
fopen, fclose, fseek, fread, fwrite
printf could be implemented on top of vsnprintf and fputs
fputs can be implemented on top of fwrite (via strlen).
With a temporary buffer being used for the printed string.
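A minimal sketch of that layering, written as fprintf so the stream is explicit (printf would be the same thing with stdout). The my_ names and the 1024-byte buffer are arbitrary choices for illustration; output longer than the buffer is simply truncated here.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* fputs in terms of fwrite: the length comes from strlen. */
static int my_fputs(const char *s, FILE *fp)
{
    size_t n = strlen(s);
    return fwrite(s, 1, n, fp) == n ? 0 : EOF;
}

/* fprintf in terms of vsnprintf and my_fputs, formatting into a
   fixed-size temporary buffer first. */
static int my_fprintf(FILE *fp, const char *fmt, ...)
{
    char buf[1024];
    va_list ap;
    int n;

    assert(fp != NULL && fmt != NULL);
    va_start(ap, fmt);
    n = vsnprintf(buf, sizeof buf, fmt, ap);
    va_end(ap);
    if (n < 0)
        return n;                       /* encoding error */
    if (my_fputs(buf, fp) == EOF)
        return -1;
    /* Report the number of characters actually written. */
    return n < (int)sizeof buf ? n : (int)sizeof buf - 1;
}
```

A real implementation would want to grow the buffer (vsnprintf returns the needed length) rather than truncate.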
...
Though, one may still end up with various other stuff over the interface as well. The interface can be made open-ended if one has a GetInterface call or similar, which can request other interfaces given an ID, such as a FOURCC/EIGHTCC pair, a SIXTEENCC, or a GUID (*1). IMHO, this is generally preferable over a "GetProcAddress" mechanism due to lower overheads; though, with the annoyance that interface vtables generally have a fixed layout (one generally can't add or change anything without creating binary compatibility issues, so a lot of tables/structures need to be kept semi-frozen).
Though, APIs like DirectX dealt with this issue by having version numbers for vtables, where one requests a specific version of the vtable (within the range of versions supported by the major version of DirectX). But, this is crufty.
*1: Say: QWORD qwMajor, QWORD qwMinor.
qwMajor:
Major ID (FOURCC, EIGHTCC)
Or: First 8 bytes of SIXTEENCC or GUID
qwMinor:
SubID/Version (FOURCC or EIGHTCC)
Second 8 bytes of SIXTEENCC or GUID.
Where:
High 32 bits are 0, assume FOURCC.
Else, look at bits to determine EIGHTCC vs GUID.
Assume if both are EIGHTCC, value represents a SIXTEENCC.
Bit patterns for valid SIXTEENCCs vs GUIDs are mutually exclusive.
Names make more sense for public interfaces.
Leaving GUIDs mostly for private/internal interfaces.
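A rough sketch of the ID scheme from (*1) with a table-driven GetInterface-style lookup. Only the FOURCC rule (high 32 bits zero) is shown; the EIGHTCC-vs-GUID bit tests are left out. The byte order in make_fourcc, the iface_entry table, and the function names are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t QWORD;

/* Pack four characters into the low 32 bits, first character in the
   lowest byte (an assumed byte order). */
static QWORD make_fourcc(char a, char b, char c, char d)
{
    return  (QWORD)(unsigned char)a
         | ((QWORD)(unsigned char)b << 8)
         | ((QWORD)(unsigned char)c << 16)
         | ((QWORD)(unsigned char)d << 24);
}

/* Per the rule above: high 32 bits zero means the value is a FOURCC. */
static int id_is_fourcc(QWORD id)
{
    return (id >> 32) == 0;
}

/* Hypothetical registry entry: an ID pair and the interface it names. */
typedef struct {
    QWORD qwMajor;
    QWORD qwMinor;
    void *iface;
} iface_entry;

/* Linear lookup by (major, minor); returns NULL if nothing matches. */
static void *get_interface(const iface_entry *tab, size_t n,
                           QWORD major, QWORD minor)
{
    size_t i;
    assert(tab != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (tab[i].qwMajor == major && tab[i].qwMinor == minor)
            return tab[i].iface;
    return NULL;
}
```

Since both halves are plain 64-bit integers, the comparison is two integer compares regardless of whether the ID is a name or a GUID.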
Well, this is unlike Windows, where they use GUIDs for pretty much everything here (also, I didn't bother with an IDL compiler; generally doing all this directly in C).
Well, and some wonk, like the exact contents of structures like BITMAPINFOHEADER being interpreted based on using biSize as a magic number (well, sometimes with other stuff glued onto the end, as understood based on the use of the biCompression field), ...
But, it has held up well, this structure being almost as old as I am...
In a few cases, one might also take the option of using a "DriverProc()" style interface, where one provides a pair of context-dependent pointers and uses magic numbers to identify the desired operation, or, intermediate:
(*ifvt)->QueryProc(ifvt, iHdl, lParm, pParm1, pParm2);
(*ifvt)->ModifyProc(ifvt, iHdl, lParm, pParm1, pParm2);
Where, QueryProc is intended for non-destructive operations, and ModifyProc for destructive operations.
iHdl: Context-dependent integer handle;
lParm: Magic command number.
pParm1/pParm2: Magic pointers, often:
pParm1: Input data address;
pParm2: Output data address.
Where, vtable is usually provided in "VT **" form, hence the need to deref the table before a method can be invoked.
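As a small sketch of the QueryProc/ModifyProc pattern with the "VT **" calling convention: the Counter object, the CMD_ magic numbers, and the int return codes are all made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef struct IfVt IfVt;
struct IfVt {
    int (*QueryProc)(IfVt **ifvt, int iHdl, long lParm,
                     void *pParm1, void *pParm2);
    int (*ModifyProc)(IfVt **ifvt, int iHdl, long lParm,
                      void *pParm1, void *pParm2);
};

/* The object's first member is the vtable pointer, so a pointer to
   that member serves as the "VT **" handle. */
typedef struct {
    IfVt *vt;
    long  value;
} Counter;

enum { CMD_GET_VALUE = 1, CMD_SET_VALUE = 2 };  /* hypothetical magic numbers */

/* Non-destructive: copy the current value to the output address. */
static int counter_query(IfVt **ifvt, int iHdl, long lParm,
                         void *pParm1, void *pParm2)
{
    Counter *self = (Counter *)ifvt;
    (void)iHdl; (void)pParm1;
    if (lParm != CMD_GET_VALUE || pParm2 == NULL)
        return -1;
    *(long *)pParm2 = self->value;
    return 0;
}

/* Destructive: overwrite the value from the input address. */
static int counter_modify(IfVt **ifvt, int iHdl, long lParm,
                          void *pParm1, void *pParm2)
{
    Counter *self = (Counter *)ifvt;
    (void)iHdl; (void)pParm2;
    if (lParm != CMD_SET_VALUE || pParm1 == NULL)
        return -1;
    self->value = *(const long *)pParm1;
    return 0;
}

static IfVt counter_vt = { counter_query, counter_modify };

static void counter_init(Counter *c)
{
    assert(c != NULL);
    c->vt = &counter_vt;
    c->value = 0;
}
```

A caller would take ifvt = &c.vt and then invoke (*ifvt)->ModifyProc(ifvt, 0, CMD_SET_VALUE, &in, NULL), same as the call shape shown above. New commands can be added without touching the vtable layout, at the cost of losing type checking on the parameters.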
Actually, some of this overlaps with how I had implemented the C library for DLLs in my project:
Only the main binary has the full C library;
DLLs generally use a C library which calls back to the main C library via a COM style interface (things like malloc/free and stdio calls are routed over this interface).
Note that this is partly because in my case:
1, DLLs only allow an acyclic dependency graph;
2, The mechanism does not currently allow sharing global variables;
3, There was a desire to allow dlopen/dlsym to dynamically load libraries.
1 & 3 mean that if a statically-linked C library is used for the main binary:
One needs to also statically link a C library to each DLL;
The C library needs to operate over a COM interface for shared interfaces.
Or, alternatively, that only a DLL may be used for the C library, and all DLLs would need to use the same C library DLL.
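Roughly, the DLL-side C library ends up looking something like the sketch below, here reduced to malloc/free. HostLibcVt, the method names, and dll_libc_init are hypothetical; how the interface pointer actually reaches the DLL at load time is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical interface exported by the main binary's C library. */
typedef struct HostLibcVt HostLibcVt;
struct HostLibcVt {
    void *(*Malloc)(HostLibcVt **self, size_t sz);
    void  (*Free)(HostLibcVt **self, void *ptr);
};

/* Main-binary side: methods backed by the full C library. */
static void *host_malloc(HostLibcVt **self, size_t sz)
{
    (void)self;
    return malloc(sz);
}

static void host_free(HostLibcVt **self, void *ptr)
{
    (void)self;
    free(ptr);
}

static HostLibcVt  host_vt  = { host_malloc, host_free };
static HostLibcVt *host_obj = &host_vt;     /* &host_obj is the "VT **" handle */

/* DLL side: a thin C library that forwards over the interface. */
static HostLibcVt **dll_libc;               /* set once when the DLL is loaded */

static void dll_libc_init(HostLibcVt **ifvt)
{
    assert(ifvt != NULL && *ifvt != NULL);
    dll_libc = ifvt;
}

static void *dll_malloc(size_t sz)
{
    return (*dll_libc)->Malloc(dll_libc, sz);
}

static void dll_free(void *p)
{
    (*dll_libc)->Free(dll_libc, p);
}
```

This way all allocations land in the one heap owned by the main binary, so memory allocated in one DLL can be freed in another without sharing any global variables between them.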
Note that neither 1 nor 2 traditionally apply with ELF Shared Objects (which usually both share everything and allow for cyclic dependency graphs). But, traditionally ELF has other drawbacks, like needing to access variables and call functions via a GOT (which has higher overhead than direct calls, or accessing global variables as a fixed offset relative to a known base register, ...).
Note that having the kernel inject DLLs into a running process wouldn't really mix well with the way glibc approaches shared objects (where, it manages this stuff in userland, rather than having this left up to the kernel's program loader).
This may not matter as much though, as if providing a COM-like interface, one doesn't necessarily need dlopen/dlsym to be able to see the symbols in the library that the interface came from.
Where, in this case, COM-like interfaces may be used in ways that deviate from the usual dependency ordering, and are more flexible. They are awkward to use directly, so it may make sense to provide C API wrappers (thus far, usually statically linked, but they can fetch the interfaces they need from the main C library or the OS).
Where, in my case, the OS interface is a mix of conventional syscalls and object-method-calls routed over the syscall interface (the target being either in the kernel or in another process; or the OS might load a DLL into the client process and return a process-local vtable).
If non-local, generally the method pointers are generic, and serve to forward the call over the syscall mechanism (the syscall interface being used in a somewhat different way from how it would be used in something like Linux; where Linux generally just does not do things this way...).
...