On 3/31/2025 3:52 PM, MitchAlsup1 wrote:
> On Mon, 31 Mar 2025 18:56:32 +0000, BGB wrote:
PC-Rel not being used as PC-Rel doesn't allow for multiple process
instances of a given loaded binary within a shared address space.
As long as the relative distance is the same, it does.
>
Can't happen within a shared address space.
>
Say I load a copy of the binary text at 0x24680000 and its data at ...
Say, if you load a single copy of a binary at 0x24680000.
Processes A and B can't use the same mapping in the same address space,
with PC-rel globals, as then they would each see the other's globals.
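A toy model may help show why PC-relative globals break sharing. It assumes, hypothetically, that a global's address is just the referencing instruction's address plus a link-time displacement. `Mapping` and `pcrel_global_addr` are made-up names. Two processes that share one text mapping therefore compute the same data address:

```c
#include <assert.h>
#include <stdint.h>

/* Toy model (hypothetical names) of PC-relative data addressing: the
 * linker fixes the distance from an instruction to a global, so the
 * global's address depends only on where the code is mapped. */
typedef struct {
    uint64_t text_base;   /* address at which this process sees the code */
} Mapping;

/* Address reached by a PC-relative reference from the instruction at
 * text offset insn_off, using link-time displacement pcrel_disp. */
static uint64_t pcrel_global_addr(const Mapping *m, uint64_t insn_off,
                                  int64_t pcrel_disp)
{
    uint64_t pc = m->text_base + insn_off;
    return pc + (uint64_t)pcrel_disp;   /* wraps like the hardware adder */
}
```

With these numbers, an instruction at offset 0x100 in code mapped at 0x24680000, using a displacement of 0x11280000, always reaches 0x35900100, whichever process runs it. Only a second mapping of the text moves the data.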
You can't do a duplicate mapping at another address, as this both wastes
VAS, and also any Abs64 base-relocs or similar would differ.
A 64-bit VAS is a wasteable address space, whereas a 48-bit VAS is not.
You also can't CoW the data/bss sections, as this is no longer a shared
address space.
You are trying to "get at" something here, but I can't see it (yet).
>I just use 32-bit or 64-bit displacement constants. Does not matter
So, the alternative is to use GBR to access globals, with the data/bss
sections allocated independently of the binary.
>
This way, multiple processes can share the same mapping at the same
address for any executable code and constant data, with only the data
sections needing to be allocated.
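Here is a minimal sketch of the GBR approach. It assumes a per-process base pointer can be modeled as a plain C pointer. The accessor code is shared, and each process only allocates its own zero-filled data block. `Process`, `process_start`, `process_exit` and `global_at` are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy model (hypothetical names): code is mapped once and shared, while
 * each process gets its own data/bss block, reached through a per-process
 * base pointer (standing in for GBR) instead of through the PC. */
typedef struct {
    unsigned char *gbr;   /* base of this process's private data/bss */
} Process;

static int process_start(Process *p, size_t data_size)
{
    p->gbr = calloc(1, data_size);   /* zero-filled, like .bss */
    return p->gbr != NULL;
}

static void process_exit(Process *p)
{
    free(p->gbr);
    p->gbr = NULL;
}

/* "Shared code": the same accessor serves every process; only the base
 * it is given differs. offset must be 8-byte aligned. */
static int64_t *global_at(const Process *p, size_t offset)
{
    assert(offset % 8 == 0);
    return (int64_t *)(p->gbr + offset);
}
```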
>
>
Does mean though that one needs to save/restore the global pointer, and
there is a ritual for reloading it.
>
EXE's generally assume they are index 0, so:
MOV.Q (GBR, 0), Rt
MOV.Q (Rt, 0), GBR
Or, in RV terms:
LD X6, 0(X3)
LD X3, Disp33(X6)
Or, RV64G:
LD X6, 0(X3)
LUI X5, DispHi
ADD X5, X5, X6
LD X3, DispLo(X5)
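The RV64G form splits the displacement between LUI and the load. The sketch below shows that split. It assumes the usual RISC-V %hi/%lo rounding: add 0x800 before taking the upper bits, because the load sign-extends its 12-bit offset. `split_disp` and `recombine` are illustrative helpers, not part of any toolchain:

```c
#include <assert.h>
#include <stdint.h>

/* Floor division by 4096 that does not rely on >> of negative values. */
static int64_t floor_div_4096(int64_t t)
{
    return t >= 0 ? t / 4096 : -((-t + 4095) / 4096);
}

/* Split disp into DispHi (LUI immediate) and DispLo (signed 12-bit load
 * offset) for:  LUI X5, DispHi ; ADD X5, X5, X6 ; LD X3, DispLo(X5).
 * The load sign-extends DispLo, so DispHi is rounded by adding 0x800. */
static void split_disp(int64_t disp, int64_t *hi20, int64_t *lo12)
{
    int64_t hi = floor_div_4096(disp + 0x800);
    int64_t lo = disp - hi * 4096;
    assert(lo >= -2048 && lo <= 2047);
    /* LUI's 20-bit immediate is sign-extended on RV64 */
    assert(hi >= -(1 << 19) && hi < (1 << 19));
    *hi20 = hi;
    *lo12 = lo;
}

/* What LUI + ADD + the load offset add to the table base in X6. */
static int64_t recombine(int64_t hi20, int64_t lo12)
{
    return hi20 * 4096 + lo12;
}
```

For example, 0x12345FFF splits into DispHi = 0x12346 and DispLo = -1, since 0x12346000 - 1 = 0x12345FFF.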
>
>
For DLL's, the index is fixed up with a base-reloc (for each loaded
DLL), so basically the same idea. Typically a Disp33 is used here to
allow for a potentially large/unknown number of loaded DLL's. Thus far,
a global numbering scheme is used.
>
Where, (GBR+0) gives the address of a table of global pointers for every
loaded binary (can be assumed read-only from userland).
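The two-load ritual can be modeled in C as below. This is a sketch under assumptions. The data-area layout (`DataArea`, with the table pointer at offset 0) and `reload_gbr` are hypothetical. The table index stands in for the value the loader patches via base-relocs:

```c
#include <assert.h>
#include <stddef.h>

/* Toy model (hypothetical names). Each binary's data area starts with a
 * pointer to a table holding the global pointer (data-area base) of every
 * loaded binary; the EXE is index 0 and each DLL gets a fixed-up index. */
typedef struct DataArea {
    struct DataArea **gp_table;   /* the word at (GBR, 0) */
    long counter;                 /* an example global variable */
} DataArea;

/* Models:  MOV.Q (GBR, 0), Rt ; MOV.Q (Rt, index*8), GBR */
static DataArea *reload_gbr(const DataArea *gbr, size_t index)
{
    DataArea **table = gbr->gp_table;   /* first load: table address */
    return table[index];                /* second load: target's GBR */
}
```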
>
>
Generally, this is needed if:
Function may be called from outside of the current binary and:
Accesses global variables;
And/or, calls local functions.
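The condition above can be written as a predicate that a compiler might evaluate per function. The `FuncInfo` fields are hypothetical stand-ins for whatever the compiler actually tracks:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-function facts a compiler might track. */
typedef struct {
    bool externally_callable;   /* may be called from another binary */
    bool touches_globals;       /* reads or writes global variables */
    bool calls_local_functions; /* calls functions in this binary */
} FuncInfo;

/* Reload GBR in the prolog only when the function can be entered from
 * outside the binary and needs this binary's globals or functions. */
static bool needs_gbr_reload(const FuncInfo *f)
{
    return f->externally_callable &&
           (f->touches_globals || f->calls_local_functions);
}
```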
>This is just::
Though, still generally lower average-case overhead than the strategy
typically used by FDPIC, which would handle this reload process on the
caller side...
SD X3, Disp(SP)
LD X3, 8(X18)
LD X6, 0(X18)
JALR X1, 0(X6)
LD X3, Disp(SP)
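Below is a C sketch of the caller-side FDPIC pattern. It assumes a function descriptor laid out as {entry, gp}, matching the 0(X18) and 8(X18) loads. The global `current_gp` stands in for X3, and all names are hypothetical:

```c
#include <assert.h>

/* Sketch of an FDPIC-style function descriptor (hypothetical names):
 * a function pointer refers to {entry, gp}, and the caller, not the
 * callee, installs the callee's global pointer around the call. */
typedef struct FuncDesc {
    long (*entry)(long arg);  /* loaded by LD X6, 0(X18) */
    void *gp;                 /* loaded by LD X3, 8(X18) */
} FuncDesc;

static void *current_gp;      /* stands in for register X3 */

/* Example callee: its "global" lives at whatever gp it was handed. */
static long add_to_global(long arg)
{
    return *(long *)current_gp + arg;
}

static long call_via_desc(const FuncDesc *fd, long arg)
{
    void *saved = current_gp;   /* SD X3, Disp(SP) */
    current_gp = fd->gp;        /* LD X3, 8(X18) */
    long r = fd->entry(arg);    /* LD X6, 0(X18) ; JALR X1, 0(X6) */
    current_gp = saved;         /* LD X3, Disp(SP) */
    return r;
}
```

In this model, every call through a descriptor pays for the save, the two loads and the restore, even when the callee never touches its globals.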
>You are 40 years late on that.
Though, execl() effectively replaces the current process.
>
IMHO, a "CreateProcess()" style abstraction makes more sense than
fork+exec.
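For comparison, POSIX already has a CreateProcess-like call, `posix_spawn()`. It starts a process running a new program in one step, rather than fork() followed by exec*(). A minimal sketch follows; `spawn_and_wait` is a hypothetical helper, and error handling is kept deliberately thin:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <spawn.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Start a program and wait for it, CreateProcess-style: one call builds
 * the new process image instead of fork() followed by exec*().
 * Returns the exit status, or -1 on failure or abnormal termination. */
static int spawn_and_wait(const char *path, char *const argv[])
{
    pid_t pid;
    int status;
    if (posix_spawn(&pid, path, NULL, NULL, argv, environ) != 0)
        return -1;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```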
If RISC-V removed its 16-bit instructions, there is room in its ISA
But, invariably, someone will want "compressed" instructions with a
subset of the registers, and one can't just have these only having
access to argument registers.
Brian had little trouble using My 66000 ABI which does have contiguous
register groupings.
>
But, My66000 also isn't like, "Hey, how about 16-bit ops with 3 or 4 bit
register numbers".
>
Not sure of the thinking behind the RV ABI.
So, you lose 6 cycles on just under ½ of all subroutine calls,
Prolog needs a call, but epilog can just be a branch, since no need to
return back into the function that is returning.
Yes, but this means My 66000 executes 3 fewer transfers of control
per subroutine than you do. And taken branches add latency.
>
Granted.
>
Each predicted branch adds 2 cycles.
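Here is the arithmetic behind the 6-cycle figure, as a back-of-envelope model. It assumes the three extra transfers of control are the call to the save routine, its return, and the branch to the restore routine, and that each costs 2 cycles as a predicted taken branch. The constants and function names are illustrative only:

```c
#include <assert.h>

/* Back-of-envelope cost model (assumption: each extra transfer of
 * control is a correctly predicted taken branch with fixed cost). */
enum {
    EXTRA_TRANSFERS = 3,          /* call save stub, its return, branch to restore stub */
    CYCLES_PER_TAKEN_BRANCH = 2
};

static int extra_cycles_per_call(void)
{
    return EXTRA_TRANSFERS * CYCLES_PER_TAKEN_BRANCH;   /* 3 * 2 = 6 */
}

/* Averaged over all calls when only some functions use the stubs. */
static double average_extra_cycles(double fraction_using_stubs)
{
    assert(fraction_using_stubs >= 0.0 && fraction_using_stubs <= 1.0);
    return fraction_using_stubs * extra_cycles_per_call();
}
```

With just under half of calls affected, the averaged cost comes to a bit under 3 cycles per call.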
>
Needs to have a lower limit though, as it is not worth it to use a
call/branch to save/restore 3 or 4 registers...
>
But, say, 20 registers, it is more worthwhile.
My solution gets rid of the dilemma:
ENTER saves as few as 1 or as many as 32 and remains that 1 single
instruction. Same for EXIT and exit also performs the RET when LDing
R0.
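The following toy C model illustrates the ENTER/EXIT idea only. It does not reproduce the actual My 66000 encoding or register ordering. It assumes a contiguous register range is an operand, and that R0 holds the return address, so restoring R0 lets EXIT double as RET:

```c
#include <assert.h>
#include <stdint.h>

#define NREGS 32

/* Toy model only (not the real My 66000 semantics or encoding). */
typedef struct {
    uint64_t r[NREGS];   /* r[0] plays the return-address register */
    uint64_t pc;
} Cpu;

/* ENTER-like: spill r[first..last] in a single operation; the register
 * count is just an operand, so 1 or 32 registers cost one instruction. */
static int toy_enter(const Cpu *cpu, int first, int last, uint64_t *frame)
{
    assert(0 <= first && first <= last && last < NREGS);
    int n = 0;
    for (int i = first; i <= last; i++)
        frame[n++] = cpu->r[i];
    return n;
}

/* EXIT-like: reload the same range; if it includes r[0], jump to the
 * reloaded return address too, so no separate RET is needed.
 * Returns 1 if the return was performed. */
static int toy_exit(Cpu *cpu, int first, int last, const uint64_t *frame)
{
    assert(0 <= first && first <= last && last < NREGS);
    int n = 0;
    for (int i = first; i <= last; i++)
        cpu->r[i] = frame[n++];
    if (first == 0) {
        cpu->pc = cpu->r[0];
        return 1;
    }
    return 0;
}
```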
>
Granted.
>
My strategy isn't perfect:
Non-zero branching overheads, when the feature is used;
Per-function load/store slides in prolog/epilog, when not used.
>
Then, the heuristic mostly becomes one of when it is better to use the
inline strategy (load/store slide), or to fold them off and use
calls/branches.
It does technically also work for RISC-V (though GCC seemingly always
uses inline save/restore, and the RV ABI also has fewer registers).
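Here is a sketch of the inline-versus-folded heuristic. The cut-over value `FOLD_MIN_REGS` is a hypothetical placeholder. The thread only says 3 or 4 registers are too few to justify a call/branch and that around 20 is worthwhile. The size estimates ignore stack adjustment:

```c
#include <assert.h>

typedef enum { SAVE_INLINE, SAVE_FOLDED } SaveStrategy;

/* Hypothetical cut-over between "3 or 4" (inline) and "20" (folded). */
#define FOLD_MIN_REGS 8

/* Rough prolog+epilog size in instructions for each strategy. */
static int inline_slide_size(int nregs) { return 2 * nregs; } /* nregs stores + nregs loads */
static int folded_size(void)            { return 2; }         /* one call + one branch */

/* Fold into shared save/restore code only once the register count is
 * large enough that the code-size saving outweighs the branch cost. */
static SaveStrategy choose_save_strategy(int nregs)
{
    assert(nregs >= 0);
    return nregs >= FOLD_MIN_REGS ? SAVE_FOLDED : SAVE_INLINE;
}
```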
The messages displayed come from Usenet.