On Tue, 1 Apr 2025 19:34:10 +0000, BGB wrote:
On 3/31/2025 3:52 PM, MitchAlsup1 wrote:
---------------------
On Mon, 31 Mar 2025 18:56:32 +0000, BGB wrote:
PC-Rel not being used as PC-Rel doesn't allow for multiple process
instances of a given loaded binary within a shared address space.
>
As long as the relative distance is the same, it does.
>
Can't happen within a shared address space.
>
Say, if you load a single copy of a binary at 0x24680000.
Process A and B can't use the same mapping in the same address space,
with PC-rel globals, as then they would each see the other's globals.
>
Say I load a copy of the binary text at 0x24680000 and its data at
0x35900000 for a distance of 0x11280000 into the address space of
a process.
Then I load another copy at 0x44680000 and its data at 0x55900000
into the address space of a different process.
PC-rel addressing works in both cases, because the distance (-rel)
remains the same, and the MMU can translate the code to the same
physical address and map each area of data individually.
Different virtual addresses, same code physical address, different
data virtual and physical addresses.
>
OK.
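The address arithmetic in the two-copy example can be checked with a short Python sketch. The load addresses are the ones quoted above; the helper name `pc_rel_target`, and the choice to measure the displacement from the start of the text section to the start of the data section, are illustrative assumptions rather than anything stated in the thread.

```python
# Load addresses from the example above (one text/data pair per process).
TEXT_A, DATA_A = 0x24680000, 0x35900000
TEXT_B, DATA_B = 0x44680000, 0x55900000


def pc_rel_target(pc: int, disp: int) -> int:
    """Effective address of a PC-relative access (hypothetical helper)."""
    return pc + disp


# The linker fixes one displacement; assumed here to be measured from the
# start of the text section to the start of the data section.
DISP = DATA_A - TEXT_A

print(hex(DISP))                         # 0x11280000
print(hex(pc_rel_target(TEXT_A, DISP)))  # 0x35900000
print(hex(pc_rel_target(TEXT_B, DISP)))  # 0x55900000
```

The same displacement reaches each copy's own data. The sketch says nothing about whether both mappings can coexist in a single shared address space, which is the objection raised in the reply.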
You can't do a duplicate mapping at another address, as this both wastes
VAS, and also any Abs64 base-relocs or similar would differ.
>
A 64-bit VAS is a wasteable address space, whereas a 48-bit VAS is not.
>
You also can't CoW the data/bss sections, as this is no longer a shared
address space.
>
You are trying to "get at" something here, but I can't see it (yet).
>
Shared address space assumes all processes have the same page tables and
shared address mappings and TLB contents (though, ACL checking can be
different, as the ACL/KRR stuff is not based on having separate contents
in the page tables or TLB, *).
So, alternative is to use GBR to access globals, with the data/bss
sections allocated independently of the binary.
>
This way, multiple processes can share the same mapping at the same
address for any executable code and constant data, with only the data
sections needing to be allocated.
>
>
Does mean though that one needs to save/restore the global pointer, and
there is a ritual for reloading it.
>
EXE's generally assume they are index 0, so:
  MOV.Q (GBR, 0), Rt
  MOV.Q (Rt, 0), GBR
Or, in RV terms:
  LD X6, 0(X3)
  LD X3, Disp33(X6)
Or, RV64G:
  LD X6, 0(X3)
  LUI X5, DispHi
  ADD X5, X5, X6
  LD X3, DispLo(X5)
>
>
For DLL's, the index is fixed up with a base-reloc (for each loaded
DLL), so basically the same idea. Typically a Disp33 is used here to
allow for a potentially large/unknown number of loaded DLL's. Thus far,
a global numbering scheme is used.
>
Where, (GBR+0) gives the address of a table of global pointers for every
loaded binary (can be assumed read-only from userland).
>
>
Generally, this is needed if:
  Function may be called from outside of the current binary and:
    Accesses global variables;
    And/or, calls local functions.
>
I just use 32-bit or 64-bit displacement constants. Does not matter
how control arrived at this subroutine, it accesses its data as the
linker resolved addresses, without wasting a register.
>
GBR or GP is specially designated as a global pointer though.
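The global-pointer reload ritual described above (a table of per-binary global pointers reachable through (GBR+0), indexed by a per-binary number) can be modelled in a few lines of Python. This is a toy model, not the actual ABI: memory is a dict of 64-bit words, table entries are assumed to be 8 bytes wide, and every address as well as the name `reload_gbr` is made up for illustration.

```python
ENTRY_SIZE = 8  # assumed size of one table entry, in bytes


def reload_gbr(mem: dict, gbr: int, index: int) -> int:
    """Return the global pointer for the binary numbered `index`.

    Mirrors the two-load sequence: fetch the table address from (GBR+0),
    then fetch this binary's own entry from that table.
    """
    table = mem[gbr + 0]                    # MOV.Q (GBR, 0), Rt
    return mem[table + index * ENTRY_SIZE]  # MOV.Q (Rt, Disp), GBR


# Hypothetical layout: one pointer table, an EXE (index 0), a DLL (index 1).
TABLE, EXE_DATA, DLL_DATA = 0x1000, 0x2000, 0x3000
mem = {
    TABLE + 0: EXE_DATA,
    TABLE + 8: DLL_DATA,
    EXE_DATA + 0: TABLE,  # slot 0 of each data section points at the table
    DLL_DATA + 0: TABLE,
}

# A DLL function entered with the EXE's GBR switches to its own data...
gbr = reload_gbr(mem, EXE_DATA, index=1)
print(hex(gbr))                             # 0x3000
# ...and EXE code entered with the DLL's GBR switches back.
print(hex(reload_gbr(mem, gbr, index=0)))   # 0x2000
```

Because each data section carries the table address in its first slot, the reload works no matter whose global pointer is live on entry, which is why the code and constant data can stay at one shared mapping.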
Though, still generally lower average-case overhead than the strategy
typically used by FDPIC, which would handle this reload process on the
caller side...
  SD X3, Disp(SP)
  LD X3, 8(X18)
  LD X6, 0(X18)
  JALR X1, 0(X6)
  LD X3, Disp(SP)
>
This is just::
  CALX [IP,,#GOT[funct_num]-.]
In the 32-bit linking mode this is a 2 word instruction, in the 64-bit
linking mode it is a 3 word instruction.
>
OK.
----------------
Though, execl() effectively replaces the current process.
>
IMHO, a "CreateProcess()" style abstraction makes more sense than
fork+exec.
>
You are 40 years late on that.
>
I am just doing it the Windows (or Cygwin) way...
---------------
But, invariably, someone will want "compressed" instructions with a
subset of the registers, and one can't just have these only having
access to argument registers.
>
Brian had little trouble using My 66000 ABI which does have contiguous
register groupings.
>
But, My66000 also isn't like, "Hey, how about 16-bit ops with 3 or 4 bit
register numbers".
>
Not sure the thinking behind the RV ABI.
>
If RISC-V removed its 16-bit instructions, there is room in its ISA
to put my entire ISA along with all the non-compressed RISC-V
instructions.
>
Yeah, errm, how do you think XG3 came about?...
---------------
Prolog needs a call, but epilog can just be a branch, since no need to
return back into the function that is returning.
>
Yes, but this means My 66000 executes 3 fewer transfers of control
per subroutine than you do. And taken branches add latency.
>
Granted.
>
Each predicted branch adds 2 cycles.
>
So, you lose 6 cycles on just under ½ of all subroutine calls,
while also executing 2-5 instructions manipulating your global
pointer.
>
Possibly, but I don't think it is quite that bad on average...
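The 6-cycle figure follows from two numbers quoted in this exchange: 3 extra transfers of control per subroutine, at 2 cycles per predicted branch. A minimal Python sketch of that arithmetic follows; the function and parameter names are illustrative, and 0.5 is used only as an upper bound for "just under ½" of calls.

```python
def extra_cycles_per_call(extra_transfers: int = 3,
                          cycles_per_branch: int = 2) -> int:
    """Added latency for one call that uses the folded prolog/epilog."""
    return extra_transfers * cycles_per_branch


def average_extra_cycles(fraction_folded: float) -> float:
    """Average added latency per call when only some calls pay the cost."""
    return fraction_folded * extra_cycles_per_call()


print(extra_cycles_per_call())    # 6
print(average_extra_cycles(0.5))  # 3.0
```

This counts branch latency only; the 2-5 instructions spent on the global pointer are a separate cost and are not modelled here.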
Needs to have a lower limit though, as it is not worth it to use a
call/branch to save/restore 3 or 4 registers...
>
But, say, 20 registers, it is more worthwhile.
>
ENTER saves as few as 1 or as many as 32 and remains that 1 single
instruction. Same for EXIT and exit also performs the RET when LDing
R0.
>
Granted.
>
My strategy isn't perfect:
  Non-zero branching overheads, when the feature is used;
  Per-function load/store slides in prolog/epilog, when not used.
>
Then, the heuristic mostly becomes one of when it is better to use the
inline strategy (load/store slide), or to fold them off and use
calls/branches.
>
My solution gets rid of the dilemma:
  a) the call code is always smaller
  b) the call code never takes more cycles
In addition, there is a straightforward way to elide the STs of ENTER
when the memory unit is still executing the previous EXIT.
>
OK.
>
Does technically also work for RISC-V though (though seemingly GCC
always uses inline save/restore, but also the RV ABI has fewer
registers).
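The inline-versus-folded choice discussed above amounts to a threshold on the number of registers a function has to save. The Python sketch below is hypothetical: the thread only says that 3 or 4 registers is not worth a call/branch while 20 is, so the default cut-off of 8 and the name `use_folded_save_restore` are placeholders, not values taken from any compiler.

```python
def use_folded_save_restore(num_saved_regs: int, min_regs: int = 8) -> bool:
    """Decide between inline load/store slides and shared save/restore code.

    True means: call a shared prolog routine and branch to a shared epilog.
    False means: emit the load/store slide inline in this function.
    `min_regs` is an assumed cut-off, chosen only for illustration.
    """
    return num_saved_regs >= min_regs


for n in (3, 4, 20):
    choice = "folded" if use_folded_save_restore(n) else "inline"
    print(n, choice)
```

A real heuristic would presumably also weigh code size against the branch overhead, which this sketch does not attempt.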