Re: Constant Stack Canaries

Subject: Re: Constant Stack Canaries
From: cr88192 (at) *nospam* gmail.com (BGB)
Newsgroups: comp.arch
Date: 01 Apr 2025, 20:34:10
Organization: A noiseless patient Spider
Message-ID: <vshf6a$3smcv$1@dont-email.me>
User-Agent: Mozilla Thunderbird
On 3/31/2025 3:52 PM, MitchAlsup1 wrote:
On Mon, 31 Mar 2025 18:56:32 +0000, BGB wrote:
 
On 3/31/2025 1:07 PM, MitchAlsup1 wrote:
-------------
Another option being if it could be a feature of a Load/Store Multiple.
>
Say, LDM/STM:
   6b Hi (Upper bound of registers to save)
   6b Lo (Lower bound of registers to save)
   1b LR (Flag to save Link Register)
   1b GP (Flag to save Global Pointer)
   1b SK (Flag to generate a canary)
>
ENTER and EXIT have 2 of those flags--but also note use of SP and CSP
are implicit.
>
Likely (STM):
   Pushes LR first (if bit set);
   Pushes GP second (if bit set);
   Pushes registers in range (if Hi>=Lo);
   Pushes stack canary (if bit set).
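
The push order listed above can be modelled in a few lines of C. This is only a sketch under stated assumptions: the `StmOp` struct, the register numbers used for LR and GP, and the choice to push the range from Hi down to Lo are hypothetical and are not taken from any actual encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the STM fields described above. */
typedef struct {
    unsigned hi, lo;       /* 6-bit register bounds; range is saved if hi >= lo */
    int save_lr, save_gp;  /* LR and GP flags */
    int canary;            /* SK flag */
} StmOp;

enum { REG_LR = 1, REG_GP = 3 };  /* assumed register numbers (RV-like) */

/* Pushes onto a descending stack of 64-bit slots.
   'sp' is the index of the current top; the new top index is returned. */
size_t stm_push(const StmOp *op, const uint64_t regs[64],
                uint64_t *stack, size_t sp, uint64_t canary_value)
{
    assert(op->hi < 64 && op->lo < 64);
    if (op->save_lr) stack[--sp] = regs[REG_LR];  /* LR first  */
    if (op->save_gp) stack[--sp] = regs[REG_GP];  /* GP second */
    if (op->hi >= op->lo)                         /* then the register range */
        for (unsigned r = op->hi + 1; r-- > op->lo; )
            stack[--sp] = regs[r];
    if (op->canary) stack[--sp] = canary_value;   /* canary last */
    return sp;
}
```

With this ordering the canary ends up at the lowest address of the save area, below the saved registers and the saved LR.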
>
EXIT uses its 3rd flag when doing longjmp() and THROW(),
so as to pop the call stack but not actually RET from the stack
walker.
>
>
OK.
>
I guess one could debate whether an LDM could treat the Load-LR as "Load
LR" or "Load address and Branch", and/or have separate flags (Load LR vs
Load PC, with Load PC meaning to branch).
>
>
Other ABIs may not have as much reason to save/restore the Global
Pointer all the time. But, in my case, it is being used as the primary
way of accessing globals, and each binary image has its own address
range here.
 I use constants to access globals.
These come in 32-bit and 64-bit flavors.
 
Typically 16-bit, most are within a 16-bit range of the Global Pointer.

PC-rel is not used, since PC-rel doesn't allow for multiple process
instances of a given loaded binary within a shared address space.
 As long as the relative distance is the same, it does.
 
Can't happen within a shared address space.
Say, if you load a single copy of a binary at 0x24680000.
Process A and B can't use the same mapping in the same address space, with PC-rel globals, as then they would each see the other's globals.
You can't do a duplicate mapping at another address, as this both wastes VAS, and also any Abs64 base-relocs or similar would differ.
You also can't CoW the data/bss sections, as this is no longer a shared address space.
So, the alternative is to use GBR to access globals, with the data/bss sections allocated independently of the binary.
This way, multiple processes can share the same mapping at the same address for any executable code and constant data, with only the data sections needing to be allocated.
Does mean though that one needs to save/restore the global pointer, and there is a ritual for reloading it.
EXE's generally assume they are index 0, so:
   MOV.Q (GBR, 0), Rt
   MOV.Q (Rt, 0), GBR
Or, in RV terms:
   LD    X6, 0(X3)
   LD    X3, Disp33(X6)
Or, RV64G:
   LD    X6, 0(X3)
   LUI   X5, DispHi
   ADD   X5, X5, X6
   LD    X3, DispLo(X5)
For DLL's, the index is fixed up with a base-reloc (for each loaded DLL), so basically the same idea. Typically a Disp33 is used here to allow for a potentially large/unknown number of loaded DLL's. Thus far, a global numbering scheme is used.
Where, (GBR+0) gives the address of a table of global pointers for every loaded binary (can be assumed read-only from userland).
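
As a rough illustration of the reload ritual, here is the same two-load sequence written as C. The `DataSection` layout and the name `reload_gbr` are assumptions made for this sketch; the only thing taken from the description above is that slot 0 of a data section points at a table holding one global pointer per loaded binary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: slot 0 of every data section points at the shared
   table of per-image global pointers (read-only from userland). */
typedef struct DataSection {
    struct DataSection *const *gbr_table;  /* what (GBR, 0) holds */
    int64_t globals[4];                    /* stand-in for the image's globals */
} DataSection;

/* Starting from whatever GBR the caller had, fetch the table, then fetch
   this image's own global pointer by its index. */
DataSection *reload_gbr(DataSection *caller_gbr, unsigned image_index)
{
    assert(caller_gbr != NULL);
    /* first load:  MOV.Q (GBR, 0), Rt */
    DataSection *const *table = caller_gbr->gbr_table;
    /* second load: displacement selected by the index (assumed index * 8) */
    return table[image_index];
}
```

An EXE would call this with index 0; a DLL would use the index patched in by its base-reloc.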
Generally, this is needed if:
   Function may be called from outside of the current binary and:
     Accesses global variables;
     And/or, calls local functions.
Though, still generally lower average-case overhead than the strategy typically used by FDPIC, which would handle this reload process on the caller side...
   SD    X3, Disp(SP)
   LD    X3, 8(X18)
   LD    X6, 0(X18)
   JALR  X1, 0(X6)
   LD    X3, Disp(SP)
With generally every function pointer existing as a pair with the actual function pointer, and its associated global pointer.
Though, caller side handling does arguably avoid the need to perform relocs for the table index.
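
For comparison, the caller-side FDPIC-style sequence can be sketched in C as follows. The `FuncDesc` type, the `current_gp` variable standing in for X3, and the example callee are all hypothetical; the comments map each step onto the five-instruction listing above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical FDPIC-style descriptor: entry point plus the callee's GP. */
typedef struct {
    int64_t (*entry)(int64_t arg);  /* 0(X18): code address   */
    void *gp;                       /* 8(X18): global pointer */
} FuncDesc;

void *current_gp;  /* stands in for the GP register (X3) */

/* Caller-side save, reload, call, and restore. */
int64_t call_via_descriptor(const FuncDesc *fd, int64_t arg)
{
    assert(fd != NULL && fd->entry != NULL);
    void *saved_gp = current_gp;      /* SD   X3, Disp(SP) */
    current_gp = fd->gp;              /* LD   X3, 8(X18)   */
    int64_t result = fd->entry(arg);  /* LD   X6, 0(X18) ; JALR X1, 0(X6) */
    current_gp = saved_gp;            /* LD   X3, Disp(SP) */
    return result;
}

/* Example callee: reads a "global" through whatever GP is current. */
int64_t add_to_global(int64_t arg)
{
    return *(int64_t *)current_gp + arg;
}
```

Every indirect call pays for the save and restore here, whereas the callee-side scheme only pays in functions that can be reached from outside their own binary.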
Though, seemingly no one wants to add FDPIC for RV64G, seeing it mostly as a 32-bit microcontroller thing.
For normal PIE though, absent CoW, it is necessary to load a new copy of the binary each time a new process instance is created.

Vs, say, for PIE ELF binaries where it is needed to load a new copy for
each process instance because of this (well, excluding an FDPIC style
ABI, but seemingly no one has bothered adding FDPIC
support in GCC or friends for RV64 based targets, ...).
>
Well, granted, because Linux and similar tend to load every new process
into its own address space and/or use CoW.
 CoW and execl()
 
Though, execl() effectively replaces the current process.
IMHO, a "CreateProcess()" style abstraction makes more sense than fork+exec.
Though, one tricky way to handle it is:
   vfork: effectively spawns a thread in the same address space as the caller, with a provisional PID, and semi-copied stack;
   exec: Creates a new process copying the PID and file-descriptors;
     Internally uses CreateProcess;
     Temporary thread disappears once exec is called.
True "fork()" is more of an issue though...
The true "fork()" semantics are not possible on single-address-space or NoMMU systems. Nor fully emulated in things like Cygwin IIRC.
Though, the usual alternative is to give them "vfork()" semantics, and things will probably explode if they do anything other than call exec or similar.
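
On the POSIX side, the closest existing thing to a "CreateProcess()" style call is `posix_spawn()`, which starts a new program without ever duplicating the caller. A minimal sketch, assuming a POSIX system; the helper name `spawn_and_wait` is made up for this example:

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Starts 'path' with 'argv' as a new process and waits for it.
   Returns the child's exit status, or -1 on failure. */
int spawn_and_wait(const char *path, char *const argv[])
{
    pid_t pid;
    int status;

    /* One call creates the process; the caller's address space is not copied. */
    if (posix_spawn(&pid, path, NULL, NULL, argv, environ) != 0)
        return -1;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Since nothing runs in the child between creation and program start, this avoids the "do anything other than exec and things explode" hazard of vfork.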

--------------
Other ISAs use a flag bit for each register, but this is less viable
with an ISA with a larger number of registers, well, unless one uses a
64 or 96 bit LDM/STM encoding (possible). Merit though would be not
needing multiple LDM's / STM's to deal with a discontinuous register
range.
>
To quote Trevor Smith:: "Why would anyone want to do that" ??
>
>
Discontinuous register ranges:
Because pretty much no ABI's put all of the callee save registers in a
contiguous range.
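
To put a number on this: with a Hi/Lo range encoding, one LDM/STM is needed per contiguous run of registers to be saved. A small sketch that counts those runs for a given save mask (the function name and the 64-bit mask representation are assumptions for illustration):

```c
#include <stdint.h>

/* Bit r of 'mask' is set if register r must be saved.
   Returns the number of contiguous runs, i.e. how many range-encoded
   LDM/STM instructions would be needed to cover the set. */
unsigned count_register_runs(uint64_t mask)
{
    /* A run starts at every set bit whose lower neighbour is clear. */
    uint64_t starts = mask & ~(mask << 1);
    unsigned runs = 0;
    while (starts) {
        starts &= starts - 1;  /* clear the lowest set bit */
        runs++;
    }
    return runs;
}
```

For example, the RISC-V integer callee-saved set (x8..x9 and x18..x27) forms two runs, so it would take two range-style instructions rather than one.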
>
Granted, I guess if someone were designing an ISA and ABI clean, they
could make all of the argument registers and callee save registers
contiguous.
>
Say:
   R0..R3: Special
   R4..R15: Scratch
   R16..R31: Argument
   R32..R63: Callee Save
....
>
But, invariably, someone will want "compressed" instructions with a
subset of the registers, and one can't just have these only having
access to argument registers.
 Brian had little trouble using My 66000 ABI which does have contiguous
register groupings.
 
But, My66000 also isn't like, "Hey, how about 16-bit ops with 3 or 4 bit register numbers".
Not sure the thinking behind the RV ABI.
In the BJX ABI, the layout directly grew out of the SH ABI mapping, effectively just mirroring the original SH layout 4 times for 64 registers.
The SH layout was contiguous, at least for 16 registers, though a mirrored layout is no longer contiguous.
The RV ABI is not contiguous, but at least still less chaotic than the x86-64 ABIs.

Well, also excluding the possibility where the LDM/STM is essentially
just a function call (say, if more than a certain number of registers are to
be saved/restored, the compiler generates a call to a save/restore
sequence, which is also generated as needed). Granted, this is basically
the strategy used by BGBCC. If multiple functions happen to save/restore
the same combination of registers, they get to reuse the prior
function's save/restore sequence (generally folded off to before the
function in question).
>
Calling a subroutine to perform epilogues adds to the number of
branches a program executes. Having an instruction like EXIT means
that when you know you need to exit, you EXIT; you don't branch to the exit
point. This saves instructions.
>
>
Prolog needs a call, but epilog can just be a branch, since no need to
return back into the function that is returning.
 Yes, but this means My 66000 executes 3 fewer transfers of control
per subroutine than you do. And taken branches add latency.
 
Granted.
Each predicted branch adds 2 cycles.

Needs to have a lower limit though, as it is not worth it to use a
call/branch to save/restore 3 or 4 registers...
>
But, say, 20 registers, it is more worthwhile.
 ENTER saves as few as 1 or as many as 32 and remains that 1 single
instruction. The same goes for EXIT, and EXIT also performs the RET when LDing
R0.
 
Granted.
My strategy isn't perfect:
   Non-zero branching overheads, when the feature is used;
   Per-function load/store slides in prolog/epilog, when not used.
Then, the heuristic mostly becomes one of when it is better to use the inline strategy (load/store slide), or to fold them off and use calls/branches.
Does technically also work for RISC-V though (though seemingly GCC always uses inline save/restore, but also the RV ABI has fewer registers).
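
A toy version of such a heuristic, as a sketch only: it weighs the instructions saved per function against a fixed per-call cost. The 3 extra control transfers and the 2 cycles per predicted branch come from the discussion above; the function names and the `min_saved` threshold are made up here and are not what BGBCC actually does.

```c
#include <assert.h>

enum {
    FOLDED_TRANSFERS  = 3,  /* call + return for the prolog, branch for the epilog */
    CYCLES_PER_BRANCH = 2   /* cost of each predicted branch */
};

/* Extra cycles paid on every call of a function that uses folded sequences. */
unsigned folded_cycle_overhead(void)
{
    return FOLDED_TRANSFERS * CYCLES_PER_BRANCH;
}

/* Instructions removed from one function: n stores plus n loads inline,
   replaced by one call and one branch. */
int instrs_saved_by_folding(unsigned num_regs)
{
    assert(num_regs <= 64);
    return 2 * (int)num_regs - 2;
}

/* Fold only if enough code is saved to justify the fixed cycle cost. */
int should_fold(unsigned num_regs, int min_saved)
{
    return instrs_saved_by_folding(num_regs) >= min_saved;
}
```

With a threshold of, say, 16 instructions, saving 4 registers stays inline while saving 20 registers gets folded, which matches the rough intuition given earlier.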

>
Granted, the folding strategy can still do canary values, but doing so
in the reused portions would limit the range of unique canary values
(well, unless the canary magic is XOR'ed with SP or something...).
>
Canary values are in addition to ENTER and EXIT, not part of them,
IMHO.
OK.
It sorta made sense to treat canary values as part of the process of saving/restoring the registers, since their main purpose is to protect the saved registers, and particularly the saved PC.
Granted, canary values are not a perfect strategy.
They can provide some added resistance against buffer overflow exploits if the value can be made unknown to the attacker.
This means, ideally:
   Unique to each function, and does not repeat across builds.
     But, by itself, insufficient if a single build is used.
   Is mangled in some other way to avoid repeats.
     Say, XOR'ing with SP and also ASLR'ing the SP.
But, yeah, if the canary value is, say:
(SP XOR Magic) with SP being ASLR'ed, it offers at least some added protection.
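
As a concrete sketch of the (SP XOR Magic) idea, with hypothetical helper names and with the magic value assumed to be a per-function constant chosen at build time:

```c
#include <stdint.h>

/* Canary stored in the frame on entry: per-function magic mixed with SP. */
uint64_t make_canary(uint64_t sp, uint64_t magic)
{
    return sp ^ magic;
}

/* Check on exit: recompute from the current SP and compare with the stored
   value. Returns nonzero if the canary is intact. */
int canary_ok(uint64_t stored, uint64_t sp, uint64_t magic)
{
    return stored == (sp ^ magic);
}
```

Because SP differs between frames (and between runs, if the stack is ASLR'ed), the same magic yields different stored values, so a folded save/restore sequence that reuses one magic does not end up writing one fixed, guessable constant.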
