Re: Constant Stack Canaries

Subject: Re: Constant Stack Canaries
From: cr88192 (at) *nospam* gmail.com (BGB)
Newsgroups: comp.arch
Date: 01. Apr 2025, 22:21:30
Organization: A noiseless patient Spider
Message-ID: <vshlfk$32p7$1@dont-email.me>
User-Agent: Mozilla Thunderbird

On 3/31/2025 11:58 PM, Robert Finch wrote:
On 2025-03-31 4:52 p.m., MitchAlsup1 wrote:
On Mon, 31 Mar 2025 18:56:32 +0000, BGB wrote:
>
On 3/31/2025 1:07 PM, MitchAlsup1 wrote:
-------------
Another option being if it could be a feature of a Load/Store Multiple.
>
Say, LDM/STM:
   6b Hi (Upper bound of registers to save)
   6b Lo (Lower bound of registers to save)
   1b LR (Flag to save Link Register)
   1b GP (Flag to save Global Pointer)
   1b SK (Flag to generate a canary)
 Q+3 uses a bitmap of register selection with four more bits selecting overlapping groups. It can work with up to 17 registers.
 
OK.
If I did LDM/STM style ops, not sure which strategy I would take.
The possibility of using a 96-bit encoding with an Imm64 holding a bit-mask of all the registers makes some sense...

>
ENTER and EXIT have 2 of those flags--but also note use of SP and CSP
are implicit.
>
Likely (STM):
   Pushes LR first (if bit set);
   Pushes GP second (if bit set);
   Pushes registers in range (if Hi>=Lo);
   Pushes stack canary (if bit set).
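
As a rough sketch of the STM fields and push order listed above (Python used purely for illustration). The bit layout (Hi in bits 14..9, Lo in bits 8..3, then LR, GP, SK), the ascending order within the register range, and the names `encode_stm`, `decode_stm`, and `stm_push_order` are all assumptions made for the example, not an actual encoding:

```python
from typing import List, NamedTuple


class StmFields(NamedTuple):
    hi: int   # upper bound of register range (6 bits)
    lo: int   # lower bound of register range (6 bits)
    lr: bool  # save Link Register
    gp: bool  # save Global Pointer
    sk: bool  # generate a stack canary


def encode_stm(hi: int, lo: int, lr: bool, gp: bool, sk: bool) -> int:
    """Pack into a hypothetical 15-bit field: Hi[14:9] Lo[8:3] LR[2] GP[1] SK[0]."""
    return ((hi & 0x3F) << 9) | ((lo & 0x3F) << 3) | (int(lr) << 2) | (int(gp) << 1) | int(sk)


def decode_stm(bits: int) -> StmFields:
    """Unpack the same hypothetical 15-bit field."""
    return StmFields(
        hi=(bits >> 9) & 0x3F,
        lo=(bits >> 3) & 0x3F,
        lr=bool(bits & 4),
        gp=bool(bits & 2),
        sk=bool(bits & 1),
    )


def stm_push_order(f: StmFields) -> List[str]:
    """List what gets pushed, in the order given in the post."""
    order: List[str] = []
    if f.lr:
        order.append("LR")          # LR first, if the bit is set
    if f.gp:
        order.append("GP")          # GP second, if the bit is set
    if f.hi >= f.lo:                # Hi < Lo encodes an empty range
        order.extend(f"R{n}" for n in range(f.lo, f.hi + 1))
    if f.sk:
        order.append("CANARY")      # canary last, if the bit is set
    return order
```

With this layout, setting Hi below Lo gives a way to save only LR and/or GP without any general registers.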
>
EXIT uses its 3rd flag when doing longjmp() and THROW()
so as to pop the call-stack but not actually RET from the stack
walker.
>
>
OK.
>
I guess one could debate whether an LDM could treat the Load-LR as "Load
LR" or "Load address and Branch", and/or have separate flags (Load LR vs
Load PC, with Load PC meaning to branch).
>
>
Other ABIs may not have as much reason to save/restore the Global
Pointer all the time. But, in my case, it is being used as the primary
way of accessing globals, and each binary image has its own address
range here.
>
I use constants to access globals.
These come in 32-bit and 64-bit flavors.
>
PC-Rel is not being used, as PC-Rel doesn't allow for multiple process
instances of a given loaded binary within a shared address space.
>
As long as the relative distance is the same, it does.
>
Vs, say, for PIE ELF binaries where it is needed to load a new copy for
each process instance because of this (well, excluding an FDPIC style
ABI, but seemingly no one has bothered adding FDPIC
support in GCC or friends for RV64 based targets, ...).
>
Well, granted, because Linux and similar tend to load every new process
into its own address space and/or use CoW.
>
CoW and execl()
>
--------------
Other ISAs use a flag bit for each register, but this is less viable
with an ISA with a larger number of registers, well, unless one uses a
64 or 96 bit LDM/STM encoding (possible). Merit though would be not
needing multiple LDM's / STM's to deal with a discontinuous register
range.
>
To quote Trevor Smith:: "Why would anyone want to do that" ??
>
>
Discontinuous register ranges:
Because pretty much no ABI's put all of the callee save registers in a
contiguous range.
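
To make the trade-off concrete, here is a small sketch (Python, for illustration only) that counts how many range-style STMs a register set would need versus a single bitmask. The helper names `contiguous_ranges` and `register_bitmask` are made up for the example. The test set uses the RISC-V integer callee-saved registers (x8..x9 and x18..x27), which are split into two runs:

```python
from typing import Iterable, List, Tuple


def contiguous_ranges(regs: Iterable[int]) -> List[Tuple[int, int]]:
    """Split a register set into (lo, hi) runs; each run needs one range-style STM."""
    runs: List[Tuple[int, int]] = []
    for r in sorted(set(regs)):
        if runs and r == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], r)   # extend the current run
        else:
            runs.append((r, r))           # start a new run
    return runs


def register_bitmask(regs: Iterable[int]) -> int:
    """One bit per register, as an Imm64 could hold for a 64-register file."""
    mask = 0
    for r in regs:
        if not 0 <= r < 64:
            raise ValueError(f"register {r} does not fit a 64-bit mask")
        mask |= 1 << r
    return mask


# RISC-V callee-saved integer registers: s0-s1 (x8-x9) and s2-s11 (x18-x27).
riscv_callee_saved = [8, 9] + list(range(18, 28))
print(contiguous_ranges(riscv_callee_saved))      # two runs, so two range-style STMs
print(hex(register_bitmask(riscv_callee_saved)))  # or a single mask-style STM
```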
>
Granted, I guess if someone were designing an ISA and ABI clean, they
could make all of the argument registers and callee save registers
contiguous.
>
Say:
   R0..R3: Special
   R4..R15: Scratch
   R16..R31: Argument
   R32..R63: Callee Save
....
>
But, invariably, someone will want "compressed" instructions with a
subset of the registers, and one can't just have these only having
access to argument registers.
>
Brian had little trouble using My 66000 ABI which does have contiguous
register groupings.
>
Well, also excluding the possibility where the LDM/STM is essentially
just a function call (say, if more than a certain number of registers are
to be saved/restored, the compiler generates a call to a save/restore
sequence, which is also generated as needed). Granted, this is basically
the strategy used by BGBCC. If multiple functions happen to save/restore
the same combination of registers, they get to reuse the prior
function's save/restore sequence (generally folded off to before the
function in question).
>
Calling a subroutine to perform epilogues is adding to the number of
branches a program executes. Having an instruction like EXIT means
when you know you need to exit, you EXIT; you don't branch to the exit
point. This saves instructions.
>
>
Prolog needs a call, but epilog can just be a branch, since no need to
return back into the function that is returning.
>
Yes, but this means My 66000 executes 3 fewer transfers of control
per subroutine than you do. And taken branches add latency.
>
Needs to have a lower limit though, as it is not worth it to use a
call/branch to save/restore 3 or 4 registers...
>
But, say, 20 registers, it is more worthwhile.
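
A minimal sketch of the folding idea with a lower limit, written in Python for illustration. This is not BGBCC's actual code; the class name `SaveRestoreFolder`, the generated labels, and the default threshold of 5 registers are assumptions chosen only to match the "not worth it for 3 or 4" remark:

```python
from typing import Dict, FrozenSet, Iterable, Optional


class SaveRestoreFolder:
    """Hands out one shared save sequence per distinct set of saved registers."""

    def __init__(self, min_regs: int = 5) -> None:
        self.min_regs = min_regs                      # below this, save inline instead
        self.sequences: Dict[FrozenSet[int], str] = {}

    def prolog_for(self, regs: Iterable[int]) -> Optional[str]:
        """Return the label of a shared save sequence, or None to emit inline stores."""
        key = frozenset(regs)
        if len(key) < self.min_regs:
            return None
        if key not in self.sequences:
            # First function with this combination: generate the sequence as needed.
            self.sequences[key] = f"__save_{len(self.sequences)}"
        return self.sequences[key]


folder = SaveRestoreFolder()
first = folder.prolog_for([8, 9, 10, 11, 12])
second = folder.prolog_for([12, 11, 10, 9, 8])   # same combination, reused
small = folder.prolog_for([8, 9, 10])            # too few registers, inline
```

The prolog would reach such a sequence with a call and the epilog with a plain branch, as described above; the sketch only covers the bookkeeping of which sequences exist.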
>
ENTER saves as few as 1 or as many as 32 and remains that 1 single
instruction. Same for EXIT, and EXIT also performs the RET when LDing
R0.
>
>
Granted, the folding strategy can still do canary values, but doing so
in the reused portions would limit the range of unique canary values
(well, unless the canary magic is XOR'ed with SP or something...).
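
The XOR-with-SP idea can be sketched as follows (Python, illustration only). The 64-bit width, the magic value, and the names `make_canary` and `check_canary` are assumptions; the point is just that a shared save/restore sequence with one fixed magic still produces a different canary for each stack frame:

```python
MASK64 = (1 << 64) - 1


def make_canary(magic: int, sp: int) -> int:
    """Mix a fixed per-sequence magic with the stack pointer at the time of the push."""
    return (magic ^ sp) & MASK64


def check_canary(stored: int, magic: int, sp: int) -> bool:
    """Recompute the canary on exit and compare it with what is on the stack."""
    return stored == make_canary(magic, sp)


MAGIC = 0x5A17C0DE12345678            # hypothetical constant baked into the folded sequence
frame_a = make_canary(MAGIC, 0x7FFF0000)
frame_b = make_canary(MAGIC, 0x7FFEFFC0)
print(hex(frame_a), hex(frame_b))     # same magic, different frames, different canaries
```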
>
Canary values are in addition to ENTER and EXIT, not part of them,
IMHO.
 In Q+3 there are push and pop multiple instructions. I did not want to add load and store multiple on top of that. They work great for ISRs, but not so great for task switching code. I have the instructions pushing or popping up to 17 registers in a group. Groups of registers overlap by eight. The instructions can handle all 96 registers in the machine. ENTER and EXIT are also present.
 It is looking like the context switch code for the OS will take about 3000 clock cycles to run. Not wanting to disable interrupts for that long, I put a spinlock on the system’s task control block array. But I think I have run into an issue. It is the timer ISR that switches tasks. Since it is an ISR, it pushes a subset of registers that it uses and restores them at exit. But when exiting and switching tasks, it spinlocks on the task control block array. I am not sure this is a good thing, as the timer IRQ is fairly high priority. If something else locked the TCB array, it would deadlock. I guess the context switching could be deferred until the app requests some other operating system function. But then the issue is what if the app gets stuck in an infinite loop, not calling the OS? I suppose I could make an OS heartbeat function call a requirement of apps. If the app does not do a heartbeat within a reasonable time, it could be terminated.
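
One possible variant of the deferral idea, sketched in Python for illustration only: the timer ISR never spins on the TCB lock; it tries the lock once and, if it is held, marks the switch as pending so the next OS call can do it. A tick counter stands in for the heartbeat. This is not the Q+3 OS design; the class `Scheduler`, its method names, and the tick limit are all hypothetical:

```python
import threading
from typing import Dict


class Scheduler:
    def __init__(self, heartbeat_limit: int = 3) -> None:
        self.tcb_lock = threading.Lock()       # stands in for the TCB array spinlock
        self.switch_pending = False
        self.switches = 0
        self.heartbeat_limit = heartbeat_limit
        self.missed: Dict[str, int] = {}       # task -> timer ticks since last OS call

    def _switch_locked(self) -> None:
        """Do the (simulated) context switch; caller must hold tcb_lock."""
        self.switches += 1
        self.switch_pending = False

    def timer_isr(self, task: str) -> bool:
        """Timer tick. Returns True when the task has missed too many heartbeats."""
        self.missed[task] = self.missed.get(task, 0) + 1
        if self.tcb_lock.acquire(blocking=False):   # try once, never spin in the ISR
            try:
                self._switch_locked()
            finally:
                self.tcb_lock.release()
        else:
            self.switch_pending = True              # defer to the next OS call
        return self.missed[task] > self.heartbeat_limit

    def os_call(self, task: str) -> None:
        """Any OS call counts as a heartbeat and performs a deferred switch."""
        self.missed[task] = 0
        if self.switch_pending:
            with self.tcb_lock:
                self._switch_locked()
```

The trade-off is that a task switch can be delayed while the lock is held elsewhere, in exchange for the high-priority ISR never waiting on a lock it cannot get.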
 Q+3 progresses rapidly. A lot of the stuff in earlier versions was removed. The pared down version is a 32-bit machine. Expecting some headaches because of the use of condition registers and branch registers.
 
OK.
Ironically, I seem to have comparably low task-switch cost...
However, each system call is essentially 2 task switches, and it is still slow enough to negatively affect performance if they happen at all frequently.
So, say, one needs to try to minimize the number of unnecessary system calls (say, don't implement "fputs()" by sending 1 byte at a time, ...).
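
To illustrate the "fputs()" point, a small Python sketch that counts how often a write crosses the (simulated) system call boundary with and without buffering. `CountingSink`, `fputs_bytewise`, `BufferedOut`, and the 256-byte buffer size are names and values invented for the example:

```python
class CountingSink:
    """Stand-in for the system call boundary; counts how many times it is crossed."""

    def __init__(self) -> None:
        self.calls = 0
        self.data = bytearray()

    def write(self, chunk: bytes) -> None:
        self.calls += 1
        self.data += chunk


def fputs_bytewise(sink: CountingSink, s: bytes) -> None:
    """The wasteful version: one system call per byte."""
    for i in range(len(s)):
        sink.write(s[i:i + 1])


class BufferedOut:
    """Collects output and only crosses the boundary when the buffer fills or on flush."""

    def __init__(self, sink: CountingSink, size: int = 256) -> None:
        self.sink = sink
        self.size = size
        self.buf = bytearray()

    def fputs(self, s: bytes) -> None:
        self.buf += s
        if len(self.buf) >= self.size:
            self.flush()

    def flush(self) -> None:
        if self.buf:
            self.sink.write(bytes(self.buf))
            self.buf.clear()


msg = b"hello, world\n"
slow, fast = CountingSink(), CountingSink()
fputs_bytewise(slow, msg)
out = BufferedOut(fast)
out.fputs(msg)
out.flush()
print(slow.calls, fast.calls)   # one call per byte versus a single call
```

If each call costs two task switches, the bytewise version pays that cost once per character.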
Unlike on a modern PC, one generally needs to care more about efficiency.
Hence, all the fiddling with low bit-depth graphics formats, and things like my recent fiddling with 2-bit ADPCM audio.
And, online, one is (if anything) more likely to find people complaining about how old/obsolescent ADPCM is (and/or arguing that people should store all their sound effects as Ogg/Vorbis or similar; ...).
Then again, I did note that I may need to find some other "quality metric" for audio, as RMSE isn't really working...
At least going by RMSE, the "best" option would be to use 8-bit PCM and then downsample it.
Say, 4kHz 8-bit PCM has a lower RMSE score than 2-bit ADPCM, but subjectively the 2-bit ADPCM sounds significantly better.
Say, for 16kHz and a test file (using a song here), RMSE by format:
   PCM8, 16kHz     :  121 (128 kbps)
   A-Law, 16kHz    :  284 (128 kbps)
   IMA 4bit, 16kHz :  617 (64 kbps)
   IMA 2bit, 16kHz : 1692 (32 kbps, *)
   ADLQ 2bit, 16kHz: 2000 (32 kbps)
   PCM8, 4kHz      :  242 (32 kbps)
However, 4kHz PCM8 sounds terrible vs either 2-bit IMA or ADLQ.
   Basically sounds muffled, speech is unintelligible.
   But, it would be the "best" option if going solely by RMSE.
Also A-Law sounds better than PCM8 (at the same sample rate).
   Even with the higher RMSE score.
Seems like it could be possible to do RMSE on A-Law samples as a metric, but if anything this is just kicking the can down the road slightly.
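
For reference, a sketch of plain RMSE next to RMSE taken in the A-law domain (Python, illustration only). The companding curve is the standard continuous A-law formula with A = 87.6; applying it to samples normalized to [-1, 1] before measuring the error is one assumed reading of "RMSE on A-Law samples", and the function names are made up for the example:

```python
import math
from typing import Sequence

A = 87.6  # standard A-law compression parameter


def rmse(ref: Sequence[float], test: Sequence[float]) -> float:
    """Root-mean-square error between two equal-length signals."""
    if len(ref) != len(test) or not ref:
        raise ValueError("signals must be non-empty and the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref))


def alaw_compress(x: float) -> float:
    """Continuous A-law curve for x in [-1, 1]."""
    ax = abs(x)
    if ax < 1.0 / A:
        y = A * ax / (1.0 + math.log(A))                    # linear segment near zero
    else:
        y = (1.0 + math.log(A * ax)) / (1.0 + math.log(A))  # logarithmic segment
    return math.copysign(y, x)


def rmse_alaw_domain(ref: Sequence[float], test: Sequence[float]) -> float:
    """RMSE measured after companding both signals."""
    return rmse([alaw_compress(v) for v in ref], [alaw_compress(v) for v in test])


# The same absolute error counts for more at low amplitude than at high amplitude.
quiet = rmse_alaw_domain([0.0], [0.01])
loud = rmse_alaw_domain([0.9], [0.91])
print(quiet > loud)
```

This weights errors in quiet passages more heavily, but it is still a per-sample measure and says nothing about lost bandwidth, so it would not by itself penalize the muffled 4kHz case.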
Granted, A-Law sounds better than 4-bit IMA, and 4-bit IMA sounds better than the 2-bit ADPCM's at least...
*: Previously it was worse, around 4500, but the RMSE score dropped after switching it to using a similar encoder strategy to ADLQ, namely doing a brute-force search over the next 3 samples to find the values that best approximate the target samples.
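
A toy version of the brute-force lookahead strategy, in Python for illustration. The 2-bit codec here (sign bit plus one magnitude bit, with a made-up step adaptation rule) is an assumption for the sketch and is neither the IMA format nor ADLQ; only the search structure is the point: for each sample, try every code sequence over the next 3 samples, keep the first code of the best sequence, then advance.

```python
from itertools import product
from typing import List, Sequence, Tuple

State = Tuple[int, int]  # (predictor, step size)


def step_decode(state: State, code: int) -> State:
    """Apply one 2-bit code: bit 1 is the sign, bit 0 selects the larger delta."""
    pred, step = state
    mag = code & 1
    diff = step // 2 + (step if mag else 0)
    if code & 2:
        diff = -diff
    pred = max(-32768, min(32767, pred + diff))
    # Hypothetical adaptation: grow the step after a large delta, shrink after a small one.
    step = step * 3 // 2 if mag else step * 3 // 4
    step = max(4, min(8192, step))
    return pred, step


def decode(codes: Sequence[int], state: State = (0, 16)) -> List[int]:
    out = []
    for c in codes:
        state = step_decode(state, c)
        out.append(state[0])
    return out


def encode(samples: Sequence[int], lookahead: int = 3, state: State = (0, 16)) -> List[int]:
    """Pick each code by exhaustive search over the next `lookahead` samples."""
    codes: List[int] = []
    for i in range(len(samples)):
        window = samples[i:i + lookahead]          # shorter near the end of the input
        best_err, best_code = None, 0
        for seq in product(range(4), repeat=len(window)):
            s, err = state, 0
            for c, target in zip(seq, window):
                s = step_decode(s, c)
                err += (s[0] - target) ** 2        # squared error against the targets
            if best_err is None or err < best_err:
                best_err, best_code = err, seq[0]
        state = step_decode(state, best_code)      # encoder tracks the decoder state
        codes.append(best_code)
    return codes
```

With 4 codes and 3 samples of lookahead this is 64 candidate sequences per input sample, which is cheap enough for an offline encoder.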
Though, which is "better", or whether or not even lower RMSE "improves" quality here, is debatable (the PCM8 numbers clearly throw using RMSE as a quality metric into question for this case).
Ideally I would want some metric that better reflects hearing perception and is computationally cheap.
...

