Re: Misc: Ongoing status...

Subject: Re: Misc: Ongoing status...
From: cr88192 (at) *nospam* gmail.com (BGB)
Newsgroups: comp.arch
Date: 02. Feb 2025, 04:58:06
Organization: A noiseless patient Spider
Message-ID: <vnmqgj$hdlo$1@dont-email.me>
User-Agent: Mozilla Thunderbird
On 2/1/2025 7:22 PM, MitchAlsup1 wrote:
> On Sat, 1 Feb 2025 22:42:39 +0000, BGB wrote:
>
>> On 1/31/2025 10:05 PM, MitchAlsup1 wrote:
> --------------------------------
>> Whereas, if performance is dominated by a piece of code that looks like,
>> say:
>>    v0=dytf_int2fixnum(123);
>>    v1=dytf_int2fixnum(456);
>>    v2=dytf_mul(v0, v1);
>>    v3=dytf_int2fixnum(789);
>>    v4=dytf_add(v2, v3);
>>    v5=dytf_wrapsymbol("x");
>>    dytf_storeindex(obj, v5, v4);
>>    ...
>> With, say, N levels of call-graph in each called function, but with this
>> sort of code still managing to dominate the total CPU ("Self%" time).
>>
>> This seems to be a situation where callee-save registers are a big win
>> for performance IME.
>
> With callee save registers, the prologue and epilogue of subroutines
> see all the save/restore memory traffic; sometimes saving a register
> that is not "in use" and restoring it later.
 
The compiler keeps track of which registers it uses in the function, and only needs to save/restore those.
So, say:
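   /* Assumes VM has an int numBar (at offset 0, matching the
      assembly below) and that bar() is declared elsewhere. */
   void foo(VM *ctx)
   {
      int i, n;
      n=ctx->numBar;
      for(i=0; i<n; i++)
         bar(ctx, i);
   }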
The prolog/epilog only really needs to save 4 registers or so (say: i, ctx, T0 holding n, and LR).
If there are 31 callee-save registers, we save and restore 4 of them, NOT all 31 of them.
Saving 3 registers in the prolog and restoring 3 in the epilog (LR has to be saved either way, since foo calls bar) is most likely going to be cheaper than a spill/reload pair for 'i' plus reloads of 'ctx' and 'n' every time around the loop (assuming in this case that 'ctx->numBar>2').
Say, in an RV example, callee save:
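   LI    X18, 0           # i = 0 (ctx already moved to X19 in the prolog)
   LW    X20, 0(X19)      # n = ctx->numBar
   .L0:
   MV    X10, X19         # arg0 = ctx
   MV    X11, X18         # arg1 = i
   JAL   X1, bar
   ADDI  X18, X18, 1      # i++
   BLT   X18, X20, .L0    # loop while i < n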
Or, scratch:
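   LI    X15, 0           # i = 0 (ctx assumed to be in X14)
   LW    X13, 0(X14)      # n = ctx->numBar
   SD    X14, 16(SP)      # spill ctx (64-bit pointer, assuming RV64)
   SW    X13, 8(SP)       # spill n
   .L0:
   SW    X15, 0(SP)       # spill i before the call
   LD    X14, 16(SP)      # reload ctx
   MV    X10, X14
   MV    X11, X15
   JAL   X1, bar
   LW    X15, 0(SP)       # reload i
   LW    X13, 8(SP)       # reload n
   ADDI  X15, X15, 1
   BLT   X15, X13, .L0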
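One can place bets on which of these is going to be more efficient for moderately large 'n'... Per iteration, the callee-save loop executes 5 instructions with no loads or stores, while the scratch loop executes 9 instructions, including 1 store and 3 loads.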
Granted, scratch registers may be "in general" better for temporaries, as the lifetime of temporaries is generally very short and they don't usually need to have their values preserved between basic blocks.

> With caller save registers, the caller saves exactly the registers
> it needs preserved, while the callee saves/restores none. Moreover
> it only saves registers currently "in use" and may defer restoring
> since it does not need that value in that register for a while.
 
It needs to save any scratch register that still holds a live variable as soon as ANY function is called.
If only scratch registers are used, then for function-call-heavy code there may be excessive amounts of spill-and-reload activity (this is part of what I suspect hurts GCC perf with this sort of code).
This only really makes sense IMO in terms of performance if one assumes (as GCC seems to assume) that function calls are infrequent in hot path code.
Though, I guess it is possible that a compiler heuristic could be put in place to try to classify functions based on their function-call density and switch between strategies based on this (say, using different register allocator behavior for call-light versus call-heavy functions).

> So, the instruction path length has a better story in caller saves
> than callee saves. Nothing that was "Not live" is ever saved or
> restored.
 
Only relevant if there is a high probability that much of the code in a function won't be reached, and if one assumes that different variables will not be assigned to the same registers in different parts of the function.
But, relatedly, I had noted before that "always use as many registers as possible" is not the most efficient strategy.
BGBCC partly divides the callee-save registers into 4 major groups (with XGPR):
   R8..R14 (A)
   R24..R31 (B)
   R40..R47 (C)
   R56..R63 (D)
Where:
   Group A is always enabled;
   Group B is enabled after a low threshold in XG1,
     always enabled in XG2;
   Group C is enabled after a high threshold in XG2;
     XG1 + XGPR: Similar, but the threshold is higher.
     XG1, No XGPR: N/A
   Group D is enabled after an extra-high threshold in XG2.
     XG1 + XGPR: Similar, but the threshold is higher.
     XG1, No XGPR: N/A
This mostly doesn't apply to XG3, as it uses the RV ABI, but:
   X8/X9, X18..X27:
     Enabled for RV modes and XG3
   F8/F9, F18..F27:
     RV Mode:
       Disabled for normal registers;
       Possible selective affinity for FP values;
     XG3:
       Enabled if a medium threshold is crossed.
Enabling a group does not mean all of it will be saved, but rather that it will allow allocating variables within that group (as opposed to forcing spill and reload within the set of already allocated registers).
Beyond the statically assigned variables, a new register will be reserved if:
    The variable isn't already in a register;
    There aren't any free spots within the currently reserved registers;
    None of the prior spots has gone out of lifetime scope;
    There are more registers it is allowed to allocate.
Otherwise, if the variable needs a register but no further registers may be reserved, it will evict a prior value.

> The arguments for callee save have to do with I cache footprint.
Maybe, but it affects performance as well.
As noted, callee-save dominant (or exclusive) allocation does seem, in my experience, to be the more efficient strategy for call-intensive code.
