Newsportal USENET - Re: Cost of handling misaligned access

On 2/4/2025 4:17 PM, MitchAlsup1 wrote:

On Tue, 4 Feb 2025 20:49:14 +0000, BGB wrote:

On 2/4/2025 1:25 PM, Scott Lurndal wrote:
mitchalsup@aol.com (MitchAlsup1) writes:
-------------------
Comparing to the CISC architectures of the 60s and 70s,
it's not horrible.
Well, vs a modern RISC style ISA, say, caller side:
   MOV R20, R10 //0c (SSC with following)
   MOV R21, R11 //1c
   BSR func     //2c (typically)
Cost: 3 cycles.
    MOV R1,R30
    MOV R2,R28
    CALL func
3 instructions, might be 1 cycle on a 3-wide machine. And when
BSR/CALL is visible at FETCH 2 cycles before it DECODEs, the
call overhead is 0 cycles.

In my case, branch costs are:
   2c: Taken, correctly predicted.
   1c: Not taken, correctly predicted.
   10c: Not predicted.
     2c: Branch proper;
     8c: Pipeline flush.
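
As a rough illustration of how the branch costs listed above combine, here is a minimal Python sketch that computes an average cost per branch. The function name, the two fraction parameters, and the simplification that every branch not correctly predicted pays the full 10c regardless of direction are assumptions for illustration, not something stated in the post.

```python
# Cycle costs as quoted in the post.
TAKEN_PREDICTED = 2        # taken, correctly predicted
NOT_TAKEN_PREDICTED = 1    # not taken, correctly predicted
NOT_PREDICTED = 2 + 8      # branch proper + pipeline flush = 10c


def average_branch_cost(taken_fraction, predicted_fraction):
    """Expected cycles per branch (hypothetical helper).

    taken_fraction:     share of branches that are taken (0.0 to 1.0)
    predicted_fraction: share of branches correctly predicted (0.0 to 1.0)
    Assumption: any branch that is not correctly predicted costs 10c.
    """
    predicted_cost = (taken_fraction * TAKEN_PREDICTED
                      + (1.0 - taken_fraction) * NOT_TAKEN_PREDICTED)
    return (predicted_fraction * predicted_cost
            + (1.0 - predicted_fraction) * NOT_PREDICTED)


if __name__ == "__main__":
    # Example mix: half the branches taken, 90% correctly predicted.
    print(average_branch_cost(0.5, 0.9))
```

Under these assumptions the flush cost quickly comes to dominate the average as the predicted fraction drops.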

func:
   ADD SP, -32, SP      //2c (1 c penalty)
   MOV.Q LR, (SP, 24) //1c
   MOV.X R18, (SP, 0) //1c
   ...
   MOV.Q (SP, 24), LR //2c (1c penalty)
   MOV.X (SP, 0), R18 //1c
   JMP    LR            //10c (*1)
*1: Insufficient delay since LR reload, so branch predictor fails to
handle this case.
This should be call/return predicted "just fine".
It should not be indirect predictor predicted.

No special call/return predictor here.
For JMP LR:
   2c: If no modification to LR exists in the ID2 or EXn stages.
   10c: If a modification exists.
For JMP Rn (generic):
   10 cycles.
LR is special and has a hot-path into the branch predictor.
But, it can only be used if LR has not been modified within a certain window.
In this example, it fails mostly because there are not enough cycles of delay (need roughly 5 instructions between the LR reload and JMP).
This is basically needed to avoid predicting the branch using a potentially stale value.
For RV's JALR, it additionally requires that the displacement be 0.
For a moment, I was left feeling unsure about the use-case for the displacement on JALR, but then remembered RV can compose longer branches via AUIPC+JALR.
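
The rules above can be sketched as a small Python model. This is only an approximation under stated assumptions: the post defines the condition in terms of pipeline stages (ID2/EXn), whereas the sketch counts instructions between the last LR write and the jump, using the "roughly 5 instructions" figure as the window. The function and parameter names are hypothetical.

```python
FAST_RETURN = 2    # LR hot-path into the branch predictor is usable
SLOW_JUMP = 10     # falls back to the unpredicted branch cost
LR_WINDOW = 5      # assumption: "roughly 5 instructions" from the post


def indirect_jump_cost(uses_lr, insns_since_lr_write=None, displacement=0):
    """Approximate cycle cost of an indirect jump (hypothetical helper).

    uses_lr:              True for JMP LR, False for a generic JMP Rn
    insns_since_lr_write: instructions between the last LR write and the
                          jump, or None if LR was not recently written
    displacement:         JALR displacement (only relevant for RV)
    """
    if not uses_lr:
        return SLOW_JUMP               # generic JMP Rn: always 10c
    if displacement != 0:
        return SLOW_JUMP               # RV JALR needs a zero displacement
    if insns_since_lr_write is not None and insns_since_lr_write < LR_WINDOW:
        return SLOW_JUMP               # LR value may still be stale
    return FAST_RETURN


if __name__ == "__main__":
    # The epilogue in the post has one instruction (MOV.X) between the
    # LR reload and JMP LR, so the hot path cannot be used.
    print(indirect_jump_cost(True, insns_since_lr_write=1))
```

In this model, moving the LR reload earlier in the epilogue, so that at least five instructions separate it from the jump, is what brings the return back to 2c.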

Cost: 16 cycles.
func:
     ENTER R30,R1,#32
     ...
     EXIT   R30,R1,#32
9 instructions on your machine, 5 on mine; also note: my ISA loads
the return address directly into IP so FETCH can begin while the
other LDs are in progress:: So, for the same amount of work, it
would take only 3 cycles (with a bunch of caveats).
But in any event, these are down about as low as one can expect,
whereas 432 is close to 1000 cycles; we all complained about VAX
when it was in the 20-30 cycle range of overhead.
As to why:: 432 changed the capabilities maps at call and return,
and since these were not cached,... the caller cannot see some of the
capabilities the callee has access to, and vice versa. With a lot
better caching of capabilities and modern bus widths, 432 might only
be in the 40-50 cycle range of overhead.
Moral:: Do not do way more work than required.

Yeah, basically.

....
