Newsportal USENET - Re: Instruction Tracing

On Sun, 11 Aug 2024 14:44:38 +0000, Anton Ertl wrote:

John Levine <johnl@taugh.com> writes:
As far as the delayed branches and such, they made sense in the narrow
time window when it was too expensive to put a cache on a workstation
but that time came and went by the time the RT shipped.
>
Delayed branches were put in the first commercial generation of RISCs
(except ARM), which all shipped with caches (except ARM). Delayed
branches are a natural consequence of the 5-stage (or, in the 88100
case, four-stage) pipeline.

Delayed branches are wonderful to the pipeline, very much less so for
the architecture overall, as it makes wide issue "all that much harder".
It was truly a pain in the ass on the Mc88120, a 6-wide machine.
Neither nullification nor inverse nullification helped much, and both
hurt at wide issue, too. At least Mc88100 had a bit to indicate
the delay slot was not being used.
Looking back, I wish we had not been forced to do them, and I think many
of the 1st-generation architects wish the same. Delayed branches
were supposed to bring a 16% gain in performance. The measured utility
rates were a fill rate slightly over 70%, but slightly less than 50%
useful instructions, so they only brought 8%-ish.
{{A useful instruction is useful in both taken and non-taken paths.}}
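
The figures above are roughly consistent with a simple linear scaling, sketched below. This is only an illustration: the model (realized gain = projected gain times the fraction of delay slots doing useful work) and the name effective_gain are assumptions added here, not something stated in the post. Only the 16%, "slightly less than 50%", and "8%-ish" figures come from the text.

```python
def effective_gain(ideal_gain: float, useful_fraction: float) -> float:
    """Scale the projected delay-slot speedup by the fraction of slots
    that hold useful work (assumed linear model, for illustration only)."""
    return ideal_gain * useful_fraction


ideal = 0.16   # projected gain if every delay slot did useful work (from the post)
useful = 0.50  # "slightly less than 50%" useful instructions (from the post)

gain_pct = effective_gain(ideal, useful) * 100
print(f"realized gain: about {gain_pct:.0f}%")
```

Under this assumption, slots that are filled but not useful on both paths (the gap between the ~70% fill rate and the ~50% useful rate) contribute nothing, which is why the fill rate does not appear in the calculation.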

IIRC ARM used a 3-stage implementation for the ARM1/2, which may be a
consequence of them rejecting delayed branches; and they did not have
caches, so they could not have made use of the higher clock rate that
a longer pipeline could have afforded. So it seems that the connection
between cache and delayed branches, if there is any, is the opposite
of what you suggest.
>
Delayed branches provided a speedup on these early 5-stage
implementations. They also provided a big headache for more
sophisticated implementations, and therefore soon fell out of favour.

Much like virtual caches...
The only thing that has persisted is LDs being longer than 2 cycles.
Squashing {forward, ADD, SRAM, LDalign} into 2 cycles is proving
to be a frequency headache in the simpler RISC-V implementations
even now. With wires getting slower and gates getting faster, that
trade-off is getting worse. Many of the Intel x86s use 4-cycle LDs.
{the cost of frequency is efficiency}

Power (IIRC) and Alpha don't have delayed branches.

None of the modern RISCs have them either.

- anton
