Subject : Re: Tonights Tradeoff - Background Execution Buffers
From : cr88192 (at) *nospam* gmail.com (BGB)
Newsgroups : comp.arch
Date : 04 Oct 2024, 18:28:49
Organization : A noiseless patient Spider
Message-ID : <vdp8kk$a94i$1@dont-email.me>
User-Agent : Mozilla Thunderbird
On 10/3/2024 11:04 PM, Robert Finch wrote:
> Today I am wondering how many predicate registers are enough. Scanning web pages reveals a variety. The Itanium has 64 predicate registers, but they are rotated and used for modulo-scheduled loops. Rotating registers are Itanium's method of register renaming, so it needs more visible registers. In a classic superscalar design with a RAT where registers are renamed, it seems like 64 would be far too many. Cray had eight vector mask registers. I think the RISC-V Hwacha has 16, if I read the diagram correctly.
> I cannot see the compiler making use of very many predicate registers simultaneously. Since they are not used simultaneously, and register renaming is in effect, there should not be a great need for predicate registers.
> Suppose one wants predicated logic in a loop, with the predicates being set outside of the loop. It may be desirable to have several blocks of logic guarded by different predicates within the loop. It is likely desirable to have more than one predicate then.
> -> Reserved four bits in the instruction for predicates. Do not want to waste bits, though. Using a 64-bit instruction.
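For illustration, the multi-predicate loop case described above might look like this in C (names and flag bits here are made up; with two predicate registers, p0 and p1 would each be set once ahead of the loop, and each guarded statement predicated on one of them):

    #include <stddef.h>

    /* Two loop-invariant conditions, each guarding its own block.
       With two predicate registers, p0/p1 are computed once outside
       the loop; each guarded statement then predicates on one. */
    void filter(float *dst, const float *src, size_t n,
                int mode, float scale, float offset)
    {
        size_t i;
        int p0 = (mode & 1) != 0;   /* hypothetical "apply scale" flag  */
        int p1 = (mode & 2) != 0;   /* hypothetical "apply offset" flag */
        for (i = 0; i < n; i++) {
            float v = src[i];
            if (p0) v = v * scale;   /* block predicated on p0 */
            if (p1) v = v + offset;  /* block predicated on p1 */
            dst[i] = v;
        }
    }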
I was getting along OK with a single predicate bit flag.
Had considered supporting an alternate predicate bit, but it didn't seem to gain enough to be worthwhile. Similar for possibly supporting 8 predicate registers with dedicated logic ops.
I had originally designed a predicate bit-stack where operations would push/pop the bits in a way similar to the x87 register stack, but ended up not using it. A later idea would have used logic ops with effectively 3-bit register fields.
But, ironically, I mostly ended up using GPRs for the cases where conditional logic ops were needed, as both approaches end up needing roughly the same number of instructions (and there were already ways of moving values between the T bit and GPRs).
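Say, a compare into a GPR followed by a mask/merge, which comes out to roughly the same op count as a T-bit compare plus two predicated MOVs (a rough sketch, not actual BGBCC output):

    #include <stdint.h>

    /* Branchless select via a GPR mask: compare result into a
       register, expand to all-ones/all-zeros, then AND/OR merge. */
    static inline int64_t select_lt(int64_t a, int64_t b,
                                    int64_t x, int64_t y)
    {
        int64_t m = -(int64_t)(a < b);   /* all-ones if a<b, else 0 */
        return (x & m) | (y & ~m);       /* (a<b) ? x : y           */
    }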
Meanwhile, have been internally debating whether to try to glue predication onto my tweaked extension of RISC-V (have not done so yet; still debating it). With my recent jumbo-prefix extension, had defined a few bits for this, but as-is it would mean that any predicated op effectively needs a 64-bit encoding. Granted, the branches one would predicate typically only cover a few instructions.
BGBCC had limited predicated if/else arms to a single statement, with a limit on the types and number of operators in the expression being predicated (generally, past a certain number of operators it becomes more efficient to branch rather than predicate). But, with the scope of predication being this limited, it also limits the need for more than a single predicate bit.
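So, roughly the sort of thing that gets predicated vs branched (illustrative only; the exact cutoffs here are not BGBCC's actual rules):

    /* Illustration: small vs large if bodies. */
    void example(int x, int *py, int *pz, int scale)
    {
        int y = *py, z = *pz;

        /* Single statement, simple condition: candidate for
           predication (compare into T, then one predicated op). */
        if (x > 0)
            y += x;

        /* Larger body / more operators: emitted as a normal
           branch, since past some size branching wins. */
        if (x > 0) {
            y = (y + x) * scale;
            z = z - x;
        }

        *py = y; *pz = z;
    }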
In things like GL software rasterization, predication is useful for things like Z and alpha testing, which otherwise tend to eat a lot of cycles on poorly predicted branches.
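E.g., a span loop along these lines (simplified, not the actual rasterizer code), where the test-and-store body is a natural target for predication:

    #include <stdint.h>

    /* Per-pixel Z and alpha tests; as branches these are data-
       dependent and mispredict badly, so predicating the body
       (or masking the stores) tends to win. */
    void draw_span(uint32_t *fb, uint16_t *zbuf,
                   const uint32_t *col, const uint16_t *z,
                   int n, uint32_t aref)
    {
        int i;
        for (i = 0; i < n; i++) {
            if ((z[i] < zbuf[i]) && ((col[i] >> 24) >= aref)) {
                fb[i]   = col[i];   /* predicated/masked stores */
                zbuf[i] = z[i];
            }
        }
    }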
Though, ironically, had recently disabled branch hit/miss modeling in my emulator, mostly to make it easier for the emulator to keep up with real time. Things like modeling branch hit/miss and hit/miss in the cache hierarchy are not ideal for emulator performance (could try to model/detect stale memory accesses from the weak memory model, but this would make things slower than they already are; and by this point I might as well implement a full mockup of the memory subsystem in the emulator and perform memory accesses by shuffling cache lines around, ...).
As-is, the emulator spends more time in the memory-subsystem modeling and similar than it spends actually running instructions (but, then again, this part is in turn limited by counting clock cycles and throttling things to keep it from running faster than it would in the FPGA version, unless options are given to disable this).
If left to run at full speed with the cache modeling disabled, the interpreter can run at around 200-250 MIPS or so (getting much faster than this would require a JIT, but that part has atrophied and doesn't currently work; I don't really need it when trying to model a CPU that runs at 50 MHz...).
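The throttling itself is nothing fancy; something along these lines (a sketch, assuming the 50 MHz target works out to 20 ns per emulated cycle; not the emulator's actual code):

    #include <stdint.h>
    #include <time.h>

    /* Keep emulated cycles from getting ahead of wall-clock time
       for a 50 MHz target (20 ns per emulated cycle). */
    void throttle(uint64_t total_cycles, const struct timespec *start)
    {
        struct timespec now, ts;
        uint64_t elapsed_ns, target_ns;

        clock_gettime(CLOCK_MONOTONIC, &now);
        elapsed_ns = (uint64_t)(now.tv_sec - start->tv_sec) * 1000000000u
                   + (uint64_t)(now.tv_nsec - start->tv_nsec);
        target_ns = total_cycles * 20u;    /* 50 MHz -> 20 ns/cycle */

        if (target_ns > elapsed_ns) {      /* running too fast: sleep */
            uint64_t d = target_ns - elapsed_ns;
            ts.tv_sec  = (time_t)(d / 1000000000u);
            ts.tv_nsec = (long)(d % 1000000000u);
            nanosleep(&ts, NULL);
        }
    }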
Had noted before, when trying to run it on a RasPi, that my interpreter does still seem to be somewhat faster than the one in DOSBox (which is too slow in this case to really run Doom effectively; but x86 is difficult here due to things like nearly every instruction potentially touching EFLAGS, etc.).
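The usual interpreter trick for the EFLAGS problem is "lazy flags" (illustrative sketch; not claiming this is what DOSBox actually does): record the last ALU op's operands and result, and only compute a flag when something actually reads it.

    #include <stdint.h>

    /* Lazy EFLAGS: flags are derived on demand from the last ALU
       op, rather than updated eagerly on every instruction.  A
       real version would also record which operation it was. */
    static uint32_t lf_op1, lf_op2, lf_res;

    static inline uint32_t emu_add(uint32_t a, uint32_t b)
    {
        lf_op1 = a; lf_op2 = b;
        lf_res = a + b;               /* flags NOT computed here */
        return lf_res;
    }

    static inline int flag_zf(void) { return lf_res == 0; }
    static inline int flag_cf(void) { return lf_res < lf_op1; } /* ADD */
    static inline int flag_sf(void) { return (lf_res >> 31) & 1; }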
Faster emulation is also possible if one can leverage the underlying hardware's address translation, but this depends on a lot of OS-specific stuff (the OS isn't going to just give an application access to the underlying page tables or similar, ...). But, a few emulators do go this route.
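In the simpler form of this, one reserves guest RAM as a single big host mapping, letting the host MMU do the per-page work implicitly and reducing guest->host translation to base+offset (a sketch assuming POSIX mmap; the guest-side details are made up):

    #include <stdint.h>
    #include <sys/mman.h>

    #define GUEST_RAM_SIZE (64u << 20)   /* 64MB, arbitrary */

    static uint8_t *guest_base;

    /* One flat host mapping backs all of guest RAM; the host page
       tables then handle the per-page bookkeeping implicitly. */
    int guest_ram_init(void)
    {
        void *p = mmap(NULL, GUEST_RAM_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED)
            return -1;
        guest_base = (uint8_t *)p;
        return 0;
    }

    static inline uint8_t *guest_ptr(uint32_t gaddr)
    {
        return guest_base + gaddr;   /* no software page walk */
    }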
If doing an OS, one option here would be an API for "virtual nested page tables", where the application could request that part of its virtual address range be mapped through a logical page table controlled by the application, with logical page faults delivered back to the program via the "signal()" mechanism or similar (this would likely be handled independently of however the OS/target implements virtual-memory handling at the hardware level). There may need to be a way to signal the OS whenever the user-managed page table has been updated, though (well, and/or require a syscall for each PTE update, but this may be slower than simply allowing user code to update the table and then using a syscall to say "hey, the page table has been updated", letting the OS figure out what needs to be invalidated/updated/etc.).
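As a rough sketch of what such an API could look like (every name here is hypothetical; no OS currently provides these calls):

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical user-managed nested-paging API (prototypes only). */

    /* Route [va_base, va_base+size) through a page table that the
       application itself maintains in 'table'. */
    int vnpt_attach(void *va_base, size_t size, uint64_t *table);

    /* Tell the OS the user-managed table was edited; the OS works
       out what to invalidate (the cheap alternative to one syscall
       per PTE update). */
    int vnpt_flush(void *va_base, size_t size);

    /* Logical page faults come back via the signal() mechanism,
       e.g. a handler installed with signal(SIGSEGV, ...): */
    void vnpt_fault_handler(int sig);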
...