Re: Tonights Tradeoff - Background Execution Buffers

Subject: Re: Tonights Tradeoff - Background Execution Buffers
From: cr88192 (at) *nospam* gmail.com (BGB)
Newsgroups: comp.arch
Date: 04 Oct 2024, 18:28:49
Organization: A noiseless patient Spider
Message-ID: <vdp8kk$a94i$1@dont-email.me>
User-Agent: Mozilla Thunderbird
On 10/3/2024 11:04 PM, Robert Finch wrote:
> Today I am wondering how many predicate registers are enough. Scanning webpages reveals a variety. The Itanium has 64 predicate registers, but they are used for modulo loops and are rotated. Rotating registers are Itanium's method of register renaming, so it needs more visible registers. In a classic superscalar design with a RAT where registers are renamed, it seems like 64 would be far too many. Cray had eight vector mask registers. I think the RISC-V Hwacha has 16, if I read the diagram correctly.
> I cannot see the compiler making use of very many predicate registers simultaneously. Since they are not used simultaneously, and register renaming is in effect, there should not be a great need for many architectural predicate registers.
> Suppose one wants predicated logic in a loop, with the predicate being set outside of the loop. It may be desirable to have several blocks of logic in the loop predicated by different predicates. It is likely desirable to have more than one predicate then.
> -> Reserved four bits in the instruction for predicates (16 possible encodings). Do not want to waste bits, though. Using a 64-bit instruction.
 
I was getting along OK with a single predicate bit flag.
I had considered supporting an alternate predicate bit, but it didn't seem to gain enough to be worthwhile. The same went for possibly supporting 8 predicate registers with dedicated logic ops.
I had originally designed a predicate bit-stack, where operations would push/pop the bits in a similar way to the x87 register stack, but ended up not using it. A later idea would instead have used logic ops with effectively 3-bit register fields.
But in practice, I mostly ended up using GPRs for cases where conditional logic ops were needed, since both approaches would need roughly the same number of instructions (and there were already ways of moving values between the T bit and GPRs).
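To make the two alternatives concrete, here is a minimal C sketch. It models an x87-style predicate bit-stack held in a shift register, next to the GPR approach, where each compare writes 0/1 into an ordinary register and plain AND/OR combine them. This is not the actual design, whose details aren't given. The `pstack_*` and `cond_via_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical x87-style predicate stack: bit 0 is the top of stack. */
typedef struct {
    uint32_t bits;   /* stacked predicate bits */
    int depth;       /* number of valid entries */
} pstack;

static void pstack_push(pstack *s, int p) {
    assert(s->depth < 32);
    s->bits = (s->bits << 1) | (uint32_t)(p != 0);
    s->depth++;
}

static int pstack_pop(pstack *s) {
    assert(s->depth > 0);
    int p = (int)(s->bits & 1u);
    s->bits >>= 1;
    s->depth--;
    return p;
}

/* Combine the top two entries into one, like an x87 "op-and-pop". */
static void pstack_and(pstack *s) {
    int b = pstack_pop(s), a = pstack_pop(s);
    pstack_push(s, a & b);
}

static void pstack_or(pstack *s) {
    int b = pstack_pop(s), a = pstack_pop(s);
    pstack_push(s, a | b);
}

/* (a < b && c != d) || e == 0 via the predicate stack:
   three compares push, two ops combine, the top becomes the predicate. */
static int cond_via_pstack(int a, int b, int c, int d, int e) {
    pstack s = {0, 0};
    pstack_push(&s, a < b);
    pstack_push(&s, c != d);
    pstack_and(&s);
    pstack_push(&s, e == 0);
    pstack_or(&s);
    return pstack_pop(&s);
}

/* Same condition via GPRs: each compare yields 0/1 in a register,
   ordinary AND/OR combine them, then one move sets the predicate bit. */
static int cond_via_gprs(int a, int b, int c, int d, int e) {
    int r1 = a < b;
    int r2 = c != d;
    int r3 = e == 0;
    return (r1 & r2) | r3;
}
```

In this toy case both forms need three compares and two combining ops, which fits the observation that the dedicated predicate machinery didn't save much.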
Ironically, I have also been debating whether to glue predication onto my tweaked extension of RISC-V (not done yet; still debating it). With my recent jumbo-prefix extension, I had defined a few bits for this, but as-is this would mean that any conditional op would effectively need a 64-bit encoding. Granted, predicated branches typically only cover a few instructions.
BGBCC had limited predicated if/else branches to a single statement, with limits on the types and number of operators in the predicated expression (generally, past a certain number of operators it becomes more efficient to branch than to predicate). But with the scope of predication this limited, there is also little need for more than a single predicate bit.
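A minimal sketch of that kind of if-conversion filter, in C. Each single-statement arm is modeled as one expression tree. The `ExprNode` type, treating calls and divides as non-predicable, and the cutoff of 4 operators are all hypothetical; BGBCC's actual limits aren't stated here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical expression-tree node for one arm of an if/else. */
typedef enum { OP_LEAF, OP_ADD, OP_SUB, OP_AND, OP_OR, OP_CMP,
               OP_MUL, OP_DIV, OP_CALL } OpKind;

typedef struct ExprNode {
    OpKind op;
    const struct ExprNode *lhs, *rhs;   /* NULL for leaves */
} ExprNode;

/* Hypothetical cutoff: past this many operators, branching is assumed cheaper. */
#define MAX_PRED_OPS 4

/* Returns the operator count, or -1 if the tree contains an operator
   assumed unsuitable for predication (calls and divides here). */
static int count_pred_ops(const ExprNode *n) {
    if (n == NULL || n->op == OP_LEAF)
        return 0;
    if (n->op == OP_CALL || n->op == OP_DIV)
        return -1;
    int l = count_pred_ops(n->lhs), r = count_pred_ops(n->rhs);
    if (l < 0 || r < 0)
        return -1;
    return 1 + l + r;
}

/* Predicate the if/else only if both arms are small and predicable;
   else_arm may be NULL for a plain if. */
static int should_predicate(const ExprNode *then_arm, const ExprNode *else_arm) {
    assert(then_arm != NULL);
    int t = count_pred_ops(then_arm), e = count_pred_ops(else_arm);
    if (t < 0 || e < 0)
        return 0;                 /* non-predicable operator: branch */
    return t + e <= MAX_PRED_OPS;
}
```

The condition itself isn't counted, since it has to be evaluated either way; only the work that would be executed unconditionally under predication counts against the limit.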
In things like GL software rasterization, predication is useful for Z and alpha testing, which otherwise tend to eat a lot of cycles on unpredictable branches.
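As an illustration, here is a combined Z and alpha test over a span of pixels, written once with a branch and once if-converted. The 16-bit Z buffer, ARGB colors with alpha in the top byte, "less-than" depth test, and function names are all assumptions. On an ISA with predication, the compiler would set the predicate from the compare and issue predicated stores. The mask-and-select below is the portable C analog.

```c
#include <assert.h>
#include <stdint.h>

/* Branchy form: each pixel takes a data-dependent, hard-to-predict branch. */
static void shade_span_branchy(uint16_t *zbuf, uint32_t *cbuf, const uint16_t *z,
                               const uint32_t *color, int n, uint8_t alpha_ref) {
    for (int i = 0; i < n; i++) {
        uint8_t alpha = (uint8_t)(color[i] >> 24);
        if (z[i] < zbuf[i] && alpha >= alpha_ref) {
            zbuf[i] = z[i];
            cbuf[i] = color[i];
        }
    }
}

/* If-converted form: build an all-ones/all-zeros pass mask and select,
   so there is no branch on the test results. */
static void shade_span_select(uint16_t *zbuf, uint32_t *cbuf, const uint16_t *z,
                              const uint32_t *color, int n, uint8_t alpha_ref) {
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        uint8_t alpha = (uint8_t)(color[i] >> 24);
        uint32_t pass = -(uint32_t)((z[i] < zbuf[i]) & (alpha >= alpha_ref));
        zbuf[i] = (uint16_t)((z[i] & pass) | (zbuf[i] & ~pass));
        cbuf[i] = (color[i] & pass) | (cbuf[i] & ~pass);
    }
}
```

One difference from true predication: the select version always stores, writing the old value back on failure, whereas predicated stores would simply be suppressed.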
Though, ironically, I had recently disabled branch hit/miss modeling in my emulator, mostly to make it easier for the emulator to keep up with real time. Modeling branch hits/misses and hits/misses in the cache hierarchy is not ideal for emulator performance. (I could try to model/detect stale memory accesses arising from the weak memory model, but this would make things even slower than they already are; and at that point I may as well implement a full mockup of the memory subsystem in the emulator, performing memory accesses by shuffling cache lines around, ...)
As-is, the emulator spends more time in the memory subsystem and similar than it spends actually running instructions (but then again, this part is in turn limited by counting clock cycles and throttling execution to keep it from running faster than the FPGA version would, unless options are given to disable this).
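The throttling can be sketched as a pure function. It converts the modeled cycle count into guest time at the target clock, compares that with real elapsed time, and returns how long to sleep. The caller could then sleep for that long, e.g. with POSIX `nanosleep`. The function name and the exact scheme are assumptions, not the emulator's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Given the modeled cycle count and the real time elapsed since the
   emulator started, return how many nanoseconds to sleep so the guest
   does not run ahead of a core clocked at clock_hz. Returns 0 if the
   emulator is already behind real time. */
static uint64_t throttle_sleep_ns(uint64_t modeled_cycles, uint64_t elapsed_ns,
                                  uint64_t clock_hz) {
    assert(clock_hz > 0);
    /* Guest time in ns = cycles * 1e9 / clock_hz, split to avoid overflow. */
    uint64_t guest_ns = (modeled_cycles / clock_hz) * 1000000000ull
                      + (modeled_cycles % clock_hz) * 1000000000ull / clock_hz;
    return guest_ns > elapsed_ns ? guest_ns - elapsed_ns : 0;
}
```

For example, at 50 MHz each modeled cycle is 20 ns, so 50,000,000 cycles correspond to one second of guest time.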
If allowed to run at full speed with cache modeling disabled, the interpreter can manage around 200-250 MIPS or so (getting much faster than this would require a JIT, but that part has atrophied and doesn't currently work; I don't really need it when trying to model a CPU that runs at 50 MHz...).
I had noted before, when trying to run it on a RasPi, that my interpreter still seems to be somewhat faster than the one in DOSBox (which in this case is too slow to really run Doom effectively; though x86 is harder to emulate here, due to things like nearly every instruction potentially touching EFLAGS, etc).
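A standard way emulators cope with nearly every x86 ALU op writing EFLAGS is lazy flag evaluation. The emulator records the last operation and its operands, and computes individual flags only when an instruction actually reads them. Below is a minimal C sketch covering only 32-bit ADD/SUB and ZF/SF/CF. It is a generic illustration of the technique, not DOSBox's code.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { LAZY_ADD, LAZY_SUB } LazyOp;

/* Instead of updating EFLAGS after every ALU op, remember enough
   to recompute the flags on demand. */
typedef struct {
    LazyOp op;
    uint32_t src1, src2, result;
} LazyFlags;

static uint32_t emu_add(LazyFlags *f, uint32_t a, uint32_t b) {
    f->op = LAZY_ADD; f->src1 = a; f->src2 = b; f->result = a + b;
    return f->result;
}

static uint32_t emu_sub(LazyFlags *f, uint32_t a, uint32_t b) {
    f->op = LAZY_SUB; f->src1 = a; f->src2 = b; f->result = a - b;
    return f->result;
}

/* Flags are only materialized when something (e.g. a Jcc) reads them. */
static int flag_zf(const LazyFlags *f) { return f->result == 0; }
static int flag_sf(const LazyFlags *f) { return (int)((f->result >> 31) & 1u); }
static int flag_cf(const LazyFlags *f) {
    switch (f->op) {
    case LAZY_ADD: return f->result < f->src1;   /* unsigned carry out */
    case LAZY_SUB: return f->src1 < f->src2;     /* unsigned borrow */
    }
    assert(0 && "unknown lazy op");
    return 0;
}
```

The bookkeeping still happens on every ALU op. Instructions such as INC, which leaves CF unchanged, complicate the scheme further, since the old carry has to be preserved alongside the new operation.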
Faster emulation is also possible if one can leverage the underlying hardware's address translation, but this depends on a lot of OS-specific stuff (the OS isn't going to just give an application access to the underlying page tables or similar, ...). Still, a few emulators do go this route.
If doing an OS, one option here would be an API supporting "virtual nested page tables". The application could request that part of its virtual address range be mapped through a logical page table that it controls, with logical page faults delivered back to the program via "signal()" or similar (this would likely be handled independently of however the OS/target implements virtual memory at the hardware level). There may also need to be a way to tell the OS whenever the user-managed page table has been updated. The alternative of requiring a syscall for each PTE may be slower than simply letting user code update the table and then issuing one syscall that says "hey, the page table has been updated", letting the OS figure out what needs to be invalidated/updated/etc.
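The moving parts of that proposal can be mocked up in user space. Everything here is hypothetical and no existing OS API is implied: `VnRegion`, `vn_translate`, `vn_notify_updated`, the PTE format, and the use of a plain callback in place of real `signal()` delivery. The "OS side" caches translations, the application owns the table, and an explicit notify call drops the cache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VN_PAGE_SHIFT 12
#define VN_OFF_MASK   ((1u << VN_PAGE_SHIFT) - 1)
#define VN_PAGES      256
#define VN_PRESENT    1u

/* Hypothetical fault upcall, standing in for signal() delivery. */
typedef void (*vn_fault_handler)(uint32_t fault_vaddr, void *ctx);

typedef struct {
    uint32_t *table;             /* app-controlled PTEs: frame << 12 | VN_PRESENT */
    uint32_t cached[VN_PAGES];   /* "OS-side" cached copies of valid PTEs */
    uint8_t  cached_ok[VN_PAGES];
    vn_fault_handler on_fault;
    void *ctx;
} VnRegion;

/* Stands in for the single "page table has been updated" syscall:
   the mock OS drops its cached translations and re-reads them lazily. */
static void vn_notify_updated(VnRegion *r) {
    for (size_t i = 0; i < VN_PAGES; i++)
        r->cached_ok[i] = 0;
}

/* Translate an access; on a logical fault, upcall the handler once, then retry. */
static int vn_translate(VnRegion *r, uint32_t vaddr, uint32_t *out) {
    uint32_t vpn = vaddr >> VN_PAGE_SHIFT;
    assert(vpn < VN_PAGES);
    for (int attempt = 0; attempt < 2; attempt++) {
        if (!r->cached_ok[vpn]) {
            uint32_t pte = r->table[vpn];
            if (pte & VN_PRESENT) {
                r->cached[vpn] = pte;
                r->cached_ok[vpn] = 1;
            } else if (attempt == 0 && r->on_fault != NULL) {
                r->on_fault(vaddr, r->ctx);   /* app may fill in the PTE */
                continue;
            } else {
                return -1;                    /* unresolved logical fault */
            }
        }
        *out = (r->cached[vpn] & ~VN_OFF_MASK) | (vaddr & VN_OFF_MASK);
        return 0;
    }
    return -1;
}

/* Example handler: identity-map the faulting page, then notify. */
static void demo_identity_fault(uint32_t fault_vaddr, void *ctx) {
    VnRegion *r = ctx;
    uint32_t vpn = fault_vaddr >> VN_PAGE_SHIFT;
    r->table[vpn] = (vpn << VN_PAGE_SHIFT) | VN_PRESENT;
    vn_notify_updated(r);
}
```

If the application edits a PTE that has already been cached, without calling `vn_notify_updated`, the mock keeps serving the old translation. That is exactly why some update notification (or a syscall per PTE) would be needed.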
...

