On 2024-10-03 4:20 a.m., BGB wrote:
Possible workarounds:
On 10/1/2024 5:00 AM, Robert Finch wrote:
Yes, that would work. But I still think postfixes are a little easier to work with. One can assume no register fetches are needed for the postfix, so the last decoder slot does not need to mux registers. If there was a prefix, there could be an extra set of register ports required. Unless one gets into muxing the ports for only instructions that need them.
On 2024-09-29 10:19 p.m., BGB wrote:
>
On 9/29/2024 2:11 PM, MitchAlsup1 wrote:
One reason I prefer postfix immediates. They are much easier to work with. Interrupts do not cause issues. The instruction plus postfix can be faked to be treated as one giant instruction. The bits following the instruction are often already present on the cache line. It is just a matter of checking for a postfix when decoding the immediate constants.
On Sat, 28 Sep 2024 4:30:12 +0000, BGB wrote:
>
On 9/27/2024 7:43 PM, MitchAlsup1 wrote:
>
On Fri, 27 Sep 2024 23:53:22 +0000, BGB wrote:
>
>
One of the reasons reservation stations came into vogue.
>
Possibly, but is a CPU feature rather than a compiler feature...
A good compiler should be able to make use of 98% of the instruction
set.
Yes, but a reservation station is not part of the ISA proper...
>
>>>------------>>
Saw a video not too long ago where he was making code faster by undoing
a lot of loop unrolling, as the code was apparently spending more in I$
misses than it was gaining by being unrolled.
I noticed this in 1991 when we got Mc88120 simulator up and running.
GBOoO chips are <nearly> best served when there is the smallest number
of instructions.
>
Looking it up, seems the CPU in question (MIPS R4300) was:
16K L1 I$ cache;
8K L1 D$ cache;
No L2 cache (but could be supported off-die);
1-wide scalar, 32 or 64 bit
Non pipelined FPU and multiplier;
...
>
>
Oddly, a number of these older CPUs seem to have a larger I$ than D$, whereas IME the D$ seems to have the higher miss rate (so it is easier to justify making it bigger).
>
>>------------>
>
In contrast, a jumbo prefix by itself does not make sense; its meaning
depends on the thing that is being prefixed. Also the decoder will
decode a jumbo prefix and suffix instruction at the same time.
How many bits does one of these jumbo prefixes consume ?
The prefix itself is 32 bits.
In the context of XG3, it supplies 23 or 27 bits.
>
>
For RISC-V ops, they could supply 21 or 26 bits.
>
23+10 = 33 (XG3)
21+12 = 33 (RV op)
27+27+10 = 64 (XG3)
26+26+12 = 64 (RV op)
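>
As a rough illustration of the bit counts above, here is a minimal Python sketch of gluing prefix payloads onto a base immediate. The bit placement (prefix payloads as the high bits, the base instruction's field as the low bits), the sign extension, and the names `sign_extend` and `compose_immediate` are all assumptions made for the example; the actual XG3 / RV-op encodings are not spelled out in this thread.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def compose_immediate(prefix_payloads, base_imm, base_bits, payload_bits):
    """Concatenate prefix payloads above the base immediate field.

    Assumed (hypothetical) layout: the first prefix holds the most
    significant bits, the base instruction's own field holds the least
    significant bits. Returns (signed value, total width in bits).
    """
    raw = 0
    for payload in prefix_payloads:
        raw = (raw << payload_bits) | (payload & ((1 << payload_bits) - 1))
    raw = (raw << base_bits) | (base_imm & ((1 << base_bits) - 1))
    total_bits = payload_bits * len(prefix_payloads) + base_bits
    return sign_extend(raw, total_bits), total_bits


# One prefix: 23 + 10 = 33 bits (XG3 figures), 21 + 12 = 33 bits (RV-op figures).
xg3_value, xg3_width = compose_immediate([0x7FFFFF], 0x3FF, base_bits=10, payload_bits=23)
rv_value, rv_width = compose_immediate([1], 0, base_bits=12, payload_bits=21)

# Two prefixes: 27 + 27 + 10 = 64 bits (XG3 figures).
wide_value, wide_width = compose_immediate([0, 1], 5, base_bits=10, payload_bits=27)
```

With all 33 bits set, the first case comes out as -1 under the sign-extension assumption; a zero-extending scheme would simply skip the `sign_extend` call.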
>
J27 could synthesize an immediate for non-immediate ops:
27+6 = 33 (XG3)
27+5 = 32 (RV)
>
>
For BJX2, the prefixes supply 24 bits (can be stretched to 27 bits in XG2).
24+ 9/10=33 (Base)
24+24+16=64 (Base)
27+27+10=64 (XG2)
>
>
>
But, yeah, perhaps unsurprisingly, the RISC-V people are not so optimistic about the idea of jumbo prefixes...
>
>
Also, apparently "yeah, here is a prefix whose primary purpose is just to make the immediate field bigger for the following instruction" is not as obvious or intuitive an idea as I had thought.
>
>
Well, that and people obsessing over what happens if an interrupt somehow occurs "between" the prefix and the prefixed instruction.
>
Q+ had postfixes that could override a register spec. as well as supply additional constant bits. If an interrupt occurs between the instruction and the postfix, the postfix can be treated as a NOP at the return point.
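>
The postfix behaviour can be sketched with a toy decoder. This is only a model of the idea, not the Q+ encoding: the `("op", imm)` / `("postfix", bits)` word format, the 8-bit `BASE_BITS`, and the `step` function are hypothetical names invented for the example.

```python
BASE_BITS = 8  # hypothetical width of the base instruction's immediate field


def step(program, pc):
    """Decode one unit starting at word index `pc`.

    Returns (next_pc, immediate). An immediate of None means the word
    executed as a NOP.
    """
    kind, value = program[pc]
    if kind == "postfix":
        # Execution resumed directly on a postfix (e.g. an interrupt return
        # address landed here): the owning instruction already consumed it,
        # so it is simply skipped.
        return pc + 1, None
    imm = value
    length = 1
    # Peek at the following word; a postfix extends the immediate upward.
    if pc + 1 < len(program) and program[pc + 1][0] == "postfix":
        imm = (program[pc + 1][1] << BASE_BITS) | value
        length = 2
    return pc + length, imm


program = [("op", 5), ("postfix", 3), ("op", 7)]
normal = step(program, 0)    # instruction fused with its postfix
resumed = step(program, 1)   # resuming on the postfix itself: NOP
```

Either way execution ends up at word 2, which is why the return point falling on the postfix does no harm in this model.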
>
Interrupts are not an issue for prefixes either, if one assumes that the prefix and the following instruction are always decoded at the same time (forming a 64-bit instruction), which also makes them faster.
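>
A toy model of that assumption: if the prefix and the prefixed op are always consumed as one unit, an interrupt request that arrives mid-unit is only taken at the next unit boundary, so the saved PC never points at the op half. The word format, `BASE_BITS`, and the `decode` / `run` helpers are made up for the sketch and do not correspond to any real BJX2 or RISC-V encoding.

```python
BASE_BITS = 8  # hypothetical width of the base instruction's immediate field


def decode(program, pc):
    """Decode one unit; a prefix and the op after it form a single unit."""
    kind, value = program[pc]
    if kind == "prefix":
        _, op_imm = program[pc + 1]
        return pc + 2, (value << BASE_BITS) | op_imm
    return pc + 1, value


def run(program, request_addrs):
    """Execute `program`, sampling interrupts only at unit boundaries.

    `request_addrs` is the set of word addresses during whose decode an
    interrupt request arrives. Returns (immediates seen, PCs at which
    interrupts were taken).
    """
    pc, pending = 0, False
    results, taken_at = [], []
    while pc < len(program):
        if pending:
            # Unit boundary: the interrupt is taken with this PC saved.
            taken_at.append(pc)
            pending = False
        next_pc, imm = decode(program, pc)
        # A request arriving on any word of the unit is held until it completes.
        if any(addr in request_addrs for addr in range(pc, next_pc)):
            pending = True
        results.append(imm)
        pc = next_pc
    return results, taken_at


program = [("prefix", 2), ("op", 1), ("op", 9)]
results, taken_at = run(program, request_addrs={1})
```

Here the request lands on word 1 (the prefixed op), but the interrupt is taken with PC = 2, after the fused pair has finished.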
>