Subject: Re: Instruction Tracing
From: cr88192 (at) *nospam* gmail.com (BGB)
Newsgroups: comp.arch
Date: 11 Aug 2024, 08:08:34
Organization: A noiseless patient Spider
Message-ID: <v99o1m$268ha$1@dont-email.me>
References: 1 2 3 4
User-Agent: Mozilla Thunderbird
On 8/10/2024 6:48 PM, MitchAlsup1 wrote:
On Sat, 10 Aug 2024 21:34:47 +0000, BGB wrote:
>
My rough ranking of instruction probabilities (descending probability,
*):
Load/Store (Constant Displacement, ~30%);
Branch (~ 14% of ops);
ALU, ADD/SUB/AND/OR (~ 13%);
Load/Store (Register Indexed, ~10%);
Compare and Test (~ 6%);
Integer Shift (~ 4%);
Register Move (~ 3%);
Sign/Zero Extension (~ 3%);
ALU, XOR (~ 2%);
Multiply (~ 2%);
...
>
*: Crude estimate based on categorizing the dynamic execution
probabilities (which are per-instruction rather than by category).
>
Meanwhile, DIV and friends are generally closer to 0.05% or so...
You can leave them out and hardly anyone will notice.
The literature from the CRAY-1 era indicated big number crunching
applications use FDIV about ¼ as often as FMUL; IDIV not so much.
Yeah, it is mostly integer divide and modulo that is rare...
For floating-point, Newton-Raphson (N-R) converges fairly quickly, so division still works acceptably if one has a moderately fast FMUL.
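As a rough illustration of the N-R point, here is a minimal Python sketch of division built from reciprocal refinement, where the loop uses only multiply and subtract. The function name `nr_divide`, the linear seed, and the iteration count are assumptions picked for illustration; this is not how any particular FPU does it.

```python
import math

def nr_divide(a, b, iterations=4):
    """Approximate a / b via Newton-Raphson on the reciprocal of b.

    The refinement loop uses only multiplies and a subtract.
    """
    if b == 0:
        raise ZeroDivisionError("division by zero")
    # Split |b| into mantissa m in [0.5, 1) and exponent e: |b| = m * 2**e.
    m, e = math.frexp(abs(b))
    # Linear seed for 1/m on [0.5, 1); a textbook choice (assumption).
    x = 48.0 / 17.0 - (32.0 / 17.0) * m
    for _ in range(iterations):
        # Each step roughly squares the relative error.
        x = x * (2.0 - m * x)
    # Undo the scaling and restore the sign of b.
    recip = math.ldexp(x, -e)
    if b < 0:
        recip = -recip
    return a * recip
```

With this seed the relative error starts at no more than 1/17 and is roughly squared per step, so a handful of iterations is enough for double precision; a better seed (for example a small lookup table) would cut the count further.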
>
For the most part, something like RISC-V makes sense, except that
omitting Indexed Load/Store is basically akin to shooting oneself in the
foot (and does result in a significant increase in the amount of Shift
and ADD instructions used).
>
>
With RISC-V, one may see ~ 25% Load/Store followed by ~ 20% ADD and 15%
Shift, ...
If you add the number of indexed LD/STs in your list above with shifts,
you can find all those missing RISC-V shift instructions.
Yeah, basically...
Some of this is because ADD and Shift end up over-represented by their
need to be used in compound operations (indexed load/store and sign/zero
extension).
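A back-of-the-envelope check of this, as a Python sketch. It takes the rough percentages from the list above and re-counts them under two deliberately pessimistic assumptions that are mine, not measured: every indexed load/store expands to shift + add + load/store, and every sign/zero extension expands to a pair of shifts. The bucket names are made up for the example.

```python
# Approximate dynamic mix from the list above, in ops per 100 executed.
mix = {
    "load_store_disp": 30,
    "branch": 14,
    "alu_add_sub_and_or": 13,
    "load_store_indexed": 10,
    "compare_test": 6,
    "shift": 4,
    "move": 3,
    "extend": 3,
    "xor": 2,
    "multiply": 2,
}
mix["other"] = 100 - sum(mix.values())  # everything not listed

def expand_for_base_riscv(mix):
    """Re-count ops if indexed load/store and extension are expanded.

    Assumptions (pessimistic, for illustration only):
      - every indexed load/store becomes SHIFT + ADD + load/store
      - every sign/zero extension becomes a pair of shifts
    Returns the new mix as percentages of the larger op count.
    """
    out = dict(mix)
    indexed = out.pop("load_store_indexed")
    extend = out.pop("extend")
    out["load_store_disp"] += indexed
    out["alu_add_sub_and_or"] += indexed
    out["shift"] += indexed + 2 * extend
    total = sum(out.values())
    return {k: 100.0 * v / total for k, v in out.items()}
```

Under these assumptions, shift goes from about 4% to about 16% and the ADD-class ALU bucket from about 13% to about 19%, which is in the same ballpark as the ~15% and ~20% figures quoted for RISC-V. Load/store comes out nearer 33% than 25%, so this is only a crude sanity check and not a model of any real trace.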
RISC-V 16-bit smash:
    SLLI Rt,Rs,48
    SRAI Rt,Rt,48
My 66000:
    SLA Rt,Rs,<16:0>
Where RISC-V uses the shifter at 48 bits twice, My 66000 only uses
the masking part of the shifter.
In my case, "EXTS.W Rm, Rn" exists, but this part was carried over fairly directly from SH (SuperH).
Something like a bitfield extraction instruction could make sense, but would need more consideration.
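To make the equivalence concrete, here is a small Python sketch that models a 64-bit register and checks that the two-shift sequence gives the same result as a single 16-bit sign extension. The helper names are hypothetical, and `extract_signed_field` is only a guess at what a generic signed bitfield extract might compute (reading the `<16:0>` operand in the quoted example as width 16, offset 0 is an assumption).

```python
MASK64 = (1 << 64) - 1

def to_signed64(x):
    """Interpret a 64-bit pattern as a signed integer."""
    x &= MASK64
    return x - (1 << 64) if x & (1 << 63) else x

def shl64(x, n):
    """Logical shift left in a 64-bit register."""
    return (x << n) & MASK64

def sra64(x, n):
    """Arithmetic shift right in a 64-bit register."""
    return (to_signed64(x) >> n) & MASK64

def sext16_shift_pair(x):
    """Sign-extend the low 16 bits using two shifts (left 48, arithmetic right 48)."""
    return sra64(shl64(x, 48), 48)

def sext16_direct(x):
    """Sign-extend the low 16 bits in one step, as a single extension op would."""
    low = x & 0xFFFF
    return (low - 0x10000 if low & 0x8000 else low) & MASK64

def extract_signed_field(x, width, offset):
    """Hypothetical signed bitfield extract: 'width' bits starting at bit 'offset'.

    Requires 1 <= width and width + offset <= 64.
    """
    return sra64(shl64(x, 64 - width - offset), 64 - width)
```

With width 16 and offset 0, `extract_signed_field` reduces to the same shift-left-48, shift-right-48 pair, which is why a general extract instruction could subsume the dedicated extension ops.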
<snip>
>
Meanwhile, I am once again reminded of an annoying edge case bug in my
Verilog implementation:
If a TLB Miss happens on an inter-ISA branch, it can leave the CPU core
in an inconsistent state.
Woops...
Yeah.
Kinda need to fix this...
Though, to be fair, branches where the source and destination are running a different instruction set are sort of a rare occurrence...
It is in the category of "fix or add to the list of seemingly stupid edge-case restrictions".
I decided to try fixing it by also capturing the mode bits from the corresponding pipeline stage (as is already done for some of the dynamic flag bits).
Will need a bit more testing to try to figure out if it worked.
Not fully confirmed, but it appears that the attempted fix may have broken the ability to run code in RISC-V mode. Trying again to see whether RISC-V mode works with virtual memory disabled.
OK, RV mode still appears to work without virtual memory; capturing these bits likely created more CPU state instability on TLB Miss interrupts than it fixed.
But, yeah, this part of the CPU core is still fairly fragile; poking at stuff related to the interrupt mechanism is prone to break stuff.
Next step is to disable this state capture and see whether things work again with virtual memory, then continue from there at the speed of booting things up in the Verilog simulations (not exactly fast). After that, maybe try to figure out why it is going wrong.
...