On 9/18/2024 1:42 PM, MitchAlsup1 wrote:
Hacks together such a tool, and the results look curious...
On Wed, 18 Sep 2024 17:55:34 +0000, BGB wrote:
Likely for a custom CPU to be taken all that seriously at this point, one is going to need binary compatibility with at least one semi-popular ISA.
On 9/18/2024 9:27 AM, MitchAlsup1 wrote:
On Wed, 18 Sep 2024 4:00:43 +0000, BGB wrote:
On 9/17/2024 6:04 PM, MitchAlsup1 wrote:
Still limited to 32-bit displacement from IP.
How would you perform the following call::
current IP = 0x0000000000001234
target IP = 0x7FFFFFFF00001234
This is a single (2-word) instruction in my ISA, assuming the GOT is 32-bit displaceable and has 64-bit entries.
Granted, but in plain RISC-V, there is no real better option.
If one wants to generate 64-bit displacement, and doesn't want to load a
constant from memory:
LUI   X6, Disp20Hi      //20 bits
ADDI  X6, X6, Disp12Hi  //12 bits
AUIPC X7, Disp20Lo
ADDI  X7, X7, Disp12Lo
SLLI  X6, X6, 32
ADD   X7, X7, X6
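For what it's worth, the immediate-splitting arithmetic behind that six-instruction sequence can be sanity-checked. A rough Python sketch (helper names are made up; this just models the sign-extension compensation LUI/AUIPC+ADDI pairs need, and then emulates the sequence):

```python
MASK64 = (1 << 64) - 1

def sext(v, bits):
    # Sign-extend the low `bits` of v.
    m = 1 << (bits - 1)
    v &= (1 << bits) - 1
    return (v ^ m) - m

def split64(disp):
    """Split a 64-bit displacement into the four immediates used by the
    LUI/ADDI + AUIPC/ADDI + SLLI + ADD sequence. Each 20-bit immediate
    is biased so that the following sign-extending ADDI lands on target."""
    lo12 = disp & 0xFFF
    lo20 = ((disp - sext(lo12, 12)) >> 12) & 0xFFFFF
    lo32 = (sext(lo20, 20) << 12) + sext(lo12, 12)  # what AUIPC+ADDI add to PC
    hi = (disp - lo32) >> 32                        # exact: disp-lo32 is a multiple of 2^32
    hi12 = hi & 0xFFF
    hi20 = ((hi - sext(hi12, 12)) >> 12) & 0xFFFFF
    return hi20, hi12, lo20, lo12

def materialize(pc, hi20, hi12, lo20, lo12):
    """Emulate the six-instruction sequence, 64-bit wraparound included."""
    x6 = (sext(hi20, 20) << 12) & MASK64         # LUI   X6, Disp20Hi
    x6 = (x6 + sext(hi12, 12)) & MASK64          # ADDI  X6, X6, Disp12Hi
    x7 = (pc + (sext(lo20, 20) << 12)) & MASK64  # AUIPC X7, Disp20Lo
    x7 = (x7 + sext(lo12, 12)) & MASK64          # ADDI  X7, X7, Disp12Lo
    x6 = (x6 << 32) & MASK64                     # SLLI  X6, X6, 32
    return (x7 + x6) & MASK64                    # ADD   X7, X7, X6
```

Running Mitch's example through it (PC=0x1234, target=0x7FFFFFFF00001234) does land on the target, so the sequence works; the point stands that it takes six instructions to do it.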
How very much simpler is::
MEM Rd,[IP,Ri<<s,DISP64]
1 instruction, 3 words, 1 decode cycle, no forwarding, shorter latency.
It is simpler, but N/E in RV64G...
This is the whole issue of the idea:
Remain backwards compatible with RV64G / RV64GC (in a binary sense).
So, you like sailing with an albatross tied around your neck:: Check.
And, the main options at this point are:
RISC-V, which is just kinda meh;
ARMv7 / ARMv8, which are not free (and are nowhere near their patents expiring);
x86-64, just no. Doable, at least, as x86-64 and SSE2 should be in the clear; but making it perform well seems harder.
Well, or MIPS64 or SPARC64 or similar, but these are arguably worse options than RISC-V.
The idea is that the mode switching can allow swapping out the compressed instructions to make room for other stuff, while also leaving the compressed instructions in existence for compatibility with binaries built assuming them, *and* trying to allow extending it in a way such that performance can be less poor...
I should remind you that if you eliminate the compressed parts of
RISC-V you can fit the entire My 66000 ISA in the space remaining.
All the constants, all transcendentals, all the far-control transfers,
the efficient context switching, overhead-free world switching, ...
---------
And, it is less drastic than gluing together two unrelated ISAs using inter-ISA branches (say, the current situation of trying to mix RISC-V code with XG2 via magic function pointers).
But, yeah, if you want to design a version of your ISA that can also co-execute with RISC-V, not like I have any reason to complain.
I tried adding this stuff experimentally with BGBCC in the past, in both of my ISA efforts, but seemingly my attempts didn't use them all that often (as opposed to [Rb+Disp] and [Rb+Ri*FixSc], which are used extensively).
Which is sort of the whole reason I am considering hacking around it
with an alternate encoding scheme.
Just put in real constants.
New encoding scheme can in theory do:
LEA X7, PC, Disp64
In a single 96-bit instruction.
Where is the indexing register?
Generally the use of a displacement and index register are mutually
exclusive (and, cases that can make use of Disp AND Index are much less
common than Disp OR Index).
COMMON /alpha/ a(100,100), b(300,300)

...

x = a(i,j)*b(j,i)

I see large displacements with indexing all the time from ASM out of Brian's compiler.
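To make the point concrete: with the usual column-major Fortran layout and 8-byte reals, every access to `b` in that expression carries both a scaled index and a large static displacement (the offset of `b` past `a` within the COMMON block). A rough Python sketch of the address arithmetic (the base address and the assumption of 8-byte elements are mine):

```python
ELEM = 8  # assume 8-byte reals

def addr_a(base, i, j):
    # a(100,100): column-major, so a(i,j) is element (j-1)*100 + (i-1)
    return base + ((j - 1) * 100 + (i - 1)) * ELEM

def addr_b(base, j, i):
    # b(300,300) sits 100*100*8 = 80000 bytes past a in COMMON /alpha/,
    # so the access wants [base + index*8 + big_disp] in one go.
    return base + 100 * 100 * ELEM + ((i - 1) * 300 + (j - 1)) * ELEM
```

The `((i-1)*300 + (j-1))` part folds into a scaled index register, while the 80000-byte COMMON offset is a displacement that does not fit in 12 bits, which is exactly the Disp-AND-Index case.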
Arguably, the main relevant cases would have been for stack-arrays or arrays inside structs.
But, if such an array is referenced multiple times in a given basic block, it would likely still be more efficient to load the address of the array into a register.
Though, if one were to go simply on usage frequency, likely auto-increment would be slightly ahead.
Say (roughly from memory):
[Rb+Disp]        // ~60% (includes PC and GBR)
[Rb+Ri*FixSc]    // ~30% (eg: "ptr[i]")
[Rb]+            // ~ 6% (eg: "*ptr++")
[Rb+Ri*Sc+Disp]  // ~ 4% (eg: "obj->arr[i]")
Well, unless someone can find a table that shows significantly different stats. Offhand, I am not easily finding such a table to compare with (preferably from a relatively mature target which has the relevant modes).
Can note that "*ptr++" seems most common for auto-increment, whereas "*ptr--", "*--ptr", and "*++ptr" are rarer.
Seems like no one has put tables online for the relative usage frequencies of the various x86-64 and ARM64 addressing modes...
Might be useful to have this data for "relatively mature" architectures.
Would be a pain to write an x86-64 disassembler mostly just to stat up the ModR/M+SIB sequences. Does raise the question of whether there is a semi-reliable way to stat this without needing to write a full disassembler.
One simple option would be to assume an instruction looks like:
[Prefix Bytes]
[REX byte]
OP_Byte | 0F+OP_Byte
Mod/RM + SIB + ...
And then use a heuristic to try to guess how to interpret the instruction stream based on "looks better" (more likely to be aligned with the instruction stream vs random unaligned garbage).
Though, such a "looks good" heuristic could itself risk skewing the results.
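The ModR/M-level part of such a tool is at least mechanical, whatever heuristic finds the instruction boundaries. A rough Python sketch of classifying a 64-bit-mode ModR/M byte (plus optional SIB) into addressing-mode buckets, per the usual mod/rm encoding rules; the bucket names are my own:

```python
def classify_modrm(modrm, sib=None):
    """Bucket a 64-bit-mode ModR/M (+SIB) pair by addressing mode."""
    mod = (modrm >> 6) & 3
    rm = modrm & 7
    if mod == 3:
        return "reg"                   # register-direct, no memory operand
    if rm == 4:                        # rm=100b means a SIB byte follows
        if sib is None:
            return "sib?"              # caller didn't supply the SIB byte
        has_index = ((sib >> 3) & 7) != 4   # index=100b encodes "no index"
        if mod == 0 and (sib & 7) == 5:     # base=101b with mod=00: disp32, no base
            return "idx+disp32" if has_index else "disp32"
        if mod == 0:
            return "base+idx" if has_index else "base"
        return "base+idx+disp" if has_index else "base+disp"
    if mod == 0 and rm == 5:
        return "rip+disp32"            # RIP-relative in 64-bit mode
    if mod == 0:
        return "base"                  # plain [reg]
    return "base+disp"                 # disp8 (mod=01) or disp32 (mod=10)
```

Fed by a "looks good" boundary guesser, a histogram over these buckets would give roughly the table I was after, with the caveat already noted that the guesser itself could skew things.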
OK.

I may still consider defining an encoding for this, but not yet. It is
in a similar boat as auto-increment. Both add resource cost with
relatively little benefit in terms of overall performance.
Auto-increment because if one has superscalar, the increment can usually
be co-executed. And, full [Rb+Ri*Sc+Disp], because it is just too
infrequent to really justify the extra cost of a 3-way adder even if
limited mostly to the low-order bits...
Myopathy--look it up.
Not sure how that is related (a medical condition involving muscle defects...).
Can also note that a worthwhile design goal is to not add significant cost over what would be needed for a plain RV64GC implementation. But, one could define a [Rb+Ri*Sc+Disp] encoding or similar, if it would likely be beneficial enough to justify its existence.
As-is, I am trying to come up with something that could potentially be cheaper (both in resource cost and implementation complexity) than my existing BJX2 core (and that could maybe also be made to run Linux without significant porting effort), while still allowing for better performance than RISC-V by itself.
Any fundamentally new features added would need to be "low cost" in this scenario.