Subject: Re: "Mini" tags to reduce the number of op codes
From: cr88192 (at) *nospam* gmail.com (BGB)
Newsgroups: comp.arch
Date: 11 Apr 2024, 19:35:41
Organization: A noiseless patient Spider
Message-ID : <uv9ahu$1r74h$1@dont-email.me>
User-Agent : Mozilla Thunderbird
On 4/11/2024 6:13 AM, Michael S wrote:
On Wed, 10 Apr 2024 23:30:02 +0000
mitchalsup@aol.com (MitchAlsup1) wrote:
Scott Lurndal wrote:
>
mitchalsup@aol.com (MitchAlsup1) writes:
BGB wrote:
>
>
In My 66000 case, the constant is the word following the
instruction. Easy to find, easy to access, no register pollution,
no DCache pollution.
>
It does occupy some icache space, however; have you boosted the
icache size to compensate?
>
The space occupied in the ICache is freed up from being in the DCache
so the overall hit rate goes up !! At typical sizes, ICache miss rate
is about ¼ the miss rate of DCache.
>
Besides:: if you had to LD the constant from memory, you use a LD
instruction and 1 or 2 words in DCache, while consuming a GPR. So,
overall, it takes fewer cycles, fewer GPRs, and fewer instructions.
>
Alternatively:: if you paste constants together (LUI, AUIPC) you have
no direct route to either 64-bit constants or 64-bit address spaces.
>
It looks to be a win-win !!
Win-win under constraints of Load-Store Arch. Otherwise, it depends.
FWIW:
The LDSH / SHORI mechanism does provide a way to get 64-bit constants, and needs less encoding space than the LUI route.
MOV Imm16, Rn
SHORI Imm16, Rn
SHORI Imm16, Rn
SHORI Imm16, Rn
Granted, if each is a 1-cycle instruction, this still takes 4 clock cycles.
An encoding that can MOV a 64-bit constant in 96 bits (12 bytes) and 1 cycle is preferable.
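As a rough illustration of the 4-instruction sequence above, here is a small Python model. It assumes SHORI computes Rn = (Rn << 16) | Imm16 and that each of these instructions is a 32-bit word; both are assumptions made for this sketch, and the helper names are made up.

```python
MASK64 = (1 << 64) - 1

# Assumed for this sketch: each MOV/SHORI is one 32-bit instruction word.
BYTES_PER_INSN = 4
SEQUENCE_BYTES = 4 * BYTES_PER_INSN   # MOV + 3x SHORI
SINGLE_OP_BYTES = 96 // 8             # the 96-bit encoding mentioned above

def split_imm16(value):
    """Return the four 16-bit chunks of a 64-bit value, most significant first."""
    value &= MASK64
    return [(value >> shift) & 0xFFFF for shift in (48, 32, 16, 0)]

def mov_imm16(imm16):
    # Assumed zero-extending here; after three SHORI steps the extension
    # bits are shifted out of the register, so the choice does not matter.
    return imm16 & 0xFFFF

def shori(rn, imm16):
    # Assumed semantics: Rn = (Rn << 16) | Imm16, truncated to 64 bits.
    return ((rn << 16) | (imm16 & 0xFFFF)) & MASK64

def load_const64(value):
    """Model MOV Imm16, Rn followed by three SHORI Imm16, Rn."""
    chunks = split_imm16(value)
    rn = mov_imm16(chunks[0])
    for imm in chunks[1:]:
        rn = shori(rn, imm)
    return rn

const = 0x0123456789ABCDEF
assert load_const64(const) == const
```

Under those assumptions the sequence costs 16 bytes and 4 issue slots, against 12 bytes and 1 slot for the single 96-bit form.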
In misc news:
Some compiler fiddling has now dropped the ".text" overhead (vs RV64G) from 10% to 5%.
This mostly came from adding dependency tracking logic to ASM code, rather than continuing to give ASM code a free pass (albeit in a form where the ASM needs to use ".global" and ".extern" statements for things to work correctly).
This in turn allowed it to effectively cull some parts of the dynamic typesystem runtime and a bunch of the Binary128 support code (shaving roughly 14K off of the Doom build).
This does have a non-zero impact on the code (mostly in the form of needing to add ".global" and ".extern" lines to the ASM code in some cases where they were absent).
Looks like a fair chunk of the dynamic types runtime is still present though, which appears to be culled in the GCC build (since GCC doesn't use the dynamic typesystem at all). Theoretically, Doom should not need it, as Doom is entirely "plain old C".
Main part that ended up culled with this change was seemingly most of the code for ex-nihilo objects and similar (which does not seem to be reachable from any of the Doom code).
There is a printf extension for printing variant types, but this is still present in the RV64G build (this would mostly include code needed for the "toString" operation). I guess, one could debate whether printf actually needs support for variant types (as can be noted, most normal C code will not use it).
Though, I guess one option could be to modify it to call toString via a function pointer which is only set if other parts of the dynamic typesystem are initialized (could potentially save several kB off the size of the binary, it looks like). Might break stuff though if one tries to printf a variant but had not used any types much beyond fixnum and flonum, which would not have triggered the typesystem to initialize itself.
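The function-pointer idea could look something like the following sketch, written in Python for brevity rather than the C the runtime would actually use. All names here (to_string_hook, typesystem_init, format_variant) are hypothetical, and the fixnum/flonum fallback is one possible way to sidestep the breakage case noted above, not something the runtime is known to do.

```python
# Hypothetical hook: stays None unless the dynamic typesystem initializes,
# so printf holds no direct reference to the full toString code.
to_string_hook = None

def full_to_string(value):
    # Stand-in for the full toString implementation.
    return f"<{type(value).__name__}:{value!r}>"

def typesystem_init():
    """Called only when the dynamic typesystem is actually used."""
    global to_string_hook
    to_string_hook = full_to_string

def format_variant(value):
    """What a printf variant conversion might do with the hook."""
    if to_string_hook is not None:
        return to_string_hook(value)
    # Fallback: fixnum/flonum-like values can be printed without the
    # typesystem; anything else gets a placeholder.
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return repr(value)
    return "<variant?>"
```

With this shape, a program that never initializes the typesystem still prints plain numbers sensibly, and the full toString path only becomes reachable once typesystem_init() has run.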
Probably doesn't matter too much, as this code is not likely a factor in the delta between the ISAs.
Note that if the size of Doom's ".text" section dropped by another 15K, it would reach parity with the RV64G build (which was around 290K in the relevant build ATM; goal being to keep the code fairly close to parity in this case, with the differences mostly allowed for ISA specific stuff).
Though, this is ignoring that roughly 11K of this delta are Jumbo prefixes (so the delta in instruction count is now roughly 1.3% at the moment); and RV64G has an additional 24K in its ".rodata" section (beyond what could be accounted for in string literals and similar).
So, in terms of text+rodata (+strtab *), my stuff is smaller at the moment.
*: Where GCC rolls its string literals into '.rodata', vs BGBCC having a dedicated section for string literals.
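For reference, the size figures above can be put through some quick arithmetic. This is only a back-of-the-envelope check using the rounded numbers from this post. The variable names are made up, and reading the "1.3%" as the non-Jumbo byte delta over the larger ".text" size is an assumption.

```python
# Rounded figures from the post, in KB.
rv64g_text_kb = 290          # RV64G ".text" for the Doom build
text_delta_kb = 15           # extra ".text" in the BGBCC build
jumbo_kb = 11                # part of that delta that is Jumbo prefixes
rv64g_extra_rodata_kb = 24   # extra ".rodata" in the RV64G build

bgbcc_text_kb = rv64g_text_kb + text_delta_kb

# ".text" overhead relative to RV64G (the "5%" figure).
text_overhead = text_delta_kb / rv64g_text_kb

# Delta left once Jumbo prefixes are not counted as separate instructions,
# using bytes as a proxy for instruction count.
non_jumbo_delta_kb = text_delta_kb - jumbo_kb
non_jumbo_overhead = non_jumbo_delta_kb / bgbcc_text_kb

# text+rodata: RV64G's extra ".rodata" outweighs the ".text" delta.
combined_advantage_kb = rv64g_extra_rodata_kb - text_delta_kb

print(f"text overhead:      {text_overhead:.1%}")
print(f"non-Jumbo overhead: {non_jumbo_overhead:.1%}")
print(f"text+rodata margin: {combined_advantage_kb} KB")
```

So, on these rounded numbers, the ".text" gap is about 5%, most of it is Jumbo prefixes, and the combined text+rodata total comes out several KB in favor of the BGBCC build.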
...