Re: "Mini" tags to reduce the number of op codes

Subject: Re: "Mini" tags to reduce the number of op codes
From: bohannonindustriesllc (at) *nospam* gmail.com (BGB-Alt)
Newsgroups: comp.arch
Date: 10 Apr 2024, 22:53:32
Organization: A noiseless patient Spider
Message-ID: <uv71os$17d11$1@dont-email.me>
User-Agent: Mozilla Thunderbird
On 4/10/2024 1:57 PM, Chris M. Thomasson wrote:
On 4/10/2024 12:41 AM, BGB wrote:
On 4/10/2024 12:01 AM, Chris M. Thomasson wrote:
On 4/9/2024 3:47 PM, BGB-Alt wrote:
On 4/9/2024 4:05 PM, MitchAlsup1 wrote:
BGB wrote:
>
On 4/9/2024 1:24 PM, Thomas Koenig wrote:
I wrote:
>
MitchAlsup1 <mitchalsup@aol.com> schrieb:
Thomas Koenig wrote:
>
Maybe one more thing: In order to justify the more complex encoding,
I was going for 64 registers, and that didn't work out too well
(missing bits).
>
Having learned about M-Core in the meantime, pure 32-register,
21-bit instruction ISA might actually work better.
>
>
For 32-bit instructions at least, 64 GPRs can work out OK.
>
Though, the gain of 64 over 32 seems to be fairly small for most "typical" code, mostly bringing a benefit if one is spending a lot of CPU time in functions that have large numbers of local variables all being used at the same time.
>
>
Seemingly:
16/32/48 bit instructions, with 32 GPRs, seems likely optimal for code density;
32/64/96 bit instructions, with 64 GPRs, seems likely optimal for performance.
>
Where, 16 GPRs isn't really enough (lots of register spills), and 128 GPRs is wasteful (would likely need lots of monster functions with 250+ local variables to make effective use of this, *, which probably isn't going to happen).
>
16 GPRs would be "almost" enough if IP, SP, FP, TLS, GOT were not part of GPRs AND you have good access to constants.
>
>
On the main ISAs I had tried to generate code for, 16 GPRs was kind of a pain as it resulted in fairly high spill rates.
>
Though, it would probably be less bad if the compiler was able to use all of the registers at the same time without stepping on itself (such as dealing with register allocation involving scratch registers while also not conflicting with the use of function arguments, ...).
>
>
My code generators had typically only used callee-save registers for variables in basic blocks which ended in a function call (in my compiler design, both function calls and branches terminate the current basic block).
>
On SH, the main way of getting constants (larger than 8 bits) was via PC-relative memory loads, which kinda sucked.
>
>
This is slightly less bad on x86-64, since one can use memory operands with most instructions, and the CPU tends to deal fairly well with code that has lots of spill-and-fill. It also helps that instructions have access to 32-bit immediate values.
>
>
*: Where, it appears it is most efficient (for non-leaf functions) if the number of local variables is roughly twice that of the number of CPU registers. If more local variables than this, then spill/fill rate goes up significantly, and if less, then the registers aren't utilized as effectively.
>
Well, except in "tiny leaf" functions, where the criterion is instead that the number of local variables be less than the number of scratch registers. However, for many/most small leaf functions, the total number of variables isn't all that large either.
>
The vast majority of leaf functions use less than 16 GPRs, given one has
a SP not part of GPRs {including arguments and return values}. Once one starts placing things like memmove(), memset(), sin(), cos(), exp(), log()
in the ISA, it goes up even more.
>
>
Yeah.
>
Things like memcpy/memmove/memset/etc, are function calls in cases when not directly transformed into register load/store sequences.
>
Did end up with an intermediate "memcpy slide", which can handle medium size memcpy and memset style operations by branching into a slide.
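
One plausible reading of "branching into a slide" is an unrolled run of load/store pairs that is entered part-way down, depending on how much is left to copy. The sketch below shows that idea in portable C using switch fall-through. It is only an illustration: copy_slide and the 8-word limit are made-up names and values, and this is not BGBCC's actual runtime code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "slide": an unrolled copy entered at the case matching the
 * word count, after which control falls through to the end. */
void copy_slide(uint64_t *dst, const uint64_t *src, size_t words)
{
    assert(words <= 8);   /* larger copies would need a loop or a real memcpy */
    switch (words) {
    case 8: dst[7] = src[7]; /* fall through */
    case 7: dst[6] = src[6]; /* fall through */
    case 6: dst[5] = src[5]; /* fall through */
    case 5: dst[4] = src[4]; /* fall through */
    case 4: dst[3] = src[3]; /* fall through */
    case 3: dst[2] = src[2]; /* fall through */
    case 2: dst[1] = src[1]; /* fall through */
    case 1: dst[0] = src[0]; /* fall through */
    case 0: break;
    }
}
```

The point of the layout is that a medium-size copy costs one computed branch plus straight-line stores, with no loop counter to maintain.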
>
>
>
As noted, on a 32 GPR machine, most leaf functions can fit entirely in scratch registers. On a 64 GPR machine, this percentage is slightly higher (but, not significantly, since there are few leaf functions remaining at this point).
>
>
If one had a 16 GPR machine with 6 usable scratch registers, it is a little harder though (as typically these need to cover both any variables used by the function, and any temporaries used, ...). There are a whole lot more leaf functions that exceed a limit of 6 than of 14.
>
But, say, a 32 GPR machine could still do well here.
>
>
Note that there are reasons why I don't claim 64 GPRs as a large performance advantage:
On programs like Doom, the difference is small at best.
>
>
It mostly affects things like GLQuake in my case, mostly because TKRA-GL has a lot of functions with large numbers of local variables (some exceeding 100 local variables).
>
Partly, though, this is because code that is highly inlined and unrolled and uses lots of variables tends to perform better in my case (tightly looping code with lots of small functions, not so much...).
>
>
>
Where, function categories:
   Tiny Leaf:
     Everything fits in scratch registers, no stack frame, no calls.
   Leaf:
     No function calls (either explicit or implicit);
     Will have a stack frame.
   Non-Leaf:
     May call functions, has a stack frame.
>
You are forgetting about FP, GOT, TLS, and whatever resources are required
to do try-throw-catch stuff as demanded by the source language.
>
>
Yeah, possibly true.
>
In my case:
   There is no frame pointer, as BGBCC doesn't use one;
     All stack-frames are fixed size, VLAs and alloca use the heap;
   GOT, N/A in my ABI (stuff is GBR relative, but GBR is not a GPR);
   TLS, accessed via TBR.[...]
>
alloca using the heap? Strange to me...
>
>
Well, in this case:
The alloca calls are turned into calls which allocate the memory blob and add it to a linked list;
when the function returns, everything in the linked list is freed;
Then, it internally pulls this off via malloc and free.
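
A minimal sketch of the scheme described above, assuming a per-call list head kept in the fixed-size frame. The names frame_alloca, frame_release and FrameBlock are hypothetical; the post does not name the actual runtime helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical list node placed in front of each alloca'd region. */
typedef struct FrameBlock {
    struct FrameBlock *next;   /* next block owned by the same call frame */
} FrameBlock;

/* Header is padded to 16 bytes so the payload keeps malloc's alignment. */
#define FRAME_HDR ((size_t)16)
_Static_assert(sizeof(FrameBlock) <= 16, "header must fit in the padding");

/* Stand-in for alloca(): allocate from the heap and link into the frame's list. */
void *frame_alloca(FrameBlock **list, size_t size)
{
    assert(list != NULL);
    FrameBlock *blk = malloc(FRAME_HDR + size);
    if (blk == NULL)
        return NULL;
    blk->next = *list;
    *list = blk;
    return (unsigned char *)blk + FRAME_HDR;
}

/* Called on function return: free everything the frame allocated. */
void frame_release(FrameBlock **list)
{
    assert(list != NULL);
    while (*list != NULL) {
        FrameBlock *next = (*list)->next;
        free(*list);
        *list = next;
    }
}

/* Roughly what a compiler might emit for a function that uses alloca(). */
int sum_squares(int n)
{
    FrameBlock *frame = NULL;   /* list head lives in the fixed-size frame */
    int *tmp = frame_alloca(&frame, (size_t)n * sizeof *tmp);
    int total = 0;
    if (tmp != NULL) {
        for (int i = 0; i < n; i++) tmp[i] = i * i;
        for (int i = 0; i < n; i++) total += tmp[i];
    }
    frame_release(&frame);      /* epilogue: one walk frees every block */
    return total;
}
```

Because the list head sits in the frame, the stack frame itself stays fixed-size no matter how many alloca calls run.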
>
Also the typical default stack size in this case is 128K, so trying to put big allocations on the stack is more liable to result in a stack overflow.
>
Bigger stack needs more memory, so is not ideal for NOMMU use. Luckily heap allocation is not too slow in this case.
>
>
Though, at the same time, ideally one limits use of language features where the code-generation degenerates into a mess of hidden runtime calls. These cases are not ideal for performance...
>
>
 Sometimes alloca is useful wrt offsetting the stack to avoid false sharing between stacks. Intel wrote a little paper that addresses this:
 https://www.intel.com/content/dam/www/public/us/en/documents/training/developing-multithreaded-applications.pdf
 Remember that one?
This seems mostly N/A in my case, as the cores use a weak memory model and there is no SMT.
Also, thread creation tends to offset stacks by a random amount as a form of ASLR (IIRC, roughly 0..256 bytes in a multiple of 16).
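
As a rough sketch of that kind of stack jitter: the helper names below are hypothetical, the range is taken as 0..240 in steps of 16 (whether 256 itself is reachable is not stated above), and a downward-growing stack is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: map a random word to one of 16 offsets (0, 16, ..., 240). */
size_t stack_jitter(uint32_t rnd)
{
    return (size_t)(rnd & 15u) * 16u;
}

/* Lower the initial stack pointer by the jitter (assumes the stack grows down
 * and that stack_top is already 16-byte aligned). */
uintptr_t initial_sp(uintptr_t stack_top, uint32_t rnd)
{
    assert((stack_top & 15u) == 0);
    return stack_top - stack_jitter(rnd);
}
```

Keeping the offset a multiple of 16 preserves stack alignment while still making each thread's frames land at different addresses.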
As for reducing the cost of heap sharing between threads, there is another option here:
Give each thread its own local version of the heap (essentially, per-thread free-lists and similar). This can avoid the need to use mutex locking and similar, though may have a penalty if one tries to free memory objects that weren't allocated in the same thread.
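
A minimal sketch of a per-thread free list, assuming C11 _Thread_local and a single illustrative cell size. The names cell_alloc and cell_free are hypothetical, and plain malloc stands in for the locked refill path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A free cell reuses its own storage to hold the list link. */
typedef struct FreeCell {
    struct FreeCell *next;
} FreeCell;

#define CELL_SIZE ((size_t)64)   /* illustrative single size class */

/* Each thread sees its own list head, so push and pop need no mutex. */
static _Thread_local FreeCell *tls_free_list = NULL;

void *cell_alloc(void)
{
    FreeCell *c = tls_free_list;
    if (c != NULL) {
        tls_free_list = c->next;   /* fast path: pop from the thread's own list */
        return c;
    }
    return malloc(CELL_SIZE);      /* slow path: stands in for a locked global refill */
}

void cell_free(void *p)
{
    assert(p != NULL);
    FreeCell *c = p;
    c->next = tls_free_list;       /* push onto the freeing thread's list */
    tls_free_list = c;
}
```

In this sketch a cell freed by a different thread simply migrates to that thread's list; a real allocator would need some policy for handing such cells back, which is where the cross-thread penalty comes from.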
In my case, the heap is split into multiple object sizes:
   Small:
     Under around 1K, allocated in terms of 16-byte cells;
     Allocated in chunks from the medium heap.
   Medium:
     Around 1K to 64K, allocated via subdividing a larger block (256K).
     Allocated via the large heap.
   Large:
     Bigger than 64K or so, allocated via pages (eg: "mmap()").
For the most common sizes (small and medium), the free-list and similar could be thread local; global locking then being used for large allocation, or for allocating new heap blocks (for the medium heap).
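
The size split can be sketched as a simple classifier. The limits are the approximate ones given above ("around 1K", "64K or so"), so the exact boundary handling here is an assumption, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { HEAP_SMALL, HEAP_MEDIUM, HEAP_LARGE } HeapClass;

#define SMALL_LIMIT  ((size_t)1024)        /* "under around 1K" */
#define MEDIUM_LIMIT ((size_t)64 * 1024)   /* "around 1K to 64K" */
#define CELL_BYTES   ((size_t)16)          /* small heap granularity */

/* Pick the heap that would serve a request of the given size. */
HeapClass heap_class(size_t size)
{
    if (size < SMALL_LIMIT)   return HEAP_SMALL;
    if (size <= MEDIUM_LIMIT) return HEAP_MEDIUM;
    return HEAP_LARGE;
}

/* Number of 16-byte cells a small request occupies, rounding up. */
size_t small_cells(size_t size)
{
    assert(heap_class(size) == HEAP_SMALL);
    return (size + CELL_BYTES - 1) / CELL_BYTES;
}

/* Small and medium requests can be served from thread-local free lists;
 * only large requests (and block refills, not modeled here) take the global lock. */
int uses_thread_local_path(size_t size)
{
    return heap_class(size) != HEAP_LARGE;
}
```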
As can be noted, objects in the small object heap tend to be padded to a multiple of 16 bytes, and normally have a 16 byte object header (the pointer to an allocated object points just after this header).
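
The header layout can be sketched as below. The two header fields are placeholders (the contents of the header are not described here); only the 16-byte size, the rounding of the payload up to a multiple of 16, and the "pointer just after the header" convention come from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-byte object header; field meanings are illustrative only. */
typedef struct ObjHeader {
    uint64_t size;
    uint64_t meta;
} ObjHeader;

_Static_assert(sizeof(ObjHeader) == 16, "header must occupy exactly one 16-byte cell");

/* Round a payload size up to a multiple of 16 bytes. */
size_t pad16(size_t n)
{
    return (n + 15u) & ~(size_t)15u;
}

/* Total bytes a small object consumes: header plus padded payload. */
size_t small_footprint(size_t n)
{
    return sizeof(ObjHeader) + pad16(n);
}

/* The user-visible pointer sits immediately after the header... */
void *obj_payload(ObjHeader *h)
{
    assert(h != NULL);
    return h + 1;
}

/* ...so the header is recovered by stepping back one header. */
ObjHeader *obj_header(void *p)
{
    assert(p != NULL);
    return (ObjHeader *)p - 1;
}
```

For example, a 24-byte request would occupy 48 bytes in this layout: 16 for the header and 32 for the padded payload.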
Note that objects in the large heap may instead store this metadata externally.
Granted, yeah, mutex locking is fairly expensive with a weak memory model, and shared memory is generally undesirable as there is little in the way of memory coherence (absent explicit flushes, accessing memory belonging to a different thread may result in stale data).
A similar trick was used in the past for my BGBScript VMs, mostly because mutex locking is slow; and dynamic languages like this tend to involve a lot of rapid-fire small granularity allocations and frees (every object and array goes on the heap).
In BGBCC, it is merely that large objects and arrays go on the heap (along with VLAs and similar). If one creates a lambda, this also goes on the heap.
If one wants to support proper lexical closures (N/A for both C++ and Java style lambdas, *), the local variables may also end up on the heap. And, if one wanted to support Scheme style call/cc (call-with-current-continuation), the entire ABI frame needs to go on the heap. However, a decision was made early on not to bother with call/cc support in the BGBCC ABI (an ABI capable of supporting call/cc would impose a severe performance penalty).
There was provision made for supporting exit-only continuations, which can effectively leverage the same mechanism as try/catch and throw (the continuation is effectively a self-throwing exception which will be caught at a predefined location).
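
To illustrate what "exit-only" means, here is a sketch in plain C that uses setjmp/longjmp as a stand-in for the try/catch machinery. That substitution is an assumption made for the sake of a portable example, not how BGBCC implements it, and all names are hypothetical.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical exit-only continuation: usable only while the function that
 * captured it is still on the stack. */
typedef struct ExitCont {
    jmp_buf env;
    volatile int value;   /* volatile: written between setjmp and longjmp */
} ExitCont;

/* "Invoke" the continuation: unwind to the capture point, carrying a value.
 * This plays the role of the self-throwing exception. */
static void exit_cont_invoke(ExitCont *k, int value)
{
    assert(k != NULL);
    k->value = value;
    longjmp(k->env, 1);
}

/* Inner code escapes early through the continuation when it finds a match. */
static void search(const int *xs, size_t n, int target, ExitCont *k)
{
    for (size_t i = 0; i < n; i++)
        if (xs[i] == target)
            exit_cont_invoke(k, (int)i);
}

/* The capture point plays the role of the predefined catch location. */
int find_index(const int *xs, size_t n, int target)
{
    ExitCont k;
    if (setjmp(k.env) != 0)
        return k.value;            /* arrived here via exit_cont_invoke */
    search(xs, n, target, &k);
    return -1;                     /* continuation was never invoked */
}
```

Unlike full call/cc, nothing here needs to outlive the frame that created it, so frames can stay on the ordinary stack.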
*: By default, lambdas in BGBCC do not use lexical capture, and instead use either capture-by-value or capture-by-reference (like C++ style lambdas, or GCC inner functions, using C++ style syntax), but differ in that the lambdas are callable as normal function pointers (and are heap-allocated, rather than RAII value-types, though they will be auto-freed when the originating function returns in the capture-by-reference case).
In my BGBScript2 language, capture-by-value had been the default, with lambdas having an indeterminate lifespan (they may continue to exist outside of the scope where the calling function had terminated; unlike GCC inner functions).
...

