Stack caching (: Stack vs stackless operation)

Liste des GroupesRevenir à cl forth 
Sujet : Stack caching (: Stack vs stackless operation)
De : anton (at) *nospam* mips.complang.tuwien.ac.at (Anton Ertl)
Groupes : comp.lang.forth
Date : 01. Mar 2025, 09:18:06
Autres entêtes
Organisation : Institut fuer Computersprachen, Technische Universitaet Wien
Message-ID : <2025Mar1.091806@mips.complang.tuwien.ac.at>
References : 1 2 3 4 5 6 7 8
User-Agent : xrn 10.11
anton@mips.complang.tuwien.ac.at (Anton Ertl) writes:
see-code exchange        see-code exchange4     see-code exchange2
over    1->2             dup    1->2            dup >r     1->1
 mov     r15,$08[r12]     mov     r15,r8       >r    1->1
@    2->2                @    2->2                mov     -$08[r13],r8
 mov     r15,[r15]        mov     r15,[r15]      sub     r13,$08
swap    2->3             rot    2->3            @    1->1
 add     r12,$08          mov     r9,$08[r12]    mov     r8,[r8]
 mov     r9,r8            add     r12,$08      over    1->2
 mov     r8,[r12]       !@    3->2               mov     r15,$08[r12]
!@    3->2                 mov     rax,r15      @    2->2
 mov     rax,r15          mov     r15,[r9]       mov     r15,[r15]
 mov     r15,[r9]         mov     [r9],rax     r>    2->3
 mov     [r9],rax       swap    2->3             mov     r9,$00[r13]
swap    2->3               add     r12,$08        add     r13,$08
 add     r12,$08          mov     r9,r8        !    3->1
 mov     r9,r8            mov     r8,[r12]       mov     [r9],r15
 mov     r8,[r12]       !    3->1              swap    1->2
!    3->1                  mov     [r9],r15       mov     r15,$08[r12]
 mov     [r9],r15       ;s    1->1               add     r12,$08
;s    1->1                 mov     rbx,$00[r13] !    2->0
 mov     rbx,$00[r13]     add     r13,$08        mov     [r15],r8
 add     r13,$08          mov     rax,[rbx]    ;s    0->1
 mov     rax,[rbx]        jmp     eax            mov     r8,$08[r12]
 jmp     eax                                     add     r12,$08
                                                 mov     rbx,$00[r13]
                                                 add     r13,$08
                                                 mov     rax,[rbx]
                                                 jmp     eax

The difference between exchange and exchange4 shows how stack caching
can have a hard-to-predict effect.  Gforth searches for the shortest
path through the available stack-cache states, where shortness is
defined by the native-code length.  E.g., it starts with state 1, and
from there it can use any of the dup variants starting in state 1, or
first transition to another state and use a dup variant starting from
there.

For SWAP and ROT gforth-fast has the following variants:

primitive in-out #     code bytes
swap       1-1  132  len= 4+ 13+ 3
swap       2-2   37  len= 4+  9+ 3
swap       3-3    4  len= 4+  9+ 3
swap       0-2    8  len= 4+ 14+ 3
swap       1-2   82  len= 4+  9+ 3
swap       2-1   74  len= 4+  8+ 3
swap       2-3   30  len= 4+ 11+ 3
swap       3-2    3  len= 4+ 11+ 3
swap       2-0   20  len= 4+ 13+ 3
rot        1-1   46  len= 4+ 23+ 3
rot        3-3    6  len= 4+ 12+ 3
rot        3-1   24  len= 4+ 13+ 3
rot        2-3   15  len= 4+  9+ 3
rot        1-3   17  len= 4+ 17+ 3
rot        0-3    1  len= 4+ 19+ 3

You get these data (in a rawer form) with

gforth-fast --print-prim -e bye |& grep ^swap
gforth-fast --print-prim -e bye |& grep ^rot

The in column is the stack-cache state on entering the word, the out
column is the stack-cache state on leaving the word.  The # column
shows how many times this variant of the primitive is used (static
counts).  The code length colum shows three parts, the middle of which
is the part that's copied to dynamic superinstructions like the ones
shown above, and this length is what is used in the search for the
shortest path.

In EXCHANGE4, the shortest variant of ROT is used: ROT 2->3; and the
primitives of the other variants are selected to also result in the
shortest overall code.

In EXCHANGE and EXCHANGE4, SWAP 2->3 is not the shortest variant of
SWAP, not even the shortest variant starting from state 2, but ending
in state 3 allows to use cheap variants of further primitives such as
!@ and !, resulting in the overall shortest code for this sequence.
In EXCHANGE2, we see the selection of a shorter version of SWAP, but
one of the costs is that ;s becomes longer (but in this case the
overall savings from using a shorter version of SWAP and shorter
versions of earlier instructions make up for that).

Why am I looking at this?  For stack-shuffling primitives like SWAP
and ROT, it's not obvious which variant is how long and which variant
should be selected.

These stack-shiffling words therefore are also good candidates for
performing stack-cache state transitions that would otherwise require
inserting extra transition code:

E.g., EXCHANGE and its variants consume two stack items, but need to
start in stack-cache state 1 and end in stack-cache state 1 (gforth is
currently not smart enough to deal with other states at basic-block
boundaries), so not everything can be done in the stack cache; the
stack pointer needs to be increased by two cells, and there need to be
accesses to the memory part of the stack for two stack items.

In EXCHANGE, the adjustment by one cell and memory access for one
stack item is done in the first SWAP 2->3, and another one in the
second one.  In EXCHANGE4, ROT 2->3 and SWAP 2->3 perform these tasks.
In EXCHANGE2, the SWAP 1->3 does it for one cell, and the stack-cache
state transition 0->1 in the first two instructions of ;s do it for
the other cell (gforth-fast actually does not have a built-in variant
;S 0->1 and the code shown as ;S 0->1 by SEE-CODE is actually composed
of a transition 0->1 and the ;S 1->1 variant).

I wanted to know how often which variant of these stack-shuffling
primitives is used, and how this relates to their length.  One
interesting result is that ROT 1->3 is used relatively frequently
despite having relatively long code.  Apparently the code that comes
before these 17 instances of ROT benefits a lot from being in
stack-cache state 1 and this amortizes the longer code for the ROT
1->3 compared to ROT 2->3.

Another interesting result is the low usage of SWAP 3->2 compared to
SWAP 2->3.  This may say something about how SWAP is used in Forth
programs.  Or it may be an artifact of tie-breaking: If two paths have
the same length, one is chosen rather arbitrarily, but consistently,
and this may make one variant appear more useful than merited by the
benefit that the existence of the variant has on code length.

For those interested, here's the code for the various variants shown
above:

r12: stack pointer
 r8: stack cache register a (tos in state 1, 2nd in state 2, 3rd in state 3)
r15: stack-cache register b                 (tos in state 2, 2nd in state 3)
 r9: stack-cache register c                                 (tos in state 3)

swap 1-1
559E7F769425:   mov     rax,$08[r12]
559E7F76942A:   mov     $08[r12],r8
559E7F76942F:   mov     r8,rax

swap 2-2
559E7F76E5B1:   mov     rax,r8
559E7F76E5B4:   mov     r8,r15
559E7F76E5B7:   mov     r15,rax

swap 3-3
559E7F76E5C3:   mov     rax,r15
559E7F76E5C6:   mov     r15,r9
559E7F76E5C9:   mov     r9,rax

swap 0-2
559E7F76E5D5:   mov     r15,$10[r12]
559E7F76E5DA:   mov     r8,$08[r12]
559E7F76E5DF:   add     r12,$10

swap 1-2
559E7F76E5EC:   mov     r15,$08[r12]
559E7F76E5F1:   add     r12,$08

swap 2-1
559E7F76E5FE:   mov     [r12],r15
559E7F76E602:   sub     r12,$08

swap 2-3
559E7F76E60F:   add     r12,$08
559E7F76E613:   mov     r9,r8
559E7F76E616:   mov     r8,[r12]

swap 3-2
559E7F76E623:   mov     [r12],r8
559E7F76E627:   mov     r8,r9
559E7F76E62A:   sub     r12,$08

swap 2-0
559E7F76E637:   mov     [r12],r15
559E7F76E63B:   sub     r12,$10
559E7F76E63F:   mov     $08[r12],r8

rot  1-1
559E7F76944C:   mov     rdx,$08[r12]
559E7F769451:   mov     rax,$10[r12]
559E7F769456:   mov     $08[r12],r8
559E7F76945B:   mov     $10[r12],rdx
559E7F769460:   mov     r8,rax

rot  3-3
559E7F76EDC0:   mov     rax,r8
559E7F76EDC3:   mov     r8,r15
559E7F76EDC6:   mov     r15,r9
559E7F76EDC9:   mov     r9,rax

rot  3-1
559E7F76EDD5:   mov     [r12],r15
559E7F76EDD9:   sub     r12,$10
559E7F76EDDD:   mov     $08[r12],r9

rot  2-3
559E7F76EDEB:   mov     r9,$08[r12]
559E7F76EDF0:   add     r12,$08

rot  1-3
559E7F76EDFD:   mov     r9,$10[r12]
559E7F76EE02:   mov     r15,r8
559E7F76EE05:   add     r12,$10
559E7F76EE09:   mov     r8,-$08[r12]

rot  0-3
559E7F76EE17:   mov     r9,$18[r12]
559E7F76EE1C:   mov     r8,$10[r12]
559E7F76EE21:   add     r12,$18
559E7F76EE25:   mov     r15,-$10[r12]

- anton
--
M. Anton Ertl  http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
     New standard: https://forth-standard.org/
EuroForth 2023 proceedings: http://www.euroforth.org/ef23/papers/
EuroForth 2024 proceedings: http://www.euroforth.org/ef24/papers/

Date Sujet#  Auteur
24 Feb 25 * Stack vs stackless operation72LIT
24 Feb 25 +* Re: Stack vs stackless operation4minforth
24 Feb 25 i`* Re: Stack vs stackless operation3LIT
24 Feb 25 i `* Re: Stack vs stackless operation2minforth
24 Feb 25 i  `- Re: Stack vs stackless operation1LIT
24 Feb 25 +* Re: Stack vs stackless operation14Anton Ertl
24 Feb 25 i`* Re: Stack vs stackless operation13LIT
25 Feb 25 i `* Re: Stack vs stackless operation12Anton Ertl
25 Feb 25 i  `* Re: Stack vs stackless operation11LIT
25 Feb 25 i   `* Re: Stack vs stackless operation10Anton Ertl
25 Feb 25 i    `* Re: Stack vs stackless operation9LIT
25 Feb 25 i     +* Re: Stack vs stackless operation5minforth
25 Feb 25 i     i`* Re: Stack vs stackless operation4LIT
25 Feb 25 i     i `* Re: Stack vs stackless operation3minforth
25 Feb 25 i     i  `* Re: Stack vs stackless operation2LIT
25 Feb 25 i     i   `- Re: Stack vs stackless operation1Gerry Jackson
25 Feb 25 i     `* Re: Stack vs stackless operation3Anton Ertl
25 Feb 25 i      `* Re: Stack vs stackless operation2LIT
25 Feb 25 i       `- Re: Stack vs stackless operation1Anton Ertl
25 Feb 25 +* Re: Stack vs stackless operation9dxf
25 Feb 25 i`* Re: Stack vs stackless operation8LIT
25 Feb 25 i +* Re: Stack vs stackless operation6dxf
25 Feb 25 i i`* Re: Stack vs stackless operation5LIT
26 Feb 25 i i `* Re: Stack vs stackless operation4dxf
26 Feb 25 i i  `* Re: Stack vs stackless operation3LIT
26 Feb 25 i i   `* Re: Stack vs stackless operation2minforth
26 Feb 25 i i    `- Re: Stack vs stackless operation1LIT
25 Feb 25 i `- Re: Stack vs stackless operation1Hans Bezemer
25 Feb 25 +* Re: Stack vs stackless operation2LIT
25 Feb 25 i`- do...loop (was: Stack vs stackless operation)1Anton Ertl
25 Feb 25 +* Re: Stack vs stackless operation10LIT
26 Feb 25 i`* Re: Stack vs stackless operation9Hans Bezemer
26 Feb 25 i `* Re: Stack vs stackless operation8LIT
26 Feb 25 i  `* Re: Stack vs stackless operation7Hans Bezemer
26 Feb 25 i   `* Re: Stack vs stackless operation6LIT
27 Feb 25 i    `* Re: Stack vs stackless operation5LIT
27 Feb 25 i     `* Re: Stack vs stackless operation4LIT
2 Mar 25 i      `* Re: Stack vs stackless operation3LIT
5 Mar 25 i       `* Re: Stack vs stackless operation2Hans Bezemer
6 Mar 25 i        `- Re: Stack vs stackless operation1LIT
25 Feb 25 `* Re: Stack vs stackless operation32LIT
25 Feb 25  +* Re: Stack vs stackless operation10Anton Ertl
25 Feb 25  i+- Re: Stack vs stackless operation1LIT
26 Feb 25  i`* Re: Stack vs stackless operation8LIT
26 Feb 25  i +- Re: Stack vs stackless operation1LIT
26 Feb 25  i `* Re: Stack vs stackless operation6John Ames
26 Feb 25  i  `* Re: Stack vs stackless operation5LIT
27 Feb 25  i   `* Re: Stack vs stackless operation4dxf
27 Feb 25  i    `* Re: Stack vs stackless operation3LIT
27 Feb 25  i     `* Re: Stack vs stackless operation2Hans Bezemer
27 Feb 25  i      `- Re: Stack vs stackless operation1LIT
26 Feb 25  +* Re: Stack vs stackless operation2Waldek Hebisch
26 Feb 25  i`- Re: Stack vs stackless operation1Anton Ertl
26 Feb 25  `* Re: Stack vs stackless operation19mhx
26 Feb 25   +- Re: Stack vs stackless operation1minforth
26 Feb 25   +* Re: Stack vs stackless operation16Anton Ertl
26 Feb 25   i`* Re: Stack vs stackless operation15Anton Ertl
26 Feb 25   i +* Re: Stack vs stackless operation7Paul Rubin
26 Feb 25   i i+- Re: Stack vs stackless operation1minforth
27 Feb 25   i i`* Re: Stack vs stackless operation5Anton Ertl
27 Feb 25   i i +* Re: Stack vs stackless operation2Paul Rubin
27 Feb 25   i i i`- Re: Stack vs stackless operation1Anton Ertl
27 Feb 25   i i `* Re: Stack vs stackless operation2Gerry Jackson
27 Feb 25   i i  `- Re: Stack vs stackless operation1Anton Ertl
28 Feb 25   i `* Re: Stack vs stackless operation7Anton Ertl
28 Feb 25   i  `* Re: Stack vs stackless operation6Paul Rubin
1 Mar 25   i   `* Re: Stack vs stackless operation5Anton Ertl
1 Mar 25   i    +- Stack caching (: Stack vs stackless operation)1Anton Ertl
1 Mar 25   i    `* Re: Stack vs stackless operation3Anton Ertl
1 Mar 25   i     `* Re: Stack vs stackless operation2Anton Ertl
1 Mar 25   i      `- Re: Stack vs stackless operation1mhx
27 Feb 25   `- Re: Stack vs stackless operation1mhx

Haut de la page

Les messages affichés proviennent d'usenet.

NewsPortal