Re: Stack vs stackless operation

From: anton (at) *nospam* mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.lang.forth
Date: 25 Feb 2025, 08:26:58
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Message-ID: <2025Feb25.082658@mips.complang.tuwien.ac.at>
User-Agent: xrn 10.11
zbigniew2011@gmail.com (LIT) writes:
>> Probably because the case where the two operands
>> of a + are in memory, and the result is needed
>> in memory is not that frequent.
>
>One example could be matrix multiplication.
>It's rather trivial but cumbersome operation,
>where usually a few transitional variables are
>used to maintain clarity of the code.

Earlier you wrote about performance; now you switch to clarity of the
code.  What is the goal?

If we stick with performance: among the versions in
<http://theforth.net/package/matmul/current-view/matmul.4th> that do
not use a primitive FAXPY, the fastest on all systems I measured is
version 2, and it spends most of its time in:

: faxpy-nostride ( ra f_x f_y ucount -- )
    \ vy=ra*vx+vy
    dup >r 3 and 0 ?do       \ first the ucount mod 4 leftover iterations
        fdup over f@ f* dup f+! float+ swap float+ swap
    loop
    r> 2 rshift 0 ?do        \ then ucount/4 iterations, unrolled by 4
        fdup over f@ f* dup f+! float+ swap float+ swap
        fdup over f@ f* dup f+! float+ swap float+ swap
        fdup over f@ f* dup f+! float+ swap float+ swap
        fdup over f@ f* dup f+! float+ swap float+ swap
    loop
    2drop fdrop ;

It's not the clearest code, and the version without unrolling is
certainly clearer (and may be almost as fast in the newer versions of
SwiftForth and VFX, which make counted loops significantly faster):

: faxpy-nostride ( ra f_x f_y ucount -- )
    \ vy=ra*vx+vy
    0 ?do
        fdup over f@ f* dup f+! float+ swap float+ swap
    loop
    2drop fdrop ;
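To try the word, something like the following should work on a system
with a separate FP stack, such as Gforth.  This is a sketch under
assumptions: F+! is not in the Forth-2012 floating-point word set, so
a fallback is defined only if the system lacks it; the vector names VX
and VY are made up for the example; and CREATE is assumed to leave
data space float-aligned, as on typical 64-bit systems.

```forth
\ Fallback, defined only if the system does not provide F+! itself.
[undefined] f+! [if]
: f+! ( r f-addr -- ) dup f@ f+ f! ;
[then]

: faxpy-nostride ( ra f_x f_y ucount -- )
    \ vy=ra*vx+vy: per iteration 2 FP loads (x, y) and 1 FP store (y)
    0 ?do
        fdup over f@ f* dup f+! float+ swap float+ swap
    loop
    2drop fdrop ;

\ Hypothetical test vectors of three doubles each.
create vx  1e f,  2e f,  3e f,
create vy 10e f, 20e f, 30e f,

2e vx vy 3 faxpy-nostride   \ vy now holds 12e 24e 36e
```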

Each iteration performs 2 FP loads (x[i] and y[i]) and 1 FP store
(y[i]).  With memory-to-memory variants of F* and F+, that would be 4
FP loads and 2 FP stores, because each operation loads two operands
and stores its result; and I don't think it would be any clearer.  If
the address computation used memory-to-memory variants as well, things
would become even slower, and I doubt that they would become clearer.
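To make the count concrete, here is a sketch of the same loop with
memory-to-memory arithmetic.  FM*, FM+, FAXPY-MEM, RA, TMP, VX and VY
are hypothetical names invented for this illustration, and a system
with a separate FP stack (e.g., Gforth) is assumed.  Each FM word
fetches two operands and stores one result, so every iteration
performs 4 FP loads and 2 FP stores, and the scalar and the
intermediate product now have to live in memory.

```forth
\ Hypothetical memory-to-memory FP words: [addr3] := [addr1] op [addr2]
: fm* ( f-addr1 f-addr2 f-addr3 -- ) >r f@ f@ f* r> f! ;  \ 2 loads, 1 store
: fm+ ( f-addr1 f-addr2 f-addr3 -- ) >r f@ f@ f+ r> f! ;  \ 2 loads, 1 store

fvariable ra    \ the scalar has to be in memory, too
fvariable tmp   \ holds the product between the two operations

: faxpy-mem ( f_x f_y ucount -- )
    \ vy=ra*vx+vy, with RA in memory
    0 ?do
        over ra tmp fm*          \ tmp := x[i]*ra
        tmp over dup fm+         \ y[i] := tmp+y[i]
        float+ swap float+ swap  \ advance both pointers
    loop
    2drop ;

\ Hypothetical test vectors of three doubles each.
create vx  1e f,  2e f,  3e f,
create vy 10e f, 20e f, 30e f,

2e ra f!
vx vy 3 faxpy-mem   \ vy now holds 12e 24e 36e
```

Compared with FAXPY-NOSTRIDE, the extra traffic comes from fetching RA
in every iteration and from storing the product to TMP only to load
it again right away.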

Some time later, I worked on how SIMD could be integrated into Forth,
and used matrix multiplication as an example.  With the wordset I
propose, this whole loop became

( v1 r addr ) v@ f*vs f+v ( v2 )

Only one memory access is visible here at all; there are some more in
the implementation of these words, however.  You can find the paper
about that at <http://www.euroforth.org/ef17/papers/ertl.pdf>.  A
further refinement of that work can be found at
<https://www.complang.tuwien.ac.at/papers/ertl18manlang.pdf>
(presented in a Java setting for the audience of the conference, but
the implementation was in a Forth setting, see
<https://github.com/AntonErtl/vectors>).  This work eliminates many of
the memory accesses that the earlier implementation performs,
demonstrating that the memory accesses are not fundamental in the
model.  In particular, Figure 11 shows code corresponding to

( v1 r1 addr1 r2 addr2 ) v@ f*vs v@ f+v v@ f*vs f+v ( v2 )

i.e., the code above unrolled by a factor of 2; it has 3 SIMD loads
and 1 SIMD store per SIMD granule processed (the SIMD granule is 4
doubles for AVX).  Further unrolling results in even fewer loads and
stores per FLOP (FP multiplication and FP addition).
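The same effect shows up in scalar code.  The following is an
illustrative sketch, not code from the papers: the hypothetical word
FAXPY2-NOSTRIDE applies two AXPY steps to the same destination vector
in one pass.  Each element of Y is then loaded once and stored once,
giving 3 FP loads and 1 FP store per element, instead of the 4 loads
and 2 stores of two separate FAXPY-NOSTRIDE passes.  F+! gets a
fallback definition in case the system lacks it (it is not in
Forth-2012), a separate FP stack is assumed, and the vector names are
made up for the example.

```forth
\ Fallback, defined only if the system does not provide F+! itself.
[undefined] f+! [if]
: f+! ( r f-addr -- ) dup f@ f+ f! ;
[then]

: faxpy2-nostride ( r1 r2 f_x1 f_x2 f_y ucount -- )
    \ vy=r1*vx1+r2*vx2+vy, unrolled over the two source vectors
    0 ?do
        fover 2 pick f@ f*                 \ r1*x1[i]      1 load
        fover over f@ f*                   \ r2*x2[i]      1 load
        f+ dup f+!                         \ y[i] += sum   1 load, 1 store
        float+ rot float+ rot float+ rot   \ advance all three pointers
    loop
    drop 2drop fdrop fdrop ;

\ Hypothetical test vectors of two doubles each.
create vx1  1e f,  2e f,
create vx2  3e f,  4e f,
create vy  10e f, 20e f,

2e 10e vx1 vx2 vy 2 faxpy2-nostride   \ vy now holds 42e 64e
```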

>Probably "bigger" Forth compilers are indeed
>already "too good" for the difference to be
>(practically) noticeable; still, maybe for
>simpler Forths, I mean like the ones for DOS
>or even for 8-bit machines it would make sense?

Forth was designed for small machines and very simple implementations.
We have words like "1+" that are beneficial in that setting.  We also
have "+!", which is the closest to what you have in mind.  But even in
those times nobody went for a word like "+> ( addr1 addr2 addr3 -- )",
because it is not useful often enough.
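For comparison, defining such a word takes one line.  +> below is the
hypothetical word mentioned above, with the semantics [addr3] :=
[addr1]+[addr2] assumed from its stack comment; the variable names are
made up for the example.  It replaces a phrase like A @ B @ + C !,
whereas +! only covers the case where the destination is also one of
the operands.

```forth
\ Hypothetical: store the sum of the cells at addr1 and addr2 into addr3.
: +> ( addr1 addr2 addr3 -- )
    >r @ swap @ + r> ! ;

variable op1  variable op2  variable sum  variable acc
3 op1 !  4 op2 !  5 acc !

op1 op2 sum +>   \ sum := op1+op2, the same as  op1 @ op2 @ + sum !
op1 @ acc +!     \ standard +!: acc := acc+op1, destination is an operand
```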

- anton
--
M. Anton Ertl  http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
     New standard: https://forth-standard.org/
EuroForth 2023 proceedings: http://www.euroforth.org/ef23/papers/
EuroForth 2024 proceedings: http://www.euroforth.org/ef24/papers/
