antispam@fricas.org (Waldek Hebisch) writes:
> A potential alternative is a pair of operations, say PUSH and POP, and
> a Forth compiler that replaces a pair like V1 @ by PUSH(V1). Note that
> here the address of V1 is intended to be part of PUSH (so it will take
> as much space as separate V1 and @, but is only a single primitive).
In Gforth variables are compiled as "lit <addr>", and Gforth has a
primitive LIT@, and a generalized constant-folding optimization that
replaces "lit <addr> @" with "LIT@ <addr>". In the Gforth image there
are 490 uses of lit@ (out of 33611 uses of primitives) and 76
occurrences of "lit @" (in parts that are compiled before the
generalized constant-folding is active). There are also 293
occurrences of "lit !".
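The folding described above can be sketched as a peephole pass over a
threaded-code image. This is an illustrative toy, not Gforth's actual
implementation; the cell representation (primitive names interleaved
with immediate arguments) is an assumption made for the example.

```python
def fold_lit_fetch(code):
    """Fold "lit <addr> @" (3 cells) into "lit@ <addr>" (2 cells).

    code is a list of cells: primitive names interleaved with immediate
    arguments, e.g. ["lit", 0x1234, "@", "+"].  Toy model, not Gforth.
    """
    out = []
    i = 0
    while i < len(code):
        # "lit" is followed by its immediate argument; if the primitive
        # after that argument is "@", emit the combined "lit@" instead.
        if i + 2 < len(code) and code[i] == "lit" and code[i + 2] == "@":
            out += ["lit@", code[i + 1]]
            i += 3
        else:
            out.append(code[i])
            i += 1
    return out
```

The same shape of pass handles "lit !" by folding to a combined
store primitive.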
However, given the minimal difference between the code produced for
"LIT@" and "LIT @", LIT@ is no longer beneficial. E.g.:
variable v ok
: foo1 v @ + ; ok
: foo2 v [ basic-block-end ] @ + ; ok
see-code foo1
$7F27184A08B0 lit@ 1->2
$7F27184A08B8 v
7F271806B580: mov rax,$08[rbx]
7F271806B584: mov r15,[rax]
$7F27184A08C0 + 2->1
7F271806B587: add r13,r15
$7F27184A08C8 ;s 1->1
...
see-code foo2
$7F27184A08F8 lit 1->2
$7F27184A0900 v
7F271806B596: mov r15,$08[rbx]
$7F27184A0908 @ 2->2
7F271806B59A: mov r15,[r15]
$7F27184A0910 + 2->1
7F271806B59D: add r13,r15
$7F27184A0918 ;s 1->1
...
> More generally, a simple "optimizer" that replaces short sequences of
> Forth primitives by different, shorter sequences of primitives is
> likely to give a similar gain. However, the chance of a match
> decreases with the length of the sequence.
Gforth has that as static superinstructions. You can see the sequences
in <http://git.savannah.gnu.org/cgit/gforth.git/tree/peeprules.vmg>, in
the lines before the first occurrence of prim-states or something
similar. As you can see, many of the formerly-used sequences are now
commented out, because static superinstructions do not play well with
a) static stack caching (currently static superinstructions only work
for the default stack cache state) and b) IP-update optimization (if
one of the primitives in the sequence has an immediate argument (e.g.,
LIT), you would need additional variants for various IP offsets, or
update the IP before the sequence).
The remaining static superinstructions
* have to do with stacks where we do not have stack caching (FP stack,
locals stack, return stack),
* are combinations of comparison primitives and ?BRANCH (this avoids
the need to reify the result of the comparison in a general-purpose
register), or
* are sequences of typical memory-access words (not because they occur
so often, but because it's better to have a small number of words
that can be combined, and a number of combinations in the optimizer
than to have a combinatorial explosion of words).
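The replacement of known sequences by superinstructions can be
sketched as a longest-match-first table lookup. The rule table and the
superinstruction names below are invented for the example (the real
rules live in peeprules.vmg); this is not Gforth's implementation.

```python
# Hypothetical rule set in the spirit of peeprules.vmg: a sequence of
# primitives maps to one combined superinstruction.
RULES = {
    ("0=", "?branch"): "0=-?branch",   # comparison fused with ?BRANCH
    (">r", ">r"): "2>r",               # return-stack access pair
}

def superinst(code, rules=RULES):
    """Replace known primitive sequences, preferring longer matches."""
    maxlen = max(len(k) for k in rules)
    out, i = [], 0
    while i < len(code):
        for n in range(maxlen, 1, -1):
            key = tuple(code[i:i + n])
            if key in rules:
                out.append(rules[key])
                i += n
                break
        else:  # no rule matched at this position
            out.append(code[i])
            i += 1
    return out
```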
> Above you bet on relatively long sequences (and on the programmer
> writing an alternative sequence). Shorter sequences have more chance
> of matching, so you need a smaller number of them for a similar gain.
That's certainly our experience. Long sequences with high dynamic
counts often come out of the inner loop of a single benchmark, and do
not help other programs at all. We later preferred to go with static
usage counts (i.e., the sequence occurs several times in the code),
and this naturally leads to short sequences.
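Selecting candidates by static usage counts can be sketched as simple
sequence counting: tally every adjacent pair of primitives in the
compiled code and keep the pairs that occur at least some threshold
number of times. The threshold is an assumption for the example.

```python
from collections import Counter

def candidate_pairs(code, threshold=2):
    """Return length-2 primitive sequences that occur statically at
    least `threshold` times in `code` (a list of primitive names)."""
    counts = Counter(zip(code, code[1:]))
    return {pair for pair, n in counts.items() if n >= threshold}
```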
> One can do better than using the machine stack, namely keeping things
> in registers, but that means generating machine code and doing
> optimization.
Gforth does stack caching at the level of primitives, by having
several variants of the primitives for different start and end states
of the primitives, and using a shortest-path search for finding out
which combination of these variants to use. However, for multiple
stacks this leads to a large number of states, and the shortest-path
algorithm becomes too expensive. For now we only stack-cache the data
stack.
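The shortest-path variant selection can be sketched as dynamic
programming over cache states within a basic block: each primitive has
several variants with different input and output states, and we pick
the combination with minimal total cost. The states, costs, and
variant tables below are invented for the example; Gforth's real
machinery is generated by vmgen.

```python
INF = float("inf")

def best_variants(block, variants, states, start=0, end=0):
    """Minimal total cost of executing `block` (a list of primitive
    names), given variants[prim] = [(in_state, out_state, cost), ...].
    States number how many stack items are cached in registers; start
    and end are the required states at block entry and exit."""
    cost = {s: (0 if s == start else INF) for s in states}
    for prim in block:
        nxt = {s: INF for s in states}
        # Relax every variant edge from each reachable state.
        for (i, o, c) in variants[prim]:
            if cost[i] + c < nxt[o]:
                nxt[o] = cost[i] + c
        cost = nxt
    return cost[end]
```

With one stack this table stays small; with several independently
cached stacks the state space is the product of the per-stack states,
which is where the search becomes expensive.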
For extending this to multiple stacks, I see several alternatives:
* Use a greedy algorithm instead of an optimal shortest-path
algorithm. The difference is probably non-existent in most cases.
* Manage the stack cache using register allocation techniques instead
of representing it as an abstract state. This would often produce
similar results as the greedy technique, but it can also handle
stack manipulation words cheaply without having an explosion of
stack states and the related complexity in the generator that
generates the states and the tables for the state-handling.
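The greedy alternative can be sketched as committing to the locally
cheapest variant at each primitive instead of searching all paths.
Again a toy under the same invented variant tables, omitting explicit
state-change instructions, and not Gforth's implementation.

```python
def greedy_variants(block, variants, start=0):
    """Greedily pick, for each primitive, the cheapest variant whose
    input state matches the current cache state.  Returns (total cost,
    final state).  State-change costs are omitted in this toy version."""
    state, total = start, 0
    for prim in block:
        choices = [(c, o) for (i, o, c) in variants[prim] if i == state]
        c, state = min(choices)  # cheapest matching variant
        total += c
    return total, state
```

On straight-line code with a single cached stack this usually reaches
the same total as the shortest-path search, which is the sense in which
the difference is probably non-existent in most cases.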
- anton
--
M. Anton Ertl  http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: https://forth-standard.org/
EuroForth 2023 proceedings: http://www.euroforth.org/ef23/papers/
EuroForth 2024 proceedings: http://www.euroforth.org/ef24/papers/