Subject: Re: OOS approach revisited
From: stephen (at) *nospam* vfxforth.com (Stephen Pelc)
Newsgroups: comp.lang.forth
Date: 28 Jun 2025, 10:37:33
Organization: A noiseless patient Spider
Message-ID: <103od4s$pis9$1@dont-email.me>
References: 1 2 3 4
User-Agent: Usenapp for MacOS
On 27 Jun 2025 at 22:35:32 CEST, "minforth" <minforth@gmx.net> wrote:
> On 27.06.2025 at 20:15, albert@spenarnc.xs4all.nl wrote:
>> In article <bc63996456fe967e5c66d17cbbeb21c2@www.novabbs.com>,
>> LIT <zbigniew2011@gmail.com> wrote:
>>>> It really depends on how counted loops are implemented.
>>>> Most CPUs have operators for register-based count-down loops
>>>> that are blazingly fast.
>>>> If they can be used within Forth-based loop constructs
>>>> I would expect a greater speed increase than what you measured.
>>>
>>> In that old fig-Forth it's rather short and simple:
>>>
>>>         sqHeader '(LOOP)'
>>> XLOOP   dw $ + 2
>>>         mov BX,1
>>> XLOO1:  add [BP],BX
>>>         mov AX,[BP]
>>>         sub AX,[BP+2]
>>>         xor AX,BX
>>>         js BRAN1
>>>         add BP,4
>>>         inc SI
>>>         inc SI
>>>         jmp NEXT
>>>
>>> It doesn't look that bad. Can it be
>>> done even shorter?
>>
>> My optimiser looks at the combination of DO and LOOP and,
>> after inlining everything, moves the return stack items into
>> registers. It is near VFX performance.
>> All experimental, but yes, there is much to be gained.
>
> Must be tricky to do UNLOOP in a register-based loop. ;-)

Here are the code generators for VFX x64 LOOP and UNLOOP.
All the complexity is in the DO and ?DO code.

: c_loop        \ mrk> drbid -- ; compile code for LOOP ; SFP094
  c_shuffle  reset-opt          \ SFP097
  a[ INC r14 ]a  use-a          \ update index
  a[ INC r15 ]a  use-a          \ update limit-index-$8000.0000
  a[ JNO ]a  <ares  use-a       \ resolve backward branch
  c_unloop                      \ remove DO ... LOOP state
  >RES                          \ resolve forward branch
;

: c_unloop      \ -- ; compile code for UNLOOP
  c_shuffle  reset-opt
  a[ pop r14                    \ restore old index
     pop r15                    \ restore old index-limit-xorbit63
     pop rax                    \ discarded
  ]a  use-a
;
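
For anyone without VFX to hand, the termination test that the INC r15 / JNO pair relies on can be modelled in standard Forth. This is only a sketch under stated assumptions, not VFX source: it assumes r15 holds index minus limit with the top bit flipped (as the UNLOOP comment suggests), two's-complement cells and 8-bit address units. The names sign-bit, biased, step and count-trips are made up for the illustration.

```forth
\ Sketch only: a portable model of the biased loop counter.
\ Assumes two's-complement cells and 8-bit address units.

1 8 cells 1- lshift constant sign-bit   \ only the top bit set

: biased ( limit index -- x )   \ index - limit with the top bit flipped
   swap - sign-bit xor ;

: step ( x -- x' done? )        \ models INC followed by an overflow test
   1+ dup sign-bit = ;          \ INC overflows exactly when the result is sign-bit

: count-trips ( limit index -- n )  \ times a DO ... LOOP body would run
   biased 0                     ( x n )
   begin
      1+ swap step              ( n' x' done? )
      >r swap r>                ( x' n' done? )
   until
   nip ;
```

With these definitions, 5 0 count-trips . prints 5. The loop needs no compare instruction: the bias is chosen so that the increment which takes the index up to the limit is the one that overflows. As with DO itself, an equal limit and index would go all the way round the number circle, which is the case ?DO guards against.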
Stephen
--
Stephen Pelc, stephen@vfxforth.com
Wodni & Pelc GmbH
Vienna, Austria
Tel: +44 (0)7803 903612, +34 649 662 974
http://www.vfxforth.com/downloads/VfxCommunity/
free VFX Forth downloads