Subject: LOOP (was: OOS approach revisited)
From: anton (at) *nospam* mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.lang.forth
Date: 28. Jun 2025, 11:23:51
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Message-ID: <2025Jun28.122351@mips.complang.tuwien.ac.at>
References: 1 2 3 4 5 6
User-Agent: xrn 10.11

minforth <minforth@gmx.net> writes:
>Most CPUs have operators for register-based count-down loops
>that are blazingly fast.

Which "operators" do you have in mind, and what do you mean by
"blazingly fast"?

Anyway, we have discussed this repeatedly, e.g., in
<2022Feb13.231208@mips.complang.tuwien.ac.at> I wrote in reply to your
posting <f4b89e0b-2ded-4b18-8dc1-bba6dcda47bbn@googlegroups.com>, and
cited earlier discussions in the topic.

|"minf...@arcor.de" <minforth@arcor.de> writes:
|[...]
|>F.ex. match NEXT efficiently to x_86 processor LOOP instruction (counter in
|> _CX register)
|>and you'll happily count down from 5 to 1.
|
|Yes, but why would one do this? As we have established in an earlier
|discussion (see below), the LOOP instruction is typically not faster
|than a sequence of simpler instructions:
|
|<2018Jun6.184616@mips.complang.tuwien.ac.at>:
||minforth@arcor.de writes:
||>FOR..NEXT matches easily with the x86 LOOP instruction and ECX as counter.
||>Should do speedy enough. ;-)
||
||Have you measured it? I have
||<2017Mar14.183125@mips.complang.tuwien.ac.at>
||<2017Mar15.141411@mips.complang.tuwien.ac.at> and compared the
||following loops:
||
||.L5:                      .L5:
||  subq $1, %rax             loop .L5
||  jne .L5
||
||I found that for these loops Sandy Bridge, Haswell, and Skylake take
||~4 cycles per iteration using LOOP, and 1-2 cycles per iteration when
||using jne.
|
|<2018Jun7.141731@mips.complang.tuwien.ac.at>:
||cycles for 1000 iterations
||      K10      Excavator          Zen
||Phenom II  Athlon X4 845  Ryzen 1600X
||     3021           1314         1051  loop
||     2020           1484         1051  sub; jne
||     2026           1489         1053  add; cmp; jne
|
|There is no performance advantage on modern AMD and Intel CPUs for the
|instruction LOOP over a good implementation of the Forth word LOOP (as
|in the third example).

>If they can be used within Forth-based loop constructs
>I would expect a greater speed increase than what you measured.

You obviously ignore repeated refutations of your claims of superior
performance for LOOP-instruction-based counted loops. Maybe you
should implement and measure such a counted loop yourself and compare
it to the LOOP word on SwiftForth and VFX Forth.
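
For readers who want the quoted 2018 numbers as per-iteration figures, here is a minimal sketch that only redoes the arithmetic on the table quoted above (cycles for 1000 iterations). It measures nothing itself. The names CYCLES, cycles_per_iteration and loop_vs are made up for this illustration, and the short CPU keys are just labels for the three table columns.

```python
# Cycle counts for 1000 iterations, copied from the table quoted above.
# Keys are illustrative labels for the columns:
#   K10 = Phenom II, Excavator = Athlon X4 845, Zen = Ryzen 1600X.
ITERATIONS = 1000
CYCLES = {
    "loop":          {"K10": 3021, "Excavator": 1314, "Zen": 1051},
    "sub; jne":      {"K10": 2020, "Excavator": 1484, "Zen": 1051},
    "add; cmp; jne": {"K10": 2026, "Excavator": 1489, "Zen": 1053},
}


def cycles_per_iteration(variant, cpu):
    """Average cycles per loop iteration for one table entry."""
    return CYCLES[variant][cpu] / ITERATIONS


def loop_vs(variant, cpu):
    """Cycles of the LOOP instruction divided by cycles of `variant`.

    A ratio above 1 means LOOP took more cycles than `variant` on that CPU.
    """
    return CYCLES["loop"][cpu] / CYCLES[variant][cpu]


if __name__ == "__main__":
    for cpu in ("K10", "Excavator", "Zen"):
        for variant in CYCLES:
            cpi = cycles_per_iteration(variant, cpu)
            print(f"{cpu:10} {variant:14} {cpi:.3f} cycles/iteration")
        ratio = loop_vs("add; cmp; jne", cpu)
        print(f"{cpu:10} loop / (add; cmp; jne) = {ratio:.2f}")
```

With the table's numbers, the ratio of LOOP to add; cmp; jne comes out at about 1.49 on the K10, 0.88 on the Excavator, and 1.00 on the Zen.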
- anton
-- 
M. Anton Ertl  http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
     New standard: https://forth-standard.org/
EuroForth 2023 proceedings: http://www.euroforth.org/ef23/papers/
EuroForth 2024 proceedings: http://www.euroforth.org/ef24/papers/