Re: On my AMD FX-8370 I don't benefit from a compact code area.

Subject: Re: On my AMD FX-8370 I don't benefit from a compact code area.
From: anton (at) *nospam* mips.complang.tuwien.ac.at (Anton Ertl)
Newsgroups: comp.lang.forth
Date: 27 Feb 2025, 19:18:46
Organization: Institut fuer Computersprachen, Technische Universitaet Wien
Message-ID: <2025Feb27.191846@mips.complang.tuwien.ac.at>
References: 1
User-Agent: xrn 10.11

albert@spenarnc.xs4all.nl writes:
>I test lina64 on my AMD FX-8370 8 core 4 Ghz.
>
>The genuine Byte benchmark sieve takes 1.5 ms on my unmodified lina.
>That is a indirect threaded Forth with no optimisation and all the
>machine code scattered throughout the dictionary.
>
>I build a version where there is actually a code segment and all code is
>collected there. There was no significant difference in speed.
>
>All the code of the Forth fits comfortable in the L1 cache.
>Is this to be expected?
>An L1 cache hit is an L1 cache hit?

Not at all.  Since the Pentium and the K5 (I think) there is an
instruction cache and a data cache (and then uop caches, which can be
seen as a kind of instruction cache).  However, apart from the early
ones (Pentium, K6, and probably K5), the same grains (with typically
64-byte granularity these days) can reside in both the I-cache and the
D-cache, as long as that grain is not written to.

So if your complete Forth system including the primitives and the
sieve program fits into the D-cache and fits into the I-cache, and you
have no writes close to code, you will indeed only see compulsory
misses.
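
As a rough illustration of "no writes close to code" (a sketch that is
not part of the original post): the Python below checks whether any
executed-code region shares a grain with any written-data region.  The
64-byte line size, the function names, and the example addresses are
all assumptions made for illustration; real hardware may react to
writes at a different granularity.

```python
LINE_SIZE = 64  # assumed grain (cache-line) size in bytes; typical, not universal


def line_index(addr, line_size=LINE_SIZE):
    """Return the index of the line that contains byte address addr."""
    return addr // line_size


def lines_touched(start, length, line_size=LINE_SIZE):
    """Return the set of line indices covered by [start, start+length)."""
    if length <= 0:
        return set()
    first = line_index(start, line_size)
    last = line_index(start + length - 1, line_size)
    return set(range(first, last + 1))


def code_shares_line_with_writes(code_regions, written_regions,
                                 line_size=LINE_SIZE):
    """True if any code region shares a line with any written region.

    Both arguments are lists of (start_address, length_in_bytes) pairs.
    """
    code_lines = set()
    for start, length in code_regions:
        code_lines |= lines_touched(start, length, line_size)
    for start, length in written_regions:
        if code_lines & lines_touched(start, length, line_size):
            return True
    return False
```

With these hypothetical addresses, a 16-byte primitive at 0x1000 followed
directly by a written cell at 0x1010 shares a line, while moving the
cell to 0x1040 puts it in the next line:
code_shares_line_with_writes([(0x1000, 16)], [(0x1010, 8)]) is True and
code_shares_line_with_writes([(0x1000, 16)], [(0x1040, 8)]) is False.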

I have posted here about the performance pitfalls of keeping code
close to data since 1995, and Forth system implementors typically have
taken measures only when I presented benchmark results where their
system looks bad.  But they usually only did the minimum necessary for
that particular benchmark, so over the years the issue has come up
again and again.

One interesting aspect is that small benchmarks like the sieve are
often not affected, but larger application benchmarks are.  E.g., in
my recent work [ertl24] all the small benchmarks are unaffected by the
problem, whereas several of the larger benchmarks were affected in
SwiftForth-4.0.0-RC87 and saw significant speedups from a fix in RC89.

So I applaud that you have done the right thing and completely
separated code from data.  You may not see a benefit on Sieve, but
there may be a difference in a different program (and you may not even
notice until you measure both variants).

@InProceedings{ertl24,
  author =       {M. Anton Ertl},
  title =        {How to Implement Words (Efficiently)},
  crossref =     {euroforth24},
  pages =        {43--52},
  url =          {http://www.euroforth.org/ef24/papers/ertl.pdf},
  url-slides =   {http://www.euroforth.org/ef24/papers/ertl-slides.pdf},
  video =        {https://www.youtube.com/watch?v=bAq4760h5ZQ},
  OPTnote =      {not refereed},
  abstract =     {The implementation of Forth words has to satisfy the
                  following requirements: 1) A word must be
                  represented by a single cell (for
                  \code{execute}). 2) A word may represent a
                  combination of code and data (for, e.g.,
                  \code{does>}).  In addition, on some hardware,
                  keeping executed native code and (written) data
                  close together results in slowness and therefore
                  should be avoided; moreover, failing to pair up
                  calls with returns results in (slow) branch
                  mispredictions.  The present work describes how
                  various Forth systems over the decades have
                  satisfied the requirements, and how many systems run
                  into performance pitfalls in various situations.
                  This paper also discusses how to avoid this
                  slowness, including in native-code systems.}
}
@Proceedings{euroforth24,
  title = {40th EuroForth Conference},
  booktitle = {40th EuroForth Conference},
  year = {2024},
  key = {EuroForth'24},
  url =          {http://www.euroforth.org/ef24/papers/proceedings.pdf}
}

- anton
--
M. Anton Ertl  http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
     New standard: https://forth-standard.org/
EuroForth 2023 proceedings: http://www.euroforth.org/ef23/papers/
EuroForth 2024 proceedings: http://www.euroforth.org/ef24/papers/
