Re: Efficient implementation of Finite State Machines (FSM)

Liste des GroupesRevenir à cl forth 
Sujet : Re: Efficient implementation of Finite State Machines (FSM)
De : anton (at) *nospam* mips.complang.tuwien.ac.at (Anton Ertl)
Groupes : comp.lang.forth
Date : 09. Mar 2025, 17:29:23
Autres entêtes
Organisation : Institut fuer Computersprachen, Technische Universitaet Wien
Message-ID : <2025Mar9.172923@mips.complang.tuwien.ac.at>
References : 1 2 3 4 5 6 7 8 9 10
User-Agent : xrn 10.11
Gerry Jackson <do-not-use@swldwa.uk> writes:
Use of a wordlist, whose wid is held in an immediate constant, enables
easy linkage between states at compile time, eg a typical state action
in outline is:
>
:noname <action>
   if state-x [ >order ] next-state [ previous ]
   else state-y [ >order ] next-state [ previous ]
; this-state >order to next-state previous

It means that Gforth will have to improve its SEE in order to point
out the differences between the different NEXT-STATEs.  Currently I get:

/2 >order  ok
next-state xt-see
noname :
  dup @ #1 and
  IF     next-state
  ELSE   next-state
  THEN ; ok

Disadvantages are:
- the Forth code doesn't give much idea of the overall operation of the
FSM (probably true for FSMs in general)

That's definitely the case here.  IIRC for Michael Gassanenko it was a
demonstration of filtering and backtracking, but it's unclear to me
how that transfers to FSMs.

Anyway, let's look at the core:

: run-fsm  ( ad xt -- )  begin dup while execute repeat 2drop  ;

This means that every state has to return to this loop to dispatch the
next one.  Now Gforth (development) has EXECUTE-EXIT, which allows
tail-calling the next state for better efficiency.

I have worked out an example:
https://www.complang.tuwien.ac.at/forth/programs/fsm-ae.4th

Let's first look at the example: The example recognizes and prints
numbers in a text and ignores everything else.  It terminates when it
sees '$'.  It has two states, one for being inside a number and one
for outside:

state outside-num
state inside-num

(Note that this is not the standar Forth word STATE).

Then we define transitions:

: out->out ( c-addr -- c-addr1 )
    count outside-num transition ;

' out->out outside-num all-transitions

The out->out transition is the simplest one: It fetches the next char
(with COUNT) and switches to OUTSIDE-NUM.  TRANSITION already starts
the dispatch for that state and the next char; this (and maybe also
COUNT) could be put in the general FSM interpreter (START-DFA), but by
having TRANSITION in the individual transition actions (e.g.,
OUT->OUT), the implementation is more flexible, as we will see.

At first OUT->OUT is put in transitions from OUTSIDE-NUM for all
characters using ALL-TANSITIONS; later the transitions of various
characters are overwritten:

' out->in '9' 1+ '0' outside-num some-transitions
' out->stop '$' outside-num one-transition

Note that the stack effect comment for out->out is from the start of
the word to the start of the next state-transition word; the actual
stack effect depends on the implementation of transition.

For more state transitions and the corresponding transition words see
the source code.

Example usage:

s" 123 abc 456 df$" drop outside-num start-dfa \ prints "123 456"

Now for the implementations: States are just arrays of xts, indexed by
the character, and the xt is that of the transition from the state
with that character.

The implementation without EXECUTE-EXIT looks as follows:

    : transition ( c addr -- xt )
        \ addr specifies the next state
        ]] swap th @ [[ ; immediate
   
    : stop ( c-addr -- 0 )
        drop 0 ;

    : start-dfa ( c-addr addr -- )
        swap count rot transition
        begin ( ... xt )
            execute dup
        0= until
        drop ;

TRANSITION could be a plain colon definition here, but it is a macro
in order to make it more competetive in Gforth with the EXECUTE-EXIT
variant.  Here the termination is performed by replacing the next
c-addr with 0 and testing for 0 in the loop.

An alternative implementation is to use EXECUTE-EXIT to tail-call the
next transition:

    : transition ( c addr -- )
        ]] swap th @ execute-exit [[ ; immediate

    : stop ( -- )
        \ let the ";" behind the STOP do the stopping
        ]] drop [[ ; immediate

    : start-dfa ( c-addr addr -- )
        \ let the dfa work on the string starting at c-addr, with initial
        \ state addr
        swap count rot transition ;

Here TRANSITION contains the EXECUTE in the form of EXECUTE-EXIT, and
so each transition word directly calls the next one, and no loop is
necessary; with EXECUTE this would fill the return stack after a few
thousand transitions, but EXECUTE-EXIT takes the return address off
the return stack before EXECUTEing the next word and therefore can
perform an indefinite number of state transitions.

So how do we get out of the state machine?  By not performing a
transition; instead we simply return to the caller of START-DFA.

I looked at the generated code and thought that we can get rid of the
SWAP in the transition code by using the generalized constant folding
feature of Gforth.  This replaces the definition of TRANSITION with:

        : noopt-transition-compile, ( xt -- )
            \ basic compile, implementation for TRANSITION
            drop ]] swap th @ execute-exit [[ ;

        : 1opt-transition-compile, ( xt -- )
            drop lits> ]] cells literal + @ execute-exit [[ ;

        `noopt-transition-compile, 1 foldn: transition-compile,
        `1opt-transition-compile, 1 set-fold#

        : transition ( c addr -- )
            \ dispatches the transition code for char C in state ADDR.  The
            \ stack effect describes the state of the stack when the xt has
            \ been consumed, not when the EXECUTE-EXIT returns, if at all;
            \ either compile, or execute-exit this word in order to achieve
            \ tail-call optimization
            [ 0 noopt-transition-compile, ] ;
        ' transition-compile, set-optimizer

The NOOPT-TRANSITION-COMPILE, implementation is used when TRANSITION
is compiled without a preceding constant, 1OPT-TRANSITION-COMPILE, is
used when 1 or more constants precede the compilation of TRANSITION.
In the latter case code without the SWAP is generated.

How fast are the variants.  I have a benchmark that performs 1M
iterations of processing a string of 100 non-digits (last char is '$',
of course), and one where the processed string contains numbers.  The
latter just takes more instructions and cycles, but the former just
performs 99 OUT->OUT transitions in each iteration and shows the basic
performance of the whole scheme.

Here are the results on a Zen4:

  loop             plain         optimized   implementation variant
1_278_763_454  1_175_241_846  1_175_505_964  cycles
3_651_376_357  2_441_376_030  2_045_832_844  instructions

"loop" is the version that has a loop; the others are the EXECUTE-EXIT
variants, "plain" uses the macro, "optimized" tries to do better by
recognizing that there is a constant in play.  While "optimized" saves
4 instructions per transition compared to "plain", it does not save
cycles.  Overall the number of cycles is quite high given the number
of instructions and the capabilities of the CPU.  I'll take a look at
it below.  The loop variant costs 12 instructions and 1 cycle per
transition more than the plain variant.

Here's the code for OUT->OUT:

   plain                              optimized
count    1->2                      count    1->2                 
  mov     rax,r8                     mov     rax,r8              
  add     r8,$01                     add     r8,$01              
  movzx   r15d,byte PTR [eax]        movzx   r15d,byte PTR [eax] 
lit    2->3                        cells    2->2                 
outside-num                          shl     r15,$03             
  mov     r9,$10[rbx]              lit+    2->2                  
swap    3->3                       outside-num                   
  mov     rax,r15                    add     r15,$18[rbx]        
  mov     r15,r9                   @    2->2                     
  mov     r9,rax                     mov     r15,[r15]           
cells    3->3                      execute-;s    2->1            
  shl     r9,$03                     add     rbx,$30             
+    3->2                            mov     rbx,$00[r13]        
  add     r15,r9                     mov     rax,-$10[r15]       
@    2->2                            mov     rdx,r15             
  mov     r15,[r15]                  add     r13,$08             
execute-;s    2->1                   sub     rbx,$08             
  add     rbx,$40                    jmp     eax                 
  mov     rbx,$00[r13]
  mov     rax,-$10[r15]
  mov     rdx,r15
  add     r13,$08
  sub     rbx,$08
  jmp     eax

You see only 13 instructions in OPTIMIZED here.  The jump first leads
to DOCOL (6 instructions) before enterin this code again.

For now I don't see why the whole thing takes so many cycles.  I'll
take a closer look later.

- anton
--
M. Anton Ertl  http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
     New standard: https://forth-standard.org/
EuroForth 2023 proceedings: http://www.euroforth.org/ef23/papers/
EuroForth 2024 proceedings: http://www.euroforth.org/ef24/papers/

Date Sujet#  Auteur
3 Mar 25 * Fetch string from comment28sjack
3 Mar 25 +* Re: Fetch string from comment2minforth
4 Mar 25 i`- Re: Fetch string from comment1sjack
4 Mar 25 +* Re: Fetch string from comment10dxf
4 Mar 25 i`* Re: Fetch string from comment9sjack
4 Mar 25 i `* Re: Fetch string from comment8dxf
5 Mar 25 i  +* Re: Fetch string from comment4mhx
5 Mar 25 i  i+- Re: Fetch string from comment1Anton Ertl
5 Mar 25 i  i`* Re: Fetch string from comment2dxf
6 Mar 25 i  i `- Re: Fetch string from comment1dxf
5 Mar 25 i  `* Re: Fetch string from comment3sjack
5 Mar 25 i   `* Re: Fetch string from comment2sjack
5 Mar 25 i    `- Re: Fetch string from comment1minforth
4 Mar 25 +* Re: Fetch string from comment13mhx
5 Mar 25 i`* Re: Fetch string from comment12Gerry Jackson
5 Mar 25 i `* Re: Fetch string from comment11Anton Ertl
5 Mar 25 i  +* Re: Fetch string from comment9mhx
5 Mar 25 i  i`* Re: Fetch string from comment8Anton Ertl
6 Mar 25 i  i `* Re: Fetch string from comment7mhx
6 Mar 25 i  i  `* Re: Fetch string from comment6Anton Ertl
7 Mar 25 i  i   +- Re: Fetch string from comment1mhx
7 Mar 25 i  i   `* Efficient implementation of Finite State Machines (FSM)4Gerry Jackson
9 Mar 25 i  i    `* Re: Efficient implementation of Finite State Machines (FSM)3Anton Ertl
10 Mar 25 i  i     +- Re: Efficient implementation of Finite State Machines (FSM)1Gerry Jackson
11 Mar 25 i  i     `- Re: Efficient implementation of Finite State Machines (FSM)1Anton Ertl
5 Mar 25 i  `- Re: Fetch string from comment1Gerry Jackson
4 Mar 25 +- Re: Fetch string from comment1sjack
7 Mar 25 `- Re: Fetch string from comment1sjack

Haut de la page

Les messages affichés proviennent d'usenet.

NewsPortal