Newsportal USENET - Re: Roman numerals , recognizer "0r".

Last Dutch Forth meeting we discussed a challenge in
recognizing Roman numbers.

IIRC Ulrich Hoffmann gave a presentation on a kind of Latin Forth with
Roman numerals at EuroForth 2023 (in Rome). Unfortunately, most of the
video recordings were unusable (bad sound), and there is no other
record of this presentation.

It shows off the power of the PREFIX word.
Remark:
A word marked PREFIX is found in the dictionary even
if it is immediately followed by another word (without a space).
0r (zero-r), M, C and CM are all prefixes. Making them also IMMEDIATE
made the Roman denotation work in compilation mode as well.
It suffices to add POSTPONE LITERAL to the denotation prefix
(in ciforth, LITERAL does nothing in interpret mode).
This is ciforth specific code.
>
--------------------------------------------------
\ $Id: paas.frt,v 1.4 2025/06/02 12:23:48 albert Exp $
\ Copyright (2012): Albert van der Horst {by GNU Public License}
>
\ Make an interpreter of Roman numerals.
\ 0rMCMXLVIII is like 0X7FFF000
>
\ The idea is to have Roman thingies such as M and C, and also CM and IV,
\ that add a constant (1000, 100, 900 or 4) to what is
\ already on the stack.
\ It is dangerous to let loose a PREFIX D: for
\ example, DROP would no longer be understood, so the
\ Roman thingies are tucked away in a ROMANS wordlist.
>
\ ERROR 1001 : The components of a Roman numeral
\ must be in descending order.
\ Detecting error 1001 is not the subject here, so these are stubs.
: !ERR ; : ?ERR? ;
>
NAMESPACE ROMANS
: 0r !ERR 0 NAME ROMANS EVALUATE PREVIOUS POSTPONE LITERAL ;
PREFIX IMMEDIATE
: rdigit CREATE , PREFIX IMMEDIATE DOES> @ ?ERR? + ;
\ Define Roman thingies, starting with the given number, each next one ten times smaller
: _row BEGIN DUP rdigit 10 / DUP 0= UNTIL DROP ;
>
ROMANS DEFINITIONS
1000 _row M C X I
900 _row CM XC IX
500 _row D L V
400 _row CD XL IV
PREVIOUS DEFINITIONS
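A side note on the POSTPONE LITERAL trick in 0r: it relies on ciforth's LITERAL doing nothing in interpret mode. In standard Forth LITERAL has no interpretation semantics, so a portable denotation word has to test STATE itself. A minimal sketch, assuming a Forth-2012 system such as Gforth; ?LITERAL and DEC-LIT are hypothetical names, and since there is no standard PREFIX mechanism, DEC-LIT parses the following blank-delimited word instead:

```forth
\ Hypothetical helper: compile n while compiling, leave it on the stack otherwise.
: ?literal ( n -- n | )
  state @ if postpone literal then ;

\ Hypothetical denotation word: parse the next word and convert its leading
\ decimal digits (anything after the first non-digit is ignored in this sketch).
: dec-lit ( "digits" -- n | )
  base @ >r decimal
  0 0 parse-name >number 2drop drop
  r> base !
  ?literal ; immediate

dec-lit 42 .            \ interpreting: prints 42
: answer dec-lit 42 ;   \ compiling: 42 is compiled as a literal
answer .                \ prints 42
```

The word must be IMMEDIATE so that it runs while a definition is being compiled; the STATE test then decides between pushing and compiling, which is the job ciforth's LITERAL does on its own.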

That's a clever use of your prefix feature. OTOH, the DROP problem
shows the limitations of that approach. Does this code reject "LLL"?
I don't see where that would come from.
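As quoted, !ERR and ?ERR? are empty stubs, so nothing is rejected. A minimal sketch of what an error-1001 check could look like, assuming each component passes its value through ?ERR? before it is added (as in the DOES> part of rdigit) and that 1001 THROW is an acceptable way to report it; the bodies and LAST-RVALUE are hypothetical, not the original code. Because MMM has to be built from M M M, such a check can only forbid a component larger than the previous one, so it rejects I followed by M but still accepts 50 50 50 (LLL); rejecting LLL would need an extra rule, such as a repetition limit:

```forth
\ Hypothetical bodies for the stubs !ERR and ?ERR? from the quoted code.
variable last-rvalue   \ value of the previously added Roman component

: !ERR ( -- )          \ start a new numeral: any first component is allowed
  -1 1 rshift last-rvalue ! ;

: ?ERR? ( n -- n )     \ error 1001 if n exceeds the previous component
  dup last-rvalue @ > if 1001 throw then
  dup last-rvalue ! ;

\ Component values of MCMXLVIII (M CM XL V I I I), added as rdigit would:
!ERR 0  1000 ?ERR? +  900 ?ERR? +  40 ?ERR? +  5 ?ERR? +
        1 ?ERR? +  1 ?ERR? +  1 ?ERR? +  .   \ prints 1948
```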

Below you find a REC-ROMAN implemented in the current state of affairs
in Gforth. Here I had to walk through the string explicitly, because
there is no prefix mechanism to do it for me.

OTOH, REC-ROMAN is case-sensitive, which makes it easy to avoid
conflicts: just write it as a lower-case word if you want to call a
word that happens to use the same letters as a roman numeral.
REC-ROMAN is inserted as the first recognizer to be checked; otherwise
this conflict avoidance strategy would not work. There are no words
in Gforth on startup that conflict with roman numerals in the case in
which they are defined (I have not checked whether any would
conflict if their names were written in upper case, but at least I and L do).

Because conflicts can be avoided, there is no need to use a prefix
like your 0r, so I do not use that. Here are some examples:

MCMXLVIII . \ 1948
mcmxlviii . \ error: undefined word
MIM \ error: undefined word
L . \ 50
LLL \ error: undefined word
MCMXLVIII LXXVII + . \ 2025

And here's the code:
------------------------------------------------------------------
0
value: rdigit-value
2value: rdigit-string
constant rdigit-size

: romandigit ( u "romandigit" -- )
, parse-name save-mem 2, ;

create romandigits
\ this table contains variants with 4 repetitions; you can comment
\ them out if desired
900 romandigit CM
500 romandigit D
400 romandigit CD
400 romandigit CCCC
300 romandigit CCC
200 romandigit CC
100 romandigit C
90 romandigit XC
50 romandigit L
40 romandigit XL
40 romandigit XXXX
30 romandigit XXX
20 romandigit XX
10 romandigit X
   9 romandigit IX
   5 romandigit V
   4 romandigit IV
   4 romandigit IIII
   3 romandigit III
   2 romandigit II
   1 romandigit I
here constant end-romandigits

: roman>n? ( c-addr u -- n f )
\ if c-addr u contains a roman numeral, f is true and n is the value,
\ otherwise f is false.
dup >r 'M' skip r> over - 1000 *
romandigits case {: d: str1 n1 rd1 :}
rd1 end-romandigits = ?of n1 str1 nip 0= endof
str1 rd1 rdigit-string string-prefix? ?of
str1 rd1 rdigit-string nip /string
n1 rd1 rdigit-value +
rd1 rdigit-size + contof
str1 n1 rd1 rdigit-size + next-case ;

: rec-roman ( c-addr u -- n translate-num | 0 )
roman>n? if ['] translate-num else drop 0 then ;

' rec-roman action-of forth-recognize >stack

true [if]
s" MCMXLVIII" roman>n? . . \ -1 1948

\ check forth-wordlist for conflicts
[: ( nt -- f ) dup name>string roman>n? nip if
dup name>string type space then
drop true
;] forth-wordlist traverse-wordlist
[then]
-----------------------------------------------------------

For ROMAN>N? I first tried an orthodox approach with only the data and
return stacks and BEGIN etc., but with 4 stack items that may have to
be updated in every iteration, that was somewhat unwieldy, and I
produced a buggy version. Then I tried this approach with the
extended CASE and locals, and I got it right on the first try, despite
its bulk. I leave it to dxf to show how much better this becomes in
orthodox Forth.
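Purely as an illustration of what such an orthodox version might look like: a minimal sketch in Forth-2012 using only the data and return stacks and BEGIN ... WHILE ... REPEAT, with no locals and no extended CASE. It follows the same scan as ROMAN>N? (leading Ms first, then each table entry tried at most once, success only if the whole string is consumed). All names (RENTRY, RTABLE, RD-STRING, RD-NEXT, STARTS-WITH?, SKIP-MS, ROMAN-VALUE) are hypothetical, and the table stores value, length and characters inline instead of using Gforth's SAVE-MEM and structure fields:

```forth
\ Hypothetical table entry layout: value cell, length cell, characters.
: rentry ( n "digits" -- )
  , parse-name dup ,
  here over chars allot swap chars move align ;

: rd-string ( entry -- c-addr u )  dup 2 cells + swap cell+ @ ;
: rd-next   ( entry -- entry' )    dup rd-string chars + aligned nip ;

create rtable   \ same entries and order as ROMANDIGITS
  900 rentry CM    500 rentry D     400 rentry CD    400 rentry CCCC
  300 rentry CCC   200 rentry CC    100 rentry C
   90 rentry XC     50 rentry L      40 rentry XL     40 rentry XXXX
   30 rentry XXX    20 rentry XX     10 rentry X
    9 rentry IX      5 rentry V      4 rentry IV      4 rentry IIII
    3 rentry III     2 rentry II     1 rentry I
here constant rtable-end

: starts-with? ( c-addr1 u1 c-addr2 u2 -- f )  \ does string 1 begin with string 2?
  rot over u< if 2drop drop false exit then
  tuck compare 0= ;

: skip-ms ( c-addr u -- c-addr' u' n )  \ strip leading Ms, 1000 each
  0 >r
  begin dup while over c@ [char] M = while
    1 /string r> 1000 + >r
  repeat then
  r> ;

: roman-value ( c-addr u -- n f )  \ n is meaningful only if f is true
  skip-ms >r rtable              ( c-addr u entry )  ( R: n )
  begin dup rtable-end <> while
    >r 2dup r@ rd-string starts-with? if
      r@ rd-string nip /string   \ consume the matched letters
      r> dup @ r> + >r           \ add the entry's value to n
    else
      r>
    then
    rd-next                      \ each entry is tried at most once
  repeat
  drop nip 0= r> swap ;          \ success only if nothing is left over

s" MCMXLVIII" roman-value . .    \ prints -1 1948
```

The four changing values end up split between the data stack (remaining string, table pointer) and the return stack (accumulated value), which keeps each step short; whether that reads better than the locals and CASE version is a matter of taste.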

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
   New standard: https://forth-standard.org/
EuroForth 2023 proceedings: http://www.euroforth.org/ef23/papers/
EuroForth 2024 proceedings: http://www.euroforth.org/ef24/papers/

Date       Subject                                 #  Author
8 Jun 25   Re: Roman numerals , recognizer "0r".   3  Anton Ertl
9 Jun 25   Re: Roman numerals , recognizer "0r".   1  Anton Ertl
9 Jun 25   Re: Roman numerals , recognizer "0r".   1  Anton Ertl