Subject: Re: Command Languages Versus Programming Languages
From: 643-408-1753 (at) *nospam* kylheku.com (Kaz Kylheku)
Newsgroups: comp.unix.shell, comp.unix.programmer, comp.lang.misc
Date: 22 Nov 2024, 19:18:04
Organization: A noiseless patient Spider
Message-ID: <20241122101217.134@kylheku.com>
User-Agent: slrn/pre1.0.4-9 (Linux)
On 2024-11-22, Muttley@DastartdlyHQ.org <Muttley@DastartdlyHQ.org> wrote:
> On Thu, 21 Nov 2024 19:12:03 -0000 (UTC)
> Kaz Kylheku <643-408-1753@kylheku.com> boring babbled:
>> On 2024-11-20, Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
>>> I'm curious what you mean by Regexps presented in a "procedural" form.
>>> Can you give some examples?
>>
>> Here is an example: using a regex match to capture a C comment /* ... */
>> in Lex compared to just recognizing the start sequence /* and handling
>> the discarding of the comment in the action.
>>
>> Without non-greedy repetition matching, the regex for a C comment is
>> quite obtuse. The procedural handling is straightforward: read
>> characters until you see a * immediately followed by a /.
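For a concrete contrast: without non-greedy matching, the full Lex pattern
for a C comment is something like "/*"([^*]|"*"+[^*/])*"*"+"/", whereas the
procedural action is a few lines of plain C. Here is a minimal sketch of
that procedural side, assuming the scanner has already matched the opening
/* and reads from a stdio stream; the name skip_c_comment is illustrative
only, not taken from any particular lexer:

#include <stdio.h>

/*
 * Procedural comment skipping: once the opening slash-star has been
 * matched, read characters until a '*' immediately followed by '/'
 * is seen.  Returns 0 on success, -1 on EOF (unterminated comment).
 */
static int skip_c_comment(FILE *in)
{
    int c = getc(in);

    while (c != EOF) {
        if (c == '*') {
            int d = getc(in);
            if (d == '/')
                return 0;      /* star-slash: comment is closed */
            c = d;             /* re-test d, so a run of stars still ends it */
        } else {
            c = getc(in);
        }
    }
    return -1;                 /* input ended inside the comment */
}

In a Lex-style scanner this would be called from the action attached to the
rule that matches the opening /*, which is exactly the "handle it in the
action" approach being described.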
>
> It's not that simple, I'm afraid, since comments can be commented out.
Umm, no.
>
> eg:
>
> // int i; /*
This /* sequence is inside a // comment, and so the machinery that
recognizes /* as the start of a comment would never see it.
Just like "int i;" is in a string literal and so not recognized
as a keyword, whitespace, identifier and semicolon.
> int j;
> /*
> int k;
> */
> ++j;
>
> A C99 and C++ compiler would see "int j" and compile it, whereas a regex
> would simply remove everything from the first /* to */.
No, it won't, because that's not how regexes are used in a lexical
analyzer. At the start of the input, the lexical analyzer faces
the characters "// int i; /*\n". This will trigger the pattern match
for // comments. Essentially that entire sequence through the newline
is treated as a kind of token, equivalent to a space.
Once a token is recognized and removed from the input, it is gone;
no other regular expression can match into it.
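To make that order of recognition concrete, here is a toy comment stripper
(illustrative only; it knows just the two comment forms and none of C's
other rules, such as string literals). Fed Muttley's example, it prints
"int j;" and "++j;", and the /* inside the // line never opens a comment:

#include <stdio.h>

/* Toy illustration: "tokens" are recognized left to right, and once a
 * pattern has consumed characters, nothing can match inside them. */
static void strip_comments(const char *p)
{
    while (*p) {
        if (p[0] == '/' && p[1] == '/') {
            /* line comment: consume up to the newline, including any
             * slash-star that happens to occur in it */
            while (*p && *p != '\n')
                p++;
        } else if (p[0] == '/' && p[1] == '*') {
            /* block comment: consume until star-slash */
            p += 2;
            while (*p && !(p[0] == '*' && p[1] == '/'))
                p++;
            if (*p)
                p += 2;
        } else {
            putchar(*p++);
        }
    }
}

int main(void)
{
    strip_comments("// int i; /*\n"
                   "int j;\n"
                   "/*\n"
                   "int k;\n"
                   "*/\n"
                   "++j;\n");
    return 0;
}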
> Also the same probably applies to #ifdef's.
Lexically analyzing C requires implementing the translation phases
as described in the standard. There are preprocessor phases which
delimit the input into preprocessor tokens (pp-tokens). Comments
are stripped in preprocessing. But logical lines (backslash
continuations) are recognized below comments; i.e. this is one
comment:
// comment \
split \
into \
physical \
lines
A lexical scanner can have an input routine which transparently handles
this low-level detail, so that it doesn't have to deal with the
line continuations in every token pattern.
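Such an input routine might look roughly like this getc() wrapper, which
performs the phase-2 line splicing so that no token pattern ever sees a
backslash-newline; the name spliced_getc is illustrative, not taken from
any particular scanner:

#include <stdio.h>

/* Delete backslash-newline pairs (line splicing) below the tokenizer,
 * so every rule above it sees only logical lines. */
static int spliced_getc(FILE *in)
{
    for (;;) {
        int c = getc(in);

        if (c != '\\')
            return c;

        int d = getc(in);
        if (d == '\n')
            continue;          /* splice: drop the pair, keep reading */
        if (d != EOF)
            ungetc(d, in);     /* not a continuation; keep the backslash */
        return c;
    }
}

With something like that in place, the five physical lines in the example
above arrive at the // rule as a single logical line, which is why they
form one comment.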
--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca