Newsportal USENET - Re: Command Languages Versus Programming Languages

cross@spitfire.i.gajendra.net (Dan Cross) writes:

Rainer Weikusat <rweikusat@talktalk.net> wrote:
cross@spitfire.i.gajendra.net (Dan Cross) writes:
Rainer Weikusat <rweikusat@talktalk.net> wrote:
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
>
[...]
>
Personally I think that writing bulky procedural stuff for something
like [0-9]+ can only be much worse, and that further abbreviations
like \d+ are the better direction to go if targeting a good interface.
YMMV.
>
Assuming that p is a pointer to the current position in a string, e is a
pointer to the end of it (ie, point just past the last byte) and -
that's important - both are pointers to unsigned quantities, the 'bulky'
C equivalent of [0-9]+ is
>
while (p < e && *p - '0' < 10) ++p;
>
That's not too bad. And it's really a hell lot faster than a
general-purpose automaton programmed to recognize the same pattern
(which might not matter most of the time, but sometimes, it does).
>
It's also not exactly right. `[0-9]+` would match one or more
characters; this possibly matches 0 (ie, if `p` pointed to
something that wasn't a digit).
>
The regex won't match any digits if there aren't any. In this case, the
match will fail. I didn't include the code for handling that because it
seemed pretty pointless for the example.
>
That's rather the point though, isn't it? The program snippet
(modulo the promotion to signed int via the "usual arithmetic
conversions" before the subtraction and comparison giving you
unexpected values; nothing to do with whether `char` is signed
or not) is a snippet that advances a pointer while it points to
a digit, starting at the current pointer position; that is, it
just increments a pointer over a run of digits.

That's the core part of matching someting equivalent to the regex [0-9]+
and the only part of it is which is at least remotely interesting.

But that's not the same as a regex matcher, which has a semantic
notion of success or failure. I could run your snippet against
a string such as, say, "ZZZZZZ" and it would "succeed" just as
it would against an empty string or a string of one or more
digits.

Why do you believe that p being equivalent to the starting position
would be considered a "successful match", considering that this
obviously doesn't make any sense?

[...]

By the way, something that _would_ match `^[0-9]+$` might be:

[too much code]

Something which would match [0-9]+ in its first argument (if any) would
be:

#include "string.h"
#include "stdlib.h"

int main(int argc, char **argv)
{
char *p;
unsigned c;

p = argv[1];
if (!p) exit(1);
while (c = *p, c && c - '0' > 10) ++p;
if (!c) exit(1);
return 0;
}

but that's 14 lines of text, 13 of which have absolutely no relation to
the problem of recognizing a digit.

Date	Sujet	#		Auteur
30 Sep 24	Re: Command Languages Versus Programming Languages	146		Bozo User