Re: Command Languages Versus Programming Languages

Subject: Re: Command Languages Versus Programming Languages
From: rweikusat (at) *nospam* talktalk.net (Rainer Weikusat)
Newsgroups: comp.unix.shell comp.unix.programmer comp.lang.misc
Date: 22 Nov 2024, 16:41:09
Message-ID : <877c8vtgx6.fsf@doppelsaurus.mobileactivedefense.com>
User-Agent : Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)
cross@spitfire.i.gajendra.net (Dan Cross) writes:
Rainer Weikusat  <rweikusat@talktalk.net> wrote:
cross@spitfire.i.gajendra.net (Dan Cross) writes:
Rainer Weikusat  <rweikusat@talktalk.net> wrote:
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
>
[...]
>
Personally I think that writing bulky procedural stuff for something
like [0-9]+ can only be much worse, and that further abbreviations
like \d+ are the better direction to go if targeting a good interface.
YMMV.
>
Assuming that p is a pointer to the current position in a string, e is a
pointer to the end of it (i.e., pointing just past the last byte) and -
that's important - both are pointers to unsigned quantities, the 'bulky'
C equivalent of [0-9]+ is
>
while (p < e && *p - '0' < 10) ++p;
>
That's not too bad. And it's really a hell of a lot faster than a
general-purpose automaton programmed to recognize the same pattern
(which might not matter most of the time, but sometimes, it does).
>
It's also not exactly right.  `[0-9]+` would match one or more
characters; this possibly matches 0 (ie, if `p` pointed to
something that wasn't a digit).
>
The regex won't match any digits if there aren't any. In this case, the
match will fail. I didn't include the code for handling that because it
seemed pretty pointless for the example.
>
That's rather the point though, isn't it?  The program snippet
(modulo the promotion to signed int via the "usual arithmetic
conversions" before the subtraction and comparison giving you
unexpected values; nothing to do with whether `char` is signed
or not) is a snippet that advances a pointer while it points to
a digit, starting at the current pointer position; that is, it
just increments a pointer over a run of digits.
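
A minimal sketch of the promotion issue mentioned in the parenthesis above. The helper names are made up for illustration and it assumes a platform where unsigned char promotes to int (ie, int is wider than char):

```c
/* Hypothetical helpers, not from the thread. */

/* The test as written in the snippet: ch is promoted to int, so the
   difference is a signed int and is negative for bytes below '0'. */
static int digit_as_written(unsigned char ch)
{
    return ch - '0' < 10;
}

/* Same test with the difference converted back to unsigned, so that
   bytes below '0' wrap around to a large value and fail the test. */
static int digit_unsigned(unsigned char ch)
{
    return (unsigned)(ch - '0') < 10;
}
```

Under these assumptions, digit_as_written(' ') yields 1 because ' ' - '0' is a negative int, while digit_unsigned(' ') yields 0. Both yield 1 for '5'.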

That's the core part of matching something equivalent to the regex [0-9]+
and the only part of it which is at least remotely interesting.

But that's not the same as a regex matcher, which has a semantic
notion of success or failure.  I could run your snippet against
a string such as, say, "ZZZZZZ" and it would "succeed" just as
it would against an empty string or a string of one or more
digits.

Why do you believe that p being equivalent to the starting position
would be considered a "successful match", considering that this
obviously doesn't make any sense?
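
For illustration, a sketch of how the loop could be wrapped so that "p did not move" is reported as a failed match. The function name and the NULL return convention are assumptions made for this example, not something from the thread:

```c
#include <stddef.h>

/*
 * Hypothetical helper: match [0-9]+ anchored at p, where e points just
 * past the last byte. Returns a pointer just past the run of digits,
 * or NULL if there was no digit at p (the match failed).
 */
static const unsigned char *match_digits(const unsigned char *p,
                                         const unsigned char *e)
{
    const unsigned char *start = p;

    /* The cast keeps the comparison unsigned after integer promotion. */
    while (p < e && (unsigned)(*p - '0') < 10) ++p;

    /* No progress means not a single digit was seen. */
    return p == start ? NULL : p;
}
```

Called on "ZZZZZZ" or on an empty range this returns NULL. Called on "123abc" it returns a pointer to the 'a'.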

[...]

By the way, something that _would_ match `^[0-9]+$` might be:

[too much code]

Something which would match [0-9]+ in its first argument (if any) would
be:

#include <string.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    char *p;
    unsigned c;

    p = argv[1];
    if (!p) exit(1);
    while (c = *p, c && c - '0' > 9) ++p;
    if (!c) exit(1);
    return 0;
}

but that's 14 lines of text, 13 of which have absolutely no relation to
the problem of recognizing a digit.
