In article <874j40sk01.fsf@doppelsaurus.mobileactivedefense.com>,
Rainer Weikusat <rweikusat@talktalk.net> wrote:
>cross@spitfire.i.gajendra.net (Dan Cross) writes:
>>Rainer Weikusat <rweikusat@talktalk.net> wrote:
>>>Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
>>>>
>>>>[...]
>>>>
>>>>Personally I think that writing bulky procedural stuff for something
>>>>like [0-9]+ can only be much worse, and that further abbreviations
>>>>like \d+ are the better direction to go if targeting a good interface.
>>>>YMMV.
>>>
>>>Assuming that p is a pointer to the current position in a string, e is a
>>>pointer to the end of it (ie, pointing just past the last byte) and -
>>>that's important - both are pointers to unsigned quantities, the 'bulky'
>>>C equivalent of [0-9]+ is
>>>
>>>while (p < e && *p - '0' < 10) ++p;
>>>
>>>That's not too bad. And it's really a hell of a lot faster than a
>>>general-purpose automaton programmed to recognize the same pattern
>>>(which might not matter most of the time, but sometimes, it does).
>>
>>It's also not exactly right. `[0-9]+` would match one or more
>>characters; this may match zero (ie, if `p` pointed to
>>something that wasn't a digit).
>
>The regex won't match any digits if there aren't any. In this case, the
>match will fail. I didn't include the code for handling that because it
>seemed pretty pointless for the example.
That's rather the point, though, isn't it? The program snippet
(modulo the promotion of `*p` to signed int via the "usual
arithmetic conversions" before the subtraction and comparison,
which makes every byte below '0' pass the `< 10` test; nothing to
do with whether `char` is signed or not) advances a pointer while
it points to a digit, starting at the current pointer position;
that is, it just increments a pointer over a run of digits.
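A minimal sketch of the promotion problem, assuming `p` and `e` are `const unsigned char *` as specified in the quoted post; the function names `skip_digits_promoted` and `skip_digits` are hypothetical. Even with unsigned pointees, `*p` is promoted to `int`, so `*p - '0'` goes negative for any byte below `'0'` and passes the `< 10` test. Converting to `unsigned int` before subtracting restores the intended range check.

```c
#include <assert.h>
#include <stddef.h>

/*
 * The loop as posted, with unsigned char pointees.  *p is still
 * promoted to int, so *p - '0' is negative for every byte below '0'
 * (' ', '#', '/', ...) and those bytes pass the < 10 test.
 */
static const unsigned char *
skip_digits_promoted(const unsigned char *p, const unsigned char *e)
{
	assert(p != NULL && e != NULL && p <= e);
	while (p < e && *p - '0' < 10)
		++p;
	return p;
}

/*
 * Converting to unsigned int first makes the subtraction wrap for
 * bytes below '0', so only '0'..'9' yield a value below 10.
 */
static const unsigned char *
skip_digits(const unsigned char *p, const unsigned char *e)
{
	assert(p != NULL && e != NULL && p <= e);
	while (p < e && (unsigned int)*p - '0' < 10)
		++p;
	return p;
}
```

On the input "12#34a", the first version steps over '#' and stops at 'a' (offset 5), while the second stops at '#' (offset 2).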
But that's not the same as a regex matcher, which has a semantic
notion of success or failure. I could run your snippet against
a string such as, say, "ZZZZZZ" and it would "succeed" just as
it would against an empty string or a string of one or more
digits. And then there are other matters of context: does the
user intend for the regex to match the _whole_ string? Or any
portion of the string (à la `grep`)? For example, does the
string "aaa1234aaa" match `[0-9]+`? As written, the above
snippet is actually closer to advancing `p` over `^[0-9]*`. One
might differentiate between `*` and `+` after the fact, by
comparing `p` against some (presumably saved) starting value, but
that's more code.
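To illustrate the difference in semantics, here is a sketch of an anchored `[0-9]+` at the current position and an unanchored, grep-style search, each reporting success or failure explicitly. It assumes the same `[p, e)` convention over `unsigned char` as the quoted snippet; the names `is_digit`, `match_digits_plus` and `search_digits_plus` are hypothetical. The saved `start` pointer is the "more code" that turns the `*` behaviour of a bare loop into `+`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True only for '0'..'9'; the unsigned subtraction wraps for bytes below '0'. */
static bool
is_digit(unsigned char c)
{
	return (unsigned int)c - '0' < 10;
}

/*
 * Anchored `[0-9]+` at p over [p, e): returns a pointer just past the
 * run of digits on success, or NULL on failure (no digit at p).
 */
static const unsigned char *
match_digits_plus(const unsigned char *p, const unsigned char *e)
{
	const unsigned char *start = p;

	assert(p != NULL && e != NULL && p <= e);
	while (p < e && is_digit(*p))
		++p;
	return p > start ? p : NULL;
}

/*
 * Unanchored, grep-style search for `[0-9]+` anywhere in [p, e).
 * On success, stores the leftmost (and, for this pattern, longest)
 * match in [*mstart, *mend) and returns true.
 */
static bool
search_digits_plus(const unsigned char *p, const unsigned char *e,
    const unsigned char **mstart, const unsigned char **mend)
{
	assert(p != NULL && e != NULL && p <= e);
	for (; p < e; ++p) {
		const unsigned char *end = match_digits_plus(p, e);

		if (end != NULL) {
			*mstart = p;
			*mend = end;
			return true;
		}
	}
	return false;
}
```

With these, "ZZZZZZ" and the empty string fail both ways, while "aaa1234aaa" fails the anchored match but succeeds as a search, with the match spanning offsets 3 to 7.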
These are just not equivalent. That's not to say that your
snippet is not _useful_ in context, but to pretend that it's the
same as the regular expression is pointlessly reductive.

By the way, something that _would_ match `^[0-9]+$` might be:
term% cat mdp.c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Unsigned arithmetic: c - '0' wraps around for c < '0', so only
 * '0'..'9' give a result below 10.
 */
static bool
mdigit(unsigned int c)
{
	return c - '0' < 10;
}

/*
 * Matches ^[0-9]+$ against [str, estr): at least one digit, and
 * nothing but digits all the way to estr.
 */
bool
mdp(const char *str, const char *estr)
{
	if (str == NULL || estr == NULL || str == estr)
		return false;
	if (!mdigit(*str))
		return false;
	while (str < estr && mdigit(*str))
		str++;
	return str == estr;
}

/* Runs mdp() on s and reports a mismatch with the expected result. */
bool
probe(const char *s, bool expected)
{
	if (mdp(s, s + strlen(s)) != expected) {
		fprintf(stderr, "test failure: `%s` (expected %s)\n",
		    s, expected ? "true" : "false");
		return false;
	}
	return true;
}

int
main(void)
{
	bool success = true;

	success = probe("1234", true) && success;
	success = probe("", false) && success;
	success = probe("ab", false) && success;
	success = probe("0", true) && success;
	success = probe("0123456789", true) && success;
	success = probe("a0123456", false) && success;
	success = probe("0123456b", false) && success;
	success = probe("0123c456", false) && success;
	success = probe("0123#456", false) && success;

	return success ? EXIT_SUCCESS : EXIT_FAILURE;
}
term% cc -Wall -Wextra -Werror -pedantic -std=c11 mdp.c -o mdp
term% ./mdp
term% echo $?
0
term%
Granted, the test scaffolding and `#include` boilerplate make
this appear rather longer than it would be in context, but it's
still not nearly as succinct as `^[0-9]+$`.
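For comparison, the regex side of the trade-off in C might look like the following sketch, assuming a POSIX system that provides `<regex.h>`; `mdp_regex` is a hypothetical name mirroring `mdp()`. It is shorter, but still carries compile/execute/free boilerplate, and `regexec()` needs a NUL-terminated string rather than a `(str, estr)` pair.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Whole-string match of ^[0-9]+$ using POSIX extended regular
 * expressions.  REG_NOSUB: only success or failure is needed.
 * Compiling on every call keeps the sketch short; code that matches
 * often would compile once and reuse the regex_t.
 */
static bool
mdp_regex(const char *s)
{
	regex_t re;
	int rc;

	if (regcomp(&re, "^[0-9]+$", REG_EXTENDED | REG_NOSUB) != 0)
		return false;
	rc = regexec(&re, s, 0, NULL, 0);
	regfree(&re);
	return rc == 0;
}
```

Fed the same inputs as the `probe()` calls, it should agree with `mdp()`.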
- Dan C.