Re: bash aesthetics question: special characters in reg exp in [[ ... =~ ... ]]

Subject: Re: bash aesthetics question: special characters in reg exp in [[ ... =~ ... ]]
From: ben (at) *nospam* bsb.me.uk (Ben Bacarisse)
Newsgroups: comp.unix.shell
Date: 24 Jul 2024, 00:51:44
Organization: A noiseless patient Spider
Message-ID: <87y15r650v.fsf@bsb.me.uk>
User-Agent: Gnus/5.13 (Gnus v5.13)

Kaz Kylheku <643-408-1753@kylheku.com> writes:

> On 2024-07-23, Kenny McCormack <gazelle@shell.xmission.com> wrote:
>> Which all kind of echoes back to the other recent thread in this NG about
>> regular expressions vs. globs.  The cold hard fact is that there really is
>> no such thing as "regular expressions" (*), since every language, every
>> program, every implementation of them, is quite different.
>>
>> (*) As an abstract concept, separate from any specific implementation.
>
> Yes, there are regular expressions as an abstract concept. They are part
> of the theory of automata.  Much of the research went on up through the
> 1960's.  The * operator is called the "Kleene star".
> https://en.wikipedia.org/wiki/Kleene_star
>
> In the old math/CS papers about regular expressions, regular expressions
> are typically represented in terms of some input symbol alphabet
> (usually just letters a, b, c ...) and only the operators | and *,
> and parentheses (other than when advanced operators are being discussed,
> like intersection and complement, which are not easily constructed from
> these.)
>
> I think character classes might have been a pragmatic invention in
> regex implementations. The theory doesn't require [a-c] because
> that can be encoded as (a|b|c).
>
> The ? operator is not required because (R)? can be written (R)(R)*.

(Aside: the choice is arbitrary but + would be a more "Unixy" choice for
that operator.)
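
Identities like these can be checked by comparing the sets of strings that two patterns accept in full. The sketch below assumes Python's re module and a toy alphabet {a, b, c}; the helper names language and same_language are made up for illustration, and checking strings only up to a bounded length is evidence rather than proof. It suggests that (R)(R)* behaves like (R)+, while (R)? corresponds to (R|), i.e. "R or the empty string".

```python
import re
from itertools import product

ALPHABET = "abc"   # assumed toy alphabet
MAX_LEN = 4        # only strings up to this length are checked

def language(pattern, alphabet=ALPHABET, max_len=MAX_LEN):
    """Return the set of strings (length <= max_len) the pattern matches in full."""
    compiled = re.compile(pattern)
    accepted = set()
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            s = "".join(symbols)
            if compiled.fullmatch(s):
                accepted.add(s)
    return accepted

def same_language(p1, p2):
    """True if both patterns accept exactly the same strings up to MAX_LEN."""
    return language(p1) == language(p2)

print(same_language(r"[a-c]", r"(a|b|c)"))     # True: class vs. alternation
print(same_language(r"(ab)+", r"(ab)(ab)*"))   # True: + is "one R, then star"
print(same_language(r"(ab)?", r"(ab|)"))       # True: ? is "R or empty"
print(same_language(r"(ab)?", r"(ab)(ab)*"))   # False: they differ on "" and "abab"
```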

> Escaping is not required because the operators and input symbols are
> distinct; the idea that ( could be an input symbol is something that
> occurs in implementations, not in the theory.
>
> Regex implementors take the theory and adjust it to taste,
> and add necessary details such as character escape sequences for
> control characters, and escaping to allow the operator characters
> themselves to be matched. Plus character classes, with negation
> and ranges and all that.
>
> Not all implementations follow solid theory. For instance, the branch
> operator | is supposed to be commutative.  There is no difference
> between R1|R2 and R2|R1.  But in many implementations (particularly
> backtracking ones like PCRE and similar), there is a difference: these
> implementations implement R1|R2|R3 by trying the expressions in left to
> right order and stopping at the first match.
>
> This matters when regexes are used for matching a prefix of the input;
> if the regex is interpreted according to the theory, it should match
> the longest possible prefix; it cannot ignore R3, which matches
> thousands of symbols, because R2 matched three symbols.

This is more a consequence of the different views. In the formal
theory there is no notion of "matching".  Regular expressions define
languages (i.e. sets of sequences of symbols) according to a recursive
set of rules.  The whole idea of an RE matching a string is from their
use in practical applications.
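
The recursive definition can be sketched directly: each regular expression denotes a set of strings, built from the sets denoted by its parts. The tuple encoding, the function name lang, and the length bound are all assumptions chosen for illustration (the bound is there only because the star of a non-trivial language is infinite). The last lines contrast the set view with prefix matching in Python's re module, which is a backtracking engine.

```python
import re

def lang(expr, max_len):
    """Set of strings of length <= max_len in the language denoted by expr.

    expr uses a hypothetical tuple encoding:
      ("sym", "a")      a single input symbol
      ("cat", r1, r2)   concatenation
      ("alt", r1, r2)   union, written r1|r2
      ("star", r)       Kleene star
    """
    kind = expr[0]
    if kind == "sym":
        return {expr[1]} if len(expr[1]) <= max_len else set()
    if kind == "alt":
        return lang(expr[1], max_len) | lang(expr[2], max_len)
    if kind == "cat":
        left, right = lang(expr[1], max_len), lang(expr[2], max_len)
        return {x + y for x in left for y in right if len(x + y) <= max_len}
    if kind == "star":
        base = lang(expr[1], max_len)
        result = {""}
        frontier = {""}
        while frontier:
            # Append one more factor; keep only strings not seen before.
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= max_len} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node: {kind!r}")

a, b = ("sym", "a"), ("sym", "b")
ab = ("cat", a, b)

# As sets, union is commutative: a|ab and ab|a denote the same language.
union_1 = lang(("alt", a, ab), 3)
union_2 = lang(("alt", ab, a), 3)
print(union_1 == union_2)              # True

# The language of (ab)*, cut off at length 4.
star_ab = sorted(lang(("star", ab), 4))
print(star_ab)                         # ['', 'ab', 'abab']

# Prefix matching in a backtracking engine depends on the order of alternatives.
first = re.match(r"a|ab", "abc").group()
second = re.match(r"ab|a", "abc").group()
print(first, second)                   # a ab
```

Nothing in lang "matches" anything; it only enumerates a set. The order dependence appears only once an engine is asked which prefix of an input to take.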

--
Ben.

