Re: bash aesthetics question: special characters in reg exp in [[ ... =~ ... ]]

Subject: Re: bash aesthetics question: special characters in reg exp in [[ ... =~ ... ]]
From: ben (at) *nospam* bsb.me.uk (Ben Bacarisse)
Newsgroups: comp.unix.shell
Date: 24 Jul 2024, 22:28:32
Organization: A noiseless patient Spider
Message-ID: <87msm65vjz.fsf@bsb.me.uk>
User-Agent: Gnus/5.13 (Gnus v5.13)
Kaz Kylheku <643-408-1753@kylheku.com> writes:

> On 2024-07-24, Ben Bacarisse <ben@bsb.me.uk> wrote:
>> Kaz Kylheku <643-408-1753@kylheku.com> writes:
>>
>>> On 2024-07-23, Ben Bacarisse <ben@bsb.me.uk> wrote:
>>>> Kaz Kylheku <643-408-1753@kylheku.com> writes:
>>>>> This matters when regexes are used for matching a prefix of the input;
>>>>> if the regex is interpreted according to the theory, it should match
>>>>> the longest possible prefix; it cannot ignore R3, which matches
>>>>> thousands of symbols, because R2 matched three symbols.
>>>>
>>>> This is more a consequence of the different views.  In the formal
>>>> theory there is no notion of "matching".  Regular expressions define
>>>> languages (i.e. sets of sequences of symbols) according to a recursive
>>>> set of rules.  The whole idea of an RE matching a string is from their
>>>> use in practical applications.
>>>
>>> Under the set view, we can ask, what is the longest prefix of
>>> the input which belongs to the language R1|R2. The answer is the
>>> same for R2|R1, which denote the same set, since | corresponds
>>> to set union.
>>
>> What is "the input" in the set view?  The set view is simply a recursive
>> definition of the language.
>
> It is a separate string under consideration.
>
> We have a set, and are asking the question "what is the longest prefix
> of the given string which is a member of the set".

It's better, then, (as in the latter wording) not to use a term from the
"implementation" view of REs.

>>> Broken regular expressions identify the longest prefix, except
>>> when the | operator is used; then they just identify a prefix,
>>> not necessarily longest.
>>
>> What is a "broken" RE in the set view?
>
> Inconsistency in being able to answer the question "what is the longest
> prefix of the string which is a member of the set".
>
> Broken regexes contain a pitfall: they deliver the right answer
> for expressions like ab*. If the input is "abbbbbbbc",
> they identify the entire "abbbbbbb" prefix. But if the branch
> operator is used, as in "a|ab*", oops, they short-circuit.
> The "a" matches a prefix of the input, and so that's done; no need
> to match the "ab*" part of the branch.

I don't see any "pitfall".  The answer to your question "what is the
longest prefix of the given string which is a member of the set" is not
"a", and nothing in either the formal definition of the language
"a|ab*" or the wording of the question is a pitfall.  The longest
prefix of "abbbbbbbc" that is in the language "a|ab*" is, unambiguously,
"abbbbbbb".

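A minimal sketch of the two views, assuming Python's re module as the example matcher (the thread itself names no implementation). The helper longest_prefix_in_language is a hypothetical name; it answers the set-view question by testing each prefix for membership with re.fullmatch. The matcher view is simply what re.match returns, since Python tries the alternatives of | from left to right and accepts the first one that succeeds.

```python
import re

def longest_prefix_in_language(pattern, text):
    """Return the longest prefix of text that is in the language of pattern,
    or None if no prefix (not even the empty one) belongs to it."""
    compiled = re.compile(pattern)
    # Try prefixes from longest to shortest; fullmatch tests set membership.
    for end in range(len(text), -1, -1):
        if compiled.fullmatch(text[:end]):
            return text[:end]
    return None

text = "abbbbbbbc"

# Set view: the order of the alternatives makes no difference.
set_view = longest_prefix_in_language(r"a|ab*", text)
set_view_swapped = longest_prefix_in_language(r"ab*|a", text)

# Matcher view: Python's re accepts the first alternative that matches.
matcher_view = re.match(r"a|ab*", text).group()

print(set_view, set_view_swapped, matcher_view)
```

Under these assumptions both set-view calls give "abbbbbbb", while the matcher view gives "a". The difference comes from the matching strategy, not from the language the expression defines.
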
> The "a" prefix is in the language described by the expression; a
> set element has been identified. But it's not the longest one.

Yes.  But there is no "pitfall" and the RE is not "broken" in any formal
sense at all.

An implementation might be broken and there are pitfalls to look out for
when viewing REs as patterns to match, but that's my whole point.  This
is all about the "other" view, not the view of REs as defining formal
languages.

> It is an inconsistency. If the longest match is not required, why
> bother finding one for "ab*"? For that expression, the "a" prefix could
> also just be returned.

We could, of course, ask about other prefixes of "abbbbbbbc" that are in
the language "a|ab*".  I don't see anything inconsistent here at all.
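
The question about other prefixes can be sketched the same way. This again assumes Python's re module purely as a membership test (via re.fullmatch); prefixes_in_language is a hypothetical helper, not something from the thread.

```python
import re

def prefixes_in_language(pattern, text):
    """List every prefix of text that is in the language of pattern,
    shortest first."""
    compiled = re.compile(pattern)
    return [text[:end] for end in range(len(text) + 1)
            if compiled.fullmatch(text[:end])]

found = prefixes_in_language(r"a|ab*", "abbbbbbbc")
print(found)
```

Every prefix from "a" up to "abbbbbbb" is in the language; the empty prefix and the full string are not. "Shortest" and "longest" are then just two well-defined selections from the same list.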

--
Ben.
