Subject: Re: bash aesthetics question: special characters in reg exp in [[ ... =~~ ... ]]
From: 643-408-1753 (at) *nospam* kylheku.com (Kaz Kylheku)
Newsgroups: comp.unix.shell
Date: 24 Jul 2024, 19:35:51
Other headers:
Organization: A noiseless patient Spider
Message-ID: <20240724112619.254@kylheku.com>
References: 1 2 3 4 5 6 7
User-Agent: slrn/pre1.0.4-9 (Linux)
On 2024-07-24, Ben Bacarisse <ben@bsb.me.uk> wrote:
> Kaz Kylheku <643-408-1753@kylheku.com> writes:
>
>> On 2024-07-23, Ben Bacarisse <ben@bsb.me.uk> wrote:
>>> Kaz Kylheku <643-408-1753@kylheku.com> writes:
>>>> This matters when regexes are used for matching a prefix of the input;
>>>> if the regex is interpreted according to the theory, it should match
>>>> the longest possible prefix; it cannot ignore R3, which matches
>>>> thousands of symbols, because R2 matched three symbols.
>>>
>>> This is more a consequence of the different views. In the formal
>>> theory there is no notion of "matching". Regular expressions define
>>> languages (i.e. sets of sequences of symbols) according to a recursive
>>> set of rules. The whole idea of an RE matching a string is from their
>>> use in practical applications.
>>
>> Under the set view, we can ask, what is the longest prefix of
>> the input which belongs to the language R1|R2. The answer is the
>> same for R2|R1, which denote the same set, since | corresponds
>> to set union.
>
> What is "the input" in the set view? The set view is simply a recursive
> definition of the language.

It is a separate string under consideration.
We have a set, and are asking the question "what is the longest prefix
of the given string which is a member of the set".
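
A minimal sketch of that question, assuming Python's `re` module as the
set-membership test. The function name `longest_prefix_in_language` is
hypothetical and only for illustration. Because `fullmatch` asks whether
a whole candidate prefix belongs to the language, the order of the
branches of | should not affect the answer:

```python
import re

def longest_prefix_in_language(pattern, text):
    """Return the longest prefix of text that is a member of the language
    denoted by pattern, or None if no prefix (not even "") is a member."""
    compiled = re.compile(pattern)
    # Try candidate prefixes from longest to shortest; fullmatch tests
    # whether text[0:end] as a whole is in the set.
    for end in range(len(text), -1, -1):
        if compiled.fullmatch(text, 0, end):
            return text[:end]
    return None

print(longest_prefix_in_language("a|ab*", "abbbbbbbc"))  # abbbbbbb
print(longest_prefix_in_language("ab*|a", "abbbbbbbc"))  # abbbbbbb
```

This is quadratic in the worst case and is only meant to pin down the
question being asked, not to suggest how an engine should answer it.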

>> Broken regular expressions identify the longest prefix, except
>> when the | operator is used; then they just identify a prefix,
>> not necessarily longest.
>
> What is a "broken" RE in the set view?

Inconsistency in being able to answer the question "what is the longest
prefix of the string which is a member of the set".
Broken regexes contain a pitfall: they deliver the right answer
for expressions like ab*. If the input is "abbbbbbbc",
they identify the entire "abbbbbbb" prefix. But if the branch
operator is used, as in "a|ab*", oops, they short-circuit.
The "a" matches a prefix of the input, and so that's done; no need
to match the "ab*" part of the branch.
The "a" prefix is in the language described by the expression; a
set element has been identified. But it's not the longest one.
It is an inconsistency. If the longest match is not required, why
bother finding one for "ab*"; for that expression, the "a" prefix could
also just be returned.
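
The pitfall can be reproduced with any backtracking engine that tries
the branches of | left to right. The sketch below uses Python's `re`
module as one such engine; that choice, and the variable names, are
assumptions made for illustration and are not taken from the thread:

```python
import re

text = "abbbbbbbc"

# Python's re is a backtracking engine: alternatives are tried left to
# right, and the first branch that succeeds decides the match.
star_only = re.match(r"ab*", text).group()
short_first = re.match(r"a|ab*", text).group()
long_first = re.match(r"ab*|a", text).group()

print(star_only)    # abbbbbbb
print(short_first)  # a
print(long_first)   # abbbbbbb
```

The last two patterns denote the same set, yet they report different
prefixes, which is the inconsistency described above.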
-- 
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca