GNU Awk's types of regular expressions

Subject: GNU Awk's types of regular expressions
From: janis_papanagnou+ng (at) *nospam* hotmail.com (Janis Papanagnou)
Newsgroups: comp.lang.awk
Date: 28 Nov 2024, 19:18:29
Organization: A noiseless patient Spider
Message-ID: <viac5m$l8oh$1@dont-email.me>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0

In GNU Awk there are currently three types of regular expressions:
in addition to the standard regexp constants (/regex/) and the dynamic
regexps ("regex", or variables containing "regex"), newer versions
also support first-class regexp objects (@/regex/, "Strongly Typed
Regexp Constants").
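
A minimal sketch of the three forms side by side, assuming a gawk
recent enough to support @/.../ and typeof() (4.2 or later); the
variable names are made up for illustration:

```awk
BEGIN {
    s = "don't panic"

    dyn   = "pan+ic"      # dynamic regexp: an ordinary string
    typed = @/pan+ic/     # strongly typed regexp constant

    m_const = (s ~ /pan+ic/)   # classic regexp constant
    m_dyn   = (s ~ dyn)        # string used as a regexp
    m_typed = (s ~ typed)      # typed regexp value used as a regexp

    print m_const, m_dyn, m_typed      # prints: 1 1 1
    print typeof(dyn), typeof(typed)   # prints: string regexp
}
```

All three match the same text; only typeof() tells the string and the
typed regexp apart.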

One principal advantage of regexp constants is that the engine that
matches the regexp can be built in advance, whereas a dynamic regexp
may be constructed at runtime (from strings) and needs an explicit
runtime step to build the engine before any matching can be done.
I had assumed that  @/regex-const/  would behave like  /regex-const/
in that respect, until I found this text in the GNU Awk manual:

|
| Thus, if you have something like this:
|
|   re = @/don't panic/
|   sub(/don't/, "do", re)
|   print typeof(re), re
|
| then re retains its type, but now attempts to match the string ‘do
| panic’. This provides a (very indirect) way to create regexp-typed
| variables at runtime.
|
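
The quoted snippet can be wrapped into a complete program to watch
the modified value being used for matching. This is only a sketch
(gawk 4.2 or later assumed; hit_new and hit_old are made-up names).
Going by the manual text above, the first line printed should be
"regexp do panic".

```awk
BEGIN {
    re = @/don't panic/
    sub(/don't/, "do", re)     # edits the text of the typed regexp in place

    print typeof(re), re

    # The changed regexp, not the original constant, drives the match.
    hit_new = ("do panic" ~ re)
    hit_old = ("don't panic" ~ re)
    print "matches \"do panic\":", hit_new
    print "matches \"don't panic\":", hit_old
}
```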

(I'm astonished that first class regexp objects can be dynamically
changed. But that is not my point here; I'm interested in potential
pre-compiles of regexp constants...)

This would imply that first-class regexp constants can be changed
like dynamic regexps and that no regexp pre-compile is involved.
It would also raise the suspicion that the "normal" regexp constants
are probably not precompiled either.

So regexp constants (both forms) have (only?) the advantage that the
regexp syntax can be checked up front, while awk parses the program, e.g.,

  re = @/don't panic[/
       ^ unterminated regexp

Dynamic regexps, and first-class regexps that were changed (e.g.
by code like

  sub(/don't/, "do[", re)

in the above sample snippet), would both produce runtime errors, e.g.

  error: Unmatched [, [^, [:, [., or [=: /do[ panic/
  fatal: could not make typed regex

(as any ill-formed regexp, whatever its type, will produce a runtime error).
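
Since such an error is fatal and cannot be caught inside the same awk
program, one way to probe a dynamic regexp beforehand is to let a
child process attempt the match. The sketch below is not a built-in
facility: regex_ok() is a hypothetical helper, and it assumes that a
"gawk" executable is on PATH, that pipes run through a POSIX shell,
and that the candidate regexp contains no newline.

```awk
# Hypothetical helper: returns 1 if the string compiles as a dynamic
# regexp, 0 if the child gawk exits with an error while trying to use it.
function regex_ok(re,    cmd, rc) {
    # The child reads the candidate from stdin, so no shell quoting of
    # the regexp text itself is needed.
    cmd = "gawk '{ x = (\"\" ~ $0) }' 2>/dev/null"
    print re | cmd
    rc = close(cmd)        # nonzero when the child failed
    return rc == 0
}

BEGIN {
    good = regex_ok("do panic")
    bad  = regex_ok("do[ panic")
    print "do panic  ->", good
    print "do[ panic ->", bad
}
```

The ill-formed string is rejected only when the child actually tries
to match with it, which is the runtime behaviour described above.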

Janis

