Newsportal USENET - Re: Experiences with match() subexpressions?

On 2025-04-11, Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:

> On 11.04.2025 08:33, Aharon Robbins wrote:
>> In article <vt9dre$3t3po$1@dont-email.me>,
>> Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
>>> The feature can be very useful,
>>> but not for the case I was looking for. - Actually, it could have
>>> provided the functionality I was seeking, but since GNU Awk relies
>>> on the GNU regexp functions as they are implemented I cannot expect
>>> that any provided features gets extended by Awk. - If GNU Awk would
>>> have an own RE implementation then we could think about using, e.g.,
>>> another array dimension to store the (now only temporary existing,
>>> and generally unavailable) subexpressions.
>>
>> Actually, this is not so trivial. The data structures at the C level
>> as mandated by POSIX are one dimensional; the submatches in parentheses
>> are counted from left to right. There's no way to represent the
>> subexpressions that are under control of interval expressions, which
>> would essentially require a two-dimensional data structure.
>
> Yes, that's why I had thought about a 2-dimensional array [on GNU
> Awk level] so that arr[n][i] for i=1..z would contain the patterns.
> This is what I actually tried with GNU Awk (before I had asked you)
> to see whether there's some undocumented feature.
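
The limitation discussed in the quoted text can be sketched outside of Awk. The following is a minimal illustration in Python, whose re module stands in for the POSIX regex API; that substitution is an assumption made for illustration only and is not from this thread. It shows that a capture group inside a repetition is numbered once, so only its final iteration survives the match.

```python
import re

LINE = "R=r1,R=r2,R=r3,E=e"

# The group (\w+) sits inside a repeated non-capturing group.
# It has a single slot in the flat, left-to-right group numbering,
# so each iteration overwrites the previous one.
m = re.fullmatch(r"(?:R=(\w+),)*E=(\w+)", LINE)

last_r = m.group(1)   # only the last repetition is retained
e_value = m.group(2)

print(last_r)
print(e_value)
```

The earlier values r1 and r2 were matched but are no longer reachable through the match object, which is the gap a second array dimension would fill.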

I solved this problem 15 years ago in the TXR Pattern Language:

$ echo 'R=r1,R=r2,R=r3,E=e' | txr -B -c '@(coll)R=@r,@(until)E@(end)E=@e'
r[0]="r1"
r[1]="r2"
r[2]="r3"
e="e"

We can eval the output into Bash and have a ${r[@]} array.

We can see the captured variables in a Lisp format:

$ echo 'R=r1,R=r2,R=r3,E=e' | txr -l -c '@(coll)R=@r,@(until)E@(end)E=@e'
(r "r1" "r2" "r3")
(e . "e")

The matches occurring in repetition constructs like @(coll), or its
vertical, line-oriented counterpart @(collect), are automatically
tabulated into lists.

We can see that the "e" variable wasn't; it is string valued,
rather than list valued.
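
For readers who do not have TXR at hand, the list-versus-string distinction can be approximated as follows. This is a rough Python sketch, not a rendering of how TXR implements @(coll); the function name collect and the dictionary layout are hypothetical choices for illustration.

```python
import re

def collect(line):
    """Gather every R=... value into a list; keep the single E=... value as a string."""
    # Repeated matches are tabulated into a list, like the r variable.
    r = re.findall(r"R=([^,]*),", line)
    # The one-off match stays a plain string, like the e variable.
    m = re.search(r"E=([^,]*)$", line)
    return {"r": r, "e": m.group(1) if m else None}

result = collect("R=r1,R=r2,R=r3,E=e")
print(result)
```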

To combine the two into a single list, one possibility is the
@(merge dest {sources}*) directive, which examines the different
nesting depths of its operands and intelligently combines them:

$ echo 'R=r1,R=r2,R=r3,E=e' | txr -B -c '@(coll)R=@r,@(until)E@(end)E=@e
@(merge x r e)'
r[0]="r1"
r[1]="r2"
r[2]="r3"
e="e"
x[0]="r1"
x[1]="r2"
x[2]="r3"
x[3]="e"

$ echo 'R=r1,R=r2,R=r3,E=e' | txr -B -c '@(coll)R=@r,@(until)E@(end)E=@e
@(merge x r e)
@(forget r e)'
x[0]="r1"
x[1]="r2"
x[2]="r3"
x[3]="e"
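
The effect of the merge in this particular case, a list combined with a plain string, can be imitated with a few lines of Python. This is only an analogue for the one-level case shown above: the helper name merge is borrowed for readability, and the sketch does not attempt to reproduce how TXR's directive treats deeper nesting.

```python
def merge(*values):
    """Flatten a mix of strings and lists of strings into one flat list (hypothetical helper)."""
    merged = []
    for v in values:
        if isinstance(v, list):
            merged.extend(v)   # list-valued operand: splice its items in
        else:
            merged.append(v)   # string-valued operand: treat as a single item
    return merged

x = merge(["r1", "r2", "r3"], "e")
print(x)
```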

A plethora of techniques are possible.

In TXR Lisp, split the data along commas, then again on =:

1> (flow "R=r1,R=r2,R=r3,E=e"
   (spl ","))
("R=r1" "R=r2" "R=r3" "E=e")
2> (flow "R=r1,R=r2,R=r3,E=e"
   (spl ",")
   (map (op spl "=")))
(("R" "r1") ("R" "r2") ("R" "r3") ("E" "e"))

Or pattern match the comma splits:

3> (flow "R=r1,R=r2,R=r3,E=e"
   (spl ",")
   (map (do match `@key=@val` @1 (list key val))))
(("R" "r1") ("R" "r2") ("R" "r3") ("E" "e"))

Just the R's, please:

4> (flow "R=r1,R=r2,R=r3,E=e"
   (spl ",")
   (map (do if-match `R=@val` @1 val)))
("r1" "r2" "r3" nil)

Splice out the nils:

8> (flow "R=r1,R=r2,R=r3,E=e"
   (spl ",")
   (mappend (do if-match `R=@val` @1 (list val))))
("r1" "r2" "r3")

Or remove them:

9> (flow "R=r1,R=r2,R=r3,E=e"
   (spl ",")
   (map (do if-match `R=@val` @1 val))
   (remq nil))
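
The same split, pair and filter steps can be written as a short Python sketch for comparison. It is not a translation of the flow macro; the variable names are made up for illustration.

```python
data = "R=r1,R=r2,R=r3,E=e"

# Split along commas, then split each item once on "=".
pairs = [item.split("=", 1) for item in data.split(",")]

# Keep only the values whose key is "R"; non-matching items are
# skipped outright, so there is nothing to splice out afterwards.
r_values = [val for key, val in pairs if key == "R"]

print(pairs)
print(r_values)
```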

Heck, use a Lispified Awk. The variable f holds
the fields. When we assign f to itself, that
forces the recalculation of the variable rec using
the ofs:

10> (awk (:inputs '("R=r1,R=r2,R=r3,E=e"))
   (:set fs "," ofs ":")
   (t (set f f) (prn)))
R=r1:R=r2:R=r3:E=e
nil

Use two Awks, nested inside each other: inner Awk
processes the fields f produced by the outer Awk:

11> (awk (:inputs '("R=r1,R=r2,R=r3,E=e"))
      (:set fs "," ofs ":")
      (t (awk (:inputs f)
           (:set fs "=")
           (t (prn [f 1])))))
r1
r2
r3
e
nil
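
The two Awk examples boil down to field splitting with one separator, rejoining with another, and splitting each field again. A minimal Python sketch of that idea follows; it assumes plain string separators and zero-based field indexing, and it makes no claim about how the awk macro works internally.

```python
record = "R=r1,R=r2,R=r3,E=e"

# Outer level: split the record on the field separator ",".
fields = record.split(",")

# Rebuilding the record joins the fields with the output separator ":".
rebuilt = ":".join(fields)

# Inner level: treat each field as a record of its own, split it on "=",
# and take the field at index 1.
second_fields = [field.split("=")[1] for field in fields]

print(rebuilt)
for value in second_fields:
    print(value)
```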

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca
