On 2025-04-11, Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
On 11.04.2025 08:33, Aharon Robbins wrote:
In article <vt9dre$3t3po$1@dont-email.me>,
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
The feature can be very useful,
but not for the case I was looking for. - Actually, it could have
provided the functionality I was seeking, but since GNU Awk relies
on the GNU regexp functions as they are implemented I cannot expect
that any provided feature gets extended by Awk. - If GNU Awk had
its own RE implementation then we could think about using, e.g.,
another array dimension to store the (now only temporary existing,
and generally unavailable) subexpressions.
Actually, this is not so trivial. The data structures at the C level
as mandated by POSIX are one dimensional; the submatches in parentheses
are counted from left to right. There's no way to represent the
subexpressions that are under control of interval expressions, which
would essentially require a two-dimensional data structure.
>
Yes, that's why I had thought about a 2-dimensional array [on GNU
Awk level] so that arr[n][i] for i=1..z would contain the patterns.
This is what I actually tried with GNU Awk (before I had asked you)
to see whether there's some undocumented feature.
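To make the limitation concrete, here is a minimal sketch in Python rather than Awk. Python's re module is only a stand-in for the POSIX regex API (that substitution is my assumption, as are the names PATTERN and flat_groups); it shares the property discussed above, namely that groups are numbered left to right with one slot each, so a group inside a repetition keeps only its final match.

```python
import re

# Assumption: Python's re is used purely for illustration; it is not what
# GNU Awk calls internally. Groups are numbered left to right, one slot each.
PATTERN = re.compile(r"(?:R=(\w+),)+E=(\w+)")

def flat_groups(text):
    """Return the flat tuple of group captures, or None if there is no match."""
    m = PATTERN.fullmatch(text)
    return m.groups() if m else None

# The repeated group matched r1, r2 and r3 in turn, but only r3 survives.
print(flat_groups("R=r1,R=r2,R=r3,E=e"))  # ('r3', 'e')
```

The earlier iterations r1 and r2 were matched but are not retrievable from the result, which is the gap a second array dimension would fill.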
I solved this problem 15 years ago in the TXR Pattern Language:
$ echo 'R=r1,R=r2,R=r3,E=e' | txr -B -c '@(coll)R=@r,@(until)E@(end)E=@e'
r[0]="r1"
r[1]="r2"
r[2]="r3"
e="e"
We can eval the output into Bash and have a ${r[@]} array.
We can see the captured variables in a Lisp format:
$ echo 'R=r1,R=r2,R=r3,E=e' | txr -l -c '@(coll)R=@r,@(until)E@(end)E=@e'
(r "r1" "r2" "r3")
(e . "e")
The matches occurring in repetition constructs like @(coll), or its
vertical, line-oriented counterpart @(collect), are automatically
tabulated into lists.
We can see that the "e" variable wasn't; it is string valued,
rather than list valued.
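For readers who don't know TXR, the list-versus-string distinction can be imitated roughly in Python. This is only a sketch of the resulting data shapes, not of how TXR matches; the function name collect and the use of str.partition to stop at the first "E=" are my own choices for illustration.

```python
import re

def collect(text):
    """Gather every R=... value into a list; keep the single E=... as a string."""
    # Stop collecting at the first "E=", loosely imitating an "until" clause.
    head, sep, tail = text.partition("E=")
    r = re.findall(r"R=([^,]*),", head)   # repeated item -> list
    e = tail if sep else None             # single item   -> plain string
    return r, e

r, e = collect("R=r1,R=r2,R=r3,E=e")
print(r)  # ['r1', 'r2', 'r3']
print(e)  # e
```

The repeated part ends up list valued and the one-off part string valued, which is the same shape as the r and e variables.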
To gather both into a single list, one possibility is the
@(merge dest {sources}*) directive, which examines the different
nesting depths of its operands and intelligently combines them:
$ echo 'R=r1,R=r2,R=r3,E=e' | txr -B -c '@(coll)R=@r,@(until)E@(end)E=@e
@(merge x r e)'
r[0]="r1"
r[1]="r2"
r[2]="r3"
e="e"
x[0]="r1"
x[1]="r2"
x[2]="r3"
x[3]="e"
If only x is wanted, @(forget r e) drops the other two bindings:
$ echo 'R=r1,R=r2,R=r3,E=e' | txr -B -c '@(coll)R=@r,@(until)E@(end)E=@e
@(merge x r e)
@(forget r e)'
x[0]="r1"
x[1]="r2"
x[2]="r3"
x[3]="e"
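A rough Python analogue of what the merge step produces in this particular case, offered as a sketch only: the function below is hypothetical and handles just one level of mismatch (a bare string next to a flat list), whereas the real directive deals with operands of differing nesting depths in general.

```python
def merge(*values):
    """Concatenate values, treating a bare string as a one-element list."""
    out = []
    for v in values:
        # A string is a scalar here; anything else is assumed to be a flat list.
        out.extend([v] if isinstance(v, str) else v)
    return out

x = merge(["r1", "r2", "r3"], "e")
print(x)  # ['r1', 'r2', 'r3', 'e']
```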
A plethora of techniques are possible.
In Lisp, split the data along commas, then again on =:
1> (flow "R=r1,R=r2,R=r3,E=e"
     (spl ","))
("R=r1" "R=r2" "R=r3" "E=e")
2> (flow "R=r1,R=r2,R=r3,E=e"
     (spl ",")
     (map (op spl "=")))
(("R" "r1") ("R" "r2") ("R" "r3") ("E" "e"))
Or pattern match the comma splits:
3> (flow "R=r1,R=r2,R=r3,E=e"
     (spl ",")
     (map (do match `@key=@val` @1 (list key val))))
(("R" "r1") ("R" "r2") ("R" "r3") ("E" "e"))
Just the R's, please:
4> (flow "R=r1,R=r2,R=r3,E=e"
     (spl ",")
     (map (do if-match `R=@val` @1 val)))
("r1" "r2" "r3" nil)
Splice out the nils:
8> (flow "R=r1,R=r2,R=r3,E=e"
     (spl ",")
     (mappend (do if-match `R=@val` @1 (list val))))
("r1" "r2" "r3")
Or remove them:
9> (flow "R=r1,R=r2,R=r3,E=e"
     (spl ",")
     (map (do if-match `R=@val` @1 val))
     (remq nil))
Heck, use a Lispified Awk. The variable f holds
the fields. When we assign f to itself, that
forces the recalculation of the variable rec with
the ofs:
10> (awk (:inputs '("R=r1,R=r2,R=r3,E=e"))
         (:set fs "," ofs ":")
         (t (set f f) (prn)))
R=r1:R=r2:R=r3:E=e
nil
Use two Awks, nested inside each other: the inner Awk
processes the fields f produced by the outer Awk:
11> (awk (:inputs '("R=r1,R=r2,R=r3,E=e"))
         (:set fs "," ofs ":")
         (t (awk (:inputs f)
                 (:set fs "=")
                 (t (prn [f 1])))))
r1
r2
r3
e
nil
-- 
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca