Newsportal USENET - Re: Experiences with match() subexpressions?

On 10.04.2025 13:08, Kenny McCormack wrote:

In article <vt7qs4$2gior$1@dont-email.me>,
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
On 10.04.2025 09:06, Janis Papanagnou wrote:
I'm looking for subexpressions of regexp-matches using GNU Awk's
third parameter of match(). For example
>
data = "R=r1,R=r2,R=r3,E=e"
match (data, /^(R=([^,]+),){2,5}E=(.+)$/, arr)
>
The result stored in 'arr' seems to be determined by the static
parenthesis structure, so with the pattern repetition {2,5} only
the last matched data in the subexpression (r3) seems to persist
in arr. - I suppose there's no cute way to achieve what I wanted?
>
To clarify: what I wanted is access to the values "r1", "r2", "r3",
and "e" through 'arr'.

I have to admit that I (still) don't really understand how this match third
arg stuff works.

I've never used that before but it seems to be quite simple; for every
parenthesized group in the regexp it provides (statically, numbered by
the opening parentheses as written, from left to right) an array element
holding the text matched by that subexpression.

I.e., I can never predict what will happen, so I always
just dump out the array and try to reverse-engineer it each time I need to
use it.

I adapted your code into the following test script:

--- Cut Here ---
#!/bin/sh
gawk 'BEGIN {
data = "R=r1,R=r2,R=r3,E=e"
match (data, /^(R=([^,]+),){2,5}E=(.+)$/, arr)
for (i in arr) print i,arr[i]
}'

# To clarify; what I wanted is access of the values "r1", "r2", "r3",
# and "e" through 'arr'.
--- Cut Here ---

The output I get is:

--- Cut Here ---
0start 1
0length 18
3start 18
1start 11
2start 13
3length 1
2length 2
1length 5

The output above appears because 'arr' also stores additional elements
holding the start position and length of each match.

I don't need those, so I'm only interested in the data elements below and
would iterate with an index-counted loop...

0 R=r1,R=r2,R=r3,E=e

the whole expression

1 R=r3,

the expression in the first parenthesis

2 r3

the expression in the second, embedded parenthesis

3 e

the expression in the final parenthesis

--- Cut Here ---

After playing around a bit, I could not come up with any sensible way of
getting what you want to get.

Yeah, Arnold just told me the same; that it's impossible because the
underlying GNU regexp library doesn't support what I'm looking for.

What I considered a possible workaround (in this case) is to unroll
the (...){2,5} expression into a sequence of (...)? expressions.
(But in the general case, for larger ranges than 2-5, that's neither
feasible nor sensible any more.)

As an alternative, it sounds like you could just split the
string on the comma; that would get you:

Yes, that was also how I did such things in the past. Only when I saw
that "third argument" to match() I hoped the two-level parsing could
be simplified into one step. The reason was that I thought I had seen
other languages (Perl, maybe?) that supported such a feature.

R=r1
R=r2
R=r3
E=e

Or, for finer control, you could use patsplit().

I think I'll do the parsing the straightforward two-step way as I did
before the GNU Awk specific functions were available; it's probably
also the clearest way to program that functionality.

Janis
