Subject: Re: substr() - copying or not copying, that is here the question.
From: janis_papanagnou+ng (at) *nospam* hotmail.com (Janis Papanagnou)
Newsgroups: comp.lang.awk
Date: 31 May 2025, 23:16:58
Organization: A noiseless patient Spider
Message-ID: <101fv4s$1g5c8$1@dont-email.me>
References: 1 2
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0

On 31.05.2025 21:07, Mack The Knife wrote:
> In article <101f9oo$18edp$1@dont-email.me>,
> Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
>> In the context p=index(substr(t,s),r)
>> it would not be necessary to copy the substr(t,s),
>> the index() function could operate on the original
>> using some access "descriptor" (say, a pointer and
>> a length) in read-only mode.
>>
>> Will (GNU) Awk make a copy of the data value or does
>> it use a read-only descriptor access to the already
>> existing substring of variable "t"?
>>
>> Currently I'm playing with some huge data and copies
>> of MB-sized data are costly (if it's repeatedly done
>> with various substr() subscripts).
>
> substr() makes a copy. This is clear in the code.

Okay. Thanks for checking that!
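
For reference, the access pattern in question can be sketched
as follows. This is only an illustration: find_all is a
hypothetical helper name and the sample strings are made up.
Each loop pass asks substr() for the tail of t, so if substr()
copies, every pass copies the remainder of the string.

```sh
# find_all TEXT NEEDLE  (hypothetical helper, for illustration only)
# Prints the 1-based position of every occurrence of NEEDLE in TEXT.
find_all() {
    awk -v t="$1" -v r="$2" 'BEGIN {
        s = 1
        # index() searches the tail of t that substr() hands back
        while ((p = index(substr(t, s), r)) > 0) {
            print s + p - 1    # map the hit back to a position in t
            s += p             # resume one character past the match start
        }
    }'
}

find_all "abcabcabc" "abc"    # prints 1, 4 and 7, one per line
```

With a string of several MB and many hits, the total amount of
data passed through substr() grows with the number of passes,
which is the cost I'm worried about.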
> It's almost impossible to do this via read-only descriptor.
> Consider something like
>
>     x = substr($0, 10, 15)
>     getline
>     print x

Well, it would be possible to do that with a descriptor if
GNU Awk had a delayed/lazy evaluation principle implemented.
(Before 'getline' invalidates $0 a copy is necessary, of
course.)

(It's been reported that some optimizations are
implemented in GNU Awk, so that could also be the case
here. That's why I'm asking.)
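
The quoted example can be run as is to see the behaviour that
any descriptor-based scheme would have to preserve: x must
still hold text from the first record after getline has
replaced $0. A minimal sketch, where the function name
getline_demo and the two input records are made up for
illustration:

```sh
# getline_demo  (hypothetical name; the input records are sample data)
getline_demo() {
    printf '%s\n' '0123456789abcdefghijklmnopqrstuvwxyz' 'second record' |
    awk '{
        x = substr($0, 10, 15)   # 15 characters taken from the first record
        getline                  # $0 now holds the second record
        print x                  # x must be unaffected by the new $0
    }'
}

getline_demo    # prints 9abcdefghijklmn
```

A lazy implementation would have to detach x from the old
record buffer at the latest when getline overwrites $0.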

> Gawk manages the storage such that for something like
> your example the copy will be released after index()
> returns a value.

As said, I'm working on a huge string of data. What are
other options to efficiently work on substring parts of
the data? Given the result of your code check I don't see
a chance to achieve that with GNU Awk, or maybe any Awk,
using only standard functionality.

Okay, maybe I could write an extension to work on
memory-mapped files (the data originally stems from a
file) and seek/read through C mechanisms. But that's a
huge effort compared to some natively available function.
And then I'd probably better implement the whole thing
directly in C instead of using Awk in the first place,
since I'd have to implement the GNU Awk extension in C
anyway.

Janis