Subject: Re: "sed" question
From: Keith.S.Thompson+u (at) *nospam* gmail.com (Keith Thompson)
Newsgroups: comp.lang.awk
Date: 08 Mar 2024, 05:06:00
Organization: None to speak of
Message-ID: <87zfv9mkpj.fsf@nosuchdomain.example.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)

Grant Taylor <gtaylor@tnetconsulting.net> writes:
> On 3/7/24 18:09, Keith Thompson wrote:
>> I know that's what awk does, but I don't think I would have expected
>> it if I didn't know about it.
>
> Okay. I think that's a fair observation.
>
>> $0 is the current input line.
>
> Or $0 is the current /record/ in awk parlance.

Yes.

>> If you don't change anything, or if you modify $0 itself, whitespace
>> between fields is preserved.
>
>> If you modify any of the fields, $0 is recomputed and whitespace
>> between tokens is collapsed.
>
> I don't agree with that.
>
> % echo 'one  two   three' | awk '{print $0; print $1,$2,$3}'
> one  two   three
> one two three
>
> I didn't /modify/ anything and awk does print the fields with
> different white space.

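That's just the semantics of print with comma-separated arguments: the
values are written separated by OFS, the output field separator, which
defaults to a single space. It's the same as:

% awk 'BEGIN{a="foo"; b="bar"; print a, b}'
foo bar
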
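To make that concrete: the comma in a print statement emits OFS, whatever the input spacing was. A small sketch; the input spacing and the "-" value for OFS are arbitrary choices for illustration.

```shell
# Fields printed with commas are joined by OFS, not by the original whitespace.
echo 'one  two   three' | awk '{ print $1, $2, $3 }'                      # one two three
echo 'one  two   three' | awk 'BEGIN { OFS = "-" } { print $1, $2, $3 }'   # one-two-three
```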
Printing the values of $1, $2, and $3 doesn't change $0. Writing to any
of $1, $2, $3, even with the same value, does change $0.

$ echo 'one  two   three' | awk '{print $0; print $1,$2,$3; print $0; $2 = $2; print $0}'
one  two   three
one two three
one  two   three
one two three

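The recomputed $0 is the fields $1 through $NF joined with OFS, which is where the single spaces come from. Editing $0 as a string, for example with sub(), leaves the spacing alone. A sketch contrasting the two; the "," separator and the "fifteen" replacement are arbitrary.

```shell
# Assigning any field (even $1 = $1) makes awk rebuild $0 from $1..$NF joined by OFS.
echo 'one  two   three' | awk 'BEGIN { OFS = "," } { $1 = $1; print }'   # one,two,three

# sub() with no target edits $0 itself, so the original spacing survives.
echo 'one  two   three' | awk '{ sub(/two/, "fifteen"); print }'          # one  fifteen   three
```

The catch is that sub() matches text, not fields: /two/ would also match inside a field such as "network", so in practice the regex usually needs to be more specific.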
>> awk *could* have been defined to preserve inter-field whitespace
>> even when you modify individual fields,
>
> I question the veracity of that. Specifically when lengthening or
> shortening the value of a field. E.g. replacing "two" with
> "fifteen". This is particularly germane when you look at $0 as a fixed
> width formatted output.

But awk doesn't work with fixed-width data. The length of each field,
and the length of $0, is variable.

If awk *purely* dealt with input lines only as lists of tokens, then
this:

echo 'one  two   three' | awk '{print $0}'

would print "one two three" rather than "one  two   three" (and awk would
lose the ability to deal with arbitrarily formatted input). The fact
that the inter-field whitespace is reset only when individual fields are
touched feels arbitrary to me.

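For illustration, a minimal sketch of what preserving the spacing could look like in user code, even when the new value is longer than the old one (the rest of the line simply shifts). It uses only POSIX awk functions (match, substr). The names replace_field and repl are hypothetical, made up for this example, and the sketch assumes the default FS, so only blanks and tabs count as separators.

```shell
# replace_field N VALUE: hypothetical shell helper (not a standard command).
# Replaces field N of each input line with VALUE, keeping the original
# whitespace. Assumes the default FS (fields separated by blanks/tabs).
# Note: awk -v processes backslash escapes in VALUE.
replace_field() {
  awk -v n="$1" -v val="$2" '
  function repl(k, v,    rest, out, i) {
    if (k < 1 || k > NF) return $0        # out of range: leave line unchanged
    rest = $0
    out = ""
    for (i = 1; i <= k; i++) {
      if (match(rest, /^[ \t]+/)) {       # whitespace before field i, if any
        out = out substr(rest, 1, RLENGTH)
        rest = substr(rest, RLENGTH + 1)
      }
      match(rest, /^[^ \t]+/)             # field i itself
      if (i < k) out = out substr(rest, 1, RLENGTH)
      else       out = out v
      rest = substr(rest, RLENGTH + 1)
    }
    return out rest                        # everything after field k, as-is
  }
  { print repl(n + 0, val) }'
}

echo 'one  two   three' | replace_field 2 fifteen   # one  fifteen   three
```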
>> and I think I would have found that more intuitive.
>
> I don't agree.
>
>> (And ideally there would be a way to refer to that inter-field
>> whitespace.)
>
> Remember, awk is meant for working on fields of data in a record. By
> default, the fields are delimited by white space characters.
>
> I'll say it this way: awk is meant for working on the non-white space
> characters. Or yet another way, awk is not meant for working on
> white space characters.

Awk has strong built-in support for working on whitespace-delimited
fields, and that support tends to ignore the details of that whitespace.
But you can also write awk code that just deals with $0.
One trivial example:

awk '{ count += length + 1 } END { print count }'

behaves similarly to `wc -c` (length with no argument is the length of
$0, and the + 1 accounts for the newline that terminates each record),
and counts whitespace characters just like any other characters.

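A quick way to check the comparison. The "+ 0" in the END action is a small tweak that is not in the one-liner above: it makes empty input print 0, as wc -c does, instead of an empty line. The sample input is arbitrary.

```shell
# Sum of record lengths plus one newline per record.
printf 'one  two   three\n\tindented\n' | awk '{ count += length + 1 } END { print count + 0 }'  # 27
printf 'one  two   three\n\tindented\n' | wc -c   # 27 (some wc implementations pad with spaces)
```

Two caveats: awk adds 1 even when the last line has no trailing newline, and in a multibyte locale gawk's length counts characters rather than bytes, so there it is closer to `wc -m`.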
>> The fact that modifying a field has the side effect of messing up $0
>> seems counterintuitive.
>
> Maybe.
>
> But I think it's one that is acceptable for what awk is intended to do.

It's also the existing behavior, and changing it would break things, so
I wouldn't suggest changing it.

>> Perhaps the behavior matches your intuition better than it matches
>> mine.
>
> I sort of feel like you are wanting to / trying to use awk in places
> where sed might be better.
>
> sed just sees a string of text and is ignorant of any structure
> without a carefully crafted RE to provide it.

Not really. I'm just remarking on one particular awk feature that I
find a bit counterintuitive.

Awk is optimized for working on records consisting of fields, and not
caring much about how much whitespace there is between fields. But it's
flexible enough to do *lots* of other things.

[...]

-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Medtronic
void Void(void) { Void(); } /* The recursive call of the void */