Subject : Re: Buffer contents well-defined after fgets() reaches EOF ?
From : janis_papanagnou+ng (at) *nospam* hotmail.com (Janis Papanagnou)
Newsgroups : comp.lang.c
Date : 10. Feb 2025, 02:35:01
Organisation : A noiseless patient Spider
Message-ID : <vobl46$td52$1@dont-email.me>
References : 1 2 3 4 5 6
User-Agent : Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0
On 10.02.2025 01:57, Keith Thompson wrote:
[...]
> Here's (some of) what the C standard says about text streams:
>
>     A text stream is an ordered sequence of characters composed into
>     lines, each line consisting of zero or more characters plus a
>     terminating new-line character. Whether the last line requires
>     a terminating new-line character is implementation-defined.
>
> For an implementation that *doesn't* require a new-line on the
> last line, a stream without a trailing new-line is valid. For an
> implementation that *does* require it, such a stream is invalid,
> and a program that attempts to process it can have undefined behavior.

This is what "C" accepts (or tolerates), yes.
Given that some fancy editors make it possible to suppress (or not
create) the final line ending - bytes are still expensive, it seems -
I suppose it's a sensible requirement for "C" compilers to be
tolerant here.

> Most modern implementations don't require that trailing new-line.
> For example, `echo -n hello > hello.txt` creates a valid text file.
> Of course a C program that deals with text files can impose any
> additional restrictions its author likes.

And `cat alpha.c beta.c > gamma.c` will create inconsistent text if
some of the files lack a line terminator on their last line: the last
line of one file then runs into the first line of the next.
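
To illustrate the concatenation point, here is a minimal sketch of a
cat-like copy step that supplies the terminator when it is missing.
The name copy_with_final_newline is made up for this example (it is
not a standard function), and the streams are assumed to be open and
to use LF as the line terminator as seen by the program.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: copy everything from `in` to `out`; if the
 * last character copied is not '\n', append one so that whatever is
 * written next starts on a fresh line.
 * Returns 0 on success, -1 on an I/O error. */
int copy_with_final_newline(FILE *in, FILE *out)
{
    int c;
    int last = '\n';    /* an empty input needs no terminator */

    assert(in != NULL && out != NULL);

    while ((c = getc(in)) != EOF) {
        if (putc(c, out) == EOF)
            return -1;
        last = c;
    }
    if (ferror(in))
        return -1;
    if (last != '\n' && putc('\n', out) == EOF)
        return -1;
    return 0;
}
```

Calling it once per input file, with the same output stream each
time, gives a result in which every line is terminated, whether or
not the inputs ended in a new-line.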

> The above describes how a text stream looks to a C program. The
> external representation can be quite different, with transformations
> to map between them.

(Concerning this thread: I'm operating on custom data files in plain
text format anyway, so I'm less concerned about how "C" compilers
expect their "C" source.)

> The most common such transformation is
> mapping the DOS/Windows CR-LF line terminator to LF on input, and
> vice versa on output. Or the external representation might store
> each line as a fixed-length character sequence padded with spaces.

I appreciate that the editor I use keeps data consistent but allows
an explicit change between Unix and DOS text modes (where necessary
or if desired).
The most extreme environment I have worked in was a company that
allowed every employee a free choice of computer technology; that led
to program text files that had literally every one of these
inconsistencies. Since many files were edited by different folks,
there were all sorts of line terminators mixed even within a single
file, and the last line was sometimes terminated and sometimes not.
The (some?) IDEs used were tolerant WRT line terminators and their
mixing. Other tools reacted sensitively.
The first thing I did was to write a "C" tool to detect and fix
these sorts of inconsistencies.
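
For illustration only (this is not the original tool), a detect-and-fix
pass of that kind might look roughly like the sketch below. The names
eol_stats, scan_eols and normalize_eols are invented for this example.
It assumes the file contents were read into memory from a stream opened
in binary mode ("rb"), so that the library has not already translated
the terminators, and it treats a lone CR as a line terminator as well.

```c
#include <assert.h>
#include <stddef.h>

/* Counts of the terminator styles found in one file. */
struct eol_stats {
    size_t lf;          /* lone "\n"  (Unix)        */
    size_t crlf;        /* "\r\n"     (DOS/Windows) */
    size_t cr;          /* lone "\r"                */
    int unterminated;   /* 1 if the last line has no terminator */
};

/* Detect: classify every line terminator in buf[0..len). */
struct eol_stats scan_eols(const char *buf, size_t len)
{
    struct eol_stats s = {0, 0, 0, 0};
    size_t i = 0;

    assert(buf != NULL || len == 0);

    while (i < len) {
        if (buf[i] == '\r' && i + 1 < len && buf[i + 1] == '\n') {
            s.crlf++;
            i += 2;
        } else if (buf[i] == '\r') {
            s.cr++;
            i++;
        } else {
            if (buf[i] == '\n')
                s.lf++;
            i++;
        }
    }
    s.unterminated = len > 0 && buf[len - 1] != '\n' && buf[len - 1] != '\r';
    return s;
}

/* Fix: write buf to out with every terminator turned into '\n' and a
 * final '\n' appended if the last line lacks one. `out` must have
 * room for len + 1 bytes. Returns the number of bytes written. */
size_t normalize_eols(const char *buf, size_t len, char *out)
{
    size_t i = 0, n = 0;

    assert(out != NULL && (buf != NULL || len == 0));

    while (i < len) {
        if (buf[i] == '\r') {
            out[n++] = '\n';
            i += (i + 1 < len && buf[i + 1] == '\n') ? 2 : 1;
        } else {
            out[n++] = buf[i++];
        }
    }
    if (n > 0 && out[n - 1] != '\n')
        out[n++] = '\n';
    return n;
}
```

A file is consistent if at most one of the three counters is non-zero
and unterminated is 0; anything else can be passed through
normalize_eols and written back, again in binary mode.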
Janis