Subject: Re: Case Insensitive File Systems -- Torvalds Hates Them
From: rich (at) *nospam* example.invalid (Rich)
Newsgroups: comp.os.linux.misc
Date: 08. May 2025, 02:58:00
Organization: A noiseless patient Spider
Message-ID: <vvh338$1c1cf$2@dont-email.me>
References: 1 2 3 4 5 6 7 8 9 10 11 12 13
User-Agent: tin/2.6.1-20211226 ("Convalmore") (Linux/5.15.139 (x86_64))
Computer Nerd Kev <not@telling.you.invalid> wrote:
> Richard Kettlewell <invalid@invalid.invalid> wrote:
>> Second, the issue in shell is that newlines (or spaces) interact
>> badly with its approach to string handling: a filename can cause a
>> script to unexpectedly fail. For all that C has truly awful string
>> handling, it doesn't go awry just because there's a space or newline in
>> a string that it's working with.
>
> When dealing with programs like find, sort, uniq etc. it's more of a
> data format issue than a shell issue. As in the link from the GNU
> find documentation which I supplied before where "find" apparently
> runs "sort" itself and needs that to support null-terminated line
> delimiters to handle newlines in filenames, rather than the default
> newline-terminated format:
> http://www.gnu.org/software/findutils/manual/html_node/find_html/Newline-Handling.html
That is not the "find" documentation, that is the documentation for
locate/updatedb and the db format they use to make locating a filename
faster than a raw disk scan. That 'running of sort' is done by
'updatedb', not find.
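
To make the record-terminator point concrete, here is a rough Python sketch. It is not how updatedb or sort are implemented; sort_records() is a made-up helper that only mimics "split on a terminator, sort, write back out", and the two filenames are invented for the example.

```python
def sort_records(data: bytes, sep: bytes) -> bytes:
    """Sort sep-terminated records (hypothetical helper, not updatedb's code)."""
    # Drop the empty piece left over after the final terminator.
    records = [r for r in data.split(sep) if r]
    return b"".join(r + sep for r in sorted(records))


# Two filenames; the second one contains an embedded newline.
names = [b"b.txt", b"a\nz.txt"]

# Newline-terminated list: the embedded newline splits one name in two.
as_lines = sort_records(b"".join(n + b"\n" for n in names), b"\n")
print(as_lines.split(b"\n")[:-1])

# NUL-terminated list: both names come back intact.
as_nul = sort_records(b"".join(n + b"\0" for n in names), b"\0")
print(as_nul.split(b"\0")[:-1])
```

The newline-terminated run hands back three "names" for two files, while the NUL-terminated run returns exactly the two that went in. That is the same reason GNU find has -print0 and GNU sort has -z.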
> Of course a C program can use any character to separate strings
A C program uses (if it uses libc's string routines) ASCII nulls to
mark the end of a C string in memory. Of course if you want to build
your own "C string" library (C's strings are provided by libc, not the
C language itself) you can choose to delimit strings however you wish.
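
A small sketch of that NUL-terminator convention, using Python's ctypes so it runs without a compiler. The buffer contents are made up for the example; the point is only that a newline is ordinary data to a C string, while the first NUL ends it.

```python
import ctypes

# Buffer holding "a", newline, "b", NUL, "c"; ctypes appends one more NUL.
buf = ctypes.create_string_buffer(b"a\nb\0c")

# The raw memory keeps every byte, including both NULs.
print(len(buf.raw), buf.raw)

# Read as a C string, the data stops at the first NUL; the newline survives.
print(buf.value)
```

Here buf.value comes back as b"a\nb": the newline is kept and everything from the first NUL onward is ignored.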
> but newlines are most common in existing UNIX tools for text string
> processing,
They are common in the CLI because:
1) they are easy for humans to enter (there's a big key dedicated to
creating them)
2) they produce the expected output in most terminal emulators (and/or
on actual hardware terminals) and printers
> and most easily human-readable, so it's convenient to use
> that data format.
Newlines are called "control characters" for a reason. Instead of
being "printable" (meaning having a character glyph) they perform a
"control function", namely to begin the display of actual printables
on the next line down in a terminal or printer.
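
For anyone who wants to check the printable/control split, a quick Python sketch follows. is_control() is a made-up helper name; it relies on the Unicode general category "Cc" (control), which covers the ASCII control characters.

```python
import unicodedata


def is_control(ch: str) -> bool:
    """True if ch is in Unicode category Cc (hypothetical helper name)."""
    return unicodedata.category(ch) == "Cc"


# Show category, control status and printability for a few characters.
for ch in ["a", " ", "\n", "\t", "\0"]:
    print(repr(ch), unicodedata.category(ch), is_control(ch), ch.isprintable())
```

Newline, tab and NUL all land in category Cc and report as non-printable, while the letter and the space do not.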
> But it means assuming that newlines in filenames won't actually
> appear.
Which, in reality, is all but true unless someone is going out of their
way to experiment or be very odd. The only time I've ever encountered
filenames with newlines has been when I've deliberately created them to
verify some bit of code (or to try to break some bit of code, although
verify/break often go hand in hand).
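
For that kind of verify/break test, a minimal Python sketch is below. It assumes a POSIX filesystem (where a newline is a legal filename character) and uses a throwaway temporary directory; make_newline_file() and the filename are invented for the example.

```python
import os
import tempfile


def make_newline_file(directory: str) -> str:
    """Create an empty file whose name contains a newline; return the name."""
    name = "first\nsecond.txt"
    with open(os.path.join(directory, name), "w"):
        pass
    return name


with tempfile.TemporaryDirectory() as tmp:
    odd_name = make_newline_file(tmp)
    entries = os.listdir(tmp)            # real directory contents: one entry
    listing = "\n".join(entries) + "\n"  # what a line-oriented tool would emit
    naive = listing.splitlines()         # what a line-oriented reader would see

print(len(entries), len(naive))
```

The directory holds one file, but the line-oriented view reports two names. Code under test should be fed the real entry list and checked against that.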