Subject: Re: relearning C: why does an in-place change to a char* segfault?
From: 643-408-1753 (at) *nospam* kylheku.com (Kaz Kylheku)
Newsgroups: comp.lang.c
Date: 01. Aug 2024, 20:39:04
Organization: A noiseless patient Spider
Message-ID: <20240801114615.906@kylheku.com>
References: 1
User-Agent: slrn/pre1.0.4-9 (Linux)
On 2024-08-01, Mark Summerfield <mark@qtrac.eu> wrote:
> This program segfaults at the commented line:
>
> #include <ctype.h>
> #include <stdio.h>
>
> void uppercase_ascii(char *s) {
>     while (*s) {
>         *s = toupper(*s); // SEGFAULT
>         s++;
>     }
> }
>
> int main() {
>     char* text = "this is a test";

The "this is a test" object is a literal. It is part of the program's image.
When you try to change it, you're making your program self-modifying.

The ISO C language standard doesn't require implementations to support
self-modifying programs; the behavior is left undefined.

It could work in some documented, reliable way, in a given
implementation.

It's the same with any other constant in the program. Say you have
a malloc(1024) somewhere in the program. That 1024 number is encoded
into the program's image somehow, and in principle you could write code
to get at that number and change it to 256. Long before you got
that far, you would be in undefined behavior territory. If it worked,
it could have surprising effects. For instance, there could be another
call to malloc(1024) in the program and, surprisingly, *that* one also
changes to malloc(256).

A literal like "this is a test" is similar to that 1024, except
that it's very easy to get at it. The language defines it as an object
with an address, and to get that address all we have to do is evaluate
that expression itself. A minimal piece of code that requests the
undefined consequences of modifying a string literal is as easy
as "a"[0] = 0.

> Program received signal SIGSEGV, Segmentation fault.
> 0x000055555555516e in uppercase_ascii (s=0x555555556004 "this is a test")
>     at inplace.c:6
> 6           *s = toupper(*s);

On Linux, the string literals of a C executable are typically placed in
a read-only data section (.rodata) that sits alongside the program text
rather than in writable data. Like the machine instructions which
reference them, they are mapped read-only, so an attempted modification
is an access violation trapped by the OS and turned into a SIGSEGV signal.

GCC used to have a -fwritable-strings option, but it was removed
quite some time ago.

-- 
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca