Re: "undefined behavior"?

Subject: Re: "undefined behavior"?
From: david.brown (at) *nospam* hesbynett.no (David Brown)
Newsgroups: comp.lang.c
Date: 13 Jun 2024, 15:15:55
Organisation: A noiseless patient Spider
Message-ID: <v4erec$29e2g$1@dont-email.me>
User-Agent : Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.11.0
On 13/06/2024 00:29, DFS wrote:
On 6/12/2024 5:38 PM, David Brown wrote:
On 12/06/2024 22:47, DFS wrote:
Wrote a C program to mimic the stats shown on:
>
https://www.calculatorsoup.com/calculators/statistics/descriptivestatistics.php
>
My code compiles and works fine - every stat matches - except for one anomaly: when using a dataset of consecutive numbers 1 to N, all values  > 40 are flagged as outliers.  Up to 40, no problem.  Random numbers dataset of any size: no problem.
>
And values 41+ definitely don't meet the conditions for outliers (using the IQR * 1.5 rule).
>
Very strange.
>
Edit: I just noticed I didn't initialize a char:
before: char outliers[100];
after : char outliers[100] = "";
>
And the problem went away.  Reset it to before and problem came back.
>
Makes no sense.  What could cause the program to go FUBAR at data point 41+ only when the dataset is consecutive numbers?
>
Also, why doesn't gcc just do you a solid and initialize to "" for you?
>
>
It is /really/ difficult to know exactly what your problem is without seeing your C code!  There may be other problems that you haven't seen yet.
 The outlier section starts on line 169
=====================================================================================
<snip>
Apart from the initialisation issue, I would suggest you re-consider the way you add strings to the "outliers" buffer.  If there are too many of them, it will overflow - there's nothing to stop you putting more than 200 characters into it.  I would recommend dropping the "temp" variable and instead keeping track of a pointer to the terminating null character of your current "outliers" string.  Use "snprintf" to "print" directly into the string, rather than going via "temp", and use the return value of "snprintf" to update your end pointer.  You will easily be able to avoid the risk of overrun, while also being slightly more efficient.
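As a rough sketch of that suggestion (not the code from the original program): the helper name "append_int", the "%d " format and the idea of passing the capacity as a parameter are all assumptions made for illustration.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper: appends "value " at *end, where *end points at the
 * terminating null of the string held in buf (capacity cap bytes).
 * Returns 1 on success, 0 if the text does not fit. On failure the
 * existing string is left unchanged.
 */
int append_int(char *buf, size_t cap, char **end, int value)
{
    size_t used = (size_t)(*end - buf);   /* characters already stored */
    size_t room = cap - used;             /* bytes left, including the null */

    /* snprintf never writes more than room bytes and returns the length
       the full text would have needed (or a negative value on error). */
    int n = snprintf(*end, room, "%d ", value);

    if (n < 0 || (size_t)n >= room) {
        **end = '\0';                     /* discard any truncated output */
        return 0;
    }
    *end += n;                            /* still points at the null */
    return 1;
}
```

The caller starts with an empty string and an end pointer at its first byte, for example "char outliers[100] = ""; char *end = outliers;", and calls append_int(outliers, sizeof outliers, &end, x) for each outlier. With a 100-byte buffer and two-digit values, 33 entries fit; the 34th call returns 0 instead of overrunning the buffer.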
The line:
outliers[strlen(outliers)] = '\0';
is completely useless.  "strlen" starts at the beginning of "outliers", and counts along until it finds a null character - thus either "outliers[strlen(outliers)]" is already equal to '\0', or your attempt at calculating "strlen" with an overrun buffer will lead to more undefined behaviour.

 
Non-static local variables without initialisers have "indeterminate" values.  Trying to use these "indeterminate" values is undefined behaviour - you have absolutely no control over what might happen.  Any particular behaviour you see is down to luck from the rest of the code and what happened to be in memory at the time.
 In 2024 that's surprising.  I can't be the only one to forget to initialize a char[] variable.
 
You are not - attempting to use an uninitialised variable is a common error.  That is why C compilers provide warnings about this kind of thing, along with run-time tools like the sanitizers Ben recommended, to help find such mistakes.  But compiler vendors can't force people to use such tools and warning flags, nor can the tools find /all/ cases of errors.  At some point, programmers have to take responsibility for knowing the language they are using, and writing their code correctly.  Good tools and good use of those tools are an aid to careful coding, not an alternative to it.

 
There is no automatic initialisation of non-static local variables, because that would often be inefficient.
 It would've saved me half an hour of frustration.
And the things you have learned as a result - from your own debugging, and the threads here - will save you many more hours of frustration in the future.
There are languages that focus on ease of use and do all the management of things like strings and buffers, and prevent users from mistakes like this, at the cost of slower run-times.  There are languages that do very little automatically for the programmer and have absolutely minimal overheads, for maximal efficiency.  C is the latter kind of language.
Remember, while you might see automatic initialisation of local variables as a negligible overhead, other people might not - I've worked on C code for microcontrollers where a wasted processor cycle or two is too much.  If your code does not care about such efficiencies, then you have to question whether C is the right language in the first place.  I believe most modern code that is written in C would be better if it were written in other higher level languages (precisely because a half hour of /your/ time is usually more valuable than a few microseconds of your computer's time).
On the subject of initialisation, I strongly suggest that you do /not/ get in the habit of always initialising your variables to 0 when you define them.  Do that only if 0 is the real, appropriate starting value.   Prefer to avoid declaring the variable at all until you need it, then define it with its initial value (and consider making it "const" to reduce the risk of other coding errors).  If the structure of the code requires you to define the variable before you have a value for it, prefer to leave it without an initial value.  Then compiler warnings have a much better chance of spotting mistakes.
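To illustrate that habit, here is a small sketch. The function "range_of" is a made-up example, not part of the program under discussion, and it assumes the caller passes at least one element.

```c
#include <stddef.h>

/* Hypothetical example: returns max - min of xs[0..n-1]. Requires n >= 1. */
double range_of(const double *xs, size_t n)
{
    /*
     * Writing "double lo = 0, hi = 0;" at the top would compile cleanly,
     * but 0 is not real data here: it would give wrong answers for
     * all-positive input and would stop the compiler from warning if a
     * later assignment were forgotten.
     */
    double lo = xs[0];               /* defined with its real starting value */
    double hi = xs[0];

    for (size_t i = 1; i < n; i++) {
        if (xs[i] < lo) lo = xs[i];
        if (xs[i] > hi) hi = xs[i];
    }

    const double range = hi - lo;    /* defined only once the value is known */
    return range;
}
```

Each variable appears at the point where a meaningful value exists for it, and "range" is "const" because nothing should change it afterwards.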

 Now I'm getting 'stack smashing detected' errors (after the program runs correctly) when using datasets of consecutive numbers.
 
I think Ben found that buffer overrun for you, and showed you how to find it yourself in the future.

hmmmm 2 issues in a row using consecutives - that's a clue!
  
The best way to avoid errors like yours, IMHO, is not to declare such variables until you have data to put in them - thus you always have a sensible initialiser of real data.  Occasionally that is not practical, but it works in most cases.
 Data is definitely going in them: either the value 'none' or a list of the outliers and some text.
 
Now that I have your source code, I can see the error is the way you put data in - strcat() reads the existing data, it does not just write data.
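To see why strcat() needs valid existing contents, here is roughly what any implementation has to do. This is an illustrative re-implementation under the made-up name "sketch_strcat", not the library's actual source, and "build_list" is likewise a hypothetical caller.

```c
/* Illustrative stand-in for strcat(): appends src to the string in dest. */
char *sketch_strcat(char *dest, const char *src)
{
    char *p = dest;

    /* Step 1: READ dest until a terminating null is found. If dest was
       never initialised, this loop walks through whatever bytes happen
       to be in memory, possibly past the end of the array. */
    while (*p != '\0') {
        p++;
    }

    /* Step 2: copy src, including its terminating null, from that point. */
    while ((*p++ = *src++) != '\0') {
        /* nothing else to do */
    }
    return dest;
}

/* Hypothetical caller: makes the buffer a valid empty string first. */
void build_list(char *out)
{
    out[0] = '\0';                 /* now step 1 stops immediately */
    sketch_strcat(out, "41 ");
    sketch_strcat(out, "42 ");
}
```

Without the "out[0] = '\0';" line (or an initialiser such as ""), step 1 has nothing reliable to find, which is consistent with the behaviour changing depending on the data and on what was previously on the stack.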

 
For a data array, zero initialisation is common.  Typically you do this with :
>
     int xs[100] = { 0 };
>
That puts the explicit 0 in the first element of xs, and then the rest of the array is cleared with zeros.
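A small check of that rule, as a sketch: the helper names "tail_is_zero" and "demo" are invented for this example. The same rule covers a char array initialised with a short string literal, where the bytes after the literal are zero as well.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if xs[from..n-1] are all zero. */
int tail_is_zero(const int *xs, size_t from, size_t n)
{
    for (size_t i = from; i < n; i++) {
        if (xs[i] != 0) {
            return 0;
        }
    }
    return 1;
}

/* Returns 1 if the initialisers behaved as described. */
int demo(void)
{
    int xs[100] = { 7 };        /* xs[0] is 7, xs[1] to xs[99] are 0 */
    char outliers[100] = "";    /* all 100 bytes are 0 */

    return xs[0] == 7
        && tail_is_zero(xs, 1, 100)
        && outliers[0] == '\0'
        && outliers[99] == '\0';
}
```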
 
I recommend never using "char" as a type unless you really mean a character, limited to 7-bit ASCII.  So if your "outliers" array really is an array of such characters, "char" is fine.  If it is intended to be numbers and for some reason you specifically want 8-bit values, use "uint8_t" or "int8_t", and initialise with { 0 }.
 I did mean characters, limited to: 0-9a-zA-Z()
OK.

 I think I'm using the char variable correctly.
  sprintf(tempchar,"%d ",outlier);
  strcat(outliers,tempchar);
Yes.  Without your source code, I could only guess.
But see earlier in this post for a suggestion to improve your use of the variable.

 
A major lesson here is to learn how to use your tools.  C is not a forgiving language.  Make use of all the help your tools can give you - enable warnings here.  "gcc -Wall" enables a range of common warnings with few false positives in normal well-written code, including ones that check for attempts to read uninitialised data.
 I always use -Wall, and I was using it here.
 
Good.  Unfortunately, good though gcc is, it is not perfect.  Improving warnings is a continuous endeavour for the gcc developers, but they usually have to err on the side of avoiding false positives.

 "-Wextra" enables a slew of extra warnings.  Some of these will annoy people and trigger on code they find reasonable, while most are good choices for a lot of code - but personal preference varies significantly.  And remember to enable optimisation, since it makes the static checking more powerful.
 Just did this:
gcc -Wall -Wextra -O3 mmv2.c -o mmv2 -lm
 
"-O3" is rarely much use - stick to "-O2" for normal use.  The extra optimisations enabled by "-O3" help in some code, but work worse on other code due to the increased size, so they should be used with care.  Certainly "-O3" is rarely worth it unless you are also using a "-march=" flag (such as "-march=native") to tune for a particular processor and enable stuff like vectorisation.  Getting the fastest code is more of an art than a science!

and no warnings or errors at all.
 But: it now aborts near the front when using consecutive data points (but not randoms).
 *** buffer overflow detected ***: terminated
Aborted
 I'm actually happy about that.  I should be able to find and fix it.
  
If you /really/ want gcc to zero out such local data automatically, use "-ftrivial-auto-var-init=zero".  But it is much better to use warnings and write correct code - options like that one are an addition to well-checked code for paranoid software in security-critical contexts.
  Great answer!   I can always count on D Brown for excellent advice. Thank you.
 
I try :-)
You get the best results by combining the advice from a variety of people here, along with your own experimentation.

