Re: realloc() - frequency, conditions, or experiences about relocation?

Subject : Re: realloc() - frequency, conditions, or experiences about relocation?
From : malcolm.arthur.mclean (at) *nospam* gmail.com (Malcolm McLean)
Newsgroups : comp.lang.c
Date : 17. Jun 2024, 17:36:57
Organisation : A noiseless patient Spider
Message-ID : <v4pl6q$nn9o$1@dont-email.me>
References : 1 2 3 4 5 6
User-Agent : Mozilla Thunderbird
On 17/06/2024 15:33, Ben Bacarisse wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
 
On 17/06/2024 10:55, Ben Bacarisse wrote:
Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
>
On 17/06/2024 10:18, Ben Bacarisse wrote:
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
>
In a recent thread realloc() was a substantial part of the discussion.
"Occasionally" the increased data storage will be relocated along
with the previously stored data. On huge data sets that might be a
performance factor. Is there any experience or are there any concrete
factors about the conditions when this relocation happens? - I could
imagine that it's no issue as long as you're in some kB buffer range,
but if, say, we're using realloc() to substantially increase buffers
often it might be an issue to consider. It would be good to get some
feeling for those internals.
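A rough way to get a feeling for this on a particular system is to measure it. The sketch below grows a buffer in fixed steps and counts how often realloc() hands back a different address. The name count_moves and its parameters are made up for illustration, and the result depends entirely on the allocator, the sizes involved and the state of the heap.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical probe: grow a buffer from `start` to at most `limit`
   bytes in increments of `step`, and count how many times realloc()
   returned a different address.  Returns -1 if an allocation fails. */
static long count_moves(size_t start, size_t step, size_t limit)
{
    long moves = 0;
    size_t size = start;
    unsigned char *p;

    assert(start > 0 && step > 0 && limit >= start);

    p = malloc(size);
    if (p == NULL)
        return -1;

    while (limit - size >= step) {
        /* Save the address as an integer before realloc(), because the
           old pointer value must not be used after a successful call. */
        uintptr_t before = (uintptr_t)p;
        unsigned char *q = realloc(p, size + step);
        if (q == NULL) {
            free(p);
            return -1;
        }
        if ((uintptr_t)q != before)
            moves++;
        p = q;
        size += step;
    }
    free(p);
    return moves;
}
```

For example, count_moves(1024, 1024, (size_t)1 << 20) grows from 1k to 1 MiB in 1k steps. The count it returns says something only about the C library and platform it ran on.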
There is obviously a cost, but there is (usually) no alternative if
contiguous storage is required.  In practice, the cost is usually
moderate and can be very effectively managed by using an exponential
allocation scheme: at every reallocation multiply the storage space by
some factor greater than 1 (I often use 3/2, but doubling is often used
as well).  This results in O(log(N)) rather than O(N) allocations as in
your code that added a constant to the size.  Of course, some storage is
wasted (that /might/ be retrieved by a final realloc down to the final
size) but that's rarely significant.
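The exponential scheme described above can be sketched roughly as follows. This is only an illustration: the struct gbuf, the function names, the floor of 16 bytes and the choice of 3/2 as the factor are assumptions made for the example, not code from the thread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical growable byte buffer; all names are illustrative. */
struct gbuf {
    unsigned char *data;
    size_t len;   /* bytes in use */
    size_t cap;   /* bytes allocated */
};

/* Next capacity: multiply by 3/2, with a small floor (assumed to be
   16 bytes) and saturation at SIZE_MAX instead of overflow. */
static size_t gbuf_next_cap(size_t cap)
{
    if (cap < 16)
        return 16;
    if (cap > SIZE_MAX - cap / 2)
        return SIZE_MAX;
    return cap + cap / 2;
}

/* Append one byte.  Returns 0 on success, -1 if the buffer cannot
   grow; on failure the existing contents are left intact. */
static int gbuf_push(struct gbuf *b, unsigned char c)
{
    assert(b != NULL);
    if (b->len == b->cap) {
        size_t ncap;
        unsigned char *p;

        if (b->cap == SIZE_MAX)
            return -1;
        ncap = gbuf_next_cap(b->cap);
        p = realloc(b->data, ncap);   /* realloc(NULL, n) acts as malloc(n) */
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = ncap;
    }
    b->data[b->len++] = c;
    return 0;
}
```

A caller would start from a zero-initialised struct gbuf, call gbuf_push() for each incoming byte, and free(b.data) at the end. A final realloc() down to b.len is the optional trimming step mentioned above.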
>
So can we work it out?
What is "it"?
>
Let's assume for the moment that the allocations have a semi-normal
distribution,
What allocations?  The allocations I talked about don't have that
distribution.
>
with negative values disallowed. Now ignoring the first few
values, if we have allocated, say, 1K, we ought to be able to predict the
value by integrating the distribution from 1k to infinity and taking the
mean.
I have no idea what you are talking about.  What "value" are you looking
to calculate?
>
We have a continuously growing buffer, and we want the best strategy for
reallocations as the stream of characters comes at us. So, given we know how
many characters have arrived, can we predict how many will arrive, and
therefore ask for the best amount when we reallocate, so that we neither
make too many reallocations (reallocate on every byte received) nor ask for
too much (demand SIZE_MAX memory when the first byte is received)?
 Obviously not, or we'd use the prediction.  Your question was probably
rhetorical, but it didn't read that way.
 
Your strategy for avoiding these extremes is exponential growth.
 It's odd to call it mine.  It's very widely known and used.  "The one I
mentioned" might be a less confusing description.
 
You
allocate a small amount for the first few bytes. Then you use exponential
growth, with a factor of either 2 or 1.5. My question is whether or not we
can be cuter. And of course we need to know the statistical distribution of
the input files. And I'm assuming a semi-normal distribution, ignoring the
files with small values, which we will allocate enough for anyway.
>
And so we integrate the distribution between the point we are at and
infinity. Then we take the mean. And that gives us a best estimate of how
many bytes are to come, and therefore how much to grow the buffer by.
 I would be surprised if that were worth the effort at run time.  A
static analysis of "typical" input sizes might be interesting as that
could be used to get an estimate of good factors to use, but anything
more complicated than maybe a few factors (e.g. doubling up to 1MB then
3/2 thereafter) is likely to be too messy to be useful.
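The "few factors" idea in the example above (doubling up to 1MB, then 3/2) might look something like this. It is a sketch only: the names, the 1k starting size and reading "1MB" as 1 MiB are assumptions for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INITIAL_CAP    ((size_t)1024)       /* assumed starting size */
#define DOUBLING_LIMIT ((size_t)1 << 20)    /* "1MB", taken here as 1 MiB */

/* Two-factor policy: double while below DOUBLING_LIMIT, then grow by
   3/2.  Saturates at SIZE_MAX rather than wrapping around. */
static size_t next_capacity(size_t cap)
{
    size_t step, next;

    if (cap < INITIAL_CAP)
        return INITIAL_CAP;

    step = (cap < DOUBLING_LIMIT) ? cap : cap / 2;
    next = (cap > SIZE_MAX - step) ? SIZE_MAX : cap + step;

    assert(next >= cap);   /* the policy never shrinks the buffer */
    return next;
}
```

The whole policy lives in one small function, so the thresholds and factors can be retuned from a static analysis of input sizes without touching the code that calls realloc().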
>
There's virtually no run-time effort, unless you ask the caller to pass in a customised distribution, which you analyse on the fly, which would be quite a bit of work.
All the work is done beforehand. We need a statistical distribution of the file sizes of the files we are interested in. So, probably, text files on personal computers. Then we'll exclude the small files, say under 1k, which will have an odd distribution for various reasons, and which we are not interested in as we can easily afford 1k as an initial buffer.
And we're probably looking at a semi-normal, maybe log-normal distribution. There's no reason to suspect it would be anything odd. And with the normal distribution there is no closed-form integral, but tables of integrals are published.
So we convert 1k to a Z-score, integrate from that to infinity, halve the resulting tail probability, and look up the size that leaves that much probability above it. That gives us an estimate of the most likely file size: having established that the file is over 1k, half will be below and half above this size. So that's the next amount to realloc. Say, for the sake of argument, 4k. Then we do the same thing, starting from 4k, and working out the most likely file size, given that the file is over 4k. Now the distribution tends to flatten out towards the tail, so if the best guess, given at least 1k, was 4k, the best guess given 4k won't be 8k. It will be 10k, maybe 12k. Do the same again for 12k. And we'll get a series of numbers like this.
1k, 4k, 10k, 20k, 50k, 120k, 500k, 2MB, 8MB ...
and so on, rapidly increasing to SIZE_MAX. And then at runtime we just hardcode those in; it's a lookup table with not too many entries.
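A minimal sketch of such a lookup table follows. The entries simply copy the made-up series above (reading "k" as 1024 bytes, which is an assumption), so they are placeholders rather than values derived from real statistics. The series is left open-ended above, so the fallback to doubling past the last entry is also an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder schedule copied from the illustrative series; a real
   table would be computed offline from measured file sizes. */
static const size_t growth_table[] = {
    (size_t)1   * 1024,
    (size_t)4   * 1024,
    (size_t)10  * 1024,
    (size_t)20  * 1024,
    (size_t)50  * 1024,
    (size_t)120 * 1024,
    (size_t)500 * 1024,
    (size_t)2   * 1024 * 1024,
    (size_t)8   * 1024 * 1024
};

/* Return the first table entry larger than `cap`.  Past the end of
   the table, fall back to doubling (an assumption), saturating at
   SIZE_MAX. */
static size_t table_next_cap(size_t cap)
{
    size_t n = sizeof growth_table / sizeof growth_table[0];
    size_t i, next;

    for (i = 0; i < n; i++)
        if (growth_table[i] > cap)
            return growth_table[i];

    next = (cap > SIZE_MAX / 2) ? SIZE_MAX : cap * 2;
    assert(next >= cap);
    return next;
}
```

At run time the cost is a short linear scan per reallocation, which is negligible next to the realloc() call itself.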
Because we've chosen the median, half the time you will reallocate. You can easily fine-tune the strategy by choosing a proportion other than 0.5, depending on whether saving memory or reducing allocations is more important to you.
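Put in symbols, the rule might be written as below. The notation is introduced here only for illustration and is not from the thread: S(x) is the tail probability of the file-size distribution, m_k is the k-th buffer size, and p is the chosen proportion. The log-normal form is one possible model, as assumed above.

```latex
% Next buffer size: the size at which a fraction p of the files
% already known to exceed m_k will fit.
S(m_{k+1}) = (1-p)\,S(m_k), \qquad S(x) = P(X > x)

% With p = 1/2 this is the half-below, half-above size described above.
% For a log-normal model, \ln X \sim N(\mu, \sigma^2), using Z-scores:
z_k = \frac{\ln m_k - \mu}{\sigma}, \qquad
z_{k+1} = \Phi^{-1}\!\bigl(1 - (1-p)\,(1 - \Phi(z_k))\bigr), \qquad
m_{k+1} = e^{\mu + \sigma z_{k+1}}

% Each further step is needed with probability 1-p, so for a file
% larger than m_0 the expected number of reallocations is
\sum_{j \ge 0} (1-p)^{j} = \frac{1}{p}
```

With p = 0.5 that is two reallocations on average for any file that outgrows the initial buffer. Raising p trades memory for fewer reallocations.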
And the hard part is getting some real statistics to work on.
 >
Also, the cost of reallocations is not constant.  Larger ones are
usually more costly than small ones, so if one were going to a lot of
effort to make run-time guesses, that cost should be factored in as
well.
 
Unfortunately yes. Real optimisation problems can be almost impossible for reasons like this. If the cost of reallocations isn't constant, you've got to put in correcting factors, and then what was a fairly simple procedure becomes difficult.
--
Check out my hobby project.
http://malcolmmclean.github.io/babyxrc

