Re: realloc() - frequency, conditions, or experiences about relocation?

Subject : Re: realloc() - frequency, conditions, or experiences about relocation?
From : malcolm.arthur.mclean (at) *nospam* gmail.com (Malcolm McLean)
Newsgroups : comp.lang.c
Date : 17. Jun 2024, 14:45:19
Organisation : A noiseless patient Spider
Message-ID : <v4pb4v$lhgk$1@dont-email.me>
References : 1 2 3 4
User-Agent : Mozilla Thunderbird
On 17/06/2024 10:55, Ben Bacarisse wrote:
> Malcolm McLean <malcolm.arthur.mclean@gmail.com> writes:
>
>> On 17/06/2024 10:18, Ben Bacarisse wrote:
>>> Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
>>>
>>>> In a recent thread realloc() was a substantial part of the discussion.
>>>> "Occasionally" the increased data storage will be relocated along
>>>> with the previously stored data. On huge data sets that might be a
>>>> performance factor. Is there any experience or are there any concrete
>>>> factors about the conditions when this relocation happens? - I could
>>>> imagine that it's no issue as long as you're in some kB buffer range,
>>>> but if, say, we're using realloc() to substantially increase buffers
>>>> often it might be an issue to consider. It would be good to get some
>>>> feeling about that internal.
>>>
>>> There is obviously a cost, but there is (usually) no alternative if
>>> contiguous storage is required.  In practice, the cost is usually
>>> moderate and can be very effectively managed by using an exponential
>>> allocation scheme: at every reallocation multiply the storage space by
>>> some factor greater than 1 (I often use 3/2, but doubling is often used
>>> as well).  This results in O(log(N)) rather than O(N) allocations as in
>>> your code that added a constant to the size.  Of course, some storage is
>>> wasted (that /might/ be retrieved by a final realloc down to the final
>>> size) but that's rarely significant.
>>
>> So can we work it out?
>
> What is "it"?
>
>> Let's assume for the moment that the allocations have a semi-normal
>> distribution,
>
> What allocations?  The allocations I talked about don't have that
> distribution.
>
>> with negative values disallowed. Now ignoring the first few
>> values, if we have allocated, say, 1K, we ought to be able to predict the
>> value by integrating the distribution from 1k to infinity and taking the
>> mean.
>
> I have no idea what you are talking about.  What "value" are you looking
> to calculate?
We have a continuously growing buffer, and we want the best strategy for reallocations as the stream of characters comes at us. So, given we know how many characters have arrived, can we predict how many will arrive, and therefore ask for the best amount when we reallocate, so that we neither make too many reallocations (reallocating on every byte received) nor ask for too much (demanding SIZE_MAX memory when the first byte is received)?
Your strategy for avoiding these extremes is exponential growth. You allocate a small amount for the first few bytes. Then you use exponential growth, with a factor of either 2 or 1.5. My question is whether or not we can be cuter. And of course we need to know the statistical distribution of the input files. And I'm assuming a semi-normal distribution, ignoring the files with small values, which we will allocate enough for anyway.
And so we integrate the distribution between the point we are at and infinity. Then we take the mean. And that gives us a best estimate of how many bytes are to come, and therefore how much to grow the buffer by.
--
Check out my hobby project.
http://malcolmmclean.github.io/babyxrc
