Sujet : Re: realloc() - frequency, conditions, or experiences about relocation?
De : anton.txt (at) *nospam* gmail.moc (Anton Shepelev)
Groupes : comp.lang.c sci.stat.mathDate : 19. Jun 2024, 23:53:47
Autres entêtes
Organisation : A noiseless patient Spider
Message-ID : <20240620015347.9bdcc4df03ab63d096375450@gmail.moc>
References : 1 2 3 4 5 6 7 8
User-Agent : Sylpheed 3.7.0 (GTK+ 2.24.30; i686-pc-mingw32)
Malcolm McLean:
We have to have some knowledge. And what we probaby know
is that the input is a file stored on someone's personal
computer. And someone has published on the statistical
distribution of such files And they have a log-normal
distribution with a mean and a median which he gives. So
with that informaton, we can work out, given that a file
is at least N characters, what is the prbablity that an
allocation of any size will contain the whole file, and
how many bytes, on average will be wasted.
Observe that the standard algorithm of exponential growth is
memoryless and self-similar in that in does not depend on
context, or the history of previous reallocations. These
properties belong to (or even identify?) the exponential
distribution. We can therefore assume that exponential-
growth strategy is ideal for exponentially distributed
buffer sizes, and under that assumption determine the
relation between the CDF values (p) corresponding to
consequent re-allcoations:
p = e^x/L ,
p0 = 1-e^(L*x0) ,
p1 = 1-e^(L*x1) ,
x1 = k*x0 (by our strategy), =>
p1 = 1-(1-p0)^k .
which does not depend on the distribution and lets us
generalise this approach for any distribution:
x1 = Q( 1 - ( 1 - CDF(x0) )^k )
where:
x0 : the required size
x1 : the new recommended capacity
Q(p) : the p-Quantile of the given distribution
CDF(x): the CDF of the given distribution
k>1 : balance between speed and space efficiency
-- () ascii ribbon campaign -- against html e-mail/\ www.asciiribbon.org -- against proprietary attachments