Subject: Re: xxd -i vs DIY Was: C23 thoughts and opinions
From: bc (at) *nospam* freeuk.com (bart)
Newsgroups: comp.lang.c
Date: 29 May 2024, 18:41:25
Organization: A noiseless patient Spider
Message-ID : <v37pc5$186go$1@dont-email.me>
References : 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
User-Agent : Mozilla Thunderbird
On 29/05/2024 16:32, Michael S wrote:
> On Wed, 29 May 2024 15:16:06 +0100
> bart <bc@freeuk.com> wrote:
>> Your timing is 0.6 seconds to read 88MB and write, what, 300MB of
>> text?
>
> Much less. Only 193 MB. It seems, this DLL I was textualizing is stuffed
> with small numbers. That explains big part of the difference.
> I did another test with big 7z archive as an input:
> Input size: 116255887
> Output size: 425944020
> $ time ../quick_xxd/bin_to_listmb /d/bin/tmp.7z uu.txt
> real 0m1.170s
> user 0m0.000s
> sys 0m0.000s
> Almost exactly 100 MB/s which is only 1.4-1.6 times faster than your
> measurements.
Actually, the fastest timing I've got was 1.25 seconds (100MB input, 360MB output), but that was from my C version compiled with DMC (a 32-bit compiler). gcc was a bit slower.
> Each to his own.
> For me your code is unreadable, mostly due to very short names of
> variables that give no hint of usage, absence of declarations (I'd
> guess, you have them at the top of the function, for me it's no better
> than not having them at all) and zero comments.
> Besides, our snippets are not functionally identical. Yours don't handle
> write failures. Practically, on "big" computer it's a reasonable choice,
> because real I/O problems are unlikely to be detected at fwrite. They
> tend to manifest themselves much later. But on comp.lang.c we like to
> pretend that life is simpler and more black&white than it is in reality.
Below is a version with no declarations at all. It is in a dynamic scripting language.
It runs in 7.3 seconds (or 6.4 seconds if newlines are dispensed with).
It reads the input as a byte array, and assembles the output as a single string. The 'readbinfile' function could conceivably be replaced by one based around 'mmap'.
------------------------------------
numtable ::= (0:)
for i in 0..255 do
numtable &:= tostr(i)+","
od
s::=""
k:=0
for bb in readbinfile("data100") do
s +:= numtable[bb]
if ++k = 21 then
s +:= '\n'
k := 0
fi
od
writestrfile("data.txt", s)
------------------------------------
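For comparison, roughly the same table-driven approach might look like this in C. This is only a sketch: the names init_numtable and bytes_to_list are made up for illustration and are not taken from any code posted in this thread, and the caller is assumed to supply an output buffer of at least 4*n + n/per_line + 1 bytes.

```c
#include <stdio.h>
#include <string.h>
#include <stddef.h>

/* Hypothetical names, for illustration only. */
static char numtable[256][5];      /* "0," .. "255," plus terminating NUL */
static unsigned char numlen[256];  /* length of each table entry */

/* Precompute the text form of every possible byte value. */
static void init_numtable(void)
{
    for (int i = 0; i < 256; i++)
        numlen[i] = (unsigned char)sprintf(numtable[i], "%d,", i);
}

/* Convert n input bytes to comma-separated decimal text, inserting a
   newline after every per_line values. Assumes out has room for at
   least 4*n + n/per_line + 1 bytes. Returns the text length. */
static size_t bytes_to_list(const unsigned char *in, size_t n,
                            char *out, int per_line)
{
    size_t pos = 0;
    int k = 0;

    for (size_t i = 0; i < n; i++) {
        memcpy(out + pos, numtable[in[i]], numlen[in[i]]);
        pos += numlen[in[i]];
        if (++k == per_line) {
            out[pos++] = '\n';
            k = 0;
        }
    }
    out[pos] = '\0';
    return pos;
}
```

Calling init_numtable() once and then bytes_to_list(data, size, buf, 21) would mirror the loop above, with the 21-per-line count passed in as a parameter.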
An advantage of higher-level code is being able to trivially do stuff like this (output data in reverse order):
for bb in reverse(readbinfile("data100")) do
Functions like 'writestrfile' could also check that the resulting file size matches the length of the string being written.
Still looks complicated? Here's a one-line version:
writestrfile("data.txt", tostr(readbinfile("data100")))
However, the output looks like this:
(38, 111, ... 197)
This is acceptable syntax for my languages, but C would require braces:
writestrfile("data.txt", "{" + tostr(readbinfile("/c/data100"))[2..$-1] + "}")
It is convenient to deal with data in whole-file blobs. There are huge memory capacities available now; why not take advantage?
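The whole-file approach is not hard to get in C either. Below is a minimal sketch of a read-everything helper; the name read_whole_file is hypothetical, and it grows the buffer with realloc rather than asking for the file size up front, which is just one of several ways to do it.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper, for illustration only: reads an entire file into
   a malloc'd buffer. Returns NULL on failure; on success stores the byte
   count in *size_out and the caller must free() the result. */
static unsigned char *read_whole_file(const char *path, size_t *size_out)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return NULL;

    size_t cap = 1 << 16, len = 0;
    unsigned char *buf = malloc(cap);

    while (buf) {
        len += fread(buf + len, 1, cap - len, f);
        if (len < cap)
            break;                      /* short read: EOF or error */
        cap *= 2;                       /* buffer full: double it */
        unsigned char *tmp = realloc(buf, cap);
        if (!tmp) {
            free(buf);
            buf = NULL;
        } else {
            buf = tmp;
        }
    }

    if (buf && ferror(f)) {             /* distinguish error from EOF */
        free(buf);
        buf = NULL;
    }
    fclose(f);

    if (buf)
        *size_out = len;
    return buf;
}
```

The returned buffer can then be handed to whatever conversion routine is in use, in one call, much as readbinfile is used above.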