On Sun, 26 May 2024 16:25:51 +0100
bart <bc@freeuk.com> wrote:
On 26/05/2024 14:18, Michael S wrote:
Yes.
On Sun, 26 May 2024 12:51:12 +0100
bart <bc@freeuk.com> wrote:
On 26/05/2024 12:09, David Brown wrote:
On 26/05/2024 00:58, Keith Thompson wrote:
For a very large file, that could be a significant burden. (I
don't have any numbers on that.)
I do:
>
<https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3017.htm#design-efficiency-metrics>
>
(That's from a proposal for #embed for C and C++. Generating the
numbers and parsing them is akin to using xxd.)
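For anyone who has not used it, output in the 'xxd -i' style looks roughly
like this; the name 'blob' stands in for the input file name and the byte
values are only an example:

unsigned char blob_bin[] = {
  0x7f, 0x45, 0x4c, 0x46, 0x02, 0x01, 0x01, 0x00,
  0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
};
unsigned int blob_bin_len = 16;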
>
More useful links:
>
<https://thephd.dev/embed-the-details#results>
<https://thephd.dev/implementing-embed-c-and-c++>
>
(These are from someone who did a lot of the work for the
proposals, and prototype implementations, as far as I understand
it.)
>
>
>
Note that I can't say how much of a difference this will make in
real life. I don't know how often people need to include
multi-megabyte files in their code. It certainly is not at a
level where I would change any of my existing projects from
external generator scripts to using #embed, but I might use it in
future projects.
I've just done my own quick test (not in C, using embed in my
language):
>
[]byte clangexe = binclude("f:/llvm/bin/clang.exe")
>
proc main=
fprintln "clang.exe is # bytes", clangexe.len
end
>
>
This embeds the Clang C compiler, which is 119MB. It took 1.3
seconds to compile (note that my compiler is not optimised).
>
If I tried it using text: a 121M-line include file, with one number
per line, it took 144 seconds (I believe it used more RAM than was
available: each line will have occupied a 64-byte AST node, so
nearly 8GB, on a machine with only 6GB available RAM, much of
which was occupied).
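The kind of include file described here (one decimal number per line,
suitable for an array initializer) can be produced by a short generator
along these lines. This is only a sketch of such a tool, not bart's actual
generator, and the file names are placeholders:

/* dump_inc.c - write each byte of a binary file as a decimal number,
   one per line with a trailing comma, for use like:
       unsigned char data[] = {
       #include "out.inc"
       };
*/
#include <stdio.h>

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s infile > out.inc\n", argv[0]);
        return 1;
    }
    FILE *in = fopen(argv[1], "rb");
    if (!in) {
        perror(argv[1]);
        return 1;
    }
    int c;
    while ((c = fgetc(in)) != EOF)
        printf("%d,\n", c);   /* trailing comma is legal in an initializer */
    fclose(in);
    return 0;
}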
On my old PC, which was not the cheapest box in the shop but is more
than 10 years old, compilation speed for similarly organized (but much
smaller) text files is as follows:
MSVC 18.00.31101 (VS 2013) - 1950 KB/sec
MSVC 19.16.27032 (VS 2017) - 1180 KB/sec
MSVC 19.20.27500 (VS 2019) - 1180 KB/sec
clang 17.0.6 - 547 KB/sec (somewhat better with hex text)
gcc 13.2.0 - 580 KB/sec
>
So, MSVC compilers, esp. an old one, are somewhat faster than yours.
But if there was swapping involved, it's not comparable. How much
time does it take for your compiler to produce a 5MB byte array from
text?
Are you talking about a 5MB array initialised like this:
>
unsigned char data[] = {
    45,
    67,
    17,
    ... // 5M-3 more rows
};
>
The timing for 120M entries was challenging as it exceeded physical
memory. However, that test I can also do with C compilers. Results for
120 million lines of data are:
>
DMC        -  Out of memory
Tiny C     -  Silently stopped after 13 seconds (I thought it had
              finished, but it hadn't)
lccwin32   -  Insufficient memory
gcc 10.x.x -  Out of memory after 80 seconds
mcc        -  (My product) Memory failure after 27 seconds
Clang      -  Crashed after 5 minutes
MM         -  144s (compiler for my language)
>
So the compiler for my language did quite well, considering!
>
That's an interesting test as well, but I don't want to run it on my HW
right now. Maybe at night.
Faster than new MSVC, but slower than old MSVC.
Back to the 5MB test:
>
Tiny C 1.7s 2.9MB/sec (Tcc doesn't use any IR)
>
mcc 3.7s 1.3MB/sec (my product; uses intermediate ASM)
That's quite impressive.
DMC -- -- (Out of memory; 32-bit compiler)
>
lccwin32 3.9s 1.3MB/sec
>
gcc 10.x 10.6s 0.5MB/sec
>
clang 7.4s 0.7MB/sec (to object file only)
>
MM 1.4s 3.6MB/sec (compiler for my language)
>
MM 0.7s 7.1MB/sec (MM optimised via C and gcc -O3)
>
Does it generate object files, or go directly to exe?
Even if the latter, it's still impressive.
>
As a reminder, when using my version of 'embed' in my language,
embedding a 120MB binary file took 1.3 seconds, about 90MB/second.
>
But both are much faster than compiling through text. Even "slow"
40MB/s is 6-7 times faster than the fastest of the compilers in my
tests.
Do you have a C compiler that supports #embed?
>
No, I just blindly believe the paper.
But it probably would be available in clang this year and in gcc around
the start of next year. At least I hope so.
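For reference, usage as proposed in N3017 (linked above) drops straight
into an array initializer; the file name here is only a placeholder:

#include <stdio.h>

static const unsigned char clangexe[] = {
#embed "clang.exe"   /* expands to a comma-separated list of integers */
};

int main(void)
{
    printf("clang.exe is %zu bytes\n", sizeof clangexe);
    return 0;
}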
It's generally understood that processing text is slow, if
representing byte-at-a-time data. If byte arrays could be represented
as sequences of i64 constants, it would improve matters. That could
be done in C, but awkwardly, by aliasing a byte-array with an
i64-array.
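A minimal sketch of that idea, assuming the generator packs the bytes
little-endian into the 64-bit words (a union sidesteps the aliasing
awkwardness; the byte values are only an example):

#include <stdio.h>
#include <stdint.h>

/* 8 bytes per initializer constant instead of 1 */
static const union {
    uint64_t words[2];
    unsigned char bytes[16];
} blob = { .words = {
    0x0706050403020100ull,   /* bytes 0..7, packed little-endian */
    0x0f0e0d0c0b0a0908ull    /* bytes 8..15 */
} };

int main(void)
{
    for (size_t i = 0; i < sizeof blob.bytes; i++)
        printf("%02x ", blob.bytes[i]);
    putchar('\n');
    return 0;
}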
>
I don't think that conversion from text to binary is a significant
bottleneck here. In order to get a feel for it, I wrote a tiny
program that converts a comma-separated list of integers to a binary
file. Something quite similar to 'xxd -r', but with an input format
that is a better fit for our requirements. Not identical to the full
requirements, of course: my utility can't handle comments and probably
a few other things that are allowed in C sources, but the conversion
part is pretty much the same.
It runs at 6.7 MB/s with decimal input and at 9.1 MB/s with hex input.
That's with a SATA SSD of the sort that went out of fashion before 2020.
So, it seems that at least in the case of gcc the conversion part
constitutes less than 10% of the total run time.
If you want to play with it yourself, here is my source:
// list_to_bin.c
// takes textual input from standard input
// writes output to binary file
// Usage:
//   list_to_bin outfile.bin < inp_file.txt
//
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
int main(int argz, char** argv)
{
  if (argz > 1) {
    FILE* fp = fopen(argv[1], "wb");
    if (fp) {
      char buf[2048];
      _Bool look_for_comma = 0;
      for (;;) {
        if (fgets(buf, sizeof(buf), stdin) != buf)
          break;
        char* p = buf;
        for (;;) {
          char c = *p;
          if (isgraph((unsigned char)c)) {
            if (look_for_comma) {
              if (c == ',') {
                look_for_comma = 0;
                ++p;
              } else {
                goto done;
              }
            } else {
              char* endp;
              long val = strtol(p, &endp, 0);
              if (endp == p) // not a number
                goto done;
              fputc((unsigned char)val, fp);
              p = endp;
              look_for_comma = 1;
            }
          } else {
            if (c == 0)
              break; // end of line
            ++p; // skip space or control character
          }
        }
      }
    done:
      fclose(fp);
    } else {
      perror(argv[1]);
      return 1;
    }
  }
  return 0;
}
>