Bart <bc@freeuk.com> wrote:
> On 24/11/2024 15:00, Waldek Hebisch wrote:
>> Bart <bc@freeuk.com> wrote:
>>> On 24/11/2024 05:03, Waldek Hebisch wrote:
>>>> Bart <bc@freeuk.com> wrote:
>>>>> As for sizes:
>>>>>
>>>>> c:\c>dir hello.exe
>>>>> 24/11/2024  00:44          2,048 hello.exe
>>>>>
>>>>> c:\c>dir a.exe
>>>>> 24/11/2024  00:44         91,635 a.exe   (48K with -s)
>>>>>
>>>>> (At least that's one good thing about gcc writing out that weird
>>>>> a.exe each time; I can compare both exes!)
>>>> AFAICS this is one-time Windows overhead + default layout rules for
>>>> the linker. On Linux I get 15952 bytes by default, 14472 after
>>>> stripping. However, the actual code + data size is 1904, and even
>>>> of this most is crap needed to support extra features of the C
>>>> library.
>>>>
>>>> In other words, this is mostly irrelevant, as people who want the
>>>> size down can link with different options to get a smaller
>>>> executable. Actual hello world code size is 99 bytes when compiled
>>>> by gcc (default options) and 64 bytes by tcc.
>>> I get a size of 3KB for tcc compiling hello.c under WSL.
>> That more or less agrees with the file size that I reported. I
>> prefer to look at what 'size' reports, and at .o files.
> Oh, I thought you were reporting sizes of 99 and 64 bytes, in response
> to tcc's 2048 bytes.
>
> So I'm not sure what you mean by 'actual' size, unless it is the same
> as that reported by my product here (comments added):
>
>   c:\cx>cc -v hello
>   Compiling hello.c to hello.exe
>   Code size:     34 bytes    # .text
>   Idata size:    15          # .data
>   Code+Idata:    49
>   Zdata size:     0          # .bss
>   EXE size:   2,560
>
> So at 49 bytes, I guess I win!
It looks so. Yes, I mean code + data size; if you have multiple
functions this adds up, while the constant overhead remains constant.
On Linux each program is supposed to have a header, and that
puts an absolute lower bound on the size of the program (no header =>
the OS considers it invalid). In modern programs you are
supposed to have a separate code area, read-only data area and
mutable data area. In a running program each of them consists
of an integral number of pages. If you arrange them so that the OS
can load them most easily, you get something like 12kB or 16kB
(actually a bit smaller, as normally the file will not contain
the unused part of the last page). But if you add more code or
data, the size will grow only slightly or not at all: you
will see growth on the last page, and when one of the inner pages
overflows you need to start a new page.
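As a rough sketch (assuming a GNU toolchain; the commands are standard
but the exact numbers will vary by system), the difference between the
on-disk file size and the actual code + data size can be seen like this:

    /* hello.c -- a minimal program for size experiments */
    #include <stdio.h>

    int main(void) {
        puts("hello, world");
        return 0;
    }

    $ gcc hello.c -o hello          # default layout
    $ gcc -Os -s hello.c -o small   # optimize for size, strip symbols
    $ size hello                    # reports .text/.data/.bss sizes

Here 'size' gives the code + data numbers discussed above, while
'ls -l' (or 'dir') shows the page-padded on-disk size.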
> It can also do manipulations that are harder in a 'softer', safer HLL.
>
> (My scripting language however can still do most of those underhand
> things.)
Anything computational can be done in a HLL. You may wish to
play tricks to save time, or possibly some packing tricks to
save memory. But packing tricks can be done in a HLL (say by
treating the whole memory as a big array of u64), so this really
boils down to speed.
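A minimal sketch of such a packing trick written in plain C (all names
here are illustrative): sixteen 4-bit values stored in a single u64,
which is exactly the kind of shifting and masking an HLL version would
do on the elements of a big u64 array.

    #include <stdint.h>
    #include <stdio.h>

    /* Write 'value' (0..15) into 4-bit slot 'slot' (0..15) of 'word'. */
    static uint64_t pack_set(uint64_t word, int slot, unsigned value) {
        int shift = slot * 4;
        word &= ~((uint64_t)0xF << shift);            /* clear the slot */
        word |= ((uint64_t)(value & 0xF)) << shift;   /* store the value */
        return word;
    }

    /* Read back the 4-bit value in 'slot'. */
    static unsigned pack_get(uint64_t word, int slot) {
        return (unsigned)((word >> (slot * 4)) & 0xF);
    }

    int main(void) {
        uint64_t w = 0;
        w = pack_set(w, 3, 9);
        w = pack_set(w, 15, 5);
        printf("%u %u\n", pack_get(w, 3), pack_get(w, 15));  /* 9 5 */
        return 0;
    }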
> I'm sure that with Python, say, pretty much anything can be done given
> enough effort. Even if it means cheating by using external add-on
> modules to get around language limitations, like using the ctypes
> module, which you will likely find uses C code.
I did not look at how Python does its things. In one system that
I use there is a rather general routine, written in assembler, which
can call routines using the C calling convention. The assembler routine
performs simple data conversion, like removing tags so that C
sees raw machine integers or floats. It also knows which arguments
are supposed to go on the stack and which should be in registers.
There is a less complete routine which allows callbacks from C;
this one abuses C (it is invalid C which happens to work OK in
all C compilers used to compile the system). There are a bunch of
other assembler support routines, like access to arbitrary
bitstrings, byte copy (used to copy arrays when needed), etc.
The rest is in the language itself: the code generator knows about
references to bitstrings, and in simple cases generates inline
code and passes the general case to assembler support. There are
language-defined data structures to represent external pointers
and functions. At a higher level there is a parser for C
declarations which can generate code to repack data structures
from the C version to the internal one and back.
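For illustration only (this is a hypothetical C rendering, not the
actual system's assembler): many runtimes mark small integers with a
low tag bit, and part of the trampoline's job is stripping such tags so
that the C routine sees raw machine integers.

    #include <stdint.h>
    #include <stdio.h>

    typedef intptr_t tagged;     /* illustrative tagged machine word */

    /* Assumed convention: low bit 1 marks a small integer (fixnum). */
    static tagged   make_fixnum(intptr_t n) { return (n << 1) | 1; }
    static intptr_t untag_fixnum(tagged t)  { return t >> 1; }

    /* An ordinary C function expecting raw integers. */
    static long c_add(long a, long b) { return a + b; }

    int main(void) {
        tagged x = make_fixnum(20), y = make_fixnum(22);
        /* The trampoline's role: strip tags, then call the C routine. */
        long r = c_add((long)untag_fixnum(x), (long)untag_fixnum(y));
        printf("%ld\n", r);      /* 42 */
        return 0;
    }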
Concerning cheating, of course Python is cheating a lot. It has
several routines which work on sizeable pieces of data. Those
routines are coded in C or C++, so you get optimized C speed
when you call them.
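The kind of routine meant here (an illustrative sketch, not CPython's
actual source) is a bulk operation: one interpreter-level call that
loops in C, so the per-element work runs at C speed and interpretation
overhead is paid once per call rather than once per element.

    #include <stddef.h>

    /* Sum a whole array in a single call. */
    double sum_doubles(const double *a, size_t n) {
        double s = 0.0;
        for (size_t i = 0; i < n; i++)
            s += a[i];
        return s;
    }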
> This is different from having things part of the core language so they
> become effortless and natural.
>
> But everything you've said seems to have backed up my remark that
> people only seem to consider two possibilities:
>
> * Either a scripting language where it doesn't matter that it's 1-2
>   magnitudes slower than native code
>
> * Or a compiled language where it absolutely MUST be at least as fast
>   as gcc/clang-O3. Only 20 times faster than CPython is not enough!
You ignored what I wrote about compiled higher-level languages:
they exist, have speed competitive with your low-level language,
and some people use them. The majority seem to go with interpreted
languages. Note that interpreted languages frequently have a
large library of performance-critical routines written in a
lower-level language. Do not be surprised that they want an
optimizing compiler for them.
> (In my JPEG timings I posted earlier today, CPython was 175 times
> slower than gcc-O3, and 48-64 times slower than unoptimised C.
>
> Applying the simplest optimisation (which I can tell you adds only 10%
> to compilation time) made native code over 100 times faster than
> CPython, and only 50% slower than gcc-O3. This was on a deliberately
> large input.
>
> Basically, if you are generating even the worst native code, then it
> will already wipe the floor with any scripting language, when comparing
> them both executing the same algorithm.)
But the competition is not fair; the other side is cheating. Note
that using a low-level language, coding effort will be comparable
to C. You may save some time if you get better diagnostics.
There were studies claiming that stronger type checking reduces
the effort needed to write a correct program. But the main increase in
productivity comes from higher-level constructs. Actually,
probably the biggest gain is when you can reuse existing code, which
means that popular languages have a very big advantage over less
popular ones. You need rather strong advantages to
overcome the popularity advantage of another language. Faster
compilation, while nice, has a limited effect. And people have
ways to mitigate long compile times. So the normal justification
for using a low-level language is "I need runtime speed". And
in such a case it is natural to use the compiler giving the fastest
runtime speed.
--
                              Waldek Hebisch