On 19/11/2024 01:53, Waldek Hebisch wrote:
Bart <bc@freeuk.com> wrote:
On 10/11/2024 06:00, Waldek Hebisch wrote:
Bart <bc@freeuk.com> wrote:
I would consider a much more elaborate one, putting the onus on external
tools and still having an unpredictable result, to be the poorer of the two.
>
You want to create a language that is easily compilable, no matter how
complex the input.
Normally, time spent _using_ a compiler should be greater than the time
spent writing it. If a compiler gets enough use, that
justifies some complexity.
That doesn't add up: the more the compiler gets used, the slower it
should get?!
More complicated does not mean slower. Binary search or hash tables
are more complicated than linear search, but for larger data they may
be much faster.
That's not the complexity I had in mind. The 100-200MB sizes of
LLVM-based compilers are not because they use hash-tables over linear
search.
More generally, I want to minimize the time spent by the programmer,
that is, the _sum over all iterations leading to a correct program_ of
compile time and "think time". A compiler that compiles more slowly,
but needs fewer iterations due to better diagnostics, may win.
Also, humans perceive a 0.1s delay almost like no delay at all.
So it does not matter whether a single compilation step takes 0.1s or
0.1ms. Modern computers can do a lot of work in 0.1s.
What's the context of this 0.1 seconds? Do you consider it long or short?
My tools can generally build my apps from scratch in 0.1 seconds; big
compilers tend to take a lot longer. Only Tiny C is in that ballpark.
So I'm failing to see your point here. Maybe you picked up that 0.1
seconds from an earlier post of mine and are suggesting I ought to be
able to do a lot more analysis within that time?
Yes. This may lead to some complexity. A simple approach is to
avoid obviously useless recompilation ('make' does this).
A more complicated approach may keep some intermediate data and
try to "validate" it first. If the previous analysis is still valid,
then it can be reused. If something significant changes, then
it needs to be re-done. But many changes have only a very local
effect, so at least theoretically re-using analyses could
save substantial time.
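To illustrate the "keep and validate intermediate data" idea in C terms
(just a sketch with invented names, not anyone's actual build system):
store a content hash per source file next to its analysis results, and
redo the analysis only for files whose hash has changed:

/* Sketch of reusing cached per-file analysis results.
   All names here are invented for illustration. */
#include <stdint.h>
#include <stdio.h>

/* FNV-1a hash of a file's contents; returns 0 on I/O error. */
uint64_t hash_file(const char *path) {
    FILE *f = fopen(path, "rb");
    if (!f) return 0;
    uint64_t h = 14695981039346656037ull;   /* FNV offset basis */
    int c;
    while ((c = fgetc(f)) != EOF) {
        h ^= (uint64_t)(unsigned char)c;
        h *= 1099511628211ull;               /* FNV prime */
    }
    fclose(f);
    return h;
}

struct cache_entry {
    char     path[256];
    uint64_t hash;       /* hash at the time the analysis was stored */
    void    *analysis;   /* opaque pointer to cached analysis results */
};

/* Reuse the cached analysis if the source is unchanged, otherwise redo it. */
void *get_analysis(struct cache_entry *e, void *(*analyse)(const char *path)) {
    uint64_t now = hash_file(e->path);
    if (e->analysis && now == e->hash)
        return e->analysis;          /* source unchanged: reuse */
    e->hash = now;
    e->analysis = analyse(e->path);  /* changed: re-analyse this file only */
    return e->analysis;
}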
I consider compilation (turning textual source code into a form that can
be run, typically binary native code) to be a completely routine task
that should be as simple and as quick as flicking a light switch.
Anything beyond that, such as a deep analysis of the program, I
consider to be a quite different task. I'm not saying there is no place
for it, but I don't agree that it should be integrated into every compiler
and always invoked.
As it is now, that last statement is the '0' value (any int value will do).
What should my compiler report instead? What analysis should it be
doing? What would that save me from typing?
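For readers without the earlier posts, the example is something along
these lines (a reconstruction with made-up values, not the exact code
from upthread): a function mapping a few known inputs, with a dummy
final return to satisfy the compiler:

/* Reconstruction of the kind of example under discussion;
   the names and values are hypothetical. */
int bitwidth(int category) {
    if (category == 1) return 8;
    if (category == 2) return 16;
    if (category == 3) return 32;
    if (category == 4) return 64;
    return 0;   /* never reached for valid input; any int value would do */
}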
Currently, in the typed language that I use, a literal translation of
the example hits a hole in the checks, that is, the code is accepted.
Concerning the needed analyses: one thing needed is a representation of
the type, either a Pascal range type or an enumeration type (the example
is _very_ unnatural, because in modern programming magic numbers
are avoided and there would be some symbolic representation
adding meaning to the numbers). Second, the compiler must recognize
that this is a "multiway switch" and collect the conditions.
The example came from C. Even if written as a switch, C switches do not
return values (and also are hard to even analyse as to which branch is
which).
In my languages, switches can return values, and a switch written as the
last statement of a function is considered to do so, even if each branch
uses an explicit 'return'. Then, it will consider a missing ELSE a 'hole'.
It will not do any analysis of the range other than what is necessary to
implement the switch (duplicate values, span of values, range-checking when
using jump tables).
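Roughly, that analysis looks something like this (a generic sketch with
invented names, not taken from any particular compiler): find the span,
reject duplicate case values, and decide whether the cases are dense
enough for a jump table.

#include <stdbool.h>
#include <stddef.h>

struct switch_plan {
    long min, max;     /* span of case values */
    bool use_table;    /* dense enough for a jump table? */
};

/* Returns false on an empty or duplicated case list. */
bool plan_switch(const long *cases, size_t n, struct switch_plan *out) {
    if (n == 0) return false;
    long lo = cases[0], hi = cases[0];
    for (size_t i = 1; i < n; i++) {
        for (size_t j = 0; j < i; j++)
            if (cases[i] == cases[j])
                return false;                 /* duplicate case value: error */
        if (cases[i] < lo) lo = cases[i];
        if (cases[i] > hi) hi = cases[i];
    }
    out->min = lo;
    out->max = hi;
    /* Use a table when at least a quarter of the span is populated;
       otherwise fall back to compare-and-branch. The threshold is arbitrary. */
    out->use_table = (hi - lo + 1) <= (long)(n * 4);
    return true;
}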
So the language may require you to supply a dummy 'else x' or 'return
x'; so what?
The alternative appears to be one of:
* Instead of 'else' or 'return', to write 'unreachable', which puts some
trust, not in the programmer, but in some person calling your function
who does not have sight of the source code, to avoid calling it with
invalid arguments (see the C sketch below)
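In C terms that alternative looks roughly like this (using the GCC/Clang
builtin; C23 also provides unreachable() in <stddef.h>). If a caller does
pass an unhandled value, the behaviour is simply undefined rather than
diagnosed:

/* Hypothetical example, reusing the reconstructed function from above. */
int bitwidth(int category) {
    switch (category) {
    case 1: return 8;
    case 2: return 16;
    case 3: return 32;
    case 4: return 64;
    default: __builtin_unreachable();   /* promise: this cannot happen */
    }
}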
Once you have such a representation (which may be desirable for other
reasons), it is easy to determine the set of handled values. More
precisely, in this example we just have a small number of discrete
values. A more ambitious compiler may have a list of ranges.
If the type also specifies a list of values or a list of ranges, then
it is easy to check whether all values of the type are handled.
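A minimal sketch of that check, assuming the type carries an explicit
list of permitted ranges and the switch yields a list of handled ranges
(all names invented for illustration):

#include <stdbool.h>
#include <stddef.h>

struct range { long lo, hi; };   /* inclusive bounds */

/* True if value v lies in one of the handled ranges. */
bool covered(long v, const struct range *handled, size_t nh) {
    for (size_t i = 0; i < nh; i++)
        if (v >= handled[i].lo && v <= handled[i].hi)
            return true;
    return false;
}

/* True if every value of the type is handled; otherwise *hole is set to
   the first unhandled value found. (A real compiler would compare sorted
   range lists instead of scanning value by value.) */
bool exhaustive(const struct range *type_ranges, size_t nt,
                const struct range *handled, size_t nh, long *hole) {
    for (size_t i = 0; i < nt; i++)
        for (long v = type_ranges[i].lo; v <= type_ranges[i].hi; v++)
            if (!covered(v, handled, nh)) { *hole = v; return false; }
    return true;
}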
The types are typically plain integers, with ranges from 2**8 to 2**64.
The ranges associated with application needs will be more arbitrary.
If talking about a language with ranged integer types, then there might
be more point to it, but that is itself a can of worms. (It's hard to do
without getting halfway to implementing Ada.)
You can't do this stuff with the compilers David Brown uses; I'm
guessing you can't do it with your preferred ones either.
To recompile the typed system I use (about 0.4M lines) on a new fast
machine I need about 53s. But that is kind of cheating:
- this time is for a parallel build using 20 logical cores
- the compiler is not written in the language it compiles (but in an
untyped version of it)
- actual compilation of the compiler is a small part of the total
compile time
On a slow machine compile time can be as large as 40 minutes.
40 minutes for 400K lines? That's 160 lines per second; how old is this
machine? Is the compiler written in Python?
An untyped system that I use has about 0.5M lines and recompiles
itself in 16s on the same machine. This one uses a single core.
On a slow machine compile time may be closer to 2 minutes.
So 4K to 30Klps.
Again, compiler compile time is only a part of the build time.
Actually, one time-intensive part is creating the index for the included
documentation.
Which is not going to be part of a routine build.
Another is C compilation for a library file
(the system has image-processing functions and the low-level part of
image processing is done in C). Recompilation starts from a
minimal version of the system; rebuilding this minimal
version takes 3.3s.
My language tools work on a whole program, where a 'program' is a single
EXE or DLL file (or a single OBJ file in some cases).
A 'build' then turns N source files into 1 binary file. This is the task
I am talking about.
A complete application may have several such binaries and a bunch of
other stuff. Maybe some source code is generated by a script. This part
is open-ended.
However each of my current projects is a single, self-contained binary
by design.
Anyway, I do not need the cascaded recompilation that you present.
Both systems above have incremental compilation, the second one
at statement/function level: it offers an interactive prompt
which takes a statement from the user, compiles it and immediately
executes it. Such a statement may define a function or perform compilation.
Even on a _very_ slow machine there is no noticeable delay due to
compilation, unless you feed the system some oversized statement
or function (presumably from a file).
This sounds like a REPL system. There, each line is a new part of the
program which is processed, executed and discarded.
In that regard, it
is not really what I am talking about, which is AOT compilation of a
program represented by a bunch of source files.
Or can a new line redefine something, perhaps a function definition,
previously entered amongst the last 100,000 lines? Can a new line
require compilation of something typed 50,000 lines ago?
What happens if you change the type of a global; are you saying that
none of the program's code needs revising?
An untyped system
What do you mean by an untyped system? To me it usually means
dynamically typed.