Bart <bc@freeuk.com> wrote:
On 10/11/2024 06:00, Waldek Hebisch wrote:
Bart <bc@freeuk.com> wrote:
I'd consider a much more elaborate one that puts the onus on external
tools, and still has an unpredictable result, to be the poorer of the two.
>
You want to create a language that is easily compilable, no matter how
complex the input.
Normally, time spent _using_ a compiler should be bigger than time
spent writing the compiler. If a compiler gets enough use, it
justifies some complexity.
That doesn't add up: the more the compiler gets used, the slower it
should get?!
More complicated does not mean slower. Binary search or hash tables
are more complicated than linear search, but for larger data they may
be much faster. Similarly, a compiler may be simplified by using
simpler but slower methods, and a more complicated compiler may use
faster methods. This is particularly relevant here: a simple compiler
may keep a list of cases or ranges and linearly scan those. A more
advanced one may use, say, a tree structure.
More generally, I want to minimize time spent by the programmer,
that is, the _sum over all iterations leading to a correct program_ of
compile time and "think time". A compiler that compiles slower,
but allows fewer iterations due to better diagnostics, may win.
Also, humans perceive a 0.1s delay almost like no delay at all.
So it does not matter if a single compilation step is 0.1s or
0.1ms. Modern computers can do a lot of work in 0.1s.
The sort of analysis you're implying I don't think belongs in the kind
of compiler I prefer. Even if it did, it would be later on in the
process than the point where the above restriction is checked, so
wouldn't exist in one of my compilers anyway.
Sure, you design your compiler as you like.
I don't like open-ended tasks like this where compilation time could end
up being anything. If you need to keep recompiling the same module, then
you don't want to repeat that work each time.
Yes. This may lead to some complexity. A simple approach is to
avoid obviously useless recompilation ('make' is doing this).
A more complicated approach may keep some intermediate data and
try to "validate" it first. If the previous analysis is valid,
then it can be reused. If something significant changes, then
it needs to be re-done. But many changes only have a very local
effect, so at least theoretically re-using analyses could
save substantial time.
Concerning open-ended, my attitude is that the compiler should make
an effort which is open-ended in the sense that when a new method is
discovered the compiler may be extended to do more work.
OTOH in a "single true compiler" world, the compiler may say "this is
too difficult, giving up". Of course, when trying something
very hard the compiler is likely to run out of memory or the user will
stop it. But the compiler may give up earlier. Of course, this
is unacceptable for a standardized language, where people move
programs between different compilers. If a compiler can legally
reject a program because of its limitations, and is doing this
with significant probability, then portability between compilers
is severely limited. But if there is a way to disable extra
checks, then this may work. This is one of the reasons why
'gcc' has so many options: users that want it can get stronger
checking, but if they want, 'gcc' will accept lousy code
too.
I am mainly concerned with clarity and correctness of source code.
So am I. I try to keep my syntax clean and uncluttered.
Dummy 'else' doing something may hide errors.
So can 'unreachable'.
>
A dummy 'else' signaling an
error means that something which could be a compile-time error is
only detected at runtime.
A compiler that detects most errors of this sort is IMO better than
a compiler which makes no effort to detect them. And clearly, once
the problem is formulated in a sufficiently general way, it becomes
unsolvable. So I do not expect a general solution, but expect
reasonable effort.
So how would David Brown's example work:
int F(int n) {
    if (n == 1) return 10;
    if (n == 2) return 20;
}
/You/ know that values -2**31 to 0 and 3 to 2**31-1 are impossible; the
compiler doesn't. It's likely to tell you that you may run off the end
of the function.
So what do you want the compiler to do here? If I try it:
func F(int n)int =
    if n=1 then return 10 fi
    if n=2 then return 20 fi
end
It says 'else needed' (in that last statement). I can also shut it up
like this:
func F(int n)int =        # int is i64 here
    if n=1 then return 10 fi
    if n=2 then return 20 fi
    0
end
Since now that last statement is the '0' value (any int value will do).
What should my compiler report instead? What analysis should it be
doing? What would that save me from typing?
Currently, in the typed language that I use, a literal translation of
the example hits a hole in the checks, that is, the code is accepted.
Concerning needed analyses: one thing needed is a representation of the
type, either a Pascal range type or an enumeration type (the example
is _very_ unnatural because in modern programming magic numbers
are avoided and there would be some symbolic representation
adding meaning to the numbers). Second, the compiler must recognize
that this is a "multiway switch" and collect the conditions. Once
you have such a representation (which may be desirable for other
reasons) it is easy to determine the set of handled values. More
precisely, in this example we just have a small number of discrete
values. A more ambitious compiler may keep a list of ranges.
If the type also specifies a list of values or a list of ranges, then
it is easy to check if all values of the type are handled.
Normally you do not need very complex analysis:
>
I don't want to do any analysis at all! I just want a mechanical
translation as effortlessly as possible.
>
I don't like unbalanced code within a function because it's wrong and
can cause problems.
Well, I demand more from the compiler than you do...
Perhaps you're happy for it to be bigger and slower too. Most of my
projects build more or less instantly. Here 'ms' is a version that runs
programs directly from source (the first 'ms' is 'ms.exe' and subsequent
ones are 'ms.m', the lead module):
c:\bx>ms ms ms ms ms ms ms ms ms ms ms ms ms ms ms ms hello
Hello World! 21:00:45
This builds and runs 15 successive generations of itself in memory
before building and running hello.m; it took 1 second in all. (Now try
that with gcc!)
Here:
c:\cx>tm \bx\mm -runp cc sql
Compiling cc.m to <pcl>
Compiling sql.c to sql.exe
This compiles my C compiler from source but then it /interprets/ the IR
produced. This interpreted compiler took 6 seconds to build the 250Kloc
test file, and it's a very slow interpreter (it's used for testing and
debugging).
(gcc -O0 took a bit longer to build sql.c! About 7 seconds but it is
using a heftier windows.h.)
If I run the C compiler from source as native code (\bx\ms cc sql) then
building the compiler *and* sql.c takes 1/3 of a second.
You can't do this stuff with the compilers David Brown uses; I'm
guessing you can't do it with your preferred ones either.
To recompile the typed system I use (about 0.4M lines) on a new fast
machine I need about 53s. But that is kind of cheating:
- this time is for a parallel build using 20 logical cores
- the compiler is not in the language it compiles (but in an untyped
  version of it)
- actual compilation of the compiler is a small part of total
  compile time
On a slow machine compile time can be as large as 40 minutes.
An untyped system that I use has about 0.5M lines and recompiles
itself in 16s on the same machine. This one uses a single core.
On a slow machine compile time may be closer to 2 minutes.
Again, compiler compile time is only a part of build time.
Actually, one time-intensive part is creating an index for the included
documentation. Another is C compilation for a library file
(the system has image-processing functions and the low-level part of
image processing is done in C). Recompilation starts from a
minimal version of the system; rebuilding this minimal
version takes 3.3s.
Note that in both cases line counts are from 'wc'. Both systems
contain a substantial amount of documentation; I tried to compensate
for this, but size measured in terms of LOC (that is, excluding
comments, empty lines, non-code files) would be significantly
smaller.
Anyway, I do not need cascaded recompilation like you present.
Both systems above have incremental compilation, the second one
at statement/function level: it offers an interactive prompt
which takes a statement from the user, compiles it and immediately
executes it. Such a statement may define a function or perform compilation.
Even on a _very_ slow machine there is no noticeable delay due to
compilation, unless you feed the system some oversized statement
or function (presumably from a file).
-- Waldek Hebisch