Bart <bc@freeuk.com> wrote:
> OK, if it's so simple, explain it to me.
> Apparently the first line here needs a semicolon after }, the second
> doesn't:
>   int X[1] = {0};
>   void Y() {}
> Similarly here:
>   if (x) y;
>   if (x) {}
> Why?
> "Because that's what the grammar says" isn't a valid answer.
To a first approximation: for some constructs it is clear where
the end is, so no semicolon is needed; for others a semicolon
is needed to mark the end. This rule does not work in all cases:
for example, 'continue' and 'break' are keywords, and defining the
corresponding statements without a semicolon would lead to
no ambiguity. Similarly for 'goto'. 'return' is
more tricky, but it seems that one could define it without
a semicolon. Here consistency probably won: the 'break',
'continue', 'goto' and 'return' statements are perceived
as simple statements and are covered by the informal rule
"a simple statement needs a terminating semicolon". '{}'
is a compound statement, hence not a simple statement.
Similarly, 'if' is a complex statement and needs no semicolon
of its own. As you noted, without a terminating semicolon a do-while
loop would be ambiguous, so it needs one. I do
not have an example for declarations, but I suspect that defining
them without a semicolon would be ambiguous. A function
_definition_ is unambiguous without a terminating semicolon,
so why put one there?
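
A small C sketch of the informal rule above may help. The function name count_down and its logic are made up purely for illustration; only the placement of the semicolons matters.

```c
/* Returns the number of loop iterations that did real work, plus one
   for the do-while body. Hypothetical example, not from any real code. */
int count_down(int n)
{
    int steps = 0;

    while (n > 0) {
        if (n == 3) {
            n--;
            continue;   /* simple statement: needs ';' */
        }               /* compound statement: no ';' after '}' */
        steps++;        /* expression statement: needs ';' */
        n--;
    }                   /* 'while' with a compound body: no ';' */

    do
        steps++;
    while (steps < 0);  /* do-while: ';' follows the condition */

    return steps;       /* simple statement: needs ';' */
}
```

Called as count_down(5), the while loop runs five times, skips the increment once (when n is 3), and the do-while body runs exactly once.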
Concerning "the grammar says it": a grammar for C90 from which one
can generate a working parser has 74 nonterminals. You could
change some rules and still get a working parser for a different
language, so in this sense part of the grammar is purely
arbitrary. But other changes would lead to a grammar that
fails to work. If you look at the rules you will see
substantial similarities between some of them, so the grammar
is really simpler than its size alone would suggest.
So having a working and sane (that is, relatively simple)
grammar puts restrictions on the language; some changes
simply do not fit. Other changes would lead to a completely
different language. That was not an option for C, as
the very first versions were intentionally similar to earlier
languages, and later there was a body of existing programs
and programmers.
You write about confusion. I think that what you present is what
grammarians would call "garden paths", that is, perceiving or
trying to make up rules different from the grammar rules. In
your 'if' example you ignore a simple thing: 'if' needs no
terminator of its own. It is the null statement that needs
a terminating semicolon, and the "empty" compound statement that
does not need a terminator. The null statement and the compound
statement are quite different: in particular, without the
semicolon you would not know that a null statement is there,
while a compound statement can be easily recognized without
any terminator. There is no special rule for "if with
null statement"; if there were, you would get a needlessly complex
grammar.
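
To make the difference concrete, here is a minimal sketch with three hypothetical functions (all names are invented for illustration). In each one the 'if' itself has no terminator; what changes is the statement it controls.

```c
/* 'if' controlling a null statement: the lone ';' is the entire body,
   so the increment below is a separate, unconditional statement. */
int null_body(int x)
{
    int hits = 0;
    if (x)
        ;           /* null statement: the ';' is all there is */
    hits++;         /* always runs */
    return hits;
}

/* 'if' controlling an empty compound statement: '{}' is the entire
   body and is recognizable without any terminator. */
int empty_compound_body(int x)
{
    int hits = 0;
    if (x) {}       /* empty compound statement, no ';' */
    hits++;         /* always runs */
    return hits;
}

/* With neither ';' nor '{}' after the condition, the next statement
   becomes the body of the 'if'. */
int statement_body(int x)
{
    int hits = 0;
    if (x)
        hits++;     /* runs only when x is nonzero */
    return hits;
}
```

The first two functions return 1 for any argument, while statement_body returns 1 only for a nonzero argument. In null_body the semicolon is the only thing that tells the parser a statement is present.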
In the first pair, one line is a declaration, the other is a
function definition. Again, quite different constructs:
one needs a terminator, the other does not.
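
One possible illustration of why the two differ, offered as an added observation and not as something claimed above: after the '}' that closes an initializer, a declaration may still continue with a comma and further declarators, whereas the '}' that closes a function body always ends the definition. The names A, B and sum_globals below are hypothetical, and Y is written with (void) only to give it a prototype.

```c
int X[1] = {0};             /* '}' closes the initializer list;
                               ';' ends the declaration */

int A[2] = {1, 2}, B = 3;   /* after '}' the declaration goes on with
                               another declarator, so '}' alone cannot
                               mark its end */

void Y(void) {}             /* function definition: the closing '}' of
                               the body ends it, no ';' needed */

/* Helper used only to show that all the objects above are defined. */
int sum_globals(void)
{
    return X[0] + A[0] + A[1] + B;
}
```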
Garden paths are common in natural language and people
cope quite well. So for normal (even beginner)
programmers garden paths are not a real problem: you get
confused once, learn the right way and go on. In many
cases learning is unconscious: you simply get used to
the way code is written, and when you make a mistake
the compiler tells you that there is an error, so you
correct it.
-- Waldek Hebisch