On 5/27/2026 11:53 PM, Lawrence D’Oliveiro wrote:
On Wed, 27 May 2026 18:49:38 -0500, BGB wrote:
But, I am not personally as much of a fan of C++ as she is...
C++ syntax is so complex, the language spec has to add rules that say,
in case of ambiguity, that this interpretation is meant and not that.
Someone described this as “the principle of most surprise”.
Someone could almost come up with a language that is "like C++ but less horrible".
Core language:
Like a C / C# hybrid;
Base language could superficially resemble C++;
In any case, avoid needlessly changing basic syntax.
Designed to be easier to write a compiler;
Designed so that compiler isn't dead slow;
Goal should be that compiled code performance remains similar to C;
Shouldn't do things that would exclude it from C-like use cases.
Type-system and memory model is mostly similar to C.
Could impose that declaration types work more like in C#;
SI + Interfaces rather than full MI;
Generic Types rather than Templates;
...
Parsing would follow a "can it reasonably be parsed as X" approach:
If something can syntactically be parsed as a cast/etc, assume it is so;
If this assumption turns out to be wrong, compiler error.
In this case, assume that types/etc may only appear in certain contexts, and apply "<type_expr> <identifier>" pattern recognition to detect declarations. Trying to put a declaration (or generic invocation) somewhere it doesn't normally go would be a syntax error.
Say:
if(int i=0)
{ ... }
Would be regarded as illegal (but could be allowed as a special case in "for()" loops due to popularity).
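As a rough illustration of the "<type_expr> <identifier>" check, here is a minimal sketch in plain C. It assumes the statement has already been split into tokens, and it only knows the built-in modifiers and base type names listed in this post; a real compiler would also consult user-defined type names, pointers, generics, and so on. The names looks_like_declaration, accept_in_context, and the context enum are made up for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token tables: only the built-in names are known here. */
static const char *const MODIFIERS[] = {
    "signed", "unsigned", "const", "volatile", NULL
};
static const char *const BASE_TYPES[] = {
    "char", "byte", "short", "int", "long", "float", "double", NULL
};

static int in_list(const char *s, const char *const *list) {
    for (size_t i = 0; list[i] != NULL; i++)
        if (strcmp(s, list[i]) == 0) return 1;
    return 0;
}

/* An identifier is a C-style name that is not a modifier or base type. */
static int is_identifier(const char *s) {
    if (!(isalpha((unsigned char)s[0]) || s[0] == '_')) return 0;
    for (size_t i = 1; s[i] != '\0'; i++)
        if (!(isalnum((unsigned char)s[i]) || s[i] == '_')) return 0;
    return !in_list(s, MODIFIERS) && !in_list(s, BASE_TYPES);
}

/* Returns 1 if the tokens begin with <modifier>* <base_type> <identifier>. */
int looks_like_declaration(const char *const *tok, size_t n) {
    size_t i = 0;
    assert(tok != NULL);
    while (i < n && in_list(tok[i], MODIFIERS)) i++;
    if (i >= n || !in_list(tok[i], BASE_TYPES)) return 0;
    i++;
    return i < n && is_identifier(tok[i]);
}

/* Contexts where a statement-like token run may appear (assumed set). */
enum context { CTX_BLOCK, CTX_FOR_INIT, CTX_IF_COND };

/* Returns 1 if accepted, 0 if it would be a syntax error in this context. */
int accept_in_context(enum context ctx, const char *const *tok, size_t n) {
    if (!looks_like_declaration(tok, n)) return 1;  /* parse as expression */
    return ctx == CTX_BLOCK || ctx == CTX_FOR_INIT; /* for() special case */
}
```

With this, the tokens of "int i=0" are accepted in a block or a for() initializer but rejected inside an if() condition, while "i=0" is passed on to the expression parser in all three contexts.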
Core type-system is C like:
Type consists of a base-type and any modifiers;
Base type names:
char, byte, short, int, long, float, double, ...
Type Modifiers:
signed, unsigned, const, volatile, ...
Where, const and volatile behave like in C.
Base type sizes (bits):
  char     8   //ASCII / UTF-8
  wchar   16   //UCS-2 / UTF-16
  lchar   32   //UCS-4 / UTF-32
  byte     8
  sbyte    8
  ubyte    8
  short   16
  int     32
  long    64
  float   32
  double  64
Issue:
Would want to disallow compound type names, like "long long" or "long double".
As soon as the syntax allows this, an ambiguity is created that adversely affects parser speed.
One other option could be to provide a set of explicit sized types:
int8, int16, int32, int64, int128
uint8, uint16, uint32, uint64, uint128
float16, float32, float64, float128
...
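For comparison, the fixed sizes above can be pinned down in existing C11 with <stdint.h> typedefs and compile-time checks. This is only a sketch: the xl_ prefix is made up so the names don't collide with C's own keywords, treating plain 'byte' as unsigned is an assumption (the table doesn't say), and float16 / float128 / int128 are left out since standard C has no portable equivalents.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT */
#include <stdint.h>

/* Hypothetical names for the proposed base types, mapped onto C types. */
typedef char     xl_char;    /* ASCII / UTF-8 code unit */
typedef uint16_t xl_wchar;   /* UCS-2 / UTF-16 code unit */
typedef uint32_t xl_lchar;   /* UCS-4 / UTF-32 code unit */
typedef uint8_t  xl_byte;    /* ASSUMPTION: plain byte is unsigned */
typedef int8_t   xl_sbyte;
typedef uint8_t  xl_ubyte;
typedef int16_t  xl_short;
typedef int32_t  xl_int;
typedef int64_t  xl_long;    /* always 64 bits, unlike C's long */
typedef float    xl_float;
typedef double   xl_double;

/* Size of a type in bits. */
#define XL_BITS(T) (sizeof(T) * CHAR_BIT)

/* Compilation fails on any platform where the table does not hold. */
static_assert(XL_BITS(xl_char)   == 8,  "char must be 8 bits");
static_assert(XL_BITS(xl_wchar)  == 16, "wchar must be 16 bits");
static_assert(XL_BITS(xl_lchar)  == 32, "lchar must be 32 bits");
static_assert(XL_BITS(xl_short)  == 16, "short must be 16 bits");
static_assert(XL_BITS(xl_int)    == 32, "int must be 32 bits");
static_assert(XL_BITS(xl_long)   == 64, "long must be 64 bits");
static_assert(XL_BITS(xl_float)  == 32, "float must be 32 bits");
static_assert(XL_BITS(xl_double) == 64, "double must be 64 bits");
```

The point of the exercise is that the proposed language would guarantee these sizes by definition, where C only guarantees minimums for the plain type names.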
In terms of representation, byte and char would be equivalent, except that it might make sense to treat 'char' as not normally an arithmetic type, so in order to perform arithmetic on char it would be cast to 'int' or similar.
Could assume that a "string" type exists, but primarily exists as an opaque "const char *" pointer.
Nominally, string literals could be stored prefixed with a length stored as a transposed UTF-8 codepoint (along with also having a NUL terminator). But, unlike "const char *", "string" would not allow pointer arithmetic, and so would always point to the start of the string literal, or to an explicitly interned string (doing otherwise would be erroneous).
While it is "trendy" in some languages to treat "string" as some sort of object type, actually doing string as an object is needlessly wasteful.
Likewise, storing a length as a prefix still allows "string.length" or similar to be O(1). Could assume default string format is UTF-8, but mostly treated as a blob of bytes.
While some languages (Java and C#) went over to UTF-16, this is needlessly wasteful of memory and breaks with C tradition.
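A minimal C sketch of the length-prefixed idea follows. Note the substitution: it uses a fixed 4-byte length prefix rather than the transposed UTF-8 codepoint encoding mentioned above, purely to keep the example short. The xl_string name and its helper functions are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Opaque handle: always points at the first byte of the string data.
 * By convention no pointer arithmetic is done on it by user code. */
typedef const char *xl_string;

/* Layout in memory: [uint32 length][len bytes of UTF-8][NUL].
 * ASSUMPTION: fixed-width prefix, not the variable-length form from the text. */
xl_string xl_string_new(const char *bytes, uint32_t len) {
    char *raw;
    assert(bytes != NULL);
    raw = malloc(sizeof(uint32_t) + (size_t)len + 1);
    if (raw == NULL) return NULL;
    memcpy(raw, &len, sizeof len);          /* store the prefix */
    memcpy(raw + sizeof len, bytes, len);   /* copy the payload as a blob */
    raw[sizeof len + len] = '\0';           /* keep C-style terminator */
    return raw + sizeof len;
}

/* O(1): reads the prefix stored just before the data. */
uint32_t xl_string_length(xl_string s) {
    uint32_t len;
    memcpy(&len, s - sizeof len, sizeof len);
    return len;
}

void xl_string_free(xl_string s) {
    if (s != NULL) free((void *)(s - sizeof(uint32_t)));
}
```

Since the handle still points at NUL-terminated bytes, it can be passed to existing C functions as a "const char *", while the stored length also allows embedded NUL bytes that strlen() would miss.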
Likely memory management:
  Generic heap, new/delete;
    No GC, as IMO no one thus far has made GC work sufficiently well.
  Zones / arenas;
    Initially resembles heap allocs,
      but all objects in a zone can be bulk freed;
    Some objects could be set to track their self-pointer;
      Self-pointer is NULL'ed if object is freed.
    Essentially, similar to Doom's Z_Malloc.
  Automatic:
    Freed by default as soon as parent frame exits.
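The zone idea with self-pointer tracking could look roughly like the C sketch below. This is not Doom's implementation: it simply chains individual malloc() blocks into a list instead of carving them out of one big region, and the zone / zone_alloc / zone_free_all names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Header placed in front of every allocation in a zone. */
typedef struct zone_block {
    struct zone_block *next;   /* next block in the same zone */
    void **owner;              /* optional self-pointer slot, may be NULL */
    max_align_t data[];        /* payload, suitably aligned for any type */
} zone_block;

typedef struct zone {
    zone_block *head;
} zone;

/* Allocates from the zone. If owner is non-NULL, *owner is set to the
 * new object and will be reset to NULL when the object is freed. */
void *zone_alloc(zone *z, size_t size, void **owner) {
    zone_block *b;
    assert(z != NULL);
    b = malloc(sizeof *b + size);
    if (b == NULL) return NULL;
    b->next = z->head;
    b->owner = owner;
    z->head = b;
    if (owner != NULL) *owner = b->data;
    return b->data;
}

/* Bulk-frees every object in the zone and NULLs tracked self-pointers. */
void zone_free_all(zone *z) {
    zone_block *b;
    assert(z != NULL);
    b = z->head;
    while (b != NULL) {
        zone_block *next = b->next;
        if (b->owner != NULL) *b->owner = NULL;
        free(b);
        b = next;
    }
    z->head = NULL;
}
```

A caller holding a tracked pointer can test it against NULL after a bulk free, rather than being left with a dangling reference.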
OO:
Likely treat 'class' and 'struct' as distinct.
Objects are reference type by default (like in C#/Java);
However, will typically still have automatic lifetime if local.
Could likely still have RAII, but may manifest differently.
Structs would behave like C structs or C++ by_value classes.
Will explicitly forbid inheritance or virtual methods.
Could maybe have copy-constructors and destructors.
More likely to be used for C++ style RAII patterns.
The more restrictive object model would avoid a "big chunk of evil" that exists if trying to write a C++ compiler. Elimination of full MI eliminates a lot of complexity; as does eliminating inheritance on by-value types.
Declaration imports:
Would likely make sense to replace the reliance on "#include" with something more resembling the "import" mechanism from Java;
Though would differ in that the imports identify source files rather than classes;
Would likely treat package/import scope and namespace scope as two different entities. The package/import mechanism would resemble the one in Java, but would instead merely import things at toplevel scope, and within an imported module.
Namespace would be used in a way more like that of C++ or C# namespaces:
namespace whatever { using whatever_else; ... }
It is likely that compiler would first generate a "declaration manifest" which would be used for these purposes.
Compiling Foo:
  import bar;
    Checks if bar has a manifest;
    If yes:
      Import manifest for bar;
      Add 'bar' to object dependency graph;
      Also import any of bar's dependencies.
    If no:
      Trigger frontend only compilation for 'bar';
      If success:
        Import manifest for bar;
        Add 'bar' to object dependency graph;
      Else:
        Compiler error.
Compiler would likely deal with dependency compilation as a sort of stack machine, where imports are dealt with before compiling the main body of each module. This being to avoid excessive recursion and memory usage during dependency importing.
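The "stack machine" handling of imports can be sketched in C as an iterative walk with an explicit stack, so that each module's imports are finished before the module itself. The module struct, the fixed table sizes, and the choice to treat a cyclic import as an error are all assumptions for the example; nothing above says how cycles would be handled.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_MODULES 16   /* assumed upper bound on the module table */
#define MAX_IMPORTS 4    /* assumed upper bound on imports per module */

enum mod_state { MOD_UNSEEN, MOD_PENDING, MOD_DONE };

typedef struct module {
    const char *name;
    int imports[MAX_IMPORTS];  /* indices into the module table */
    int n_imports;
    enum mod_state state;      /* MOD_DONE means a manifest exists */
} module;

/* Fills order[] with module indices, dependencies first, starting from root.
 * Returns the number of entries written, or -1 on a cyclic import
 * (ASSUMPTION: cycles are rejected). */
int resolve_imports(module *mods, int root, int *order) {
    int stack[MAX_MODULES];              /* explicit stack, no recursion */
    int next_import[MAX_MODULES] = {0};  /* per-module progress cursor */
    int sp = 0, n_done = 0;

    assert(root >= 0 && root < MAX_MODULES);
    mods[root].state = MOD_PENDING;
    stack[sp++] = root;

    while (sp > 0) {
        int cur = stack[sp - 1];
        module *m = &mods[cur];
        if (next_import[cur] < m->n_imports) {
            int dep = m->imports[next_import[cur]++];
            if (mods[dep].state == MOD_PENDING) return -1;  /* cycle */
            if (mods[dep].state == MOD_UNSEEN) {
                mods[dep].state = MOD_PENDING;
                stack[sp++] = dep;       /* handle the import first */
            }
        } else {
            m->state = MOD_DONE;         /* all imports have manifests */
            order[n_done++] = cur;       /* now compile this module */
            sp--;
        }
    }
    return n_done;
}
```

Each module is pushed at most once, so the stack never grows beyond the number of modules, and a module that already has a manifest (MOD_DONE) is skipped instead of being recompiled.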
Would likely make sense to keep a C style preprocessor, but using it for headers could be discouraged (this being a great source of compile-time inefficiency in both C and C++).
...
Well, won't amount to much, just idle thoughts...