Newsportal USENET - Re: C23 thoughts and opinions


Subject : Re: C23 thoughts and opinions
From : david.brown (at) *nospam* hesbynett.no (David Brown)
Newsgroups : comp.lang.c
Date : 17. Jun 2024, 09:49:04

Other headers

Organisation : A noiseless patient Spider
Message-ID : <v4ota0$ii30$1@dont-email.me>
References : 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
User-Agent : Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.11.0

On 16/06/2024 21:00, bart wrote:

On 16/06/2024 15:54, David Brown wrote:
On 15/06/2024 21:27, bart wrote:
On 15/06/2024 18:17, David Brown wrote:
On 15/06/2024 00:39, bart wrote:
On 14/06/2024 22:30, Keith Thompson wrote:
>
Now that it's too late to change the definition, I've thought of
something that I think would have been a better way to specify #embed.
>
Define a new kind of string literal, with a "uc" prefix. `uc"foo"` is
of type `unsigned char[3]`. (Or `const unsigned char[3]`, if that's not
too radical.) Unlike other string literals, there is no implicit
terminating '\0'. Arbitrary byte values can of course be specified in
hexadecimal: uc"\x01\x02\x03\x04". Since there's no terminating null
character and C doesn't support zero-sized objects, uc"" is a syntax
error.
>
uc"..." string literals might be made even simpler, for example allowing
only hex digits and not requiring \x (uc"01020304" rather than
uc"\x01\x02\x03\x04"). That's probably overkill. uc"..." literals
could be useful in other contexts, and programmers will want
flexibility. Maybe something like hex"01020304" (embedded spaces could
be ignored) could be defined in addition to uc"\x01\x02\x03\x04".
>
That's something I added to string literals in my language within the last few months. Nothing to do with embedding (but it can make hex sequences within strings more efficient, if that approach was used).
>
Writing byte-at-a-time hex data was always a bit fiddly:
>
     0x12, 0x34, 0xAB, ...
     "\x12\x34\xAB...
>
It was made worse by my preference for `x` being in lower case, and the hex digits in upper case, otherwise 0XABC or 0Xab or 0xab look wrong.
>
What I did was create a new, variable-length string escape sequence that looks like this:
>
   "ABC\h1234AB...\nopq"     // hex sequence between ABC & nopq
>
Hex digits after \h or \H are read in pairs. White space is allowed between pairs:
>
   "ABC\H 12 34 AB ...\nopq"
>
The only thing I wasn't sure about was the closing backslash, which looks at first like another escape code. But I think it is sound, although it can still be tweaked.
>
>
>
How often would something like that be useful? I would have thought that it is rare to see something that is basically text but has enough odd non-printing characters (other than the common \n, \t, \e) to make it worth the fuss. If you want to have binary data in something that looks like a string literal, then just use straight-up two hex digits per character - "4142431234ab". It's simpler to generate and parse. I don't see the benefit of something that mixes binary and text data.
>
That's not the same thing. That sequence "...1234..." occupies 4 bytes (with values 49 50 51 52), not two bytes (with values 0x12 and 0x34, or 18 and 52).
>
Here's an example of wanting to print '€4.99', first in C (note that my editor doesn't support Unicode so this stuff is needed):
>
    puts("\xE2\x82\xAC" "4.99");
>
The euro symbol occupies three bytes in UTF8. It's awkward to type: it has loads of backslashes, it keeps switching case and it needs more concentration.
>
Plus I had to split the string since apparently \x doesn't stop at two hex digits, it keeps going: it would have read \xAC4, which overflows the 8-bit width of a character anyway, so I don't know what the point is of reading more than 2 hex characters.
>
Using my feature, it looks like this:
>
     println "\H E2 82 AC\4.99"
>
>
I don't see any improvement of significance. The improvement, if any, is very minor.
The difference is that it can be typed fluently without that annoying \x between every number. Plus I can add white space for grouping without it affecting the data.

I realise you think your system is much nicer - otherwise you would not have implemented it! /I/ don't think it is a big improvement. It is certainly not big enough to be worth the effort of changing real languages or tools used by lots of people rather than just a single person. And I think the termination using "\" is a step backwards - now "\" is no longer an escape character, but has different purposes in different places. One and a half steps forward, one step back, is not worth the effort - especially when you can so easily go several steps forward with the format I suggested.

(I gather you have other conveniences for your language's printing features when converting various types, but that's a different matter.)
>
The obvious answer to writing this kind of thing is simply to switch to an editor that supports UTF-8.
It never happens that you want to type a bunch of hex byte values to initialise a byte array? OK.

It /does/ happen. In such cases, I type a bunch of hex values.
What doesn't happen is that I have a UTF-8 text and I choose to write that using hex values. I much prefer to write the UTF-8 text using an editor that supports UTF-8 and tools that work with UTF-8.

Why bother with the \H stuff? That's my point - use hex data for data, and text for text. Mixing these is not common enough to be worth the extra fuss for such a negligible gain in convenience.
>
My suggestion is that it could be helpful to have binary blobs written as hex digits without escapes anywhere, because it is /just/ binary data. I don't object to having optional spaces - that's a fine idea. But just write :
>
b"4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00"
b"B8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00"
>
The extra "\H" adds nothing useful.
Is this a separate feature using 'b'?

Yes - that's the point. It would be for expressing binary blob data in a compact form as a string of hex digits, with or without spaces, and convenient for copy-and-paste from hex editors and other such sources. You could happily use h"..." rather than b"..." if you prefer. And I suppose it could be extended to support lumps bigger than 8 bits, but then endian issues complicate matters and I suspect it is not worth the effort.

Because in my scheme, \H is just another string escape code, which can be used in ordinary strings,

That is what I would want to avoid. Being able to mix such data is a disadvantage, not an advantage. (IMHO, of course.)

and b"" strings define char[] data which can include normal text data too.
So my example could have been written as b"MZ\h 90 00 03 ..."

And that kind of monstrosity is what I was trying to get away from.

I did look at having a separate feature, but I didn't want that. I ended up with this scheme for data-strings, here expressed using C types:
                     Can initialise:
   "abcd"            char* only
   s"abcd"           char*, char[] or any T[]; zero-terminated
   b"abcd"           char*, char[] or any T[]
   sinclude"file"    char*, char[] or any T[]; zero-terminated
   binclude"file"    char*, char[] or any T[]

It is a mistake to have too many similar-looking alternatives with different rules as to when and where they can be used.
Changing existing languages is always difficult, or even impossible. But my suggestion here is that there should be two different kinds of literals:
"Hello, world!"
and
b"00 12 34"
The former is always a string, always UTF-8, in whatever format the language uses for strings (zero-terminated, Pascal style, or whatever). The latter is a compact way of writing binary blobs in hex when needed, and is always a constant array of bytes.

The first 3 can include any string escapes including \H...\
The last two embed file data, binary or text. But if a normal C-style string is needed with no embedded zeros except at the end, sinclude should be used with a text file.

>
>
>
>
(The 's'/'b' prefixes are needed for strings to have a type of (in C terms) char[] rather than char*, a detail that C glosses over via some magic. 's' gives you a zero terminator, 'b' as used here doesn't. The "+" is used for compile-time string/data-string concatenation.)
>
In short, more is possible without needing to resort to tools. You can work directly from a hex dump.
>

The messages displayed come from Usenet.

Date	Subject	#	Author
22 May 24	C23 thoughts and opinions	524	David Brown
22 May 24	Re: C23 thoughts and opinions	355	Thiago Adams
22 May 24	Re: C23 thoughts and opinions	352	David Brown
22 May 24	Re: C23 thoughts and opinions	22	Thiago Adams
23 May 24	Re: C23 thoughts and opinions	21	David Brown
23 May 24	Re: C23 thoughts and opinions	20	Thiago Adams
23 May 24	Re: C23 thoughts and opinions	18	David Brown
23 May 24	Re: C23 thoughts and opinions	17	Thiago Adams
23 May 24	Re: C23 thoughts and opinions	16	Keith Thompson
24 May 24	Re: C23 thoughts and opinions	1	David Brown
24 May 24	Re: C23 thoughts and opinions	14	Thiago Adams
24 May 24	Re: C23 thoughts and opinions	13	Keith Thompson
24 May 24	Re: C23 thoughts and opinions	12	Thiago Adams
24 May 24	Re: C23 thoughts and opinions	11	Keith Thompson
25 May 24	Re: C23 thoughts and opinions	10	Thiago Adams
25 May 24	Re: C23 thoughts and opinions	4	Keith Thompson
25 May 24	Re: C23 thoughts and opinions	3	Thiago Adams
25 May 24	Re: C23 thoughts and opinions	2	David Brown
26 May 24	Re: C23 thoughts and opinions	1	Keith Thompson
25 May 24	Re: C23 thoughts and opinions	5	David Brown
25 May 24	Re: C23 thoughts and opinions	4	Thiago Adams
25 May 24	Re: C23 thoughts and opinions	2	David Brown
26 May 24	Re: C23 thoughts and opinions	1	bart
6 Jun 24	Re: C23 thoughts and opinions	1	Thiago Adams
23 May 24	Re: C23 thoughts and opinions	1	Thiago Adams
23 May 24	Re: C23 thoughts and opinions	323	Keith Thompson
23 May 24	Re: C23 thoughts and opinions	313	Thiago Adams
23 May 24	Re: C23 thoughts and opinions	312	bart
23 May 24	Re: C23 thoughts and opinions	309	David Brown
23 May 24	Re: C23 thoughts and opinions	308	Keith Thompson
24 May 24	Re: C23 thoughts and opinions	1	David Brown
25 May 24	Re: C23 thoughts and opinions	305	Keith Thompson
25 May 24	Re: C23 thoughts and opinions	304	David Brown
26 May 24	Re: C23 thoughts and opinions	303	Keith Thompson
26 May 24	Re: C23 thoughts and opinions	300	David Brown
26 May 24	Re: C23 thoughts and opinions	17	bart
26 May 24	Re: C23 thoughts and opinions	16	Michael S
26 May 24	Re: C23 thoughts and opinions	15	bart
26 May 24	Re: C23 thoughts and opinions	14	Michael S
26 May 24	Re: C23 thoughts and opinions	3	bart
26 May 24	Re: C23 thoughts and opinions	2	Michael S
26 May 24	Re: C23 thoughts and opinions	1	bart
26 May 24	Re: C23 thoughts and opinions	5	Malcolm McLean
26 May 24	Re: C23 thoughts and opinions	4	Michael S
27 May 24	Re: C23 thoughts and opinions	3	Lawrence D'Oliveiro
27 May 24	Re: C23 thoughts and opinions	1	Chris M. Thomasson
27 May 24	Re: C23 thoughts and opinions	1	David Brown
26 May 24	Re: C23 thoughts and opinions	1	Michael S
26 May 24	Re: C23 thoughts and opinions	1	bart
27 May 24	Re: C23 thoughts and opinions	1	Keith Thompson
27 May 24	Re: C23 thoughts and opinions	2	Lawrence D'Oliveiro
27 May 24	Re: C23 thoughts and opinions	1	Michael S
26 May 24	Re: C23 thoughts and opinions	1	Thiago Adams
27 May 24	Re: C23 thoughts and opinions	66	Keith Thompson
27 May 24	Re: C23 thoughts and opinions	62	David Brown
28 May 24	Re: C23 thoughts and opinions	61	Keith Thompson
28 May 24	Re: C23 thoughts and opinions	60	David Brown
28 May 24	Re: C23 thoughts and opinions	59	Keith Thompson
28 May 24	Re: C23 thoughts and opinions	1	Michael S
29 May 24	Re: C23 thoughts and opinions	57	David Brown
14 Jun 24	Re: C23 thoughts and opinions	56	Keith Thompson
15 Jun 24	Re: C23 thoughts and opinions	12	bart
15 Jun 24	Re: C23 thoughts and opinions	11	David Brown
15 Jun 24	Re: C23 thoughts and opinions	10	bart
16 Jun 24	Re: C23 thoughts and opinions	5	Lawrence D'Oliveiro
16 Jun 24	Re: C23 thoughts and opinions	4	bart
16 Jun 24	Re: C23 thoughts and opinions	1	Lawrence D'Oliveiro
16 Jun 24	Re: C23 thoughts and opinions	2	Chris M. Thomasson
17 Jun 24	Re: C23 thoughts and opinions	1	Lawrence D'Oliveiro
16 Jun 24	Re: C23 thoughts and opinions	4	David Brown
16 Jun 24	Re: C23 thoughts and opinions	3	bart
17 Jun 24	Re: C23 thoughts and opinions	1	David Brown
17 Jun 24	Re: C23 thoughts and opinions	1	Michael S
15 Jun 24	Re: C23 thoughts and opinions	3	David Brown
16 Jun 24	Re: C23 thoughts and opinions	2	Lawrence D'Oliveiro
16 Jun 24	Re: C23 thoughts and opinions	1	David Brown
17 Jun 24	Hex string literals (was Re: C23 thoughts and opinions)	40	Keith Thompson
17 Jun 24	Re: Hex string literals (was Re: C23 thoughts and opinions)	20	David Brown
18 Jun 24	Re: Hex string literals (was Re: C23 thoughts and opinions)	18	Keith Thompson
18 Jun 24	Re: Hex string literals (was Re: C23 thoughts and opinions)	2	Lawrence D'Oliveiro
18 Jun 24	Re: Hex string literals (was Re: C23 thoughts and opinions)	1	Keith Thompson
18 Jun 24	Re: Hex string literals (was Re: C23 thoughts and opinions)	15	David Brown
19 Jun 24	Re: Hex string literals (was Re: C23 thoughts and opinions)	6	Keith Thompson
19 Jun 24	Re: Hex string literals (was Re: C23 thoughts and opinions)	5	David Brown
19 Jun 24	Re: Hex string literals (was Re: C23 thoughts and opinions)	4	Kaz Kylheku
19 Jun 24	Re: Hex string literals (was Re: C23 thoughts and opinions)	3	Michael S
19 Jun 24	Re: Hex string literals (was Re: C23 thoughts and opinions)	1	bart
19 Jun 24	Re: Hex string literals (was Re: C23 thoughts and opinions)	1	Michael S
19 Jun 24	Re: Hex string literals (was Re: C23 thoughts and opinions)	8	Lawrence D'Oliveiro
19 Jun 24	Re: Hex string literals (was Re: C23 thoughts and opinions)	6	David Brown
21 Jun 24	Re: Hex string literals (was Re: C23 thoughts and opinions)	5	Lawrence D'Oliveiro
21 Jun 24	Re: Hex string literals (was Re: C23 thoughts and opinions)	3	David Brown
22 Jun 24	Re: Hex string literals (was Re: C23 thoughts and opinions)	2	Lawrence D'Oliveiro
22 Jun 24	Re: Hex string literals (was Re: C23 thoughts and opinions)	1	David Brown
21 Jun 24	Re: Hex string literals (was Re: C23 thoughts and opinions)	1	James Kuyper
19 Jun 24	Re: Hex string literals (was Re: C23 thoughts and opinions)	1	Keith Thompson
18 Jun 24	Re: Hex string literals (was Re: C23 thoughts and opinions)	1	Lawrence D'Oliveiro
17 Jun 24	Re: Hex string literals (was Re: C23 thoughts and opinions)	5	Richard Kettlewell
17 Jun 24	Re: Hex string literals (was Re: C23 thoughts and opinions)	1	Richard Kettlewell
18 Jun 24	Re: Hex string literals (was Re: C23 thoughts and opinions)	3	Keith Thompson
18 Jun 24	Re: Hex string literals (was Re: C23 thoughts and opinions)	1	Lawrence D'Oliveiro
18 Jun 24	Re: Hex string literals (was Re: C23 thoughts and opinions)	1	Richard Kettlewell
17 Jun 24	Re: Hex string literals (was Re: C23 thoughts and opinions)	14	bart
28 May 24	Re: C23 thoughts and opinions	2	Keith Thompson
28 May 24	Re: C23 thoughts and opinions	1	Malcolm McLean
27 May 24	Re: C23 thoughts and opinions	121	Lawrence D'Oliveiro
28 May 24	xxd -i vs DIY Was: C23 thoughts and opinions	94	Michael S
28 May 24	Re: C23 thoughts and opinions	2	Keith Thompson
12 Jun 24	Re: C23 thoughts and opinions	1	Bonita Montero
23 May 24	Re: C23 thoughts and opinions	2	Keith Thompson
23 May 24	Re: C23 thoughts and opinions	7	Thiago Adams
23 May 24	Re: C23 thoughts and opinions	2	David Brown
23 May 24	Re: C23 thoughts and opinions	6	Michael S
23 May 24	Re: C23 thoughts and opinions	2	Lawrence D'Oliveiro
22 May 24	Re: C23 thoughts and opinions	10	Malcolm McLean
22 May 24	Re: C23 thoughts and opinions	9	Chris M. Thomasson
23 May 24	Re: C23 thoughts and opinions	2	Lawrence D'Oliveiro
23 May 24	Re: C23 thoughts and opinions	14	Michael S
23 May 24	Re: C23 thoughts and opinions - why so conservative?	37	Michael S
23 May 24	Re: C23 thoughts and opinions	94	Bonita Montero
25 May 24	Re: C23 thoughts and opinions	2	Thiago Adams