On 16/06/2024 15:54, David Brown wrote:
I realise you think your system is much nicer - otherwise you would not have implemented it! /I/ don't think it is a big improvement. It is certainly not big enough to be worth the effort of changing real languages or tools used by lots of people rather than just a single person. And I think the termination using "\" is a step backwards - now "\" is no longer an escape character, but has different purposes in different places. One and a half steps forward, one step back, is not worth the effort - especially when you can so easily go several steps forward with the format I suggested.
>
On 15/06/2024 21:27, bart wrote:
The difference is that it can be typed fluently without that annoying \x between every number. Plus I can add white space for grouping without it affecting the data.
>
On 15/06/2024 18:17, David Brown wrote:
> On 15/06/2024 00:39, bart wrote:
>> On 14/06/2024 22:30, Keith Thompson wrote:
>Now that it's too late to change the definition, I've thought of
something that I think would have been a better way to specify #embed.
>
Define a new kind of string literal, with a "uc" prefix. `uc"foo"` is
of type `unsigned char[3]`. (Or `const unsigned char[3]`, if that's not
too radical.) Unlike other string literals, there is no implicit
terminating '\0'. Arbitrary byte values can of course be specified in
hexadecimal: uc"\x01\x02\x03\x04". Since there's no terminating null
character and C doesn't support zero-sized objects, uc"" is a syntax
error.
>
uc"..." string literals might be made even simpler, for example allowing
only hex digits and not requiring \x (uc"01020304" rather than
uc"\x01\x02\x03\x04"). That's probably overkill. uc"..." literals
could be useful in other contexts, and programmers will want
flexibility. Maybe something like hex"01020304" (embedded spaces could
be ignored) could be defined in addition to uc"\x01\x02\x03\x04".
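For comparison, standard C can already approximate the proposed uc"..." behaviour when the array size is written out by hand: an array of exactly N elements initialised from a literal of N characters gets no terminating '\0'. A minimal sketch (the names `magic` and `magic_len` are illustrative, not from the thread):

```c
#include <stddef.h>

/* An explicitly sized array initialised from a string literal of the same
   length: C permits this and drops the terminating '\0'. (C++ rejects it,
   and some compilers warn about the missing terminator.) */
static const unsigned char magic[4] = "\x01\x02\x03\x04";

/* Number of bytes in the array; no hidden terminator is counted. */
static size_t magic_len(void)
{
    return sizeof magic;
}
```

The drawback, and presumably part of the motivation for a dedicated prefix, is that the size has to be counted and kept in sync manually.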
That's something I added to string literals in my language within the last few months. Nothing to do with embedding (but it can make hex sequences within strings more efficient, if that approach was used).
>
Writing byte-at-a-time hex data was always a bit fiddly:
>
0x12, 0x34, 0xAB, ...
"\x12\x34\xAB...
>
It was made worse by my preference for `x` being in lower case, and the hex digits in upper case, otherwise 0XABC or 0Xab or 0xab look wrong.
>
What I did was create a new, variable-length string escape sequence that looks like this:
>
"ABC\h1234AB...\nopq" // hex sequence between ABC & nopq
>
Hex digits after \h or \H are read in pairs. White space is allowed between pairs:
>
"ABC\H 12 34 AB ...\nopq"
>
The only thing I wasn't sure about was the closing backslash, which looks at first like another escape code. But I think it is sound, although it can still be tweaked.
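To make the rules above concrete, here is a rough sketch in C of how such a hex run could be expanded at run time. It is not the implementation from the language being described; the function names `hex_val` and `expand_hex_runs`, the error handling, and the choice to treat only space and tab as white space are all assumptions made for illustration.

```c
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_val(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Expands "\h" / "\H" runs in src into raw bytes in out (capacity cap).
   Text outside a run is copied unchanged. Inside a run, hex digits are
   read in pairs, blanks between pairs are skipped, and a lone '\' ends
   the run. Returns the byte count, or -1 on malformed input or overflow. */
static long expand_hex_runs(const char *src, unsigned char *out, size_t cap)
{
    size_t n = 0;

    while (*src != '\0') {
        if (src[0] == '\\' && (src[1] == 'h' || src[1] == 'H')) {
            src += 2;                       /* skip the \h introducer */
            for (;;) {
                while (*src == ' ' || *src == '\t') src++;
                if (*src == '\\') {         /* closing backslash */
                    src++;
                    break;
                }
                int hi = hex_val((unsigned char)src[0]);
                if (hi < 0) return -1;      /* also catches an unterminated run */
                int lo = hex_val((unsigned char)src[1]);
                if (lo < 0) return -1;      /* odd number of digits */
                if (n >= cap) return -1;
                out[n++] = (unsigned char)(hi * 16 + lo);
                src += 2;
            }
        } else {
            if (n >= cap) return -1;
            out[n++] = (unsigned char)*src++;
        }
    }
    return (long)n;
}
```

Note that in C source the backslashes themselves must be doubled, so the second example above would be passed as "ABC\\H 12 34 AB\\nopq".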
>
>
How often would something like that be useful? I would have thought that it is rare to see something that is basically text but has enough odd non-printing characters (other than the common \n, \t, \e) to make it worth the fuss. If you want to have binary data in something that looks like a string literal, then just use straight-up two hex digits per character - "4142431234ab". It's simpler to generate and parse. I don't see the benefit of something that mixes binary and text data.
That's not the same thing. That sequence "...1234..." occupies 4 bytes (with values 49 50 51 52), not two bytes (with values 0x12 and 0x34, or 18 and 52).
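The byte counts quoted here can be checked directly in C. A small sketch (the array names are illustrative and the decimal values assume an ASCII-compatible execution character set):

```c
#include <string.h>

/* Four characters of text: the digits '1' '2' '3' '4'. */
static const char as_text[] = "1234";

/* Two bytes of data: 0x12 and 0x34. */
static const char as_bytes[] = "\x12\x34";

/* Lengths excluding the terminating '\0' that both arrays carry. */
static size_t text_len(void)  { return strlen(as_text); }
static size_t bytes_len(void) { return strlen(as_bytes); }
```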
>
Here's an example of wanting to print '€4.99', first in C (note that my editor doesn't support Unicode so this stuff is needed):
>
puts("\xE2\x82\xAC" "4.99");
>
The euro symbol occupies three bytes in UTF8. It's awkward to type: it has loads of backslashes, it keeps switching case and it needs more concentration.
>
Plus I had to split the string since apparently \x doesn't stop at two hex digits, it keeps going: it would have read \xAC4, which overflows the 8-bit width of a character anyway, so I don't know what the point is of reading more than 2 hex characters.
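The splitting trick works because escape sequences are converted before adjacent string literals are concatenated, so the break after "\xAC" ends the hex escape. Octal escapes are another option, since they stop after at most three digits. A short sketch of both (the array and function names are illustrative):

```c
#include <string.h>

/* Split literal: the hex escape ends at the closing quote, so the '4'
   that follows is an ordinary character. */
static const char price[] = "\xE2\x82\xAC" "4.99";

/* Same bytes using octal escapes, which never consume more than three
   digits: \342 \202 \254 are 0xE2 0x82 0xAC. */
static const char price_octal[] = "\342\202\2544.99";

/* Length in bytes: three for the UTF-8 euro sign plus four for "4.99". */
static size_t price_len(void)
{
    return strlen(price);
}
```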
>
Using my feature, it looks like this:
>
println "\H E2 82 AC\4.99"
>
I don't see any improvement of significance. The improvement, if any, is very minor.
(I gather you have other conveniences for your language's printing features when converting various types, but that's a different matter.)
>
It never happens that you want to type a bunch of hex byte values to initialise a byte array? OK.
>
It /does/ happen. In such cases, I type a bunch of hex values.
>
The obvious answer to writing this kind of thing is simply to switch to an editor that supports UTF-8.
Why bother with the \H stuff? That's my point - use hex data for data, and text for text. Mixing these is not common enough to make it worth the extra fuss you have to give such negligible extra convenience.
>
Is this a separate feature using 'b'?
>
Yes - that's the point. It would be for expressing binary blob data in a compact form as a string of hex digits, with or without spaces, and convenient for copy-and-paste from hex editors and other such sources. You could happily use h"..." rather than b"..." if you prefer. And I suppose it could be extended to support lumps bigger than 8 bits, but then endian issues complicate matters and I suspect it is not worth the effort.
>
My suggestion is that it could be helpful to have binary blobs written as hex digits without escapes anywhere, because it is /just/ binary data. I don't object to having optional spaces - that's a fine idea. But just write:
>
b"4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00"
b"B8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00"
>
The extra "\H" adds nothing useful.
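C has no such b"..." literal today, so the nearest equivalent is a run-time helper that turns a hex dump string into bytes. The following is only a sketch of the idea under that assumption; `nibble` and `parse_hex_blob` are hypothetical names, and accepting any white space between pairs is a choice made here, not something specified in the thread.

```c
#include <ctype.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Converts text such as "4D 5A 90 00" into bytes in out (capacity cap).
   Digits are read in pairs; white space between pairs is ignored.
   Returns the byte count, or -1 on a bad digit, an odd digit count,
   or overflow of out. */
static long parse_hex_blob(const char *text, unsigned char *out, size_t cap)
{
    size_t n = 0;

    for (;;) {
        while (isspace((unsigned char)*text)) text++;
        if (*text == '\0') return (long)n;

        int hi = nibble(text[0]);
        int lo = hi < 0 ? -1 : nibble(text[1]);
        if (lo < 0 || n >= cap) return -1;

        out[n++] = (unsigned char)(hi << 4 | lo);
        text += 2;
    }
}
```

A compile-time literal would avoid both the buffer and the error path, which is the convenience being argued for.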
Because in my scheme, \H is just another string escape code, which can be used in ordinary strings,
>
That is what I would want to avoid. Being able to mix such data is a disadvantage, not an advantage. (IMHO, of course.)
>
and b"" strings define char[] data which can include normal text data too.
>
And that kind of monstrosity is what I was trying to get away from.
>
So my example could have been written as b"MZ\h 90 00 03 ..."
>
It is a mistake to have too many similar-looking alternatives with different rules as to when and where they can be used.
>
I did look at having a separate feature, but I didn't want that. I ended up with this scheme for data-strings, here expressed using C types:
Literal           Can initialise
"abcd"            char* only
s"abcd"           char*, char[] or any T[]; zero-terminated
b"abcd"           char*, char[] or any T[]
sinclude"file"    char*, char[] or any T[]; zero-terminated
binclude"file"    char*, char[] or any T[]
The first 3 can include any string escapes including \H...\
The last two embed file data, binary or text. But if a normal C-style string is needed with no embedded zeros except at the end, sinclude should be used with a text file.
>
(The 's'/'b' prefixes are needed for strings to have a type of (in C terms) char[] rather than char*, a detail that C glosses over via some magic. 's' gives you a zero terminator, 'b' as used here doesn't. The "+" is used for compile-time string/data-string concatenation.)
>
In short, more is possible without needing to resort to tools. You can directly work from a hex dump.