On 15/06/2024 21:27, bart wrote:
The difference is that it can be typed fluently without that annoying \x between every number. Plus I can add white space for grouping without it affecting the data.
On 15/06/2024 18:17, David Brown wrote:
I don't see any improvement of significance. The improvement, if any, is very minor.
On 15/06/2024 00:39, bart wrote:
On 14/06/2024 22:30, Keith Thompson wrote:
Now that it's too late to change the definition, I've thought of
something that I think would have been a better way to specify #embed.
>
Define a new kind of string literal, with a "uc" prefix. `uc"foo"` is
of type `unsigned char[3]`. (Or `const unsigned char[3]`, if that's not
too radical.) Unlike other string literals, there is no implicit
terminating '\0'. Arbitrary byte values can of course be specified in
hexadecimal: uc"\x01\x02\x03\x04". Since there's no terminating null
character and C doesn't support zero-sized objects, uc"" is a syntax
error.
>
uc"..." string literals might be made even simpler, for example allowing
only hex digits and not requiring \x (uc"01020304" rather than
uc"\x01\x02\x03\x04"). That's probably overkill. uc"..." literals
could be useful in other contexts, and programmers will want
flexibility. Maybe something like hex"01020304" (embedded spaces could
be ignored) could be defined in addition to uc"\x01\x02\x03\x04".
That's something I added to string literals in my language within the last few months. Nothing to do with embedding (but it can make hex sequences within strings more efficient, if that approach was used).
>
Writing byte-at-a-time hex data was always a bit fiddly:
>
0x12, 0x34, 0xAB, ...
"\x12\x34\xAB...
>
It was made worse by my preference for `x` being in lower case, and the hex digits in upper case, otherwise 0XABC or 0Xab or 0xab look wrong.
>
What I did was create a new, variable-length string escape sequence that looks like this:
>
"ABC\h1234AB...\nopq" // hex sequence between ABC & nopq
>
Hex digits after \h or \H are read in pairs. White space is allowed between pairs:
>
"ABC\H 12 34 AB ...\nopq"
>
The only thing I wasn't sure about was the closing backslash, which looks at first like another escape code. But I think it is sound, although it can still be tweaked.
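
For readers who want to try the idea in C, here is a minimal runtime sketch of the escape as described above: hex digits in pairs after \h or \H, optional white space between pairs, and a closing backslash. The names `hex_val` and `expand_hex_runs` are hypothetical, and the error handling is an assumption; the post does not say how the actual implementation treats malformed input.

```c
#include <ctype.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_val(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: copies src to out, expanding each \h...\ or \H...\
   run into raw bytes. Returns the number of bytes written, or -1 on a
   malformed run or when out (capacity cap) is too small. No other escape
   is interpreted; a backslash not followed by h or H is copied as-is. */
static long expand_hex_runs(const char *src, unsigned char *out, size_t cap)
{
    size_t n = 0;

    while (*src) {
        if (src[0] == '\\' && (src[1] == 'h' || src[1] == 'H')) {
            src += 2;
            for (;;) {
                int hi, lo;

                /* White space is allowed between pairs. */
                while (isspace((unsigned char)*src)) src++;

                /* The closing backslash ends the hex run. */
                if (*src == '\\') { src++; break; }

                hi = hex_val((unsigned char)src[0]);
                if (hi < 0) return -1;          /* also catches end of string */
                lo = hex_val((unsigned char)src[1]);
                if (lo < 0) return -1;          /* odd number of digits */
                if (n >= cap) return -1;
                out[n++] = (unsigned char)(hi * 16 + lo);
                src += 2;
            }
        } else {
            if (n >= cap) return -1;
            out[n++] = (unsigned char)*src++;
        }
    }
    return (long)n;
}
```

In C source the backslashes have to be doubled, so the second example above would be passed as `"ABC\\H 12 34 AB\\nopq"`.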
>
>
How often would something like that be useful? I would have thought that it is rare to see something that is basically text but has enough odd non-printing characters (other than the common \n, \t, \e) to make it worth the fuss. If you want to have binary data in something that looks like a string literal, then just use straight-up two hex digits per character - "4142431234ab". It's simpler to generate and parse. I don't see the benefit of something that mixes binary and text data.
That's not the same thing. That sequence "...1234..." occupies 4 bytes (with values 49 50 51 52), not two bytes (with values 0x12 and 0x34, or 18 and 52).
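
The size difference can be checked in standard C. This is a small sketch with hypothetical function names; the byte values 49 to 52 assume an ASCII-compatible execution character set.

```c
#include <stddef.h>

/* sizeof on a string literal counts the terminating '\0', hence the -1. */

/* "1234" as text: the four characters '1' '2' '3' '4'. */
static size_t digits_as_text_len(void)
{
    return sizeof "1234" - 1;
}

/* The same digits as two hex escapes: the bytes 0x12 and 0x34. */
static size_t digits_as_bytes_len(void)
{
    return sizeof "\x12\x34" - 1;
}
```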
>
Here's an example of wanting to print '€4.99', first in C (note that my editor doesn't support Unicode so this stuff is needed):
>
puts("\xE2\x82\xAC" "4.99");
>
The euro symbol occupies three bytes in UTF-8. It's awkward to type: it has loads of backslashes, it keeps switching case and it needs more concentration.
>
Plus I had to split the string since apparently \x doesn't stop at two hex digits, it keeps going: it would have read \xAC4, which overflows the 8-bit width of a character anyway, so I don't know what the point is of reading more than 2 hex characters.
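
The split works because C translates escape sequences before it concatenates adjacent string literals, so the \xAC escape ends at the closing quote. A minimal sketch, with hypothetical function names:

```c
#include <stddef.h>
#include <string.h>

/* The two literals are joined only after \xE2, \x82 and \xAC have been
   translated, so the last escape cannot absorb the '4' that follows. */
static const char *euro_price(void)
{
    return "\xE2\x82\xAC" "4.99";   /* 3 UTF-8 bytes for the euro sign, then "4.99" */
}

/* Number of bytes before the terminating '\0'. */
static size_t euro_price_len(void)
{
    return strlen(euro_price());
}
```

Written as a single literal, "\xE2\x82\xAC4.99" would instead be parsed with the escape \xAC4, which is out of range for a char.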
>
Using my feature, it looks like this:
>
println "\H E2 82 AC\4.99"
>
(I gather you have other conveniences for your language's printing features when converting various types, but that's a different matter.)

It never happens that you want to type a bunch of hex byte values to initialise a byte array? OK.
The obvious answer to writing this kind of thing is simply to switch to an editor that supports UTF-8.
Why bother with the \H stuff? That's my point - use hex data for data, and text for text. Mixing these is not common enough to be worth the extra fuss for such negligible extra convenience.

Is this a separate feature using 'b'? Because in my scheme, \H is just another string escape code, which can be used in ordinary strings, and b"" strings define char[] data which can include normal text data too.

My suggestion is that it could be helpful to have binary blobs written as hex digits without escapes anywhere, because it is /just/ binary data. I don't object to having optional spaces - that's a fine idea. But just write:
b"4D 5A 90 00 03 00 00 00 04 00 00 00 FF FF 00 00"
b"B8 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00"
The extra "\H" adds nothing useful.
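
Neither b"..." nor \H exists in C, but a hex-only blob of this shape is easy to decode at run time. The following is a minimal sketch; `nibble` and `decode_hex_blob` are hypothetical names, and rejecting an odd digit count is an assumption rather than anything stated in the thread.

```c
#include <ctype.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int nibble(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: decodes text such as "4D 5A 90 00" into bytes.
   White space between pairs is skipped. Returns the byte count, or -1 on
   a non-hex character, an odd number of digits, or insufficient capacity. */
static long decode_hex_blob(const char *text, unsigned char *out, size_t cap)
{
    size_t n = 0;

    for (;;) {
        int hi, lo;

        while (isspace((unsigned char)*text)) text++;
        if (*text == '\0') break;               /* clean end of input */

        hi = nibble((unsigned char)text[0]);
        if (hi < 0) return -1;
        lo = nibble((unsigned char)text[1]);
        if (lo < 0) return -1;                  /* also catches a lone last digit */
        if (n >= cap) return -1;

        out[n++] = (unsigned char)(hi * 16 + lo);
        text += 2;
    }
    return (long)n;
}
```

Because the digits are consumed strictly in pairs, the two lines above can be passed as adjacent C string literals without adding a separator between them.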
>
(The 's'/'b' prefixes are needed for strings to have a type of (in C terms) char[] rather than char*, a detail that C glosses over via some magic. 's' gives you a zero terminator, 'b' as used here doesn't. The "+" is used for compile-time string/data-string concatenation.)
>
In short, more is possible without needing to resort to tools. You can directly work from a hex dump.