Subject: Re: "A diagram of C23 basic types"
From: cr88192 (at) *nospam* gmail.com (BGB)
Newsgroups: comp.lang.c
Date: 03. Apr 2025, 10:17:59
Organization: A noiseless patient Spider
Message-ID: <vsljr0$70lp$2@dont-email.me>
References: 1 2 3 4 5 6
User-Agent: Mozilla Thunderbird
On 4/3/2025 3:15 AM, Janis Papanagnou wrote:
On 03.04.2025 09:38, David Brown wrote:
On 03/04/2025 05:43, Janis Papanagnou wrote:
>
In many other languages you have abstractions of numeric types, not
every implicitly encoding variant revealed on the programming level.
>
That's often fine within a program, but sometimes you need to exchange
data with other programs. In particular, C is the standard language for
common libraries - being able to reliably and consistently exchange data
with other languages and other machines is thus very important.
I consider this an important point! - My background is a bit different,
though; I was working in system environments not restricted to single
languages, single OSes, or systems originating from the same vendor
or even the same country. For data exchange it was important to have
a standard transfer syntax independent of the data types of a specific
programming language. - Don't get me wrong; in the past I've also used
those byte-reverting library functions (for endian'ness), sometimes an
object serialization, but also CORBA, and preferably even ASN.1 (with
an associated transfer syntax). - Being platform/language independent
requires of course an abstraction layer.
This "flexibility" of various sorts of numeric "subtypes", be it in
Fortran, Algol 68, or "C", always appeared odd to me. Things like the
ranged types (say as Pascal or Ada provided) seemed more appropriate
to me for a high-level language.
My world is a bit different, where byte-level handling of memory and aggressive bit-twiddling tends to dominate pretty much everything.
It is rare to go much higher level than this.
I have not used ASN.1 much, and have often taken a different approach WRT TLV formats.
One approach I have used that is "kinda nifty" is to assume that the bytes of a Tag/TWOCC/FOURCC may only be in the range 20..7E (hex), and that lengths are stored bitwise-inverted, so:
(20-7E) (80-FF): Byte-Tag, Length=0-127
(20-7E) (20-7E) XX (80-FF): TWOCC-Tag, Length=0-32767
...
This makes it possible to unambiguously tell a shorter tag from a longer one (a tag byte never has its high bit set, whereas the final byte of an inverted length always does), while also allowing for simple encode/decode logic.
Here, 00..1F and 7F can either be seen as special markers (such as for stream resync), special uses (such as using 00 to pad a shorter tag in cases where a longer length field was needed), or may indicate that decoding has broken.
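A minimal sketch of the encode side for the two header forms listed above. The function names are made up for illustration, and the byte order of the 16-bit length (low byte first, so that the 80-FF byte comes last) is an assumption read off the "XX (80-FF)" layout rather than something stated outright:

```c
#include <stddef.h>
#include <stdint.h>

/* Tag bytes must be printable ASCII, 0x20..0x7E. */
static int tlv_tag_ok(uint8_t c) { return c >= 0x20 && c <= 0x7E; }

/* Byte-tag header: 1 tag byte + 1 inverted length byte (length 0..127).
   Returns the number of bytes written (2), or 0 on invalid input. */
size_t tlv_put_byte_tag(uint8_t *out, uint8_t tag, unsigned len)
{
    if (!tlv_tag_ok(tag) || len > 127)
        return 0;
    out[0] = tag;
    out[1] = (uint8_t)~len;           /* 0 -> 0xFF, 127 -> 0x80 */
    return 2;
}

/* TWOCC header: 2 tag bytes + 16-bit inverted length (length 0..32767).
   Assumption: low byte first, so the byte with the high bit set is last. */
size_t tlv_put_twocc(uint8_t *out, const char tag[2], unsigned len)
{
    uint16_t inv;

    if (!tlv_tag_ok((uint8_t)tag[0]) || !tlv_tag_ok((uint8_t)tag[1]) || len > 32767)
        return 0;
    inv = (uint16_t)~len;
    out[0] = (uint8_t)tag[0];
    out[1] = (uint8_t)tag[1];
    out[2] = (uint8_t)(inv & 0xFF);
    out[3] = (uint8_t)(inv >> 8);     /* always 0x80..0xFF */
    return 4;
}
```

Since the maximum length of each form leaves the top bit of the length clear, inverting it guarantees that the last header byte lands in 80-FF, outside the tag range.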
(20-7E) (20-7E) 00 00 XX XX XX (80-FF)
Here a TWOCC was promoted to 4 bytes (padded with 00), say because the length was too large for the shorter length field.
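The decode side could look roughly like the following. This is only a sketch: the struct and function names are hypothetical, and it assumes that lengths are stored low byte first and that a 4-byte tag always carries a 4-byte length, as in the promoted-TWOCC layout above. Control bytes other than the 00 00 padding are simply treated as "decoding has broken":

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    char     tag[5];    /* tag characters, NUL-terminated */
    uint32_t len;       /* payload length in bytes */
    size_t   hdr_size;  /* header bytes consumed: 2, 4 or 8 */
} tlv_hdr;

static int is_tag(uint8_t c)     { return c >= 0x20 && c <= 0x7E; }
static int is_len_end(uint8_t c) { return c >= 0x80; }

/* Returns 0 on success, -1 if the header is truncated or malformed. */
int tlv_get_hdr(const uint8_t *p, size_t avail, tlv_hdr *h)
{
    uint32_t v;

    memset(h, 0, sizeof *h);
    if (avail < 2 || !is_tag(p[0]))
        return -1;

    if (is_len_end(p[1])) {                 /* byte tag, 1-byte length */
        h->tag[0] = (char)p[0];
        h->len = (uint8_t)~p[1];
        h->hdr_size = 2;
        return 0;
    }
    if (!is_tag(p[1]) || avail < 4)
        return -1;

    if (is_len_end(p[3])) {                 /* TWOCC, 2-byte length */
        memcpy(h->tag, p, 2);
        h->len = (uint16_t)~(p[2] | (p[3] << 8));
        h->hdr_size = 4;
        return 0;
    }

    /* 4-byte tag: bytes 2..3 are tag characters or 00 00 padding */
    if (avail < 8 || !is_len_end(p[7]))
        return -1;
    if (!((is_tag(p[2]) && is_tag(p[3])) || (p[2] == 0 && p[3] == 0)))
        return -1;
    memcpy(h->tag, p, 4);
    v = (uint32_t)p[4] | ((uint32_t)p[5] << 8) |
        ((uint32_t)p[6] << 16) | ((uint32_t)p[7] << 24);
    h->len = (uint32_t)~v;
    h->hdr_size = 8;
    return 0;
}
```

Note that the low length byte (XX) may itself fall in 20-7E, so the decoder looks at fixed positions (byte 1, then byte 3, then byte 7) rather than scanning for the first byte with the high bit set.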
For the most part, I tend to use TWOCC tags, as FOURCC is often unnecessary and bytes aren't free (but FOURCCs make sense for top-level magic values).
Byte-Tags are usually limited to context-specific uses. When used with FOURCCs, it is superficially similar to RIFF apart from the bitwise-inverted length fields. If 'LIST' chunks or similar are used, it is basically the same as RIFF in this sense.
In many contexts, structs would be represented as fixed structs rather than decomposed into individual tags for each member.
Usually no IDLs or IDL compilers, just "ye olde while() loop" or similar.
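As a rough illustration of that style, here is a sketch of such a loop over a buffer of TWOCC chunks, where one chunk's payload is copied out as a whole fixed struct. The "hd" tag, the demo_hdr layout, and the function name are all invented for the example; it handles only the TWOCC form, assumes low-byte-first lengths, and assumes that writer and reader share the same struct layout and byte order:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-layout payload for an "hd" chunk. */
typedef struct {
    uint16_t width;
    uint16_t height;
} demo_hdr;

/* Walk the TWOCC chunks in buf; copy the "hd" payload into *out.
   Returns the number of chunks visited, or -1 on malformed input. */
int demo_walk(const uint8_t *buf, size_t size, demo_hdr *out)
{
    size_t pos = 0;
    int count = 0;

    while (pos + 4 <= size) {
        const uint8_t *p = buf + pos;
        size_t len;

        if (p[0] < 0x20 || p[0] > 0x7E ||
            p[1] < 0x20 || p[1] > 0x7E || p[3] < 0x80)
            return -1;                        /* decoding has broken */
        len = (uint16_t)~(p[2] | (p[3] << 8));
        if (len > size - pos - 4)
            return -1;                        /* payload runs past the buffer */

        if (p[0] == 'h' && p[1] == 'd' && len == sizeof *out)
            memcpy(out, p + 4, sizeof *out);  /* whole struct in one go */
        /* chunks with unknown tags are simply skipped */

        pos += 4 + len;
        count++;
    }
    return pos == size ? count : -1;
}
```

Skipping unknown tags by length is what lets a reader ignore chunks it does not understand, much as with RIFF.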
...
Janis