Subject: Re: Rationale for aligning data on even bytes in a Unix shell file?
From: janis_papanagnou+ng (at) *nospam* hotmail.com (Janis Papanagnou)
Newsgroups: comp.lang.c
Date: 27. Apr 2025, 00:12:14
Organization: A noiseless patient Spider
Message-ID: <vujp8e$3dkg1$1@dont-email.me>
References: 1 2 3
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0
On 26.04.2025 23:34, Keith Thompson wrote:
> scott@slp53.sl.home (Scott Lurndal) writes:
>> Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
>>> In a "C" file (of the Kornshell software) I stumbled across this
>>> comment: "Each command in the history file starts on an even byte
>>> and is null-terminated."
>>>
>>> I wonder what's the reason behind that even-byte-alignment, on "C"
>>> level or on Unix/files level. Any ideas?
>>
>> Possibly to support 16-bit character sets?
> I don't think it supports 16-bit character sets.
Currently it doesn't. - I'm not sure it's intended for a possible
16 bit extension; I wouldn't think so but don't know.
There was some note in the source about supporting 8 bit characters
in "version 1" (instead of 7 bit ASCII, in "version 0"), IIRC. (So
at least it could be possible in principle to support 16 bit with a
"version 2", or so.)
> Unlike bash history files, which are plain text, ksh history files
> are in a binary format.
> I don't know whether the format includes any multi-byte integers.
No, as far as I could see it's (besides the \0-terminated strings
and the occasional \0 padding byte) just line markers
0x82 0x00 0xNN 0xNN 0xNN 0x00 and some "undo" marker with a version
number 0x81 0x00 (e.g.). So these markers also fit in multiples of
16 bits. (Not sure how these sequences would conflict with 16 bit
characters that have the same encoding.)
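To make the even-byte rule concrete, here is a minimal C sketch of a writer that pads with a single \0 so that each command starts on an even offset. It is an illustration only, not the ksh93 code: the helper name put_command and the in-memory buffer are made up for this example, and the only facts taken from the source comment are that commands start on an even byte and are null-terminated.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of ksh93: append one command to buf
 * (capacity cap, current length *len) so that it starts on an even
 * offset and is null-terminated.  Returns the start offset of the
 * command, or (size_t)-1 if it does not fit. */
size_t put_command(unsigned char *buf, size_t cap, size_t *len, const char *cmd)
{
    size_t n = strlen(cmd) + 1;        /* include the terminating \0 */
    size_t start = *len + (*len & 1);  /* round up to an even offset */

    if (start > cap || n > cap - start)
        return (size_t)-1;
    if (start != *len)
        buf[*len] = 0;                 /* the occasional \0 padding byte */
    memcpy(buf + start, cmd, n);
    *len = start + n;
    return start;
}
```

Appending "ls" and then "pwd" to an empty buffer puts "ls\0" at offsets 0..2, a padding \0 at offset 3, and "pwd\0" at offsets 4..7.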
> If it does, reading such values directly into memory might be easier
> on some platforms if they're aligned.
> The relevant source file is src/cmd/ksh93/edit/history.c, in
> <https://github.com/ksh93/ksh>. It has functions to manipulate the
> history file, but I don't see a full description of the file format.
Somewhere in that file I found it... <lookup> ...yes, a comment at
the top of the file. You can find some more details when searching
for the CPP tokens "HIST_CMDNO" and "HIST_UNDO".
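For what it's worth, the six-byte line marker mentioned above could be recognized with something like the following C sketch. The names MARK_UNDO, MARK_CMDNO and decode_cmdno are invented here and are not the identifiers used in history.c. That the three 0xNN bytes form one number with the most significant byte first is my assumption, so check it against the source before relying on it.

```c
#include <stddef.h>

/* Marker bytes as quoted in this thread; the names are hypothetical. */
enum {
    MARK_UNDO       = 0x81, /* "undo" marker, followed by a version byte */
    MARK_CMDNO      = 0x82, /* line marker, followed by 0x00 NN NN NN 0x00 */
    MARK_CMDNO_SIZE = 6
};

/* If the bytes at p look like 0x82 0x00 NN NN NN 0x00, store the
 * 24-bit value in *out and return 1; otherwise return 0.
 * Assumption: the three NN bytes are most significant byte first. */
int decode_cmdno(const unsigned char *p, size_t avail, unsigned long *out)
{
    if (avail < MARK_CMDNO_SIZE)
        return 0;
    if (p[0] != MARK_CMDNO || p[1] != 0 || p[5] != 0)
        return 0;
    *out = ((unsigned long)p[2] << 16) | ((unsigned long)p[3] << 8) | p[4];
    return 1;
}
```

Because it reads byte by byte, this does not depend on the marker being aligned or on the byte order of the host.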
Janis