Subject: Re: Simple string conversion from UCS2 to ISO8859-1
From: 643-408-1753 (at) *nospam* kylheku.com (Kaz Kylheku)
Newsgroups: comp.lang.c
Date: 22. Feb 2025, 02:20:20
Organization: A noiseless patient Spider
Message-ID: <20250221171137.949@kylheku.com>
References: 1
User-Agent: slrn/pre1.0.4-9 (Linux)
On 2025-02-21, pozz <pozzugno@gmail.com> wrote:
> I want to write a simple function that converts UCS2 string into ISO8859-1:
>
> void ucs2_to_iso8859p1(char *ucs2, size_t size);
>
> ucs2 is a string of type "00480065006C006C006F" for "Hello". I'm passing
> size because ucs2 isn't null terminated.
This kind of normalizing is a good way of introducing injection
exploits.

Suppose the input is in some syntax that has been validated, and that
decision is trusted from then on. The normalization to the 8-bit
character set can produce characters which are special in the syntax,
changing its meaning.
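
The failure mode described above can be sketched in C roughly as follows. This is only an illustration under stated assumptions: has_no_quote, lossy_to_latin1 and validated_input_gains_quote are hypothetical names, and the rule that folds U+2019 to 0x27 is an assumed "best fit" mapping, not the behavior of any particular library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical validator: accepts input that contains no ASCII apostrophe. */
bool has_no_quote(const uint16_t *s, size_t n)
{
    assert(s != NULL);
    for (size_t i = 0; i < n; i++)
        if (s[i] == 0x0027)
            return false;
    return true;
}

/* Hypothetical lossy converter (assumed mapping): folds U+2019 to 0x27
 * and replaces every other code point above 0xFF with '?'.
 * out must have room for n bytes. */
void lossy_to_latin1(const uint16_t *in, size_t n, unsigned char *out)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (in[i] == 0x2019)
            out[i] = 0x27;
        else if (in[i] <= 0xFF)
            out[i] = (unsigned char)in[i];
        else
            out[i] = '?';
    }
}

/* Returns true when the input passed validation and the converted
 * output nevertheless contains an apostrophe. */
bool validated_input_gains_quote(void)
{
    const uint16_t in[] = { 'a', 0x2019, 'b' };
    unsigned char out[3];
    size_t n = sizeof in / sizeof in[0];

    if (!has_no_quote(in, n))
        return false;            /* validation happens on the wide data */
    lossy_to_latin1(in, n, out); /* the quote appears only afterwards */
    return out[1] == 0x27;
}
```

The point of the sketch is the ordering: the check runs on the 16-bit data, while the consumer sees the 8-bit result.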
In Microsoft Windows, there is an example of such a problem. Programs
which use GetCommandLineA to get the argument string before parsing it
into arguments are vulnerable to argument injection. The attacker
specifies a piece of data to be used by program A as an argument in
calling program B such that when the data is reduced to the 8-bit
character set, quotes appear in it, creating additional arguments to
program B.
> again. But I saw the code "2019" (apostrophe) that can be rendered as
> 0x27 in ISO8859-1.
... and that's a common quoting character in various data syntaxes, oops!
What could go wrong?
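
If such a conversion cannot be avoided, one conservative option is to refuse anything that has no exact ISO8859-1 equivalent rather than fold it to a look-alike. A minimal sketch, assuming the hex-digit input format from the original question; the name ucs2_hex_to_latin1_strict, its signature and its error convention are hypothetical, and hex_val is a hypothetical helper:

```c
#include <assert.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit.
 * Assumes an ASCII-compatible execution character set. */
int hex_val(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Hypothetical strict converter: ucs2 holds `size` hex digits, four per
 * code unit, not null terminated. Returns the number of bytes written
 * to out, or -1 if the input is malformed, does not fit in outcap, or
 * contains a code point above 0xFF. On failure the contents of out are
 * unspecified. */
long ucs2_hex_to_latin1_strict(const char *ucs2, size_t size,
                               unsigned char *out, size_t outcap)
{
    assert(ucs2 != NULL && out != NULL);

    if (size % 4 != 0 || size / 4 > outcap)
        return -1;

    for (size_t i = 0; i < size; i += 4) {
        unsigned code = 0;
        for (size_t j = 0; j < 4; j++) {
            int d = hex_val(ucs2[i + j]);
            if (d < 0)
                return -1;
            code = code * 16 + (unsigned)d;
        }
        if (code > 0xFF)
            return -1;  /* e.g. U+2019 is rejected, not turned into 0x27 */
        out[i / 4] = (unsigned char)code;
    }
    return (long)(size / 4);
}
```

With this sketch, the 20 digits of "00480065006C006C006F" yield the five bytes of "Hello", while "2019" is reported as an error instead of becoming an apostrophe.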
I think in 2025 we shouldn't have to be crippling Unicode data to fit
some ISO Latin (or any other 8 bit) character set; we should be rooting
out technologies and situations which do that.
-- 
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca