Newsportal USENET - Re: Simple string conversion from UCS2 to ISO8859-1

Subject : Re: Simple string conversion from UCS2 to ISO8859-1
From : janis_papanagnou+ng (at) *nospam* hotmail.com (Janis Papanagnou)
Newsgroups : comp.lang.c
Date : 21. Feb 2025, 14:06:03

Other headers

Organization : A noiseless patient Spider
Message-ID : <vp9tnr$3dca2$1@dont-email.me>
References : 1 2 3
User-Agent : Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.8.0

On 21.02.2025 13:42, pozz wrote:

On 21/02/2025 13:05, Richard Damon wrote:
On 2/21/25 6:40 AM, pozz wrote:
I want to write a simple function that converts UCS2 string into
ISO8859-1:
>
void ucs2_to_iso8859p1(char *ucs2, size_t size);
>
ucs2 is a string of type "00480065006C006C006F" for "Hello". I'm
passing size because ucs2 isn't null terminated.
>

[...]

>
It is trivial to convert "0000"-"007F" chars: it's a simple cast from
unsigned int to char.
>
Note, I think you will find that it is 0000-00FF that match (as
I remember, ISO8859-1 was the base for starting Unicode).

I second that.

>
It isn't so simple to convert higher codes. For example, the small e
with grave "00E8" can be converted to 0xE8 in ISO8859-1, so it's
trivial again. But I saw the code "2019" (apostrophe) that can be
rendered as 0x27 in ISO8859-1.
>
To be correct, u2019 isn't 0x27, it's just a character that looks a lot
like it.

Yes, but as a first approximation, 0x27 is much better than '?' for u2019.

Note that there are _standard names_ assigned to the characters.
These names are normative for what the characters represent. I strongly
suggest not twisting these standards by assigning different
characters; you will do no one a favor and will only inflict confusion
and harm.

Is there a simplified mapping table that can be written with if/switch?
>
if (code < 0x80) {
   *dst++ = (char)code;
} else {
   switch (code) {
   case 0x2019: *dst++ = 0x27; break; // Apostrophe
   case 0x...: *dst++ = ...; break;
   default: *dst++ = ' ';
   }
}
>
I'm not looking for a very detailed and correct mapping, just a
"sufficient" implementation.
>
Then you have to decide which are sufficient mappings. No character
above FF *IS* the character below, but some have a close
approximation, so you will need to decide what to map.

Yes, I have to decide, but it is a very big problem (there are thousands
of Unicode symbols that can be approximated by an ISO8859-1 code).
I'm wondering if such an approximation is already implemented somewhere.
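
For illustration, a table-driven variant of the switch idea could look roughly like this. It is only a sketch: the table entries and the names approx_table and approx_latin1 are hand-picked assumptions, not taken from any standard or library, and the result is lossy by design.

```c
#include <stddef.h>

struct approx { unsigned code; unsigned char latin1; };

/* Hypothetical, hand-picked look-alikes; extend or trim for your data. */
static const struct approx approx_table[] = {
    { 0x2013, '-'  },  /* EN DASH */
    { 0x2014, '-'  },  /* EM DASH */
    { 0x2018, '\'' },  /* LEFT SINGLE QUOTATION MARK */
    { 0x2019, '\'' },  /* RIGHT SINGLE QUOTATION MARK */
    { 0x201C, '"'  },  /* LEFT DOUBLE QUOTATION MARK */
    { 0x201D, '"'  },  /* RIGHT DOUBLE QUOTATION MARK */
};

/*
 * Return the ISO 8859-1 byte for a UCS-2 code point.
 * 0x0000..0x00FF pass through unchanged, table entries are approximated,
 * and everything else becomes the caller-supplied fallback byte.
 */
unsigned char approx_latin1(unsigned code, unsigned char fallback)
{
    if (code <= 0xFF)
        return (unsigned char)code;
    for (size_t i = 0; i < sizeof approx_table / sizeof approx_table[0]; i++)
        if (approx_table[i].code == code)
            return approx_table[i].latin1;
    return fallback;
}
```

Keeping the approximations in one table makes it easy to review exactly which characters are being changed, and to drop the table entirely if a strict conversion is wanted.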

I've just run a comparison of UCS-2 and ISO 8859-1 based on their
normative character names and, as already mentioned above, they match
one-to-one in the ranges 0000-00FF and 00-FF respectively.
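
A minimal sketch of what that one-to-one range means for the hex-text input described in the question. The function name ucs2hex_to_latin1, the separate output buffer, and the '?' replacement for anything above 00FF are assumptions made for this example, not part of the original request.

```c
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/*
 * Convert `size` hex characters (4 per UCS-2 code unit, e.g. "00480065")
 * into ISO 8859-1 bytes in dst. dst needs room for size / 4 + 1 bytes.
 * Code points above 0x00FF and malformed groups become '?' (assumed policy).
 * A trailing group of fewer than 4 characters is ignored.
 * Returns the number of bytes written, not counting the final '\0'.
 */
size_t ucs2hex_to_latin1(const char *ucs2, size_t size, char *dst)
{
    size_t n = 0;
    for (size_t i = 0; i + 4 <= size; i += 4) {
        unsigned code = 0;
        int ok = 1;
        for (size_t j = 0; j < 4; j++) {
            int d = hex_digit(ucs2[i + j]);
            if (d < 0) { ok = 0; break; }
            code = code * 16 + (unsigned)d;
        }
        /* U+0000..U+00FF have the same numeric value in ISO 8859-1. */
        dst[n++] = (ok && code <= 0xFF) ? (char)(unsigned char)code : '?';
    }
    dst[n] = '\0';
    return n;
}
```

Called with "00480065006C006C006F" and size 20, this fills dst with "Hello" and returns 5.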

BTW, you may want to consider using ISO 8859-15 (Latin 9) instead
of ISO 8859-1 (Latin 1); Latin 1 is widely outdated, and Latin 9
contains a few other characters like the € (Euro Sign). If that is
possible in your context, you only have to map a handful of characters.
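
To make the "handful" concrete: Latin 9 differs from Latin 1 in eight positions. The following is a sketch of a per-character mapping; the function name ucs2_to_latin9 and the -1 return for unmappable code points are assumptions for this example.

```c
/*
 * Map one UCS-2 code point to ISO 8859-15 (Latin 9).
 * Returns the byte value 0x00..0xFF, or -1 if Latin 9 has no such character.
 */
int ucs2_to_latin9(unsigned code)
{
    switch (code) {
    case 0x20AC: return 0xA4; /* EURO SIGN */
    case 0x0160: return 0xA6; /* LATIN CAPITAL LETTER S WITH CARON */
    case 0x0161: return 0xA8; /* LATIN SMALL LETTER S WITH CARON */
    case 0x017D: return 0xB4; /* LATIN CAPITAL LETTER Z WITH CARON */
    case 0x017E: return 0xB8; /* LATIN SMALL LETTER Z WITH CARON */
    case 0x0152: return 0xBC; /* LATIN CAPITAL LIGATURE OE */
    case 0x0153: return 0xBD; /* LATIN SMALL LIGATURE OE */
    case 0x0178: return 0xBE; /* LATIN CAPITAL LETTER Y WITH DIAERESIS */
    /* These Latin 1 characters were displaced by the eight above. */
    case 0x00A4: case 0x00A6: case 0x00A8: case 0x00B4:
    case 0x00B8: case 0x00BC: case 0x00BD: case 0x00BE:
        return -1;
    default:
        /* Everything else in 0x00..0xFF is identical to Latin 1. */
        return code <= 0xFF ? (int)code : -1;
    }
}
```

Note that the displaced Latin 1 characters (for example the currency sign at 00A4) can no longer be represented, so the caller still needs a policy for the -1 case.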

Janis

For example, what does iconv() do in this case?
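
One way to find out is to try it. The sketch below wraps the POSIX iconv() calls; the wrapper name ucs2be_iconv is hypothetical, and it assumes the input has already been turned into binary big-endian UCS-2 (two bytes per character) rather than hex text. As far as I know, a plain "ISO-8859-1" target makes iconv() fail on a character such as U+2019, while the "//TRANSLIT" suffix (an extension in glibc and GNU libiconv, not POSIX) asks for an approximation whose exact result depends on the implementation.

```c
#include <iconv.h>
#include <stddef.h>

/*
 * Convert binary big-endian UCS-2 with iconv().
 * tocode is e.g. "ISO-8859-1", or "ISO-8859-1//TRANSLIT" where supported.
 * Returns the number of bytes written to dst, or (size_t)-1 on failure
 * (unknown encoding, unconvertible character, or dst too small).
 */
size_t ucs2be_iconv(const char *tocode, const char *src, size_t srclen,
                    char *dst, size_t dstlen)
{
    iconv_t cd = iconv_open(tocode, "UCS-2BE");
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *in = (char *)src;  /* iconv() takes char **; input is not modified */
    char *out = dst;
    size_t inleft = srclen, outleft = dstlen;
    size_t rc = iconv(cd, &in, &inleft, &out, &outleft);
    iconv_close(cd);

    return rc == (size_t)-1 ? (size_t)-1 : dstlen - outleft;
}
```

Running it once with each target name on a buffer containing 0x20 0x19 shows what the local implementation does with the apostrophe case.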
