Re: Simple string conversion from UCS2 to ISO8859-1

Subject: Re: Simple string conversion from UCS2 to ISO8859-1
From: pozzugno (at) *nospam* gmail.com (pozz)
Newsgroups: comp.lang.c
Date: 21 Feb 2025, 15:53:02
Organization: A noiseless patient Spider
Message-ID: <vpa40d$3a0k4$6@dont-email.me>
User-Agent: Mozilla Thunderbird
On 21/02/2025 15:23, David Brown wrote:
On 21/02/2025 12:40, pozz wrote:
I want to write a simple function that converts UCS2 string into ISO8859-1:
>
void ucs2_to_iso8859p1(char *ucs2, size_t size);
>
ucs2 is a string of type "00480065006C006C006F" for "Hello". I'm passing size because ucs2 isn't null terminated.
>
I know I could use iconv(), but I'm on an embedded platform with no OS and no iconv() function.
>
It is trivial to convert "0000"-"007F" chars: it's a simple cast from unsigned int to char.
>
It isn't so simple to convert higher codes. For example, the small e with grave accent, "00E8", maps to 0xE8 in ISO8859-1, so that one is trivial too. But I saw the code "2019" (right single quotation mark, used as an apostrophe), which could be rendered as 0x27 in ISO8859-1.
>
Is there a simplified mapping table that can be written with if/switch?
>
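if (code < 0x100) {
   *dst++ = (char)code;                     // U+0000..U+00FF map directly to ISO8859-1
} else {
   switch (code) {
     case 0x2019: *dst++ = 0x27; break;     // Right single quotation mark -> apostrophe
     /* ... further approximations ... */
     default: *dst++ = ' '; break;          // No mapping: substitute a space
   }
}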
>
I'm not looking for a very detailed and correct mapping, just a "sufficient" implementation.
>
>
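For reference, the whole conversion can be sketched in C roughly as follows. This is a minimal sketch under stated assumptions, not a complete transliteration table: it assumes the input is hex text as shown above (four hex digits per UCS-2 code unit), maps U+0000..U+00FF directly, approximates a few typographic quotes and dashes, and writes '?' for anything else. The function names and the output-buffer signature are hypothetical and differ from the prototype in the question.

```c
#include <stddef.h>
#include <stdint.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Map one UCS-2 code unit to a single ISO 8859-1 byte.
   U+0000..U+00FF map directly; a few punctuation marks are
   approximated; everything else becomes '?'. */
static unsigned char ucs2_to_latin1_char(uint16_t code)
{
    if (code < 0x100)
        return (unsigned char)code;
    switch (code) {
    case 0x2018: /* LEFT SINGLE QUOTATION MARK */
    case 0x2019: /* RIGHT SINGLE QUOTATION MARK (often used as apostrophe) */
        return 0x27;
    case 0x201C: /* LEFT DOUBLE QUOTATION MARK */
    case 0x201D: /* RIGHT DOUBLE QUOTATION MARK */
        return 0x22;
    case 0x2013: /* EN DASH */
    case 0x2014: /* EM DASH */
        return '-';
    default:
        return '?';
    }
}

/* Hypothetical helper: decode `size` characters of hex text ("0048...")
   into ISO 8859-1 bytes in dst, which must hold at least size/4 + 1 bytes.
   Returns the number of bytes written (excluding the terminating NUL),
   or (size_t)-1 if the input is malformed. */
size_t ucs2hex_to_latin1(const char *hex, size_t size, char *dst)
{
    size_t n = 0;
    if (size % 4 != 0)
        return (size_t)-1;
    for (size_t i = 0; i < size; i += 4) {
        uint16_t code = 0;
        for (size_t j = 0; j < 4; j++) {
            int v = hex_value(hex[i + j]);
            if (v < 0)
                return (size_t)-1;
            code = (uint16_t)((code << 4) | (unsigned)v);
        }
        dst[n++] = (char)ucs2_to_latin1_char(code);
    }
    dst[n] = '\0';
    return n;
}
```

Called as ucs2hex_to_latin1("00480065006C006C006F2019", 24, out) with a buffer of at least 7 bytes, this should return 6 and leave the string "Hello'" in out. Keeping the per-character mapping in its own function means the switch can grow (or be replaced by a table) without touching the hex parsing.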
 <https://en.wikipedia.org/wiki/Plane_(Unicode)#Basic_Multilingual_Plane>
 As has been mentioned by others, 0 - 0xff should be a direct translation (with the possible exception of Latin-9 differences).
 <https://en.wikipedia.org/wiki/ISO/IEC_8859-15>
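If Latin-9 is the target instead of Latin-1, the "direct translation" has exactly eight exceptions: ISO 8859-15 reuses positions 0xA4, 0xA6, 0xA8, 0xB4, 0xB8, 0xBC, 0xBD and 0xBE for €, Š, š, Ž, ž, Œ, œ and Ÿ. A minimal C sketch of a per-character mapping that handles them (the function name and the -1 "not representable" convention are assumptions for illustration):

```c
#include <stdint.h>

/* Map a UCS-2 code unit to an ISO 8859-15 (Latin-9) byte.
   Returns the byte value (0..255), or -1 if the character has no
   Latin-9 encoding. Latin-9 equals Latin-1 except at eight positions,
   which are handled explicitly below. */
int ucs2_to_latin9(uint16_t code)
{
    switch (code) {
    /* Characters that gained a slot in Latin-9 */
    case 0x20AC: return 0xA4; /* EURO SIGN */
    case 0x0160: return 0xA6; /* LATIN CAPITAL LETTER S WITH CARON */
    case 0x0161: return 0xA8; /* LATIN SMALL LETTER S WITH CARON */
    case 0x017D: return 0xB4; /* LATIN CAPITAL LETTER Z WITH CARON */
    case 0x017E: return 0xB8; /* LATIN SMALL LETTER Z WITH CARON */
    case 0x0152: return 0xBC; /* LATIN CAPITAL LIGATURE OE */
    case 0x0153: return 0xBD; /* LATIN SMALL LIGATURE OE */
    case 0x0178: return 0xBE; /* LATIN CAPITAL LETTER Y WITH DIAERESIS */
    /* Latin-1 characters whose slots Latin-9 reused */
    case 0x00A4: /* CURRENCY SIGN */
    case 0x00A6: /* BROKEN BAR */
    case 0x00A8: /* DIAERESIS */
    case 0x00B4: /* ACUTE ACCENT */
    case 0x00B8: /* CEDILLA */
    case 0x00BC: /* VULGAR FRACTION ONE QUARTER */
    case 0x00BD: /* VULGAR FRACTION ONE HALF */
    case 0x00BE: /* VULGAR FRACTION THREE QUARTERS */
        return -1;
    default:
        /* Everything else below U+0100 is identical to Latin-1 */
        return code < 0x100 ? (int)code : -1;
    }
}
```

For example, ucs2_to_latin9(0x20AC) gives 0xA4 and ucs2_to_latin9(0x00E8) gives 0xE8, while U+00A4 (currency sign) and U+2019 return -1 and need whatever fallback the caller chooses.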
  When you look at the BMP blocks above the first two blocks (0 - 0x7f, 0x80 - 0xff), you will quickly see that virtually none of them make any sense to support in the way you are thinking.  Just because a couple of the characters in the Thaana block look a bit like quotation marks, does not mean it makes any sense to try to transliterate them.  Realistically, you can at most make use of a few punctuation symbols (like 0x2019 above), and maybe approximate forms for some extended Latin alphabet characters that you will never see in practice.  Oh, and you might be able to support those spam emails that use Greek and other letters that look like Latin letters such as "ՏΡ𐊠Ꮇ" to fool filters.  And that's assuming you have output support for the full Latin-1 or Latin-9 range.
  Unicode is rarely much use unless you want and can provide good support for non-Latin alphabets.  Otherwise your translations are going to be so limited and simple that they are barely worth the effort and won't cover anything useful.
  So here I would say that whoever provides the text should provide it in Latin-9 encoding.  There's no point in allowing external translators to use whatever characters they feel are best in their language, and then having your code make some kind of odd approximation giving results that look different.  If someone really wants to use the letter "ā" that is found in the Latin Extended-A block, how do /you/ know whether the best Latin-9 match is "a", "ã", "ä", or something different like "aa" or an alternative spelling of the word?  Maybe the rules are different for Latvian and Anglicised Mandarin.
  When we have worked with multiple languages on small embedded systems (too small for big fonts and UTF-8), we have used one of three techniques:
 1. Insist that the external translators provide strings in Latin-9 only (or even just ASCII when the system was more restricted).
 2. Use primarily ASCII, with a few user-defined characters per language (that's useful for old-style character displays with space for perhaps 8 user-defined characters).
 3. Use a PC program to figure out the characters actually used in the strings, and put them into a single table indexing a generated list of bitmap glyphs, also generated by the program (from freely available fonts).  The source is, naturally, UTF-8 - the strings stored in the embedded system are not in any standard encoding representing characters, but now hold glyph table indices.
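The first half of technique 3 (collecting the characters actually used and replacing each one with a table index) can be sketched in C roughly like this. It is an illustration under assumptions, not the program described above: the UTF-8 input is assumed well-formed, at most 256 distinct characters are allowed so that an index fits in one byte, and generating the bitmap glyphs from a font is left out entirely. All names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_GLYPHS 256 /* one-byte indices: 0..255 */

/* Decode one code point from well-formed UTF-8 (no validation),
   advancing *s past the sequence. Handles 1- to 4-byte sequences. */
static uint32_t utf8_next(const unsigned char **s)
{
    const unsigned char *p = *s;
    uint32_t cp;
    int extra;
    if (p[0] < 0x80)      { cp = p[0];        extra = 0; }
    else if (p[0] < 0xE0) { cp = p[0] & 0x1F; extra = 1; }
    else if (p[0] < 0xF0) { cp = p[0] & 0x0F; extra = 2; }
    else                  { cp = p[0] & 0x07; extra = 3; }
    for (int i = 1; i <= extra; i++)
        cp = (cp << 6) | (p[i] & 0x3F);
    *s = p + 1 + extra;
    return cp;
}

/* Glyph table: sorted, unique code points used by all strings. */
typedef struct {
    uint32_t cp[MAX_GLYPHS];
    size_t count;
} glyph_table;

static int cmp_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* Pass 1: collect every distinct code point, then sort the table.
   Returns 0 on success, -1 if more than MAX_GLYPHS characters are used. */
int glyph_table_build(glyph_table *t, const char *const *strings, size_t nstrings)
{
    t->count = 0;
    for (size_t i = 0; i < nstrings; i++) {
        const unsigned char *s = (const unsigned char *)strings[i];
        while (*s) {
            uint32_t cp = utf8_next(&s);
            size_t k = 0;
            while (k < t->count && t->cp[k] != cp)
                k++;
            if (k == t->count) {
                if (t->count == MAX_GLYPHS)
                    return -1;
                t->cp[t->count++] = cp;
            }
        }
    }
    qsort(t->cp, t->count, sizeof t->cp[0], cmp_u32);
    return 0;
}

/* Pass 2: re-encode one UTF-8 string as one-byte glyph indices.
   out needs room for one byte per character. Returns the number of
   indices written, or (size_t)-1 if a character is not in the table. */
size_t glyph_encode(const glyph_table *t, const char *str, uint8_t *out)
{
    const unsigned char *s = (const unsigned char *)str;
    size_t n = 0;
    while (*s) {
        uint32_t cp = utf8_next(&s);
        const uint32_t *hit = bsearch(&cp, t->cp, t->count,
                                      sizeof t->cp[0], cmp_u32);
        if (hit == NULL)
            return (size_t)-1;
        out[n++] = (uint8_t)(hit - t->cp);
    }
    return n;
}
```

With the strings "Hej" and "Hé", the table holds U+0048, U+0065, U+006A and U+00E9 in that order, so "Hé" is stored as the index bytes 0, 3. Sorting the table keeps the output reproducible from build to build and lets the second pass use bsearch().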
  Your idea here sounds to me like a lot of work for virtually no benefit.
Yes, you're right. My question comes from an SMS received by a 4G network modem. The reply to the AT+CMGR command for that SMS reported the text in UCS2. The SMS was sent by the mobile operator and contained the balance of the prepaid SIM card.
The text included the apostrophe coded as U+2019 instead of U+0027. I suspect the developer who wrote the text in the mobile operator's systems was using UTF-8 (or UTF-16) and inserted U+2019 itself (perhaps wrongly).
Anyway I think I can live without that.
