Re: Unicode in strings

Subject : Re: Unicode in strings
From : cr88192 (at) *nospam* gmail.com (BGB)
Groups : comp.arch
Date : 31 May 2024, 18:14:19
Organization : A noiseless patient Spider
Message-ID : <v3d0hj$2amga$1@dont-email.me>
User-Agent : Mozilla Thunderbird
On 5/30/2024 11:25 AM, Anton Ertl wrote:
> Stefan Monnier <monnier@iro.umontreal.ca> writes:
>> I'm not sure the codepoint-oriented API is the best option, but it's not
>> completely clear what *is* the best option.  You mention a byte-oriented
>> API and you might be right that it's a better option, but in the case of
>> Emacs that's what we used in Emacs-20.1 but it worked really poorly
>> because of backward compatibility issues.  I think if we started from
>> scratch now (i.e. without having to contend with backward compatibility,
>> and with a better understanding of Unicode (which barely existed back
>> then)) it might work better, indeed, but that's not been an option
> Plus, editors are among the very few uses where you have to deal with
> individual characters, so the "treat it as opaque string" approach
> that works so well for most other code is not good enough there.  The
> command-line editor of Gforth is one case where we use the xchar words
> (those for dealing with code points of UTF-8).
 
Yeah.
For text editors, this is one of the few cases where it makes sense to use 32- or 64-bit characters (say, combining the 'character' with some additional metadata such as formatting).
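As a rough illustration of packing a character together with its metadata, here is a minimal C sketch of one possible 32-bit layout. The split is an assumption, not anything from the post: 21 bits for the code point (U+10FFFF needs 21 bits) and 11 bits of formatting. All names (`cell_t`, `cell_make`, `ATTR_BOLD`, etc.) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit editor cell: bits 0..20 hold the code point
 * (U+10FFFF needs 21 bits), bits 21..31 hold 11 bits of formatting. */
typedef uint32_t cell_t;

#define CELL_CP_BITS     21u
#define CELL_CP_MASK     0x1FFFFFu
#define CELL_ATTR_MASK   0x7FFu

/* Hypothetical attribute bits within the 11-bit field. */
#define ATTR_BOLD        0x001u
#define ATTR_ITALIC      0x002u
#define ATTR_UNDERLINE   0x004u
#define ATTR_COLOR_SHIFT 3u      /* bits 3..10: 8-bit color index */

static inline cell_t cell_make(uint32_t cp, uint32_t attr) {
    assert(cp <= 0x10FFFF && attr <= CELL_ATTR_MASK);
    return cp | (attr << CELL_CP_BITS);
}

static inline uint32_t cell_cp(cell_t c)   { return c & CELL_CP_MASK; }
static inline uint32_t cell_attr(cell_t c) { return c >> CELL_CP_BITS; }

/* Change formatting without touching the character itself. */
static inline cell_t cell_set_attr(cell_t c, uint32_t attr) {
    assert(attr <= CELL_ATTR_MASK);
    return (c & CELL_CP_MASK) | (attr << CELL_CP_BITS);
}
```

A 64-bit cell could follow the same pattern with a much wider attribute field (e.g. full RGB colors), at twice the memory per character.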
One thing that makes sense for text editors, though, is to fully unpack only the lines currently being edited, while the others remain in a more compact form (such as UTF-8) and are unpacked as they come into view (say, treating the editor window as a 32-entry modulo cache or similar).
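A minimal C sketch of the "unpack on view" idea, assuming lines are stored as UTF-8 and unpacked into 32-bit code points in a 32-slot cache indexed by `line_no % 32`. The names (`line_cache`, `cache_get`, `utf8_decode`) are hypothetical. The decoder maps malformed bytes to U+FFFD but does not reject overlong forms or surrogates, and cached lines are treated as read-only: a real editor would also need to re-encode a modified line before evicting it and to invalidate a slot when the underlying text changes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Sketch only: all names are hypothetical. Lines live as UTF-8 elsewhere;
 * up to 32 of them are kept unpacked as 32-bit code points. */
#define LINE_CACHE_SIZE 32

/* Decode one UTF-8 sequence from s (n >= 1 bytes available). Stores the
 * number of bytes consumed in *len. Malformed input yields U+FFFD and
 * consumes one byte; overlong forms and surrogates are not rejected here. */
static uint32_t utf8_decode(const unsigned char *s, size_t n, size_t *len) {
    unsigned char b = s[0];
    size_t need;
    uint32_t cp;

    if (b < 0x80) { *len = 1; return b; }
    else if ((b & 0xE0) == 0xC0) { need = 2; cp = b & 0x1F; }
    else if ((b & 0xF0) == 0xE0) { need = 3; cp = b & 0x0F; }
    else if ((b & 0xF8) == 0xF0) { need = 4; cp = b & 0x07; }
    else { *len = 1; return 0xFFFD; }          /* stray continuation byte etc. */

    *len = 1;                                   /* default for the error paths */
    if (need > n) return 0xFFFD;                /* truncated sequence */
    for (size_t k = 1; k < need; k++) {
        if ((s[k] & 0xC0) != 0x80) return 0xFFFD;
        cp = (cp << 6) | (s[k] & 0x3F);
    }
    if (cp > 0x10FFFF) return 0xFFFD;
    *len = need;
    return cp;
}

typedef struct {
    long      line_no;   /* line held by this slot, or -1 if empty */
    uint32_t *cells;     /* unpacked code points */
    size_t    count;
} cached_line;

typedef struct { cached_line slot[LINE_CACHE_SIZE]; } line_cache;

static void cache_init(line_cache *c) {
    for (int i = 0; i < LINE_CACHE_SIZE; i++) {
        c->slot[i].line_no = -1;
        c->slot[i].cells = NULL;
        c->slot[i].count = 0;
    }
}

/* Return line `line_no` unpacked, given its UTF-8 text. On a miss, slot
 * line_no % 32 is evicted and refilled; a hit does not re-check the text. */
static const cached_line *cache_get(line_cache *c, long line_no,
                                    const unsigned char *utf8, size_t n) {
    assert(line_no >= 0);
    cached_line *e = &c->slot[line_no % LINE_CACHE_SIZE];
    if (e->line_no == line_no) return e;        /* hit */

    free(e->cells);                             /* evict the previous line */
    e->cells = malloc((n ? n : 1) * sizeof *e->cells); /* <= 1 code point per byte */
    if (!e->cells) abort();
    e->count = 0;
    for (size_t i = 0; i < n; ) {
        size_t len;
        e->cells[e->count++] = utf8_decode(utf8 + i, n - i, &len);
        i += len;
    }
    e->line_no = line_no;
    return e;
}

static void cache_free(line_cache *c) {
    for (int i = 0; i < LINE_CACHE_SIZE; i++) free(c->slot[i].cells);
}
```

Because lines 0, 32, 64, ... share a slot, scrolling a window of at most 32 lines never evicts a line that is still visible.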
For the rest, one can have, say, a big buffer, with an array of lines giving the location and size of each line's text in the buffer.
If a line is modified, it can be reallocated at the end of the buffer, and if the buffer gets full, it can be "repacked" and/or expanded as needed. When written back to a file, the buffer lines can be emitted in order to the text file.
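The buffer-plus-line-array scheme might be sketched in C roughly as follows. This is an illustration under stated assumptions, not how any particular editor does it: lines are stored without their newline, an edited line is appended at the end of the buffer (leaving dead bytes behind), and when space runs out the live lines are repacked in order into a buffer whose capacity is doubled as needed. All names (`text_buf`, `tb_set_line`, `tb_repack`, ...) are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sketch only: all names are hypothetical. Line text is stored without the
 * trailing newline; lines[i] says where line i currently lives in buf. */
typedef struct { size_t off, len; } line_ref;

typedef struct {
    char     *buf;
    size_t    used, cap;        /* bytes written so far (live + dead), capacity */
    line_ref *lines;
    size_t    nlines, lines_cap;
} text_buf;

static void tb_init(text_buf *tb, size_t cap) {
    tb->cap = cap ? cap : 1;
    tb->buf = malloc(tb->cap);
    if (!tb->buf) abort();
    tb->used = 0;
    tb->lines = NULL;
    tb->nlines = tb->lines_cap = 0;
}

/* Bytes still referenced by some line (everything else is dead space). */
static size_t tb_live(const text_buf *tb) {
    size_t n = 0;
    for (size_t i = 0; i < tb->nlines; i++) n += tb->lines[i].len;
    return n;
}

/* Copy the live lines, in line order, into a fresh buffer of new_cap bytes. */
static void tb_repack(text_buf *tb, size_t new_cap) {
    assert(new_cap >= tb_live(tb) && new_cap > 0);
    char *nb = malloc(new_cap);
    if (!nb) abort();
    size_t pos = 0;
    for (size_t i = 0; i < tb->nlines; i++) {
        memcpy(nb + pos, tb->buf + tb->lines[i].off, tb->lines[i].len);
        tb->lines[i].off = pos;
        pos += tb->lines[i].len;
    }
    free(tb->buf);
    tb->buf = nb;
    tb->cap = new_cap;
    tb->used = pos;
}

/* Copy len bytes to the end of buf, repacking (and doubling capacity if
 * needed) when full. `text` must not point into tb->buf. Returns the offset. */
static size_t tb_store(text_buf *tb, const char *text, size_t len) {
    if (tb->used + len > tb->cap) {
        size_t cap = tb->cap, live = tb_live(tb);
        while (live + len > cap) cap *= 2;
        tb_repack(tb, cap);
    }
    memcpy(tb->buf + tb->used, text, len);
    tb->used += len;
    return tb->used - len;
}

static void tb_append_line(text_buf *tb, const char *text, size_t len) {
    if (tb->nlines == tb->lines_cap) {
        size_t nc = tb->lines_cap ? 2 * tb->lines_cap : 16;
        line_ref *nl = realloc(tb->lines, nc * sizeof *nl);
        if (!nl) abort();
        tb->lines = nl;
        tb->lines_cap = nc;
    }
    size_t off = tb_store(tb, text, len);
    tb->lines[tb->nlines].off = off;
    tb->lines[tb->nlines].len = len;
    tb->nlines++;
}

/* Replace line i: the new text goes at the end of the buffer and the old
 * bytes become dead space until the next repack. */
static void tb_set_line(text_buf *tb, size_t i, const char *text, size_t len) {
    assert(i < tb->nlines);
    size_t off = tb_store(tb, text, len);
    tb->lines[i].off = off;
    tb->lines[i].len = len;
}

/* Emit the lines in order, each followed by '\n'. Returns 0 on success. */
static int tb_write(const text_buf *tb, FILE *f) {
    for (size_t i = 0; i < tb->nlines; i++) {
        const line_ref *l = &tb->lines[i];
        if (fwrite(tb->buf + l->off, 1, l->len, f) != l->len || fputc('\n', f) == EOF)
            return -1;
    }
    return 0;
}

static void tb_free(text_buf *tb) {
    free(tb->buf);
    free(tb->lines);
}
```

The trade-off is memory for cheap edits: changing a line costs one append and never moves the other lines, while the repack, which is linear in the total text size, only runs when the buffer fills up.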
I'm not entirely sure how other text editors manage this; I haven't really looked into it.

- anton

Date       Subject                       #   Author
31 May 24  * Re: Unicode in strings      11  BGB
31 May 24  +* Re: Unicode in strings      6  MitchAlsup1
31 May 24  |`* Re: Unicode in strings     5  BGB
31 May 24  | +* Re: Unicode in strings    2  MitchAlsup1
31 May 24  | |`- Re: Unicode in strings   1  BGB
3 Jun 24   | `* Re: Unicode in strings    2  Lawrence D'Oliveiro
4 Jun 24   |  `- Re: Unicode in strings   1  Lawrence D'Oliveiro
3 Jun 24   +* Re: Unicode in strings      2  Lawrence D'Oliveiro
3 Jun 24   |`- Re: Unicode in strings     1  BGB
4 Jun 24   `* Re: Unicode in strings      2  Stefan Monnier
5 Jun 24    `- Re: Unicode in strings     1  Lawrence D'Oliveiro
