Re: Unicode in strings

Subject : Re: Unicode in strings
From : monnier (at) *nospam* iro.umontreal.ca (Stefan Monnier)
Newsgroups : comp.arch
Date : 04. Jun 2024, 21:03:39
Organisation : A noiseless patient Spider
Message-ID : <jwvcyowmr0r.fsf-monnier+comp.arch@gnu.org>
User-Agent : Gnus/5.13 (Gnus v5.13)
For text editors, this is one of the few cases where it makes sense to use
32- or 64-bit characters (say, combining the 'character' with some additional
metadata such as formatting).

Even 64 bits is very tight for encoding all the information in an emoji.

Though, one thing that makes sense for text editors is to keep only the
"currently being edited" lines fully unpacked, while the others
remain in a more compact form (such as UTF-8) and are unpacked as they
come into view (say, treating the editor window as a 32-entry modulo cache
or similar).
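
To make the "32-entry modulo cache" idea concrete, here is a minimal sketch in Python. It is not taken from any real editor: the class name `LineCache`, the slot count, and the choice of a list of code points as the "unpacked" form are all assumptions made for illustration. Lines live as UTF-8 bytes, and a line is unpacked into slot `line_no % 32` when it is requested; whatever line occupied that slot before is packed back to UTF-8 first.

```python
CACHE_SLOTS = 32  # assumed size, matching the "32-entry" figure in the post


class LineCache:
    """Hypothetical sketch: lines are stored as UTF-8, unpacked on demand."""

    def __init__(self, packed_lines):
        self.packed = list(packed_lines)    # compact form: one bytes object per line
        self.slots = [None] * CACHE_SLOTS   # each slot holds (line_no, code points)

    def _write_back(self, slot):
        """Re-encode the line held in a slot into its compact UTF-8 form."""
        entry = self.slots[slot]
        if entry is not None:
            line_no, cps = entry
            self.packed[line_no] = "".join(map(chr, cps)).encode("utf-8")

    def unpacked(self, line_no):
        """Return the editable list of code points for a line."""
        slot = line_no % CACHE_SLOTS
        entry = self.slots[slot]
        if entry is None or entry[0] != line_no:
            self._write_back(slot)          # evict the previous occupant
            cps = [ord(c) for c in self.packed[line_no].decode("utf-8")]
            entry = (line_no, cps)
            self.slots[slot] = entry
        return entry[1]

    def flush(self):
        """Pack every cached line back to UTF-8."""
        for slot in range(CACHE_SLOTS):
            self._write_back(slot)
```

With a direct-mapped cache like this, any 32 consecutive lines (one window's worth) occupy distinct slots, so scrolling by one line evicts exactly one entry.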

You need to care about "character boundaries" rarely enough that
such encoding/decoding is probably not worthwhile (especially if you
consider the case of multi-MB lines).

It's easy enough to move through UTF-8 itself.
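
As an illustration of why stepping through UTF-8 is easy: continuation bytes always have the bit pattern 10xxxxxx, so finding the start of the next or previous code point only requires skipping those bytes. The sketch below is in Python, with hypothetical helper names, and assumes the buffer holds valid UTF-8.

```python
def is_continuation(byte):
    """UTF-8 continuation bytes have the bit pattern 10xxxxxx."""
    return (byte & 0xC0) == 0x80


def next_boundary(buf, pos):
    """Byte index where the code point after the one at pos starts."""
    pos += 1
    while pos < len(buf) and is_continuation(buf[pos]):
        pos += 1
    return pos


def prev_boundary(buf, pos):
    """Byte index where the code point before pos starts (0 at the start)."""
    pos = max(pos - 1, 0)
    while pos > 0 and is_continuation(buf[pos]):
        pos -= 1
    return pos
```

Note that this moves by code point, not by user-perceived character: an emoji sequence or a base letter with combining marks spans several code points.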

I'm not entirely sure how other text editors manage things here; I haven't
really looked into it.

There are several different options.
Emacs uses a gap buffer, a fairly primitive approach that in
theory has poor worst-case behavior but works surprisingly well in
practice (especially given the speed at which current CPUs can copy/move
large chunks of memory).
Others use structures like ropes.

https://coredumped.dev/2023/08/09/text-showdown-gap-buffers-vs-ropes/
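
For readers unfamiliar with the structure, here is a minimal gap buffer sketch in Python. It is not Emacs's actual implementation; the class name, the method names, and the default gap size are assumptions made for illustration. The text sits in one array with an unused "gap" at the edit point: inserting at the gap is cheap, and moving the edit point costs one block copy proportional to the distance moved, which is where the poor worst case comes from.

```python
class GapBuffer:
    """Hypothetical sketch: text in one array with a movable gap."""

    def __init__(self, text=b"", gap_size=64):
        self.buf = bytearray(text) + bytearray(gap_size)
        self.gap_start = len(text)      # first unused byte
        self.gap_end = len(self.buf)    # first used byte after the gap

    def __len__(self):
        return len(self.buf) - (self.gap_end - self.gap_start)

    def move_gap(self, pos):
        """Move the gap to logical position pos with a single block copy."""
        if pos < self.gap_start:
            n = self.gap_start - pos
            self.buf[self.gap_end - n:self.gap_end] = self.buf[pos:self.gap_start]
            self.gap_start -= n
            self.gap_end -= n
        elif pos > self.gap_start:
            n = pos - self.gap_start
            self.buf[self.gap_start:self.gap_start + n] = self.buf[self.gap_end:self.gap_end + n]
            self.gap_start += n
            self.gap_end += n

    def _grow(self, needed):
        """Enlarge the gap by at least the current buffer size."""
        extra = max(needed, len(self.buf))
        self.buf[self.gap_end:self.gap_end] = bytearray(extra)
        self.gap_end += extra

    def insert(self, pos, data):
        self.move_gap(pos)
        if len(data) > self.gap_end - self.gap_start:
            self._grow(len(data))
        self.buf[self.gap_start:self.gap_start + len(data)] = data
        self.gap_start += len(data)

    def delete(self, pos, count):
        """Delete count bytes starting at pos by letting the gap absorb them."""
        self.move_gap(pos)
        count = min(count, len(self.buf) - self.gap_end)
        self.gap_end += count

    def contents(self):
        return bytes(self.buf[:self.gap_start] + self.buf[self.gap_end:])
```

Consecutive edits near the same position never move the gap, which is the common case in interactive editing; only a jump to a distant position pays for the copy.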


        Stefan
