Re: Byte Addressability And Beyond

Liste des GroupesRevenir à c arch 
Sujet : Re: Byte Addressability And Beyond
De : anton (at) *nospam* mips.complang.tuwien.ac.at (Anton Ertl)
Groupes : comp.arch
Date : 30. May 2024, 16:35:37
Autres entêtes
Organisation : Institut fuer Computersprachen, Technische Universitaet Wien
Message-ID : <2024May30.173537@mips.complang.tuwien.ac.at>
References : 1 2 3 4 5 6 7 8 9 10 11 12 13 14
User-Agent : xrn 10.11
EricP <ThatWouldBeTelling@thevillage.com> writes:
Stefan Monnier wrote:
I've not dealt with UTF-8 or code points but that's because I've not
written software that interacts with the non 1-byte character markets.
But even something as simple as sanitizing a character string to feed
into SQL will have to.
AFAIK you can do that by treating the UTF-8 byte sequence as if it were
an ASCII byte-sequence: all the Unicode weirdness is neatly stashed in
bytes >127 which aren't used by SQL itself anyway.
        Stefan
Of course with apologies to Herr Koenig's umlauts. :-)
>
And what of all those new Asian customers your company was hoping
to get by dealing with them in their native written language???
You could always explain to the company president that
you only work in ASCII so they should just get used to it.
 
I think you misunderstand: the code written to sanitize an ASCII string to
feed into SQL will *just work* to sanitize a UTF-8 string to feed
into SQL, no matter how many funny characters and joiners and combiners
and emojis you have in there.
 
That's part of the reason why UTF-8 is so popular: you can surprisingly
often treat it as "good old ASCII".
 
 
        Stefan
>
Ok, you accept international character data, you just don't have to
check >127 characters for "drop table" etc commands.

Actually what you check for is meta-characters like ; " '.  They are
all ASCII, so as long as your code is 8-bit-clean, your SQL string
sanitizer needs to know nothing about UTF-8.

I don't think you are being paranoid enough.
I still think you have to validate or sanitize the >127 string to
ensure the code sequences only contain well formed characters.

Then run your string through a checker/normalizer before or
afterwards.  No need to complicate your SQL sanitizer by trying to do
both at the same time.  But if you want the last bit of performance by
doing both at the same time, then you certainly don't want to convert
to UTF-32 and back.

Random hack thought #1: if the string I send starts with an umlaut as
the first code point, which doesn't display because it is invalid.

I found that hard to understand.  Do you mean that the string starts
with a composing diaresis code point and is invalid because it has no
preceding basis with which to compose?  The string may fail at the
Unicode checking/normalization stage (depending on what it checks).

Then someone edits the first char to a/o/u and magically it changes
to a different character, and deposits now go to a different account.

If someone can edit the string, and that changes where deposits go to,
someone can do that even with no Unicode involved.  E.g., if someone
can change "EricP" to "Ertl".  However, my impression is that banks
use account numbers (pure ASCII) for deposits, names are used only for
validation; so if you provide the wrong name, a money transfer may
fail to go through (not sure what happens if a deposit does not go
through), but won't be to the wrong account.

Random hack thought #2: If a character has multiple combiner code points,
does changing the order create a different character or do they map to
the same display character? Or worse, maybe combiner code point order
sensitivity is character dependent, some are, some are not.
If they do display the same, then I might create two accounts that
look identical but index differently, and redirect deposits.

That's solved by normalization.

Here's a story from work I had to do a while ago: users provided data
through some tools written in Python, that data was somehow aggregated
into one csv file (maybe with cat), and there was a Python3 script I
had to run for processing the data.  Now some users provided the data
as Latin-1 and some as UTF-8, so the csv file contained a mixture of
that.  The Python3 script dutyfully reported an error on reading the
csv file as guidelines recomment.  This was the wrong thing to do in
this application, as continuing to have this mixture was harmless.

I then wrote a small program (in Gforth) that converted such mixed
files to UTF-8, and that was one of the few uses of the Gforth words
for dealing with UTF-8 that I needed (in most other cases strings are
treated just as opaque data).  The principle was to see if the next
bytes were an UTF-8 code point or ASCII; if so, just output them.  If
they were neither, the next byte is a Latin-1 character, and is
converted to UTF-8.  Fortunately, there is no overlap between the
Latin-1 characters that occured in these data and the bytes that start
a non-ASCII UTF-8 code point.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
  Mitch Alsup, <c17fcd89-f024-40e7-a594-88a85ac10d20o@googlegroups.com>

Date Sujet#  Auteur
1 May 24 * Byte Addressability And Beyond590Lawrence D'Oliveiro
1 May 24 +* Re: Byte Addressability And Beyond431John Levine
1 May 24 i+* Re: Byte Addressability And Beyond409Lawrence D'Oliveiro
1 May 24 ii+* Re: Byte Addressability And Beyond3John Levine
1 May 24 iii+- Re: Byte Addressability And Beyond1John Levine
1 May 24 iii`- Re: Byte Addressability And Beyond1Lawrence D'Oliveiro
1 May 24 ii+- Re: Byte Addressability And Beyond1Michael S
1 May 24 ii`* Re: Byte Addressability And Beyond404John Levine
2 May 24 ii +* Re: Byte Addressability And Beyond382Lawrence D'Oliveiro
2 May 24 ii i+* Re: Byte Addressability And Beyond4John Levine
2 May 24 ii ii`* Re: Byte Addressability And Beyond3Lawrence D'Oliveiro
2 May 24 ii ii `* Re: Byte Addressability And Beyond2John Levine
5 May 24 ii ii  `- Re: Byte Addressability And Beyond1Lawrence D'Oliveiro
2 May 24 ii i+* Re: Byte Addressability And Beyond367John Savard
2 May 24 ii ii+* Re: Byte Addressability And Beyond2MitchAlsup1
11 May 24 ii iii`- Re: Byte Addressability And Beyond1John Savard
4 May 24 ii ii`* Re: Byte Addressability And Beyond364Lawrence D'Oliveiro
8 May 24 ii ii `* Re: Byte Addressability And Beyond363John Savard
8 May 24 ii ii  +* Re: Byte Addressability And Beyond2Lawrence D'Oliveiro
10 May 24 ii ii  i`- Re: Byte Addressability And Beyond1David Brown
8 May 24 ii ii  `* Re: Byte Addressability And Beyond360MitchAlsup1
8 May 24 ii ii   `* Re: Byte Addressability And Beyond359John Levine
8 May 24 ii ii    +* Re: Byte Addressability And Beyond357Lawrence D'Oliveiro
9 May 24 ii ii    i`* Re: Byte Addressability And Beyond356John Levine
10 May 24 ii ii    i +* Re: Byte Addressability And Beyond354David Brown
10 May 24 ii ii    i i`* Re: Byte Addressability And Beyond353Anton Ertl
11 May 24 ii ii    i i `* Re: Byte Addressability And Beyond352David Brown
11 May 24 ii ii    i i  `* Re: Byte Addressability And Beyond351Anton Ertl
11 May 24 ii ii    i i   +* Re: Byte Addressability And Beyond158David Brown
11 May 24 ii ii    i i   i+- Re: Byte Addressability And Beyond1Anton Ertl
27 May 24 ii ii    i i   i`* Re: Byte Addressability And Beyond156Lawrence D'Oliveiro
27 May 24 ii ii    i i   i `* Re: Byte Addressability And Beyond155John Levine
27 May 24 ii ii    i i   i  `* Re: Byte Addressability And Beyond154Lawrence D'Oliveiro
27 May 24 ii ii    i i   i   `* Re: Byte Addressability And Beyond153John Levine
27 May 24 ii ii    i i   i    +* Re: Byte Addressability And Beyond149John Levine
27 May 24 ii ii    i i   i    i+- Re: Byte Addressability And Beyond1MitchAlsup1
28 May 24 ii ii    i i   i    i`* Re: Byte Addressability And Beyond147Lawrence D'Oliveiro
28 May 24 ii ii    i i   i    i +- Re: encoding conversion, Byte Addressability And Beyond1John Levine
28 May 24 ii ii    i i   i    i `* Re: Byte Addressability And Beyond145Thomas Koenig
29 May 24 ii ii    i i   i    i  +* Re: Byte Addressability And Beyond137Lawrence D'Oliveiro
29 May 24 ii ii    i i   i    i  i`* Re: Byte Addressability And Beyond136Anton Ertl
29 May 24 ii ii    i i   i    i  i +* Re: Byte Addressability And Beyond12Stefan Monnier
29 May 24 ii ii    i i   i    i  i i+* Re: Byte Addressability And Beyond10Stefan Monnier
29 May 24 ii ii    i i   i    i  i ii+* Re: Byte Addressability And Beyond3John Levine
30 May 24 ii ii    i i   i    i  i iii`* Re: Byte Addressability And Beyond2George Neuner
4 Jun 24 ii ii    i i   i    i  i iii `- Re: Byte Addressability And Beyond1George Neuner
30 May 24 ii ii    i i   i    i  i ii`* Re: Byte Addressability And Beyond6Anton Ertl
4 Jun 24 ii ii    i i   i    i  i ii +- Re: Byte Addressability And Beyond1Lawrence D'Oliveiro
4 Jun 24 ii ii    i i   i    i  i ii `* Re: Byte Addressability And Beyond4Stefan Monnier
7 Jun 24 ii ii    i i   i    i  i ii  +- Re: Byte Addressability And Beyond1Terje Mathisen
7 Jun 24 ii ii    i i   i    i  i ii  `* Re: Character non-equivalence, was Byte Addressability And Beyond2John Levine
9 Jun 24 ii ii    i i   i    i  i ii   `- Re: Character non-equivalence, was Byte Addressability And Beyond1Lawrence D'Oliveiro
30 May 24 ii ii    i i   i    i  i i`- Re: Byte Addressability And Beyond1Lawrence D'Oliveiro
30 May 24 ii ii    i i   i    i  i +* Re: Byte Addressability And Beyond117Lawrence D'Oliveiro
30 May 24 ii ii    i i   i    i  i i+* Re: architectural goals, Byte Addressability And Beyond66John Levine
30 May 24 ii ii    i i   i    i  i ii+- Re: architectural goals, Byte Addressability And Beyond1Stephen Fuld
30 May 24 ii ii    i i   i    i  i ii+* Re: architectural goals, Byte Addressability And Beyond22Anton Ertl
30 May 24 ii ii    i i   i    i  i iii`* Re: architectural goals, Byte Addressability And Beyond21Thomas Koenig
30 May 24 ii ii    i i   i    i  i iii +* Re: architectural goals, Byte Addressability And Beyond8Michael S
30 May 24 ii ii    i i   i    i  i iii i+- Re: architectural goals, Byte Addressability And Beyond1Thomas Koenig
30 May 24 ii ii    i i   i    i  i iii i+* Re: IBM architectural goals, Byte Addressability And Beyond5John Levine
30 May 24 ii ii    i i   i    i  i iii ii+* Re: IBM architectural goals, Byte Addressability And Beyond2Michael S
30 May 24 ii ii    i i   i    i  i iii iii`- Re: IBM architectural goals, Byte Addressability And Beyond1John Levine
30 May 24 ii ii    i i   i    i  i iii ii`* Re: IBM architectural goals, Byte Addressability And Beyond2Thomas Koenig
30 May 24 ii ii    i i   i    i  i iii ii `- Re: IBM architectural goals, Byte Addressability And Beyond1John Levine
30 May 24 ii ii    i i   i    i  i iii i`- Re: architectural goals, Byte Addressability And Beyond1Anton Ertl
30 May 24 ii ii    i i   i    i  i iii +* Re: architectural goals, Byte Addressability And Beyond3Anton Ertl
30 May 24 ii ii    i i   i    i  i iii i+- Re: architectural goals, Byte Addressability And Beyond1John Levine
30 May 24 ii ii    i i   i    i  i iii i`- Re: architectural goals, Byte Addressability And Beyond1Thomas Koenig
31 May 24 ii ii    i i   i    i  i iii +* Re: architectural goals, Byte Addressability And Beyond5Terje Mathisen
1 Jun 24 ii ii    i i   i    i  i iii i`* Re: architectural goals, Byte Addressability And Beyond4Thomas Koenig
1 Jun 24 ii ii    i i   i    i  i iii i `* Re: architectural goals, Byte Addressability And Beyond3Anton Ertl
2 Jun 24 ii ii    i i   i    i  i iii i  `* Re: architectural goals, Byte Addressability And Beyond2John Levine
4 Jun 24 ii ii    i i   i    i  i iii i   `- Re: architectural goals, Byte Addressability And Beyond1Stefan Monnier
4 Jun 24 ii ii    i i   i    i  i iii `* Re: architectural goals, Byte Addressability And Beyond4Lawrence D'Oliveiro
4 Jun 24 ii ii    i i   i    i  i iii  +- Re: architectural goals, Byte Addressability And Beyond1MitchAlsup1
4 Jun 24 ii ii    i i   i    i  i iii  +- Re: architectural goals, Byte Addressability And Beyond1Lynn Wheeler
4 Jun 24 ii ii    i i   i    i  i iii  `- Re: architectural goals, Byte Addressability And Beyond1Stefan Monnier
31 May 24 ii ii    i i   i    i  i ii`* Re: architectural goals, Byte Addressability And Beyond42John Savard
31 May 24 ii ii    i i   i    i  i ii `* Re: architectural goals, Byte Addressability And Beyond41John Levine
1 Jun 24 ii ii    i i   i    i  i ii  +* Re: architectural goals, Byte Addressability And Beyond31John Savard
1 Jun 24 ii ii    i i   i    i  i ii  i+* Re: architectural goals, Byte Addressability And Beyond20Thomas Koenig
2 Jun 24 ii ii    i i   i    i  i ii  ii+* Re: architectural goals, Byte Addressability And Beyond6John Savard
2 Jun 24 ii ii    i i   i    i  i ii  iii`* Re: architectural goals, Byte Addressability And Beyond5Thomas Koenig
2 Jun 24 ii ii    i i   i    i  i ii  iii +* Re: architectural goals, Byte Addressability And Beyond3John Levine
3 Jun 24 ii ii    i i   i    i  i ii  iii i`* Re: architectural goals, Byte Addressability And Beyond2OrangeFish
3 Jun 24 ii ii    i i   i    i  i ii  iii i `- Re: architectural goals, Byte Addressability And Beyond1John Levine
4 Jun 24 ii ii    i i   i    i  i ii  iii `- Re: architectural goals, Byte Addressability And Beyond1Lawrence D'Oliveiro
4 Jun 24 ii ii    i i   i    i  i ii  ii`* Re: architectural goals, Byte Addressability And Beyond13Lawrence D'Oliveiro
5 Jun 24 ii ii    i i   i    i  i ii  ii `* Re: architectural goals, Byte Addressability And Beyond12Lawrence D'Oliveiro
5 Jun 24 ii ii    i i   i    i  i ii  ii  +- Re: architectural goals, Byte Addressability And Beyond1Lawrence D'Oliveiro
6 Jun 24 ii ii    i i   i    i  i ii  ii  `* Re: architectural goals, Byte Addressability And Beyond10George Neuner
6 Jun 24 ii ii    i i   i    i  i ii  ii   +* Re: architectural goals, Byte Addressability And Beyond6John Levine
7 Jun 24 ii ii    i i   i    i  i ii  ii   i+* Re: architectural goals, Byte Addressability And Beyond4Lawrence D'Oliveiro
7 Jun 24 ii ii    i i   i    i  i ii  ii   ii`* Re: architectural goals, Byte Addressability And Beyond3Stephen Fuld
7 Jun 24 ii ii    i i   i    i  i ii  ii   ii `* Re: architectural goals, Byte Addressability And Beyond2Lawrence D'Oliveiro
7 Jun 24 ii ii    i i   i    i  i ii  ii   ii  `- Re: architectural goals, Byte Addressability And Beyond1Stephen Fuld
7 Jun 24 ii ii    i i   i    i  i ii  ii   i`- Re: architectural goals, Byte Addressability And Beyond1Terje Mathisen
6 Jun 24 ii ii    i i   i    i  i ii  ii   +- Re: architectural goals, Byte Addressability And Beyond1Lynn Wheeler
6 Jun 24 ii ii    i i   i    i  i ii  ii   +- Re: architectural goals, Byte Addressability And Beyond1OrangeFish
7 Jun 24 ii ii    i i   i    i  i ii  ii   `- Re: architectural goals, Byte Addressability And Beyond1Lawrence D'Oliveiro
2 Jun 24 ii ii    i i   i    i  i ii  i`* Re: architectural goals, Byte Addressability And Beyond10John Dallman
2 Jun 24 ii ii    i i   i    i  i ii  +- Re: architectural goals, Byte Addressability And Beyond1Michael S
2 Jun 24 ii ii    i i   i    i  i ii  +- Re: architectural goals, Byte Addressability And Beyond1John Dallman
4 Jun 24 ii ii    i i   i    i  i ii  `* Re: architectural goals, Byte Addressability And Beyond7Lawrence D'Oliveiro
30 May 24 ii ii    i i   i    i  i i+* Re: Byte Addressability And Beyond49Stephen Fuld
30 May 24 ii ii    i i   i    i  i i`- Re: Byte Addressability And Beyond1Anton Ertl
30 May 24 ii ii    i i   i    i  i +* Re: Byte Addressability And Beyond2Lawrence D'Oliveiro
30 May 24 ii ii    i i   i    i  i `* Re: Byte Addressability And Beyond4Terje Mathisen
30 May 24 ii ii    i i   i    i  `* Re: Byte Addressability And Beyond7Terje Mathisen
28 May 24 ii ii    i i   i    `* Re: Byte Addressability And Beyond3Lawrence D'Oliveiro
12 May 24 ii ii    i i   +* Re: python text, Byte Addressability And Beyond14John Levine
12 May 24 ii ii    i i   `* Re: Byte Addressability And Beyond178Thomas Koenig
27 May 24 ii ii    i `- Re: Byte Addressability And Beyond1Lawrence D'Oliveiro
8 May 24 ii ii    `- Re: Byte Addressability And Beyond1Michael S
2 May 24 ii i`* Re: Byte Addressability And Beyond10MitchAlsup1
2 May 24 ii +* Re: Byte Addressability And Beyond3Michael S
2 May 24 ii `* Re: Byte Addressability And Beyond18Anton Ertl
1 May 24 i+* Byte Order (was: Byte Addressability And Beyond)4Anton Ertl
1 May 24 i`* Re: Byte Addressability And Beyond17Stefan Monnier
1 May 24 +* Re: Byte Addressability And Beyond40MitchAlsup1
1 May 24 +* Re: Byte Addressability And Beyond15Thomas Koenig
1 May 24 +* Re: Byte Addressability And Beyond3Michael S
2 May 24 +* Re: Byte Addressability And Beyond4Lawrence D'Oliveiro
3 May 24 +* Re: Byte Addressability And Beyond75Anton Ertl
5 May 24 +* Re: Byte Addressability And Beyond20John Savard
5 May 24 `- Re: Byte Addressability And Beyond1John Savard

Haut de la page

Les messages affichés proviennent d'usenet.

NewsPortal