URIs within URIs: google.com/url?q= et al.

Subject: URIs within URIs: google.com/url?q= et al.
From: ivan (at) *nospam* siamics.netREMOVE.invalid (Ivan Shmakov)
Newsgroups: comp.misc, comp.infosystems.www.misc
Followup-To: comp.misc
Date: 20 Dec 2024, 19:42:28
Organization: Dbus-free station.
Message-ID: <7CehOkmKaRKK7ejb@violet.siamics.net>

On 2024-11-28, Mike Spencer wrote:

[Cross-posting to news:comp.infosystems.www.misc just in case,
but setting Followup-To: comp.misc still.  Feel free to disregard,
though; if anything, I'll be monitoring both groups for some
time for responses.]

 > Here's a curiosity:

 > Google also sends all of your clicks on search results back through
 > Google.  I assume y'all knew that.

 > If you search for (say):

 >   leon "the professional"

 > you get:

 > https://www.google.com/url
 > ?q=https://en.wikipedia.org/wiki/L%25C3%25A9on:_The_Professional
 > &sa=U&ved=2ahUKEwi [snip tracking hentracks/data]

 > Note that the "real" URL which Google proposes to proxy for you
 > contains non-ASCII characters:

 >   en.wikipedia.org/wiki/L%25C3%25A9on:_The_Professional

 > Wikipedia does *not* *have* a page connected to that URL!  But if you
 > click the link and send it back through Google, you reach the right
 > Wikipedia page that *does* exist:

 >   en.wikipedia.org/wiki/Leon:_The_Professional

And this page clearly states (search for "Redirected from" there)
that it was reached via an alias.  If you follow the "Article"
link from there, it'll lead you to .../L%C3%A9on:_The_Professional
instead, which is the proper URI for that Wikipedia article.

Think of it.  Suppose that Google has to return something like
http://example.com/?o=p&q=http://example.net/ as one of the
results.  Can you just put it after google.com/url?q= directly
without ambiguity?  You'd get:

http://google.com/url?q=http://example.com/?o=p&q=http://example.net/&...
                                               ^1                    ^2

Normally, the URI would start after ?q= and run until the first
occurrence of & (^1), but in this case it is actually the second
one (^2) that terminates the intended URI.  Naturally, Google
avoids the ambiguity by %-encoding the ?s and &s, like:

http://google.com/url?q=http://example.com/%3fo=p%26q=http://example.net/&...
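
A minimal Python sketch of the ambiguity described above, using only
the standard urllib.parse module.  The trailing &sa=U stands in for the
elided "&..." parameters, and safe=":/=" is an assumption chosen to
mimic the encoded URI shown above, not a claim about what Google
actually does.

```python
from urllib.parse import parse_qs, quote, urlsplit

target = "http://example.com/?o=p&q=http://example.net/"

# Naive embedding: the target's own "&" and "q=" leak into the outer query.
naive = "http://google.com/url?q=" + target + "&sa=U"
print(parse_qs(urlsplit(naive).query))

# Encoded embedding: "?" and "&" (and any "%") in the target are escaped
# first.  safe=":/=" is an assumption made for illustration only.
encoded = "http://google.com/url?q=" + quote(target, safe=":/=") + "&sa=U"
print(encoded)
print(parse_qs(urlsplit(encoded).query))
```

In the naive case the parser sees two separate q values (the target cut
off at ^1, then http://example.net/); in the encoded case it sees a
single q value equal to the original target.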

By the same token, they need to escape the %s themselves, should
the original URI contain any, so e.g. http://example.com/%d1%8a
becomes .../url?q=http://example.com/%25d1%258a&... .
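
The escaping of % itself can be sketched in Python as below.  The
second half decodes the Wikipedia link quoted at the top of the post
one layer at a time; the assumption (consistent with the explanation
above, but not verified here) is that the %25 sequences are simply the
outer layer of escaping around an ordinary %C3%A9.

```python
from urllib.parse import quote, unquote

# A URI that already contains %-escapes, as in the example above.
inner = "http://example.com/%d1%8a"
embedded = quote(inner, safe=":/")   # "%" itself becomes "%25"
print(embedded)                      # http://example.com/%25d1%258a

# The link as quoted at the top of the post, decoded layer by layer.
seen_in_link = "https://en.wikipedia.org/wiki/L%25C3%25A9on:_The_Professional"
once = unquote(seen_in_link)   # outer layer removed: .../L%C3%A9on:_The_Professional
twice = unquote(once)          # inner layer removed: a literal e-acute in the title
print(once)
print(twice)
```

Only the first decoding step belongs to the /url?q= wrapper; the result
of that step is already the proper URI of the article.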

Of course, Google didn't invent any of this: unless I'm mistaken,
that's how HTML <form method="get" />s have worked from the get-go.
And you /do/ need something like Hello%3f%20%20Anybody%20home%3f
to put after /guestbook?comment=.
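
For comparison, here is a sketch of how Python's standard library
builds such a query string.  The field name and path come from the
example above; the dictionary itself is just an illustration.

```python
from urllib.parse import quote, urlencode

fields = {"comment": "Hello?  Anybody home?"}

# Default form-style encoding: spaces become "+".
print("/guestbook?" + urlencode(fields))
# /guestbook?comment=Hello%3F++Anybody+home%3F

# With quote_via=quote, spaces become %20, as in the example above.
print("/guestbook?" + urlencode(fields, quote_via=quote))
# /guestbook?comment=Hello%3F%20%20Anybody%20home%3F
```

The hex digits come out in upper case here; %3F and %3f denote the same
character.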

FWIW, I tend to use the following Perl bits for %-encoding and
decoding, respectively:

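# %-encode everything but ASCII alphanumerics and / _ . - (acts on $_)
s {[^0-9A-Za-z/_.-]}{${ \sprintf ("%%%02x", ord ($&)); }}g;
# decode: replace each %XX with the character it stands for
s {%([0-9a-fA-F]{2})}{${ \chr (hex ($1)); }}g;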
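
A rough Python rendering of the same two substitutions, for readers who
don't speak Perl.  The function names are made up for this sketch, and
it deliberately works on bytes (an assumption: the input is already
encoded, e.g. as UTF-8), so that every escape is exactly two hex digits.

```python
import re

def pct_encode(data: bytes) -> bytes:
    """%-encode every byte except ASCII alphanumerics and / _ . -"""
    return re.sub(rb"[^0-9A-Za-z/_.-]",
                  lambda m: b"%%%02x" % m.group()[0],
                  data)

def pct_decode(data: bytes) -> bytes:
    """Replace each %XX escape with the byte it denotes."""
    return re.sub(rb"%([0-9a-fA-F]{2})",
                  lambda m: bytes.fromhex(m.group(1).decode("ascii")),
                  data)

print(pct_encode(b"Hello?  Anybody home?"))
# b'Hello%3f%20%20Anybody%20home%3f'
print(pct_decode(b"Hello%3f%20%20Anybody%20home%3f"))
# b'Hello?  Anybody home?'
```

Note that this character class also escapes ":" and "=", so the output
is stricter than the Google-style URI shown earlier, which leaves both
alone.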

 > AFAICT, when spidering the net, Google finds the page that *does*
 > exist, modifies it according to (opaque, unknown) rules of orthography
 > and delivers that to you.  When you send that link back through
 > Google, Google silently reverts the imposed orthographic "correction"
 > so that the link goes to an existing page.

 > Isn't that weird?

There's this bit near the end of the .../Leon:_The_Professional
page (lines split for readability):

<script type="application/ld+json">{
"@context":"https:\/\/schema.org",
"@type":"Article",
"name":"L\u00e9on: The Professional",
"url":"https:\/\/en.wikipedia.org\/wiki\/L%C3%A9on:_The_Professional",
[...]

I'm pretty certain that Google /does/ parse JSON-LD like the
above, so I can only presume that when it finds a Web document
that points to a different "url": in this way, it (sometimes?)
uses the latter in preference to the original URI.
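
To illustrate how a crawler /could/ pick up that "url": value, here is
a small Python sketch using only the standard library.  It says nothing
about how Google actually does it; the JsonLdExtractor class is
hypothetical, and the fragment quoted above is closed early with "}" so
that it parses, with the fields elided by "[...]" left out.

```python
import json
from html.parser import HTMLParser

class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buf = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buf = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.items.append(json.loads("".join(self._buf)))
            self._in_jsonld = False

# The quoted fragment, truncated and closed by hand (an assumption).
page = r'''<script type="application/ld+json">{
"@context":"https:\/\/schema.org",
"@type":"Article",
"name":"L\u00e9on: The Professional",
"url":"https:\/\/en.wikipedia.org\/wiki\/L%C3%A9on:_The_Professional"
}</script>'''

parser = JsonLdExtractor()
parser.feed(page)
parser.close()
article = parser.items[0]
print(article["name"])
print(article["url"])
```

The \/ and \u00e9 sequences are ordinary JSON escapes, so the parsed
"url" is the %C3%A9 form of the address even though the page was
fetched under the unaccented alias.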

I've been thinking of adopting JSON-LD for my own Web pages
(http://am-1.org/~ivan/ , http://users.am-1.org/~ivan/ , etc.),
but so far have only used the (arguably more readable)
http://microformats.org/wiki/microformats2 , which I hope search
engines will at some point add support for.  Consider, e.g.:

http://pin13.net/mf2/?url=http://am-1.org/~ivan/qinp-2024/112.l-system.en.xhtml

Note that ?url= above needs the exact same %-treatment as does
Google's /url?q=.  Naturally, the HTML form at http://pin13.net/mf2/
will do it for you.  (Or, rather: instruct your Web user agent
to do so.)
