Subject : Re: Newsgroups files
From : sysop (at) *nospam* endofthelinebbs.com (Nigel Reed)
Newsgroups : news.admin.peering
Date : 04. Mar 2025, 00:33:59
Organization : End Of The Line BBS
Message-ID : <20250303173359.6c3af31a@wibble.sysadmininc.com>
References : 1 2
User-Agent : Claws Mail 4.3.1git13 (GTK 3.24.41; x86_64-pc-linux-gnu)
On Mon, 3 Mar 2025 22:55:15 +0100
Julien ÉLIE <iulius@nom-de-mon-site.com.invalid> wrote:
> Hi Nigel,
>
>> One sample group from 16 peers. The first thing: so many different
>> encodings. I've got ASCII, UTF-8, ISO-8859-1, WINDOWS-1252, even one
>> identifying as GB18030.
>> Next, 8 servers agree on one description, 3 on another, 2 more on
>> yet another, and finally 3 think the group is moderated.
>> How did things get in such a mixed-up state?
>
> Because there originally wasn't any standard for the encoding of
> control articles. Most of them did not declare anything (the usual
> encoding locally used by the sender was assumed, like gb18030 for
> cn.*, koi8-u for ukr.* [my sympathy to them!], big5 for tw.*,
> iso-8859-15 for fr.*, cp1252 for most of the others, etc.).
> Only "recently" did a new version of the standard recommend the use
> of UTF-8.
>
> That's why you end up seeing mixed and incoherent encodings in
> existing news servers. Not all of them run a version which implements
> the new interoperable state of the art (UTF-8) to parse control
> articles. And if the descriptions pre-date the receipt of new control
> articles, not all news administrators have manually homogenized the
> descriptions to UTF-8. (No blame in my sentence, just a fact.)
>
>> What is even worse when trying to automate this is when the
>> majority of servers have the wrong description or it's half and
>> half.
>
> Just use
> https://raw.githubusercontent.com/Julien-Elie/usenet-hierarchies/refs/heads/main/website/data/newsgroups.utf8
> :)
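A minimal sketch of normalizing the peers' answers before comparing them, assuming each peer's description for one group arrives as raw bytes. It tries UTF-8 first (what the newer standard recommends), then falls back to the per-hierarchy charsets listed above (gb18030 for cn.*, koi8-u for ukr.*, big5 for tw.*, iso-8859-15 for fr.*, cp1252 otherwise), collapses whitespace, and counts identical results. The names `FALLBACK_CHARSETS`, `decode_description` and `majority_description` are hypothetical, and the charset guess is only a heuristic.

```python
from collections import Counter

# Hypothetical per-hierarchy defaults, taken from the charsets listed above.
# Groups outside these prefixes fall back to cp1252.
FALLBACK_CHARSETS = {
    "cn.": "gb18030",
    "ukr.": "koi8-u",
    "tw.": "big5",
    "fr.": "iso-8859-15",
}


def decode_description(group: str, raw: bytes) -> str:
    """Decode one raw description: UTF-8 first, then the hierarchy's legacy charset."""
    try:
        return raw.decode("utf-8")  # plain ASCII also succeeds here
    except UnicodeDecodeError:
        pass
    charset = "cp1252"
    for prefix, cs in FALLBACK_CHARSETS.items():
        if group.startswith(prefix):
            charset = cs
            break
    # errors="replace" so a wrong guess yields U+FFFD instead of an exception
    return raw.decode(charset, errors="replace")


def majority_description(group: str, answers: list[bytes]) -> tuple[str, int]:
    """Return the most common normalized description and how many peers sent it."""
    counts = Counter(
        " ".join(decode_description(group, a).split()) for a in answers
    )
    description, votes = counts.most_common(1)[0]
    return description, votes
```

A vote like this only helps when most peers carry the right text; it cannot resolve the "majority is wrong" or half-and-half cases, which is where a curated list such as the newsgroups.utf8 file is the better reference.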
That's a good start, but I still have 36,519 groups in my active file
that aren't in your list.
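One way to get such a count is a set difference on group names. This sketch assumes INN-style files where every line begins with the group name (active: name, high mark, low mark, flag separated by spaces; newsgroups: name, a tab, then the description); the function names are hypothetical, and the functions take lines rather than paths so any open file object can be passed in.

```python
from collections.abc import Iterable


def group_names(lines: Iterable[str]) -> set[str]:
    """Collect the first whitespace-separated field (the group name) of each line."""
    names = set()
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            names.add(fields[0])
    return names


def missing_from_list(active_lines: Iterable[str],
                      newsgroups_lines: Iterable[str]) -> list[str]:
    """Sorted names present in the active file but absent from the newsgroups list."""
    return sorted(group_names(active_lines) - group_names(newsgroups_lines))
```

Used with files opened via `open(path, encoding="utf-8", errors="replace")` (paths are whatever your installation uses), `len()` of the result gives the number of unlisted groups.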
--
End Of The Line BBS - Plano, TX
telnet endofthelinebbs.com 23