Subject: Re: Newsgroups files
From: sysop (at) *nospam* endofthelinebbs.com (Nigel Reed)
Newsgroups: news.admin.peering
Date: 04 Mar 2025, 00:13:34
Organization: End Of The Line BBS
Message-ID: <20250303171334.785ee79e@wibble.sysadmininc.com>
References: 1 2 3 4
User-Agent: Claws Mail 4.3.1git13 (GTK 3.24.41; x86_64-pc-linux-gnu)
On Mon, 3 Mar 2025 22:40:57 +0100
Julien ÉLIE <iulius@nom-de-mon-site.com.invalid> wrote:
> Hi Nigel,
>
>> I'm probably just going to get a script to pull the most popular of
>> the descriptions for the list and ignore the moderated part unless
>> the group has moderated in its name or a majority think it's
>> moderated when I do a manual check on those.
>
> I would suggest just using the latest known descriptions instead
> (from checkgroups when they are sent).
>
> I maintain the list encoded in UTF-8 (the standard according to RFCs)
> here:
> https://raw.githubusercontent.com/Julien-Elie/usenet-hierarchies/refs/heads/main/website/data/newsgroups.utf8
>
> Also, FWIW, the same list in pure ASCII:
> https://raw.githubusercontent.com/Julien-Elie/usenet-hierarchies/refs/heads/main/website/data/newsgroups.ascii
>
> The usual master file for these descriptions unfortunately has mixed
> charsets (windows-1252 for some descriptions, UTF-8 for others,
> ISO-8859-xx variants, etc.):
> https://ftp.isc.org/pub/usenet/CONFIG/newsgroups
>
> That's why I generate the first two lists above :)
>
> Feel free to use them!
Yes, we've sort of had this discussion before about encoding. This one
is more about the inconsistency of the labeling of the groups.
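For what it's worth, the "most popular description" idea quoted above could be sketched roughly like this. It is only an illustration, not the script I'll actually run: it assumes each source is the text of a newsgroups file with one "group<whitespace>description" entry per line and a trailing " (Moderated)" marker on moderated groups, and the names parse_newsgroups and merge_descriptions are made up for the example.

```python
from collections import Counter, defaultdict

MOD_SUFFIX = " (Moderated)"  # conventional marker at the end of a description


def parse_newsgroups(text):
    """Yield (group, description, moderated) for each entry in the text."""
    for line in text.splitlines():
        parts = line.strip().split(None, 1)
        if not parts or parts[0].startswith("#"):
            continue  # skip blank lines and comments
        group = parts[0]
        desc = parts[1].strip() if len(parts) > 1 else ""
        moderated = desc.endswith(MOD_SUFFIX)
        if moderated:
            desc = desc[: -len(MOD_SUFFIX)].rstrip()
        yield group, desc, moderated


def merge_descriptions(sources):
    """Pick the most common description per group across all sources.

    A group is flagged as moderated if its name contains "moderated"
    or if a strict majority of the entries seen for it carry the marker.
    """
    descs = defaultdict(Counter)  # group -> description -> count
    mod_votes = Counter()         # group -> entries flagged moderated
    seen = Counter()              # group -> total entries seen
    for text in sources:
        for group, desc, moderated in parse_newsgroups(text):
            seen[group] += 1
            mod_votes[group] += int(moderated)
            counter = descs[group]  # creates the entry even if desc is empty
            if desc:
                counter[desc] += 1
    merged = {}
    for group, counter in descs.items():
        best = counter.most_common(1)[0][0] if counter else ""
        moderated = ("moderated" in group.lower()
                     or mod_votes[group] * 2 > seen[group])
        merged[group] = best + (MOD_SUFFIX if moderated else "")
    return merged
```

Ties between equally popular descriptions go to whichever was seen first, so the order of the sources matters. A group listed twice in one source also gets two votes, which may or may not be what you want.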
In the newsgroups list above, pretty much every group whose description
contains letters outside plain A-Z is garbled.
Probably because it's ISO-8859 while I'm using UTF-8. The cn.* groups
are definitely garbled.
I'll just do my best to make a valid UTF-8 file for my server.
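A rough sketch of how that conversion might go, line by line: try strict UTF-8 first and only fall back to a legacy charset when that fails. The fallback order (cp1252, then latin-1) and the per-hierarchy override table are assumptions of mine, not something established in this thread, and decode_line and to_utf8 are hypothetical names.

```python
def decode_line(raw, fallbacks=("cp1252", "latin-1")):
    """Decode one line of bytes: strict UTF-8 first, then each fallback."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass
    for enc in fallbacks:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Last resort: keep the line, marking undecodable bytes with U+FFFD.
    return raw.decode("utf-8", errors="replace")


def to_utf8(data, overrides=None):
    """Re-encode a mixed-charset newsgroups file (bytes) as UTF-8 bytes.

    overrides maps a group-name prefix to a charset that is tried before
    the generic fallbacks, e.g. {"cn.": "gb18030"}. That mapping is a
    guess and needs checking against the actual data.
    """
    overrides = overrides or {}
    lines = []
    for raw in data.splitlines():
        # Group names are ASCII, so the first field is safe to decode.
        name = raw.split(None, 1)[0].decode("ascii", "replace") if raw.strip() else ""
        extra = tuple(enc for prefix, enc in overrides.items()
                      if name.startswith(prefix))
        lines.append(decode_line(raw, extra + ("cp1252", "latin-1")))
    return "\n".join(lines).encode("utf-8") + b"\n"
```

This is a heuristic at best. Bytes that are not valid UTF-8 get read as windows-1252, which is right for many Western European descriptions but cannot tell the ISO-8859-xx variants apart, so anything outside that range still needs a manual look or an explicit override.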
-- 
End Of The Line BBS - Plano, TX
telnet endofthelinebbs.com 23