Archiving Usenet 2003-2025

Subject: Archiving Usenet 2003-2025
From: jsevans (at) *nospam* sdf.org (Jason Evans)
Newsgroups: news.software.misc
Date: 01 Jun 2025, 14:34:55
Message-ID: <slrn103olo7.gbt.jsevans@jbsd1.home.local>
User-Agent: slrn/1.0.3 (OpenBSD)

A few months ago, I posted about my Usenet archiver application. Since then,
I have completely retooled it and rewritten it in Python, and it is now a very
capable tool.

In January, I began a project that I had started many times before but never
finished: archiving Usenet newsgroups from 2003 through the current year. To do
this, I am using a paid Usenet provider, downloading every newsgroup in mbox
format, and compressing each one with gzip.

You might be wondering why, after working on this since January, I'm still not
done. That's because paid Usenet providers prioritize binary groups over text
groups. I am not archiving binary groups, but when one slips under my radar, it
stands out immediately: far more of it gets downloaded in the same amount of
time than of any text group.
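The archiver's own code isn't shown here. As a minimal sketch of the "mbox,
then gzip" step, assuming each article has already been fetched from the
provider as raw bytes (fetching is left out, and the name `archive_group` is
hypothetical), Python's standard `mailbox` and `gzip` modules are enough:

```python
import gzip
import mailbox
import os
import shutil


def archive_group(articles, mbox_path):
    """Write raw articles (bytes) to an mbox file, then gzip it.

    `articles` is any iterable of complete articles (headers + body) as
    bytes. Returns the path of the compressed file, mbox_path + ".gz".
    """
    box = mailbox.mbox(mbox_path, create=True)
    box.lock()
    try:
        for raw in articles:
            # mailbox.mbox prepends a "From " separator line and escapes
            # body lines that start with "From " as ">From ".
            box.add(raw)
        box.flush()
    finally:
        box.unlock()
        box.close()

    # Compress the finished mbox and keep only the .gz copy.
    gz_path = mbox_path + ".gz"
    with open(mbox_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(mbox_path)
    return gz_path
```

Letting `mailbox.mbox` write the separator lines and the ">From " escaping
avoids the part of the mbox format that is easiest to get wrong by hand. The
trade-off of this two-step approach is that one uncompressed group sits on disk
until it is compressed; streaming straight into `gzip.open()` would avoid that
but means formatting the mbox yourself.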

Anyway, since January, I have downloaded approximately 2 TB of newsgroups. The
list of groups downloaded so far is in my GitHub repository, linked below. If
any well-known groups are missing, please let me know, and I will add them to
my queue.

You might be wondering where my list of newsgroups comes from. I began with the
semi-official list from isc.org
(https://ftp.isc.org/usenet/CONFIG/newsgroups.gz), omitting only test groups
(e.g., misc.test), binary groups, and some alt groups that deal with
pedophilia. Next, I got the list of newsgroups carried by eternal-september and
started a second queue containing every group that is not in the ISC list.
There are a lot of them, and I'm hoping to have them done in the coming weeks.
I am downloading approximately 95 newsgroups in parallel, just under my Usenet
provider's limit of 100 simultaneous downloads.
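Here is one way the two queues described above could be built. This is a
sketch, not the archiver's actual code. It assumes the ISC file's
one-group-per-line layout (group name, whitespace, description) and a similar
plain list for eternal-september. The filters only guess at the exclusions
mentioned: a name component equal to `test`, a component `binaries` or
`binary`, plus a hand-maintained `blocklist` (a hypothetical parameter left for
the reader to fill).

```python
import gzip


def read_group_list(path):
    """Return the lines of a plain or gzip-compressed group list file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8", errors="replace") as f:
        return f.readlines()


def group_names(lines):
    """Yield the group name (first whitespace-separated field) of each line."""
    for line in lines:
        fields = line.split(None, 1)
        if fields:  # skip blank lines
            yield fields[0]


def is_wanted(group, blocklist):
    """Heuristic filter: drop test groups, binary groups and blocklisted ones."""
    parts = group.lower().split(".")
    if "test" in parts:  # e.g. misc.test, alt.test.foo
        return False
    if "binaries" in parts or "binary" in parts:  # e.g. alt.binaries.*
        return False
    return group not in blocklist


def build_queues(isc_lines, es_lines, blocklist=frozenset()):
    """Return (isc_queue, extra_queue).

    isc_queue: wanted groups from the ISC list, in file order.
    extra_queue: wanted eternal-september groups that are absent from the
    ISC list (compared against the unfiltered ISC list, so groups excluded
    there are not re-added here).
    """
    isc_all = list(group_names(isc_lines))
    isc_seen = set(isc_all)
    isc_queue = [g for g in isc_all if is_wanted(g, blocklist)]
    extra_queue = sorted(
        {g for g in group_names(es_lines)
         if g not in isc_seen and is_wanted(g, blocklist)}
    )
    return isc_queue, extra_queue
```

For example, `build_queues(read_group_list("newsgroups.gz"),
read_group_list("es-groups.txt"))`, where both filenames are placeholders for
locally downloaded copies of the two lists.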
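Keeping about 95 downloads in flight under a 100-download cap maps naturally
onto a bounded worker pool. The sketch below uses
`concurrent.futures.ThreadPoolExecutor`. `fetch_group` is a hypothetical
per-group downloader, and the sketch assumes each call holds exactly one
provider connection.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

PROVIDER_LIMIT = 100  # simultaneous downloads allowed by the provider
MAX_PARALLEL = 95     # stay a little under the limit


def download_all(groups, fetch_group, max_parallel=MAX_PARALLEL):
    """Run fetch_group(name) for every group, at most max_parallel at once.

    Returns (done, failed): a list of finished groups and a dict mapping
    each failed group to the exception it raised.
    """
    if max_parallel > PROVIDER_LIMIT:
        raise ValueError("max_parallel exceeds the provider's download limit")
    done, failed = [], {}
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = {pool.submit(fetch_group, g): g for g in groups}
        for fut in as_completed(futures):
            group = futures[fut]
            try:
                fut.result()
            except Exception as exc:  # keep going; retry failed groups later
                failed[group] = exc
            else:
                done.append(group)
    return done, failed
```

Threads are a reasonable fit because the work is network-bound rather than
CPU-bound. Leaving a few slots free below the cap also leaves some headroom,
for instance for a reconnect after a dropped connection.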

I'll post another update when I begin uploading the archives to the Internet
Archive.

https://github.com/tgeek77/usenet_archiver/blob/main/fetch_log.txt

Date        Subject                                  #  Author
 1 Jun 25   * Archiving Usenet 2003-2025             6  Jason Evans
13 Jun 25   `* Re: Archiving Usenet 2003-2025        5  Billy G. (go-while)
14 Jun 25    +- Re: Archiving Usenet 2003-2025       1  Urs Janßen
16 Jun 25    +* Re: Archiving Usenet 2003-2025       2  Colin Macleod
23 Jun 25    |`- Re: Archiving Usenet 2003-2025      1  Colin Macleod
23 Jun 25    `- Re: Archiving Usenet 2003-2025       1  Jason Evans
