Subject: Re: Post DB
From: noreply (at) *nospam* mixmin.net (D)
Newsgroups: news.software.readers
Date: 10 Sep 2024, 16:35:22
Organization: dizum.com - The Internet Problem Provider
Message-ID: <20240910.163522.c0b2fc24@mixmin.net>
References: 1
On 9 Sep 2024 10:44:36 GMT,
ram@zedat.fu-berlin.de (Stefan Ram) wrote:
> I'm not 100% sure I'm barking up the right tree (newsgroup) here,
> but whatever.
>
> So, yesterday I was chewing the fat about how to whip up a database
> for posts retrieved from newsservers.
>
> I'm picturing some program that pulls newsgroups from newsservers
> and dumps them into a database.
>
> In my mind's eye, a post looks something like this, give or take:
>
> Path: A
> Message-ID: B
> Body: C
>
> . But if you snag the same post from a different server, it might
> look like this:
>
> Message-ID: B
> Path: D
> Body: C
>
> . At first blush, you'd end up with the same body stored multiple
> times in the database. Talk about a waste of space!
>
> To trim the fat, we could rejigger these posts so all the variable
> stuff is up front:
>
> Path: A
> Message-ID: B
> Body: C
>
> and
>
> Path: D
> Message-ID: B
> Body: C
>
> Now the tail end of both posts is identical, so we can toss that
> in a separate table at position 0.
>
> The posts themselves would then just contain the different parts
> and a pointer to the shared bit that's only stored once:
>
> Path: A
> Rest: 0
>
> Path: D
> Rest: 0
>
> 0:
> Message-ID: B
> Body: C
>
> . This way, you could store the same post from multiple newsservers
> without eating up your hard drive space like it's In-N-Out fries.

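A minimal Python sketch of the quoted scheme, using only the standard-library sqlite3 and hashlib modules. It assumes unfolded headers, an empty line between headers and body, and that Path and Xref are the only server-specific headers (the quoted example names only Path; Xref is added here as an assumption because each server writes its own). The table and function names (rest, copy, split_article, store) are hypothetical. Instead of a fixed position 0, the shared rest is keyed by its SHA-256 digest, so identical copies collapse to one row whichever server delivered them first.

```python
import hashlib
import sqlite3

# Assumption: the headers each server rewrites. The quoted example only
# names Path; Xref is included because every server writes its own.
VARIABLE_HEADERS = ("path", "xref")

SCHEMA = """
CREATE TABLE IF NOT EXISTS rest (
    id      INTEGER PRIMARY KEY,
    digest  TEXT UNIQUE NOT NULL,   -- SHA-256 of the shared part
    content TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS copy (
    server  TEXT NOT NULL,
    path    TEXT,
    xref    TEXT,
    rest_id INTEGER NOT NULL REFERENCES rest(id)
);
"""


def open_db(filename: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the database and make sure both tables exist."""
    db = sqlite3.connect(filename)
    db.executescript(SCHEMA)
    return db


def split_article(raw: str) -> tuple[dict[str, str], str]:
    """Split an article into its variable headers and the shared rest.

    Assumes unfolded headers and an empty line between headers and body.
    """
    head, _, body = raw.replace("\r\n", "\n").partition("\n\n")
    variable: dict[str, str] = {}
    shared: list[str] = []
    for line in head.split("\n"):
        name, _, value = line.partition(":")
        key = name.strip().lower()
        if key in VARIABLE_HEADERS:
            variable[key] = value.strip()
        else:
            shared.append(line)  # the other headers keep their original order
    return variable, "\n".join(shared) + "\n\n" + body


def store(db: sqlite3.Connection, server: str, raw: str) -> int:
    """Store one copy of an article; return the id of its shared rest."""
    variable, rest = split_article(raw)
    digest = hashlib.sha256(rest.encode("utf-8")).hexdigest()
    # The shared part is inserted only if no identical rest exists yet.
    db.execute("INSERT OR IGNORE INTO rest (digest, content) VALUES (?, ?)",
               (digest, rest))
    (rest_id,) = db.execute("SELECT id FROM rest WHERE digest = ?",
                            (digest,)).fetchone()
    db.execute("INSERT INTO copy (server, path, xref, rest_id) VALUES (?, ?, ?, ?)",
               (server, variable.get("path"), variable.get("xref"), rest_id))
    db.commit()
    return rest_id
```

Storing the two example posts, "Path: A\nMessage-ID: <B>\n\nC\n" and "Message-ID: <B>\nPath: D\n\nC\n", returns the same rest id twice and leaves one row in rest and two rows in copy. Because the remaining headers keep their order, a server that reorders the other headers would produce a second rest row; sorting them would deduplicate more aggressively but would no longer reproduce the article as received.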
Twelve server samples of your article's headers show remarkable consistency:
1 path, 2 from, 3 newsgroups, 4 subject, 5 date, 6 organization, 7 lines,
8 expires, 9 message-id, 10 mime-version, 11 content-type,
12 content-transfer-encoding, 13 x-trace, 14 cancel-lock, 15 x-copyright,
16 x-no-archive, 17 archive, 18 x-no-archive-readme, 19 x-no-html,
20 content-language, 21 xref (the first sample shows the full headers; the
rest are snipped for brevity):

news:news.alphared.net
Path: alphared!3.eu.feeder.erje.net!feeder.erje.net!fu-berlin.de!uni-berlin.de!not-for-mail
From: ram@zedat.fu-berlin.de (Stefan Ram)
Newsgroups: news.software.readers
Subject: Post DB
Date: 9 Sep 2024 10:44:36 GMT
Organization: Stefan Ram
Lines: 63
Expires: 1 Jul 2025 11:59:58 GMT
Message-ID: <database-20240909114248@ram.dialup.fu-berlin.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Trace: news.uni-berlin.de b02EqmO53gQ7jbmmMP85UgkDHjtKodMUvyU6kuS12ifm6t
Cancel-Lock: sha1:BiPE/2gBrIau46RUtTtIhXqrOSQ= sha256:ciJQo1bvZST9PNWeu73aWJv3mxLLHhWyjI7ehRRUSH4=
X-Copyright: (C) Copyright 2024 Stefan Ram. All rights reserved.
 Distribution through any means other than regular usenet
 channels is forbidden. It is forbidden to publish this
 article in the Web, to change URIs of this article into links,
 and to transfer the body without this notice, but quotations
 of parts in other Usenet posts are allowed.
X-No-Archive: Yes
Archive: no
X-No-Archive-Readme: "X-No-Archive" is set, because this prevents some
 services to mirror the article in the web. But the article may
 be kept on a Usenet archive server with only NNTP access.
X-No-Html: yes
Content-Language: en-US
Xref: alphared news.software.readers:11775
>
> I'm not 100% sure I'm barking up the right tree (newsgroup) here,

news:news.alt119.net
Path: news.alt119.net!peer.alt119.net!news.samoylyk.net!fu-berlin.de!uni-berlin.de!not-for-mail
From: ram@zedat.fu-berlin.de (Stefan Ram)
Newsgroups: news.software.readers
Subject: Post DB
Date: 9 Sep 2024 10:44:36 GMT
Organization: Stefan Ram
Lines: 63
Expires: 1 Jul 2025 11:59:58 GMT
Message-ID: <database-20240909114248@ram.dialup.fu-berlin.de>
snip
>
> I'm not 100% sure I'm barking up the right tree (newsgroup) here,

news:news.blueworldhosting.com
Path: nnrp.usenet.blueworldhosting.com!!spool1.usenet.blueworldhosting.com!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!fu-berlin.de!uni-berlin.de!not-for-mail
From: ram@zedat.fu-berlin.de (Stefan Ram)
Newsgroups: news.software.readers
Subject: Post DB
Date: 9 Sep 2024 10:44:36 GMT
Organization: Stefan Ram
Lines: 63
Expires: 1 Jul 2025 11:59:58 GMT
Message-ID: <database-20240909114248@ram.dialup.fu-berlin.de>
snip
>
> I'm not 100% sure I'm barking up the right tree (newsgroup) here,

news:news.dizum.net
Path: sewer!weretis.net!feeder8.news.weretis.net!fu-berlin.de!uni-berlin.de!not-for-mail
From: ram@zedat.fu-berlin.de (Stefan Ram)
Newsgroups: news.software.readers
Subject: Post DB
Date: 9 Sep 2024 10:44:36 GMT
Organization: Stefan Ram
Lines: 63
Expires: 1 Jul 2025 11:59:58 GMT
Message-ID: <database-20240909114248@ram.dialup.fu-berlin.de>
snip
>
> I'm not 100% sure I'm barking up the right tree (newsgroup) here,

news:freenews.netfront.net
Path: news.netfront.net!border-2.nntp.ord.giganews.com!nntp.giganews.com!weretis.net!feeder8.news.weretis.net!fu-berlin.de!uni-berlin.de!not-for-mail
From: ram@zedat.fu-berlin.de (Stefan Ram)
Newsgroups: news.software.readers
Subject: Post DB
Date: 9 Sep 2024 10:44:36 GMT
Organization: Stefan Ram
Lines: 63
Expires: 1 Jul 2025 11:59:58 GMT
Message-ID: <database-20240909114248@ram.dialup.fu-berlin.de>
snip
>
> I'm not 100% sure I'm barking up the right tree (newsgroup) here,

news:news.i2pn2.org
Path: i2pn2.org!rocksolid2!news.neodome.net!fu-berlin.de!uni-berlin.de!not-for-mail
From: ram@zedat.fu-berlin.de (Stefan Ram)
Newsgroups: news.software.readers
Subject: Post DB
Date: 9 Sep 2024 10:44:36 GMT
Organization: Stefan Ram
Lines: 63
Expires: 1 Jul 2025 11:59:58 GMT
Message-ID: <database-20240909114248@ram.dialup.fu-berlin.de>
snip
>
> I'm not 100% sure I'm barking up the right tree (newsgroup) here,

news:news.neodome.net
Path: news.neodome.net!fu-berlin.de!uni-berlin.de!not-for-mail
From: ram@zedat.fu-berlin.de (Stefan Ram)
Newsgroups: news.software.readers
Subject: Post DB
Date: 9 Sep 2024 10:44:36 GMT
Organization: Stefan Ram
Lines: 63
Expires: 1 Jul 2025 11:59:58 GMT
Message-ID: <database-20240909114248@ram.dialup.fu-berlin.de>
snip
>
> I'm not 100% sure I'm barking up the right tree (newsgroup) here,

news:news.mixmin.net
Path: news.mixmin.net!weretis.net!feeder8.news.weretis.net!fu-berlin.de!uni-berlin.de!not-for-mail
From: ram@zedat.fu-berlin.de (Stefan Ram)
Newsgroups: news.software.readers
Subject: Post DB
Date: 9 Sep 2024 10:44:36 GMT
Organization: Stefan Ram
Lines: 63
Expires: 1 Jul 2025 11:59:58 GMT
Message-ID: <database-20240909114248@ram.dialup.fu-berlin.de>
snip
>
> I'm not 100% sure I'm barking up the right tree (newsgroup) here,

news:news.novabbs.org
Path: rocksolid2!news.neodome.net!fu-berlin.de!uni-berlin.de!not-for-mail
From: ram@zedat.fu-berlin.de (Stefan Ram)
Newsgroups: news.software.readers
Subject: Post DB
Date: 9 Sep 2024 10:44:36 GMT
Organization: Stefan Ram
Lines: 63
Expires: 1 Jul 2025 11:59:58 GMT
Message-ID: <database-20240909114248@ram.dialup.fu-berlin.de>
snip
>
> I'm not 100% sure I'm barking up the right tree (newsgroup) here,

news:paganini.bofh.team
Path: paganini.bofh.team!newsfeed.bofh.team!3.eu.feeder.erje.net!feeder.erje.net!fu-berlin.de!uni-berlin.de!not-for-mail
From: ram@zedat.fu-berlin.de (Stefan Ram)
Newsgroups: news.software.readers
Subject: Post DB
Date: 9 Sep 2024 10:44:36 GMT
Organization: Stefan Ram
Lines: 63
Expires: 1 Jul 2025 11:59:58 GMT
Message-ID: <database-20240909114248@ram.dialup.fu-berlin.de>
snip
>
> I'm not 100% sure I'm barking up the right tree (newsgroup) here,

news:news.samoylyk.net
Path: news.samoylyk.net!fu-berlin.de!uni-berlin.de!not-for-mail
From: ram@zedat.fu-berlin.de (Stefan Ram)
Newsgroups: news.software.readers
Subject: Post DB
Date: 9 Sep 2024 10:44:36 GMT
Organization: Stefan Ram
Lines: 63
Expires: 1 Jul 2025 11:59:58 GMT
Message-ID: <database-20240909114248@ram.dialup.fu-berlin.de>
snip
>
> I'm not 100% sure I'm barking up the right tree (newsgroup) here,

news:news.usenet.ovh
Path: usenet.ovh!weretis.net!feeder8.news.weretis.net!fu-berlin.de!uni-berlin.de!not-for-mail
From: ram@zedat.fu-berlin.de (Stefan Ram)
Newsgroups: news.software.readers
Subject: Post DB
Date: 9 Sep 2024 10:44:36 GMT
Organization: Stefan Ram
Lines: 63
Expires: 1 Jul 2025 11:59:58 GMT
Message-ID: <database-20240909114248@ram.dialup.fu-berlin.de>
snip
>
> I'm not 100% sure I'm barking up the right tree (newsgroup) here,
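The same comparison can be done mechanically. A rough Python sketch, assuming the raw articles have already been fetched from each server: parse each copy's header block, unfolding continuation lines such as those of X-Copyright, then report every header whose value is not identical across all copies. In the samples above the visibly differing header is Path; Xref is also server-local by design. The function names are hypothetical, and joining repeated headers with ", " is a simplification.

```python
from typing import Optional


def parse_headers(raw: str) -> dict[str, str]:
    """Parse the header block of a raw article into {lowercase name: value}.

    Continuation lines (starting with a space or tab) are appended to the
    previous header, i.e. folded headers are unfolded. Parsing stops at the
    first empty line, which separates the headers from the body.
    """
    headers: dict[str, str] = {}
    last: Optional[str] = None
    for line in raw.replace("\r\n", "\n").split("\n"):
        if not line:
            break  # end of the header block
        if line[0] in " \t":
            if last is not None:
                headers[last] += " " + line.strip()
            continue
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a header line; ignored in this sketch
        last = name.strip().lower()
        value = value.strip()
        # Simplification: a repeated header is joined onto the earlier value.
        headers[last] = f"{headers[last]}, {value}" if last in headers else value
    return headers


def differing_headers(copies: dict[str, str]) -> dict[str, dict[str, Optional[str]]]:
    """Return {header: {server: value or None}} for every header whose value
    differs between copies, including headers missing from some copies."""
    parsed = {server: parse_headers(raw) for server, raw in copies.items()}
    names = set().union(*(h.keys() for h in parsed.values()))
    result: dict[str, dict[str, Optional[str]]] = {}
    for name in sorted(names):
        values = {server: h.get(name) for server, h in parsed.items()}
        if len(set(values.values())) > 1:
            result[name] = values
    return result


if __name__ == "__main__":
    copies = {
        "news.alphared.net": "Path: alphared!fu-berlin.de!uni-berlin.de!not-for-mail\n"
                             "Subject: Post DB\n\nbody\n",
        "news.neodome.net": "Path: news.neodome.net!fu-berlin.de!uni-berlin.de!not-for-mail\n"
                            "Subject: Post DB\n\nbody\n",
    }
    for name, values in differing_headers(copies).items():
        print(name, values)
```

Run over all twelve copies, the set of names this returns would be a data-driven choice for the "variable" headers in a deduplicating store, rather than a hard-coded list.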