Re: Archive Any And All Text Usenet

Subject : Re: Archive Any And All Text Usenet
From : ross.a.finlayson (at) *nospam* gmail.com (Ross Finlayson)
Newsgroups : news.admin.peering news.software.nntp
Date : 10 Mar 2024, 22:42:54
Message-ID : <PpmcnTx1QcTKtHP4nZ2dnZfqnPudnZ2d@giganews.com>
References : 1 2
User-Agent : Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0

On 03/10/2024 12:23 PM, Stefan Ram wrote:
> Ross Finlayson <ross.a.finlayson@gmail.com> wrote or quoted:
>> The idea is that each "message", "post", has an ID,
>
>    Special file systems for news storage, such as the
>    Cyclical News Filesystem (CNFS), have been developed.
>
>    But, as mentioned by immibis, SQL databases can be
>    very efficient today when used by someone with an
>    education in relational databases.
>
>    For example, I have a filesystem here that sometimes
>    starts to behave strangely or become slow once there
>    are several 10,000 files in a single directory. Or,
>    maybe it's just the user interface not the file system.
>    But you should make some tests to see whether the fs
>    can actually support your requirements.
>

Hey, thanks, that's very practical. The idea that a
database takes care of normalization, maintains the
indices, and implements its own access patterns points
to a really good in-between: "a file system's contents
in a file", like a tape archive or zip file, supporting
serial access and also random access (usually by
memory-mapping the file), with the access patterns
following from how the file is organized.

Of course, one might aver that any such organization
of the coordinates of messages, by partitions of group
and date and by Message-Id (or, for example, Content-Id
in the world of external references and Internet
Messages), has a sort of equi-interpretable normal form,
in what one might call "the physical interface" and
"the logical interface".

Access most usually involves an index, which, according
to either a hash code or a sort, results in a binary-tree
or phonebook (alphabetical, lexicographic) lookup. Here
the file system implements this, and the database
implements this, alongside the usual index files like
"the groups file" and "the overview file" and these
kinds of things. The idea is that groups and dates
naturally partition this.
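
As a minimal sketch of that partitioning, assuming Python
and a made-up directory layout (the names `partition_path`
and `lookup` and the path scheme are illustrative, not
taken from any existing tool), the coordinates can be
mapped to a path and a sorted index searched
phonebook-style:

```python
import bisect
import hashlib
from datetime import date


def partition_path(group: str, day: date, message_id: str) -> str:
    """Map (group, date, Message-ID) coordinates to a relative path."""
    # Hash the Message-ID so the file name is filesystem-safe and fixed-length.
    digest = hashlib.sha256(message_id.encode("utf-8")).hexdigest()
    group_dir = group.replace(".", "/")  # news.admin.peering -> news/admin/peering
    # Two hex characters of fan-out keep any single directory small.
    return f"{group_dir}/{day:%Y/%m/%d}/{digest[:2]}/{digest}"


def lookup(sorted_ids: list[str], message_id: str) -> int:
    """Binary search in a sorted index; returns the position or -1."""
    i = bisect.bisect_left(sorted_ids, message_id)
    if i < len(sorted_ids) and sorted_ids[i] == message_id:
        return i
    return -1
```

The hash gives the fixed-length, evenly spread names; the
sorted list gives the lexicographic lookup. Either one
could serve as the index, this just shows both.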

Here the idea for AAAATU is to have a physical form
that's very fungible. Files are fungible, it's as simple
as that. Databases like sqlite sort of define exactly
how the data, the datums, have access patterns according
to their coordinates, and then having a SQL interpreter
and SQL executor implementing the access patternry
sure is a great thing.
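
For the database side, a rough sketch using Python's
standard `sqlite3` module might look like the following.
The table and column names (`post`, `post_group`,
`posted_on`, `path`) are assumptions made up for
illustration, not a proposed spec:

```python
import sqlite3

# Hypothetical schema: one row per post, one row per (group, post) pair.
SCHEMA = """
CREATE TABLE IF NOT EXISTS post (
    message_id TEXT PRIMARY KEY,
    posted_on  TEXT NOT NULL,   -- ISO date, e.g. 2024-03-10
    path       TEXT NOT NULL    -- where the message file lives
);
CREATE TABLE IF NOT EXISTS post_group (
    message_id TEXT NOT NULL REFERENCES post(message_id),
    grp        TEXT NOT NULL,
    PRIMARY KEY (grp, message_id)
);
CREATE INDEX IF NOT EXISTS post_by_date ON post(posted_on);
"""


def open_index(db_path=":memory:"):
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    return conn


def add_post(conn, message_id, posted_on, path, groups):
    with conn:  # one transaction per post; duplicates are ignored
        conn.execute("INSERT OR IGNORE INTO post VALUES (?, ?, ?)",
                     (message_id, posted_on, path))
        conn.executemany("INSERT OR IGNORE INTO post_group VALUES (?, ?)",
                         [(message_id, g) for g in groups])


def posts_in(conn, group, start, end):
    """Message-IDs in one group between two ISO dates, inclusive."""
    rows = conn.execute(
        "SELECT p.message_id FROM post p "
        "JOIN post_group g ON g.message_id = p.message_id "
        "WHERE g.grp = ? AND p.posted_on BETWEEN ? AND ? "
        "ORDER BY p.posted_on, p.message_id",
        (group, start, end))
    return [r[0] for r in rows]
```

The database here only holds the coordinates and a path;
the message bodies stay in plain files, so the index can
be thrown away and rebuilt.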

The great thing here is basically for posterity,
this notion of "digital preservation", and for
curation of a library of AAAATU, with a goal to
fill in all the coordinates, and then to be able to
reference and access, according to the partitions
of group and date, the posts and messages by their
Message-Id's.

The text-based Internet protocols have a great
affinity toward each other, NNTP and IMAP and SMTP
and POP3 and whatever HTTP is (resources), with
regards to great conventions like mbox and maildir,
or for example sqlite files or otherwise, for "the
store" of the files, vis-a-vis the ephemeral, or
discardable, state of the runtime's access patternry.

It certainly makes sense for the runtime to have both:
a monolithic maintained store, and also the fungible,
composable, much-much-slower file accesses. This is
where filesystems have their limits, and the runtime
has its limits of file handles, with regards to the
guarantees of the file system, or a notion of "atomic
rename", for the consistency and coherency of the
access patternry and the data.

One of the main goals here seems to be
"write-once-read-many", in a world that's muchly
"write-once-read-never". I.e., the goal is archival,
vis-a-vis convenience and the ephemeral.
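
A small sketch of write-once storage leaning on atomic
rename, assuming Python on a POSIX-like filesystem (the
function name `store_once` is made up here):

```python
import os
import tempfile


def store_once(path: str, data: bytes) -> bool:
    """Write data to path via a temp file and rename.

    Returns False, leaving the file untouched, if path already exists.
    """
    if os.path.exists(path):
        return False  # write-once: never overwrite an archived message
    directory = os.path.dirname(path) or "."
    os.makedirs(directory, exist_ok=True)
    # The temp file lives in the target directory so the rename
    # stays within one filesystem.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # readers see the whole file or no file
    except BaseException:
        os.unlink(tmp)
        raise
    return True
```

The exists-check followed by the rename is not safe
against two writers racing on the same path; for a
single archiving process that seems acceptable, otherwise
something like a hard link to the final name would be
needed.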

What I'd like to think is that, these days, multiple
terabytes of data is not an outrageous fortune, about
"on-line, warm-line, and cold-line", "data" and "data
lakes" and "data glaciers", these kinds of ideas, which
represent simply enough the locality of the data, the
levels of the tradeoffs of time vis-a-vis size. Here
I don't necessarily care about when so much as if,
as it were.

Then the effort seems to be that, while each message
declares exactly what groups it's in, there is a best
reckoning of what date it was. Then, with regards to
X-No-Archive, control messages, cancel messages,
Supersedes, and otherwise the semantics of control or
site policy, the key idea is to establish for any post
that existed, and still exists, that it exists at
exactly one date in any number of groups.
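
To make "one date, any number of groups" concrete, here
is a sketch that pulls those coordinates out of a raw
article with Python's standard `email` package. The
function name `coordinates`, the returned dictionary
keys, and the choice to normalize the date to UTC are
assumptions for illustration:

```python
from datetime import timezone
from email import message_from_string
from email.utils import parsedate_to_datetime


def coordinates(raw_article: str) -> dict:
    """Extract Message-ID, groups, best-reckoned date, and no-archive flag."""
    msg = message_from_string(raw_article)
    groups = [g.strip() for g in (msg["Newsgroups"] or "").split(",")
              if g.strip()]
    day = None  # stays None when the Date header is missing or unparseable
    try:
        when = parsedate_to_datetime(msg["Date"])
        if when.tzinfo is not None:
            when = when.astimezone(timezone.utc)  # one date, whatever the zone
        day = when.date()
    except (TypeError, ValueError):
        pass
    return {
        "message_id": (msg["Message-ID"] or "").strip(),
        "groups": groups,
        "date": day,
        "no_archive": (msg["X-No-Archive"] or "").strip().lower() == "yes",
    }
```

Control, cancel, and Supersedes handling is left out
entirely; that part is site policy and would sit on top
of this.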

So with this in mind, I surely find it agreeable
that a "database file format" has underneath it an
idea of "a filesystem representation", making for a
usual sort of logical interface and physical
interface: a convention that has a spec, and is
fungible with bog-standard tools, the most usual
built-ins of file semantics.

Then the idea is that anybody who has regular
hierarchical newsgroups funnels those all together
into an archive, the archaeological, making sort of
curated collections for digital preservation, sorting
out for each message whether uniqueness and integrity
hold or not, from a world of mbox files (which are
figured to be linear in time), or maildir, or otherwise
most usually the date attribute. Then anybody can
indicate a range of those coordinates, groups and
dates, and thus is derived a given edition's world of
the posts.
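
Deriving such an edition from a pile of mbox files could
be sketched like this, using Python's standard `mailbox`
module. The function name `edition` and the rule of
keeping the first copy of each Message-ID are assumptions
for illustration only:

```python
import mailbox
from datetime import date, timezone
from email.utils import parsedate_to_datetime


def edition(mbox_paths, groups, start: date, end: date):
    """Message-IDs posted to any of `groups` between start and end, inclusive.

    Duplicates across files collapse to one entry per Message-ID.
    """
    wanted = set(groups)
    seen = {}  # Message-ID -> date of the first copy encountered
    for path in mbox_paths:
        box = mailbox.mbox(path, create=False)
        try:
            for msg in box:
                mid = (msg["Message-ID"] or "").strip()
                posted_to = {g.strip()
                             for g in (msg["Newsgroups"] or "").split(",")}
                if not mid or mid in seen or not (posted_to & wanted):
                    continue
                try:
                    when = parsedate_to_datetime(msg["Date"])
                except (TypeError, ValueError):
                    continue  # undatable posts are skipped in this sketch
                if when.tzinfo is not None:
                    when = when.astimezone(timezone.utc)
                if start <= when.date() <= end:
                    seen[mid] = when.date()
        finally:
            box.close()
    return sorted(seen, key=lambda m: (seen[m], m))
```

Checking integrity (say, comparing body hashes when two
copies share a Message-ID) is not attempted here.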

Then anybody can just use that as data, while at the
same time, of course, each post is fundamentally the
poster's: in the public space, not the public domain.

