Subject : Re: Predictive failures
From : blockedofcourse (at) *nospam* foo.invalid (Don Y)
Groups : sci.electronics.design
Date : 19 Apr 2024, 04:08:17
Organisation : A noiseless patient Spider
Message-ID : <uvsn7d$2n8f9$2@dont-email.me>
References : 1 2 3 4
User-Agent : Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.2.2
On 4/18/2024 6:27 PM, Glen Walpert wrote:
> On Thu, 18 Apr 2024 15:05:07 -0700, Don Y wrote:
>> The same applies to secondary storage media. How will you know if
>> some-rarely-accessed-file is intact and ready to be referenced WHEN
>> NEEDED -- if you aren't doing patrol reads/scrubbing to verify that it
>> is intact, NOW?
>
> [One common flaw with RAID implementations and naive reliance on that
> technology]
>
> RAID, even with backups, is unsuited to high reliability storage of large
> databases. Distributed storage can be of much higher reliability:
>
> https://telnyx.com/resources/what-is-distributed-storage
> <https://towardsdatascience.com/introduction-to-distributed-data-storage-2ee03e02a11d>
>
> This requires successful retrieval of any n of m data files, normally from
> different locations, where n can be arbitrarily smaller than m depending
> on your needs. Overkill for small databases but required for high
> reliability storage of very large databases.
This is effectively how I maintain my archive. Except that the
media are all "offline", requiring a human operator (me) to
fetch the required volumes in order to locate the desired files.
Unlike mirroring (or other RAID technologies), my scheme places
no constraints as to the "containers" holding the data. E.g.,
DISK43 /somewhere/in/filesystem/ fileofinterest
DISK21 >some>other>place anothernameforfile
CDROM77 \yet\another\place archive.type /where/in/archive foo
Can all yield the same "content" (as verified by their prestored signatures).
Knowing the hash of each object means you can verify its contents from a
single instance instead of looking for confirmation via other instance(s).
[Hashes take up considerably less space than a duplicate copy would.]
This makes it easy to create multiple instances of particular "content"
without imposing constraints on how it is named, stored, located, etc.
I.e., pull a disk out of a system, catalog its contents, slap an adhesive
label on it (to be human-readable) and add it to your store.
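A minimal sketch of that scheme: a content hash is the object's identity,
so a catalog can map each hash to any number of differently-named instances
on different containers, and any single retrieved copy can be checked on its
own. The catalog layout and example paths below are illustrative, not the
actual format used.

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Content hash: identifies an object independent of name or volume."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical catalog: content hash -> instances as (volume, path) pairs.
# The same content may live under any name, in any container.
content = b"...archived file contents..."
catalog = {
    fingerprint(content): [
        ("DISK43", "/somewhere/in/filesystem/fileofinterest"),
        ("DISK21", ">some>other>place anothernameforfile"),
    ],
}

def verify(data: bytes, expected: str) -> bool:
    """One retrieved instance plus its prestored signature is enough to
    confirm integrity -- no second copy needed for comparison."""
    return fingerprint(data) == expected

assert verify(content, fingerprint(content))
assert not verify(b"bit-rotted contents", fingerprint(content))
```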
(If I could mount all of the volumes -- because I wouldn't know which volume
might be needed -- then access wouldn't require a human operator, regardless
of where the volumes were actually mounted or the peculiarities of the
systems on which they are mounted! But, you can have a daemon that watches to
see WHICH volumes are presently accessible and have it initiate a patrol
read of their contents while the media are being accessed "for whatever OTHER
reason" -- and track the time/date of last "verification" so you know which
volumes haven't been checked, recently)
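A rough sketch of such a daemon's core, assuming a per-volume table of
prestored signatures; the file names and state format are my own invention:

```python
import hashlib
import json
import os
import time

STATE = "scrub_state.json"  # hypothetical record of last scrub per volume

def scrub(volume_root: str, signatures: dict[str, str]) -> bool:
    """Patrol-read every cataloged file on a currently accessible volume
    and compare each against its prestored hash; True if all are intact."""
    ok = True
    for relpath, expected in signatures.items():
        h = hashlib.sha256()
        with open(os.path.join(volume_root, relpath), "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        ok &= (h.hexdigest() == expected)
    return ok

def note_verified(volume: str) -> None:
    """Record the time/date of the last scrub so stale volumes -- those
    not checked recently -- can be flagged for mounting."""
    state = json.load(open(STATE)) if os.path.exists(STATE) else {}
    state[volume] = time.time()
    with open(STATE, "w") as f:
        json.dump(state, f)
```

The daemon would call scrub() opportunistically whenever it notices a
volume mounted "for whatever other reason", then note_verified() on success.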
The inconvenience of requiring human intervention is offset by the lack of
wear on the media (as well as BTUs to keep it accessible) and the ease of
creating NEW content/copies. NOT useful for data that needs to be accessed
frequently but excellent for "archives"/repositories -- that can be mounted,
accessed and DUPLICATED to online/nearline storage for normal use.