Terje Mathisen wrote:
Stephen Fuld wrote:
Terje Mathisen wrote:
Stephen Fuld wrote:
Scott Lurndal wrote:
Michael S <already5chosen@yahoo.com> writes:
On Mon, 3 Jun 2024 08:03:53 -0000 (UTC), Lawrence D'Oliveiro <ldo@nz.invalid> wrote:
On Thu, 30 May 2024 18:31:46 +0000, MitchAlsup1 wrote:
30 years ago you could say the same thing about encryption.
I don't think newer CPUs have been optimized for encryption. Instead, we see newer encryption algorithms (or ways of using them) that work better on current CPUs.
I think moderate efficiency on a CPU, not too low but not too high either, is a requirement for a (symmetric-key) cipher. That is especially true when the key is 128 bits or shorter.
Most modern CPUs have instruction set support for symmetric
ciphers such as AES and SM4, as well as message digests/hashes
(SHA-1, SHA-256, SM3 et al.).
High-throughput encryption has been done by hardware
accelerators for decades now (e.g. BBN or nCipher HSM boxes
sitting on a SCSI bus; now such HSMs are an integral part of
many SoCs).
Question: for a modern general-purpose CPU that already includes
all the logic to implement encryption instructions, is it much
more costly to add the control/sequencing logic to run the
encryption on its own, without tying up the rest of the CPU?
Furthermore, an "inbuilt" accelerator could interface directly
with the I/O hardware of the CPU (e.g. PCI), saving the
"intermediate" step of writing the encrypted data to memory.
That logic already exists, in the form of a single thread/core
dedicated to the job.
With 30-100 cores on a single die, it becomes very cheap to
dedicate one of them to babysit such a process, compared to the
cost of making a custom chunk of VLSI to do the same. This is
particularly true because the logic needed in the babysitting
process is mostly straight line, with a very limited number of
hard-to-predict branches.
E.g. H.264 CABAC decoding has three branches per decoded bit, at
least one of them impossible to predict or work around with
clever coding. There it makes perfect sense to have a chunk of
hw handle the heavy lifting. Monitoring block
encryption/decryption, not so much.
I may be missing something, but while your proposal addresses the
first part of mine, I think it doesn't address the second. That
is, for data coming from or going to some external source, you
are still doing "unnecessary" memory traffic, which consumes
memory bandwidth and increases latency.
Usually, when a CPU needs to work on something, it will have to
get the data into $L1 anyway, right? It is only when the work is
purely a pipeline stage that having a way to bypass the CPU
completely really makes a difference.
Right. But my point is that the CPU never really needs to "work"
on the encrypted data. It is frequently only sent to, or received
from, the network or a storage device, hence the pipelined
approach has advantages.