On 2024-09-08 2:06 p.m., MitchAlsup1 wrote:
I haven't really understood how it could be implemented.
> On Sun, 8 Sep 2024 3:22:55 +0000, Robert Finch wrote:
>> Still trying to grasp the virtual vector method. Been wondering if it
>> can be implemented using renamed registers.
> On 2024-09-07 10:41 a.m., MitchAlsup1 wrote:
>> On Sat, 7 Sep 2024 2:27:40 +0000, Robert Finch wrote:
>>> Making the scalar register file a subset of the vector register file.
>> And renaming only vector elements.
>
There are eight elements in a vector register and each element is
128 bits wide (corresponding to the size of a GPR). Vector register
file elements are subject to register renaming to allow the full power
of the OoO machine to be used to process vectors. The issue is that with
both the vector and scalar registers present for renaming there are a
lot of registers to rename. It is desirable to keep the number of
renamed registers (including vector elements) <= 256 total. So, the 64
scalar registers are aliased with the first eight vector registers,
leaving only 24 truly available vector registers. Hm. There are 1024
physical registers, so maybe going up to about 300 renamable registers
would not hurt.
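
Roughly, in C (numbering follows the description above; the names and
table layout are mine for illustration, not the actual Q+ RTL):

    /* Minimal sketch of aliasing the 64 scalar GPRs onto the elements of
     * the first eight vector registers, so the rename machinery sees at
     * most 32*8 = 256 architectural names backed by 1024 physical regs. */
    #include <stdint.h>

    #define ELEMS_PER_VREG 8        /* 8 x 128-bit elements per vector reg */
    #define NUM_VREGS      32
    #define NUM_NAMES      (NUM_VREGS * ELEMS_PER_VREG)  /* 256 names */
    #define NUM_PHYS       1024

    /* Element e of vector register v gets architectural name v*8 + e. */
    static inline unsigned velem_name(unsigned v, unsigned e)
    {
        return v * ELEMS_PER_VREG + e;
    }

    /* Scalar register rN aliases element (N & 7) of vector register
     * (N >> 3), i.e. its name is simply N: r0..r63 occupy v0..v7,
     * leaving v8..v31 as the 24 "truly available" vector registers. */
    static inline unsigned scalar_name(unsigned r)
    {
        return velem_name(r >> 3, r & 7);
    }

    /* One rename-table entry per architectural name, each pointing at
     * one of the 1024 physical registers. */
    static uint16_t rename_table[NUM_NAMES];

With that mapping a scalar write is just a rename of one element name,
so the scalar and vector registers share a single physical pool.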
Why do you think a vector register file is the way to go ??
I think vector registers are somewhat dubious, but they have some uses.
In many cases data can be processed just fine without vector registers.
In the current project vector instructions use the scalar functional
units to compute, making them no faster than scalar calcs. But vectors
do give a lot of code density where parallel computation on multiple
data items using a single instruction is desirable. I do not know why
people use vector registers in general, but they are present in some
modern architectures.
There is no doubt that much code can utilize vector arrangements, and
that a processor should be very efficient in performing these
workloads.
>
The problem I see is that CRAY-like vectors vectorize instructions
instead of vectorizing loops. Any kind of flow control within the
loop becomes tedious at best.
>
On the other hand, the Virtual Vector Method vectorizes loops and
can be implemented such that it performs as well as CRAY-like
vector machines without the overhead of a vector register file.
In actuality there are only 6 bits of HW flip-flops governing
VVM--compared to 4 KBytes for CRAY-1.
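
To make the contrast concrete, here is the kind of loop being talked
about (my own example in C, not taken from either design). With
register-style vectorization the 'if' has to be turned into mask
generation and merge operations; a loop-vectorizing scheme keeps the
scalar body exactly as written and lets the hardware overlap iterations:

    #include <stddef.h>

    /* A loop with per-element flow control in the body. */
    void saxpy_clamped(float *y, const float *x, float a, float cap,
                       size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            float t = a * x[i] + y[i];
            if (t > cap)            /* flow control inside the loop */
                t = cap;
            y[i] = t;
        }
    }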
> Qupls vector registers are 512 bits wide (8 64-bit elements). Bigfoot’s
> vector registers are 1024 bits wide (8 128-bit elements).
When properly abstracted, one can dedicate as many or as few HW
flip-flops as staging buffers for vector workloads to suit
the implementation at hand. A GBOoO core may utilize something like the
4 KB register file of the CRAY-1, while the little low-power core uses
3 cache lines' worth. Both run the same ASM code and both are efficient
in their own sense of "efficient".
>
So, instead of having ~500 vector instructions and ~1000 SIMD
instructions, one has 2 instructions and a medium-scale state
machine.
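
A toy model of what that state machine amounts to (entirely my sketch,
at the level of a software interpreter; it assumes nothing about the My
66000 internals, and real hardware would work on decoded micro-ops and
overlap iterations rather than replay them serially): capture the few
instructions between the two loop markers once, then re-issue them from
a small staging buffer instead of re-fetching them every iteration.

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct { int opcode; int rd, rs1, rs2; } Inst;

    enum { MAX_BODY = 16 };       /* a few cache lines of staging buffer */

    typedef struct {
        Inst   body[MAX_BODY];    /* captured loop body                  */
        size_t len;
        bool   active;
    } LoopCapture;

    /* Loop-top marker decoded: start capturing the body. */
    static void loop_top(LoopCapture *lc) { lc->len = 0; lc->active = true; }

    /* Each decoded body instruction while capturing; returns false when
     * the body does not fit and the core must fall back to scalar issue. */
    static bool capture(LoopCapture *lc, Inst in)
    {
        if (!lc->active || lc->len == MAX_BODY) return false;
        lc->body[lc->len++] = in;
        return true;
    }

    /* Loop-bottom marker decoded: replay the staged body until the
     * termination condition (evaluated elsewhere) says stop. */
    static void loop_bottom(LoopCapture *lc, bool (*more)(void),
                            void (*issue)(Inst))
    {
        while (more())
            for (size_t i = 0; i < lc->len; i++)
                issue(lc->body[i]);
        lc->active = false;
    }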
>
Qupls has RISC-V style vector / SIMD registers. For Q+ every instruction
can be a vector instruction, as there are bits in the instruction
indicating which registers are vector registers. All the scalar
instructions become vector, which cuts down on some of the bloat in the
ISA. There is only a handful of vector-specific instructions (about
eight, I think). The drawback is that the ISA is 48 bits wide. However,
the code bloat is less than 50% as some instructions have dual
operations. Branches can increment or decrement and loop. Bigfoot uses a
postfix word to indicate the vector form of the instruction. Bigfoot’s
code density is a lot better being variable length, but I suspect it
will not run as fast. Bigfoot and Q+ share a lot of the same code.
Trying to make the guts of the cores generic.

In my case, the core ended up generic enough that it can support both
BJX2 and RISC-V. Could almost make sense to lean more heavily into this
(trying to consolidate more things and better optimize costs).
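
Regarding the per-register vector bits in Q+ mentioned above, the idea
can be pictured roughly like this (field names, widths, and positions
are made up for illustration, not Q+'s actual 48-bit encoding):

    #include <stdint.h>
    #include <stdbool.h>

    typedef uint64_t inst48_t;           /* only the low 48 bits are used */

    static inline unsigned opcode(inst48_t i) { return (i >>  0) & 0x7F; }
    static inline unsigned rd    (inst48_t i) { return (i >>  7) & 0x3F; }
    static inline unsigned rs1   (inst48_t i) { return (i >> 13) & 0x3F; }
    static inline unsigned rs2   (inst48_t i) { return (i >> 19) & 0x3F; }

    /* One flag per register operand: 1 = treat that operand as a vector
     * register, 0 = scalar.  Any scalar opcode with a flag set becomes
     * the vector form of the same operation, so the scalar and vector
     * ISAs share one opcode space. */
    static inline bool rd_is_vec (inst48_t i) { return (i >> 45) & 1; }
    static inline bool rs1_is_vec(inst48_t i) { return (i >> 46) & 1; }
    static inline bool rs2_is_vec(inst48_t i) { return (i >> 47) & 1; }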