On Tue, 18 Jun 2024 16:54:04 +0000,
mitchalsup@aol.com (MitchAlsup1)
wrote:
The semantics of instructions in a loop are subtly altered such
that they can be vectorized and to execute multi-lane style.
I've decided that I will not be able to use the one from the original
Concertina, and will need to design a VVM-like instruction for
Concertina II from scratch.
Unlike yours, it won't be...subtle.
The action of the instruction which begins the loop will, I think, be
basically the same as yours. It will issue successive iterations of
the loop starting in consecutive cycles.
To do so, though, that instruction will contain a number of fields in
which to specify parameters:
(3 bits) An index register, which is initialized to zero at the start
of the loop, and "incremented" (the quote marks are, of course,
because it won't really be the same register on each iteration) for
subsequent iterations.
(3 bits) The power of two which is to serve as the increment.
(8 bits) A register mask, in which a 1 bit corresponds to a register
used for intermediate results within the loop. This will become a
forwarding node rather than a register; all other registers can only
be read, and serve as constant values only. The index register set up
previously does not need to be indicated by this.
(2 bits) This indicates which of the four groups of 8 registers in a
bank of 32 registers the register mask applies to.
(1 bit) This indicates whether we're talking about the integer
registers or the floating-point ones.
In addition, in the long version of the instruction, there's a 16-bit
register mask for the short vector registers.
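To make the short form concrete, here is a rough sketch of how those five fields (3 + 3 + 8 + 2 + 1 = 17 bits) might be packed and interpreted. The field order within the word, the names, and the mapping of mask bit 0 to the lowest-numbered register of the group are all my assumptions for illustration; none of that is settled.

```python
from typing import NamedTuple

class LoopStart(NamedTuple):
    """Hypothetical field set for the short loop-start instruction."""
    index_reg: int   # 3 bits: index register, zeroed at loop start
    incr_log2: int   # 3 bits: power of two used as the increment
    node_mask: int   # 8 bits: registers that become forwarding nodes
    group: int       # 2 bits: which group of 8 within the bank of 32
    is_float: int    # 1 bit: 0 = integer bank, 1 = floating-point bank

# Field widths in the same order as the tuple (assumed order, not final).
WIDTHS = (3, 3, 8, 2, 1)

def pack(fields: LoopStart) -> int:
    """Concatenate the fields, first field in the most significant bits."""
    word = 0
    for value, width in zip(fields, WIDTHS):
        if not 0 <= value < (1 << width):
            raise ValueError("field value does not fit in its width")
        word = (word << width) | value
    return word

def unpack(word: int) -> LoopStart:
    """Inverse of pack()."""
    values = []
    for width in reversed(WIDTHS):
        values.append(word & ((1 << width) - 1))
        word >>= width
    return LoopStart(*reversed(values))

def node_registers(fields: LoopStart) -> list:
    """Register numbers (0..31) that the mask marks as forwarding nodes."""
    base = 8 * fields.group
    return [base + bit for bit in range(8) if (fields.node_mask >> bit) & 1]

def index_value(fields: LoopStart, iteration: int) -> int:
    """Value the index register appears to hold in a given iteration."""
    return iteration << fields.incr_log2
```

For example, a mask of 0b00000110 with group 1 would mark registers 9 and 10 as nodes, and an increment power of 2 gives index values 0, 4, 8, 12, and so on.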
Because iterations are independent, one can't handle a stride in the
natural efficient manner of adding the stride value to a second
pointer register. This could be a common source of error, so I feel
the need to make some provision for this.
One scheme I am considering would be to include one bit in the
instruction that begins a loop to indicate the loop contains a
preamble. The preambles execute serially, and when they conclude,
everything that follows is issued immediately, to execute in parallel
with previous iterations (but now with a multi-cycle offset).
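A toy model of the issue timing I have in mind, under simplifying assumptions that are mine alone: every preamble takes the same fixed number of cycles, the next preamble starts the moment the previous one ends, and the body of an iteration issues right after its own preamble. The function names are made up for this sketch.

```python
def issue_cycles(iterations: int, preamble_cycles: int) -> list:
    """Return (preamble_start, body_start) for each iteration.

    With no preamble, iterations issue in consecutive cycles.
    With one, preambles run back to back, so bodies are offset
    from one another by preamble_cycles instead of by 1.
    """
    schedule = []
    for i in range(iterations):
        if preamble_cycles == 0:
            schedule.append((i, i))
        else:
            start = i * preamble_cycles
            schedule.append((start, start + preamble_cycles))
    return schedule

def strided_addresses(base: int, stride: int, iterations: int) -> list:
    """Pointer values seen by each body when the serial preamble does
    'pointer += stride' once per iteration (first body sees base)."""
    addresses = []
    pointer = base
    for _ in range(iterations):
        addresses.append(pointer)
        pointer += stride  # this add is the serial part
    return addresses
```

With a two-cycle preamble, the bodies of three iterations would issue at cycles 2, 4 and 6 rather than 0, 1 and 2, which is the multi-cycle offset mentioned above.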
Upon reflection, this doesn't waste a huge amount of time, so it is
better to go with it than to include fields for a stride value and a
second counter register in the loop start instruction.
Since the preambles do execute serially, the "end preamble"
instruction would point to the loop start instruction. Instead of a
full memory reference, though, it would just include a short value
that is a negative program-relative address.
Iterations that execute in parallel, though, don't "branch back"
anywhere, so the loop end instruction has no parameters. At least
something is like your VVM.
So this is how I take your VVM concept, and mess it up by making it
unnecessarily complicated; basically, because I don't want to make an
ISA that requires implementations to be, so to speak, "intelligent".
(i.e. upon the first store into a register in the loop, categorize
that register as a node reference)
John Savard