BGB-Alt wrote:
> Also the blob of constants needed to be within 512 bytes of the load instruction, which was also kind of an evil mess for branch handling (and extra bad if one needed to spill the constants in the middle of a basic block and then branch over it).

On 4/9/2024 7:28 PM, MitchAlsup1 wrote:
> In My 66000 case, the constant is the word following the instruction.
> Easy to find, easy to access, no register pollution, no DCache pollution.

BGB wrote:
> Yeah.

BGB-Alt wrote:
> Usually they were spilled between basic-blocks, with the basic-block needing to branch to the following basic-block in these cases.
> Also 8-bit branch displacements are kinda lame, ...

MitchAlsup1 wrote:
> Why do that to yourself ??

BGB wrote:
> I didn't design SuperH, Hitachi did...

BGB-Alt wrote:
> And, if one wanted a 16-bit branch:
>   MOV.W (PC, 4), R0 //load a 16-bit branch displacement
>   BRA/F R0
> .L0:
>   NOP // delay slot
>   .WORD $(Label - .L0)
> Also kinda bad...

MitchAlsup1 wrote:
> Can you say Yech !!

BGB wrote:
> Yeah.

BGB-Alt wrote:
> Things like memcpy/memmove/memset/etc, are function calls in cases when not directly transformed into register load/store sequences.

MitchAlsup1 wrote:
> My 66000 does not convert them into LD-ST sequences, MM is a single instruction.

BGB wrote:
> But, at what cost...
> I have no high-level memory move/copy/set instructions.
> Only loads/stores...

You have the power to fix it.........

BGB-Alt wrote:
> For small copies, can encode them inline, but past a certain size this becomes too bulky.
> A copy loop makes more sense for bigger copies, but has a high overhead for small to medium copy.
> So, there is a size range where doing it inline would be too bulky, but a loop carries an undesirable level of overhead.

MitchAlsup1 wrote:
> All the more reason to put it (a highly useful unit of work) into an instruction.

BGB wrote:
> This is an area where "slides" work well, the main cost is mostly the bulk that the slide adds to the binary (albeit, it is one-off).

BGB-Alt wrote:
> Ended up doing these with "slides", which end up eating roughly several kB of code space, but was more compact than using larger inline copies.
> Say (IIRC):
>   128 bytes or less: Inline Ld/St sequence
>   129 bytes to 512B: Slide
>   Over 512B: Call "memcpy()" or similar.

MitchAlsup1 wrote:
> Versus:
>   1-infinity: use MM instruction.

BGB wrote:
> Yeah, but it makes the CPU logic more expensive.

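The three size bands quoted above (given from memory in the post, so the exact cut-offs may differ in the real compiler) can be modelled as a small selection function. This is only an illustrative sketch in C: `copy_strategy`, `pick_copy_strategy` and `strategy_name` are hypothetical names, not part of BGB's compiler or of My 66000.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names: the three code-generation choices listed in the post. */
enum copy_strategy { COPY_INLINE, COPY_SLIDE, COPY_CALL };

/* Pick a strategy for a copy whose size n is a compile-time constant.
 * Thresholds are taken from the post ("IIRC"), so treat them as approximate. */
static enum copy_strategy pick_copy_strategy(size_t n)
{
    if (n <= 128)
        return COPY_INLINE;   /* 128 bytes or less: inline Ld/St sequence */
    if (n <= 512)
        return COPY_SLIDE;    /* 129 to 512 bytes: branch into the slide */
    return COPY_CALL;         /* over 512 bytes: call memcpy() or similar */
}

/* Human-readable label for a strategy, for diagnostics. */
static const char *strategy_name(enum copy_strategy s)
{
    switch (s) {
    case COPY_INLINE: return "inline Ld/St sequence";
    case COPY_SLIDE:  return "slide";
    case COPY_CALL:   return "call memcpy()";
    }
    assert(!"unknown strategy");
    return "?";
}
```

With a single memory-move instruction, as described for My 66000, this selection step would presumably collapse to one case covering every size.
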
BGB-Alt wrote:
> The slide generally has entry points in multiples of 32 bytes, and operates in reverse order. So, if not a multiple of 32 bytes, the last bytes need to be handled externally prior to branching into the slide.

MitchAlsup1 wrote:
> Does this remain sequentially consistent ??

BGB wrote:
> Within a thread, it is fine.

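A rough C model of the slide described above may help: the leftover bytes are copied first, then control enters a fall-through chain at the entry point matching the number of whole 32-byte blocks, and the blocks are copied from the highest offset down. This is a sketch under assumptions, not BGB's code: `slide_copy_128` and `copy_block32` are hypothetical names, the slide is cut down to 128 bytes to keep it short, and the buffers are assumed not to overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy one 32-byte block at byte offset off.
 * Stands in for the four 64-bit load/store pairs of one slide step. */
static void copy_block32(unsigned char *dst, const unsigned char *src, size_t off)
{
    memcpy(dst + off, src + off, 32);
}

/* Hypothetical model of a slide covering copies of up to 128 bytes.
 * Assumes dst and src do not overlap. */
static void slide_copy_128(unsigned char *dst, const unsigned char *src, size_t n)
{
    assert(n <= 128);

    size_t body = n & ~(size_t)31;   /* bytes covered by whole 32-byte blocks */
    size_t tail = n - body;          /* leftover bytes, handled before the slide */

    memcpy(dst + body, src + body, tail);

    switch (body / 32) {             /* choose the entry point */
    case 4: copy_block32(dst, src, 96);  /* fall through */
    case 3: copy_block32(dst, src, 64);  /* fall through */
    case 2: copy_block32(dst, src, 32);  /* fall through */
    case 1: copy_block32(dst, src, 0);   /* fall through */
    case 0: break;
    }
}
```

Because the size is a compile-time constant in the case discussed, a compiler would emit the tail copy inline and a direct branch to the chosen entry point, rather than the run-time `switch` used here.
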
BGB-Alt wrote:
> Though, this is only used for fixed-size copies (or "memcpy()" when value is constant).
> Say:
> __memcpy64_512_ua:
>   MOV.Q (R5, 480), R20
>   MOV.Q (R5, 488), R21
>   MOV.Q (R5, 496), R22
>   MOV.Q (R5, 504), R23
>   MOV.Q R20, (R4, 480)
>   MOV.Q R21, (R4, 488)
>   MOV.Q R22, (R4, 496)
>   MOV.Q R23, (R4, 504)
> __memcpy64_480_ua:
>   MOV.Q (R5, 448), R20
>   MOV.Q (R5, 456), R21
>   MOV.Q (R5, 464), R22
>   MOV.Q (R5, 472), R23
>   MOV.Q R20, (R4, 448)
>   MOV.Q R21, (R4, 456)
>   MOV.Q R22, (R4, 464)
>   MOV.Q R23, (R4, 472)
> ....
> __memcpy64_32_ua:
>   MOV.Q (R5), R20
>   MOV.Q (R5, 8), R21
>   MOV.Q (R5, 16), R22
>   MOV.Q (R5, 24), R23
>   MOV.Q R20, (R4)
>   MOV.Q R21, (R4, 8)
>   MOV.Q R22, (R4, 16)
>   MOV.Q R23, (R4, 24)
>   RTS

MitchAlsup1 wrote:
> Duff's device in any other name.

BGB wrote:
> More or less, though I think the idea of Duff's device is specifically in the way that it abuses the while-loop and switch constructs.

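For comparison with the slide, here is the usual shape of Duff's device, adapted to a memory-to-memory byte copy. It is given only to illustrate the `switch`-into-`do`/`while` interleaving that the name refers to; `duff_copy` is a hypothetical name and this is not code from either poster.

```c
#include <assert.h>
#include <stddef.h>

/* Byte copy using Duff's device: the switch jumps into the middle of an
 * 8-way unrolled do/while loop so the first pass handles count % 8 bytes. */
static void duff_copy(unsigned char *to, const unsigned char *from, size_t count)
{
    if (count == 0)
        return;
    assert(to != NULL && from != NULL);

    size_t passes = (count + 7) / 8;   /* loop iterations, first one partial */

    switch (count % 8) {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--passes > 0);
    }
}
```

The slide shares the idea of entering an unrolled sequence part-way through, but it has no loop: it is a straight-line run with one entry label per 32 bytes, and the entry point is fixed at compile time.
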