Re: My 66000 and High word facility

Subject: Re: My 66000 and High word facility
From: cr88192 (at) *nospam* gmail.com (BGB)
Newsgroups: comp.arch
Date: 12 Aug 2024, 20:27:22
Organization: A noiseless patient Spider
Message-ID: <v9dnmv$3efnj$1@dont-email.me>
User-Agent: Mozilla Thunderbird
On 8/12/2024 12:36 PM, MitchAlsup1 wrote:
On Mon, 12 Aug 2024 6:29:36 +0000, Anton Ertl wrote:
 
Brett <ggtgp@yahoo.com> writes:
Anton Ertl <anton@mips.complang.tuwien.ac.at> wrote:
Brett <ggtgp@yahoo.com> writes:
The lack of CPUs with 64 registers is what makes for a market, that 4%
that could benefit have no options to pick from.
>
They had:
>
SPARC: Ok, only 32 GPRs available at a time, but more in hardware
through the Window mechanism.
>
AMD29K: IIRC a 128-register stack and 64 additional registers
>
IA-64: 128 GPRs and 128 FPRs with register stack and rotating register
files to make good use of them.
>
All antiques no longer available.
>
SPARC is still available: <https://en.wikipedia.org/wiki/SPARC> says:
>
|Fujitsu will also discontinue their SPARC production [...] end-of-sale
|in 2029, of UNIX servers and a year later for their mainframe.
>
No word of when Oracle will discontinue (or has discontinued) sales,
but both companies introduced their last SPARC CPUs in 2017.
>
In any case, my point still stands: these architectures were
available, and the large number of registers failed to give them a
decisive advantage.  Maybe it even gave them a decisive disadvantage:
AMD29K and IA-64 never had OoO implementations, and SPARC got them
only with the Fujitsu SPARC64 V in 2002 and the Oracle SPARC T4 in
2011, years after Intel, MIPS, HP switched to OoO in 1995/1996 and
Power and Alpha switched in 1998 (POWER3, 21264).
>
Where is your 4% number coming from?
>
The 4% number is poor memory and a guess.
Here is an antique paper on the issue:
>
https://www.eecs.umich.edu/techreports/cse/00/CSE-TR-434-00.pdf
>
Interesting.  I only skimmed the paper, but I read a lot about
inlining and interprocedural register allocation.  SPARC's register
windows and AMD29K's and IA-64's register stacks were intended to be
useful for that, but somehow the other architectures did not suffer a
big-enough disadvantage to make them adopt one of these concepts, and
that's despite register windows/stacks working even for indirect calls
(e.g., method calls in the general case), where interprocedural
register allocation or inlining don't help.
>
It seems to me that with OoO the cycle cost of spilling and refilling
on call boundaries was lowered: the spills can be delayed until the
computation is complete, and the refills can start early because the
stack pointer tends to be available early.
>
And recent OoO CPUs even have zero-cycle store-to-load forwarding, so
even if the called function is short, the spilling and refilling
around it (if any) does not increase the latency of the value that's
spilled and refilled.  But that consideration is only relevant for
Intel APX; ARM A64 and RISC-V went for 32 registers several years
before zero-cycle store-to-load forwarding was implemented.
>
One other optimization that they use the additional registers for is
"register promotion", i.e., putting values from memory into registers
for a while (if absence of aliasing can be proven).  One interesting
aspect here is that register promotion with 64 or 256 registers (RP-64
and RP-256) is usually not much better (if better at all) than
register promotion with 32 registers (RP-32); see Figure 1.  So
register promotion does not make a strong case for more registers,
either, at least in this paper.
With full access to constants, there is even less need to promote
addresses or immediates into registers, as you can simply poof them
up any time you want one.
There are tradeoffs still, if constants need space to encode...
Inline is still better than a memory load, granted.
It may make sense to consolidate multiple uses of a value into a register rather than encoding it as an immediate each time.
...
For example, I was working on adding code to display HDR pixels on the screen (which needs conversion to RGB555).
First attempt:
TKGDI_CopyPixelSpan_GetRGB24H:
  MOVU.L (R4), R6
  PSHUF.W R5, 0x00, R20            //word shuffle
  PSHUF.W R5, 0x55, R21
  PLDCM8UH R6, R16                 //FP8U to Binary16
  PMUL.H R16, R20, R16             //Scale
  PADD.H R16, R21, R16             //Bias
  MOV 0x3C003C003C003C00, R17      // 4x 1.0
  PADD.H R16, R17, R18             // Map to 1.0 .. 1.999
  TSTQ 0x0000C000C000C000, R18
  BF .L1
.L0:
  MOV 0xFFFF000000000000, R7       //alpha ones
  PCVTH2UW R18, R19                //Convert to packed word
  OR R19, R7, R5                   //Set alpha all ones
  RGB5PCK64 R5, R2                 //convert to RGB555
  RTS
.L1:
  TSTQ 0x00000000C000, R18
  AND?F 0xFFFFFFFF0000, R18
  OR?F 0x000000003BFF, R18
  TSTQ 0x0000C0000000, R18
  AND?F 0xFFFF0000FFFF, R18
  OR?F 0x00003BFF0000, R18
  TSTQ 0xC00000000000, R18
  AND?F 0x0000FFFFFFFF, R18
  OR?F 0x3BFF00000000, R18
  BRA .L0
Which was valid ASM in my case, but the constants are still bulky.
Unsigned SIMD convert works over the 1.0 to 1.999 range or so. The RGB555 converter needs alpha set so that it knows the pixel is opaque (otherwise, it may try to use the alpha encoding and reduce color fidelity).
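As a rough illustration of the 1.0 .. 1.999 mapping (a Python sketch, not the BJX2 implementation): adding 1.0 to a Binary16 value in [0, 1) pins the exponent, so the low 10 mantissa bits end up holding a fixed-point fraction. The 0xC000 per-lane mask in the TSTQ covers bit 15 (sign) and bit 14 (top exponent bit), so it fires for negative values and for values of 2.0 and above. The helper names are made up for this example, and how PCVTH2UW actually derives its output word is not modeled here.

```python
import struct

ONE_BITS = 0x3C00     # Binary16 bit pattern for 1.0
RANGE_MASK = 0xC000   # bit 15 (sign) and bit 14 (top exponent bit)

def to_half(x):
    """Round x to the nearest Binary16 value (returned as a Python float)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def half_bits(x):
    """Binary16 bit pattern of x as an unsigned 16-bit integer."""
    return struct.unpack("<H", struct.pack("<e", x))[0]

def biased_bits(x):
    """Bit pattern of (x + 1.0), with the sum rounded to Binary16."""
    return half_bits(to_half(x) + 1.0)

def mask_hits(bits):
    """True if the value is negative or >= 2.0 (sign or top exponent bit set)."""
    return (bits & RANGE_MASK) != 0

def fraction10(bits):
    """Low 10 mantissa bits; only meaningful for values in [1.0, 2.0)."""
    return bits & 0x03FF
```

For instance, 0.5 biases to 1.5 (0x3E00), whose mantissa field is 0x200, i.e. 512/1024. Note that this mask alone does not catch a biased result that lands in [0, 1).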
For now, it assumes opaque images for screen and window framebuffers.
Then I noted that I already had a few instructions for range clamping, so it became:
TKGDI_CopyPixelSpan_GetRGB24H:
  MOVU.L (R4), R6
  PSHUF.W R5, 0x00, R20
  PSHUF.W R5, 0x55, R21
  PLDCM8UH R6, R16
  MOV 0x3C003C003C003C00, R17
  PMUL.H R16, R20, R16
  MOV 0x3FFF3FFF3FFF3FFF, R22
  PADD.H R16, R21, R16
  PADD.H R16, R17, R18

  PCMPGT.H R22, R18
  PCSELT.W R22, R18, R18
  PCMPGT.H R18, R17
  PCSELT.W R17, R18, R18
  MOV 0xFFFF000000000000, R7
  PCVTH2UW R18, R19
  OR R19, R7, R5
  RGB5PCK64 R5, R2
  RTS
Worked a little better...
But, still, not very fast.
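The net effect of the two compare/select pairs can be sketched in Python as a lane-wise clamp of four Binary16 lanes to [0x3C00, 0x3FFF], i.e. [1.0, 1.999]. This only models the result: the real operand order of PCMPGT.H / PCSELT.W and how the predicate is held are not modeled, the function names are hypothetical, and the comparison is done on decoded float values (an assumption about what the ".H" compare does).

```python
import struct

LO = 0x3C00   # Binary16 1.0
HI = 0x3FFF   # largest Binary16 value below 2.0

def lanes(q):
    """Split a 64-bit value into four 16-bit lanes, lane 0 in the low bits."""
    return [(q >> (16 * i)) & 0xFFFF for i in range(4)]

def join(ls):
    """Inverse of lanes(): rebuild the 64-bit value from four lanes."""
    q = 0
    for i, v in enumerate(ls):
        q |= (v & 0xFFFF) << (16 * i)
    return q

def as_float(bits):
    """Decode a Binary16 bit pattern into a Python float."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]

def clamp_lanes(q, lo=LO, hi=HI):
    """Clamp each lane to [lo, hi] using two compare-then-select steps."""
    out = []
    for v in lanes(q):
        v = hi if as_float(v) > as_float(hi) else v   # cap at upper bound
        v = lo if as_float(v) < as_float(lo) else v   # raise to lower bound
        out.append(v)
    return join(out)
```

Unlike the branchy first attempt, every lane takes the same path here regardless of its value, which is the point of using select rather than a fixup branch. NaN lanes compare false both times and pass through unchanged in this sketch.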
Then I ended up doing a version that converts 4 pixels in parallel, with a special case for no scaling or biasing of the HDR values, and added alternate entry points for 32-bit and 24-bit pixels.
No longer as concise, but around 3x faster.
Not much bundling as a lot of these are "Lane 1 only" ops, or cases where bundling would not have any benefit.
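The tail of the 4-pixel routine merges four 16-bit results into one 64-bit return value. A minimal Python sketch of that kind of merge follows. Several things here are assumptions: the RGB555 layout (bit 15 clear, then R, G, B from high to low), the idea that the MOVLLW / MOVLLD steps combine the low halves of two registers, and the choice that the first source operand lands in the high half. The function names are made up for illustration.

```python
def pack_rgb555(r5, g5, b5):
    """Pack three 5-bit channels as 0RRRRRGGGGGBBBBB (assumed layout)."""
    return ((r5 & 31) << 10) | ((g5 & 31) << 5) | (b5 & 31)

def merge_low_words(hi, lo):
    """Low 16 bits of two values -> one 32-bit value (assumed MOVLLW-like)."""
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)

def merge_low_dwords(hi, lo):
    """Low 32 bits of two values -> one 64-bit value (assumed MOVLLD-like)."""
    return ((hi & 0xFFFFFFFF) << 32) | (lo & 0xFFFFFFFF)

def pack4(pixels):
    """Four 16-bit pixels -> one 64-bit value, pixel 0 in the low bits."""
    p0, p1, p2, p3 = pixels
    low = merge_low_words(p1, p0)     # pixels 0 and 1
    high = merge_low_words(p3, p2)    # pixels 2 and 3
    return merge_low_dwords(high, low)
```

With this ordering, storing the 64-bit result little-endian would put the four pixels in memory in their original order.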
<===
TKGDI_CopyPixelSpan_GetRGB32x4H:
  TST R5, R5
  BT TKGDI_CopyPixelSpan_GetRGB32x4H_NoScale
  MOVU.L (R4,  0), R16
  MOVU.L (R4,  4), R17
  MOVU.L (R4,  8), R18
  MOVU.L (R4, 12), R19
TKGDI_CopyPixelSpan_GetRGB32x4H_P1:
  PLDCM8UH R16, R16
  PLDCM8UH R17, R17
  PLDCM8UH R18, R18
  PLDCM8UH R19, R19
  PSHUF.W R5, 0x00, R21
  PSHUF.W R5, 0x55, R23
  MOV 0x3C003C003C003C00, R20
  MOV 0x3FFF3FFF3FFF3FFF, R22
  PMUL.H R16, R21, R16
  PMUL.H R17, R21, R17
  PMUL.H R18, R21, R18
  PMUL.H R19, R21, R19
  PADD.H R16, R23, R16
  PADD.H R17, R23, R17
  PADD.H R18, R23, R18
  PADD.H R19, R23, R19
  PADD.H R16, R20, R16
  PADD.H R17, R20, R17
  PADD.H R18, R20, R18
  PADD.H R19, R20, R19

  PCMPGT.H R22, R16
  PCSELT.W R22, R16, R16
  PCMPGT.H R22, R17
  PCSELT.W R22, R17, R17
  PCMPGT.H R22, R18
  PCSELT.W R22, R18, R18
  PCMPGT.H R22, R19
  PCSELT.W R22, R19, R19
  PCMPGT.H R16, R20
  PCSELT.W R20, R16, R16
  PCMPGT.H R17, R20
  PCSELT.W R20, R17, R17
  PCMPGT.H R18, R20
  PCSELT.W R20, R18, R18
  PCMPGT.H R19, R20
  PCSELT.W R20, R19, R19
  MOV 0xFFFF000000000000, R3
  PCVTH2UW R16, R4
  PCVTH2UW R17, R5
  OR R4, R3, R4 | PCVTH2UW R18, R6
  OR R5, R3, R5 | PCVTH2UW R19, R7
  OR R6, R3, R6 | RGB5PCK64 R4, R4
  OR R7, R3, R7 | RGB5PCK64 R5, R5
  RGB5PCK64 R6, R6
  RGB5PCK64 R7, R7
  MOVLLW R5, R4, R4
  MOVLLW R7, R6, R6
  MOVLLD R6, R4, R2
  RTS
  .balign 4
TKGDI_CopyPixelSpan_GetRGB32x4H_NoScale:
  MOVU.L (R4,  0), R16
  MOVU.L (R4,  4), R17
  MOVU.L (R4,  8), R18
  MOVU.L (R4, 12), R19
TKGDI_CopyPixelSpan_GetRGB32x4H_P1NS:
  MOV 0x3C003C003C003C00, R20
  MOV 0x3FFF3FFF3FFF3FFF, R22
  MOV 0xFFFF000000000000, R3
  PLDCM8UH R16, R16
  PLDCM8UH R17, R17
  PLDCM8UH R18, R18
  PLDCM8UH R19, R19
  PADD.H R16, R20, R16
  PADD.H R17, R20, R17
  PADD.H R18, R20, R18
  PADD.H R19, R20, R19

  PCMPGT.H R22, R16
  PCSELT.W R22, R16, R16
  PCMPGT.H R22, R17
  PCSELT.W R22, R17, R17
  PCMPGT.H R22, R18
  PCSELT.W R22, R18, R18
  PCMPGT.H R22, R19
  PCSELT.W R22, R19, R19
  PCVTH2UW R16, R4
  PCVTH2UW R17, R5
  OR R4, R3, R4 | PCVTH2UW R18, R6
  OR R5, R3, R5 | PCVTH2UW R19, R7
  OR R6, R3, R6 | RGB5PCK64 R4, R4
  OR R7, R3, R7 | RGB5PCK64 R5, R5
  RGB5PCK64 R6, R6
  RGB5PCK64 R7, R7
  MOVLLW R5, R4, R4
  MOVLLW R7, R6, R6
  MOVLLD R6, R4, R2
  RTS
TKGDI_CopyPixelSpan_GetRGB24x4H:
  ADD R4, 3, R21                   //(*1)
  ADD R4, 6, R22
  ADD R4, 9, R23
  MOVU.L (R4 ), R16
  MOVU.L (R21), R17
  MOVU.L (R22), R18
  MOVU.L (R23), R19
  TST R5, R5
  BT TKGDI_CopyPixelSpan_GetRGB32x4H_P1NS
  BRA TKGDI_CopyPixelSpan_GetRGB32x4H_P1
// *1: Needed because BJX2 lacks misaligned Load/Store displacements.
===>
This case makes it tempting to redefine "PCVTH2UW" to perform its own range clamping (at present, out of range values will wrap).
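Going back to the 24-bit entry point (note *1): it fetches four 3-byte pixels with overlapping 32-bit loads at byte offsets 0, 3, 6 and 9. A small Python sketch of that access pattern, assuming little-endian unsigned loads (an assumption about MOVU.L here) and using a made-up function name:

```python
import struct

def load4_rgb24(buf, base=0):
    """Fetch four 24-bit pixels with 32-bit little-endian loads at +0, +3, +6, +9."""
    words = []
    for off in (0, 3, 6, 9):
        (w,) = struct.unpack_from("<I", buf, base + off)
        words.append(w)
    return words
```

Each loaded word holds its pixel in the low 24 bits, with the first byte of the following pixel in the top 8 bits. That stray byte lands in the alpha lane, which the shared code path overwrites with all ones later, so presumably it does not matter. One consequence of the pattern is that the last load covers bytes 9..12, so 13 bytes need to be readable for 12 bytes of pixel data.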

>
- anton
