dxf <dxforth@gmail.com> writes:
>The catch with SSE is there's nothing like FCHS or FABS
>so depending on how one implements them, results vary across implementations.
You can see in Gforth how to implement FNEGATE and FABS with SSE2:
see fnegate
Code fnegate
0x000055e6a78a8274: add $0x8,%rbx
0x000055e6a78a8278: xorpd 0x24d8f(%rip),%xmm15 # 0x55e6a78cd010
0x000055e6a78a8281: mov %r15,%r9
0x000055e6a78a8284: mov (%rbx),%rax
0x000055e6a78a8287: jmp *%rax
end-code
ok
0x55e6a78cd010 16 dump
55E6A78CD010: 00 00 00 00 00 00 00 80 - 00 00 00 00 00 00 00 00
ok
see fabs
Code fabs
0x000055e6a78a84fe: add $0x8,%rbx
0x000055e6a78a8502: andpd 0x24b15(%rip),%xmm15 # 0x55e6a78cd020
0x000055e6a78a850b: mov %r15,%r9
0x000055e6a78a850e: mov (%rbx),%rax
0x000055e6a78a8511: jmp *%rax
end-code
ok
0x55e6a78cd020 16 dump
55E6A78CD020: FF FF FF FF FF FF FF 7F - 00 00 00 00 00 00 00 00
The actual implementation is the xorpd instruction for FNEGATE and the
andpd instruction for FABS. The memory locations contain masks: for
FNEGATE only the sign bit is set, for FABS everything but the sign bit
is set.
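The same mask trick can be sketched in portable C. This is only an
illustration, not Gforth's source: the function names are made up, and
it assumes a 64-bit IEEE 754 double, as on x86-64. The two constants
are the 8-byte masks from the dumps above, read as little-endian
values.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* The two 8-byte masks from the dumps, read as little-endian 64-bit values. */
#define SIGN_MASK UINT64_C(0x8000000000000000) /* only the sign bit set */
#define ABS_MASK  UINT64_C(0x7FFFFFFFFFFFFFFF) /* everything but the sign bit */

/* Reinterpret a double as its bit pattern and back (memcpy avoids aliasing issues). */
static uint64_t bits_of(double x) { uint64_t u; memcpy(&u, &x, sizeof u); return u; }
static double double_of(uint64_t u) { double x; memcpy(&x, &u, sizeof x); return x; }

/* Hypothetical names; they mirror what xorpd and andpd do to the low 64 bits. */
static double mask_fnegate(double x) { return double_of(bits_of(x) ^ SIGN_MASK); }
static double mask_fabs(double x)    { return double_of(bits_of(x) & ABS_MASK); }

/* Self-check: ordinary values, signed zeros, and infinity. */
static void check_masks(void) {
    assert(mask_fnegate(1.5) == -1.5);
    assert(mask_fabs(-2.25) == 2.25);
    assert(signbit(mask_fnegate(0.0)));   /* +0.0 becomes -0.0 */
    assert(!signbit(mask_fabs(-0.0)));    /* -0.0 becomes +0.0 */
    assert(mask_fabs(-INFINITY) == INFINITY);
}
```

Calling check_masks() from a main() exercises both operations. Because
only the sign bit is touched, signed zeros and infinities are handled
without any special cases.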
Sure, you can implement FNEGATE and FABS in more complicated ways, but
you can also implement them in more complicated ways if you use the
387 instruction set. Here's an example of more complicated
implementations:
see fnegate
FNEGATE
( 004C4010 4833C0 ) XOR RAX, RAX
( 004C4013 F34D0F7EC8 ) MOVQ XMM9, XMM8
( 004C4018 664C0F6EC0 ) MOVQ XMM8, RAX
( 004C401D F2450F5CC1 ) SUBSD XMM8, XMM9
( 004C4022 C3 ) RET/NEXT
( 19 bytes, 5 instructions )
ok
see fabs
FABS
( 004C40B0 E8FBEFFFFF ) CALL 004C30B0 FS@
( 004C40B5 4885DB ) TEST RBX, RBX
( 004C40B8 488B5D00 ) MOV RBX, [RBP]
( 004C40BC 488D6D08 ) LEA RBP, [RBP+08]
( 004C40C0 0F8D05000000 ) JNL/GE 004C40CB
( 004C40C6 E845FFFFFF ) CALL 004C4010 FNEGATE
( 004C40CB C3 ) RET/NEXT
( 28 bytes, 7 instructions )
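For comparison, here is a rough C rendering of what these two listings
compute. The FNEGATE listing loads 0 and subtracts, so it is 0.0 - x.
The FABS listing calls FNEGATE when a tested value is negative. The
function names are made up, and signbit() stands in for whatever FS@
and the TEST actually examine, which is an assumption.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical name: negation as 0.0 - x, like the XOR RAX,RAX ... SUBSD sequence. */
static double sub_fnegate(double x) { return 0.0 - x; }

/* Hypothetical name: "negate if negative", built on sub_fnegate.
   Using signbit() for the test is an assumption about what FS@ delivers. */
static double branch_fabs(double x) { return signbit(x) ? sub_fnegate(x) : x; }

/* Self-check, including the case where subtraction differs from a sign flip. */
static void check_sub(void) {
    assert(sub_fnegate(1.5) == -1.5);
    assert(branch_fabs(-2.25) == 2.25);
    assert(branch_fabs(2.25) == 2.25);
    /* In the default rounding mode 0.0 - 0.0 is +0.0, so +0.0 is not negated to -0.0. */
    assert(!signbit(sub_fnegate(0.0)));
    /* 0.0 - (-0.0) is also +0.0. */
    assert(!signbit(sub_fnegate(-0.0)));
}
```

The zero case is where the two approaches part ways: flipping the sign
bit turns +0.0 into -0.0, while 0.0 - x leaves it at +0.0. That is one
concrete way results can vary across implementations.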