On 4/21/2024 6:31 PM, MitchAlsup1 wrote:
BGB wrote:
On 4/21/2024 1:57 PM, MitchAlsup1 wrote:
BGB wrote:
>
One of the things that I notice with My 66000 is when you get all the constants you ever need at the calculation OpCodes, you end up with FEWER instructions that "go random places" such as instructions that
<well> paste constants together. This leaves you with a data dependent
string of calculations with occasional memory references. That is::
universal constants get rid of the easy to pipeline extra instructions
leaving the meat of the algorithm exposed.
>
Possibly true.
RISC-V tends to have a lot of extra instructions due to lack of big constants and lack of indexed addressing.
You forgot the "everyone and his brother" design of the ISA.
And, BJX2 has a lot of frivolous register-register MOV instructions.
I empower you to get rid of them....
OK, I more meant that the compiler is prone to emit lots of:
MOV Reg, Reg
Rather than, say, the ISA listing being full of redundant "MOV Reg, Reg" encodings...
But, getting rid of them has been an ongoing battle of compiler fiddling.
Many were popping up from odd corners, say:
Register allocation issues (mostly involving function call/return);
Type casts and promotions between equivalent representations (*1);
...
*1: Had eliminated some of these, by allowing temporaries to be coerced directly into different types in some cases. But, "for reasons" doesn't really work with some other types of variables. But, say, int->long, or casting between pointer types, etc, can be done without needing to do anything to the value in the register.
But, yes, performance and code density would be better with fewer frivolous register MOVs.
<snip>
If you design around the notion of a 3R1W register file, FMAC and INSERT
fall out of the encoding easily. Done right, one can switch it into a 4R
or 4W register file for ENTER and EXIT--lessening the overhead of call/ret.
>
>
Possibly.
>
It looks like some savings could be possible in terms of prologs and epilogs.
>
As-is, these are generally like:
MOV LR, R18
MOV GBR, R19
ADD -192, SP
MOV.X R18, (SP, 176) //save GBR and LR
MOV.X ... //save registers
>
Why not an instruction that saves LR and GBR without wasting instructions
to place them side by side prior to saving them ??
>
I have an optional MOV.C instruction, but would need to restructure the code for generating the prologs to make use of them in this case.
Say:
MOV.C GBR, (SP, 184)
MOV.C LR, (SP, 176)
Though, MOV.C is considered optional.
There is a "MOV.C Lite" option, which saves some cost by only allowing it for certain CR's (mostly LR and GBR), which also sort of overlaps with (and is needed by) RISC-V mode, because these registers are in GPR land for RV.
But, in any case, current compiler output shuffles them to R18 and R19 before saving them.
WEXMD 2 //specify that we want 3-wide execution here
>
//Reload GBR, *1
MOV.Q (GBR, 0), R18
MOV 0, R0 //special reloc here
MOV.Q (GBR, R0), R18
MOV R18, GBR
>
Correction:
>> MOV.Q (R18, R0), R18
It is gorp like that that led me to do it in HW with ENTER and EXIT.
Save registers to the stack, setup FP if desired, allocate stack on SP, and decide if EXIT also does RET or just reloads the file. This would require 2 free registers if done in pure SW, along with several MOVs...
>
Possibly.
The partial reason it loads into R0 and uses R0 as an index, was that I defined this mechanism before jumbo prefixes existed, and hadn't updated it to allow for jumbo prefixes.
No time like the present...
OK. Made this change.
Only a minor change to my compiler and PE loader.
Well, and if I used a direct displacement for GBR (which, along with PC, is always BYTE Scale), this would have created a hard limit of 64 DLL's per process-space (I defined it as Disp24, which allows a more reasonable hard upper limit of 2M DLLs per process-space).
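As a rough check of the two limits above, here is a minimal sketch of the arithmetic. It assumes the direct displacement is a 9-bit unsigned, byte-scaled field and that each table entry is one 8-byte pointer; neither width is stated above, they are simply the values that reproduce the 64 figure. The function name is made up for illustration.

```python
# Sketch: how many 8-byte table entries a byte-scaled displacement can reach.
# ASSUMPTION: 9-bit unsigned direct displacement, 8-byte (64-bit) table entries.

POINTER_BYTES = 8  # one 64-bit GBR pointer per DLL (assumed)


def max_table_entries(disp_bits: int, entry_bytes: int = POINTER_BYTES) -> int:
    """Entries addressable by an unsigned, byte-scaled displacement."""
    return (1 << disp_bits) // entry_bytes


if __name__ == "__main__":
    for bits in (9, 24):
        print(f"Disp{bits}: {max_table_entries(bits)} entries")
```

Under those assumptions, the byte scaling is what costs the factor of 8: a displacement scaled by the access size would reach 8x as many entries for the same field width.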
In my case, restricting myself to 32-bit IP relative addressing, GOT can
be anywhere within ±2GB of the accessing instruction and can be as big as one desires.
In this case:
GBR points to the start of ".data" for a given PE image;
This starts with a pointer to a table of GBR pointers for every DLL in the process;
Each DLL is assigned an index into this table, fixed up at load time;
The magic ritual, when performed, will get GBR pointing at the ".data"/".bss" sections for that particular DLL.
But, say, one loads the EXE and DLLs.
One creates a program instance by allocating memory for each of the data/bss sections, copying the data section from the base image, and putting it in the table. Then jumping to the entry point with the EXE's section in GBR.
One can fire up a new instance by allocating a new set of data areas, jumping to the entry point as before. This instance does not need to know or care that the prior instance exists, even if both exist in the same address space, and have all their code at the same addresses (since the ".text" sections are shared between all instances).
Normal PC-relative GOT's can't do this. You would either need multiple address spaces, or multiple loaded copies of each image.
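A toy model of the scheme described above, purely as a sketch: every name here (Image, DataArea, create_instance, reload_gbr) is hypothetical, and Python objects stand in for memory. The only points it tries to capture are that each instance gets its own table of data areas, that every data area leads back to that table, and that the image index is the same in every instance.

```python
from dataclasses import dataclass


@dataclass
class Image:
    """A loaded PE image; its code would be shared by all instances."""
    name: str
    index: int            # slot in the per-instance table, fixed up at load time
    data_template: bytes  # initial ".data" contents from the base image
    bss_size: int


class DataArea:
    """One image's ".data"/".bss" for one program instance (what GBR points at)."""

    def __init__(self, table, template, bss_size):
        self.table = table  # models the table pointer at the start of ".data"
        self.mem = bytearray(template) + bytearray(bss_size)


def create_instance(images):
    """Allocate a fresh set of data areas and return the per-instance table."""
    table = [None] * len(images)
    for img in images:
        table[img.index] = DataArea(table, img.data_template, img.bss_size)
    return table


def reload_gbr(current_gbr, img):
    """The reload ritual: follow the table pointer, then index by image."""
    return current_gbr.table[img.index]


if __name__ == "__main__":
    exe = Image("app.exe", 0, b"\x01\x02", 4)
    dll = Image("lib.dll", 1, b"\x10", 2)
    a = create_instance([exe, dll])
    b = create_instance([exe, dll])
    gbr = reload_gbr(a[exe.index], dll)  # instance A calls into the DLL
    gbr.mem[0] = 0xFF                    # the DLL writes one of its globals
    print(a[dll.index].mem[0], b[dll.index].mem[0])
```

The reload never consults the program counter, only the current GBR, which is why two instances can run the same code at the same addresses without seeing each other's globals.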
Granted, nowhere near even the limit of 64 as of yet. But, I had noted that Windows programs would often easily exceed this limit, with even a fairly simple program pulling in a fairly large number of random DLLs, so in any case, a larger limit was needed.
Due to the way linkages work in My 66000, each DLL gets its own GOT.
So there is essentially no bounds on how many can be present/in-use.
A LD of a GOT[entry] gets a pointer to the external variable.
A CALX of GOT[entry] is a call through the GOT table using std ABI.
{{There is no PLT}}
OK.
Had done it a little different:
Imported function gets a stub, generally like:
Foo:
MOV.Q (PC, 4), R1
JMP R1
_imp_Foo: .QWORD 0
Import table (or IAT / Import Address Table) points at _imp_Foo and fixes it up to point at the imported function.
Had defined an alternate version:
Foo:
_imp_Foo:
BRA Abs48
The loader would see and special-case the BRA Abs48 instruction.
But, this latter form ran into a problem:
Things will violently explode if the EXE and DLL (or one DLL and another) are not in the same ISA mode (say, Baseline vs XG2).
Which means, I am back to the less efficient option of needing to load then branch.
Granted:
MOV Imm64, R1
JMP R1
Could also work, and saves a few clock cycles.
Doesn't currently extend to global variables, but I don't really feel this is a huge loss. Might fix eventually.
Generally, loader is hard-coded to assume import by name, as I didn't feel it worth the bother to try to deal with importing by 16-bit ordinal number.
One potential optimization here is that the main EXE will always be 0 in the process, so this sequence could be reduced to, potentially:
MOV.Q (GBR, 0), R18
MOV.C (R18, 0), GBR
Early on, I did not have the constraint that main EXE was always 0, and had initially assumed it would be treated equivalently to a DLL.
//Generate Stack Canary, *2
MOV 0x5149, R18 //magic number (randomly generated)
VSKG R18, R18 //Magic (combines input with SP and magic numbers)
MOV.Q R18, (SP, 144)
>
...
function-specific stuff
...
>
MOV 0x5149, R18
MOV.Q (SP, 144), R19
VSKC R18, R19 //Validate canary
...
>
>
*1: This part ties into the ABI, and mostly exists so that each PE image can get GBR reloaded back to its own ".data"/".bss" sections (with
>
Universal displacements make GBR unnecessary as a memory reference can
be accompanied with a 16-bit, 32-bit, or 64-bit displacement. Yes, you can read GOT[#i] directly without a pointer to it.
>
If I were doing a more conventional ABI, I would likely use (PC, Disp33s) for accessing global variables.
Even those 128GB away ??
Nothing is going to have a ".data" or ".bss" section this big...
Though, realistically, the PE/COFF format has a hard-limit of around 4GB due to 32-bit RVAs.
Note that the section headers and data-directories are still the same basic layout as in PE32+.
Well, never mind the LZ4 compression and removal of the MZ-EXE stub (I had no real need for an MZ EXE stub). Though, once the LZ4 decompression is done, the format is basically just PE32+ with the MZ stub removed (and a different format for the contents of the ".rsrc" section, *).
*: The original ".rsrc" section was absurd, so I essentially replaced it with a modified version of the WAD2 format operating in RVA space.
Problem is:
What if one wants multiple logical instances of a given PE image in a single address space?
Not a problem when each PE has a different set of mapping tables (at least
the entries pointing at GOTs[*]).
Could be.
Wasn't using GOTs in this case.
Had instead made an unorthodox reinterpretation of the meaning of the "Global Pointer" entry in the Data Directory.
In the official version of PE/COFF, it is unused and must-be-zero IIRC.
In my version, it mostly spans the start of ".data" to the end of ".bss", and gives the region of memory that GBR is intended to point at.
PC REL breaks in this case, unless you load N copies of each PE image, which is a waste of memory (well, or use COW mappings, mandating the use of an MMU).
ELF FDPIC had used a different strategy, but then effectively turned each function call into something like (in SH):
MOV R14, R2 //R14=GOT
MOV disp, R0 //offset into GOT
ADD R0, R2 //adjust by offset
//R2=function pointer
MOV.L (R2, 0), R1 //function address
MOV.L (R2, 4), R3 //GOT
JSR R1
Which I do with::
CALX [IP,R0,#GOT+index<<3-.]
OK.
In the callee:
... save registers ...
MOV R3, R14 //put GOT into a callee-save register
...
In the BJX2 ABI, had rolled this part into the callee, reasoning that handling it in the callee (per-function) was less overhead than handling it in the caller (per function call).
Though, on the RISC-V side, it has the relative advantage of compiling for absolute addressing, albeit still loses in terms of performance.
Compiling and linking to absolute addresses works "really well" when one needs to place different sections in different memory every time the
application/kernel runs due to malicious code trying to steal everything.
ASLR.....
Errm.
The RV64G compiler output tends to be fixed address by default (at least with "riscv64-unknown-elf-gcc"). Can't be easily relocated.
Though, this is theoretically the least overhead scenario for GCC.
I don't imagine an FDPIC version of RISC-V would win here, but this is only assuming there exists some way to get GCC to output FDPIC binaries (most I could find, was people debating whether to add FDPIC support for RISC-V).
PIC or PIE would also sort of work, but these still don't really allow for multiple program instances in a single address space.
Once you share the code and some of the data, the overhead of using different
mappings for special stuff {GOT, local thread data,...} is
multiple program instances in a single address space). But, does mean that pretty much every non-leaf function ends up needing to go through this ritual.
>
Universal constant solves the underlying issue.
>
I am not so sure that they could solve the "map multiple instances of the same binary into a single address space" issue, which is sort of the whole thing for why GBR is being used.
Otherwise, I would have been using PC-REL...
*2: This applies to pretty much any function that has local arrays or similar, and serves to protect the register save area. If the magic number can't regenerate a matching canary at the end of the function, then a fault is generated.
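To make the canary flow concrete, a toy model follows. The mixing function is invented for illustration; the actual behaviour of VSKG and VSKC is not specified here beyond "combines input with SP and magic numbers", so vskg, vskc, the key argument, and the dict-based frame are all assumptions.

```python
MASK64 = (1 << 64) - 1


class CanaryFault(Exception):
    """Raised when the stored canary no longer matches."""


def vskg(magic, sp, key):
    """Hypothetical stand-in for VSKG: mix the magic number with SP and a secret."""
    x = (magic ^ sp ^ key) & MASK64
    x = (x * 0x9E3779B97F4A7C15) & MASK64  # arbitrary odd multiplier (assumed)
    return x ^ (x >> 29)


def vskc(magic, stored, sp, key):
    """Hypothetical stand-in for VSKC: True if the stored canary still matches."""
    return vskg(magic, sp, key) == stored


def call_with_canary(body, frame, sp, key, magic=0x5149, slot=144):
    """Model of prolog / body / epilog; frame maps SP offsets to stored values."""
    frame[slot] = vskg(magic, sp, key)          # prolog: generate and store
    body(frame)                                 # function-specific stuff
    if not vskc(magic, frame[slot], sp, key):   # epilog: regenerate and compare
        raise CanaryFault(f"canary at (SP, {slot}) was overwritten")


if __name__ == "__main__":
    def well_behaved(frame):
        frame[0] = 123        # stays inside its own local

    def overflowing(frame):
        frame[144] ^= 1       # runs past a local array into the canary slot

    call_with_canary(well_behaved, {}, sp=0x7FFF0000, key=0x1234)
    try:
        call_with_canary(overflowing, {}, sp=0x7FFF0000, key=0x1234)
    except CanaryFault as fault:
        print("fault:", fault)
```

Because the canary sits between the local arrays and the register save area, a linear overflow has to clobber it before it can reach the saved registers.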
>
My 66000 can place the callee save registers in a place where user cannot
access them with LDs or modify them with STs. So malicious code cannot
damage the contract between ABI and core.
>
Possibly. I am using a conventional linear stack.
Downside: There is a need either for bounds checking or canaries. Canaries are the cheaper option in this case.
The cost of some of this starts to add up.
>
>
In isolation, not much, but if all this happens, say, 500 or 1000 times or more in a program, this can add up.
>
Was thinking about that last night. H&P "book" statistics say that call/ret
represents 2% of instructions executed. But if you add up the prologue and
epilogue instructions you find 8% of instructions are related to calling and returning--taking the problem from (at 2%) ignorable to (at 8%) a big
ticket item demanding something be done.
>
8% represents saving/restoring only 3 registers via the stack and associated SP
arithmetic. So, it can easily go higher.
>
I guess it could make sense to add a compiler stat for this...
The save/restore can get folded off, but generally only done for functions with a larger number of registers being saved/restored (and does not cover secondary things like GBR reload or stack canary stuff, which appears to possibly be a significant chunk of space).
Goes and adds a stat for averages:
Prolog: 8% (avg= 24 bytes)
Epilog: 4% (avg= 12 bytes)
Body : 88% (avg=260 bytes)
With 959 functions counted (excluding empty functions/prototypes).
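A quick sanity check that the percentages and the average byte counts above agree with each other. This is only arithmetic on the three averages quoted; the function name is made up.

```python
def size_shares(avg_bytes):
    """Return each part's share of the average function size, in percent."""
    total = sum(avg_bytes.values())
    return {part: 100.0 * size / total for part, size in avg_bytes.items()}


if __name__ == "__main__":
    averages = {"prolog": 24, "epilog": 12, "body": 260}  # bytes, from the stat above
    for part, pct in size_shares(averages).items():
        print(f"{part:6s}: {pct:5.1f}%")
```

Note that these are shares of static code size, so they are not directly comparable to the dynamic 8%-of-instructions-executed figure discussed earlier.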
....