On 7/24/2024 3:37 PM, MitchAlsup1 wrote:
>a) one does not need a SUB or NEG instruction as one has:
>Just before Google Groups got spammed to death; I wrote::
--------------------------------------------------------
MitchAlsup
Nov 1, 2022, 5:53:02 PM
>
In a thread called "Arguments for a Sane Instruction Set Architecture"
Aug 7, 2017, 6:53:09 PM I wrote::
-----------------------------------------------------------------------
Looking back over my 40-odd year career in computer architecture,
I thought I would list out the typical errors I and others have
made with respect to architecting computers. This is going to be
a bit long, so bear with me:
>
When the Instruction Set architecture is Sane, there is support
for:
A) negating operands prior to an arithmetic calculation.
Not seen many doing this, and might not be workable in the general case.
Might make sense for FPU ops like FADD/FMUL.
>
Maybe 'ADD'. Though, "-(A+B)" is the only case that can't be expressed
with traditional ADD/SUB/RSUB.
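To make the ADD case concrete: with a negate bit on each source operand, the four sign combinations map onto ADD, SUB, RSUB, and the otherwise missing -(A+B). A minimal C sketch of that idea; add_with_negate and its flag arguments are made up for illustration and are not taken from either ISA:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of an ADD whose two source operands each carry a
   "negate" bit. Arithmetic is done on unsigned values so that negating
   the most negative integer wraps instead of being undefined. */
static int64_t add_with_negate(int64_t a, int64_t b, bool neg_a, bool neg_b)
{
    uint64_t x = neg_a ? (uint64_t)0 - (uint64_t)a : (uint64_t)a;
    uint64_t y = neg_b ? (uint64_t)0 - (uint64_t)b : (uint64_t)b;
    return (int64_t)(x + y);
}

/* neg_a neg_b   result      traditional instruction
     0     0     A + B       ADD
     0     1     A - B       SUB
     1     0     B - A       RSUB
     1     1     -(A + B)    (none)                    */
```

Only the last row needs something beyond ADD/SUB/RSUB, which is the point made above.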
>ST #3.14159265358927,[IP,R3<<3,#0x123456789abcd]
>B) providing constants from the instruction stream;
..where constant can be an immediate, a displacement, or both.
Probably true.
>
My ISA allows for Immediate or Displacement to be extended, but doesn't
currently allow (in the base ISA) any instructions that can encode both
an immediate and displacement.
>Treat "thread state" and its register file as a write back cache.
At present:
Baseline allows Imm33s/Disp33s via a 64-bit encoding;
There is optional support for Imm57s, which in XG2 is now extended to
Imm64.
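As a rough illustration of what a decoder does with such fields, here is a sketch in C of widening an N-bit signed immediate to 64 bits. It assumes the trailing "s" in Imm33s/Disp33s/Imm57s means "sign-extended"; the function name sign_extend is mine, not something from the ISA:

```c
#include <stdint.h>

/* Sign-extend the low `bits` bits of `raw` to 64 bits.
   Valid for 1 <= bits <= 64; higher bits of `raw` are ignored. */
static int64_t sign_extend(uint64_t raw, unsigned bits)
{
    if (bits >= 64)
        return (int64_t)raw;

    uint64_t sign = (uint64_t)1 << (bits - 1); /* the field's sign bit   */
    uint64_t mask = (sign << 1) - 1;           /* the low `bits` bits    */

    raw &= mask;
    /* Flipping the sign bit and subtracting it back propagates the sign
       through the upper bits (unsigned wraparound is well defined). */
    return (int64_t)((raw ^ sign) - sign);
}
```

For a 33-bit field, an all-ones pattern decodes to -1, while a pattern with only the low 32 bits set stays positive, so a 33-bit signed field covers both the signed and unsigned 32-bit ranges.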
>
There are special cases that allow immediate encodings for many
instructions that would otherwise lack an immediate encoding.
>
>C) exact floating point arithmetics that get the Inexact flag
..correctly unmolested.
Dunno. I suspect the whole global FPU status/control register thing
should probably be rethought somehow.
>
But, off-hand, don't know of a clearly better alternative.
>
>D) exception and interrupt control transfer should take no more
..than 1 cache line read followed by 4 cache line reads to the
..same page in DRAM/L3/L2 that are dependent on the first cache
..line read. Control transfer back to the suspended thread should
..be no longer than the control transfer to the exception handler.
Likely expensive...
>Under My 66000 a low end implementation can choose the write back cache
>
Granted, "glorified branch with some twiddling" is probably a little too
far in the other direction. Interrupt and syscall overhead is fairly
high when the handler needs to manually save and restore all the
registers each time.
>
>
A fast, but more expensive, option would be to have multiple copies of
the register file which is then bank-switched on an interrupt.
>Just memory map everything into MMI/O space where you have access to
One possibility here could be, rather than being hard-wired to specific
modes, there are 4 assignable register banks controlled by 2 status
register bits.
>
Then, say:
0: User Task 1
1: User Task 2
2: Reserved for Kernel / Syscall Task;
3: Reserved for interrupts.
>
Possibly along with instructions to move between the banked registers
and the currently active register file.
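A small software model of this banking idea, as a sketch only. The register count, the position of the two bank-select bits in the status register, and all names here (cpu_t, enter_interrupt, mov_from_bank, and so on) are assumptions for illustration, not part of the actual ISA:

```c
#include <stdint.h>

#define NUM_BANKS     4
#define NUM_GPRS      64   /* assumed register count */
#define SR_BANK_SHIFT 0    /* assumed position of the 2 bank-select bits */
#define SR_BANK_MASK  0x3u

enum { BANK_USER1 = 0, BANK_USER2 = 1, BANK_KERNEL = 2, BANK_IRQ = 3 };

typedef struct {
    uint64_t gpr[NUM_BANKS][NUM_GPRS]; /* one copy of the GPRs per bank */
    uint32_t sr;                       /* status register */
} cpu_t;

static unsigned active_bank(const cpu_t *c)
{
    return (c->sr >> SR_BANK_SHIFT) & SR_BANK_MASK;
}

/* Every normal register access goes through the active bank. */
static uint64_t *reg(cpu_t *c, unsigned n)
{
    return &c->gpr[active_bank(c)][n % NUM_GPRS];
}

/* Interrupt entry: flip the bank bits instead of saving registers.
   Returns the old status register so it can be restored on exit. */
static uint32_t enter_interrupt(cpu_t *c)
{
    uint32_t saved = c->sr;
    c->sr = (c->sr & ~(SR_BANK_MASK << SR_BANK_SHIFT))
          | ((uint32_t)BANK_IRQ << SR_BANK_SHIFT);
    return saved;
}

static void leave_interrupt(cpu_t *c, uint32_t saved)
{
    c->sr = saved;
}

/* The "move between banked and active registers" instruction. */
static void mov_from_bank(cpu_t *c, unsigned bank, unsigned src, unsigned dst)
{
    *reg(c, dst) = c->gpr[bank % NUM_BANKS][src % NUM_GPRS];
}
```

In this model the interrupt handler can clobber any register without disturbing the interrupted task, and it still has a way to read the task's registers (syscall arguments, say) through mov_from_bank.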
>Just MMI/O
>
Though, likely cost would be that it would require putting the GPR
register file in Block-RAM and possibly needing to increase pipeline
length.
>For SYSCALL in particular, you want at least 6 of the caller's registers
In an OS, the syscall and interrupt bank would likely be assigned
statically, and the others could be assigned dynamically by the
scheduler (though, as-is, would likely increase task-switch overhead vs
the current mechanism).
>The Write Back Cache model works easier.
This situation could potentially be "better" if there were 8 dynamic
banks, with the scheduler potentially able to be clever and reuse banks
if they haven't been evicted and the same process is run again (but
could otherwise reassign them round-robin or similar).
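Roughly what the scheduler side of that might look like, as a sketch in C. The structure and function names are hypothetical, and the actual spill/reload of register contents is left out; the point is only the "reuse if still resident, else round-robin" policy:

```c
#define DYN_BANKS 8
#define NO_OWNER  (-1)

typedef struct {
    int      owner[DYN_BANKS]; /* process id resident in each bank */
    unsigned next_victim;      /* round-robin replacement pointer  */
} bank_alloc_t;

static void bank_alloc_init(bank_alloc_t *a)
{
    for (unsigned i = 0; i < DYN_BANKS; i++)
        a->owner[i] = NO_OWNER;
    a->next_victim = 0;
}

/* Pick a register bank for `pid`.
   *needs_reload is set to 0 if the process's registers are still
   resident (cheap switch), or 1 if a bank had to be reassigned and the
   registers must be reloaded from memory (after spilling the victim). */
static unsigned bank_acquire(bank_alloc_t *a, int pid, int *needs_reload)
{
    for (unsigned i = 0; i < DYN_BANKS; i++) {
        if (a->owner[i] == pid) {
            *needs_reload = 0;
            return i;
        }
    }

    unsigned b = a->next_victim;
    a->next_victim = (a->next_victim + 1) % DYN_BANKS;
    a->owner[b] = pid;
    *needs_reload = 1;
    return b;
}
```

With few runnable processes nearly every switch hits a resident bank; once more than 8 processes cycle through, it degrades to a save/restore on every switch, same as without banking.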
>My SVC overhead is about 10 cycles.
Though, can note that as-is, in my case, in some programs, system call
overhead is high enough that all this could be worth looking into (Say:
Quake 3 manages to spend nearly 3% of the clock-cycle budget in the
SYSCALL ISR; mostly saving/restoring registers).
>Policy remains in SW, the ability to manifest a SW choice fast is in HW.
>E) Exception control transfer can transfer control directly to a
..user privilege thread without taking an excursion through the
..Operating System.
? Putting the scheduler in hardware?...
>Signal handlers.
Could make sense for a microcontroller, but less so for a conventional
OS as pretty much the only things handling interrupts are likely to be
supervisor-mode drivers.
>It is simply a fully pipelined version of LL/SC
>F) upon arrival at an exception handler, no state needs to be saved,
..and the "cause" of the exception is immediately available to the
..Exception handler.
G) Atomicity over a multiplicity of instructions and over a
..multiplicity of memory locations--without losing the
..illusion of real atomicity.
Memory consistency is hard...
>The trap is likely more cycles than FSIN().
>H) Elementary Transcendental functions are first class citizens of
..the instruction set, and at least faithfully accurate and perform
..at the same speeds as SQRT and DIV.
.... Yeah...
>
In my case, they don't exist, and FDIV and FSQRT are basically boat
anchors.
>
>
Well, I guess it could be possible to support them in the ISA if they
were all boat anchors.
>
Say:
FSIN Rm, Rn
Raises a TRAPFPU exception, whereupon the exception handler decodes the
instruction and performs the FSIN operation.
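A sketch of what such a handler could look like in C. Everything specific here is assumed for illustration: the 32-bit encoding (opcode in the top byte, Rm and Rn in the low two nibbles), the opcode value, Rm as source and Rn as destination, and the trap frame layout. soft_sin is a plain Taylor series that is only adequate for small arguments; a real handler would need proper range reduction:

```c
#include <stdint.h>
#include <string.h>

#define OP_FSIN 0xF5u /* hypothetical opcode value */

typedef struct {
    uint64_t gpr[16]; /* saved registers; FP values kept as raw bits */
} trap_frame_t;

static double bits_to_double(uint64_t b)
{
    double d;
    memcpy(&d, &b, sizeof d);
    return d;
}

static uint64_t double_to_bits(double d)
{
    uint64_t b;
    memcpy(&b, &d, sizeof b);
    return b;
}

/* Placeholder sine: x - x^3/3! + x^5/5! - ... (small |x| only). */
static double soft_sin(double x)
{
    double term = x, sum = x;
    for (int k = 1; k <= 10; k++) {
        term *= -x * x / ((2.0 * k) * (2.0 * k + 1.0));
        sum += term;
    }
    return sum;
}

/* Returns 1 if the faulting instruction was emulated, 0 if it was not
   an FSIN and should be handled some other way. */
static int trapfpu_handler(trap_frame_t *tf, uint32_t insn)
{
    unsigned op = insn >> 24;
    unsigned rm = (insn >> 4) & 0xF; /* assumed source field      */
    unsigned rn = insn & 0xF;        /* assumed destination field */

    if (op != OP_FSIN)
        return 0;

    tf->gpr[rn] = double_to_bits(soft_sin(bits_to_double(tf->gpr[rm])));
    return 1;
}
```

The decode and register shuffling are trivial next to the trap entry/exit cost, which is why this would still be a boat anchor.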
>How does one walk a nested page table when HV does not want OS to see
>I) The "system programming model" is inherently:
..1) Virtual Machine
..2) Hypervisor + Supervisor
..3) multiprocessor, multithreaded
If the system-mode architecture is low-level enough, the difference
between normal OS functionality and emulation starts to break down.
>
Like, in both cases one has:
Software page table walking;
Needing to keep track of a virtual model of the TLB;
>TLB is an association of host.PTE with guest.virtual-address.
>For the record, My 66000 code is PIC, including GOT, method calls, and
>J) Simple applications can run with as little as 1 page of Memory
..Mapping overhead. An application like 'cat' can be run with
..a total allocated page count of 6: {MMU, Code, Data, BSS, Stack,
..and Register Files}
Hmm.
>
>
I guess one could make a case for a position-independent version of an
"a.out" like format, focused on low-footprint binaries.