Newsportal USENET - Arguments for a sane ISA 6-years later

Just before Google Groups got spammed to death; I wrote::
--------------------------------------------------------
MitchAlsup
Nov 1, 2022, 5:53:02 PM
In a thread called "Arguments for a Sane Instruction Set Architecture"
Aug 7, 2017, 6:53:09 PM I wrote::
-----------------------------------------------------------------------
Looking back over my 40-odd year career in computer architecture,
I thought I would list out the typical errors I and others have
made with respect to architecting computers. This is going to be
a bit long, so bear with me:
When the Instruction Set architecture is Sane, there is support
for:
A) negating operands prior to an arithmetic calculation.
B) providing constants from the instruction stream;
..where constant can be an immediate a displacement, or both.
C) exact floating point arithmetics that get the Inexact flag
..correctly unmolested.
D) exception and interrupt control transfer should take no more
..than 1 cache line read followed by 4 cache line reads to the
..same page in DRAM/L3/L2 that are dependent on the first cache
..line read. Control transfer back to the suspended thread should
..be no longer than the control transfer to the exception handler.
E) Exception control transfer can transfer control directly to a
..user privilege thread without taking an excursion through the
..Operating System.
F) upon arrival at an exception handler, no state needs to be saved,
..and the "cause" of the exception is immediately available to the
..Exception handler.
G) Atomicity over a multiplicity of instructions and over a
..multiplicity of memory locations--without losing the
..illusion of real atomicity.
H) Elementary Transcendental function are first class citizens of
..the instruction set, and at least faithfully accurate and perform
..at the same speeds as SQRT and DIV.
I) The "system programming model" is inherently:
..1) Virtual Machine
..2) Hypervisor + Supervisor
..3) multiprocessor, multithreaded
J) Simple applications can run with as little as 1 page of Memory
..Mapping overhead. An application like 'cat' can be run with
..an total allocated page count of 6: {MMU, Code, Data, BSS, Stack,
..and Register Files}
--------------------------------------------------------------------
<
I though it might be fun to have a review of what came out of this::
<
At the time of that writing My 66000 ISA was still gestating in my
head--I was pretty much following the Mc 88000 Architecture in scope
and in format.
<
So; point by point::
<
A) negating operands prior to an arithmetic calculation.
1-operand instructions have sign control over result and of operand
2-operand instructions have sign control over both operands
3-operand instructions have sign control over two operands
So: check

>

B) providing constants from the instruction stream;
1-operand instructions have one <optional> immediate
2-operand instructions have one register and one <optional> immediate
3-operand instructions have two registers and one <optional> immediate
Loads have base register, index register and <optional> displacement
Stores have the same addressing, but the value being stored can be
....either from a register or from an immediate.
Many immediates have auto-expanding characteristics::
one can FADD Rd,Rs1,#3 to add 3.0D0 using a single 1-word
instruction. 32-bit immediates for (double) FP calculations are auto-
expanded to 64-bits in operand delivery.
Similarly, integer instructions have ±5-bit immediates, signed 16-bit
immediates, 32-bit immediates and 64-bit immediates.
Memory references have 16-bit, 32-bit, and 64-bit displacements.
When Rbase = R0 IP is inserted for easy access to data relative to the
code stream.
So, big Check
<
C) exact floating point arithmetics that get the Inexact flag
..correctly unmolested.
While CARRY provides access to these features (and the inexact bit
....gets set correctly; it my current assessment that DBLE will be
....greater use and utility than the exact FP arithmetics.
So, little check
D) exception and interrupt control transfer should take no more
..than 1 cache line read followed by 4 cache line reads to the
..same page in DRAM/L3/L2 that are dependent on the first cache
..line read.
While the above is TRUE it is different than expected. Yes, a context
switch still takes 5-cache line reads, and context switch can transpire
from any thread under any GuestOS to any other thread under any other
GuestOS, all of this is "perpetrated" by a "fixed function unit" far
from the cores of the chip.
This fixed function unit combines the thread being scheduled, the
customer thread asking for service and the <appropriate> HyperVisor
data "assembled" into a single message that effects a context switch.
<
E) Exception control transfer can transfer control directly to a
..user privilege thread without taking an excursion through the
..Operating System.
This remains illusive--while it is technically possible to setup
"state" such that the above happens; it requires each such thread
run under its unique GuestOS. However, one can configure a rather
normal GuestOS so that the exception dispatcher transfers control
to a user level exception handler in 15-ish instructions.
So, medium check
-------------------------------------------------------------------
Update: My 66000 new interrupt architecture can now allow interrupts
or exceptions to be directed at Application privilege level. And this
is how Linux would deliver signal() to Applications.
In addition, VMexit()s need no diddling with interrupt or PCI control
registers.
So, memory is virtualized, devices are virtualized, device DMA is
virtualized, device interrupts are virtualized, and the relation-
ship between cores and interrupt is virtualized; not needing any
diddling when one traverses up and down the privilege levels. The
only overhead is <rather static> mapping tables.
-------------------------------------------------------------------
<
F) upon arrival at an exception handler, no state needs to be saved,
..and the "cause" of the exception is immediately available to the
..Exception handler.
This above TRUE and also comes with the property that multiple
exceptions can be logged onto a handler without Interrupt or
Exception disablement.
No state needs to be saved: Check
No state needs to be loaded: Check
Pertinence arrives with control: Check
Control arrives on affinitizxed core: Check
--------
unCheck
--------
Control arrives at proper priority: Check
Control arrives with proper "privilege": Check
Hard Real Time supported: Maybe
---------------------------
Closer to check than maybe.
---------------------------
Moderate Real Time Supported: Check
No extraneous excursions though OS: Check.
Overall: Big check
<
G) Atomicity over a multiplicity of instructions and over a
..multiplicity of memory locations--without losing the
..illusion of real atomicity.
Up to 8 cache lines participate in an ATOMIC event.
Multiple locations in each line may have state altered.
There is direct access to whether interference has transpired.
Software can use interference to drive down future interference.
Hardware can transfer control is ATOMICITY has been violated.
Essentially ANY atomic-primitive studied in academia or provided
by industry can be synthesized.
So, medium check
<
H) Elementary Transcendental function are first class citizens of
..the instruction set, and at least faithfully accurate and perform
..at the same speeds as SQRT and DIV.
Transcendental functions operate at about the latency of FDIV
ln2, ln2P1, exp2, exp2M1 14 cycles
ln, ln10, exp, exp10 <and cousins> 18 cycles
sin, sinpi, cos, cospi 19 cycles {including Payne and Hanek argument
reduction}
tan, atan 19 or 38 cycles
power 35 cycles
23 Transcendental instructions are available in (float) and (double)
forms.
(float will be around 9 cycles)
So, reasonable check.
<
I) The "system programming model" is inherently:
..1) Virtual Machine
..2) Hypervisor + Supervisor
..3) multiprocessor, multithreaded
It is not only the above, but even moderately hard real time is built
in.
Interrupts are directed at threads not cores
------------------------------------------------------------------------
Turns out that Linux thinks interrupts are directed at cores and there
is essentially nothing anyone can do about that. My 66000 new system
model is much more Linus friendly at the cost of hard real time.
------------------------------------------------------------------------
Deferred Procedure Calls are single instruction events
--------
Check.
-------
Most handler->handler control transfers do not need an excursion though
the OS scheduler.
-----------------------------------------------------------------------
ISR schedules a softIRQ and then when it SVRs the softIRQ gains control
before what originally got interrupted, transitively, without having SW
traverse schedule queues.
-----------------------------------------------------------------------
Basically, if you have less than 1024 processes in a Linux system, the
lower level scheduler consumes no cycles on a second by second basis.
Context switch between threads under different hypervisors is the same
10-cycles as context switch between threads under the same GuestOS (10).
Conventional machines might take 1,000 cycles for a within GuestOS
context switch and 10,000 cycles on a between Guest OS context switch;
given 1,000 context switches per second, this accounts for a fraction
of a percent speed up.
So, moderate-big check
<
J) Simple applications can run with as little as 1 page of Memory
..Mapping overhead.
Achievable even when different areas {.text, .data, .bss, .stack, ...}
are separated by GB or even TB.
So, check
-------------------------------------------------------------------
That is all.

Date	Sujet	#	Auteur
24 Jul 24	Arguments for a sane ISA 6-years later	63	MitchAlsup1
25 Jul 24	Re: Arguments for a sane ISA 6-years later	62	BGB
25 Jul 24	Re: Arguments for a sane ISA 6-years later	57	Chris M. Thomasson
26 Jul 24	Re: Arguments for a sane ISA 6-years later	56	Anton Ertl
26 Jul 24	Re: Arguments for a sane ISA 6-years later	20	BGB
29 Jul 24	Re: Arguments for a sane ISA 6-years later	19	Anton Ertl
29 Jul 24	Intel overvoltage (was: Arguments for a sane ISA 6-years later)	2	Thomas Koenig
29 Jul 24	Re: Intel overvoltage	1	BGB
29 Jul 24	Re: Arguments for a sane ISA 6-years later	16	BGB
30 Jul 24	Re: Arguments for a sane ISA 6-years later	15	Anton Ertl
30 Jul 24	Re: Arguments for a sane ISA 6-years later	14	BGB
30 Jul 24	Re: Arguments for a sane ISA 6-years later	2	Chris M. Thomasson
30 Jul 24	Re: Arguments for a sane ISA 6-years later	1	BGB
1 Aug 24	Re: Arguments for a sane ISA 6-years later	11	Anton Ertl
1 Aug 24	Re: Arguments for a sane ISA 6-years later	1	Michael S
1 Aug 24	Re: Arguments for a sane ISA 6-years later	8	MitchAlsup1
1 Aug 24	Re: Arguments for a sane ISA 6-years later	1	Michael S
2 Aug 24	Re: Arguments for a sane ISA 6-years later	6	MitchAlsup1
2 Aug 24	Re: Arguments for a sane ISA 6-years later	1	Michael S
4 Aug 24	Re: Arguments for a sane ISA 6-years later	4	MitchAlsup1
5 Aug 24	Re: Arguments for a sane ISA 6-years later	3	Stephen Fuld
5 Aug 24	Re: Arguments for a sane ISA 6-years later	2	Stephen Fuld
5 Aug 24	Re: Arguments for a sane ISA 6-years later	1	MitchAlsup1
1 Aug 24	Re: Arguments for a sane ISA 6-years later	1	BGB
26 Jul 24	Re: Arguments for a sane ISA 6-years later	20	MitchAlsup1
27 Jul 24	Re: Arguments for a sane ISA 6-years later	1	BGB
29 Jul 24	Memory ordering (was: Arguments for a sane ISA 6-years later)	18	Anton Ertl
29 Jul 24	Re: Memory ordering	15	MitchAlsup1
29 Jul 24	Re: Memory ordering	6	Chris M. Thomasson
29 Jul 24	Re: Memory ordering	5	MitchAlsup1
30 Jul 24	Re: Memory ordering	4	Michael S
31 Jul 24	Re: Memory ordering	3	Chris M. Thomasson
31 Jul 24	Re: Memory ordering	2	Chris M. Thomasson
31 Jul 24	Re: Memory ordering	1	Chris M. Thomasson
30 Jul 24	Re: Memory ordering	8	Anton Ertl
30 Jul 24	Re: Memory ordering	2	Chris M. Thomasson
30 Jul 24	Re: Memory ordering	1	Chris M. Thomasson
31 Jul 24	Re: Memory ordering	5	MitchAlsup1
31 Jul 24	Re: Memory ordering	1	Chris M. Thomasson
1 Aug 24	Re: Memory ordering	3	Anton Ertl
1 Aug 24	Re: Memory ordering	2	MitchAlsup1
2 Aug 24	Re: Memory ordering	1	Anton Ertl
29 Jul 24	Re: Memory ordering	2	Chris M. Thomasson
30 Jul 24	Re: Memory ordering	1	Chris M. Thomasson
29 Jul 24	Re: Arguments for a sane ISA 6-years later	13	Chris M. Thomasson
29 Jul 24	Re: Arguments for a sane ISA 6-years later	9	BGB
29 Jul 24	Re: Arguments for a sane ISA 6-years later	8	Chris M. Thomasson
29 Jul 24	Re: Arguments for a sane ISA 6-years later	1	Chris M. Thomasson
29 Jul 24	Re: Arguments for a sane ISA 6-years later	2	BGB
29 Jul 24	Re: Arguments for a sane ISA 6-years later	1	Chris M. Thomasson
30 Jul 24	Re: Arguments for a sane ISA 6-years later	4	jseigh
30 Jul 24	Re: Arguments for a sane ISA 6-years later	3	Chris M. Thomasson
31 Jul 24	Re: Arguments for a sane ISA 6-years later	2	jseigh
31 Jul 24	Re: Arguments for a sane ISA 6-years later	1	Chris M. Thomasson
29 Jul 24	Memory ordering (was: Arguments for a sane ISA 6-years later)	1	Anton Ertl
29 Jul 24	Re: Arguments for a sane ISA 6-years later	2	MitchAlsup1
29 Jul 24	Re: Arguments for a sane ISA 6-years later	1	BGB
6 Aug 24	Re: Arguments for a sane ISA 6-years later	2	Chris M. Thomasson
6 Aug 24	Re: Arguments for a sane ISA 6-years later	1	Chris M. Thomasson
25 Jul 24	Re: Arguments for a sane ISA 6-years later	4	MitchAlsup1
26 Jul 24	Re: Arguments for a sane ISA 6-years later	1	BGB
28 Jul 24	Re: Arguments for a sane ISA 6-years later	2	Paul A. Clayton
28 Jul 24	Re: Arguments for a sane ISA 6-years later	1	MitchAlsup1