Subject: Re: Making Lemonade (Floating-point format changes)
From: cr88192 (at) *nospam* gmail.com (BGB)
Newsgroups: comp.arch
Date: 14 May 2024, 00:12:12
Organization: A noiseless patient Spider
Message-ID: <v1u6oi$3o53t$1@dont-email.me>
References: 1 2 3 4 5 6
User-Agent: Mozilla Thunderbird
On 5/13/2024 4:16 PM, MitchAlsup1 wrote:
> BGB wrote:
>
>> Emulation via traps is very slow, but typical for many ISA's is to just quietly turn the soft-float operations into runtime calls.
>
> I recall that MIPS could emulate a TLB table walk in something like
> 19 cycles. That is:: a few cycles to get there, a hash table access,
> a check, a TLB install, and a few cycles to get back.
>
> On an x86 this would be at least 200 cycles just getting there and back.
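
For reference, the refill sequence described in the quote (hash table access, check, TLB install) can be modelled in plain C roughly as below. This is only a sketch under assumptions: a real MIPS refill handler is a few hand-written instructions working on privileged registers, whereas here the TLB is just an array, and every name (`hash_table`, `tlb_refill`, `map_page`, the table sizes, the hash function) is made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12u   /* assumed 4K pages */
#define HASH_SLOTS 256u  /* assumed size, must be a power of two */
#define TLB_SLOTS  16u   /* assumed size of the modelled TLB */

typedef struct { uint32_t vpn; uint32_t pfn; bool valid; } pte_t;

static pte_t hash_table[HASH_SLOTS]; /* software-managed page table */
static pte_t tlb[TLB_SLOTS];         /* stand-in for the hardware TLB */
static unsigned tlb_next;            /* simple round-robin replacement */

/* Hypothetical hash: fold the upper VPN bits into the slot index. */
static unsigned hash_vpn(uint32_t vpn)
{
    return (vpn ^ (vpn >> 8)) & (HASH_SLOTS - 1u);
}

/* OS side: record a translation in the hash table. */
static void map_page(uint32_t vaddr, uint32_t pfn)
{
    uint32_t vpn = vaddr >> PAGE_SHIFT;
    pte_t *e = &hash_table[hash_vpn(vpn)];
    e->vpn = vpn;
    e->pfn = pfn;
    e->valid = true;
}

/* Fast path of the miss handler. Returns false when the slot does not
 * hold the faulting page, in which case a slower path would take over. */
static bool tlb_refill(uint32_t fault_vaddr)
{
    uint32_t vpn = fault_vaddr >> PAGE_SHIFT;
    const pte_t *e = &hash_table[hash_vpn(vpn)]; /* hash table access */
    if (!e->valid || e->vpn != vpn)              /* the check */
        return false;
    tlb[tlb_next] = *e;                          /* the TLB install */
    tlb_next = (tlb_next + 1u) % TLB_SLOTS;
    return true;
}

/* Model of the hardware lookup: translate if the page is in the TLB. */
static bool tlb_lookup(uint32_t vaddr, uint32_t *paddr)
{
    uint32_t vpn = vaddr >> PAGE_SHIFT;
    for (unsigned i = 0; i < TLB_SLOTS; i++) {
        if (tlb[i].valid && tlb[i].vpn == vpn) {
            *paddr = (tlb[i].pfn << PAGE_SHIFT)
                   | (vaddr & ((1u << PAGE_SHIFT) - 1u));
            return true;
        }
    }
    return false;
}
```

The fast path is one load, one compare, and one store, so the cost of getting into and out of the handler is what dominates the total.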
I guess there are different possibilities here...
Trap cost can be reduced, say, by having banked registers.
But, not so good with explicit save/restore and a large register file.
For example, I can note that an MSP430 at 16MHz can service a 32kHz timer (with a budget of 488 cycles per interrupt).
But, my BJX2 core (at 50MHz) would have a harder time here, even with a budget of around 1.5k cycles per interrupt (more registers to save/restore, and L1 misses can happen along the way)...
Then again, it is possible the per-interrupt overhead would go down slightly, since most likely the ISR stack will still be in the L1 cache between interrupts (and save/restore overhead should drop to ~ 100 cycles in the absence of L1 misses).
MSP430 had a slight advantage here (besides fewer registers) in that L1 misses are not a thing (so, memory access has constant latency).
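
The budget figures above are just clock rate divided by interrupt rate. A minimal sketch of the arithmetic, assuming "32kHz" means a 32768 Hz watch crystal (this is what makes the 488 figure work out); the function names are made up for illustration:

```c
#include <stdint.h>

/* Cycles available between two consecutive interrupts:
 * CPU clock divided by interrupt rate (integer division, rounds down). */
static uint32_t cycles_per_interrupt(uint32_t cpu_hz, uint32_t irq_hz)
{
    return cpu_hz / irq_hz;
}

/* Share of the per-interrupt budget (in whole percent, rounded down)
 * eaten by a fixed save/restore overhead. */
static uint32_t overhead_percent(uint32_t budget_cycles, uint32_t overhead_cycles)
{
    return (overhead_cycles * 100u) / budget_cycles;
}
```

With these, `cycles_per_interrupt(16000000, 32768)` gives 488 and `cycles_per_interrupt(50000000, 32768)` gives 1525. Plugging in the ~100 cycle save/restore estimate for the no-L1-miss case, `overhead_percent(1525, 100)` comes out at 6, so the fixed cost is tolerable so long as the ISR stack stays in cache.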
> So, to revisit your statement::
> Emulation is slow when trap overhead is large and not-slow when trap overhead
> is small.
Possible, but I would not expect trap overhead to be lower than runtime call overhead...
Also (in my case):
Debugging is rather annoying when bugs appear/disappear/move around at random or with the slightest perturbation...
But, given that the behavior is for the most part consistently buggy (and manifests in seemingly the same ways) in both the emulator and the Verilog implementation, this implies that the causal factors are in software.
I guess in this case, either I figure it out, or I will need to go back to cooperative scheduling again. Seemingly, using preemptive scheduling and virtual memory at the same time is particularly unstable (programs tend to crash on startup or soon after).
Also, I may need to rework how page-in/page-out is handled (and/or how IO is handled in general), since if a page swap needs to happen while IO is already in progress (such as a page-miss in the system-call process), the OS is at present dead in the water (one can't access the SDcard in the middle of a different access to the SDcard).
Though, this issue is reduced if all memory used for kernel-level operation is either physical or direct-mapped (well, and/or doing "TLB pokes" for each cluster before reading/writing it).
But, needing to use memory probes in some cases (before certain operations, such as inter-ISA branches, etc; because of bad things that can happen if a TLB miss or page-miss happens at that moment), is kinda stupid...
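
As a rough illustration of what such a memory probe amounts to: touch one byte in every page of the buffer before starting the operation, so that any TLB miss or page-miss is taken up front rather than in the middle. This is a sketch only; `probe_range` and `PAGE_SIZE` are hypothetical names, 4K pages are assumed, and it is not the actual kernel code.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u /* assumed page size */

/* Read one byte from every page overlapping [buf, buf+len), forcing any
 * TLB miss or page fault to happen now. Returns the pages touched. */
static size_t probe_range(const volatile uint8_t *buf, size_t len)
{
    if (len == 0)
        return 0;

    uintptr_t start = (uintptr_t)buf;
    uintptr_t mask  = ~(uintptr_t)(PAGE_SIZE - 1u);
    uintptr_t first = start & mask;             /* page holding first byte */
    uintptr_t last  = (start + len - 1) & mask; /* page holding last byte */
    size_t pages = 0;

    for (uintptr_t page = first; ; page += PAGE_SIZE) {
        /* Stay inside the buffer: the first page may begin before buf. */
        uintptr_t addr = (page < start) ? start : page;
        (void)*(const volatile uint8_t *)addr;  /* the probe itself */
        pages++;
        if (page == last)
            break;
    }
    return pages;
}
```

Note that this only helps if nothing can evict the page (or its TLB entry) between the probe and the actual use, which is not a given with preemption enabled.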
Doing a test:
Seems the bugs with preemption + virtual memory are independent of whether traditional nested page tables or hybrid B-Tree based page tables are used, ...