On 2024-08-27 11:33 p.m., BGB wrote:

There are apparently some things FPGAs do very well:

On 8/27/2024 6:50 PM, MitchAlsup1 wrote:

A bit much to expect a low-cost FPGA to run anything very fast. Performance is not everything though. Industry maturing?

On Tue, 27 Aug 2024 22:39:02 +0000, BGB wrote:
> On 8/27/2024 2:59 PM, John Dallman wrote:
>> In article <vajo7i$2s028$1@dont-email.me>, tkoenig@netcologne.de (Thomas Koenig) wrote:
>>> Just read that some architects are leaving Intel and doing their own startup, apparently aiming to develop RISC-V cores of all things.
>> They're presumably intending to develop high-performance cores, since they have substantial experience in doing that for x86-64. The question is if demand for those will develop.
> Making RISC-V "not suck" in terms of performance will probably at least be easier than making x86-64 "not suck".

Yet, these people have decades of experience building complex things that made x86 (also) not suck. They should have the "drawing power" to get more people with similar experiences.
>
The drawback is that they are competing with "everyone else in RISC-V-land", and starting several years late.
Though, if anything, they probably have the experience to know how to make things like the fabled "opcode fusion" work without burning too many resources.
>
>>>> Android is apparently waiting for a new RISC-V instruction set extension; you can run various Linuxes, but I have not heard about anyone wanting to do so on a large scale.
My thoughts for "major missing features" are still:
Needs register-indexed load;
Needs an intermediate size constant load (such as 17-bit sign extended)
in a 32-bit op.
Full access to constants.
>
That would be better, but is unlikely within the existing encoding constraints.
>
But, say, if one burned one of the remaining unused "OP Rd, Rs, Imm12s" encodings as an Imm17s, well then...
>
There are a few holes in this space. For example, there are no ANDW/ORW/XORW ops with Imm12s, so these spots could be reclaimed and used for such a purpose, treating the Imm12 and Rs as a combined 17-bit field.
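As a rough illustration (a hypothetical reuse of one of those slots, not an actual RISC-V extension), a decoder could splice the rs1 field onto the 12-bit immediate of an I-type encoding to form one 17-bit signed immediate. The field positions below are the standard I-type ones; the 17-bit interpretation and the layout choice are the speculative part:

  #include <stdint.h>

  /* Hypothetical decode: treat imm[11:0] (insn[31:20]) and rs1 (insn[19:15])
     as one contiguous 17-bit signed immediate, keeping the sign bit at
     insn[31] as RISC-V immediates conventionally do. */
  static int32_t decode_imm17(uint32_t insn)
  {
      uint32_t imm12 = (insn >> 20) & 0xFFF;   /* insn[31:20] */
      uint32_t rs1   = (insn >> 15) & 0x1F;    /* insn[19:15] */
      uint32_t raw   = (imm12 << 5) | rs1;     /* 17 bits total */
      return (int32_t)(raw << 15) >> 15;       /* sign-extend from bit 16 */
  }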
>
>
But, arguably, LUI+ADD, or LUI+ADD+LUI+ADD+SLLI+ADD, may not matter as much if one can afford the pattern-matching logic to turn 2 (or 6) operations into a fused operation...
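To make "fused operation" concrete (a sketch of the kind of pattern matching meant here, not any particular core's front end; the helper name is made up), a decoder could recognize a LUI followed by an ADDI on the same register and treat the pair as a single constant load:

  #include <stdbool.h>
  #include <stdint.h>

  /* Sketch: detect "LUI rd, imm20 ; ADDI rd, rd, imm12" and compute the
     fused constant.  Opcodes and field positions are the standard RV32I/RV64I
     ones (LUI = 0x37, OP-IMM = 0x13, funct3 of ADDI = 0). */
  static bool try_fuse_lui_addi(uint32_t i0, uint32_t i1, int64_t *out)
  {
      if ((i0 & 0x7F) != 0x37) return false;            /* first insn not LUI */
      if ((i1 & 0x7F) != 0x13) return false;            /* second insn not OP-IMM */
      if (((i1 >> 12) & 0x7) != 0) return false;        /* not ADDI */

      uint32_t rd0 = (i0 >> 7) & 0x1F;
      uint32_t rd1 = (i1 >> 7) & 0x1F;
      uint32_t rs1 = (i1 >> 15) & 0x1F;
      if (rd0 != rd1 || rd1 != rs1) return false;       /* must all be the same register */

      int32_t hi = (int32_t)(i0 & 0xFFFFF000);          /* LUI: imm20 << 12, sign-extended */
      int32_t lo = (int32_t)i1 >> 20;                   /* ADDI: sign-extended imm12 */
      *out = (int64_t)hi + (int64_t)lo;                 /* value the fused op would load */
      return true;
  }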
>
>> Where, there is a sizeable chunk of constants between 12 and 17 bits, but not quite as many between 17 and 32 (and 32-64 bits is comparably infrequent).
Except in "math codes".
>
But 64-bit memory reference displacements mean one does not even have to bother having a strategy for what to do when you need a single FORTRAN common block to be 74 GB in size in order to run 5-decade-old FEM codes.
>
I don't expect RISC-V to be getting a 64-bit FPU immediate anytime soon.
>
>> I could also make a case for an instruction to load a Binary16 value and convert to Binary32 or Binary64 in an FPR, but this is arguably a bit niche (but, would still beat out using a memory load).
Most of these are covered by something like::
>
CVTSD Rd,#1 // 32-bit instruction
>
In my case, I have:
FLDCH Imm16f, Rn  // also a 32-bit instruction
which can cover a significant majority of typical FP constants.
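To show what such an instruction amounts to (a software sketch of the Binary16-to-Binary64 widening that an FLDCH-style op implies, not the actual BJX2 logic), the 16-bit immediate can be expanded like this:

  #include <stdint.h>
  #include <string.h>

  /* Sketch: widen an IEEE 754 Binary16 bit pattern to Binary64.
     Handles normals, zeros, subnormals, infinities, and NaNs. */
  static double half_imm_to_double(uint16_t h)
  {
      uint64_t sign = (uint64_t)(h >> 15) << 63;
      uint32_t exp  = (h >> 10) & 0x1F;
      uint32_t frac = h & 0x3FF;
      uint64_t bits;

      if (exp == 0x1F) {                       /* Inf or NaN */
          bits = sign | (0x7FFULL << 52) | ((uint64_t)frac << 42);
      } else if (exp == 0) {
          if (frac == 0) {                     /* +/- zero */
              bits = sign;
          } else {                             /* subnormal: normalize it */
              int e = -14;                     /* value = 0.frac * 2^-14 */
              while (!(frac & 0x400)) { frac <<= 1; e--; }
              frac &= 0x3FF;
              bits = sign | ((uint64_t)(e + 1023) << 52) | ((uint64_t)frac << 42);
          }
      } else {                                 /* normal number */
          bits = sign | ((uint64_t)(exp - 15 + 1023) << 52) | ((uint64_t)frac << 42);
      }

      double d;
      memcpy(&d, &bits, sizeof d);             /* reinterpret the bit pattern */
      return d;
  }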
>
>
In RISC-V, one needs to use a memory load, storing the constant in memory as the full 64 bits if one needs the value as a "double". This kinda sucks.
>
Though, arguably still not as bad as it was on SH-4 (where constant loading in general was a PITA; and loading a FP constant typically involved multiple memory loads, and an address generation).
>
Eg:
MOVA   @(PC, Disp8), R3   // generate the address of the constant in a PC-relative literal pool
FMOV.S @R3+, FR5          // load one 32-bit half of the Binary64 value
FMOV.S @R3+, FR4          // load the other half (FR4:FR5 form a double-precision register pair)
AKA: Suck...
>
>>>>
Big annoying thing with it, is that to have any hope of adoption, one
needs an "actually involved" party to add it. There doesn't seem to be
any sort of aggregated list of "known in-use" opcodes, or any real
mechanism for "informal" extensions.
With the OpCode space already 98% filled there does not need to
be such a list.
>
One would still need it if multiple parties want to be able to define an extension independently of each other and not step on the same encodings.
>
>
Well, or it becomes like the file-extension space where there are seemingly pretty much no unused 2 or 3 letter filename extensions.
>
So, for some recent formats I went and used ".GTF" and ".UPI", which, while not unused, were not used by anything I had reason to care about (medical research and banks).
>
>
Though, with file extensions and names, at least one can web-search them (which is more than one can do to check whether or not a part of the RISC-V opcode map is used by a 3rd party extension).
>
What provisions have been made don't scale much beyond "a specific SoC provides extensions within a block generally provisioned for SoC-specific extensions".
>
>> The closest we have on the latter point is the "Composable Extensions" extension by Jan Gray, which seems to be mostly that part of the ISA's encoding space can be banked out based on a CSR or similar.
>
>
Though, bigger immediate values and register-indexed loads do arguably
better belong in the base ISA encoding space.
Agreed, but there is so much more.
>
FCMP Rt,#14,R19 // 32-bit instruction
ENTER R16,R0,#400 // 32-bit instruction
..
>
These are likely a bit further down the priority list.
>
High priority cases would likely be things that happen often enough to significantly affect performance.
>
>
As I see it, array loads/stores, and integer constant values in the 12-17 bit range, are common enough to justify this.
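For the array case, the point is that RV64 needs a shift and an add in front of every indexed access, where an indexed load folds them into one instruction. A rough C model of the difference (the instruction sequences in the comments are the standard RV64 idiom versus a generic scaled register-indexed load, not a specific existing extension):

  #include <stdint.h>

  /* RV64GC: no scaled register-indexed load, so indexing an int64_t array
     costs three instructions: slli t0,idx,3 ; add t0,base,t0 ; ld rd,0(t0) */
  int64_t load_rv64_style(const int64_t *base, uint64_t idx)
  {
      uintptr_t addr = (uintptr_t)base + (idx << 3);
      return *(const int64_t *)addr;
  }

  /* ISA with an indexed form: one instruction, e.g. "ld rd, base[idx*8]" */
  int64_t load_indexed_style(const int64_t *base, uint64_t idx)
  {
      return base[idx];
  }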
>
>
Prolog/Epilog happens once per function, and often may be skipped for small leaf functions, so it seems like a lower priority. More so if one lacks a good way to optimize it much beyond the sequence of load/store ops it would be replacing (and maybe no way to do it much faster than however much can be moved in a single clock cycle with the available register ports).
>
>>>>
At present, I am still on the fence about whether or not to support the
C extension in RISC-V mode in the BJX2 Core, mostly because the encoding
scheme just sucks bad enough that I don't really want to deal with it.
Realistically, can't likely expect anyone else to adopt BJX2 though.
Captain Obvious strikes again.
>
This is likely the fate of nearly every hobby class ISA.
>
>
Like, there is seemingly pretty much nothing one can do that other people haven't done already, and often better. It then becomes a question of if it can be personally interesting or maybe useful.
>
Like, even when I am beating RISC-V in terms of performance, it is usually only by 20%-60%, with other cases being closer to break even.
>
>
And, the only times it really pulls strongly ahead are when I try to use it more like a GPU than as a CPU. If anything, it makes more sense for me to try using it like a GPU or NPU ISA, and then leaving CPU stuff more to RISC-V (where people are more likely to care about things like GCC support; and commercially available CPU ASICs).
>
>
And like, even at this sort of task, a BJX2 core running on an FPGA isn't exactly going to be able to match something like an NVIDIA Jetson at running TensorFlow models (and also the Jetson Nano is cheaper than a Nexys A7, ...).
>
>
And, most of the nets I can run are multilayer perceptrons, as anything much bigger than perceptron-style nets is too big/slow to be processed in any reasonable amount of time.
>
Being able to compete performance-wise at these tasks with a 2003-era laptop or a 700 MHz ARM11-based RasPi likely doesn't count for much.
The 2003 laptop has x87; the ARM11 theoretically has a decent FPU ISA (VFP2), but FPU performance on the original RasPi seems to be unexpectedly weak.
>
The RasPi is around 9x faster than the BJX2 core at Dhrystone, but this is roughly in line with expectations (and somewhat less than the 14x clock-speed difference, implying the BJX2 core gets around 1.5x more done per clock).
>
>
It does at least significantly beat out doing it with RV64G (at 50 MHz) though, as the lack of "dense FP-SIMD" effectively makes RV64G entirely non-competitive at this task.
>
But, most normal C code isn't going to make much use of SIMD.
Well, even as much as compilers try to use SIMD, automatic vectorization is at best "weak", often still falling well short of manually written SIMD code (and manual SIMD lacks any "common denominator" that works transparently across targets and compilers).
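One semi-portable middle ground (not something from the post, just a sketch assuming the GCC/Clang vector_size extension and 16-byte-aligned arrays) is to write the SIMD explicitly but let the compiler map it onto whatever the target has:

  #include <stddef.h>

  typedef float f32x4 __attribute__((vector_size(16)));   /* 4 x 32-bit float */

  /* c[i] = a[i] * b[i]; n assumed to be a multiple of 4 for brevity */
  void vec_mul(float *c, const float *a, const float *b, size_t n)
  {
      for (size_t i = 0; i < n; i += 4) {
          f32x4 va = *(const f32x4 *)&a[i];
          f32x4 vb = *(const f32x4 *)&b[i];
          *(f32x4 *)&c[i] = va * vb;                       /* element-wise multiply */
      }
  }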
>
>
...
>
>
>>>>
Though, a bigger issue might be how to make it able to access hardware devices (it seems like part of the physical address space is used as a PCI Config space, and one would need to figure out what sorts of devices the Linux kernel expects to be there in such a scenario).
It is reasons like this that cause My 66000 to have four 64-bit address
spaces {DRAM, MMI/O, configuration, ROM}. PCIe MMI/O space can easily
exceed 42-bits before one throws MR-IOV at the problem. Configuration
headers in My 66000 contain all the information CPUID has in x86-land.
Presumably, one would mimic the memory map of whatever SiFive device one is claiming to be, for the sake of Linux kernel compatibility. From what I could gather, not all of them have the same physical memory map (and it doesn't seem well documented).
>
Then, one has to know what hardware interfaces one needs to support (there is likely to be a specific hardware list for a particular SoC that the kernel would be built to expect).
>
Well, or go the route of trying to build the kernel themselves, and then figuring out which drivers Linux supports and which would be easiest to implement hardware interfaces for, ...
>
>
>
Though, at present, for my project it would probably be less effort to make TestKern fake the Linux syscalls for RISC-V mode than to make the BJX2 core pretend to be a SiFive SoC.
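A sketch of what "faking the Linux syscalls" amounts to (assuming the standard riscv64 Linux convention: syscall number in a7, arguments in a0..a5, result back in a0; the TestKern-side function names here are hypothetical):

  #include <stdint.h>

  #define LINUX_RV64_WRITE  64          /* asm-generic syscall numbers */
  #define LINUX_RV64_EXIT   93

  struct trap_frame { uint64_t x[32]; };   /* x[10]=a0 ... x[17]=a7 */

  extern int64_t tk_console_write(const void *buf, uint64_t len);  /* hypothetical */
  extern void    tk_task_exit(int code);                           /* hypothetical */

  /* Called when guest RISC-V code executes ECALL: dispatch on a7 and
     map the request onto a native TestKern service. */
  void handle_rv_ecall(struct trap_frame *tf)
  {
      switch (tf->x[17]) {                                /* a7 = syscall number */
      case LINUX_RV64_WRITE:                              /* write(fd, buf, len) */
          tf->x[10] = (uint64_t)tk_console_write((void *)tf->x[11], tf->x[12]);
          break;
      case LINUX_RV64_EXIT:                               /* exit(code) */
          tk_task_exit((int)tf->x[10]);
          break;
      default:
          tf->x[10] = (uint64_t)-38;                      /* -ENOSYS */
          break;
      }
  }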
>
But, then again, the bigger problem (for practical use) isn't so much its lack of ability to run Linux software, but that it only runs at 50 MHz (and has nowhere enough "go fast" magic to make 50 MHz "not slow").
>
...
>
>
Going with a variable-length instruction set for the latest design, even though I touted there being issues with such in an FPGA. It supports 6-, 14-, and 30-bit immediates for many instructions. The exceptions are the load-immediate, which can handle 8-, 16-, 32-, and 64-bit constants, and compares, which can also handle 8-, 16-, 32-, and 64-bit constants. That includes float loads and compares too. The design is a bit more complex than a RISC machine. It has scaled indexed addressing, and variable-sized displacement addressing too (6-, 14-, and 22-bit).

I ended up mostly with:
The project includes an i386-compatible core, which is progressing along. Lots of fun has been had sorting out various bugs. The goal is to incorporate multiple dissimilar (legacy) cores in the same system. Working out the details of an architecture-call instruction.

It seems i386 can be done, but making an i386 implementation in an FPGA that doesn't perform horribly seems like another matter.
I have thought that RISC-V is decent, although it is missing some of the more CISC-like features like scaled indexed addressing. I discovered a while ago that I am a bit of a fan of the obscene bordering on the beautiful. Uglier cores are more interesting.

By my current estimates, some of the limitations RISC-V imposes add up to around a 40% overhead in scalar performance.