Newsportal USENET - Re: Cost of handling misaligned access

On 2/24/2025 1:52 PM, Robert Finch wrote:

On 2025-02-24 12:28 p.m., Michael S wrote:
On Mon, 24 Feb 2025 11:52:38 -0500
EricP <ThatWouldBeTelling@thevillage.com> wrote:
>
Michael S wrote:
On Sun, 23 Feb 2025 11:13:53 -0500
EricP <ThatWouldBeTelling@thevillage.com> wrote:
It looks to me that Vivado intends that after you get your basic
design working, this module optimization is *exactly* what one is
supposed to do.
>
In this case the prototype design establishes that you need
multiple 64-bit adders and the generic ones synthesis spits out
are slow. So you isolate that module off, use Verilog to drive the
basic LE selections, then iterate doing relative LE placement
specifiers, route the module, and when you get the fastest 64-bit
adder you can then lock down the netlist and save the module
design.
>
Now you have a plug-in 64-bit adder module that runs at (I don't
know the speed difference between Virtex and your Spartan-7 so
wild guess) oh, say, 4 ns, to use multiple places... fetch,
decode, alu, agu.
>
Then plug that into your ALU, add in SUB, AND, OR, XOR, functions,
isolate that module, optimize placement, route, lock down netlist,
and now you have a 5 ns plug-in ALU module.
>
Doing this you build up your own IP library of optimized hardware
modules.
>
As more and more modules are optimized the system synthesis gets
faster because much of the fine grain work and routing is already
done.
>
>
It sounds like your 1st hand FPGA design experience is VERY
outdated.
>
Never have, likely never will.
Nothing against them - looks easier than wire-wrapping TTL and 4000
CMOS. Though people do seem to spend an awful lot of time working
around certain deficiencies like the lack of >1 write ports on
register files, and the lack of CAMs. One would think market forces
would induce at least one supplier to add these and take the FPGA
market by storm.
>
>
Your view is probably skewed by talking to soft core hobbyists.
Please realize that most professionals do not care about
high-performance soft core. Soft core is for control plane functions
rather than for data plane. Important features are ease of use,
reliability, esp. of software tools and small size. Performance is
rated low. Performance per clock is rated even lower. So, professionals
do not develop soft cores by themselves. And the OTS cores that they use
are not superscalar. Quite often not even fully pipelined.
It means, no, small SRAM banks with two independent write ports are not
a feature that FPGA pros would be excited about.
>
Also FPGAs do seem prone to monopolistic locked-in pricing
(though not really different from any relational database vendor).
>
Cheap Chinese clones of X&A FPGAs from late 2000s and very early 2010s
certainly exist. I didn't encounter Chinese clones of slightly newer
devices, like Xilinx 7-series. But I didn't look hard for them. So,
wouldn't be surprised if they exist, too.
Right now, and for almost a full decade, neither X nor A has cared about
the low end. They just continue to ship old chips, mostly charging the old
price or raising it a little.
>
At least with TTL one could do an RFQ to 5 or 10 different suppliers.
>
I'm just trying to figure out what these other folks are doing to get
bleeding edge performance from essentially the same tools and similar
chips.
>
I assume you are referring to the GUI IDE interface for things like
floorplanning where you click on LE cells and set some attributes.
I also think I saw reference to locking down parts of the net list.
But there are a lot of documents to go through.
>
>
No, I mean that floorplanning, as well as most other manual physical-level
optimization, is not used at all in 99 percent of FPGA designs that
started after 2005.
>
Granted, I do not know that much about the work environment of FPGA developers:
I have thought of FPGAs as more of a prototyping tool, or something to be used in one-off designs and proof-of-concept type things. In those cases one probably does not care too much about manual operations; as was said, one would be more interested in the developer productivity that comes from reliable tools and from being able to deal with things at a high level.

It is likely not worth it to invest effort into things that one effectively can't distribute either.
Vivado projects seem to contain lots of absolute paths, so you can't just upload them to GitHub or similar and expect anyone to be able to make much use of them unless they also cloned the original PC's drives and directory structure...
Also, as noted, having a design that isn't tied down to a specific device (or FPGA vendor) is useful as well.
If people can use whatever hardware they have, that could be an advantage.
Though, I am left with the issue that I would need to rework how my register files work to match the patterns needed for Altera FPGAs.

The vendors have a number of pre-made components that can be plugged into a design, making it possible to sketch out a design very quickly, with a couple of caveats. One is that one might be stuck with a particular vendor.

Yeah.
I personally steer clear of these.
They tend to give you some sort of opaque binary blob along with a Verilog wrapper stub that can mostly be used in a similar way to a header.
If used, pretty much every one of these would need redundant fall-back code to be able to remain portable.
Like, contrary to "recommendation" I didn't use MIG for the DDR RAM modules. I am running the RAM in a non-standard way, and not as fast as it could be, but it works.
Granted, it is debatable whether it would have been easier to deal with the DDR RAM chip or to deal with the AXI bus or similar. In theory, could also switch over to using SERDES, but hasn't seemed "worth it" yet.
In theory, I could also try to develop a faster memory bus. But, it seems that, proportional to CPU clock speed, my memory bandwidth is already "monstrously fast" vs early 2000s systems (like, the 2000s being apparently an era of Fast-CPU, slow RAM).
Vs, say, slow CPU, fast RAM, if one actually uses the RAM at its intended speed.
I also mostly use Verilator for simulation, which does limit things.
Vivado does have a built-in simulator, but it only gives signal waveforms, which is not so useful.
If their simulator allowed, say, plugging in a virtual VGA monitor and PS2 keyboard, or hooking up a bus interface to scripted devices for bench-testing, I would be more interested.
Then I thought about it, and imagined a sort of wonky hybrid of Verilog and JavaScript for writing such test benches. Could almost be more usable than writing them in C++ (what Verilator demands).

CAMs can easily be implemented in FPGAs, although they may have multi-cycle latency. One has only to research CAM implementation in FPGAs. Register files with multiple ports are easily implemented with replication. It may be nice to see a CAM component in a vendor library. Register files sometimes have bypassing requirements that might make it challenging to develop a generic component.

CAMs exist in higher-end FPGAs.
But, yeah, building a register file from 1R1W LUTRAM arrays or similar is doable, if annoying.
One ends up with NR * NW copies of the register arrays, but this is seemingly unavoidable (as is the NR*NW*NE pattern for the register forwarding).
Don't want to extend NW or NE beyond 3, as this makes cost too steep.
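As a rough illustration of the NR * NW replication, here is a small behavioral model in Python (not Verilog, and not anyone's actual design). Each bank stands in for one 1R1W array. The post does not say how a read port chooses among the copies; the "live value table" used here is one common approach and is an assumption, as are all of the names (ReplicatedRegFile, lvt, bank_count).

```python
# Behavioral sketch only: a multi-port register file built from 1R1W banks.
# Class and attribute names are hypothetical, chosen for illustration.

class ReplicatedRegFile:
    def __init__(self, num_regs, nr, nw):
        self.nr = nr  # number of read ports
        self.nw = nw  # number of write ports
        # banks[w][r] models one 1R1W array: written only by write port w,
        # read only by read port r. That gives NR * NW arrays in total.
        self.banks = [[[0] * num_regs for _ in range(nr)] for _ in range(nw)]
        # Live value table (assumed mechanism): for each register, the
        # write port that wrote it most recently.
        self.lvt = [0] * num_regs

    def bank_count(self):
        return self.nr * self.nw

    def write(self, port, reg, value):
        # A write port updates its own copy in every read column.
        for r in range(self.nr):
            self.banks[port][r][reg] = value
        self.lvt[reg] = port

    def read(self, port, reg):
        # A read port selects the copy owned by the most recent writer.
        return self.banks[self.lvt[reg]][port][reg]


rf = ReplicatedRegFile(num_regs=64, nr=6, nw=3)
rf.write(0, 5, 111)
rf.write(2, 5, 222)   # a later write from another port wins
print(rf.bank_count(), rf.read(4, 5))
```

In real hardware the live value table itself needs NW write ports and NR read ports, so it would typically be built from flip-flops; the model ignores that cost and also ignores same-cycle write conflicts.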
One idea I had floated was the idea of expanding EX to 6 stages, but only allowing forwarding from 2/3/5.
So, one would have pipelined latency of:
2, 3, 5, and 7 cycles.
But, some ugly gotcha cases would probably ruin this.
Say, if a bundle sitting in RF depends on two instructions that finish on different clock-cycles, it may end up needing to wait for the worst case timing to proceed (say, both instructions to pass through WB).
Pseudo-forwarding from 4 and 6 could avoid this, but would still likely cost about as much as "real" forwarding from these stages (though, might be cheaper if held closer to the register file).
Main motivation for more EX stages is mostly to allow pipelining the Binary64 FPU operations.
Though, could possibly instead leave the existing forwarding as-is (1/2/3), but hack Lane 1 to allow some additional stages (with delayed write-back), but any operation that has not retired by the end of EX3 needs to wait the full 6 EX cycles (so, 1/2/3/6).
With, Lane 1 having pseudo-forwarding for 4/5.
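One way to read the two schemes above is that a result becomes visible at the first forwarding stage at or after the stage where it is produced, and otherwise only after write-back. The sketch below encodes that reading. It is an interpretation, not the actual pipeline logic: the function name visible_latency and its parameters are hypothetical, and pseudo-forwarding is not modeled.

```python
def visible_latency(ready_stage, forward_stages, writeback_cycle):
    """Cycles until a dependent instruction can use a result.

    ready_stage:     EX stage at whose end the result exists (1-based).
    forward_stages:  EX stages that have forwarding paths.
    writeback_cycle: latency seen when no forwarding path applies.
    """
    for stage in sorted(forward_stages):
        if stage >= ready_stage:
            return stage
    return writeback_cycle


# Scheme A: 6 EX stages, forwarding from 2/3/5, else wait for write-back (7).
scheme_a = [visible_latency(s, (2, 3, 5), 7) for s in range(1, 7)]

# Scheme B: forwarding kept at 1/2/3, anything later waits the full 6 cycles.
scheme_b = [visible_latency(s, (1, 2, 3), 6) for s in range(1, 7)]

print(sorted(set(scheme_a)))  # distinct latencies in scheme A
print(sorted(set(scheme_b)))  # distinct latencies in scheme B
```

Under this reading, scheme A yields the 2, 3, 5, and 7 cycle latencies and scheme B yields 1/2/3/6. A result that is ready at stage 4 in scheme A still costs 5 cycles, which is the gap pseudo-forwarding would be meant to cover.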
Though, potentially, the increase in branch latency could have a roughly 2-3% performance penalty (might cost more than the gains from pipelined FMUL and FADD). But, could maybe sidestep this by only having the longer branch latency if there is still an in-flight instruction.

If one is concerned about performance of a one-off, simply buy a chip with double the performance. That would probably be a lot less expensive than implementing everything manually.
In the past I have managed to purchase FPGA boards with a higher-speed grade or higher capacity part for additional $$$, as long as the footprint was the same. It does not hurt to ask, as it indicates demand.
If it is going to be a high-volume design, it may be implemented again as custom logic.
From a hobbyist perspective, being able to go down to micro-detail is great. It is possible to get significantly better performance that way, when it is not possible to obtain a better chip.

I was more going for local optimum, which kinda led me to where I was...
At 50MHz, one can basically use nearly the whole FPGA for a single large core on the XC7A100T and smaller.
For the XC7A200T, going much bigger would require dropping to 33 or 25 MHz.
But, also makes it viable to have 64K L1 caches.
But, 64K of L1 doesn't compensate for the drop in clock-speed in this case.
Scaling down, clock-for-clock performance gets worse almost faster than MHz increases.
Granted, if the code takes 20% more cycles, but runs at a 50% faster clock-speed, it may still be ahead.
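The arithmetic behind that trade-off is simple enough to write down. This is just the usual time = cycles / frequency relation; the function name relative_speed is made up for the example.

```python
def relative_speed(cycle_ratio, clock_ratio):
    """Speed of design B relative to design A (greater than 1 means B wins).

    cycle_ratio: cycles_B / cycles_A for the same code.
    clock_ratio: f_B / f_A.
    Run time is cycles / f, so the speed ratio is clock_ratio / cycle_ratio.
    """
    return clock_ratio / cycle_ratio


# 20% more cycles at a 50% faster clock:
print(round(relative_speed(1.20, 1.50), 3))
```

That case comes out about 1.25x faster overall. The break-even point is where the cycle ratio equals the clock ratio, so dropping from 50 to 33 MHz only pays off if the bigger core needs fewer than about two thirds of the cycles.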
But, yeah, this is within the limit of staying high-level enough that the Verilog remains portable.
