Re: Cost of handling misaligned access

From: cr88192 (at) *nospam* gmail.com (BGB)
Newsgroups: comp.arch
Date: 21 Feb 2025, 22:47:50
Organization: A noiseless patient Spider
Message-ID: <vpasaa$3itge$1@dont-email.me>
User-Agent: Mozilla Thunderbird

On 2/21/2025 1:51 PM, EricP wrote:
> BGB wrote:
>>
>> Can note that the latency of carry-select adders is a little weird:
>>   16/32/64: Latency goes up steadily;
>>     But, still less than linear;
>>   128-bit: Only slightly more latency than 64-bit.
>>
>> The best I could find in past testing was seemingly 16-bit chunks for normal adding. Where, 16-bits seemed to be around the break-even between the chained CARRY4's and the Carry-Select (CS being slower below 16 bits).
>>
>> But, for a 64-bit adder, still basically need to give it a clock-cycle to do its thing. Though, not like 32 is particularly fast either; hence part of the whole 2 cycle latency on ALU ops thing. Mostly has to do with ADD/SUB (and CMP, which is based on SUB).
>>
>>
>> Admittedly part of why I have such mixed feelings on full compare-and-branch:
>>   Pro: It can offer a performance advantage (in terms of per-clock);
>>   Con: Branch is now beholden to the latency of a Subtract.
>
> IIRC your cpu clock speed is about 75 MHz (13.3 ns)
> and you are saying it takes 2 clocks for a 64-bit ADD.
 
The 75MHz was mostly experimental, mostly I am running at 50MHz because it is easier (a whole lot of corners need to be cut for 75MHz, so often overall performance ended up being worse).
Via the main ALU, which also shares the logic for SUB and CMP and similar...
Generally, I give more or less a full cycle for the ADD to do its thing, with the result presented to the outside world on the second cycle, where it can go through the register forwarding chains and similar.
This gives it a 2 cycle latency.
Operations with a 1 cycle latency need to feed their output directly into the register forwarding logic.
In a pseudocode sense, something like:
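   // For SUB, invert B (A - B = A + ~B + 1).
   tValB = IsSUB ? ~valB : valB;
   // Per 16-bit chunk, compute both candidate sums (carry-in 0 and 1).
   // Bit 16 of each 17-bit result is that chunk's carry-out.
   tAddA0 = { 1'b0, valA[15:0] } + { 1'b0, tValB[15:0] } + 0;
   tAddA1 = { 1'b0, valA[15:0] } + { 1'b0, tValB[15:0] } + 1;
   tAddB0 = { 1'b0, valA[31:16] } + { 1'b0, tValB[31:16] } + 0;
   tAddB1 = { 1'b0, valA[31:16] } + { 1'b0, tValB[31:16] } + 1;
   tAddC0 = ...
   ...
   // Select chain: each chunk's carry-out picks the next chunk's candidate.
   tAddSbA = tCarryIn;
   tAddSbB = tAddSbA ? tAddA1[16] : tAddA0[16];
   tAddSbC = tAddSbB ? tAddB1[16] : tAddB0[16];
   ...
   // Assemble the result from the selected low 16 bits of each chunk.
   tAddRes = {
      tAddSbD ? tAddD1[15:0] : tAddD0[15:0],
      tAddSbC ? tAddC1[15:0] : tAddC0[15:0],
      tAddSbB ? tAddB1[15:0] : tAddB0[15:0],
      tAddSbA ? tAddA1[15:0] : tAddA0[15:0]
   };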
This works, but still need to ideally give it a full clock-cycle to do its work.
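For anyone wanting to sanity-check the select logic outside of an FPGA toolchain, the pseudocode above can be mirrored as a small behavioral model in Python. This is only a sketch: the function name `carry_select_add64` is made up for illustration, and it assumes that SUB supplies a carry-in of 1 (the usual A + ~B + 1), which the pseudocode does not show explicitly. It models the arithmetic only, not timing.

```python
MASK16 = 0xFFFF
MASK64 = (1 << 64) - 1

def carry_select_add64(val_a, val_b, is_sub=False, carry_in=None):
    """Behavioral model of a 64-bit carry-select add using 16-bit chunks.

    Returns (64-bit result, carry-out of the top chunk).
    """
    # Assumption (not in the original pseudocode): SUB feeds carry-in = 1.
    if carry_in is None:
        carry_in = 1 if is_sub else 0
    val_a &= MASK64
    t_val_b = (~val_b if is_sub else val_b) & MASK64

    result = 0
    select = carry_in  # plays the role of tAddSbA
    for chunk in range(4):
        shift = 16 * chunk
        a = (val_a >> shift) & MASK16
        b = (t_val_b >> shift) & MASK16
        # In hardware both candidates are computed in parallel,
        # independent of the incoming carry.
        add0 = a + b        # 17-bit sum assuming carry-in 0
        add1 = a + b + 1    # 17-bit sum assuming carry-in 1
        chosen = add1 if select else add0
        result |= (chosen & MASK16) << shift
        select = (chosen >> 16) & 1  # carry-out selects the next chunk
    return result, select

if __name__ == "__main__":
    res, cout = carry_select_add64(0x0000_0000_FFFF_FFFF, 1)
    print(hex(res), cout)
```

The loop is sequential only because it is software; in the Verilog form, the only serial part is the short chain of select muxes.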
Note that one has to be careful with logic coupling: if too many things are tied together, one may get a "routing congestion" warning message, and generally timing fails in this case...
Also, the "inferring latch" warning is one of those "you really gotta go fix this" issues (it generally indicates a Verilog bug, and also negatively affects timing).

> I don't remember what Xilinx chip you are using but this paper describes
> how to do a 64-bit ADD at between 350 MHz (2.8 ns) to 400 MHz (2.5 ns)
> on a Virtex-5:
>
> A Fast Carry Chain Adder for Virtex-5 FPGAs, 2010
> https://scholar.archive.org/work/tz6fy2zm4fcobc6k7khsbwskh4/access/wayback/http://ece.gmu.edu:80/coursewebpages/ECE/ECE645/S11/projects/project_1_resources/Adders_MELECON_2010.pdf
 
As for Virtex: I am not made of money...
Virtex tends to be absurdly expensive high-end FPGAs.
   Even the older Virtex chips are still absurdly expensive.
Kintex is considered mid range, but still too expensive, and mostly not usable in the free versions of Vivado (and there are no real viable FOSS alternatives to Vivado). When I tried looking at some of the "open source" tools for targeting Xilinx chips, they were doing the hacky thing of basically invoking Xilinx's tools in the background (which, if used to target a Kintex, is essentially piracy).
Where, a valid FOSS tool would need to be able to do everything and generate the bitstream itself.
Mostly I am using Spartan-7 and Artix-7.
   Generally at the -1 speed grade (slowest, but cheapest).
These are mostly considered low-end and consumer-electronics oriented FPGAs by Xilinx.
Or, by "car analogies":
You can't expect a "VW Jetta" to perform like a "Ferrari Enzo" even if the "Jetta" is a newer model year...
Cheapest FPGA dev-boards I have gotten a (minimal) BJX2 core onto were around $70 (XC7S25). Most expensive dev-board I have is the Nexys A7 (XC7A100T), but it has gone up in price (IIRC, it was around $290 at the time; right now seems like $350, but was IIRC a bit more in 2021/2022).
There was the temptation to get a "Nexys Video", which has an XC7A200T-2, but it is very expensive (around $600 IIRC). However, this chip *could* pass 75MHz a bit more easily (though, still not enough to easily reach 100MHz).
I have a QMTech board with an XC7A200T at -1, but generally, it seems to actually have a slightly harder time passing timing constraints than the XC7A100T in the Nexys A7 (possibly some sort of Vivado magic here).
Generally, also hard to find FPGA boards much under $100 that "aren't crap".
A lot of the ICE40 boards fail on both counts: often not really any cheaper than XC7S25 or XC7A35T based boards, but much worse in comparison.
There was rumor of cheaper boards (in the $30-$50 range), but not seen anything that seems worth bothering with in this range.
Had noted that I could get higher clock speeds (or "fmax" in their terms) on Intel/Altera chips (according to Quartus), but didn't buy any of these:
The DE10 was expensive, and at the time, even a less feature-rich version of the BJX2 core basically ate the entire resource budget of the DE10 (but, IIRC, was otherwise an fmax of around 85MHz or something).
IIRC, something had seemingly gone horribly wrong with attempts to use LUTRAM arrays, and it was needing to fall back to trying to make them out of Flip-Flops...
IIRC, it was something like they didn't have LUTRAM's in the same sort as Xilinx, but rather smaller and larger Block RAMs; and, possibly, the register file would need to be reworked to fit BRAM-like access patterns (namely, that reads are only performed on a clock-edge rather than combinatorial). Granted, could be done in theory, but means I would need to feed in the register-port inputs on the ID1/ID2 edge, rather than in the ID2 stage itself.
I didn't really mess with it much at the time to figure it out...

> and this does 64-bit ADD up to 428 MHz (2.3 ns) on a Virtex-6:
>
> Fast and Area Efficient Adder for Wide Data in Recent Xilinx FPGAs, 2016
> http://www.diva-portal.org/smash/get/diva2:967655/FULLTEXT02.pdf
 
Errm, from a skim, this doesn't really look like something you can pull off in normal Verilog.
Generally, one doesn't have control over how the components hook together; one can only influence what happens based on how they write their Verilog.
You can just write:
   reg[63:0] tValA;
   reg[63:0] tValB;
   reg[63:0] tValC;
   tValC=tValA+tValB;
But, then it spits out something with a chain of 16 CARRY4's, so there is a fairly high latency on the high order bits of the result.
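As a rough structural count (not a timing model), the figure of 16 follows from each CARRY4 covering 4 bits, and the same arithmetic shows what 16-bit carry-select chunks buy. The helper name `carry4_blocks` is made up for illustration, and the sketch assumes synthesis maps each chunk onto a plain CARRY4 chain, which is not guaranteed.

```python
def carry4_blocks(width_bits: int) -> int:
    """CARRY4 primitives needed to span width_bits, at 4 bits per primitive."""
    return (width_bits + 3) // 4

WIDTH = 64   # full adder width
CHUNK = 16   # carry-select chunk size used earlier in this post

plain_chain = carry4_blocks(WIDTH)   # one ripple chain across all 64 bits
chunk_chain = carry4_blocks(CHUNK)   # chain inside each 16-bit chunk
select_hops = WIDTH // CHUNK - 1     # tAddSbB, tAddSbC, tAddSbD

print(f"plain 64-bit add: {plain_chain} CARRY4 in series")
print(f"carry-select:     {chunk_chain} CARRY4 in series, then {select_hops} select hops")
```

So the serial carry chain drops from 16 primitives to 4, at the cost of duplicated adders and a few mux levels after them.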
Generally, Vivado synthesis seems to mostly be happy (at 50 MHz), if the total logic path length stays under around 12 or so. Paths with 15 or more are often near the edge of failing timing.
At 75MHz, one has to battle with pretty much anything much over 8.
And, at 200MHz, you have path lengths of 2 that are failing...
Like, it seemingly can't do much more than "FF -> LUT -> FF" at these speeds.
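To put those numbers side by side, one can divide each clock period by the path length that was roughly tolerable at that speed. This is a naive sketch: it treats the path lengths above as counts of logic levels (an assumption), and it ignores fixed costs such as flip-flop clock-to-output, setup time, and routing. The function names are made up for illustration.

```python
def period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz

def budget_per_level_ns(freq_mhz: float, levels: int) -> float:
    """Naive time budget per logic level: the whole period split evenly."""
    return period_ns(freq_mhz) / levels

# (frequency in MHz, approximate path length quoted in this post)
for freq_mhz, levels in [(50, 12), (75, 8), (200, 2)]:
    print(f"{freq_mhz:>3} MHz: period {period_ns(freq_mhz):5.2f} ns, "
          f"{levels:>2} levels -> {budget_per_level_ns(freq_mhz, levels):.2f} ns/level")
```

The 50MHz and 75MHz cases both come out near 1.67 ns per level, while 200MHz allows 2.50 ns per level and still fails, which would be consistent with fixed per-path overhead eating most of a 5 ns period.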
