On 2/6/2025 8:28 PM, MitchAlsup1 wrote:
On Thu, 6 Feb 2025 23:34:27 +0000, BGB wrote:
On 2/6/2025 2:36 PM, Terje Mathisen wrote:
Michael S wrote:
On Thu, 6 Feb 2025 17:47:30 +0100
Terje Mathisen <terje.mathisen@tmsw.no> wrote:
>
FWIW: The idea of running a CPU at 4+ GHz seems a bit much (IME, CPUs
tend to run excessively hot at these kinds of clock speeds; 3.2 to 3.6
GHz seems more reasonable, so that it "doesn't melt" or run into thermal
throttling or stability issues).
>
Does the idea of using all 500 HP of your car give you similar
reservations ?!?
Then again, CPU heat production is between quadratic and cubic
wrt frequency... dynamic power is roughly k×V^2×f, and we have to raise
the voltage in order to run at higher frequencies.
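To put rough numbers on that, here is a quick sketch (the voltage/frequency pairs below are made-up ballpark figures, not measurements from my machines):

#include <stdio.h>

/* Compare dynamic power P = k*V^2*f at two hypothetical operating points;
   k cancels out when taking the ratio. */
int main(void)
{
    double f_lo = 3.6, v_lo = 1.20;   /* ~3.6 GHz at ~1.20 V (assumed) */
    double f_hi = 4.2, v_hi = 1.40;   /* ~4.2 GHz at ~1.40 V (assumed) */

    double ratio = (v_hi * v_hi * f_hi) / (v_lo * v_lo * f_lo);

    printf("~%.0f%% more dynamic power for ~%.0f%% more clock\n",
           (ratio - 1.0) * 100.0, (f_hi / f_lo - 1.0) * 100.0);
    return 0;
}

With those (assumed) numbers it comes out to roughly 59% more dynamic power for a 17% higher clock, which lines up with the behavior described below.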
If one limits the CPU to around 3.6 GHz, one can run it at around 100% load without it going much over 50C or so.
If one runs the CPU at 4 GHz, then under multi-threaded load it may hit 70C or so, the frequency starts jumping all over (as it tries to keep the temperature under control), and sometimes the computer will crash.
So, for maybe a 15% reduction in clock-speed (and loss of "turbo boost" for single-threaded tasks), one gains the ability to max out the CPU load without stuff getting wonky and unstable.
Granted, this is less of an issue on my current PC (Zen+ based), but was basically necessary for my prior PC (running Piledriver), as it ran excessively hot and was unstable at its stock speed (4.2 GHz), but ran fine at 3.6 with no turbo.
My current CPU has a stock speed of 3.7 GHz, and seems to be stable at this speed (but, disabling turbo does still make it better behaved under load).
I am also running the RAM at a speed lower than advertised on the box, but more because it isn't stable if run that fast (the box was apparently showing the speed for the "XMP2" profile rather than the Base profile).
IIRC:
Base: 2133
XMP1: 2667
XMP2: 3200
Runs stable at 2133 and 2667.
However, 3200 was nowhere near an acceptable level of stability.
>
<snip>
>
>
A smaller pagefile still exists on the SSD, but mostly because Windows
is unhappy if there is no pagefile on 'C'. Don't generally want a
pagefile on an SSD though as it is worse for lifespan (but, it is 8GB,
which Windows accepts; with around 192GB each on the other drives, for ~
400GB of swap space).
I have not had a swap file on C since 1997!
I have a separate SATA drive for swap.
The only times I use the C drive for swap are the initial bootup and
configuration of the system. Afterwards, I install the swap drive,
call it S, and allow the system to use 95% of it. It is unstable when
using 100% of the space available.
I also have OS and applications on C
but all my files are on M or P;
so a reload does not damage any of my work files (just my time)
If trying to disable swap on C, Windows gives a warning (IIRC) that this may lead to instability, and that it will disable the ability to make crash reports, as well as hibernation and similar.
However, reducing the size to 8GB seemingly still works...
With the bulk of the swap space still on two other HDDs.
Activity is usually low, except when trying to build LLVM or similar, or when Firefox randomly decides to expand to an enormous size...
But, at this point I regularly need to deal with the hassle that (for reasons) my 'C' drive partition is only 320GB. Cough, errm: in its earliest form it was on a 320GB HDD, and Windows seemingly can't move or expand partitions on MBR drives. The rest of the 1TB drive is now two other partitions, plus a small auto-created "Windows Recovery" partition from the original Windows 7 install (which was later upgraded to Win10 via Windows Update).
At the time, 320GB was enough to keep Windows happy, but now is a bit cramped (whenever I have the option, I install software onto other drives).
The drives' contents were mirrored onto new drives several times since the original drive (most recently, having gotten a new 1TB SSD as the prior 1TB SSD started failing).
Though, ironically, a few days ago I made a comment about wishing there was a way to limit the maximum allowed memory for processes, but then found a semi-legitimate use case for big processes:
Running the 70B parameter version of DeepSeek locally.
Where, then one has like a single process eating 80GB of RAM...
Though, the 32B version is a little more practical here:
Uses roughly 40GB of RAM (slightly less);
Can generate ~ 1.0 token / second, vs around 0.12 tokens/second for the 70B;
Can still generate competent responses.
The 1.5B, 7B, and 14B models fell short here...
The smaller versions are a lot faster, but "not very smart".
Have observed that the smaller models are largely unable to understand my requests, whereas the 32B and 70B models seemingly can. Still, one has to feed in a lot of clues to help them along (and with anything "actually difficult" they still seem to fail pretty hard).
Task 1:
"Describe the optimal balance between callee saved and caller saved registers in the design of an ABI for an ISA with 64 usable general purpose registers and no floating point or vector registers, where the general purpose registers may also hold 64-bit floating point and vector values, along with 64 bit integer and pointer values, and the first 5 registers are reserved for use by the architecture. Assuming a codebase where much of the hot path is primarily in code involving a significant number of function calls. Assume that some functions in the hot path may have up to 150 local variables, and that functions may pass and return 128 bit SIMD vectors represented as register pairs."
32B and 70B: Both versions give a description with similar properties to my existing XG2 ABI.
The smaller models seem to wander off course.
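For illustration, the general shape of that kind of caller/callee split looks something like the following (the ranges here are a made-up example of mine, not the actual XG2 assignments and not the models' output):

/* Hypothetical partition of 64 GPRs (R0..R63) with R0..R4 reserved. */
enum {
    ABI_RESV_LO = 0,  ABI_RESV_HI = 4,   /* R0..R4:   reserved for the architecture     */
    ABI_ARG_LO  = 5,  ABI_ARG_HI  = 20,  /* R5..R20:  arguments/returns, caller-saved;  */
                                         /*           even/odd pairs carry 128-bit SIMD */
    ABI_TMP_LO  = 21, ABI_TMP_HI  = 31,  /* R21..R31: scratch, caller-saved             */
    ABI_SAV_LO  = 32, ABI_SAV_HI  = 63   /* R32..R63: callee-saved, so call-heavy hot   */
                                         /*           paths can keep locals live        */
};

The general idea being that with call-heavy code and many locals, a fairly large callee-saved bank helps, since values can stay live across the calls.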
Task 2:
"Come up with a list of instruction layouts as bit patterns for a 64-bit RISC style ISA with a 32 bit instruction word, 64 general purpose, no floating point or vector registers. Opcode should be at least 7 bits. ALU immediate and load/store displacements should be 10 bits. Conditional branches also use a 10 bit displacement, and unconditional branches use a 23 bit displacement. All branches are PC relative, with an instruction word as their scale. Any unused bits are to be left as function or sub-opcode bits."
1.5B: Total Fail.
In an earlier (less verbose) attempt, it got confused and started talking about table-top RPGs and D&D for some reason.
7B: Total Fail.
Gives either something resembling marketing material, or describes something more akin to a 6502 or Z80.
14B: Partial Fail. Failed to realize that 64 GPRs need 6-bit register fields, so basically rehashed a broken version of RISC-V (with 5-bit register fields).
32B: Mostly works, though the layouts still kinda sucked. Granted, the question was written in a way that addressed some of the 32B model's earlier confusion.
70B: Demerit. It answered the question, but chose to ignore parts of the original request and substitute its own immediate-field sizes, giving something more like RISC-V, just with 6-bit register fields.
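For reference, one way the constraints in the prompt can be satisfied (this is my own sketch, not what any of the models produced):

/* 3R ALU:     [31:25]=func7 [24:19]=rs2    [18:13]=rs1 [12:7]=rd  [6:0]=opcode
 * ALU imm:    [31:29]=func3 [28:19]=imm10  [18:13]=rs1 [12:7]=rd  [6:0]=opcode
 * load/store: [31:29]=func3 [28:19]=disp10 [18:13]=rb  [12:7]=rd  [6:0]=opcode
 * cond br:    [31:29]=func3 [28:19]=disp10 [18:13]=rs1 [12:7]=rs2 [6:0]=opcode
 *             (rs2 reuses the rd slot here)
 * uncond br:  [31:30]=func2 [29:7]=disp23                         [6:0]=opcode
 * All branch displacements are PC-relative, scaled by the 32-bit instruction word. */

#include <stdint.h>

static inline uint32_t isa_opcode(uint32_t i) { return i & 0x7F; }             /* [6:0]   */
static inline uint32_t isa_rd(uint32_t i)     { return (i >> 7)  & 0x3F; }     /* [12:7]  */
static inline uint32_t isa_rs1(uint32_t i)    { return (i >> 13) & 0x3F; }     /* [18:13] */
static inline uint32_t isa_rs2(uint32_t i)    { return (i >> 19) & 0x3F; }     /* [24:19], 3R forms */
static inline uint32_t isa_imm10(uint32_t i)  { return (i >> 19) & 0x3FF; }    /* [28:19] */
static inline uint32_t isa_disp23(uint32_t i) { return (i >> 7)  & 0x7FFFFF; } /* [29:7]  */

Each format keeps the 7-bit opcode in the low bits and 6-bit register fields, with 10-bit immediates/displacements, a 23-bit unconditional branch displacement, and the leftover bits as func/sub-opcode bits.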
Task 3:
"Come up with a state transition table for a state machine with 5 bits of current state, 1 bit of input, generating the next state and an output bit that predicts the next input bit in a repeating pattern of between 1 and 4 bits. The state will be initialized to a random value and the state machine should be able to match the repeating bit pattern ideally within 12 bits of input."
Status:
All models failed to give a satisfactory answer. If given some hints, the larger ones could figure out how to match up to 3 bits. The outputs indicated that they both didn't know the answer and were also underestimating the difficulty (once one tries to fit 4-bit patterns in, it gets a lot worse due to the limited number of states; IOW: it is not as easy as simply filling out a table).
Granted, this is a more difficult problem; it might be asking too much.
Task 4:
"OK. How about: Given two 8-bit colors composed of a 4 bit brightness in the low 4 bits, and a 4 bit RGBI value in the high 4 bits, where the Intensity bit (bit 7) is repurposed as a saturation level (0=full saturation, 1=low saturation; so colors 0x1 through 0x6 are high saturation; 0x9 through 0xE are low saturation), and 0x0=grayscale, 0x7=orange (255, 170, 85), 0x8=azure(85,170,255), 0xF=olive(170,255,85). For high saturation colors, the RGB (R is bit 6, G is bit 5, and B is bit 4) bits select between 85 and 255 (where 0 selects the minimum and 1 the maximum). For low saturation, the bits select between 170 and 255. Construct a C function to blend these two colors (and not using any third party libraries), producing a new 8 bit color reflecting the weighted sum of the two colors vectors (using relative brightness as the weighting factor), and using the larger of the two brightness values for the brightness of the result. The RGBI color represents a color vector that is to be scaled by the brightness value, with the normalized form of the resulting color vector being used to select the best matching color within the RGBI space."
Larger models:
Partial success, but they seemed to become confused about how to interpret the color vectors or blend the colors. The above reflects a task that comes up from how I have sometimes represented colored lighting in some of my 3D engines. The generated response involved using loops to try to match the best color (works, I guess).
Didn't test all the models, and this one seemed to take an unusually long time (over 10 minutes with the online version, around 3.5 hours with the 32B model running locally).
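For reference, a rough C sketch of the task as I read it (my own interpretation, not the model's output; treating the 0x0 "grayscale" entry as the white direction and normalizing by the max component are assumptions on my part):

#include <stdint.h>

/* Decode a 4-bit RGBI index into an RGB direction vector (components 85..255). */
static void rgbi_decode(int idx, int rgb[3])
{
    switch (idx) {
    case 0x0: rgb[0] = 255; rgb[1] = 255; rgb[2] = 255; break; /* grayscale (assumed white direction) */
    case 0x7: rgb[0] = 255; rgb[1] = 170; rgb[2] = 85;  break; /* orange */
    case 0x8: rgb[0] = 85;  rgb[1] = 170; rgb[2] = 255; break; /* azure  */
    case 0xF: rgb[0] = 170; rgb[1] = 255; rgb[2] = 85;  break; /* olive  */
    default: {
        int lo = (idx & 0x8) ? 170 : 85;   /* low vs. high saturation */
        rgb[0] = (idx & 0x4) ? 255 : lo;   /* R = bit 6 of the color byte */
        rgb[1] = (idx & 0x2) ? 255 : lo;   /* G = bit 5 */
        rgb[2] = (idx & 0x1) ? 255 : lo;   /* B = bit 4 */
        break;
    } }
}

/* Brute-force search for the RGBI index nearest to a given direction vector. */
static int rgbi_encode(const int rgb[3])
{
    int best = 0, bestd = 0x7FFFFFFF;
    for (int i = 0; i < 16; i++) {
        int c[3], d, j;
        rgbi_decode(i, c);
        for (d = 0, j = 0; j < 3; j++)
            d += (rgb[j] - c[j]) * (rgb[j] - c[j]);
        if (d < bestd) { bestd = d; best = i; }
    }
    return best;
}

/* Blend two 8-bit colors: low 4 bits = brightness, high 4 bits = RGBI. */
uint8_t blend_color(uint8_t ca, uint8_t cb)
{
    int ba = ca & 0x0F, bb = cb & 0x0F;
    int va[3], vb[3], sum[3], i, m;

    rgbi_decode(ca >> 4, va);
    rgbi_decode(cb >> 4, vb);

    /* Scale each direction vector by its brightness and sum,
       i.e. a sum weighted by relative brightness. */
    for (i = 0; i < 3; i++)
        sum[i] = va[i] * ba + vb[i] * bb;

    if (ba == 0 && bb == 0)
        return 0;                 /* both black: grayscale index, zero brightness */

    /* Result brightness: the larger of the two inputs. */
    int br = (ba > bb) ? ba : bb;

    /* Normalize the summed vector (max component back to 255),
       then pick the nearest RGBI entry. */
    m = sum[0];
    if (sum[1] > m) m = sum[1];
    if (sum[2] > m) m = sum[2];
    for (i = 0; i < 3; i++)
        sum[i] = sum[i] * 255 / m;

    return (uint8_t)((rgbi_encode(sum) << 4) | br);
}

The brute-force nearest-match loop at the end is roughly the same approach the generated responses used.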
I don't feel my role is particularly threatened just yet, but this is some of the better results I have seen from an LLM (more so for one that can run locally).
Also, to get a desired answer, seemingly one has to be fairly verbose and leave relatively little up to interpretation.
Could almost set up a server to run it, except that I don't really have any spare computers that are "sufficiently beefy" to run the 32B or 70B models, and the 14B models and smaller are dumb enough to not really be worth the hassle.
At non-technical tasks, it gave better results. Tried asking it to describe a fantasy scene involving rap-battling elves and some other stuff; it actually did pretty well.
It also kinda pointed out some weaknesses in my own writing, mostly in the area of being lazy with naming characters or describing scenery (which it did readily). It seemingly also understands how things like rhyme structures and similar work, ... Never mind that I was vague, so it didn't quite match what I was going for.
Well, errm, I might be in fear for "my job" if my thing were being a professional fantasy author...
In the corresponding "actual story", I did then end up going back over it some and trying to name characters and describe stuff more. I had decided to try something different and write something more in the fantasy genre, but going the atypical direction of taking a high-fantasy setting and gluing some urban-fantasy-type stuff onto it (say, a fantasy world that had started to industrialize rather than remain permanently medieval).
Well, as opposed to normal urban fantasy, which is usually "take a modern setting and add werewolves and vampires" or similar...
So, kinda the inverse: the elves and wizards and similar are out in the open and living in buildings (where, say, you have gnome shopkeepers, elf rap musicians, etc...). I guess, if anything, it is almost more like the Discworld setting; and not intended to be taken all that seriously.
...
Not sure how well Windows load-balances swap, apparently not very well
though (when it starts paging, most of the load seems to be on one
drive; better if it could give a more even spread).
>
The SSD seems to get ~ 300 MB/sec.
>
Actually, both my old and new SSD get a max of 300MB/s... They are fairly consistent about this part.
Realistically, though:
Copying a large file between two drives is limited to the slower of the two drives' max speeds;
Copying files on the same drive is slower than between drives;
"Lots of small" files are typically several orders of magnitude slower (copying lots of C source-code files is often kB/sec territory, *).
*: Though, likely more because of Defender and AVG getting all up in this crap than anything to do with the drives proper...
Sadly, whatever intrinsic speed the drives may have is mostly rendered moot when lots of small files are involved (where the speed of Defender and AVG becomes the dominant factor).
For many of these cases, overhead is more likely a bigger limiting factor than IO (with raw IO speed only really mattering for large files in this case).
....