Subject : Re: else ladders practice
From : bc (at) *nospam* freeuk.com (Bart)
Newsgroups : comp.lang.c
Date : 23. Nov 2024, 00:30:31
Organisation : A noiseless patient Spider
Message-ID : <vhr46m$1cre9$1@dont-email.me>
References : 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
User-Agent : Mozilla Thunderbird
On 22/11/2024 19:29, Waldek Hebisch wrote:
> Bart <bc@freeuk.com> wrote:
>> On 22/11/2024 12:33, Waldek Hebisch wrote:
>> But, OK, here's the first sizeable benchmark that I thought of (I can't
>> find a reliable Dhrystone one; perhaps you can post a link).
> First Google hit for Dhrystone 2.2a
> https://homepages.cwi.nl/~steven/dry.c
> (I used this one).
There was no shortage of them; if anything there were too many. All seemed to need some Linux script to compile them, and all needed Linux anyway because only that has sys/times.h.
I eventually found one for Windows, but that goes to the other extreme and needs CL (MSVC) with these options:
cl /O2 /D "WIN32" /D "_DEBUG" /D "_CONSOLE" /D "_MBCS" /MD /W4 /Wp64 /Zi /TP /EHsc /Fa /c dhry264.c dhry_264.c
Plus it uses various ASM routines written in MASM syntax. I was partway through getting it to work with my compiler when I saw your post.
Your version is much simpler to get going, but still not straightforward because of 'gettimeofday', which is available via gcc, but is not exported by msvcrt, which is what tcc and my product use.
I changed it to use clock().
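A minimal sketch of that kind of change, not the actual edit made to the benchmark. The helper name elapsed_ms is hypothetical. It assumes only standard C: clock() and CLOCKS_PER_SEC from <time.h>. Note that CLOCKS_PER_SEC is 1000 with msvcrt but 1000000 on POSIX systems, and that clock() measures elapsed time on Windows but CPU time on POSIX, so the conversion below is what keeps the unit consistent:

```c
#include <time.h>

/* Convert two clock() readings to milliseconds.
   Dividing by CLOCKS_PER_SEC makes the result independent of the
   tick rate the C library happens to use. */
double elapsed_ms(clock_t start, clock_t end)
{
    return (double)(end - start) * 1000.0 / CLOCKS_PER_SEC;
}

/* Typical use around the code being measured:
 *
 *     clock_t t0 = clock();
 *     ... run the benchmark loop ...
 *     printf("Time: %.0f ms\n", elapsed_ms(t0, clock()));
 */
```

The only thing this replaces is the pair of gettimeofday calls; the benchmark body itself stays untouched.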
The results then are like this (I tried two sizes of matrix element):

              uint32_t   uint64_t
  gcc -O0       2165       2180    msec
  gcc -O3        282        470
  tcc           2572       2509
  cc            2165       2243
  mcc -opt       720        720
The mcc product keeps some local variables in registers, a minor optimisation I will apply to cc in due course. It's not a priority, since usually it makes little difference on real applications. Only on benchmarks like this.
gcc -O3 seems to enable some SIMD instructions, but only for u32. With u64 elements, gcc -O3 is only about 50% faster than my compiler (470 vs 720 msec).
If I try -march=native, then the 282 sometimes gets down to 235, and the 470 to 420.
(When functions like this were needed in my programs during the 80s and 90s, I used inline assembly. Most code wasn't that critical.)
> - most of code is portable, but for timing we need timer with
> sufficient resolution, so I use Unix 'gettimeofday'.
Why? Just make the task take long enough.
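One way to do that, sketched under assumptions: keep doubling the repetition count until a single timed run lasts long enough that the coarse resolution of clock() stops mattering. The names calibrate_reps and sample_work are hypothetical, the workload is a stand-in rather than the benchmark, and the safety cap is an arbitrary choice:

```c
#include <time.h>

/* Hypothetical stand-in workload; 'volatile' keeps the loop from
   being optimised away. */
static volatile unsigned long sink;

void sample_work(void)
{
    for (unsigned long i = 0; i < 100000; i++)
        sink += i;
}

/* Double 'reps' until one timed run takes at least min_ms, or until
   an arbitrary safety cap is reached. Returns the repetition count
   and stores the elapsed time of the final run in *ms_out. */
long calibrate_reps(void (*work)(void), double min_ms, double *ms_out)
{
    const long cap = 1L << 30;
    long reps = 1;

    for (;;) {
        clock_t t0 = clock();
        for (long i = 0; i < reps; i++)
            work();
        double ms = (double)(clock() - t0) * 1000.0 / CLOCKS_PER_SEC;

        if (ms >= min_ms || reps >= cap) {
            *ms_out = ms;
            return reps;
        }
        reps *= 2;
    }
}
```

With a run of a second or more, a timer that only ticks in milliseconds contributes well under 1% error, which is the point being made above.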
BTW I also ported your program to my 'M' language. The timing however was about the same as mcc -opt.
The source is below if you're interested.
-------------------------------
type T=u32

proc inner_mul(ref[0:]T x, y, z, int xdeg, ydeg, zdeg, p) =
    u64 ss

    if ydeg<xdeg then
        swap(x, y)
        swap(xdeg, ydeg)
    fi

    xdeg min:=zdeg
    ydeg min:=zdeg

    for i in 0..xdeg do
        ss:=z[i]
        for j in 0..i do
            ss +:=u64(x[i-j]) * u64(y[j])
        od
        z[i]:=ss rem p
    od

    for i in xdeg+1..ydeg do
        ss:=z[i]
        for j in 0..xdeg do
            ss +:=u64(x[j]) * u64(y[i-j])
        od
        z[i]:=ss rem p
    od

    for i in ydeg+1..zdeg do
        ss:=z[i]
        for j in i-xdeg .. ydeg do
            ss +:=u64(x[i-j]) * u64(y[j])
        od
        z[i]:=ss rem p
    od
end

proc main=
    [0:85]T x, y, z
    int tv1, tv2

    for i in x.bounds do
        x[i]:=y[i]:=1
    od

    tv1:=clock()
    to 100'000 do
        for i in 0..168 do
            z[i]:=1
        od
        inner_mul(&x,&y,&z, 84, 84, 168, 1'000'003)
    od
    tv2:=clock()

    for i in 0..11 do
        print z[i], $
    od
    println
    println "Time:",tv2-tv1,"ms"
end
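
For readers who don't know M, here is a rough C transliteration of the procedure above. It is a sketch written from the M source, not the original C program from the earlier post. It assumes that M's `a..b` ranges are inclusive, that `min:=` clamps to the smaller value, and that `rem` is the C `%` operator. The helper run_once is hypothetical. It sizes z at 169 entries because the loops index z[0..168], whereas the M source declares z alongside x and y.

```c
#include <stdint.h>

typedef uint32_t T;   /* element type; uint64_t was the other size tried */

/* Add the product of polynomials x and y into z, truncated at degree
   zdeg, reducing each coefficient mod p. */
void inner_mul(const T *x, const T *y, T *z,
               int xdeg, int ydeg, int zdeg, uint64_t p)
{
    uint64_t ss;

    if (ydeg < xdeg) {                 /* make x the shorter operand */
        const T *tp = x; x = y; y = tp;
        int td = xdeg; xdeg = ydeg; ydeg = td;
    }
    if (xdeg > zdeg) xdeg = zdeg;      /* xdeg min:= zdeg */
    if (ydeg > zdeg) ydeg = zdeg;      /* ydeg min:= zdeg */

    for (int i = 0; i <= xdeg; i++) {
        ss = z[i];
        for (int j = 0; j <= i; j++)
            ss += (uint64_t)x[i - j] * (uint64_t)y[j];
        z[i] = (T)(ss % p);
    }
    for (int i = xdeg + 1; i <= ydeg; i++) {
        ss = z[i];
        for (int j = 0; j <= xdeg; j++)
            ss += (uint64_t)x[j] * (uint64_t)y[i - j];
        z[i] = (T)(ss % p);
    }
    for (int i = ydeg + 1; i <= zdeg; i++) {
        ss = z[i];
        for (int j = i - xdeg; j <= ydeg; j++)
            ss += (uint64_t)x[i - j] * (uint64_t)y[j];
        z[i] = (T)(ss % p);
    }
}

/* Hypothetical helper: one pass of the benchmark body with the same
   constants as main above. z must have room for 169 entries. */
void run_once(T *z)
{
    T x[85], y[85];
    for (int i = 0; i < 85; i++)
        x[i] = y[i] = 1;
    for (int i = 0; i <= 168; i++)
        z[i] = 1;
    inner_mul(x, y, z, 84, 84, 168, 1000003);
}
```

With all-ones inputs and z preset to 1, each z[i] ends up as 1 plus the number of terms contributing to coefficient i, so the first twelve values are 2 through 13.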