Sujet : Re: Fast division (was Re: Suggested method for returning a string from a C program?)
De : antispam (at) *nospam* fricas.org (Waldek Hebisch)
Groupes : comp.lang.cDate : 26. Mar 2025, 03:37:26
Autres entêtes
Organisation : To protect and to server
Message-ID : <vrvp94$3q1ah$1@paganini.bofh.team>
References : 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
User-Agent : tin/2.6.2-20221225 ("Pittyvaich") (Linux/6.1.0-9-amd64 (x86_64))
Janis Papanagnou <janis_papanagnou+
ng@hotmail.com> wrote:
On 22.03.2025 15:07, Waldek Hebisch wrote:
Actually, to do fast division of N-bit number by fixed N-bit number
one need 2N-bit multiplication.
I just stumbled across your post and above sentence. Do you mean *one*
multiplication of 2N bit numbers? - Could you please explain that (by
an example, or could you provide a reference)?
One multiplications with 2N bit result + few other operations
like shifts and additions. Consider for example:
unsigned int
divv(unsigned int a) {
return a/1234567;
}
My gcc-12 at -O generates the following assembly code:
divv:
.LFB0:
.cfi_startproc
movl %edi, %eax
movl $3000869427, %edx
imulq %rdx, %rax
shrq $32, %rax
subl %eax, %edi
shrl %edi
addl %edi, %eax
shrl $20, %eax
ret
So 1 multiplication, 3 shifts, subtraction and addtion. When
more bits of multiplication are available one can use smaller
number of extra operations. For example, when I change
function above so that argument is 16-bit and divisor is
12345, the code is:
divv:
.LFB0:
.cfi_startproc
movzwl %di, %eax
imull $43489, %eax, %eax
shrl $29, %eax
ret
So just 1 multiplication and 1 shift.
The idea beside this is quite simple: instead of division we
multiply by reciprocal. Reciprocal is represented in fixed
point aritihemtic (so normally there is at least one shift to get
binary point in right place). Since divisor is fixed reciprocal can
be precomputed. This works best when there is enough accuracy,
otherwise one needs to add extra steps. Working out how
many bits of accuracy are needed is a bit tricky, in particular
by adding extra operations one can lower needed accuracy.
(The reason for my question is that for integer divisions of length N
an old DSP I used required besides shifts effectively N subtractions
to create the result and modulus; it didn't use any multiplications.)
Method above is good when you have fast and wide multiplier (compared
to your numbers). Also there is cost of precomputation,
to gain divisor must be fixed or at least change slowly.
If you have varying divisor then one can use Newton method
(IIUC this is what modern CPU-s use). Shifts and subtractions
are good when you do not have fast multiplier.
-- Waldek Hebisch