>A fine rendition of why this should be in HW as an instruction.
This is because, in some cases, the overhead of copying the last
(sz&31) bytes is significant, e.g.:
  rsz=cte-ct;    /* bytes left after the bulk loop, 0..31 */
  if(rsz)
  {
    if(rsz&16)   /* two 64-bit words */
    {
      v0=((u64 *)cs)[0]; v1=((u64 *)cs)[1];
      ((u64 *)ct)[0]=v0; ((u64 *)ct)[1]=v1;
      cs+=16; ct+=16;
    }
    if(rsz&8)    /* one 64-bit word */
    {
      v0=((u64 *)cs)[0];
      ((u64 *)ct)[0]=v0;
      cs+=8; ct+=8;
    }
    if(rsz&4)    /* one 32-bit word */
    {
      v0=((u32 *)cs)[0];
      ((u32 *)ct)[0]=v0;
      cs+=4; ct+=4;
    }
    if(rsz&2)    /* one 16-bit word */
    {
      v0=((u16 *)cs)[0];
      ((u16 *)ct)[0]=v0;
      cs+=2; ct+=2;
    }
    if(rsz&1)    /* final byte */
    {
      v0=((byte *)cs)[0];
      ((byte *)ct)[0]=v0;
      cs++; ct++;
    }
  }
>
For small copies with awkward sizes, this tail handling can cost more
than all the rest of the copy combined.
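
For anyone wanting to poke at this, here is a minimal self-contained sketch of the same tail ladder. The names copy_tail and copy_small are hypothetical, not from the code above, and constant-size memcpy calls are swapped in for the pointer casts so the sketch does not depend on alignment or aliasing behavior; compilers commonly turn those into single loads and stores, but that is an assumption about the toolchain, not a guarantee. Note that a 31-byte tail takes all five branches.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the final rsz (0..31) bytes.
 * Each set bit of rsz selects one fixed-size chunk, largest first. */
static void copy_tail(unsigned char *ct, const unsigned char *cs, size_t rsz)
{
    if (rsz & 16) { memcpy(ct, cs, 16); cs += 16; ct += 16; }
    if (rsz & 8)  { memcpy(ct, cs, 8);  cs += 8;  ct += 8;  }
    if (rsz & 4)  { memcpy(ct, cs, 4);  cs += 4;  ct += 4;  }
    if (rsz & 2)  { memcpy(ct, cs, 2);  cs += 2;  ct += 2;  }
    if (rsz & 1)  { *ct = *cs; }
}

/* Hypothetical wrapper: bulk copy in 32-byte blocks, then the tail.
 * Source and destination are assumed not to overlap. */
static void copy_small(void *dst, const void *src, size_t sz)
{
    unsigned char *ct = dst;
    const unsigned char *cs = src;
    unsigned char *cte = ct + sz;

    while ((size_t)(cte - ct) >= 32) {
        memcpy(ct, cs, 32);
        cs += 32; ct += 32;
    }
    copy_tail(ct, cs, (size_t)(cte - ct));
}
```

With something like copy_small(dst, src, 33), the bulk loop runs once and the tail then takes only the final-byte branch, whereas copy_small(dst, src, 31) skips the loop entirely and spends all its time in the ladder.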