c - Repeated integer division by a runtime constant value

Author: Internet

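At some point in my program I compute an integer divisor d. From that point onward, d stays constant.

Later in the code I divide by that d many times. Because the value of d is not known at compile time, each of these is an actual integer division.

Given that integer division is relatively slow compared to other kinds of integer arithmetic, I would like to optimize it. Is there some alternative format I could store d in, so that the division would run faster? Maybe a reciprocal of some form?

I do not need the value of d for anything else.

The value of d can be any 64-bit integer, but it usually fits comfortably in 32 bits.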

Solution:

There is a library for this, libdivide:

libdivide is an open source library for optimizing integer division

libdivide allows you to replace expensive integer divides with
comparatively cheap multiplication and bitshifts. Compilers usually do
this, but only when the divisor is known at compile time. libdivide
allows you to take advantage of it at runtime. The result is that
integer division can become faster – a lot faster. Furthermore,
libdivide allows you to divide an SSE2 vector by a runtime constant,
which is especially nice because SSE2 has no integer division
instructions!

libdivide is free and open source with a permissive license. The name
“libdivide” is a bit of a joke, as there is no library per se: the
code is packaged entirely as a single header file, with both a C and a
C++ API.

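You can read about the algorithm behind it on the blog; for example, this entry.

Basically, the algorithm behind it is the same one compilers use to optimize division by a constant, except that it allows these strength-reduction optimizations to be performed at run time.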
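To make the strength reduction concrete, here is a minimal sketch of what a compiler typically does for an unsigned 32-bit division by the compile-time constant 10. The function name `div10` is made up for illustration; the magic number 0xCCCCCCCD is ceil(2^35 / 10).

```cpp
#include <cstdint>

// Division by the constant 10 expressed as one widening multiply and one
// shift, which is roughly what an optimizing compiler emits for `x / 10`
// on an unsigned 32-bit x.
std::uint32_t div10(std::uint32_t x) {
    // 0xCCCCCCCD == ceil(2^35 / 10); the product needs 64 bits.
    std::uint64_t wide = static_cast<std::uint64_t>(x) * 0xCCCCCCCDull;
    return static_cast<std::uint32_t>(wide >> 35);
}
```

The compiler can only pick the multiplier and shift because it sees the literal 10. With a divisor computed at run time, the same two values have to be derived once in your own code and then reused for every division.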

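Note: you can create an even faster version of libdivide. The idea is that for every divisor you can always construct a triple (mul / add / shift) such that this expression gives the result: (num * mul + add) >> shift (the multiplication here is a wide multiplication). Interestingly, in several microbenchmarks this method beat the compiler-generated code for division by a constant!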
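As an illustration of the triple, the sketch below applies hand-computed values for the divisors 3 and 7. The names `applyTriple`, `divideBy3` and `divideBy7` are hypothetical, and the sketch assumes the convention of the listing further down, where the high 32 bits of the product are taken first and `shift` is applied afterwards (so the total shift is 32 + shift).

```cpp
#include <cstdint>

// Hypothetical helper: applies a precomputed (mul, add, shift) triple to a
// 32-bit numerator. The product is formed in 64 bits (the "wide
// multiplication"); the sum cannot overflow 64 bits for 32-bit operands.
std::uint32_t applyTriple(std::uint32_t num, std::uint32_t mul,
                          std::uint32_t add, int shift) {
    std::uint64_t wide = static_cast<std::uint64_t>(num) * mul + add;
    return static_cast<std::uint32_t>(wide >> 32) >> shift;
}

// Triples worked out by hand for two divisors:
//   d = 3: mul = 0xAAAAAAAB (2^33 / 3 rounded up),   add = 0,   shift = 1
//   d = 7: mul = 0x92492492 (2^34 / 7 rounded down), add = mul, shift = 2
std::uint32_t divideBy3(std::uint32_t num) {
    return applyTriple(num, 0xAAAAAAABu, 0u, 1);
}

std::uint32_t divideBy7(std::uint32_t num) {
    return applyTriple(num, 0x92492492u, 0x92492492u, 2);
}
```

The divisor 3 gets by with a rounded-up multiplier and no addend, while 7 needs the rounded-down multiplier plus an addend equal to it. That is why the triple carries an `add` field at all.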

Here is my implementation (it will not compile out of the box, but it shows the general algorithm):

struct Divider_u32 {
    u32 mul;
    u32 add;
    s32 shift;

    void set(u32 divider);
};

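void Divider_u32::set(u32 divider) {
    // l = floor(log2(divider)); divider must be nonzero
    s32 l = indexOfMostSignificantBit(divider);
    if (divider&(divider-1)) {
        // not a power of two: start from mul = floor(2^(32+l) / divider)
        u64 m = static_cast<u64>(1)<<(l+32);
        mul = static_cast<u32>(m/divider);

        // rem = m % divider (the low 32 bits of m are zero, so the
        // wrapping 32-bit subtraction yields the true remainder)
        u32 rem = static_cast<u32>(m)-mul*divider;
        u32 e = divider-rem;

        if (e<static_cast<u32>(1)<<l) {
            // rounding the multiplier up is accurate for every 32-bit input
            mul++;
            add = 0;
        } else {
            // otherwise keep it rounded down and compensate: (v+1)*mul
            add = mul;
        }
        shift = l;
    } else {
        if (divider==1) {
            // ((v+1)*0xffffffff)>>32 == v
            mul = 0xffffffff;
            add = 0xffffffff;
            shift = 0;
        } else {
            // power of two: mul halves v, the shift does the rest
            mul = 0x80000000;
            add = 0;
            shift = l-1;
        }
    }
}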

u32 operator/(u32 v, const Divider_u32 &div) {
    // the 64-bit product plus add cannot overflow; keep the high 32 bits, then shift
    u32 t = static_cast<u32>((static_cast<u64>(v)*div.mul+div.add)>>32)>>div.shift;

    return t;
}
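The listing relies on a few names it never defines. Below is a minimal sketch of them, assuming that `u32`, `s32` and `u64` are fixed-width aliases and that `indexOfMostSignificantBit` returns floor(log2(v)). Both are guesses based on how the listing uses them, not the author's own definitions.

```cpp
#include <cstdint>

// Assumed fixed-width aliases for the type names used in the listing.
typedef std::uint32_t u32;
typedef std::int32_t  s32;
typedef std::uint64_t u64;

// Assumed helper: zero-based index of the highest set bit, i.e.
// floor(log2(v)). For v == 0 it returns 0, which is not meaningful
// (a zero divisor has no valid triple anyway).
inline s32 indexOfMostSignificantBit(u32 v) {
    s32 index = 0;
    while (v >>= 1) {
        ++index;
    }
    return index;
}
```

With these definitions placed ahead of the listing, a call such as `Divider_u32 d; d.set(7); u32 q = 100 / d;` should compile. On GCC or Clang, the loop could likely be replaced by `31 - __builtin_clz(v)` for nonzero v.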

Tags: c, assembly, x86-64, optimization
Source: https://codeday.me/bug/20190930/1835764.html