c++ - Why is valarray so slow?

Author: Internet

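I am trying to use valarray because it works much like MATLAB when operating on vectors and matrices. I first ran some performance checks and found that valarray does not reach the performance Stroustrup claims for it in his book The C++ Programming Language.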

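The test program performs about 5 million (5*1024*1024) multiplications of doubles. I expected c = a * b to be at least comparable to a for loop that multiplies the double elements one by one, but I was completely wrong. I tried several computers, with both Microsoft Visual C++ 6.0 and Visual Studio 2008.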

By the way, I ran the same test in MATLAB using the following code:

len = 5*1024*1024;
a = rand(len, 1);
b = rand(len, 1);
c = zeros(len, 1);
tic;
c = a.*b;
toc;

The result is 46 ms. This timing is not very precise; it serves only as a reference.

The code is:

#include <cstdlib>   // rand
#include <iostream>
#include <valarray>
#include <windows.h> // QueryPerformanceCounter, QueryPerformanceFrequency

using namespace std;
SYSTEMTIME stime;
LARGE_INTEGER sys_freq;

double gettime_hp();

int main()
{
    enum { N = 5*1024*1024 };
    valarray<double> a(N), b(N), c(N);
    QueryPerformanceFrequency(&sys_freq);
    int i, j;
    for (j=0 ; j<8 ; ++j)
    {
        for (i=0 ; i<N ; ++i)
        {
            a[i] = rand();
            b[i] = rand();
        }

        double* a1 = &a[0], *b1 = &b[0], *c1 = &c[0];
        double dtime = gettime_hp();
        for (i=0 ; i<N ; ++i)
            c1[i] = a1[i] * b1[i];
        dtime = gettime_hp()-dtime;
        cout << "double operator* " << dtime << " ms\n";

        dtime = gettime_hp();
        c = a*b ;
        dtime = gettime_hp() - dtime;
        cout << "valarray operator* " << dtime << " ms\n";

        dtime = gettime_hp();
        for (i=0 ; i<N ; ++i)
            c[i] = a[i] * b[i];
        dtime = gettime_hp() - dtime;
        cout << "valarray[i] operator* " << dtime<< " ms\n";

        cout << "------------------------------------------------------\n";
    }
}

double gettime_hp()
{
    LARGE_INTEGER tick;
    extern LARGE_INTEGER sys_freq;
    QueryPerformanceCounter(&tick);
    return (double)tick.QuadPart * 1000.0 / sys_freq.QuadPart;
}
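The timing above relies on the Windows-only QueryPerformanceCounter. As a rough sketch, assuming a C++11 compiler (newer than the ones used in the question), the same measurement could be taken portably with std::chrono::steady_clock. The helper names now_ms and time_valarray_multiply are made up for this illustration and do not appear in the original program.

```cpp
#include <chrono>
#include <valarray>

// Hypothetical helper (not from the original post): current time in
// milliseconds, read from a monotonic clock.
double now_ms()
{
    using clock = std::chrono::steady_clock;
    const std::chrono::duration<double, std::milli> t =
        clock::now().time_since_epoch();
    return t.count();
}

// Hypothetical helper: times one whole-array multiply, c = a * b,
// and returns the elapsed time in milliseconds.
double time_valarray_multiply(const std::valarray<double>& a,
                              const std::valarray<double>& b,
                              std::valarray<double>& c)
{
    const double start = now_ms();
    c = a * b;
    return now_ms() - start;
}
```

Calling time_valarray_multiply(a, b, c) in place of the middle timing section of the loop should give a comparable figure on non-Windows platforms, though the absolute numbers will depend on the compiler and standard library.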

Results (Release mode with maximum speed optimization):

double operator* 52.3019 ms
valarray operator* 128.338 ms
valarray[i] operator* 43.1801 ms
------------------------------------------------------
double operator* 43.4036 ms
valarray operator* 145.533 ms
valarray[i] operator* 44.9121 ms
------------------------------------------------------
double operator* 43.2619 ms
valarray operator* 158.681 ms
valarray[i] operator* 43.4871 ms
------------------------------------------------------
double operator* 42.7317 ms
valarray operator* 173.164 ms
valarray[i] operator* 80.1004 ms
------------------------------------------------------
double operator* 43.2236 ms
valarray operator* 158.004 ms
valarray[i] operator* 44.3813 ms
------------------------------------------------------

Debug mode with the same optimization:

double operator* 41.8123 ms
valarray operator* 201.484 ms
valarray[i] operator* 41.5452 ms
------------------------------------------------------
double operator* 40.2238 ms
valarray operator* 215.351 ms
valarray[i] operator* 40.2076 ms
------------------------------------------------------
double operator* 40.5859 ms
valarray operator* 232.007 ms
valarray[i] operator* 40.8803 ms
------------------------------------------------------
double operator* 40.9734 ms
valarray operator* 234.325 ms
valarray[i] operator* 40.9711 ms
------------------------------------------------------
double operator* 41.1977 ms
valarray operator* 234.409 ms
valarray[i] operator* 41.1429 ms
------------------------------------------------------
double operator* 39.7754 ms
valarray operator* 234.26 ms
valarray[i] operator* 39.6338 ms
------------------------------------------------------

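Solution:

I suspect that the reason c = a * b is so much slower than performing the operation one element at a time is that the

template<class T> valarray<T> operator*
    (const valarray<T>&, const valarray<T>&);

operator must allocate memory to put the result into, and then return it by value.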

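Even if a "swaptimization" is used to perform the copy, that function still has the overhead of

> allocating a new block for the resulting valarray
> initializing the new valarray (this might be optimized away)
> putting the results into the new valarray
> paging in the memory for the new valarray as it is initialized or set with the result values
> deallocating the old valarray that gets replaced by the result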
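The steps listed above can be sketched in code. This is only an illustration under stated assumptions: naive_multiply is a hypothetical stand-in for what a straightforward operator* might do, not the actual Visual C++ library implementation, and multiply_into is a hypothetical alternative that uses the standard compound assignment operator*= so the destination's existing storage can be reused.

```cpp
#include <cassert>
#include <cstddef>
#include <valarray>

// Hypothetical stand-in for a straightforward operator* (an assumption,
// not the real library code). Each comment maps to a cost listed above.
std::valarray<double> naive_multiply(const std::valarray<double>& x,
                                     const std::valarray<double>& y)
{
    assert(x.size() == y.size());
    std::valarray<double> result(x.size()); // new block, zero-initialized
    for (std::size_t i = 0; i < x.size(); ++i)
        result[i] = x[i] * y[i];            // results written to fresh memory
    return result;                          // returned by value to the caller
}

// Hypothetical alternative: compute x * y into storage that out already
// owns. out must not be the same object as y, because out is overwritten
// with x before y is read.
void multiply_into(const std::valarray<double>& x,
                   const std::valarray<double>& y,
                   std::valarray<double>& out)
{
    assert(x.size() == y.size());
    assert(&out != &y);
    if (out.size() != x.size())
        out.resize(x.size());
    out = x;   // element-wise copy into out
    out *= y;  // in-place element-wise multiply
}
```

Whether multiply_into is actually faster than c = a * b is something to measure rather than assume: it avoids the temporary but makes two passes over the data, and some standard library implementations already avoid the temporary internally.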

Tags: valarray, c++
Source: https://codeday.me/bug/20191003/1850768.html