c – 为什么valarray这么慢?
作者:互联网
我正在尝试使用valarray,因为它在操作矢量和矩阵时非常类似于MATLAB.我首先做了一些性能检查,发现valarray无法达到Stroustrup在C++ programming language书中声明的性能.
测试程序实际上做了500万倍的双倍.我认为c = a * b至少可以与for循环双重型元素乘法相媲美,但我完全错了.我尝试了几台计算机和Microsoft Visual C 6.0和Visual Studio 2008.
顺便说一句,我使用以下代码在MATLAB上测试:
len = 5*1024*1024;
a = rand(len, 1);
b = rand(len, 1);
c = zeros(len, 1);
tic;
c = a.*b;
toc;
结果是46毫秒.这个时间精度不高;它只作为参考.
代码是:
#include <iostream>
#include <valarray>
#include <iostream>
#include "windows.h"
using namespace std;
SYSTEMTIME stime;
LARGE_INTEGER sys_freq;
double gettime_hp();
int main()
{
enum { N = 5*1024*1024 };
valarray<double> a(N), b(N), c(N);
QueryPerformanceFrequency(&sys_freq);
int i, j;
for (j=0 ; j<8 ; ++j)
{
for (i=0 ; i<N ; ++i)
{
a[i] = rand();
b[i] = rand();
}
double* a1 = &a[0], *b1 = &b[0], *c1 = &c[0];
double dtime = gettime_hp();
for (i=0 ; i<N ; ++i)
c1[i] = a1[i] * b1[i];
dtime = gettime_hp()-dtime;
cout << "double operator* " << dtime << " ms\n";
dtime = gettime_hp();
c = a*b ;
dtime = gettime_hp() - dtime;
cout << "valarray operator* " << dtime << " ms\n";
dtime = gettime_hp();
for (i=0 ; i<N ; ++i)
c[i] = a[i] * b[i];
dtime = gettime_hp() - dtime;
cout << "valarray[i] operator* " << dtime<< " ms\n";
cout << "------------------------------------------------------\n";
}
}
double gettime_hp()
{
LARGE_INTEGER tick;
extern LARGE_INTEGER sys_freq;
QueryPerformanceCounter(&tick);
return (double)tick.QuadPart * 1000.0 / sys_freq.QuadPart;
}
运行结果:(具有最大速度优化的释放模式)
double operator* 52.3019 ms
valarray operator* 128.338 ms
valarray[i] operator* 43.1801 ms
------------------------------------------------------
double operator* 43.4036 ms
valarray operator* 145.533 ms
valarray[i] operator* 44.9121 ms
------------------------------------------------------
double operator* 43.2619 ms
valarray operator* 158.681 ms
valarray[i] operator* 43.4871 ms
------------------------------------------------------
double operator* 42.7317 ms
valarray operator* 173.164 ms
valarray[i] operator* 80.1004 ms
------------------------------------------------------
double operator* 43.2236 ms
valarray operator* 158.004 ms
valarray[i] operator* 44.3813 ms
------------------------------------------------------
具有相同优化的调试模式:
double operator* 41.8123 ms
valarray operator* 201.484 ms
valarray[i] operator* 41.5452 ms
------------------------------------------------------
double operator* 40.2238 ms
valarray operator* 215.351 ms
valarray[i] operator* 40.2076 ms
------------------------------------------------------
double operator* 40.5859 ms
valarray operator* 232.007 ms
valarray[i] operator* 40.8803 ms
------------------------------------------------------
double operator* 40.9734 ms
valarray operator* 234.325 ms
valarray[i] operator* 40.9711 ms
------------------------------------------------------
double operator* 41.1977 ms
valarray operator* 234.409 ms
valarray[i] operator* 41.1429 ms
------------------------------------------------------
double operator* 39.7754 ms
valarray operator* 234.26 ms
valarray[i] operator* 39.6338 ms
------------------------------------------------------
解决方法:
我怀疑c = a * b的原因比执行一次一个元素的操作慢得多
template<class T> valarray<T> operator*
(const valarray<T>&, const valarray<T>&);
运算符必须分配内存以将结果放入,然后按值返回.
即使使用“交换优化”来执行复制,该功能仍然具有开销
>为生成的valarray分配新块
>初始化新的valarray(这可能会被优化掉)
>将结果放入新的valarray中
>在内存中为新valarray进行分页,因为它已初始化或使用结果值进行设置
>解除分配由结果替换的旧valarray
标签:valarray,c 来源: https://codeday.me/bug/20191003/1850768.html