CUDA Study Notes
Author: Internet
This post collects assorted notes on CUDA C.
The fast way to query device properties
Some textbooks and articles still call cudaGetDeviceProperties() to read device properties. For more advanced use, NVIDIA provides this function:
cudaDeviceGetAttribute();
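How it works
cudaGetDeviceProperties() fills in every property of the device, even though in many cases we only need one or two of them. cudaDeviceGetAttribute() returns only the single attribute the caller asks for. As a result, the cost of the two calls can differ by several orders of magnitude: on the order of nanoseconds versus milliseconds.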
Usage
__host__ __device__ cudaError_t cudaDeviceGetAttribute ( int* value, cudaDeviceAttr attr, int device )
Parameters:
value
- Returned device attribute value
attr
- Device attribute to query
device
- Device number to query
cudaDeviceAttr
The enumeration of CUDA device attributes. This is the type of the second parameter.
For example, to find out the maximum number of threads a block can hold:
int deviceId;
cudaGetDevice(&deviceId);  // ID of the device currently in use
int threadsPerBlock;
cudaDeviceGetAttribute(&threadsPerBlock, cudaDevAttrMaxThreadsPerBlock, deviceId);
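At the time of writing, cudaDeviceAttr defines 115 different values (the count changes between CUDA versions). The first twenty are listed below.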
cudaDevAttrMaxThreadsPerBlock = 1
Maximum number of threads per block
cudaDevAttrMaxBlockDimX = 2
Maximum block dimension X
cudaDevAttrMaxBlockDimY = 3
Maximum block dimension Y
cudaDevAttrMaxBlockDimZ = 4
Maximum block dimension Z
cudaDevAttrMaxGridDimX = 5
Maximum grid dimension X
cudaDevAttrMaxGridDimY = 6
Maximum grid dimension Y
cudaDevAttrMaxGridDimZ = 7
Maximum grid dimension Z
cudaDevAttrMaxSharedMemoryPerBlock = 8
Maximum shared memory available per block in bytes
cudaDevAttrTotalConstantMemory = 9
Memory available on device for __constant__ variables in a CUDA C kernel in bytes
cudaDevAttrWarpSize = 10
Warp size in threads
cudaDevAttrMaxPitch = 11
Maximum pitch in bytes allowed by memory copies
cudaDevAttrMaxRegistersPerBlock = 12
Maximum number of 32-bit registers available per block
cudaDevAttrClockRate = 13
Peak clock frequency in kilohertz
cudaDevAttrTextureAlignment = 14
Alignment requirement for textures
cudaDevAttrGpuOverlap = 15
Device can possibly copy memory and execute a kernel concurrently
cudaDevAttrMultiProcessorCount = 16
Number of multiprocessors on device
cudaDevAttrKernelExecTimeout = 17
Specifies whether there is a run time limit on kernels
cudaDevAttrIntegrated = 18
Device is integrated with host memory
cudaDevAttrCanMapHostMemory = 19
Device can map host memory into CUDA address space
cudaDevAttrComputeMode = 20
Compute mode
References
https://developer.nvidia.com/blog/cuda-pro-tip-the-fast-way-to-query-device-properties/
https://docs.nvidia.com/cuda/cuda-runtime-api/
Source: https://blog.csdn.net/x786695720/article/details/114671680