Tensor Core
Author: Internet
References:
https://forums.developer.nvidia.com/t/how-to-use-wmma-efficiently/157619/2
https://github.com/BigNerd95/CUDASamples/tree/master/samples/0_Simple/cudaTensorCoreGemm
(Setting WARPS_PER_BLOCK to 4 is enough to reach close to 100 TFLOPS. In general, 100 TFLOPS is already fairly good performance: about 80% of peak.)
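The arithmetic behind that note can be sketched on the host side. This is a minimal sketch under several assumptions, not the sample's actual code: the warp size of 32 and the relation threads per block = warp size × WARPS_PER_BLOCK follow the usual CUDA convention, the helper names `threads_per_block`, `gemm_tflops`, and `fraction_of_peak` are hypothetical, and the 125 TFLOPS peak used in the usage note is only inferred from "100 TFLOPS is about 80% of peak", since the note does not name a GPU.

```cpp
#include <cstdint>

// Assumption: 32 threads per warp, as on current NVIDIA GPUs.
constexpr int WARP_SIZE = 32;

// Threads launched per block for a given WARPS_PER_BLOCK setting.
constexpr int threads_per_block(int warps_per_block) {
    return WARP_SIZE * warps_per_block;
}

// An m x n x k GEMM performs 2*m*n*k floating-point operations
// (one multiply and one add per inner-product term).
// Returns the achieved throughput in TFLOPS for a measured time in ms.
constexpr double gemm_tflops(std::int64_t m, std::int64_t n, std::int64_t k,
                             double elapsed_ms) {
    return 2.0 * m * n * k / (elapsed_ms / 1000.0) / 1e12;
}

// Achieved throughput as a fraction of the hardware peak (same units).
constexpr double fraction_of_peak(double achieved_tflops, double peak_tflops) {
    return achieved_tflops / peak_tflops;
}
```

For example, `threads_per_block(4)` gives 128 threads per block. A hypothetical 4096 × 4096 × 4096 GEMM that finishes in about 1.374 ms works out to roughly 100 TFLOPS from `gemm_tflops`, and `fraction_of_peak(100.0, 125.0)` gives 0.8 under the assumed peak.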
Source: https://www.cnblogs.com/simpleminds/p/16313068.html