TENSOR CORE PERFORMANCE: THE ULTIMATE GUIDE
Author: Internet
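1. An interesting point: throughput (TFLOPS) is better when the batch size is evenly divisible by 108, because the A100 has 108 SMs (the streaming multiprocessors that contain the tensor cores). When the work divides evenly across the SMs, no SM sits idle in the final wave.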
See the reference for details.
Reference:
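
The divisibility effect can be illustrated with a small back-of-the-envelope model. This is only a sketch under simplifying assumptions that are not taken from the slides: each unit of work (a "tile") occupies one SM for one wave, and all tiles take the same time. The function name `wave_efficiency` and the sample tile counts are hypothetical, chosen for illustration.

```python
import math

A100_SM_COUNT = 108  # SM count of the A100, as stated in the note above


def wave_efficiency(num_tiles: int, sm_count: int = A100_SM_COUNT) -> float:
    """Fraction of SM slots doing useful work.

    Assumption (illustrative only): one tile per SM per wave, and every
    tile takes the same amount of time.
    """
    if num_tiles <= 0:
        raise ValueError("num_tiles must be positive")
    # Number of full or partial waves needed to process all tiles.
    waves = math.ceil(num_tiles / sm_count)
    # Useful slots divided by all slots that were scheduled.
    return num_tiles / (waves * sm_count)


if __name__ == "__main__":
    for tiles in (108, 109, 216, 220):
        print(f"tiles={tiles:4d}  efficiency={wave_efficiency(tiles):.3f}")
```

Under this model, 108 or 216 tiles fill every wave completely, while 109 tiles need a second wave that keeps only one SM busy, so roughly half of the scheduled SM time is wasted. Real kernels are more complicated (tile sizes, occupancy, and several thread blocks per SM all matter), so treat the numbers as intuition rather than a prediction.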
https://developer.download.nvidia.cn/video/gputechconf/gtc/2020/presentations/s21929-tensor-core-performance-on-nvidia-gpus-the-ultimate-guide.pdf
Source: https://www.cnblogs.com/simpleminds/p/16334552.html