Time Profiling
SINGA supports the time profiling of each of the operators buffered in the
graph. To utilize the time profiling function, we first call the
device.SetVerbosity
method to set the verbosity of the time profilier, and
then call the device.PrintTimeProfiling
to print out the results of time
profiling.
Setup the Time Profiling Verbosity
To use the time profiling function, we need to set the verbosity. There are
three levels of verbosity. With the default value verbosity == 0
, it will not
do any time profiling. When we set verbosity == 1
, it will profile the forward
and backward propagation time. When verbosity == 2
, it will profile the time
spent on every buffered operation in the graph.
The following is the example code to setup the time profiling function:
# create a device
from singa import device
dev = device.create_cuda_gpu()
# set the verbosity
verbosity = 2
dev.SetVerbosity(verbosity)
# optional: skip the first 5 iterations when profiling the time
dev.SetSkipIteration(5)
Then, after we have completed the training at the end of the program, we can
print the time profiling result by calling the device.PrintTimeProfiling
method:
dev.PrintTimeProfiling()
Example Outputs for Different Verbosity
We can run the ResNet example to see the output with different setting of verbosity:
verbosity == 1
Time Profiling:
Forward Propagation Time : 0.0409127 sec
Backward Propagation Time : 0.114813 sec
verbosity == 2
Time Profiling:
OP_ID0. SetValue : 1.73722e-05 sec
OP_ID1. cudnnConvForward : 0.000612724 sec
OP_ID2. GpuBatchNormForwardTraining : 0.000559449 sec
OP_ID3. ReLU : 0.000375004 sec
OP_ID4. GpuPoolingForward : 0.000240041 sec
OP_ID5. SetValue : 3.4176e-06 sec
OP_ID6. cudnnConvForward : 0.000115619 sec
OP_ID7. GpuBatchNormForwardTraining : 0.000150415 sec
OP_ID8. ReLU : 9.95494e-05 sec
OP_ID9. SetValue : 3.22432e-06 sec
OP_ID10. cudnnConvForward : 0.000648668 sec
OP_ID11. GpuBatchNormForwardTraining : 0.000149793 sec
OP_ID12. ReLU : 9.92118e-05 sec
OP_ID13. SetValue : 3.37728e-06 sec
OP_ID14. cudnnConvForward : 0.000400953 sec
OP_ID15. GpuBatchNormForwardTraining : 0.000572181 sec
OP_ID16. SetValue : 3.21312e-06 sec
OP_ID17. cudnnConvForward : 0.000398698 sec
OP_ID18. GpuBatchNormForwardTraining : 0.00056836 sec
OP_ID19. Add : 0.000542246 sec
OP_ID20. ReLU : 0.000372783 sec
OP_ID21. SetValue : 3.25312e-06 sec
OP_ID22. cudnnConvForward : 0.000260731 sec
OP_ID23. GpuBatchNormForwardTraining : 0.000149041 sec
OP_ID24. ReLU : 9.9072e-05 sec
OP_ID25. SetValue : 3.10592e-06 sec
OP_ID26. cudnnConvForward : 0.000637481 sec
OP_ID27. GpuBatchNormForwardTraining : 0.000152577 sec
OP_ID28. ReLU : 9.90518e-05 sec
OP_ID29. SetValue : 3.28224e-06 sec
OP_ID30. cudnnConvForward : 0.000404586 sec
OP_ID31. GpuBatchNormForwardTraining : 0.000569679 sec
OP_ID32. Add : 0.000542291 sec
OP_ID33. ReLU : 0.00037211 sec
OP_ID34. SetValue : 3.13696e-06 sec
OP_ID35. cudnnConvForward : 0.000261219 sec
OP_ID36. GpuBatchNormForwardTraining : 0.000148281 sec
OP_ID37. ReLU : 9.89299e-05 sec
OP_ID38. SetValue : 3.25216e-06 sec
OP_ID39. cudnnConvForward : 0.000633644 sec
OP_ID40. GpuBatchNormForwardTraining : 0.000150711 sec
OP_ID41. ReLU : 9.84902e-05 sec
OP_ID42. SetValue : 3.18176e-06 sec
OP_ID43. cudnnConvForward : 0.000402752 sec
OP_ID44. GpuBatchNormForwardTraining : 0.000571523 sec
OP_ID45. Add : 0.000542435 sec
OP_ID46. ReLU : 0.000372539 sec
OP_ID47. SetValue : 3.24672e-06 sec
OP_ID48. cudnnConvForward : 0.000493054 sec
OP_ID49. GpuBatchNormForwardTraining : 0.000293142 sec
OP_ID50. ReLU : 0.000190047 sec
OP_ID51. SetValue : 3.14784e-06 sec
OP_ID52. cudnnConvForward : 0.00148837 sec
OP_ID53. GpuBatchNormForwardTraining : 8.34794e-05 sec
OP_ID54. ReLU : 5.23254e-05 sec
OP_ID55. SetValue : 3.40096e-06 sec
OP_ID56. cudnnConvForward : 0.000292971 sec
OP_ID57. GpuBatchNormForwardTraining : 0.00029174 sec
OP_ID58. SetValue : 3.3248e-06 sec
OP_ID59. cudnnConvForward : 0.000590154 sec
OP_ID60. GpuBatchNormForwardTraining : 0.000294149 sec
OP_ID61. Add : 0.000275119 sec
OP_ID62. ReLU : 0.000189268 sec
OP_ID63. SetValue : 3.2704e-06 sec
OP_ID64. cudnnConvForward : 0.000341232 sec
OP_ID65. GpuBatchNormForwardTraining : 8.3304e-05 sec
OP_ID66. ReLU : 5.23667e-05 sec
OP_ID67. SetValue : 3.19936e-06 sec
OP_ID68. cudnnConvForward : 0.000542484 sec
OP_ID69. GpuBatchNormForwardTraining : 8.60537e-05 sec
OP_ID70. ReLU : 5.2479e-05 sec
OP_ID71. SetValue : 3.41824e-06 sec
OP_ID72. cudnnConvForward : 0.000291295 sec
OP_ID73. GpuBatchNormForwardTraining : 0.000292795 sec
OP_ID74. Add : 0.000274438 sec
OP_ID75. ReLU : 0.000189689 sec
OP_ID76. SetValue : 3.21984e-06 sec
OP_ID77. cudnnConvForward : 0.000338776 sec
OP_ID78. GpuBatchNormForwardTraining : 8.484e-05 sec
OP_ID79. ReLU : 5.29408e-05 sec
OP_ID80. SetValue : 3.18208e-06 sec
OP_ID81. cudnnConvForward : 0.000545542 sec
OP_ID82. GpuBatchNormForwardTraining : 8.40976e-05 sec
OP_ID83. ReLU : 5.2256e-05 sec
OP_ID84. SetValue : 3.36256e-06 sec
OP_ID85. cudnnConvForward : 0.000293003 sec
OP_ID86. GpuBatchNormForwardTraining : 0.0002989 sec
OP_ID87. Add : 0.000275041 sec
OP_ID88. ReLU : 0.000189867 sec
OP_ID89. SetValue : 3.1184e-06 sec
OP_ID90. cudnnConvForward : 0.000340417 sec
OP_ID91. GpuBatchNormForwardTraining : 8.39395e-05 sec
OP_ID92. ReLU : 5.26544e-05 sec
OP_ID93. SetValue : 3.2336e-06 sec
OP_ID94. cudnnConvForward : 0.000539787 sec
OP_ID95. GpuBatchNormForwardTraining : 8.2753e-05 sec
OP_ID96. ReLU : 4.86758e-05 sec
OP_ID97. SetValue : 3.24384e-06 sec
OP_ID98. cudnnConvForward : 0.000287108 sec
OP_ID99. GpuBatchNormForwardTraining : 0.000293127 sec
OP_ID100. Add : 0.000269478 sec
.
.
.