forked from orlandoacevedo/MCGPU
profile_README.txt
This file explains how to profile the program.
Date created: 3/27/14
Date updated: 3/28/14
Author: Josh Mosby
NOTE: The ./ prefix is not needed before app_name in the commands below
For CUDA functions
nvprof - collects events/metrics for CUDA kernels
- runs the program itself, so it needs the program's full arguments
nvprof [options] app_name [app_options]
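As a rough sketch of how the template above might be filled in: my_app and input.config are placeholder names, not files from this project. The --print-gpu-trace and --metrics options are standard nvprof options, and achieved_occupancy is one of its metric names. The script only prints the command lines unless nvprof and the binary are both present.

```shell
# Hypothetical application name and arguments; substitute your own.
APP=my_app
APP_ARGS="input.config"

# Default run: prints a summary of time spent per kernel and memory copy.
summary_cmd="nvprof $APP $APP_ARGS"
# Per-launch trace: one line for every kernel launch and memory copy.
trace_cmd="nvprof --print-gpu-trace $APP $APP_ARGS"
# Collect a single named metric for every kernel.
metric_cmd="nvprof --metrics achieved_occupancy $APP $APP_ARGS"

for cmd in "$summary_cmd" "$trace_cmd" "$metric_cmd"; do
    if command -v nvprof >/dev/null 2>&1 && [ -x "$APP" ]; then
        $cmd            # nvprof launches the program, so the arguments matter
    else
        echo "$cmd"     # no nvprof or no binary here: just show the command
    fi
done
```

Each nvprof invocation reruns the whole program, so the three commands cost three full runs.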
For C/C++ functions
gprof - profiles serial (CPU) functions
- The program must be compiled and linked with the -pg flag; each run of such a build then writes a file called gmon.out with the metrics from the last run
- gprof does not run the program; it only analyzes these metrics, so the program's command line arguments are not needed
gprof [options] app_name gmon.out
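The gprof workflow can be sketched as three steps: instrumented build, normal run, analysis. The names my_app.cpp, my_app and gprof_report.txt are placeholders, and the direct g++ call is an assumption for illustration; a real project would more likely add -pg to its own build flags.

```shell
# Hypothetical names; substitute your own source file, binary, and report file.
SRC=my_app.cpp
APP=my_app
REPORT=gprof_report.txt

# Instrumented build, normal run, then analysis of the resulting gmon.out.
profile_with_gprof() {
    g++ -pg -o "$APP" "$SRC" || return 1   # -pg adds the code that writes gmon.out
    ./"$APP" "$@" || return 1              # the run itself takes the usual arguments
    gprof "$APP" gmon.out > "$REPORT"      # analysis only: no program arguments
}

if command -v gprof >/dev/null 2>&1 && [ -f "$SRC" ]; then
    profile_with_gprof
else
    echo "skipping: gprof or $SRC not available"
fi
```

gmon.out is overwritten by every run, so copy it elsewhere before rerunning if an earlier profile needs to be kept.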
Other tools
CUDA-MEMCHECK - detects memory access errors in CUDA applications
- works with all SM architectures
- does not require any special compilation settings
- should support dynamic parallelism
cuda-memcheck [options] app_name [app_options]
CUDA-RACECHECK - helps identify memory access race conditions in CUDA applications that use shared memory
- only looks at on-chip shared memory (declared with the __shared__ qualifier)
- does not require any special compilation settings
- only works on SM architectures 2.0 and above
- should support dynamic parallelism
cuda-memcheck --tool racecheck [memcheck-options] app_name [app_options]
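A small sketch of running both checkers back to back. my_app and input.config are placeholder names. The helper memcheck_cmd is hypothetical and only builds the command line; --tool is the cuda-memcheck option that selects the checker, with memcheck as the default.

```shell
# Hypothetical application name and arguments; substitute your own.
APP=my_app
APP_ARGS="input.config"

# Hypothetical helper: print the cuda-memcheck command line for one tool.
memcheck_cmd() {
    echo "cuda-memcheck --tool $1 $APP $APP_ARGS"
}

for tool in memcheck racecheck; do
    cmd=$(memcheck_cmd "$tool")
    if command -v cuda-memcheck >/dev/null 2>&1 && [ -x "$APP" ]; then
        $cmd            # runs the program under the selected checker
    else
        echo "$cmd"     # tool or binary missing: just show the command
    fi
done
```

Both tools slow the program down considerably, so a small input is usually the practical choice for these runs.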