-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add the parsing part for the profiling tool #7043
Conversation
The profiling report looks like:
And this pull request is ready for review. @qingqing01 @reyoung |
paddle/platform/profiler.cc
Outdated
void PopEvent(const std::string& name, DeviceContext* dev_ctx) { | ||
GetEventList().Record(EventKind::kPopRange, std::move(name), g_thread_id, | ||
dev_ctx); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to refine RecordEvent::RecordEvent()
and RecordEvent::~RecordEvent()
to reuse these two functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
paddle/platform/profiler.cc
Outdated
std::cout << "Place: GPU" << std::endl; | ||
#else | ||
std::cout << "Place: CPU" << std::endl; | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If compiling with GPU, the users also can run tasks on CPU only. So here should not use the PADDLE_WITH_CUDA
macro. You can use g_state
to determine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Use another global variable g_profiler_place
because g_state
has been set to kDisabled
when parsing events.
paddle/platform/profiler.cc
Outdated
double event_time = rit->CudaElapsedMs(events[i][j]); | ||
#else | ||
double event_time = rit->CpuElapsedMs(events[i][j]); | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same problem with above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
paddle/platform/profiler.cc
Outdated
pushed_events.erase((++rit).base()); | ||
} else { | ||
std::cout << "Warning: can not find the start marker of event " | ||
<< events[i][j].name(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LOG(WARNING)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
<< std::setw(data_width) << event_item.max_time | ||
<< std::setw(data_width) << event_item.ave_time << std::endl; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the ParseEvents
and print can be two functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Followed all the comments. Thanks!
paddle/platform/profiler.cc
Outdated
void PopEvent(const std::string& name, DeviceContext* dev_ctx) { | ||
GetEventList().Record(EventKind::kPopRange, std::move(name), g_thread_id, | ||
dev_ctx); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
paddle/platform/profiler.cc
Outdated
std::cout << "Place: GPU" << std::endl; | ||
#else | ||
std::cout << "Place: CPU" << std::endl; | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Use another global variable g_profiler_place
because g_state
has been set to kDisabled
when parsing events.
paddle/platform/profiler.cc
Outdated
double event_time = rit->CudaElapsedMs(events[i][j]); | ||
#else | ||
double event_time = rit->CpuElapsedMs(events[i][j]); | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
paddle/platform/profiler.cc
Outdated
pushed_events.erase((++rit).base()); | ||
} else { | ||
std::cout << "Warning: can not find the start marker of event " | ||
<< events[i][j].name(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
<< std::setw(data_width) << event_item.max_time | ||
<< std::setw(data_width) << event_item.ave_time << std::endl; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
paddle/platform/profiler.cc
Outdated
@@ -13,12 +13,18 @@ See the License for the specific language governing permissions and | |||
limitations under the License. */ | |||
|
|||
#include "paddle/platform/profiler.h" | |||
#include <iomanip> | |||
#include <map> | |||
#include "gflags/gflags.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The gflags
is not used. This header can be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
paddle/platform/profiler.cc
Outdated
return a.max_time > b.max_time; | ||
default: | ||
return a.ave_time > b.ave_time; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this lambda expression can be selected before the for loop in line 187, and the sort_domain
can be saved at the same time. Then pass the sort_domain
as a string argument to PrintProfilingReport
function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Modified. Thanks!
paddle/platform/profiler.cc
Outdated
@@ -13,12 +13,18 @@ See the License for the specific language governing permissions and | |||
limitations under the License. */ | |||
|
|||
#include "paddle/platform/profiler.h" | |||
#include <iomanip> | |||
#include <map> | |||
#include "gflags/gflags.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
paddle/platform/profiler.cc
Outdated
return a.max_time > b.max_time; | ||
default: | ||
return a.ave_time > b.ave_time; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
#ifdef PADDLE_WITH_CUDA | ||
PADDLE_ENFORCE(e.has_cuda() && has_cuda()); | ||
PADDLE_ENFORCE(e.device() == device()); | ||
PADDLE_ENFORCE(cudaEventSynchronize(event_)); | ||
PADDLE_ENFORCE(cudaEventSynchronize(e.event())); | ||
float ms; | ||
PADDLE_ENFORCE(cudaEventElapsedTime(&ms, event_, e.event())); | ||
return ms * 1000.0; | ||
return ms; | ||
#else | ||
PADDLE_THROW("CUDA is not enabled"); | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can not comment on line 110 but only write the comments here.
line 110 should be before line 109.
|
||
namespace paddle { | ||
namespace platform { | ||
|
||
// The profiler state, the initial value is ProfilerState::kDisabled | ||
static ProfilerState g_state = ProfilerState::kDisabled; | ||
// To record which timer the profiler used, CUDA or CPU. | ||
static std::string g_profiler_place = ""; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
g_profiler_place
and g_state
should be thread_local
, just like g_thread_id
.
name_(std::move(name)), | ||
thread_id_(thread_id), | ||
has_cuda_(false) { | ||
: kind_(kind), name_(name), thread_id_(thread_id), has_cuda_(false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do not you use std::move
?
double Event::CpuElapsedUs(const Event& e) const { | ||
return (e.cpu_ns_ - cpu_ns_) / (1000.0); | ||
double Event::CpuElapsedMs(const Event& e) const { | ||
return (e.cpu_ns_ - cpu_ns_) / (1000000.0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ii will be better to replace / (1000000.0)
with *0.000001
.
See Part I in #6701
Please review this part after the Part I merged.