I am trying to optimize my GPU training performance. The profiler overview page reports that "42.1% of the total step time sampled is spent on 'Kernel Launch'". I tried the methods mentioned in #8, but they did not improve the situation.
Now I want to analyze this further: how can I locate the kernel launch time in the trace view, and how is the kernel launch time calculated here?
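One way I have tried to inspect this myself: the trace viewer can export the trace as JSON in Chrome trace format, so the time per event name on the host threads can be summed offline. A minimal sketch, assuming the export format and using a small hand-made sample in place of a real `trace.json` (the event names here are illustrative, not taken from an actual trace):

```python
import json
from collections import defaultdict

# Hand-made stand-in for a trace viewer JSON export (Chrome trace format).
# In a real trace, host-side op dispatch shows up under names such as
# "ExecutorState::Process <op>".
sample_trace = {
    "traceEvents": [
        {"name": "ExecutorState::Process Conv2D", "ph": "X", "ts": 0, "dur": 120},
        {"name": "IteratorGetNext", "ph": "X", "ts": 200, "dur": 50},
        {"name": "ExecutorState::Process MatMul", "ph": "X", "ts": 300, "dur": 80},
    ]
}

def duration_by_name(trace):
    """Sum durations (microseconds) of complete ('X') events,
    grouped by the first token of the event name."""
    totals = defaultdict(int)
    for ev in trace.get("traceEvents", []):
        if ev.get("ph") == "X":
            totals[ev["name"].split()[0]] += ev.get("dur", 0)
    return totals

totals = duration_by_name(sample_trace)
launch_us = sum(d for name, d in totals.items()
                if name.startswith("ExecutorState::Process"))
print(launch_us)  # 200
```

This at least lets me compare the summed "ExecutorState::Process" time against the percentage shown on the overview page, though I am not sure it is the exact same computation.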
I found the following code in TensorFlow's source, and I think this is what measures kernel launch time, but I am not sure how to find these events in the trace view:
```cpp
EventType ClassifyCpuEvent(absl::string_view event_name, bool has_device,
                           bool has_correlation_id) {
  tsl::profiler::TfOp tf_op = tsl::profiler::ParseTfOpFullname(event_name);
  if (tsl::profiler::IsInfeedEnqueueOp(tf_op) ||
      tsl::profiler::IsMemcpyHToDOp(tf_op)) {
    return HOST_TO_DEVICE;
  } else if (tsl::profiler::IsMemcpyHToHOp(tf_op)) {
    return HOST_TO_HOST;
  } else if (has_device && (has_correlation_id ||
                            absl::StartsWithIgnoreCase(
                                event_name, "ExecutorState::Process"))) {
    // TODO(b/150420972): Separate runtime overhead from actual compute for
    // CPU-only.
    return HOST_PREPARE;  // !!!kernel launch!!!
  } else if (absl::StartsWithIgnoreCase(event_name, "IteratorGetNext")) {
    return HOST_WAIT_INPUT;
  } else {
    return HOST_COMPUTE;
  }
}
```
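To check my reading of the branch marked above, here is a simplified Python port of that classification logic (my own sketch, not a TensorFlow API; the infeed/memcpy TF-op checks from the C++ original are omitted, and the return strings just mirror the enum names):

```python
def classify_cpu_event(event_name: str, has_device: bool,
                       has_correlation_id: bool) -> str:
    """Simplified, illustrative port of ClassifyCpuEvent.
    The ParseTfOpFullname/infeed/memcpy checks are left out."""
    name = event_name.lower()
    # Case-insensitive prefix match, mirroring absl::StartsWithIgnoreCase.
    if has_device and (has_correlation_id or
                       name.startswith("executorstate::process")):
        return "HOST_PREPARE"  # the bucket reported as "Kernel Launch"
    if name.startswith("iteratorgetnext"):
        return "HOST_WAIT_INPUT"
    return "HOST_COMPUTE"

print(classify_cpu_event("ExecutorState::Process MatMul", True, False))
# HOST_PREPARE
```

If this reading is right, then in the trace view the kernel launch bucket should correspond to host-thread events that either carry a correlation id to a device event or are named "ExecutorState::Process".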