How to find the "Kernel Launch" time in trace view #900

Open
@zxdclyz

Description

I am trying to optimize my GPU training performance. On the overview page, I see "42.1% of the total step time sampled is spent on 'Kernel Launch'". I tried the methods mentioned in #8, but they did not improve my situation.

Now I want to further analyze this issue. How can I locate the kernel launch time in the trace view, or how is the kernel launch time calculated here?

I found the following function in TensorFlow's source code, and I think the HOST_PREPARE branch is what gets reported as kernel launch time, but I am not sure how to find the corresponding events in the trace view:

EventType ClassifyCpuEvent(absl::string_view event_name, bool has_device,
                           bool has_correlation_id) {
  tsl::profiler::TfOp tf_op = tsl::profiler::ParseTfOpFullname(event_name);
  if (tsl::profiler::IsInfeedEnqueueOp(tf_op) ||
      tsl::profiler::IsMemcpyHToDOp(tf_op)) {
    return HOST_TO_DEVICE;
  } else if (tsl::profiler::IsMemcpyHToHOp(tf_op)) {
    return HOST_TO_HOST;
  } else if (has_device && (has_correlation_id ||
                            absl::StartsWithIgnoreCase(
                                event_name, "ExecutorState::Process"))) {
    // TODO(b/150420972): Separate runtime overhead from actual compute for
    // CPU-only.
    return HOST_PREPARE; // !!!kernel launch!!!
  } else if (absl::StartsWithIgnoreCase(event_name, "IteratorGetNext")) {
    return HOST_WAIT_INPUT;
  } else {
    return HOST_COMPUTE;
  }
}
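To make my reading of that branch concrete, here is a minimal, self-contained sketch of what I assume happens: host events that satisfy the HOST_PREPARE condition have their durations summed up. The `HostEvent` struct, the `LooksLikeHostPrepare` and `SumHostPrepare` helpers, and the picosecond unit are all hypothetical names I made up for illustration; they are not TensorFlow types or APIs. The sketch also skips the earlier infeed and memcpy branches, so it is not equivalent to `ClassifyCpuEvent`, and I have not confirmed that the overview page aggregates the time this way.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical, simplified host event; NOT a TensorFlow type.
struct HostEvent {
  std::string name;
  bool has_device;           // same flag that is passed to ClassifyCpuEvent
  bool has_correlation_id;   // same flag that is passed to ClassifyCpuEvent
  std::int64_t duration_ps;  // assumed unit, for illustration only
};

// Case-insensitive prefix check, standing in for absl::StartsWithIgnoreCase.
bool StartsWithIgnoreCase(std::string_view text, std::string_view prefix) {
  if (text.size() < prefix.size()) return false;
  for (std::size_t i = 0; i < prefix.size(); ++i) {
    if (std::tolower(static_cast<unsigned char>(text[i])) !=
        std::tolower(static_cast<unsigned char>(prefix[i]))) {
      return false;
    }
  }
  return true;
}

// Mirrors only the HOST_PREPARE condition from the snippet above; the
// infeed/memcpy branches that are checked first are deliberately left out.
bool LooksLikeHostPrepare(const HostEvent& e) {
  return e.has_device &&
         (e.has_correlation_id ||
          StartsWithIgnoreCase(e.name, "ExecutorState::Process"));
}

// Sums the durations of all events that match the condition above.
std::int64_t SumHostPrepare(const std::vector<HostEvent>& events) {
  std::int64_t total = 0;
  for (const HostEvent& e : events) {
    if (LooksLikeHostPrepare(e)) total += e.duration_ps;
  }
  return total;
}
```

If this reading is right, the events I am looking for in the trace view would be host-side events that either carry a correlation ID or whose name starts with "ExecutorState::Process", and that also have the device flag set.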
