Add support for collecting CLR event through EventPipe #1291

chrisnas · 2020-10-16T08:36:14Z

Add -eventpipe option to activate the EventPipe collection
Add -providers provider:keyword:verbosity:tags,... option to allow fine-grained tuning of dotnet-trace execution
Add -sdk-path to point to an already installed dotnet SDK (otherwise it will be installed in /tmp/dotnet_sdk_tool)
When the collection is done, the trace.nettrace and eventpipe.log files will be part of the resulting zip file, which PerfView is now able to leverage
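
For illustration, the `-providers` value format described above can be split into its fields with a few lines of shell. This is only a sketch: `PrintProviders` is a hypothetical helper that is not part of perfcollect, and the sample value (the runtime provider with keyword `0x1` and verbosity `4`) is just an example input.

```shell
# Hypothetical helper (not part of perfcollect): split a -providers value
# of the form provider:keyword:verbosity:tags,... into its fields.
PrintProviders()
{
    # Put one provider per line, then split each line on colons.
    # Trailing fields that are omitted come out empty.
    printf '%s\n' "$1" | tr ',' '\n' | while IFS=':' read -r name keyword verbosity tags
    do
        echo "provider=$name keyword=$keyword verbosity=$verbosity tags=$tags"
    done
}

# Example input: two providers, the second with only a name.
PrintProviders "Microsoft-Windows-DotNETRuntime:0x1:4,My-Custom-Provider"
```

Anything after the third colon is kept together as the tags field, so tag values that themselves contain colons are not split further.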

brianrob

@chrisnas, @gleocadie thank you very much for your contribution! I have a few questions and comments, but this is looking quite good.

brianrob · 2020-10-22T23:43:59Z

src/perfcollect/perfcollect


+# Use EventPipe to collect CLR events
+useEventPipe=0
+sdkAndToolDir="/tmp/dotnet_sdk_tool"


Nit: Can we make this something like /tmp/perfcollect-dotnet-sdk so that it's clear where it came from?

brianrob · 2020-10-22T23:44:27Z

src/perfcollect/perfcollect

            useLTTng=0
+        elif [ "-eventpipe" == "$arg" ]
+        then
+            useEventPipe=1


I like this - if you enable EventPipe, LTTng is disabled.

brianrob · 2020-10-22T23:45:20Z

src/perfcollect/perfcollect

+            useLTTng=0
+        elif [ "-providers" == "$arg" ]
+        then
+            providers=$rawvalue


I would like to propose that if -eventpipe isn't specified then this writes a FatalError that -eventpipe must be specified. This ensures that users of LTTng don't inadvertently specify this, but get no results.

Good one. I will add a validation step after the arguments are parsed.
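
A minimal sketch of what such a validation step might look like, assuming the `useEventPipe` and `providers` variables from the diff above. `ValidateArguments` is a hypothetical name, and the `FatalError` defined here is only a stand-in for perfcollect's own helper.

```shell
# Stand-in for perfcollect's FatalError helper (assumed behavior:
# print the message and exit with a non-zero status).
FatalError()
{
    echo "ERROR: $1" >&2
    exit 1
}

# Hypothetical validation step, meant to run once after all
# arguments have been parsed.
ValidateArguments()
{
    # -providers only has an effect when EventPipe collection is enabled,
    # so reject it otherwise instead of silently collecting nothing.
    if [ -n "$providers" ] && [ "$useEventPipe" != "1" ]
    then
        FatalError "-providers requires -eventpipe to be specified."
    fi
}
```

Running the check after parsing, rather than inside the `-providers` branch, keeps it independent of the order in which the options appear on the command line.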

brianrob · 2020-10-22T23:46:55Z

src/perfcollect/perfcollect

        elif [ "-gccollectonly" == "$arg" ]
        then
            gcCollectOnly=1
+            usePerf=0


I may have mis-understood, but I think you were wanting a way to make it possible to capture -gccollectonly with perf enabled.

I was thinking that I would do that in a second step (PR); we are not blocked now. What do you think?

Sure, no problem.

brianrob · 2020-10-22T23:50:32Z

src/perfcollect/perfcollect

+BuildEventPipeArgs()
+{
+    if [ "$collectionPid" == "" ]
+    then


I'm wondering if we should have an argument validation step after the arguments are parsed to handle this and some of the other possible argument-related issues that I mentioned above. What do you think?

We are aligned :) I will add a validation step

Great, thanks!

brianrob · 2020-10-22T23:56:07Z

src/perfcollect/perfcollect

+       WriteStatus "Installing dotnet sdk in $sdkAndToolDir"
+       ResetText
+       RunSilent "mkdir $sdkAndToolDir"
+       RunSilent "curl -OL https://dot.net/v1/dotnet-install.sh"


I think the SDK directory should be created first so that you can download dotnet-install.sh and store it in this directory as opposed to the current working directory.

Oh, I see, thanks. I missed that.
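
One way to follow this suggestion is sketched below. `DownloadDotNetInstallScript` is a hypothetical function name, the default directory is the one proposed earlier in this review, and the sketch calls `mkdir` and `curl` directly rather than through `RunSilent`.

```shell
# Directory for the SDK and tool; the default is the name proposed in review.
sdkAndToolDir="${sdkAndToolDir:-/tmp/perfcollect-dotnet-sdk}"

# Hypothetical helper: create the SDK directory first, then download
# dotnet-install.sh into it instead of into the current working directory.
DownloadDotNetInstallScript()
{
    # -p makes this succeed when the directory already exists.
    mkdir -p "$sdkAndToolDir" || return 1

    # -L follows redirects; -o writes to an explicit path, so nothing
    # is left behind in the directory perfcollect was started from.
    curl -L -o "$sdkAndToolDir/dotnet-install.sh" https://dot.net/v1/dotnet-install.sh
}
```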

brianrob · 2020-10-22T23:56:52Z

src/perfcollect/perfcollect

+   then
+      FatalError "dotnet-trace tool was installed correctly."
+   fi
+   LogAppend 'dotnet-trace version:' `$sdkAndToolDir/dotnet trace --version`


Thanks for adding the version information to the log unconditionally!

brianrob · 2020-10-22T23:57:56Z

src/perfcollect/perfcollect

 # Ensure prerequisites are installed.
 EnsurePrereqsInstalled

+if [ "$useEventPipe" == "1" ]  && [ "$1" != "stop" ]


I would like this to follow the same pattern as we do for the other prerequisites, and make download and install of the SDK part of perfcollect install. This ensures that any disk changes other than for the traces themselves are intentional on the part of the user.

@brianrob one question: the .NET SDK and the tool will unconditionally be installed in /tmp/perfcollect-dotnet-sdk when using perfcollect install.
The DiscoverCommands and InitializeLog functions will use this path, but not the one provided by the -sdk-path option.
I was wondering whether -sdk-path is still worth keeping. Maybe we can remove it. What do you think? (I might have missed something.)

You bring up a good point here. I was trying to have more flexibility here, but I feel like the more I think about it, the more complicated things get, and that we should go back to a more simple plan as you are suggesting.

Here's what I think we should do, let me know what you think:

If there is a global dotnet SDK installed, use it. If not, install to /tmp/perfcollect-dotnet-sdk.

Install dotnet-trace using whichever SDK we have.

When collecting, discover the SDK to use based on whether or not there is a global one or one in /tmp.

This ensures that someone who already has an SDK, and doesn't want to install another one, can use the one they have. What do you think?

That sounds good to me.
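
The discovery order agreed on above could be sketched roughly as follows. `DiscoverDotNet`, `localSdkDir` and `dotnetCommand` are illustrative names that do not come from the PR, and the check only looks for a `dotnet` executable on `PATH`; it does not verify that a full SDK is installed.

```shell
# Location used when perfcollect installs its own SDK (from the review above).
localSdkDir="/tmp/perfcollect-dotnet-sdk"

# Hypothetical discovery step: prefer a global dotnet, fall back to the
# copy under /tmp, and fail if neither is available.
DiscoverDotNet()
{
    if command -v dotnet > /dev/null 2>&1
    then
        # A dotnet executable is on PATH: use it.
        dotnetCommand="$(command -v dotnet)"
    elif [ -x "$localSdkDir/dotnet" ]
    then
        # No global dotnet: use the one installed by perfcollect.
        dotnetCommand="$localSdkDir/dotnet"
    else
        dotnetCommand=""
        return 1
    fi
}
```

Both the install step and the collection step could call the same function, so they always agree on which `dotnet` is used.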

brianrob · 2020-10-23T00:01:01Z

src/perfcollect/perfcollect

 useLTTng=1

+# Use EventPipe to collect CLR events
+useEventPipe=0


I'd like to propose that instead of using the term eventpipe that we use dotnetTrace and/or dotnet-trace since the actual tool being used here is dotnet-trace. This is more of a forward-looking thing, so that it's clear what collector is being used should there be multiple, or if people don't know what eventpipe is.

brianrob · 2020-10-23T00:01:58Z

src/perfcollect/perfcollect

+then
+    EnsureDotNetTraceToolIsInstalled
+fi
+


Can you please add a regression test that uses dotnet-trace?

Ok, I missed the test folder. I will add a regression test.

@brianrob I tried to add a regression test (it allowed me to find stuff I forgot to add in the script: installing curl). But, to run the script with -dotnet-trace, we need a .NET app running in the container. I'm not sure it's possible to do that in the current state. Do you want me to add this?

Yes, that would be great.

ezsilmar · 2021-05-20T13:26:01Z

Hi, I'd like to push this PR forward to be able to merge criteo-forks@747f2a8 so that we stop relying on the perfview fork internally :)

I discussed with @chrisnas and @gleocadie, and it seems that some tests were missing. I also see that 1 test in the current build failed, but the build result is long gone. @brianrob could you please re-trigger the test if there's such an option?

brianrob · 2021-05-24T15:33:06Z

/azp run

azure-pipelines · 2021-05-24T15:33:15Z

Azure Pipelines successfully started running 1 pipeline(s).

brianrob · 2021-05-24T15:33:41Z

Thanks @ezsilmar. Just triggered a new CI run.

ezsilmar · 2021-09-17T12:57:38Z

Hello! @brianrob I got some time to come back to this PR and would be glad to get a code review.

I mainly fought test instability:

Disabled test parallelization: this was already the case for most test projects
Improved handling of shared directories in EtlTestBase
In perfcollect install for Ubuntu, removed the packages that are missing and used linux-tools-generic instead
In container tests for dotnet-trace, added a sleep after launching the test program

About the last point, something weird is happening. If I attach to the process with dotnet-trace right after the process is started, dotnet-trace hangs forever, printing "Stopping the trace. This may take up to minutes depending on the application being traced." If I wait a couple of seconds before attaching, it works fine. This behavior reproduces in the GitHub build pipeline, so there's probably a bug in dotnet-trace.
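
The workaround described above amounts to something like the following sketch. `WaitBeforeAttach` is a hypothetical helper, not the code added to the container tests, and the 2-second default simply mirrors "a couple of seconds".

```shell
# Hypothetical test helper: give the target process time to start
# before a tracer attaches to it.
#   $1: process id of the freshly launched test program
#   $2: delay in seconds (defaults to 2)
WaitBeforeAttach()
{
    pid="$1"
    delay="${2:-2}"

    sleep "$delay"

    # kill -0 sends no signal; it only reports whether the process
    # still exists, so a crashed test program is detected early.
    kill -0 "$pid" 2> /dev/null
}

# Intended use (assumes a test program and dotnet-trace are available):
#   ./TestApp &
#   WaitBeforeAttach "$!" && dotnet-trace collect --process-id "$!"
```

A fixed sleep is a blunt instrument. It hides the hang rather than explaining it, which is why the comment above still suspects a bug in dotnet-trace.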

ezsilmar · 2021-09-17T14:16:43Z

The test currently failing is CanReadV4EventPipeTraceBiggerThan4GB, which runs out of memory (OOM); it passes on my machine.

ezsilmar · 2023-02-03T16:20:02Z

Hello, this PR has been open for almost 3 years, but it is still relevant in our context, and I think it'd be beneficial for the community as well.

As a reminder of what this is all about: we often use PerfView on Windows to analyze the behavior of dotnet apps running on Linux. Relevant to this PR, PerfView can understand:

perf CPU samples, collected with perfcollect
lttng text data file, collected with perfcollect
nettrace file, collected with dotnet-trace

In the days of net2, using perf+lttng was the only way. Perfcollect greatly eased the process by combining events and CPU samples into a single .trace.zip file. Later, dotnet-trace arrived, leaving perfcollect almost abandoned (at least that's my feeling). While dotnet-trace is much more convenient than lttng for events, it was never intended to match the capabilities of perf. Thus today, when we need both CPU sampling and events, we deal with two separate artifacts: a perfcollect output and a nettrace file.

This PR is a quality-of-life change that allows the nettrace file to be packaged in .trace.zip, alongside the perf data. A nice side effect is that the .nettrace file gets zipped, which is important for sharing long sessions. The PR also modifies perfcollect so that it can use dotnet-trace under the hood; however, this part is not important for my particular use case, as we run perf and zip directly in our troubleshooting code.

If modifying perfcollect is not something you'd like to support, we could just merge the change in PerfViewData.cs that tries to read .nettrace from .trace.zip: it's small and beneficial on its own. Wdyt?

Mentioning @brianrob as the last reviewer

brianrob · 2025-01-14T18:54:25Z

We're working to clean-up old open PRs in this repo. This PR is greater than 1 year old. If you would like to continue working on this PR, please add a comment within the next 7 days so that we can start discussion on next steps. Otherwise, we will close this PR. Please feel free to open a new PR or issue if you'd like to re-open this discussion at a later date.

brianrob · 2025-01-21T18:35:41Z

Closing this PR as it is greater than 1 year old. If you'd like to continue working on this, please open a new PR or issue to discuss next steps.

gleocadie force-pushed the PR_perfcollect_eventpipe branch from 785af82 to 4c4c6de Compare October 19, 2020 14:35

brianrob reviewed Oct 23, 2020

gleocadie force-pushed the PR_perfcollect_eventpipe branch 2 times, most recently from f307984 to e8b179f Compare November 3, 2020 20:02

gleocadie force-pushed the PR_perfcollect_eventpipe branch from e8b179f to 9f100e8 Compare December 17, 2020 15:39

Base automatically changed from master to main February 2, 2021 23:16

Christophe Nasarre and others added 5 commits February 3, 2023 14:19

Support traces recorded on Linux with dotnet-trace (in addition to LTTng)

7c95595

Add support for events collection through EventPipe

8930b23

Add Rider files to .gitignore

c188a90

Make tests more stable

9944623

Make container tests pass

954aada

ezsilmar force-pushed the PR_perfcollect_eventpipe branch 2 times, most recently from 02df7c7 to 65bd20b Compare February 3, 2023 13:42

Bump test app to net6

d1fa55b

ezsilmar force-pushed the PR_perfcollect_eventpipe branch from 65bd20b to d1fa55b Compare February 3, 2023 13:42

brianrob closed this Jan 21, 2025
