NRE when calling TraceEventProviders.GetProviderGuidByName #2177
…eturns incomplete list instead of crashing with a NRE.
I think the most common way My gut feeling is telling me that throwing an exception would be a more correct solution to this. But it is also very rough. Looking at the surrounding code, I don't see a pattern we use for such cases. I'll defer to @brianrob for the final call on this. |
@desdesdes do you have a scenario where you're seeing this fail? I've seen a couple of reports, but haven't been able to get more information on what is actually happening. |
We have an application that runs as a Windows service. This service sometimes fails to start while Windows is booting. The Windows infrastructure automatically retries, and after some retries the service starts successfully. We have a few thousand servers running and we only experience this issue on some of them. We have not found a pattern yet. We pinpointed the problem to this location in the code. It could be that TdhEnumerateProviders is returning 122 and should be called in a loop. It could also be that TdhEnumerateProviders depends on another Windows service, or another part of Windows, that is still initializing, but I don't know. If you throw an exception, it would be nice to add the HR to it so we can analyze this better. |
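The "call it in a loop" idea mentioned above can be sketched as follows. This is only an illustration under stated assumptions: Win32 error 122 is ERROR_INSUFFICIENT_BUFFER, and the native call is assumed to write the required size back when the buffer is too small. `EnumerateFn` and `EnumerateWithRetry` are hypothetical names, with the delegate standing in for the real P/Invoke so the sketch runs anywhere; it is not the library's actual code.

```csharp
using System;
using System.Runtime.InteropServices;

static class ProviderEnumerationSketch
{
    const int ERROR_SUCCESS = 0;
    const int ERROR_INSUFFICIENT_BUFFER = 122;

    // Hypothetical stand-in for the native call: fills `buffer` when it is large
    // enough, writes the required size to `size`, and returns a Win32 error code.
    public delegate int EnumerateFn(IntPtr buffer, ref int size);

    // Retries while the callee reports "buffer too small", growing the buffer each
    // time. Any other non-zero code is surfaced instead of being swallowed.
    public static byte[] EnumerateWithRetry(EnumerateFn enumerate, int maxAttempts = 8)
    {
        int size = 0;
        IntPtr buffer = IntPtr.Zero;
        try
        {
            for (int attempt = 0; attempt < maxAttempts; attempt++)
            {
                int hr = enumerate(buffer, ref size);
                if (hr == ERROR_SUCCESS)
                {
                    var result = new byte[size];
                    if (size > 0) Marshal.Copy(buffer, result, 0, size);
                    return result;
                }
                if (hr != ERROR_INSUFFICIENT_BUFFER)
                    throw new InvalidOperationException($"Enumeration failed with error code {hr}.");

                // The required size may have grown between calls, so reallocate and retry.
                if (buffer != IntPtr.Zero) Marshal.FreeHGlobal(buffer);
                buffer = IntPtr.Zero;
                buffer = Marshal.AllocHGlobal(size);
            }
            throw new InvalidOperationException("Required buffer size kept changing; giving up.");
        }
        finally
        {
            if (buffer != IntPtr.Zero) Marshal.FreeHGlobal(buffer);
        }
    }
}
```

The attempt limit is a design choice of this sketch: it prevents an endless loop if the required size keeps growing between calls.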
Thank you for clarifying the scenario. Do you have a callstack of when this happens? |
Hope this helps. |
Yes, this is exactly what I am looking for. I think this is worth fixing, but I'm afraid that the fix is more complicated than this. I think that @cincuranet is right that we should be throwing an exception. The documentation states that there are only two error codes that should come back from the call to A more appropriate fix will be a bit more involved, as I think we'll want to throw an exception when an unexpected error code is returned, but, we'll want to modify callers of As for the exception itself, I think we'll want to create an exception that describes that the call to Ultimately, this won't make your unhandled exception go away, but it will tell us when something goes wrong that is outside of our control, and maybe will give us an opportunity to report a bug. |
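One possible shape for an exception that names the failing call and carries its error code is sketched below. The type and member names (`ProviderEnumerationException`, `ProviderEnumerationGuard`, `ThrowIfFailed`) are hypothetical and chosen only for illustration; nothing here is taken from the library.

```csharp
using System;

// Hypothetical exception type: records which native call failed and with what code.
public sealed class ProviderEnumerationException : Exception
{
    public string NativeCall { get; }
    public int ErrorCode { get; }

    public ProviderEnumerationException(string nativeCall, int errorCode)
        : base($"{nativeCall} failed with error code {errorCode} (0x{errorCode:X8}).")
    {
        NativeCall = nativeCall;
        ErrorCode = errorCode;
    }
}

static class ProviderEnumerationGuard
{
    // Throws when the native call reported anything other than success (0).
    public static void ThrowIfFailed(string nativeCall, int errorCode)
    {
        if (errorCode != 0)
            throw new ProviderEnumerationException(nativeCall, errorCode);
    }
}
```

With this shape, a failure with code 122 would surface in a crash log as "TdhEnumerateProviders failed with error code 122 (0x0000007A)." instead of as a bare NullReferenceException.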
@desdesdes Can you maybe try to capture the trace log? The call @brianrob If the name is already a GUID, can't we skip calling |
Presumably, yes. But I would not be surprised if we ran into a situation where someone had registered a provider whose name is a GUID but not the same GUID as the ID of the provider. :( That would be the risk of the change, but honestly, I think I'd be OK with that. I think I'm more curious what the transient failure is, but having looked at the code for |
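The shortcut discussed in the two comments above could look roughly like this. It is a sketch under assumptions: `Resolve` and the `lookupByName` delegate are hypothetical stand-ins for the real name-to-GUID lookup, and it deliberately accepts the risk described above, namely that a provider whose registered name merely looks like a GUID would resolve to the parsed value.

```csharp
using System;

static class ProviderNameResolutionSketch
{
    // If the name already parses as a GUID, use it directly and skip the
    // (potentially failing) provider enumeration; otherwise fall back to the lookup.
    public static Guid Resolve(string providerName, Func<string, Guid> lookupByName)
    {
        if (Guid.TryParse(providerName, out Guid parsed))
            return parsed;
        return lookupByName(providerName);
    }
}
```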
@cincuranet The issue only occurs in our production environment, on fewer than 1% of our >6700 servers, and on different servers every time. So it is very hard for us to capture the trace logs. It is also hard for us to change anything in our production environment, because every change has to pass the Security department's checks. If you could change the Trace.WriteLine to an exception with the hr in the exception message, then we can deploy it as a hotfix and let you know which hr it is. |
@desdesdes Is it OK if I give you just the nupkg? |
@cincuranet Unfortunately we still use 3.1.9 in our production code. The issue is self-resolving: the service tries to restart after the NRE, and after a few attempts it succeeds, so it is not critical for us to resolve it with a hotfix in production. We cannot roll out a hotfix to eventtrace 3.1.20 to resolve/test a non-critical issue. So I can do two things.
Best regards, |
@desdesdes Yes, if you can hotpatch 3.1.9 locally, that would be great. Knowing the hr would help us make a more educated decision about how to proceed. |
When TraceEventSession.ProviderNameToGuid is called, the code assumes this never results in null. See TraceEventProviders.GetProviderGuidByName. When TraceEventNativeMethods.TdhEnumerateProviders returns hr != 0 or providersDesc == null, Trace.WriteLine is called and the function returns null. This pull request changes the behavior to return an incomplete list instead of crashing with an NRE. A point can be made to change the Trace.WriteLine to a throw new Exception(.
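
The difference between returning null and returning an incomplete list can be illustrated with a small, self-contained sketch. All names here (`LoadProviders`, `FindGuid`) are hypothetical, and returning `Guid.Empty` for an unknown name is an assumption of the sketch rather than a statement about the library's behavior.

```csharp
using System;
using System.Collections.Generic;

static class ProviderLookupSketch
{
    // Hypothetical stand-in for the enumeration step. On failure it returns the
    // (possibly empty) table built so far rather than null, so callers never
    // dereference a null reference.
    public static Dictionary<string, Guid> LoadProviders(int errorCode, Dictionary<string, Guid> enumerated)
    {
        var result = new Dictionary<string, Guid>(StringComparer.OrdinalIgnoreCase);
        if (errorCode != 0 || enumerated == null)
            return result; // incomplete list instead of null

        foreach (var pair in enumerated)
            result[pair.Key] = pair.Value;
        return result;
    }

    // A lookup against an incomplete table simply misses; Guid.Empty as the
    // "not found" value is an assumption made for this sketch.
    public static Guid FindGuid(Dictionary<string, Guid> providers, string name)
        => providers.TryGetValue(name, out Guid id) ? id : Guid.Empty;
}
```

The trade-off raised in the conversation still applies: an empty table hides the failure, whereas throwing with the error code makes it diagnosable.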