-
Notifications
You must be signed in to change notification settings - Fork 4
Integrate high-level observability using Tracer #392
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
This will be starting off as more of a research effort, then later it might become more of an implementation issue. |
Logger is used in a lot of subpackages and the version needs to be updated everywhere. Instead of updating the version manually, the following needs to be done:
This should match the version of the dependency in the current Without updating this in all the dependencies, Polykey will fail to compile. |
Well due to semantic versioning it should have been a minor update that easily flowed to all dependencies. No need to update everything. |
I think the issue was when we were trying to give the RPC a logger, when the RPC was using the older version and polykey used the newer version. That made the RPC logger instance different than the polykey one. The difference could have been also caused due to different symbols, as even with the same name, two symbols can be technically different in what they represent, which was why it was complaining. After overriding the version, all references to |
Yes you have to beware of this. Keep track of it using npm command that gives you a tree of the dependencies. |
Is this dependent on #391? |
Not really, I can run the program just fine locally, and monkey-patching is local-only as well. I'm working on this right now, actually. |
Refer to MatrixAI/js-logger#52 (comment) - and define a few phases: after you have experimental visualisation, the next step is to apply it to areas where we know there are resource leakage.
|
It turns out that the issue was in my implementation. Basically, I used an IIFE to wait for polykey to finish execution and then signal to end the tracing, but that didn't capture the entirety of polykey logs. After digging into it, the reason ended up being that the exit handlers did some additional processing, ending the tracer before polykey could end. To avoid this, previously I avoided ending tracer and let the process close naturally, which worked fine for running it interactively, but it ended up failing all the tests as the tests were finished but the generator remained active. Also, the tracer is run in the background as a promise and awaited just before exiting to ensure the logs are written properly. This unfortunately results in tracer not being completely transparent as time is added before the application can end. This has a major effect on the tests where multiple instances are started and stopped. Only running spans in specific modes like verbose or a specific mode to enable span tracing might be better here, but it still wont make tracing completely transparent. For full transparency, I'll have to fork the main process to run the tracer which will receive messages from the main process via IPC, ensuring the main process can end by itself and the forked process will continue until it has written all the logs, then exit. But this might have issues in runtimes like the polykey docker image. |
I think you should review the observability push-flow. The main thread running the JS code will always be pushing data, this will have a performance impact no matter what however there should be no IO delays. A secondary thread from node's IO thread pool can be used by the relevant sink to then push the tracing information into a designated location like append-only file descriptor - potentially with log rotation.
|
Specification
Some work has been done in js-logger for rolling out a custom implementation of tracing which supports streaming and potentially unending spans. (js-logger#47). We now need to use the tracing system in Polykey to gain observability during Polykey's runtime.
Obtaining observability helps debug resource leaks more easily, and given how frequently the seednodes shut down and polykey remains stuck in a stopping state, there are a lot of them. This is meant to simplify finding and fixing them.
Additional context
Tasks
The text was updated successfully, but these errors were encountered: