@suchetanrs suchetanrs commented Apr 5, 2025

Hello! @christophebedard

The changes in this PR are as follows:

  • Add test to lttngpy for creating and destroying a live session
  • Change the --live argument and add the --live-timer-interval and --live-tracing-url-origin arguments
  • Live tracing can be triggered with the start verb.

* Enable live tracing through --live option
* Allow providing a url to send tracing output

Signed-off-by: suchetanrs <[email protected]>
@christophebedard
Member

Thanks for the PR! I'll try to take a look tomorrow.

Signed-off-by: suchetanrs <[email protected]>
@suchetanrs
Author

I just found out that a job that passed in my older PR now failed. Seems like a bunch of tests are failing :p
I've managed to push a fix for a few of them, but I think most of the remaining failures are caused by the API changes I made.
I feel that I should attempt to fix this later because the API is still subject to change. Please let me know if this is okay! :)

@christophebedard
Member

Yeah, don't worry, GitHub CI is a bit broken right now! See #160. You can ignore those test failures. I'll fix them once I get some time.

@suchetanrs
Author

suchetanrs commented Apr 7, 2025

Thanks for your review comments!!
I will quickly set up a second computer with the tracing repository for the URL comment and will work on all of them tomorrow once I finish my GSoC proposal for this :p

@suchetanrs
Author

Hey! @christophebedard

I've set up two different laptops with ros2 tracing. I can ssh between the laptops and ROS communication between them seems to work. However, when I try to connect to the IP address using babeltrace, I always get a connection refused error. In fact, if I try to view the trace output from the same laptop for an IP that's not localhost, I get the same issue.

I've built LTTng from source and I'm trying to understand the flow of functions. It seems like the supported protocols for the input URL are net, net6, tcp and tcp6. We can also specify the ports we'd like to use (up to 2 ports for net/net6 and a single one for tcp) in the following manner: protocol://address:PORT1:PORT2 (just sharing some useful info that I found here :p).
If I use the tcp protocol instead of net, I cannot view the output with babeltrace because it says Unknown protocol: tcp6. I'm not sure if babeltrace supports tcp.
I'm still looking for the exact location in the LTTng code base where the input URL is resolved to something like net://localhost/host/host_name/session_name.

Please let me know if you have some inputs here. I'll continue to read and understand what's done in LTTng before finishing the PR just so I'm aware of what I'm doing :p
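For illustration, the URL format described above could be parsed along these lines. This is only a sketch: `parse_lttng_url`, `LttngUrl`, and `MAX_PORTS` are hypothetical names, not part of LTTng or lttngpy, the single-port limit for tcp6 is an assumption, and bracketed IPv6 addresses are not handled.

```python
from typing import List, NamedTuple


class LttngUrl(NamedTuple):
    """Parsed pieces of a protocol://address:PORT1:PORT2 URL (hypothetical type)."""

    protocol: str
    address: str
    ports: List[int]


# Maximum number of ports per protocol, as described in the comment above.
# The limit for tcp6 is assumed to match tcp.
MAX_PORTS = {'net': 2, 'net6': 2, 'tcp': 1, 'tcp6': 1}


def parse_lttng_url(url: str) -> LttngUrl:
    """Split a URL into protocol, address, and ports; raise ValueError if invalid."""
    protocol, sep, rest = url.partition('://')
    if not sep or protocol not in MAX_PORTS:
        raise ValueError(f'unknown protocol in URL: {url!r}')
    # Ignore any trailing path such as /host/<hostname>/<session_name>
    authority = rest.split('/', 1)[0]
    address, *port_strs = authority.split(':')
    if not address:
        raise ValueError(f'missing address in URL: {url!r}')
    if len(port_strs) > MAX_PORTS[protocol]:
        raise ValueError(f'too many ports for {protocol!r} in URL: {url!r}')
    return LttngUrl(protocol, address, [int(p) for p in port_strs])
```

For example, `parse_lttng_url('net://localhost:5342:5343')` yields the protocol `net`, the address `localhost`, and two ports, while a tcp URL with two ports is rejected.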

@christophebedard
Member

Thanks for looking into it!

I think we might have to use the LTTng relay daemon (lttng-relayd) if we're sending the tracing data from one system to another. I found this example here: https://pavelmakhov.com/2017/01/lttng-streaming/ (note the comment about the ports needing to be open; not sure if that will be an issue for you, otherwise maybe you can try different ports?). So the URL on the system that is tracing should point to the host that is receiving the trace data/that has a relay daemon. The host that is receiving the trace data should start a relay daemon, and the URL used for receiving the trace data should point to localhost but include the hostname of the remote system (see the last code block).

Knowing the above, these sections in the LTTng docs make a lot more sense:

  1. The last paragraph of this section about live tracing and starting the relay daemon on another system: https://lttng.org/docs/v2.13/#doc-lttng-live
  2. This whole section about sending trace data over the network and pointing to the host on which the relay daemon is running: https://lttng.org/docs/v2.13/#doc-sending-trace-data-over-the-network

If we combine those two use-cases, we basically achieve what we want, i.e., live tracing and sending the data to a remote system. And the URL format makes more sense!

Let me know if that makes sense and if you can get it to work knowing this.

> I've built LTTng from source

You're very courageous! 😁

Signed-off-by: suchetanrs <[email protected]>
@suchetanrs suchetanrs force-pushed the suchetan/live-tracing-session branch from 0c813e3 to bf892d9 Compare April 22, 2025 14:46
@suchetanrs
Author

Thanks a lot for your reply @christophebedard !!
Sorry for the late reply. I completely missed this part in the documentation. When I was trying to read the trace data with babeltrace, instead of localhost, I always used the IP of the remote system. I guess that explains why I got the connection refused error. I followed the blog you shared and it seems to work well for me with two computers 😄

I went on an LTTng documentation study spree, hence the delay with the PR. I've uploaded my study notes here :p

I've addressed your review comments but I have a question:
Do we expect users to start lttng-relayd manually? I'm not sure if it makes a lot of sense to do this but if the input url is just localhost, can we spawn the relayd on the same machine similar to the below code using lttng-relayd --daemonize?

def spawn_session_daemon() -> None:

If we do this, maybe we can somehow use the --path parameter to write the live traces to the default folder with $ROS_HOME instead of $LTTNG_HOME?
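A rough sketch of what spawning the relay daemon could look like, in the spirit of spawn_session_daemon(). The function names `relayd_command` and `spawn_relay_daemon` are hypothetical, and the sketch assumes that `--output` is the lttng-relayd option that sets the base directory for written trace data (worth checking against the man page). It does not check whether a relay daemon is already running.

```python
import shutil
import subprocess
from typing import List, Optional


def relayd_command(output_dir: Optional[str] = None) -> List[str]:
    """Build the lttng-relayd command line (hypothetical helper)."""
    cmd = ['lttng-relayd', '--daemonize']
    if output_dir is not None:
        # Assumption: --output sets the base directory for written trace data
        cmd.append(f'--output={output_dir}')
    return cmd


def spawn_relay_daemon(output_dir: Optional[str] = None) -> bool:
    """Try to start a relay daemon; return True if the command succeeded."""
    if shutil.which('lttng-relayd') is None:
        # lttng-relayd is not installed or not on PATH
        return False
    result = subprocess.run(relayd_command(output_dir), check=False)
    return result.returncode == 0
```

Keeping the command construction separate from the process call makes the command line easy to test without having LTTng installed.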

@christophebedard
Member

Thanks for the changes, I will take a look tomorrow.

> Sorry for the late reply. I completely missed this part in the documentation. When I was trying to read the trace data with babeltrace, instead of localhost, I always used the IP of the remote system. I guess that explains why I got the connection refused error. I followed the blog you shared and it seems to work well for me with two computers 😄

Great!

> I went on an LTTng documentation study spree, hence the delay with the PR. I've uploaded my study notes here :p

I'll have to take a look at this!

> Do we expect users to start lttng-relayd manually? I'm not sure if it makes a lot of sense to do this but if the input url is just localhost, can we spawn the relayd on the same machine similar to the below code using lttng-relayd --daemonize?

Oh, interesting! It does look like the lttng create command is the one that automatically starts the relay daemon if you create a --live session with the lttng CLI and don't provide a URL (and not liblttng-ctl, which lttngpy uses): https://github.com/lttng/lttng-tools/blob/e710f55dbf2b15e4874d38875338ac695a31cbfa/src/bin/lttng/commands/create.c#L255-L270

In that case, then yes we could definitely spawn it similar to spawn_session_daemon(), unless there's already a relay daemon.

@suchetanrs
Author

Thanks!! :D

I will add spawning the relayd to this as well!

Member

@christophebedard christophebedard left a comment

Some more comments, but overall this is looking good

default=1000000,
help='TODO (default: %(default)s)')
const=100000,
help='Set the live timer interval (default: %(default)s)')
Member

This does set the live timer interval, but the main use of the --live flag is to create a live tracing session, so this should be something like:

Suggested change
help='Set the live timer interval (default: %(default)s)')
help='Create a live tracing session. Optionally set the live timer interval (default: %(default)s)')

(double check the line length and split it over two lines if necessary)
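To make the intended behaviour of the flag concrete, here is a minimal argparse sketch. It assumes the option is declared with `nargs='?'`, so that the value is `None` without `--live`, the `const` value with a bare `--live`, and the given value otherwise. `build_parser` and `DEFAULT_LIVE_TIMER_INTERVAL_US` are hypothetical names, and the interval value is only a placeholder.

```python
import argparse

# Placeholder interval in microseconds; the actual value is up to the PR.
DEFAULT_LIVE_TIMER_INTERVAL_US = 1000000


def build_parser() -> argparse.ArgumentParser:
    """Create a parser with an optional-value --live flag."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--live', dest='live_timer_interval', type=int,
        # nargs='?' lets the flag appear with or without a value
        nargs='?', const=DEFAULT_LIVE_TIMER_INTERVAL_US, default=None,
        help='Create a live tracing session. Optionally set the live timer '
             'interval in microseconds')
    return parser


if __name__ == '__main__':
    parser = build_parser()
    # No flag: not a live session
    print(parser.parse_args([]).live_timer_interval)
    # Bare flag: live session with the placeholder interval
    print(parser.parse_args(['--live']).live_timer_interval)
    # Flag with value: live session with a custom interval
    print(parser.parse_args(['--live', '500000']).live_timer_interval)
```

With this shape, "live mode" is simply `live_timer_interval is not None`, which matches how the parameter is documented elsewhere in this review.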

Author

Done.

parser.add_argument(
'--live-url', dest='live_url', type=str,
default='net://localhost',
help='Set the live tracing URL origin (default: %(default)s)')
Member

This is just a note for some improvements later, no need to do anything now:

We should document what this all means (timer interval, URL, etc.), link to the relevant parts of the LTTng docs, provide some examples, etc. This could go in the README.

Author

Okay, understood 👍
I will raise a PR for the updates in the README after this.

:param subbuffer_size_kernel: the size of the subbuffers for kernel events (defaults to 32
times the usual page size, since there can be way more kernel events than UST events)
:param live_timer_interval: the time interval at which the data should be flushed from the
buffer and sent to the LTTng relay. This is in microseconds.
Member

Suggested change
buffer and sent to the LTTng relay. This is in microseconds.
buffer and sent to the LTTng relay daemon. This is in microseconds.

Author

Done

the usual page size)
:param subbuffer_size_kernel: the size of the subbuffers for kernel events (defaults to 32
times the usual page size, since there can be way more kernel events than UST events)
:param live_timer_interval: the time interval at which the data should be flushed from the
Member

This should also say that the tracing session will be in live mode if this value is not None.

Author

Done

times the usual page size, since there can be way more kernel events than UST events)
:param live_timer_interval: the time interval at which the data should be flushed from the
buffer and sent to the LTTng relay. This is in microseconds.
Used only if live_timer_interval is `True`.
Member

Suggested change
Used only if live_timer_interval is `True`.
Used only if live_timer_interval is not `None`.

Author

Done

Comment on lines 261 to 264
# TODO(christophebedard): do we need to join the
# base_path with session_name for a live session?
# We need to return a path, so maybe format it like:
# "net://localhost/host/$hostname/$session_name"
Member

You can remove this comment

Author

Done

Comment on lines 461 to 463
# TODO(christophebedard): figure out what to provide here as the URL
# This depends on how we expect users to use live tracing
# See the documentation for the url param of lttng_create_session_live()
Member

You can remove this comment

Author

Done

Comment on lines +166 to +167
if live_timer_interval is None:
assert trace_directory == full_session_path
Member

The value returned by lttng.lttng_init() (trace_directory) should be equivalent to full_live_url if we're in live mode, so we should assert that they are equal too

Author

Done

Comment on lines 90 to 93
live_tracing_url = (
'net://localhost/host/' + socket.gethostname() + '/' +
session_name
)
Member

Similar to my other comment, this should probably use live_url instead of net://localhost.
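As a sketch of that suggestion, the URL could be built from live_url rather than a hard-coded net://localhost. The helper name `live_viewer_url` and its `hostname` parameter are hypothetical; the path layout follows the /host/$hostname/$session_name format discussed in this PR.

```python
import socket
from typing import Optional


def live_viewer_url(
    live_url: str,
    session_name: str,
    hostname: Optional[str] = None,
) -> str:
    """Build <live_url>/host/<hostname>/<session_name> (hypothetical helper)."""
    if hostname is None:
        # Default to the name of the machine that is tracing
        hostname = socket.gethostname()
    # Strip trailing slashes so the result never contains '//' before 'host'
    return f"{live_url.rstrip('/')}/host/{hostname}/{session_name}"
```

Passing the hostname explicitly keeps the function deterministic in tests, while the default mirrors the socket.gethostname() call in the current code.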

Author

Done

'net://localhost/host/' + socket.gethostname() + '/' +
session_name
)
print(f'live trace data will be sent to: {live_tracing_url} on the system running lttng-relayd')
Member

I would remove "on the system running lttng-relayd" but still mention the relay daemon before the :, like:

Suggested change
print(f'live trace data will be sent to: {live_tracing_url} on the system running lttng-relayd')
print(f'live trace data will be sent to the relay daemon at: {live_tracing_url}')

Author

Done

@suchetanrs
Author

suchetanrs commented Apr 29, 2025

Hey, I tried a few things about your question in #165 (comment)

Consider the following scenario: tracing is launched on a machine, say M_trace, with hostname M_trace_host and IP M_trace_ip, and the relay daemon runs on another machine on the same network, say M_remote, with hostname M_remote_host and IP M_remote_ip.

At first, I started the relay daemon on M_remote with just the command lttng-relayd. In this case, the relay daemon listens for viewer connections only on localhost, so it worked only if I ran babeltrace on M_remote using the URL net://localhost/host/M_trace_host/session-name.
In this setup, if I ran babeltrace on M_trace with the URL net://localhost/host/M_trace_host/session-name, I unsurprisingly got a connection refused error. Even with the URL net://M_remote_ip/host/M_trace_host/session-name, I got the same error.

On the other hand, if I start the relay daemon on M_remote using the command lttng-relayd --live-port=tcp://0.0.0.0:5344 instead of just lttng-relayd, the daemon accepts viewer connections from anywhere on the network (see https://lttng.org/man/8/lttng-relayd/v2.13/). This time, running babeltrace on M_remote with either net://localhost/host/M_trace_host/session-name or net://M_remote_ip/host/M_trace_host/session-name worked fine. And on M_trace, running babeltrace with the URL net://M_remote_ip/host/M_trace_host/session-name worked fine too.

So the bottom line is that I am confused about what we should tell the user when they run live tracing 😂
If we say that the live trace output will be available at net://localhost/host/M_trace_host/session-name, then it would force them to run the relay daemon (lttng-relayd) on the same machine as babeltrace; otherwise it would not work.

On the other hand, like you mentioned in #165 (comment), if we use the live_url directly, we should tell the user to start the relay daemon with the command lttng-relayd --live-port=tcp://0.0.0.0:5344 so they can run babeltrace on any machine with the live_url we report. Shall I continue the PR with the latter setting and add a comment about starting the relay daemon with the extra --live-port option?
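To make the second option concrete, the messages shown to the user could be assembled roughly as follows. This is a sketch only: `live_tracing_hints` is a hypothetical helper, the wording of the second message is invented for illustration, and the port simply reuses the 5344 value from the command above.

```python
from typing import List


def live_tracing_hints(
    live_url: str,
    hostname: str,
    session_name: str,
    live_port: int = 5344,
) -> List[str]:
    """Return the lines to print when a live session starts (hypothetical)."""
    viewer_url = f"{live_url.rstrip('/')}/host/{hostname}/{session_name}"
    return [
        f'live trace data will be sent to the relay daemon at: {viewer_url}',
        # Illustrative wording; the command itself is the one tested above
        'to view it from another machine, start the relay daemon with: '
        f'lttng-relayd --live-port=tcp://0.0.0.0:{live_port}',
    ]


if __name__ == '__main__':
    for line in live_tracing_hints('net://M_remote_ip', 'M_trace_host', 'session-name'):
        print(line)
```

This keeps the reported URL identical to the live_url the user passed in, and leaves the decision about how the relay daemon is started to the user.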

suchetanrs added 3 commits May 1, 2025 22:11
Signed-off-by: suchetanrs <[email protected]>
Signed-off-by: suchetanrs <[email protected]>
Signed-off-by: suchetanrs <[email protected]>