Some features added to live tracing session. #165

suchetanrs · 2025-04-05T06:59:29Z

The changes in this PR are as follows:

* Add a test to lttngpy for creating and destroying a live session.
* Change the --live argument and add the --live-timer-interval and --live-tracing-url-origin arguments.
* Live tracing can be triggered with the start verb.


christophebedard · 2025-04-06T00:27:36Z

Thanks for the PR! I'll try to take a look tomorrow.


suchetanrs · 2025-04-06T11:33:18Z

I just found out that a job that passed in my older PR now failed. Seems like a bunch of tests are failing :p
I've managed to push a fix for a few of them, but I think most are still failing because of the API changes I made.
I feel that I should attempt to fix this later because the API is still subject to change. Please let me know if this is okay! :)

christophebedard · 2025-04-06T23:50:56Z

Yeah, don't worry, GitHub CI is a bit broken right now! See #160. You can ignore those test failures. I'll fix them once I get some time.

tracetools_trace/tracetools_trace/tools/args.py

tracetools_trace/tracetools_trace/tools/lttng_impl.py

tracetools_trace/tracetools_trace/trace.py

suchetanrs · 2025-04-07T04:09:01Z

Thanks for your review comments!!
I will quickly set up a second computer with the tracing repository for the URL comment and will work on all of them tomorrow once I finish my GSoC proposal for this :p

suchetanrs · 2025-04-11T18:24:09Z

Hey! @christophebedard

I've set up two different laptops with ros2 tracing. I can ssh between the laptops, and ROS communication between them seems to work. However, when I try to connect to the IP address using babeltrace, I always get a connection refused error. In fact, if I try to view the trace output from the same laptop using an IP that's not localhost, I get the same issue.

I've built LTTng from source and I'm trying to understand the flow of functions. It seems like the supported protocols for the input URL are net, net6, tcp and tcp6. We can also specify the ports we'd like to use (up to 2 ports for net/net6 and a single one for tcp) in the following manner: protocol://address:PORT1:PORT2 (just sharing some useful info that I found here :p).
If I use the tcp protocol instead of net, I cannot view the output on babeltrace because it says Unknown protocol: tcp6. I'm not sure if babeltrace supports tcp.
I'm still looking for the exact location in the lttng code base where the input url is resolved to something like net://localhost/host/host_name/session_name.

Please let me know if you have some inputs here. I'll continue to read and understand what's done in LTTng before finishing the PR just so I'm aware of what I'm doing :p
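As a rough illustration of the URL shape described in this comment, here is a minimal Python sketch. `build_lttng_url` is a hypothetical helper (it is not part of LTTng, lttngpy, or tracetools_trace), the port limits only encode what is stated above, and the address and port values are example inputs.

```python
from typing import Sequence

# Maximum number of ports per protocol, as described in the comment above
MAX_PORTS = {'net': 2, 'net6': 2, 'tcp': 1, 'tcp6': 1}


def build_lttng_url(protocol: str, address: str, ports: Sequence[int] = ()) -> str:
    """Build a URL of the form protocol://address[:PORT1[:PORT2]] (hypothetical helper)."""
    if protocol not in MAX_PORTS:
        raise ValueError(f'unknown protocol: {protocol}')
    if len(ports) > MAX_PORTS[protocol]:
        raise ValueError(
            f'{protocol} accepts at most {MAX_PORTS[protocol]} port(s), got {len(ports)}')
    # Each port is appended with a leading colon, in the order given
    port_suffix = ''.join(f':{port}' for port in ports)
    return f'{protocol}://{address}{port_suffix}'


print(build_lttng_url('net', 'localhost'))
print(build_lttng_url('net', '192.168.1.10', (5342, 5343)))
print(build_lttng_url('tcp', '192.168.1.10', (5344,)))
```

Rejecting a second port for tcp/tcp6 up front mirrors the limits mentioned above; whether LTTng itself reports that case the same way is not something this sketch claims.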

christophebedard · 2025-04-12T00:20:45Z

Thanks for looking into it!

I think we might have to use the LTTng relay daemon (lttng-relayd) if we're sending the tracing data from one system to another. I found this example here: https://pavelmakhov.com/2017/01/lttng-streaming/ (note the comment about the ports needing to be open; not sure if that will be an issue for you, otherwise maybe you can try different ports?). So the URL on the system that is tracing should point to the host that is receiving the trace data/that has a relay daemon. The host that is receiving the trace data should start a relay daemon, and the URL used for receiving the trace data should point to localhost but include the hostname of the remote system (see the last code block).

Knowing the above, these sections in the LTTng docs make a lot more sense:

The last paragraph of this section about live tracing and starting the relay daemon on another system: https://lttng.org/docs/v2.13/#doc-lttng-live
This whole section about sending trace data over the network and pointing to the host on which the relay daemon is running: https://lttng.org/docs/v2.13/#doc-sending-trace-data-over-the-network

If we combine those two use-cases, we basically achieve what we want, i.e., live tracing and sending the data to a remote system. And the URL format makes more sense!

Let me know if that makes sense and if you can get it to work knowing this.

> I've built LTTng from source

You're very courageous! 😁
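To make the two-host setup above concrete, here is a small Python sketch that only assembles the commands involved; it does not run anything. `live_tracing_commands` is a hypothetical helper, and the host and session names passed to it are placeholders. The command shapes follow the linked blog post and LTTng docs (lttng-relayd on the receiving host, lttng create --live --set-url on the traced host, and a viewer URL that points to localhost but includes the traced system's hostname).

```python
import shlex
from typing import List, Tuple


def live_tracing_commands(
    relay_host: str,
    traced_hostname: str,
    session_name: str,
) -> List[Tuple[str, List[str]]]:
    """Return (where to run, argv) pairs for a remote live tracing setup (sketch)."""
    viewer_url = f'net://localhost/host/{traced_hostname}/{session_name}'
    return [
        # The receiving system runs the relay daemon
        ('receiving host', ['lttng-relayd']),
        # The traced system sends its data to the relay daemon's host
        ('traced host', ['lttng', 'create', session_name, '--live',
                         f'--set-url=net://{relay_host}']),
        # The viewer runs next to the relay daemon and connects to localhost
        ('receiving host', ['babeltrace', '--input-format=lttng-live', viewer_url]),
    ]


for where, argv in live_tracing_commands('relay-host', 'traced-host', 'my-session'):
    print(f'[{where}] {shlex.join(argv)}')
```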


suchetanrs · 2025-04-22T15:18:40Z

Thanks a lot for your reply @christophebedard !!
Sorry for the late reply. I completely missed this part in the documentation. When I was trying to read the trace data with babeltrace, instead of localhost, I always used the IP of the remote system. I guess that explains why I got the connection refused error. I followed the blog you shared and it seems to work well for me with two computers 😄

I went on an LTTng documentation study spree and hence the delay with PR. I've uploaded my study notes here :p

I've addressed your review comments but I have a question:
Do we expect users to start lttng-relayd manually? I'm not sure if it makes a lot of sense to do this but if the input url is just localhost, can we spawn the relayd on the same machine similar to the below code using lttng-relayd --daemonize?

ros2_tracing/tracetools_trace/tracetools_trace/tools/lttng_impl.py

Line 134 in aa718f3

def spawn_session_daemon() -> None:

If we do this, maybe we can somehow use the --path parameter to write the live traces to the default folder with $ROS_HOME instead of $LTTNG_HOME?
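A minimal sketch of what this could look like, assuming the relay daemon is only spawned when the URL points to the local machine. The helper names (`is_local_url`, `relayd_command`, `spawn_relay_daemon_if_local`) are hypothetical and not taken from tracetools_trace, and the sketch does not check whether a relay daemon is already running.

```python
import shutil
import subprocess
from typing import List
from urllib.parse import urlparse

# Hosts treated as "the same machine" in this sketch (an assumption)
LOCAL_HOSTS = ('localhost', '127.0.0.1', '::1')


def is_local_url(url: str) -> bool:
    """Return True if the URL's host part refers to the local machine."""
    return urlparse(url).hostname in LOCAL_HOSTS


def relayd_command() -> List[str]:
    """Command used to start the relay daemon in the background."""
    return ['lttng-relayd', '--daemonize']


def spawn_relay_daemon_if_local(url: str) -> bool:
    """Spawn lttng-relayd only for local URLs; return True if it was spawned."""
    if not is_local_url(url):
        return False
    if shutil.which('lttng-relayd') is None:
        raise RuntimeError('cannot find the lttng-relayd executable')
    subprocess.run(relayd_command(), check=True)
    return True


print(is_local_url('net://localhost'))
print(is_local_url('net://192.168.1.10'))
```

For a remote URL the function returns early, which leaves starting the relay daemon on the other system to the user.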


christophebedard · 2025-04-25T00:12:15Z

Thanks for the changes, I will take a look tomorrow.

> Sorry for the late reply. I completely missed this part in the documentation. When I was trying to read the trace data with babeltrace, instead of localhost, I always used the IP of the remote system. I guess that explains why I got the connection refused error. I followed the blog you shared and it seems to work well for me with two computers 😄

Great!

> I went on an LTTng documentation study spree and hence the delay with PR. I've uploaded my study notes here :p

I'll have to take a look at this!

> Do we expect users to start lttng-relayd manually? I'm not sure if it makes a lot of sense to do this but if the input url is just localhost, can we spawn the relayd on the same machine similar to the below code using lttng-relayd --daemonize?

Oh, interesting! It does look like the lttng create command is the one that automatically starts the relay daemon if you create a --live session with the lttng CLI and don't provide a URL (and not liblttng-ctl, which lttngpy uses): https://github.com/lttng/lttng-tools/blob/e710f55dbf2b15e4874d38875338ac695a31cbfa/src/bin/lttng/commands/create.c#L255-L270

In that case, then yes we could definitely spawn it similar to spawn_session_daemon(), unless there's already a relay daemon.

suchetanrs · 2025-04-25T15:32:30Z

Thanks!! :D

I will add spawning the relayd to this as well!

christophebedard

Some more comments, but overall this is looking good

christophebedard · 2025-04-25T23:50:12Z

tracetools_trace/tracetools_trace/tools/args.py

-        default=1000000,
-        help='TODO (default: %(default)s)')
+        const=100000,
+        help='Set the live timer interval (default: %(default)s)')


This does set the live timer interval, but the main use of the --live flag is to create a live tracing session, so this should be something like:

Suggested change

-        help='Set the live timer interval (default: %(default)s)')
+        help='Create a live tracing session. Optionally set the live timer interval (default: %(default)s)')

(double check the line length and split it over two lines if necessary)
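For context, here is a self-contained sketch of how an optional-value flag like this can behave with argparse. It assumes nargs='?' (the diff above only shows const and help), and EXAMPLE_INTERVAL_US is a placeholder value rather than the one chosen in the PR.

```python
import argparse

# Placeholder value for illustration only, in microseconds
EXAMPLE_INTERVAL_US = 1000000

parser = argparse.ArgumentParser()
parser.add_argument(
    '--live', dest='live_timer_interval', type=int,
    nargs='?', const=EXAMPLE_INTERVAL_US, default=None,
    help='Create a live tracing session. '
         'Optionally set the live timer interval (default: %(default)s)')
parser.add_argument(
    '--live-url', dest='live_url', type=str,
    default='net://localhost',
    help='Set the live tracing URL origin (default: %(default)s)')

# No flag: not a live session
print(parser.parse_args([]).live_timer_interval)
# Flag without a value: live session using the const value
print(parser.parse_args(['--live']).live_timer_interval)
# Flag with a value: live session using the given interval
print(parser.parse_args(['--live', '500000']).live_timer_interval)
```

With this shape, None means "not live" and any integer means "live with this interval", which matches the `is None` checks discussed elsewhere in this review.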

christophebedard · 2025-04-25T23:52:43Z

tracetools_trace/tracetools_trace/tools/args.py

+    parser.add_argument(
+        '--live-url', dest='live_url', type=str,
+        default='net://localhost',
+        help='Set the live tracing URL origin (default: %(default)s)')


This is just a note for some improvements later, no need to do anything now:

We should document what this all means (timer interval, URL, etc.), link to the relevant parts of the LTTng docs, provide some examples, etc. This could go in the README.

Okay, understood 👍
I will raise a PR for the updates in the README after this.

christophebedard · 2025-04-25T23:53:37Z

tracetools_trace/tracetools_trace/tools/lttng_impl.py

    :param subbuffer_size_kernel: the size of the subbuffers for kernel events (defaults to 32
        times the usual page size, since there can be way more kernel events than UST events)
+    :param live_timer_interval: the time interval at which the data should be flushed from the
+        buffer and sent to the LTTng relay. This is in microseconds.


Suggested change

-        buffer and sent to the LTTng relay. This is in microseconds.
+        buffer and sent to the LTTng relay daemon. This is in microseconds.

christophebedard · 2025-04-25T23:54:30Z

tracetools_trace/tracetools_trace/tools/lttng_impl.py

        the usual page size)
    :param subbuffer_size_kernel: the size of the subbuffers for kernel events (defaults to 32
        times the usual page size, since there can be way more kernel events than UST events)
+    :param live_timer_interval: the time interval at which the data should be flushed from the


This should also say that the tracing session will be in live mode if this value is not None.

christophebedard · 2025-04-25T23:54:53Z

tracetools_trace/tracetools_trace/tools/lttng_impl.py

        times the usual page size, since there can be way more kernel events than UST events)
+    :param live_timer_interval: the time interval at which the data should be flushed from the
+        buffer and sent to the LTTng relay. This is in microseconds.
+        Used only if live_timer_interval is `True`.


Suggested change

-        Used only if live_timer_interval is `True`.
+        Used only if live_timer_interval is not `None`.

christophebedard · 2025-04-26T00:10:23Z

tracetools_trace/tracetools_trace/tools/lttng_impl.py

+        # TODO(christophebedard): do we need to join the
+        #   base_path with session_name for a live session?
+        #   We need to return a path, so maybe format it like:
+        #   "net://localhost/host/$hostname/$session_name"


You can remove this comment

christophebedard · 2025-04-26T00:10:56Z

tracetools_trace/tracetools_trace/tools/lttng_impl.py

        # TODO(christophebedard): figure out what to provide here as the URL
        #   This depends on how we expect users to use live tracing
        #   See the documentation for the url param of lttng_create_session_live()


You can remove this comment

christophebedard · 2025-04-26T00:19:12Z

tracetools_trace/tracetools_trace/trace.py

+    if live_timer_interval is None:
+        assert trace_directory == full_session_path


The value returned by lttng.lttng_init() (trace_directory) should be equivalent to full_live_url if we're in live mode, so we should assert that they are equal too
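A minimal sketch of the check with both branches, using the variable names from this comment. The wrapper function `check_trace_directory` and the example values are hypothetical and only make the snippet runnable on its own.

```python
from typing import Optional


def check_trace_directory(
    trace_directory: str,
    full_session_path: str,
    full_live_url: str,
    live_timer_interval: Optional[int],
) -> None:
    """Check the value returned by the init step against the expected location."""
    if live_timer_interval is None:
        # Normal session: traces are written to the session path
        assert trace_directory == full_session_path
    else:
        # Live session: trace data goes to the live URL instead
        assert trace_directory == full_live_url


# Example values, for illustration only
session_path = '/tmp/tracing/my-session'
live_url = 'net://localhost/host/my-host/my-session'
check_trace_directory(session_path, session_path, live_url, None)
check_trace_directory(live_url, session_path, live_url, 1000000)
print('both checks passed')
```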

christophebedard · 2025-04-26T00:24:33Z

tracetools_trace/tracetools_trace/trace.py

+    live_tracing_url = (
+        'net://localhost/host/' + socket.gethostname() + '/' +
+        session_name
+    )


Similar to my other comment, this should probably use live_url instead of net://localhost.
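One way this could look, as a sketch: build the URL from live_url rather than a hard-coded prefix. `make_live_tracing_url` and its optional hostname parameter are hypothetical (the parameter is only there to make the example deterministic); the /host/HOSTNAME/SESSION layout is the one used in the diff above.

```python
import socket
from typing import Optional


def make_live_tracing_url(
    live_url: str,
    session_name: str,
    hostname: Optional[str] = None,
) -> str:
    """Join the live URL, the traced system's hostname, and the session name."""
    if hostname is None:
        hostname = socket.gethostname()
    # Strip a trailing slash so the result never contains '//' after the host
    return f"{live_url.rstrip('/')}/host/{hostname}/{session_name}"


print(make_live_tracing_url('net://localhost', 'my-session', hostname='my-host'))
print(make_live_tracing_url('net://192.168.1.10', 'my-session', hostname='my-host'))
```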

christophebedard · 2025-04-26T00:25:00Z

tracetools_trace/tracetools_trace/trace.py

+        'net://localhost/host/' + socket.gethostname() + '/' +
+        session_name
+    )
+    print(f'live trace data will be sent to: {live_tracing_url} on the system running lttng-relayd')


I would remove "on the system running lttng-relayd" but still mention the relay daemon before the :, like:

Suggested change

-    print(f'live trace data will be sent to: {live_tracing_url} on the system running lttng-relayd')
+    print(f'live trace data will be sent to the relay daemon at: {live_tracing_url}')

suchetanrs · 2025-04-29T08:32:39Z

Hey, I tried a few things regarding your question in #165 (comment)

Consider the following scenario: tracing is launched on a machine, say M_trace, with hostname M_trace_host and IP M_trace_ip, and the relay daemon is running on a machine on the same network, say M_remote, with hostname M_remote_host and IP M_remote_ip.

At first, I started the relay daemon on M_remote using just the command lttng-relayd. In this case, the relay daemon listens for viewer connections only on localhost, so it worked only if I ran babeltrace on M_remote using the URL net://localhost/host/M_trace_host/session-name.
In this setting, if I ran babeltrace on M_trace with the URL net://localhost/host/M_trace_host/session-name, I got a connection refused error, unsurprisingly. Even with the URL net://M_remote_ip/host/M_trace_host/session-name, I got the same error.

On the other hand, if I start the relay daemon on M_remote using the command lttng-relayd --live-port=tcp://0.0.0.0:5344 instead of just lttng-relayd, it enables the daemon to accept viewer connections from anywhere on the network (see https://lttng.org/man/8/lttng-relayd/v2.13/). So this time, when I ran babeltrace on M_remote with the URL net://localhost/host/M_trace_host/session-name or net://M_remote_ip/host/M_trace_host/session-name, both worked fine. And on M_trace, running babeltrace with the URL net://M_remote_ip/host/M_trace_host/session-name worked fine.

So the bottom line is that I am confused about what we should tell the user when they run live tracing 😂
If we say that the live trace output will be available at net://localhost/host/M_trace_host/session-name, then it would force them to run the relay daemon using plain lttng-relayd on the same machine as babeltrace; otherwise it would not work.

On the other hand, like you mentioned in #165 (comment), if we use the live_url directly, we should inform the user about running the relay daemon with the command lttng-relayd --live-port=tcp://0.0.0.0:5344 so they can run babeltrace on any machine with our informed live_url. Shall I continue the PR with the latter setting and add a comment about starting the relay daemon with the extra --live-port option?
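The two relay daemon invocations and the outcomes reported in this comment can be summarized in a short Python sketch. `relayd_argv` is a hypothetical helper, and the OBSERVED table only restates the combinations tried above; combinations that were not tried are left out.

```python
from typing import List, Optional


def relayd_argv(live_port_url: Optional[str] = None) -> List[str]:
    """Build the lttng-relayd command line, optionally overriding the live port URL."""
    argv = ['lttng-relayd']
    if live_port_url is not None:
        argv.append(f'--live-port={live_port_url}')
    return argv


# (how relayd was started, where babeltrace ran, host used in the viewer URL)
# -> whether the viewer connected, as reported in this comment
OBSERVED = {
    ('default', 'M_remote', 'localhost'): True,
    ('default', 'M_trace', 'localhost'): False,
    ('default', 'M_trace', 'M_remote_ip'): False,
    ('0.0.0.0', 'M_remote', 'localhost'): True,
    ('0.0.0.0', 'M_remote', 'M_remote_ip'): True,
    ('0.0.0.0', 'M_trace', 'M_remote_ip'): True,
}

print(' '.join(relayd_argv()))
print(' '.join(relayd_argv('tcp://0.0.0.0:5344')))
for (relayd, viewer_host, url_host), connected in OBSERVED.items():
    status = 'ok' if connected else 'connection refused'
    print(f'relayd={relayd:<8} viewer on {viewer_host:<9} net://{url_host}/...: {status}')
```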


suchetanrs added 2 commits April 5, 2025 11:15

Fix description and add test for live tracing

e19f2b4

Signed-off-by: suchetanrs <[email protected]>

Changes:

5476fb7

* Enable live tracing through --live option
* Allow providing a url to send tracing output

Signed-off-by: suchetanrs <[email protected]>

suchetanrs mentioned this pull request Apr 5, 2025

Allow creating a live tracing session with ros2 trace/Trace #149

Open

christophebedard mentioned this pull request Apr 6, 2025

Enable creating live session from lttngpy #164

Closed

Fix some failing tests

aa37210

Signed-off-by: suchetanrs <[email protected]>

christophebedard reviewed Apr 7, 2025


Address PR changes

bf892d9

Signed-off-by: suchetanrs <[email protected]>

suchetanrs force-pushed the suchetan/live-tracing-session branch from 0c813e3 to bf892d9 Compare April 22, 2025 14:46

Move not live_timer_interval to live_timer_interval is None

d9e3129

Signed-off-by: suchetanrs <[email protected]>

christophebedard reviewed Apr 26, 2025


suchetanrs added 3 commits May 1, 2025 22:11

Fix args.py

c799fbf

Signed-off-by: suchetanrs <[email protected]>

Fix lttng_impl.py

edd0acf

Signed-off-by: suchetanrs <[email protected]>

Fix trace.py

403b2d7

Signed-off-by: suchetanrs <[email protected]>

alsora assigned christophebedard May 9, 2025
