Add a "Drop" reason or allow invalid messages to be received/recorded #1023

Blutkoete · 2023-03-20T09:11:13Z

Hello!

This idea came up when reading this pull request.

Debugging problems where a framework or middleware drops messages in one of the layers between the source and my application has always been a pain for me. So I wanted to ask whether it is possible to add a "drop reason" to dropped messages, together with an option to change this behaviour per subscriber for debugging. For example, the recorder could be told to also record messages that would otherwise be dropped due to e.g. a wrong length, being out of order, ... or I could have my own debugging subscriber with that "allow invalid messages" option set to see what is actually going wrong.

With all of today's middlewares, you basically have to resort to tcpdump/tshark/Wireshark for such an analysis, so this could set eCAL apart.

Best regards,

Tobias

KerstinKeller · 2023-03-20T09:51:11Z

Collaborator

Hi Tobias,

I see you're still following our development 😄
You're making a very good point. Diagnostics are really important when you're trying to find the root cause of a problem; we realized that again just a few days ago.
eCAL is already not very verbose when things go wrong, and I think dropping out-of-order messages might not make that better.

In the usual sense, we get a drop when the data doesn't reach the subscriber, for whatever reason. Introducing another kind of drop now might just make things illogical to the user, since it hides the root cause of the problem.


FlorianReimold · 2023-03-20T09:51:26Z

Maintainer

I recently wrote a documentation page regarding message drops:
https://eclipse-ecal.github.io/ecal/advanced/message_drops.html

> whether it is possible to add a "drop reason" to dropped messages

Hm, honestly this sounds very complicated. Quite often, you just don't know why a message has been dropped. E.g. if you send a big UDP datagram, let it fragment into IP fragments and one of the fragments gets lost, you won't even see any of the other fragments in your userspace application. That's why you often fall back to Wireshark, which lets you debug these kinds of issues on a much lower level.
TCP, for instance, even drops messages on the publisher side if the transfer of the last message is too slow. So the subscriber will never see any part of those messages and will only notice that the sequence number jumped ahead.
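
To put a number on the UDP fragmentation case, here is a rough back-of-the-envelope sketch. It assumes IPv4 without options, a 1500-byte MTU, and independent fragment losses; the function names and the 0.1 % loss rate are made up for illustration and are not taken from eCAL.

```python
def fragment_count(payload_bytes: int, mtu: int = 1500,
                   ip_header: int = 20, udp_header: int = 8) -> int:
    """Number of IPv4 fragments needed for one UDP datagram (assumes no IP options)."""
    # Every fragment except the last must carry a multiple of 8 data bytes.
    per_fragment = (mtu - ip_header) // 8 * 8
    total = payload_bytes + udp_header
    # Integer ceiling division.
    return -(-total // per_fragment)


def datagram_loss_probability(fragments: int, fragment_loss: float) -> float:
    """The whole datagram is lost as soon as any single fragment is lost.

    Assumes fragment losses are independent of each other.
    """
    return 1.0 - (1.0 - fragment_loss) ** fragments


if __name__ == "__main__":
    n = fragment_count(64_000)
    p = datagram_loss_probability(n, fragment_loss=0.001)
    print(f"{n} fragments, datagram loss probability {p:.2%}")
```

Under these assumptions a 64 kB payload is split into 44 fragments, so even a small per-fragment loss rate adds up to a noticeably larger loss rate for the whole message, and the receiving application only ever sees "nothing arrived".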

However, on certain occasions you do know the reason in the subscriber, namely for the cases in the "Dropping in application layer" section (see link above and scroll down). I see how even just knowing this could be beneficial.

Detecting a drop is possible and easy with the sequence number. The drop reason could theoretically be classified as either "application-layer drop" or "not application-layer drop". For the latter class I think it is very hard to determine the actual reason.
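
As a rough illustration of the detection part, a minimal sketch could look like the following. It assumes that the sequence number increases by exactly one per published message and ignores counter wrap-around; `find_drops` is a made-up helper, not an eCAL function.

```python
from typing import Iterable, List, Tuple


def find_drops(sequence_numbers: Iterable[int]) -> List[Tuple[int, int]]:
    """Return (first_missing, missing_count) for every gap in the received numbers.

    Assumes the publisher increments the sequence number by one per message.
    """
    gaps: List[Tuple[int, int]] = []
    previous = None
    for seq in sequence_numbers:
        if previous is not None and seq > previous + 1:
            # Everything between the last seen number and this one is missing.
            gaps.append((previous + 1, seq - previous - 1))
        if previous is None or seq > previous:
            # Late (out-of-order) arrivals do not move the high-water mark back.
            previous = seq
    return gaps


if __name__ == "__main__":
    print(find_drops([1, 2, 3, 6, 7, 10]))
```

This only tells you that something is missing and how much. It says nothing about why, which is exactly the hard part for everything below the application layer.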

> and allow for an option to change this behaviour for a subscriber for debugging, e.g. telling the recorder to also record messages that would otherwise be dropped due to e.g. a wrong length, being out of order

I think recording dropped messages is technically just not possible, because they have been dropped. If e.g. a network switch deleted a message from its internal buffer due to an overflow, there is nothing to record, except for the information that something is missing. And that information is actually still present in the recording, as the sequence numbers are also saved to the file.


Blutkoete · 2023-03-21T07:57:40Z

Author

I agree that eCAL cannot determine the drop reason if the drop happened below the application layer, but knowing why the application layer dropped a message would still be nice in my opinion. Knowing why something doesn't reach us if e.g. the kernel already drops it is probably impossible (I think even Wireshark just reports those packets as "dropped", if I recall correctly).

Whether to drop e.g. out-of-order packets at all, instead of letting the application decide, is probably a matter of intense debate depending on the use case. Someone running an algorithm in production will want the drop without having to roll their own logic, while someone debugging the communication in the vehicle will want to record errors as well. That's why I'm a big fan of making something like that configurable, but I see pros and cons for all possible implementations.
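
To make the "configurable" idea a bit more concrete, here is a minimal sketch of what I have in mind. Everything in it is hypothetical: `DropReason`, `Message`, `DebuggableReceiver` and the `allow_invalid` flag are names I made up for illustration and are not part of the eCAL API.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple


class DropReason(Enum):
    """Hypothetical application-layer drop reasons."""
    OUT_OF_ORDER = "out_of_order"
    WRONG_LENGTH = "wrong_length"


@dataclass
class Message:
    seq: int
    declared_length: int
    payload: bytes


@dataclass
class DebuggableReceiver:
    """Hypothetical receiver; allow_invalid=True delivers invalid messages too."""
    allow_invalid: bool = False
    last_seq: Optional[int] = None
    delivered: List[Tuple[Message, Optional[DropReason]]] = field(default_factory=list)
    dropped: List[Tuple[int, DropReason]] = field(default_factory=list)

    def _check(self, msg: Message) -> Optional[DropReason]:
        if len(msg.payload) != msg.declared_length:
            return DropReason.WRONG_LENGTH
        if self.last_seq is not None and msg.seq <= self.last_seq:
            return DropReason.OUT_OF_ORDER
        return None

    def receive(self, msg: Message) -> Optional[DropReason]:
        reason = self._check(msg)
        if reason is None:
            self.last_seq = msg.seq
        if reason is None or self.allow_invalid:
            # In debug mode the reason travels along with the message.
            self.delivered.append((msg, reason))
        else:
            # In production mode only the sequence number and reason are kept.
            self.dropped.append((msg.seq, reason))
        return reason
```

A production subscriber would keep the default and never see invalid data, while a recorder or debugging subscriber would set `allow_invalid=True` and get every message that reached the application layer together with the reason it would normally have been dropped.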

The eCAL documentation is quite good. I admit I didn't search it for message drops before I opened this thread.

Related question: Is there a Wireshark dissector for eCAL communication available?

1 reply

FlorianReimold Mar 22, 2023
Maintainer

> Related question: Is there a Wireshark dissector for eCAL communication available?

Not publicly at the moment. We did have one internally, but I have no idea if it ever worked. But I just had somebody dig up the code, so let's see if it works and if we can open-source it.
