
sdo.client: Add missing abort messages #594


Merged
merged 9 commits into from
Jul 13, 2025


acolomb (Member) commented Jun 17, 2025

SDO abort messages were already sent in some cases when a response timed out or had an unexpected server command specifier (scs), but not consistently. Thus the following cases are now added:

  • Mismatched scs:

    • ReadableStream, after upload segment request
    • WritableStream, after download initiate request
    • WritableStream, after expedited download request
    • WritableStream, after download segment request
  • Toggle bit mismatch (reported as "toggle bit not alternated"):

    • ReadableStream, after upload segment request
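As a rough illustration of the two new abort paths after an upload segment request, here is a minimal, self-contained sketch. It is not canopen's code: the function `check_upload_segment_response()`, the constant names and the `abort` callback are hypothetical. Only the bit layout (scs in the top three bits of byte 0, toggle bit 0x10) and the abort code values are taken from CiA 301.

```python
# Hypothetical sketch, not canopen's implementation. Abort code values
# follow CiA 301; the names are made up for this example.
ABORT_TOGGLE_NOT_ALTERNATED = 0x0503_0000
ABORT_INVALID_COMMAND = 0x0504_0001

RESPONSE_SEGMENT_UPLOAD = 0 << 5  # scs 0: upload segment response
SCS_MASK = 0xE0                   # scs occupies bits 7..5 of byte 0
TOGGLE_BIT = 0x10


class SdoCommunicationError(Exception):
    """Stand-in for canopen's exception of the same name."""


def check_upload_segment_response(response, expected_toggle, abort):
    """Validate an upload segment response, sending an abort before raising.

    expected_toggle is either 0 or TOGGLE_BIT.
    """
    scs = response[0] & SCS_MASK
    if scs != RESPONSE_SEGMENT_UPLOAD:
        abort(ABORT_INVALID_COMMAND)
        raise SdoCommunicationError(f"Unexpected response 0x{scs:02X}")
    if (response[0] & TOGGLE_BIT) != expected_toggle:
        abort(ABORT_TOGGLE_NOT_ALTERNATED)
        raise SdoCommunicationError("Toggle bit mismatch")


sent = []  # collects abort codes instead of transmitting them
check_upload_segment_response(bytes(8), 0, sent.append)  # valid, no abort
try:
    # Toggle bit set although 0 was expected
    check_upload_segment_response(bytes([TOGGLE_BIT]) + bytes(7), 0, sent.append)
except SdoCommunicationError as exc:
    print(exc, [hex(code) for code in sent])
```

The point of the sketch is the ordering: the abort goes out to the server first, so it knows the transfer is dead, and only then is the exception raised to the local caller.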

Some reformatting of abort code literals squeezed in there, as well as a typo fix.

This is based on the changes in #352, isolated from the typing noise.


codecov bot commented Jun 17, 2025

Codecov Report

Attention: Patch coverage is 36.00000% with 16 lines in your changes missing coverage. Please review.

| File with missing lines | Patch % | Lines |
|---|---|---|
| canopen/sdo/client.py | 36.00% | 16 Missing ⚠️ |


acolomb marked this pull request as draft June 17, 2025 21:37

acolomb (Member Author) commented Jun 17, 2025

Sending the abort message when SdoClient.read_response() times out is not always desired, as some code paths have their own abort / retry handling. This needs to be investigated more for each call to that function. I guess we'll make the abort message on timeout optional via a new method parameter.

For each call to SdoClient.read_response(), the timeout abort message
can either be sent immediately before raising the exception, or later,
after the exception has been handled, e.g. by a retry.  Add a boolean
parameter to the function, usually False, to skip the immediate
transmission of an SDO abort.  Only two cases actually need it and
pass the option as True.
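A minimal sketch of what such an opt-in parameter could look like. The parameter name `timeout_abort` and the abort code 0x0504_0000 appear in the reviewed diff; everything else (`SketchSdoClient`, the `responses` queue, the recorded `sent_aborts` list, the short timeout, the error text) is a hypothetical stand-in, not canopen's actual `SdoClient`.

```python
import queue


class SdoCommunicationError(Exception):
    """Stand-in for canopen's exception of the same name."""


class SketchSdoClient:
    """Hypothetical stand-in for SdoClient; records aborts instead of sending."""

    RESPONSE_TIMEOUT = 0.05  # seconds, kept short for the example

    def __init__(self):
        self.responses = queue.Queue()  # would be filled by a CAN receive callback
        self.sent_aborts = []

    def abort(self, abort_code):
        self.sent_aborts.append(abort_code)

    def read_response(self, timeout_abort=False):
        try:
            return self.responses.get(block=True, timeout=self.RESPONSE_TIMEOUT)
        except queue.Empty:
            if timeout_abort:
                # 0x0504_0000: "SDO protocol timed out" (CiA 301)
                self.abort(0x0504_0000)
            raise SdoCommunicationError("No SDO response received") from None


client = SketchSdoClient()
for flag in (False, True):
    try:
        client.read_response(timeout_abort=flag)
    except SdoCommunicationError:
        pass
print([hex(code) for code in client.sent_aborts])  # → ['0x5040000']
```

With the default of False, call sites that retry or abort on their own keep full control; only the call sites that pass True get the immediate abort.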
acolomb (Member Author) commented Jun 18, 2025

The last commit introduces such an optional parameter, selecting whether an immediate abort on timeout is appropriate. However, we might want to skip that and just handle timeouts in the two remaining cases directly: catch the exception, send the SDO abort there, and re-raise.
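The alternative, without an extra parameter, could look roughly like this sketch. `TimingOutClient` and `request_with_timeout_abort()` are hypothetical names; the pattern itself (catch `SdoCommunicationError`, send abort 0x0504_0000, re-raise) is what the paragraph describes.

```python
class SdoCommunicationError(Exception):
    """Stand-in for canopen's exception of the same name."""


class TimingOutClient:
    """Hypothetical client whose read_response() always times out."""

    def __init__(self):
        self.sent_aborts = []

    def abort(self, abort_code):
        self.sent_aborts.append(abort_code)

    def read_response(self):
        raise SdoCommunicationError("No SDO response received")


def request_with_timeout_abort(client):
    """Call site that wants an immediate abort when the response times out."""
    try:
        return client.read_response()
    except SdoCommunicationError:
        # Relies on read_response() raising this exception only on timeout
        client.abort(0x0504_0000)  # "SDO protocol timed out" (CiA 301)
        raise  # re-raise the original exception unchanged


client = TimingOutClient()
try:
    request_with_timeout_abort(client)
except SdoCommunicationError as exc:
    print(exc, [hex(code) for code in client.sent_aborts])
```

A bare `raise` keeps the original exception and traceback, so callers further up see exactly what they would have seen without the abort.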

Investigating all these call sites also uncovered a bug in BlockUploadStream._retransmit(), which tracks a RESPONSE_TIMEOUT "manually". If the same timeout expires inside SdoClient.read_response(), the exception raised there is not caught and escapes the loop, instead of the function exiting with its own specialized exception message. That needs a separate fix.
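One possible shape of such a fix, as a hypothetical sketch rather than the actual follow-up patch: `retransmit()` stands in for `BlockUploadStream._retransmit()`, with `read_response` and `abort` passed in as plain callables. The abort code and error message are assumptions for the example; the sequence number layout (bits 6..0 of byte 0) follows the CiA 301 block upload segment.

```python
import time


class SdoCommunicationError(Exception):
    """Stand-in for canopen's exception of the same name."""


def retransmit(read_response, abort, ackseq, timeout=0.3):
    """Hypothetical sketch: wait until the segment after ackseq arrives.

    read_response() is expected to raise SdoCommunicationError on timeout.
    """
    end_time = time.monotonic() + timeout  # deadline tracked "manually"
    while time.monotonic() < end_time:
        try:
            response = read_response()
        except SdoCommunicationError:
            # read_response() hit its own timeout: stop waiting and fall
            # through to the specialized error instead of escaping the loop
            break
        seqno = response[0] & 0x7F  # sequence number in bits 6..0
        if seqno == ackseq + 1:
            return response  # back in sync
    abort(0x0504_0000)  # "SDO protocol timed out" (CiA 301)
    raise SdoCommunicationError(
        "Some data were lost and could not be retransmitted")


aborts = []


def never_answers():
    raise SdoCommunicationError("No SDO response received")


try:
    retransmit(never_answers, aborts.append, ackseq=3)
except SdoCommunicationError as exc:
    print(exc)  # the specialized message, not "No SDO response received"
```

Both ways out of the loop, the expired deadline and the inner timeout, now end up in the same place, so the caller always gets the more specific error.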

acolomb added this to the v2.4.0 milestone Jun 18, 2025
Comment on lines 72 to 73
if timeout_abort:
    self.abort(0x0504_0000)
Collaborator

Verified. But I'm not fond of using 0x0504_0000 directly in the code. They should rather be constants addressed by name. Add another PR for that?

Member Author

Sure, we can fix up the magic numbers in code in a follow-up. I'm usually very strict about that, but wanted to keep disruption low while we're looking at an actual bugfix.

Collaborator

Since we're touching the code in this PR anyway, I'd suggest a quick and dirty fix to add them as literals. It can still be a minimally disruptive change: we don't need literals for all SDO errors, just the few we touch here.
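A minimal sketch of what a small set of named abort codes could look like, limited to the conditions discussed in this PR. The class name `SdoAbortCode` and the member names are hypothetical; the values are the CiA 301 abort codes for toggle bit not alternated, SDO protocol timed out, and invalid command specifier.

```python
import struct
from enum import IntEnum


class SdoAbortCode(IntEnum):
    """Hypothetical named constants; values per CiA 301."""

    TOGGLE_BIT_NOT_ALTERNATED = 0x0503_0000
    PROTOCOL_TIMED_OUT = 0x0504_0000
    INVALID_COMMAND_SPECIFIER = 0x0504_0001


# IntEnum members are ints, so they can be packed directly into the last
# four bytes of an abort frame (little-endian, as in CiA 301).
frame_tail = struct.pack("<L", SdoAbortCode.PROTOCOL_TIMED_OUT)
print(frame_tail.hex())  # → 00000405
```

Because `IntEnum` members compare equal to plain integers, call sites could switch from `0x0504_0000` to `SdoAbortCode.PROTOCOL_TIMED_OUT` one at a time without changing behavior.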

Member Author

Let's focus on the logic first. I will follow up with the literals change once this is merged. The bigger question is whether we want to get rid of the extra parameter again, as mentioned in #594 (comment)?

acolomb marked this pull request as ready for review June 30, 2025 08:06

acolomb (Member Author) commented Jul 6, 2025

@sveinse could you have another look please? This is basically the last thing I'm waiting on for the v2.4.0 release.

sveinse (Collaborator) left a comment

I've reviewed the changes and verified all the changed SDO abort codes against the CiA 301 standard. I've added two comments which I think should be considered. Other than that, this looks good.

Comment on lines +586 to +587
except SdoCommunicationError:
    self.abort(0x0504_0000)
Collaborator

This assumes that read_response() only raises SdoCommunicationError on timeout. It does so now, so it works, but this assumption might be a little fragile.

Member Author

Hm... I don't see that as fragile, it's even in the same source file to check against.

You're right in the opposite direction though. We don't process other exceptions by sending a different abort message. But that's a different improvement to be made IMHO.

Collaborator

If someone changes read_response() so that SdoCommunicationError is also raised for something other than a timeout, the code here will no longer be correct. This implicit, undocumented expectation is what's fragile. So how can we mitigate that?

Either put a comment in read_response() stating that "callers of this function assume SdoCommunicationError is raised for timeouts only", or a comment in this function stating "This assumes read_response() only raises SdoCommunicationError on timeout".

The code as it is is fine.

Member Author

I added a docstring to the method, explicitly mentioning the timeout as the reason.

Collaborator

Thank you. Looks good.

Comment on lines +748 to +749
except SdoCommunicationError:
    self.sdo_client.abort(0x0504_0000)
Collaborator

Ditto

acolomb merged commit 0130eb8 into canopen-python:master Jul 13, 2025
3 of 5 checks passed

acolomb (Member Author) commented Jul 13, 2025

Thanks @sveinse!

acolomb deleted the sdo-client-missing-aborts branch July 13, 2025 12:41

3 participants