async: review and refactor interaction between main thread and worker threads #143

doudou · 2021-12-02T16:32:11Z

Turns out that the async code was actually doing a lot of sync
stuff behind the scenes, which was causing very noticeable lag
and blocking behaviors when on bad connections.

The first change was to create a main_thread_call helper. One marks
a "main thread only" method like so:

main_thread_call def some_method
end

and calls made from outside the main thread raise an exception.
This is rather obviously meant for debugging.
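
For illustration, here is a minimal sketch of how such a helper could be written. It is not the code from this PR: the MainThreadCall module, the NotInMainThread exception and the Widget class are hypothetical names, and the sketch assumes that "main thread" means Ruby's Thread.main, whereas the real helper may compare against the event loop's thread instead.

```ruby
# Hypothetical sketch, not the helper from this PR.
module MainThreadCall
  class NotInMainThread < RuntimeError; end

  # `def` returns the method name as a Symbol, which is what makes the
  # `main_thread_call def some_method ... end` syntax possible.
  def main_thread_call(name)
    original = instance_method(name)
    define_method(name) do |*args, **kw, &block|
      # Assumption: the "main thread" is Thread.main
      unless Thread.current == Thread.main
        raise NotInMainThread, "#{name} called outside the main thread"
      end

      original.bind(self).call(*args, **kw, &block)
    end
    name
  end
end

class Widget
  extend MainThreadCall

  main_thread_call def refresh
    :refreshed
  end
end
```

Calling Widget.new.refresh from the main thread behaves as usual, while the same call from within Thread.new { ... } raises MainThreadCall::NotInMainThread.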

From there, I've fixed calls that were coming from a worker thread
but should not have. In most cases, the pattern I had to change was

def some_method
  @event_loop.defer ... do
    # do some async call in sync form (i.e. without blocks)
  end
end

This was causing a lot of headaches as the code that turns async
calls into sync calls depends on a lot of synchronization (e.g.
with @delegator_obj), which was solved by having huge
@mutex.synchronize blocks, serializing major parts of the execution.

These calls have been changed into the non-blocking block form
(a.k.a. callback form), and are called from the main thread.
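
To make the callback form concrete, here is a toy sketch of the idea. ToyEventLoop is a hypothetical stand-in and not the event loop used by this code base: the blocking work runs in a worker thread, and the callback is queued so that it runs in whichever thread calls step (the main thread, in the intended usage).

```ruby
# Hypothetical toy event loop, for illustration only.
class ToyEventLoop
  def initialize
    @callbacks = Queue.new # thread-safe queue of pending callbacks
  end

  # Run the given block in a worker thread, then queue `callback`
  # with the block's result. Error handling is left out of this sketch.
  def defer(callback:, &work)
    Thread.new do
      result = work.call
      @callbacks << -> { callback.call(result) }
    end
  end

  # Wait for one pending callback and run it in the calling thread
  def step
    @callbacks.pop.call
  end
end

event_loop = ToyEventLoop.new
results = []
# The block stands in for a slow remote call
event_loop.defer(callback: ->(value) { results << value }) { 21 * 2 }
event_loop.step # the callback runs here, not in the worker thread
```

With this shape, the worker thread never touches the caller's state, so no large @mutex.synchronize section is needed around the callback.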

One major change is the removal of the specialized variants of Async::NameService.
These were rather pointless, as turning a sync into an async object
can be done generically through to_async. The "new" Async::NameService
simply delegates to the non-async get in a worker thread and then
turns the non-async object into an async object.
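
The delegation described above can be sketched as follows. All classes here (SyncTask, AsyncTask, SyncNameService, ToyAsyncNameService) are hypothetical stand-ins rather than the real Orocos classes. Only the method names get and to_async come from the description, and converting in the main thread rather than in the worker is a choice made for this sketch.

```ruby
# Hypothetical stand-ins, for illustration only.
AsyncTask = Struct.new(:sync_task)

class SyncTask
  attr_reader :name

  def initialize(name)
    @name = name
  end

  # Generic sync-to-async conversion
  def to_async
    AsyncTask.new(self)
  end
end

class SyncNameService
  # A real name service would block on the network here
  def get(name)
    SyncTask.new(name)
  end
end

class ToyAsyncNameService
  def initialize(sync_service)
    @sync_service = sync_service
    @pending = Queue.new
  end

  # Resolve `name` in a worker thread, then hand an async object to the block
  def get(name, &callback)
    Thread.new do
      task = @sync_service.get(name)
      @pending << -> { callback.call(task.to_async) }
    end
  end

  # Run one pending callback in the calling (main) thread
  def process_pending
    @pending.pop.call
  end
end
```

The async wrapper needs no knowledge of the underlying name service type, which is why one generic class can replace the per-type variants.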

Another major change is how Async::TaskContext creates itself. The
very unfortunate design choice in the async and proxy versions of
TaskContext was to let them take a name as argument, and let them
do the resolution. This brings a lot of complexity in the async
case, as there is a window during which the async task context does
not have an underlying task context. Proxies are meant for that, not
async. Generally speaking, this interface should be changed to
do the name resolution in name_service (duh).


…cesses

If the call happens before the connection is actually closed, OmniORB handles it as a timeout. Otherwise, it is a ComError. This is actually not-so-great in our case, as we really do not want the timeout to happen. Note that it does not apply to crashing components, so the general Syskit behavior should not be affected.

a846ec3692b5020c3ed920683c3393cfc828943d uses the setTaskStates hook in RTT to get properly ordered state changes (see the corresponding commit in RTT 07d7922ba1f0ced3f0288652edca8c6bbd69200f). This causes some state change notifications to be duplicated (notified once by orogen and once by RTT)

This partially reverts commit c7f516d. cf252a4 changed the name of the "core" read method. This method was overloaded in OutputReader to raise if the remote process was dead, and disconnect the read side. I think the boat has sailed on {#read} not raising, and I'm not willing to re-introduce that behavior. However, I do believe the disconnect_all is a good move: it cleans up, and makes sure that a caller that cares does get notified that the reader is dead.

By reading the IOR of a task context from a pipe, we are no longer limited to the CORBA name service to resolve tasks.

The wait_running method must be called to register the IORs for each task in the process. It reads the message from the pipe, validates it and saves it as a Hash. This Hash is later used to resolve the tasks when resolve_all_tasks is called. Since resolve_all_tasks might be called before wait_running has run, it calls a non-blocking wait_running to ensure that the IORs are registered.

Overall, the expected pipeline for Syskit is: the process client calls the process server, which spawns the Orocos::Process. On spawn, the IOR pipe is created. Then, the execution engine triggers the deployment ready resolution, which calls the process' wait_running, expecting the IOR mappings to be resolved. They are registered and then resolve_all_tasks is called. If the IORs are valid, the deployment resolves the remote task handles and hopefully becomes ready.
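
As a rough sketch of the read-validate-store step, the snippet below polls a pipe without blocking. It rests on assumptions that are not stated in the commit message: that the message is a JSON object mapping task names to IOR strings, and that the writer closes the pipe once the message is complete. The IorPipeReader class and its method names are hypothetical.

```ruby
require "json"

# Hypothetical sketch: the message format (JSON, task name => IOR string)
# and the end-of-message convention (writer closes the pipe) are assumptions.
class IorPipeReader
  attr_reader :ior_mappings

  def initialize(read_io)
    @read_io = read_io
    @buffer = +""
    @ior_mappings = nil
  end

  # Non-blocking: returns the mappings once the whole message was read,
  # nil if the message is not complete yet
  def poll
    return @ior_mappings if @ior_mappings

    loop do
      chunk = @read_io.read_nonblock(4096, exception: false)
      case chunk
      when :wait_readable
        return nil # nothing more to read for now
      when nil # end of file: the writer closed the pipe
        @ior_mappings = validate(JSON.parse(@buffer))
        return @ior_mappings
      else
        @buffer << chunk
      end
    end
  end

  private

  # Stringified CORBA object references start with "IOR:"
  def validate(message)
    valid = message.is_a?(Hash) && message.all? do |name, ior|
      name.is_a?(String) && ior.is_a?(String) && ior.start_with?("IOR:")
    end
    raise ArgumentError, "invalid IOR message" unless valid

    message
  end
end
```

A caller in the position of resolve_all_tasks could call poll first and only proceed when it returns a Hash, which mirrors the non-blocking wait_running call described above.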

This implements for ruby tasks what was done for Orocos::Process. Since the ruby tasks already know their IOR from the beginning, all this does is look up each task's IOR in the deployed tasks map. The pipeline is the same as for Orocos::Process.


doudou · 2025-01-29T16:47:17Z

This was always almost working, and I never got around to making it work. Closing.

doudou force-pushed the fix_async branch from 20de925 to 1d5da76 Compare February 15, 2022 19:11

doudou force-pushed the fix_async branch from 1d5da76 to 147fbb5 Compare March 8, 2022 14:56

jhonasiv and others added 17 commits October 7, 2022 10:52

fix(test): update failing unit tests

ead7980

fix: change expected exception on partition run options tests

ac974c8

fix: change hash-as-last-arg into proper keyword arguments

dddfe77

fix: use Type.zero to create properly initialized values

5c55ede

Unlike zero! which zeroes even the enums, Type.zero creates a value that is properly initialized, but zero everywhere the "value" is not specified.

fix: style

af5976c

chore: use wait2 instead of $! to get a process exit status

67f5a77

feat(test): add unit tests for ior pipe related behavior

596cae6

Merge pull request #149 from rock-core/read-ior-from-pipe

f3a5c89

feat: read IOR of a task context from pipe

chore: remove obsolete log management code

c24d238

Merge pull request #152 from rock-core/remove_obsolete_code

0c5714b

chore: remove obsolete log management code

feat: define TaskContextProxy#disconnect

4c1e7eb

There was previously no way to make a task context proxy completely stop working.

doudou force-pushed the fix_async branch from 147fbb5 to 4c1e7eb Compare August 3, 2023 18:29

doudou closed this Jan 29, 2025
