Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

async: review and refactor interaction between main thread and worker threads #143

Open
wants to merge 17 commits into
base: master
Choose a base branch
from

Conversation

doudou
Copy link
Member

@doudou doudou commented Dec 2, 2021

Turns out that the async code was actually doing a lot of sync
stuff behind the scenes, which was causing very noticeable lag
and blocking behaviors when on bad connections.

The first change was to create main_thread_call helper. One marks
a "main thread only" method like so

main_thread_call def some_method
end

and method calls not in the main thread will cause an exception to
be raised. This is rather obviously meant for debugging.

From there, I've fixed calls that were coming from a worker thread
but should not have. In most cases, the pattern I had to change was

def some_method
  @event_loop.defer ... do
    do some async call in sync form (i.e. without blocks)
  end
end

This was causing a lot of headaches as the code that turns async
calls into sync calls depends on a lot of synchronization (e.g.
with @delegator_obj), which was solved by having huge
@mutex.synchronize blocks, serializing major parts of the execution.

These calls have been changed into the non-blocking block form
(a.k.a. callback form), and are called from the main thread.

One major change is the removal of sub-genres of Async::NameService.
These were rather pointless, as turning a sync into an async object
can be done generically through to_async. The "new" Async::NameService
simply delegates to the non-async get in a worker thread and then
turns the non-async object into an async object.

Another major change is how Async::TaskContext creates itself. The
very unfortunate design choice in the async and proxy version of
TaskContext was to let them get a name as argument, and let them
do the resolution. This brings a lot of complexity in the async
case, as there is a zone where the async task context does not
have an underlying task context. Proxies are meant for that, not
async. Generally speaking, this interface should be changed to
do the name resolution in name_service (duh).

jhonasiv and others added 17 commits October 7, 2022 10:52
Unlike zero! which zeroes even the enums, Type.zero creates
a value that is properly initialized, but zero everywhere
the "value" is not specified.
…cesses

If the call happens before the connection is actually closed, OmniORB
handles it as a timeout. Otherwise, it is a ComError.

This is actually not-so-great in our case, as we really do not want
the timeout to happen. Note that it does not apply to crashing
components, so the general Syskit behavior should not be affected.
a846ec3692b5020c3ed920683c3393cfc828943d uses the setTaskStates hook
in RTT to get properly ordered state changes (see the corresponding
commit in RTT 07d7922ba1f0ced3f0288652edca8c6bbd69200f).

This causes some state change notifications to be duplicated (notified
once by orogen and once by RTT)
This partially reverts commit c7f516d.

cf252a4 changed the name of the "core"
read method. This method was overloaded in OutputReader to raise if
the remote process was dead, and disconnect the read side.

I think the boat has sailed on {#read} not raising, and I'm not
willing to re-introduce that behavior. However, I do believe
the disconnect_all is a good move: it cleans up, and makes sure
that a caller that cares does get notified that the reader is
dead.
By reading the IOR of a task context from a pipe, we are not limited by
CORBA name service to resolve the tasks. The wait running method must be
called to register the IORs for each task in the process. This is done
by reading the message from the pipe, validating it and saving it as a
Hash. Later, this Hash is used to resolve the tasks when the
resolve_all_tasks method is called. Since this method might be called
before wait_running is ran, a nonblocking wait_running is called in it to
ensure that the IORs are registered.

Overall, the expected pipeline for Syskit is:

The process client calls the process server, which spawns the Orocos::Process.
On spawn, the IOR pipe is created. Then, the execution engine triggers the
deployment ready resolution, which calls for the process' wait_running,
expecting the IOR mappings to be resolved. They are registered and then
resolve_all_tasks is called. If the IORs are valid, the deployment resolve
the remote task handles and hopefully is becomes ready.
This implements for ruby tasks what was done for Orocos::Process. Since
the ruby tasks already know their IOR from the beginning, all this does
is look for each task's IOR on the deployed tasks map. The pipeline is
the same as the Orocos::Process
feat: read IOR of a task context from pipe
chore: remove obsolete log management code
… threads

Turns out that the async code was actually doing a lot of sync
stuff behind the scenes, which was causing very noticeable lag
and blocking behaviors when on bad connections.

The first change was to create main_thread_call helper. One marks
a "main thread only" method like so

~~~
main_thread_call def some_method
end
~~~

and method calls not in the main thread will cause an exception to
be raised. This is rather obviously meant for debugging.

From there, I've fixed calls that were coming from a worker thread
but should not have. In most cases, the pattern I had to change was

~~~
def some_method
  @event_loop.defer ... do
    do some async call in sync form (i.e. without blocks)
  end
end
~~~

This was causing a lot of headaches as the code that turns async
calls into sync calls depends on a lot of synchronization (e.g.
with @delegator_obj), which was solved by having huge
@mutex.synchronize blocks, serializing major parts of the execution.

These calls have been changed into the non-blocking block form
(a.k.a. callback form), and are called from the main thread.

One major change is the removal of sub-genres of Async::NameService.
These were rather pointless, as turning a sync into an async object
can be done generically through to_async. The "new" Async::NameService
simply delegates to the non-async get in a worker thread and then
turns the non-async object into an async object.

Another major change is how Async::TaskContext creates itself. The
very unfortunate design choice in the async and proxy version of
TaskContext was to let them get a name as argument, and let them
do the resolution. This brings a lot of complexity in the async
case, as there is a zone where the async task context does not
have an underlying task context. Proxies are meant for that, not
async. Generally speaking, this interface should be changed to
do the name resolution in name_service (duh).
There was previously no way to make a task context proxy
completely stop working.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants