
MPI_T Events #13133

Open · wants to merge 24 commits into main
Conversation

kingshuk00

This is the work from PR8057 rebased after PR13086.

Feel free to comment with any suggestions. I will list the suggested changes from the old PR and ask for advice on whether they are still relevant.

hjelmn and others added 22 commits March 10, 2025 14:02
dss.h has been removed since this PR was originally opened

Signed-off-by: Howard Pritchard <[email protected]>
- Correct MPI_T_event_dropped_cb_function arguments
- Check for negative event index

Signed-off-by: Chris Chambreau <[email protected]>
new profile interface generation method

Signed-off-by: Howard Pritchard <[email protected]>
Signed-off-by: Howard Pritchard <[email protected]>
by fixing some code

Signed-off-by: Howard Pritchard <[email protected]>
Signed-off-by: Chris Chambreau <[email protected]>
Signed-off-by: Chris Chambreau <[email protected]>
Signed-off-by: Chris Chambreau <[email protected]>
Signed-off-by: Howard Pritchard <[email protected]>
@kingshuk00
Author

kingshuk00 commented Mar 10, 2025

To-do list of unresolved comments from the previous PR8057:

  1. at ompi/datatype/ompi_datatype_lookup_by_opal_id.c: distinction between opal datatypes and MPI datatypes link
  2. at ompi/mca/pml/ob1/pml_ob1_recvreq.c: turning events off completely link
  3. Use of opal_clock_gettime() for timestamps link (see the timestamp sketch below)
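On item 3, a minimal sketch of what taking an event timestamp could look like (illustrative only, not code from this PR). It uses the POSIX monotonic clock directly; the old PR's suggestion was to route this through OMPI's opal_clock_gettime() wrapper, which is assumed here to fill a struct timespec the same way clock_gettime(2) does.

```c
#include <stdint.h>
#include <time.h>

/* Illustrative only: produce a nanosecond-resolution timestamp for an
 * MPI_T event using the POSIX monotonic clock.  The suggestion in the
 * old PR was to use OMPI's opal_clock_gettime() wrapper instead of
 * calling clock_gettime() directly; the wrapper is assumed to fill a
 * struct timespec the same way. */
static uint64_t event_timestamp_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t) ts.tv_sec * 1000000000ULL + (uint64_t) ts.tv_nsec;
}
```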

@kingshuk00
Author

@hppritcha @jsquyres @hjelmn @cchambreau
I'm tagging all of you for feedback; please ignore this if it's not relevant to you.

@hppritcha
Member

Has this PR been rebased on main? The mpi4py failure looks like it needs rebasing.

@kingshuk00
Author

I rebased onto main, which involved resolving a few conflicts. That may be the reason. I'm checking this.

@devreal
Contributor

devreal commented Mar 10, 2025

It looks like the singleton test in the mpi4py suite is failing. The branch seems up to date with master, minus #13126, which was just merged.

@hppritcha
Member

here's the traceback:

#0  0x00007ffff71ec82d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007ffff71e5ad9 in pthread_mutex_lock () from /lib64/libpthread.so.0
#2  0x00007ffff17c1c18 in opal_thread_internal_mutex_lock (p_mutex=0x7ffff1acd6e8 <mca_base_source_lock+40>) at ../../../opal/mca/threads/pthreads/threads_pthreads_mutex.h:82
#3  0x00007ffff17c1ccc in opal_mutex_lock (mutex=0x7ffff1acd6c0 <mca_base_source_lock>) at ../../../opal/mca/threads/mutex.h:122
#4  0x00007ffff17c2325 in mca_base_source_get_by_name (name=0x7cabc0 "mca_base_default_source", source_out=0x7fffffff3330) at mca_base_source.c:165
#5  0x00007ffff17c2473 in mca_base_source_register (project=0x7ffff1885685 "opal", framework=0x7ffff1885681 "mca", component=0x7ffff188567c "base", name=0x7ffff188566d "default_source", description=0x7ffff188564f "Default source for MCA events", 
    ordered=true, source_time=0x7ffff17c1dd1 <mca_base_source_default_time_source>, source_ticks=1) at mca_base_source.c:202
#6  0x00007ffff17c1f77 in mca_base_source_init () at mca_base_source.c:76
#7  0x00007ffff17c3438 in mca_base_var_init () at mca_base_var.c:282
#8  0x00007ffff17c5bfe in register_variable (project_name=0x7ffff1889c0a "opal", framework_name=0x0, component_name=0x7ffff1889c05 "base", variable_name=0x7ffff1889bf6 "help_aggregate", 
    description=0x7ffff1889a98 "If opal_base_help_aggregate is true, duplicate help messages will be aggregated rather than displayed individually.  This can be helpful for parallel jobs that experience multiple identical failures; "..., 
    type=MCA_BASE_VAR_TYPE_BOOL, enumerator=0x0, bind=0, flags=(unknown: 0), info_lvl=OPAL_INFO_LVL_9, scope=MCA_BASE_VAR_SCOPE_LOCAL, synonym_for=-1, storage=0x7ffff1ace1e4 <opal_help_want_aggregate>) at mca_base_var.c:1376
#9  0x00007ffff17c65c5 in mca_base_var_register (project_name=0x7ffff1889c0a "opal", framework_name=0x0, component_name=0x7ffff1889c05 "base", variable_name=0x7ffff1889bf6 "help_aggregate", 
    description=0x7ffff1889a98 "If opal_base_help_aggregate is true, duplicate help messages will be aggregated rather than displayed individually.  This can be helpful for parallel jobs that experience multiple identical failures; "..., 
    type=MCA_BASE_VAR_TYPE_BOOL, enumerator=0x0, bind=0, flags=(unknown: 0), info_lvl=OPAL_INFO_LVL_9, scope=MCA_BASE_VAR_SCOPE_LOCAL, storage=0x7ffff1ace1e4 <opal_help_want_aggregate>) at mca_base_var.c:1546
#10 0x00007ffff17e10af in opal_show_help_init () at show_help.c:76
#11 0x00007ffff17afe43 in opal_init_util (pargc=0x0, pargv=0x0) at runtime/opal_init_core.c:481
#12 0x00007ffff1b8ff3d in ompi_mpi_instance_retain () at instance/instance.c:258
#13 0x00007ffff1b90401 in ompi_mpi_instance_init_common (argc=0, argv=0x0) at instance/instance.c:376
#14 0x00007ffff1b91714 in ompi_mpi_instance_init (ts_level=3, info=0x7ffff21f37a0 <ompi_mpi_info_null>, errhandler=0x7ffff21eb0e0 <ompi_mpi_errors_are_fatal>, instance=0x7ffff21fc300 <ompi_mpi_instance_default>, argc=0, argv=0x0)
    at instance/instance.c:837
#15 0x00007ffff1b856cd in ompi_mpi_init (argc=0, argv=0x0, requested=3, provided=0x7fffffff42c8, reinit_ok=false) at runtime/ompi_mpi_init.c:359
#16 0x00007ffff1be6831 in PMPI_Init_thread (argc=0x0, argv=0x0, required=3, provided=0x7fffffff42c8) at init_thread.c:78
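For reference, the hang in __lll_lock_wait is the classic self-deadlock on a non-recursive mutex: the same thread takes mca_base_source_lock in mca_base_source_register and then tries to take it again inside mca_base_source_get_by_name. A standalone sketch of the same pattern (plain pthreads, not Open MPI code):

```c
#include <pthread.h>
#include <stdio.h>

/* Default (non-recursive) mutex, standing in for mca_base_source_lock. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for mca_base_source_get_by_name(): also takes the lock. */
static void get_by_name(void)
{
    pthread_mutex_lock(&lock);   /* second lock by the same thread: blocks forever on glibc */
    pthread_mutex_unlock(&lock);
}

/* Stand-in for mca_base_source_register(): takes the lock, then calls the helper. */
static void do_register(void)
{
    pthread_mutex_lock(&lock);
    get_by_name();               /* deadlocks here, matching the traceback */
    pthread_mutex_unlock(&lock);
}

int main(void)
{
    do_register();               /* never returns */
    puts("unreached");
    return 0;
}
```

Built with `cc -pthread`, this blocks in exactly the place the trace shows.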

@kingshuk00
Author

Thank you. Yes, I'm checking whether removing the OPAL locks during registration helps.
Such locks aren't used for pvars.

@kingshuk00
Author

kingshuk00 commented Mar 10, 2025

Removing the OPAL_THREAD_[LOCK/UNLOCK](&mca_base_source_lock) calls in the following functions resolves this issue:

  1. mca_base_source_init() (link)
  2. mca_base_source_finalize() (link)
  3. mca_base_source_register() (link)

Similar functions such as mca_base_pvar_init() (link), mca_base_pvar_finalize() (link), and mca_base_pvar_register() (link) do not have such locks.

Could any of you suggest possible side effects I should check for before removing them? It seems to me that they aren't necessary.

The alternative is to unlock before calling mca_base_source_get_by_name() (see the sketch below). Due to my lack of familiarity with this code, I would appreciate suggestions here.
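To make the second alternative concrete, here is a rough sketch of dropping the lock around the inner lookup so the nested acquisition disappears. The helper names register_source()/lookup_source() are hypothetical stand-ins for mca_base_source_register()/mca_base_source_get_by_name(), not the actual mca_base code:

```c
#include <pthread.h>

/* Sketch of the "unlock before calling the lookup" alternative.
 * register_source() and lookup_source() are hypothetical stand-ins for
 * mca_base_source_register() and mca_base_source_get_by_name(). */
static pthread_mutex_t source_lock = PTHREAD_MUTEX_INITIALIZER;

static int lookup_source(const char *name)
{
    (void) name;
    pthread_mutex_lock(&source_lock);
    /* ... search the registered sources for `name` ... */
    pthread_mutex_unlock(&source_lock);
    return -1;                               /* "not found" in this sketch */
}

static int register_source(const char *name)
{
    pthread_mutex_lock(&source_lock);
    /* ... validate arguments, etc. ... */
    pthread_mutex_unlock(&source_lock);      /* drop the lock before the lookup */

    int existing = lookup_source(name);      /* takes the lock itself, no nesting */

    pthread_mutex_lock(&source_lock);        /* re-acquire to modify the list */
    if (existing < 0) {
        /* ... append the new source; another thread could have registered the
         * same name during the unlocked window, so re-check under the lock ... */
    }
    pthread_mutex_unlock(&source_lock);
    return 0;
}

int main(void)
{
    return register_source("example_source");
}
```

The unlocked window does open a small race (another thread could register the same name in between), so whether this is acceptable comes down to the same question as removing the locks outright: can registration actually happen concurrently?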

@hppritcha
Member

Another option would be to use an opal_recursive_mutex_t type mutex. Removing the locking (which does seem buggy) would probably mean MPI_T no longer supports MPI_THREAD_MULTIPLE. I suspect this is the first time we're running our full GitHub Actions CI on this code.
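For illustration, the recursive-mutex option in plain pthreads (opal_recursive_mutex_t is expected to behave analogously; the OMPI type itself is not shown, to avoid guessing its initialization details):

```c
#include <pthread.h>

/* With PTHREAD_MUTEX_RECURSIVE, the owning thread may lock the mutex again
 * without deadlocking; it must unlock as many times as it locked. */
static pthread_mutex_t lock;

static void init_lock(void)
{
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&lock, &attr);
    pthread_mutexattr_destroy(&attr);
}

static void inner(void)              /* e.g. the get_by_name path */
{
    pthread_mutex_lock(&lock);       /* re-entrant: does not block the owner */
    pthread_mutex_unlock(&lock);
}

static void outer(void)              /* e.g. the register path */
{
    pthread_mutex_lock(&lock);
    inner();
    pthread_mutex_unlock(&lock);
}

int main(void)
{
    init_lock();
    outer();                         /* completes without deadlock */
    return 0;
}
```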

@bosilca
Member

bosilca commented Mar 10, 2025

There already seems to be an OPAL_THREAD_UNLOCK in mca_base_source_init before calling mca_base_source_register, so the mutex should already be unlocked.

There is a valid reason why pvars do not need to be protected: they can only be created during the init step, and the performance variables stay alive for the entire duration of the application, so as long as opal_util_init/fini are called in a protected way the pvars get the same protection.

I don't know what a source event is, so I cannot comment on that.

@hppritcha
Member

The problem is that the lock is taken first in the call to mca_base_source_register, which then calls mca_base_source_get_by_name, which also tries to take the lock, as is clear from the traceback above.

@hppritcha
Member

It is possible for the number of sources to increase during an application run (see section 15.3.8 of the MPI 4.1 standard), so there should probably be locking beyond just the source init and finalize methods (which was the only locking in the file in @hjelmn's original PR).

…line mca_base_source_get_by_name() called from one place.

Signed-off-by: Kingshuk Haldar <[email protected]>
@bosilca
Member

bosilca commented Mar 11, 2025

I don't see a lock in mca_base_source_get_by_name, but that might be due to the last push. Anyway, there is an unlock left in mca_base_source_get_by_name, and that's definitely no good. Moreover, as mca_base_source_get_by_name is only called from mca_base_source_register with the lock held, there is no reason for mca_base_source_get_by_name to take it again.

Section 15.3.8 of the MPI standard does indeed state that event sources can be added during application execution, but in OMPI event sources are added during module initialization, which happens in a single sequential context. Unless we expect sessions to be created from multiple threads and allow them to load different sets of modules, I don't think we need any protection on the event-creation path.

Looking more carefully at this code, I see other issues with locking. Take mca_base_source_get: a read operation on an array from which events (i.e., source events) cannot be removed. It is nevertheless protected by the same mca_base_source_lock as the rest of the source-events code. On top of that, all usages of mca_base_source_get are also protected by the MPI_T lock (ompi_mpit_lock / ompi_mpit_unlock), so that code must be doubly safe. We never know!
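A generic illustration of that double protection (stand-in mutexes, not the real OMPI code paths): every MPI_T-level reader already holds the outer lock when the inner, source-level lock is taken, so for those callers the inner lock adds nothing.

```c
#include <pthread.h>

/* Generic stand-ins: `mpit_lock` plays the role of ompi_mpit_lock/unlock,
 * `source_lock` the role of mca_base_source_lock.  Not the real OMPI code. */
static pthread_mutex_t mpit_lock   = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t source_lock = PTHREAD_MUTEX_INITIALIZER;

static int sources[16];

/* Read-only lookup, protected by the inner lock. */
static int source_get(int index)
{
    pthread_mutex_lock(&source_lock);
    int value = sources[index];
    pthread_mutex_unlock(&source_lock);
    return value;
}

/* Every MPI_T-level caller already serializes on the outer lock,
 * so for these callers the inner lock is redundant. */
static int mpit_read_source(int index)
{
    pthread_mutex_lock(&mpit_lock);
    int value = source_get(index);
    pthread_mutex_unlock(&mpit_lock);
    return value;
}

int main(void)
{
    sources[0] = 42;
    return mpit_read_source(0) == 42 ? 0 : 1;
}
```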

At this point this PR looks very sketchy.
