
MPI_T Events #13133

Open · wants to merge 24 commits into main
Conversation

kingshuk00

This is the work from PR8057 rebased after PR13086.

Feel free to comment with any suggestions. I will list the suggested changes from the old PR and ask for advice on whether they are still relevant.

hjelmn and others added 22 commits March 10, 2025 14:02
dss.h has been removed since this PR was originally opened

Signed-off-by: Howard Pritchard <[email protected]>
- Correct MPI_T_event_dropped_cb_function arguments
- Check for negative event index

Signed-off-by: Chris Chambreau <[email protected]>
new profile interface generation method

Signed-off-by: Howard Pritchard <[email protected]>
Signed-off-by: Howard Pritchard <[email protected]>
by fixing some code

Signed-off-by: Howard Pritchard <[email protected]>
Signed-off-by: Chris Chambreau <[email protected]>
Signed-off-by: Chris Chambreau <[email protected]>
Signed-off-by: Chris Chambreau <[email protected]>
Signed-off-by: Howard Pritchard <[email protected]>
@kingshuk00
Author

kingshuk00 commented Mar 10, 2025

To-do list of unresolved comments from the previous PR8057:

  1. at ompi/datatype/ompi_datatype_lookup_by_opal_id.c: distinction between opal datatypes and MPI datatypes link
  2. at ompi/mca/pml/ob1/pml_ob1_recvreq.c: turning events off completely link
  3. Use of opal_clock_gettime() for timestamps link (see the timestamp sketch below)
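On item 3, a minimal sketch of what taking an event timestamp could look like (illustrative only, not code from this PR). It uses the POSIX monotonic clock directly; the old PR's suggestion was to route this through OMPI's opal_clock_gettime() wrapper, which is assumed here to fill a struct timespec the same way clock_gettime(2) does.

```c
#include <stdint.h>
#include <time.h>

/* Illustrative only: produce a nanosecond-resolution timestamp for an
 * MPI_T event using the POSIX monotonic clock.  The suggestion in the
 * old PR was to use OMPI's opal_clock_gettime() wrapper instead of
 * calling clock_gettime() directly; the wrapper is assumed to fill a
 * struct timespec the same way. */
static uint64_t event_timestamp_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t) ts.tv_sec * 1000000000ULL + (uint64_t) ts.tv_nsec;
}
```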

@kingshuk00
Author

@hppritcha @jsquyres @hjelmn @cchambreau
I'm tagging all of you for feedback; please ignore this if it's not relevant to you.

@hppritcha
Member

Has this PR been rebased on main? The mpi4py failure looks like it needs rebasing.

@kingshuk00
Author

I rebased onto main, which involved resolving a few conflicts. That may be the reason. I'm checking this.

@devreal
Contributor

devreal commented Mar 10, 2025

It looks like the singleton test in the mpi4py suite is failing. The branch seems up to date with master, minus #13126, which was just merged.

@hppritcha
Member

here's the traceback:

#0  0x00007ffff71ec82d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007ffff71e5ad9 in pthread_mutex_lock () from /lib64/libpthread.so.0
#2  0x00007ffff17c1c18 in opal_thread_internal_mutex_lock (p_mutex=0x7ffff1acd6e8 <mca_base_source_lock+40>) at ../../../opal/mca/threads/pthreads/threads_pthreads_mutex.h:82
#3  0x00007ffff17c1ccc in opal_mutex_lock (mutex=0x7ffff1acd6c0 <mca_base_source_lock>) at ../../../opal/mca/threads/mutex.h:122
#4  0x00007ffff17c2325 in mca_base_source_get_by_name (name=0x7cabc0 "mca_base_default_source", source_out=0x7fffffff3330) at mca_base_source.c:165
#5  0x00007ffff17c2473 in mca_base_source_register (project=0x7ffff1885685 "opal", framework=0x7ffff1885681 "mca", component=0x7ffff188567c "base", name=0x7ffff188566d "default_source", description=0x7ffff188564f "Default source for MCA events", 
    ordered=true, source_time=0x7ffff17c1dd1 <mca_base_source_default_time_source>, source_ticks=1) at mca_base_source.c:202
#6  0x00007ffff17c1f77 in mca_base_source_init () at mca_base_source.c:76
#7  0x00007ffff17c3438 in mca_base_var_init () at mca_base_var.c:282
#8  0x00007ffff17c5bfe in register_variable (project_name=0x7ffff1889c0a "opal", framework_name=0x0, component_name=0x7ffff1889c05 "base", variable_name=0x7ffff1889bf6 "help_aggregate", 
    description=0x7ffff1889a98 "If opal_base_help_aggregate is true, duplicate help messages will be aggregated rather than displayed individually.  This can be helpful for parallel jobs that experience multiple identical failures; "..., 
    type=MCA_BASE_VAR_TYPE_BOOL, enumerator=0x0, bind=0, flags=(unknown: 0), info_lvl=OPAL_INFO_LVL_9, scope=MCA_BASE_VAR_SCOPE_LOCAL, synonym_for=-1, storage=0x7ffff1ace1e4 <opal_help_want_aggregate>) at mca_base_var.c:1376
#9  0x00007ffff17c65c5 in mca_base_var_register (project_name=0x7ffff1889c0a "opal", framework_name=0x0, component_name=0x7ffff1889c05 "base", variable_name=0x7ffff1889bf6 "help_aggregate", 
    description=0x7ffff1889a98 "If opal_base_help_aggregate is true, duplicate help messages will be aggregated rather than displayed individually.  This can be helpful for parallel jobs that experience multiple identical failures; "..., 
    type=MCA_BASE_VAR_TYPE_BOOL, enumerator=0x0, bind=0, flags=(unknown: 0), info_lvl=OPAL_INFO_LVL_9, scope=MCA_BASE_VAR_SCOPE_LOCAL, storage=0x7ffff1ace1e4 <opal_help_want_aggregate>) at mca_base_var.c:1546
#10 0x00007ffff17e10af in opal_show_help_init () at show_help.c:76
#11 0x00007ffff17afe43 in opal_init_util (pargc=0x0, pargv=0x0) at runtime/opal_init_core.c:481
#12 0x00007ffff1b8ff3d in ompi_mpi_instance_retain () at instance/instance.c:258
#13 0x00007ffff1b90401 in ompi_mpi_instance_init_common (argc=0, argv=0x0) at instance/instance.c:376
#14 0x00007ffff1b91714 in ompi_mpi_instance_init (ts_level=3, info=0x7ffff21f37a0 <ompi_mpi_info_null>, errhandler=0x7ffff21eb0e0 <ompi_mpi_errors_are_fatal>, instance=0x7ffff21fc300 <ompi_mpi_instance_default>, argc=0, argv=0x0)
    at instance/instance.c:837
#15 0x00007ffff1b856cd in ompi_mpi_init (argc=0, argv=0x0, requested=3, provided=0x7fffffff42c8, reinit_ok=false) at runtime/ompi_mpi_init.c:359
#16 0x00007ffff1be6831 in PMPI_Init_thread (argc=0x0, argv=0x0, required=3, provided=0x7fffffff42c8) at init_thread.c:78
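For reference, the hang in __lll_lock_wait is the classic self-deadlock on a non-recursive mutex: the same thread takes mca_base_source_lock in mca_base_source_register and then tries to take it again inside mca_base_source_get_by_name. A standalone sketch of the same pattern (plain pthreads, not Open MPI code):

```c
#include <pthread.h>
#include <stdio.h>

/* Default (non-recursive) mutex, standing in for mca_base_source_lock. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for mca_base_source_get_by_name(): also takes the lock. */
static void get_by_name(void)
{
    pthread_mutex_lock(&lock);   /* second lock by the same thread: blocks forever on glibc */
    pthread_mutex_unlock(&lock);
}

/* Stand-in for mca_base_source_register(): takes the lock, then calls the helper. */
static void do_register(void)
{
    pthread_mutex_lock(&lock);
    get_by_name();               /* deadlocks here, matching the traceback */
    pthread_mutex_unlock(&lock);
}

int main(void)
{
    do_register();               /* never returns */
    puts("unreached");
    return 0;
}
```

Built with `cc -pthread`, this blocks in exactly the place the trace shows.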

@kingshuk00
Author

Thank you. Yes, I'm checking whether removing the OPAL locks during registration helps.
Such locks aren't used for pvars.

@kingshuk00
Author

kingshuk00 commented Mar 10, 2025

Removing the OPAL_THREAD_[LOCK/UNLOCK](&mca_base_source_lock) calls in the following functions resolves this issue:

  1. mca_base_source_init() (link)
  2. mca_base_source_finalize() (link)
  3. mca_base_source_register() (link)

Similar functions such as mca_base_pvar_init() (link), mca_base_pvar_finalize() (link), and mca_base_pvar_register() (link) do not have such locks.

Could any of you suggest possible side effects I should check for before removing them? It seems to me that they aren't necessary.

The alternative is to unlock before calling mca_base_source_get_by_name() (see the sketch below). Due to my lack of familiarity with this code, I would appreciate suggestions here.
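To make the second alternative concrete, here is a rough sketch of dropping the lock around the inner lookup so the nested acquisition disappears. The helper names register_source()/lookup_source() are hypothetical stand-ins for mca_base_source_register()/mca_base_source_get_by_name(), not the actual mca_base code:

```c
#include <pthread.h>

/* Sketch of the "unlock before calling the lookup" alternative.
 * register_source() and lookup_source() are hypothetical stand-ins for
 * mca_base_source_register() and mca_base_source_get_by_name(). */
static pthread_mutex_t source_lock = PTHREAD_MUTEX_INITIALIZER;

static int lookup_source(const char *name)
{
    (void) name;
    pthread_mutex_lock(&source_lock);
    /* ... search the registered sources for `name` ... */
    pthread_mutex_unlock(&source_lock);
    return -1;                               /* "not found" in this sketch */
}

static int register_source(const char *name)
{
    pthread_mutex_lock(&source_lock);
    /* ... validate arguments, etc. ... */
    pthread_mutex_unlock(&source_lock);      /* drop the lock before the lookup */

    int existing = lookup_source(name);      /* takes the lock itself, no nesting */

    pthread_mutex_lock(&source_lock);        /* re-acquire to modify the list */
    if (existing < 0) {
        /* ... append the new source; another thread could have registered the
         * same name during the unlocked window, so re-check under the lock ... */
    }
    pthread_mutex_unlock(&source_lock);
    return 0;
}

int main(void)
{
    return register_source("example_source");
}
```

The unlocked window does open a small race (another thread could register the same name in between), so whether this is acceptable comes down to the same question as removing the locks outright: can registration actually happen concurrently?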

@hppritcha
Member

Another option would be to use an opal_recursive_mutex_t type mutex. Removing the locking (which does seem buggy) would probably mean MPI_T no longer supports MPI_THREAD_MULTIPLE. I suspect this is the first time we're running our full GitHub Actions CI on this code.
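For illustration, the recursive-mutex option in plain pthreads (opal_recursive_mutex_t is expected to behave analogously; the OMPI type itself is not shown, to avoid guessing its initialization details):

```c
#include <pthread.h>

/* With PTHREAD_MUTEX_RECURSIVE, the owning thread may lock the mutex again
 * without deadlocking; it must unlock as many times as it locked. */
static pthread_mutex_t lock;

static void init_lock(void)
{
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&lock, &attr);
    pthread_mutexattr_destroy(&attr);
}

static void inner(void)              /* e.g. the get_by_name path */
{
    pthread_mutex_lock(&lock);       /* re-entrant: does not block the owner */
    pthread_mutex_unlock(&lock);
}

static void outer(void)              /* e.g. the register path */
{
    pthread_mutex_lock(&lock);
    inner();
    pthread_mutex_unlock(&lock);
}

int main(void)
{
    init_lock();
    outer();                         /* completes without deadlock */
    return 0;
}
```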

@bosilca
Member

bosilca commented Mar 10, 2025

There already seems to be an OPAL_THREAD_UNLOCK in mca_base_source_init before calling mca_base_source_register, so the mutex should already be unlocked.

There is a valid reason why pvars do not need to be protected: they can only be created during the init step, and the performance variables stay alive for the entire duration of the application, so as long as opal_util_init/fini are called in a protected way the pvars get the same protection.

I don't know what a source event is, so I cannot comment on that.

@hppritcha
Member

The problem is that the lock is taken first in the call to mca_base_source_register, which then calls mca_base_source_get_by_name, which also tries to take the lock, as is clear from the traceback above.

@hppritcha
Member

It is possible for the number of sources to increase during an application run (see section 15.3.8 of the MPI 4.1 standard), so there should probably be locking beyond just the source init and finalize methods (which was the only locking in the file in @hjelmn's original PR).

…line mca_base_source_get_by_name() called from one place.

Signed-off-by: Kingshuk Haldar <[email protected]>
@bosilca
Member

bosilca commented Mar 11, 2025

I don't see a lock in mca_base_source_get_by_name, but that might be due to the last push. Anyway, there is an unlock left in mca_base_source_get_by_name, and that's definitely no good. Moreover, as mca_base_source_get_by_name is only called from mca_base_source_register with the lock held, there is no reason for mca_base_source_get_by_name to take it again.

Section 15.3.8 of the MPI standard does indeed state that event sources can be added during application execution, but in OMPI event sources are added during module initialization, which happens in a single sequential context. Unless we expect sessions to be created from multiple threads and allow them to load different sets of modules, I don't think we need any protection on the event-creation path.

Looking more carefully at this code, I see other issues with locking. Take mca_base_source_get: a read operation on an array from which events (i.e., source events) cannot be removed. It is nevertheless protected by the same mca_base_source_lock as the rest of the source-events code. On top of that, all usages of mca_base_source_get are also protected by the MPI_T lock (ompi_mpit_lock / ompi_mpit_unlock), so that code must be doubly safe. We never know!
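A generic illustration of that double protection (stand-in mutexes, not the real OMPI code paths): every MPI_T-level reader already holds the outer lock when the inner, source-level lock is taken, so for those callers the inner lock adds nothing.

```c
#include <pthread.h>

/* Generic stand-ins: `mpit_lock` plays the role of ompi_mpit_lock/unlock,
 * `source_lock` the role of mca_base_source_lock.  Not the real OMPI code. */
static pthread_mutex_t mpit_lock   = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t source_lock = PTHREAD_MUTEX_INITIALIZER;

static int sources[16];

/* Read-only lookup, protected by the inner lock. */
static int source_get(int index)
{
    pthread_mutex_lock(&source_lock);
    int value = sources[index];
    pthread_mutex_unlock(&source_lock);
    return value;
}

/* Every MPI_T-level caller already serializes on the outer lock,
 * so for these callers the inner lock is redundant. */
static int mpit_read_source(int index)
{
    pthread_mutex_lock(&mpit_lock);
    int value = source_get(index);
    pthread_mutex_unlock(&mpit_lock);
    return value;
}

int main(void)
{
    sources[0] = 42;
    return mpit_read_source(0) == 42 ? 0 : 1;
}
```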

At this point this PR looks very sketchy.
