
smcuda initialization fails when MPI is initialized before CUDA #13354

@ghanem-nv

Description

Background information

What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)

v5.0.8

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

Installed using conda forge

If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.

NA

Please describe the system on which you are running

  • Operating system/version: Ubuntu 22.04
  • Computer hardware: Single node with AMD EPYC 7742 CPU and 8xA30 Nvidia GPUs
  • Network type: NA

Details of the problem

When using OMPI_MCA_pml=ob1 for my multi-GPU application, I noticed that MPI communication is staged through CPU buffers instead of using peer-to-peer direct memory access (P2P DMA). Upon investigation, I found that the culprit is a failed initialization of smcuda:

select: initializing btl component smcuda
CUDA: cuCtxGetCurrent returned NULL context
select: init of component smcuda returned failure
mca: base: close: component smcuda closed

Seeing the NULL context, I suspected that making a dummy CUDA call to initialize CUDA before any MPI call would fix this. Indeed, it resolves the issue and I get the expected bandwidth.
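
For reference, a minimal sketch of the workaround (assuming mpi4py and CuPy are installed and the Open MPI build is CUDA-aware): the only essential point is that a CUDA context exists before MPI_Init runs, so smcuda can find it during BTL selection. Buffer names and sizes below are illustrative.

# Touch CUDA *before* MPI is initialized so smcuda finds a live context.
import cupy
_ = cupy.zeros(10)            # forces CUDA context creation on this rank's GPU

from mpi4py import MPI        # importing mpi4py.MPI calls MPI_Init by default
comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Exchange a device buffer between two ranks; with smcuda active this should
# use P2P DMA instead of being staged through host memory.
buf = cupy.arange(1 << 20, dtype=cupy.float32)
if rank == 0:
    comm.Send(buf, dest=1, tag=0)
elif rank == 1:
    comm.Recv(buf, source=0, tag=0)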

Steps to reproduce

  • Set up a fresh conda environment and run the mpi4py pingpong benchmark with CuPy arrays using ob1:
conda create -n smcuda_issue -y python=3.12 mpi4py openmpi cupy-core cuda-cudart cuda-version=12 
conda activate smcuda_issue
export OMPI_MCA_opal_cuda_support=true
export OMPI_MCA_pml=ob1
mpiexec -n 2 python -m mpi4py.bench pingpong -a cupy -m 134217728

Output 1: (your bandwidths will likely differ)

# Size [B]  Bandwidth [MB/s] | Time Mean [s] ± StdDev [s]  Samples
 134217728           2704.35 | 4.9630280e-02 ± 1.4886e-03       10
 268435456           2759.26 | 9.7285419e-02 ± 1.6067e-04       10
 536870912           2761.97 | 1.9437956e-01 ± 1.1675e-04       10
1073741824           2765.35 | 3.8828477e-01 ± 7.7874e-04       10
  • Add the line import cupy; _ = cupy.zeros(10) to the top of the file $CONDA_PREFIX/lib/python3.12/site-packages/mpi4py/bench.py and rerun the benchmark.

Output 2: (your bandwidths will be noticeably higher than Output 1 if your GPUs support P2P DMA; see the check sketched after the output)

# MPI PingPong Test
# Size [B]  Bandwidth [MB/s] | Time Mean [s] ± StdDev [s]  Samples
 134217728         260967.35 | 5.1430850e-04 ± 3.2618e-05       10
 268435456         325483.86 | 8.2472740e-04 ± 2.1475e-06       10
 536870912         365462.14 | 1.4690192e-03 ± 1.5596e-06       10
1073741824         391377.22 | 2.7434960e-03 ± 1.0634e-05       10
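
As a quick sanity check (a sketch, not part of the original report), CuPy's runtime bindings expose the CUDA runtime query that reports whether two GPUs can access each other's memory, which is what P2P DMA requires:

# Check P2P DMA availability between GPU 0 and GPU 1 (illustrative device IDs).
import cupy
can_peer = cupy.cuda.runtime.deviceCanAccessPeer(0, 1)
print("P2P DMA GPU0 -> GPU1:", bool(can_peer))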
