
Lingering Ghost Processes with 0MiB GPU Memory Reported After Model Actor Creation #135

@jeesonwang


Observed Behavior

When model actors are created through Xorbits actor pools, nvidia-smi lists lingering processes with a reported GPU memory usage of 0MiB, even though memory on the device is actually occupied.

# nvidia-smi showing "0MiB" processes
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A    964159      C   ...sheng/ModelService/.venv/bin/python          0MiB |
|    1   N/A  N/A    963976      C   ...sheng/ModelService/.venv/bin/python          0MiB |
+-----------------------------------------------------------------------------------------+
# Actual GPU memory utilization (409MiB shown)
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA L20                     On  |   00000000:02:00.0 Off |                  Off |
| N/A   57C    P0             85W /  350W |     409MiB /  49140MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
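The mismatch above can also be checked programmatically rather than by reading the table. The following is a minimal sketch, assuming `nvidia-smi` is on the PATH and supports `--query-compute-apps`; the helper names `parse_compute_apps`, `zero_memory_processes`, and `query_compute_apps` are hypothetical and not part of Xorbits or the original report.

```python
import subprocess

# Query fields supported by nvidia-smi for compute processes.
QUERY_CMD = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Parse 'pid, process_name, used_memory' CSV lines into tuples.

    used_memory becomes None when nvidia-smi reports a non-numeric
    value such as "[N/A]".
    """
    rows = []
    for line in csv_text.splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) < 3:
            continue  # blank or malformed line
        try:
            pid = int(parts[0])
        except ValueError:
            continue
        # Rejoin the middle fields in case the process name contains commas.
        name = ",".join(parts[1:-1])
        try:
            used_mib = int(parts[-1])
        except ValueError:
            used_mib = None
        rows.append((pid, name, used_mib))
    return rows


def zero_memory_processes(rows):
    """Return the rows whose reported usage is exactly 0 MiB."""
    return [row for row in rows if row[2] == 0]


def query_compute_apps():
    """Run nvidia-smi and return the parsed compute-process rows."""
    result = subprocess.run(
        QUERY_CMD, capture_output=True, text=True, check=True
    )
    return parse_compute_apps(result.stdout)
```

Calling `zero_memory_processes(query_compute_apps())` on the affected machine should list the same PIDs that appear with 0MiB in the table, which can then be compared against the PIDs of the sub-pool processes.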

**My code:**

import os

import xoscar as xo  # assumption: `xo` refers to the xoscar package


async def create_worker_actor_pool(address: str) -> "xo.MainActorPoolType":
    subprocess_start_method = "forkserver" if os.name != "nt" else "spawn"

    return await xo.create_actor_pool(
        address=address,
        n_process=0,
        auto_recover="process",
        subprocess_start_method=subprocess_start_method,
    )

async def test_create_model_actor():
    # Setup main pool and sub-pools
    main_pool = await create_worker_actor_pool("localhost:9999")
    
    # Create sub-pools with different CUDA devices
    sub_pool_1 = await main_pool.append_sub_pool(env={"CUDA_VISIBLE_DEVICES": "1"})
    actor_1 = await xo.create_actor(ModelActor, address=sub_pool_1, ...)
    
    sub_pool_2 = await main_pool.append_sub_pool(env={"CUDA_VISIBLE_DEVICES": "0"})
    actor_2 = await xo.create_actor(ModelActor, address=sub_pool_2, ...)
