Unable to Run Ray Inference model using adlfs #516

@mittachaitu

Description

Unable to run a Ray inference model using adlfs as the connector to read from and write to Azure Blob Storage. A rough sketch of the relevant wiring is shown below.
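
For context, the job script (train_batch_inference_pyarrow.py in the linked repo) appears to build an adlfs filesystem and pass it to Ray Train's RunConfig as storage_filesystem. The sketch below is an approximation of that wiring, not the actual code: the account name comes from the log, the paths are the ones shown in the error output, and the standard adlfs-to-pyarrow bridge is used in place of whatever custom AzureBlobFilesystem wrapper the real script constructs.

```python
# Approximate sketch of the setup (assumed, not copied from the real script).
# Account name and storage path are taken from the log output below; the actual
# job appears to use a custom AzureBlobFilesystem wrapper around adlfs.
import adlfs
import pyarrow.fs
from ray import train

# adlfs (fsspec) filesystem pointing at the Azure Blob Storage account
abfs = adlfs.AzureBlobFileSystem(account_name="mittassaaccount")

# Bridge the fsspec filesystem into a pyarrow FileSystem so Ray Train can use it
fs = pyarrow.fs.PyFileSystem(pyarrow.fs.FSSpecHandler(abfs))

run_config = train.RunConfig(
    name="xgboost_benchmark",
    storage_path="traineddata/cluster_storage",
    storage_filesystem=fs,  # this object later has to survive ray.put()
    verbose=1,
)
```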

Expected Behavior

The Ray job should succeed instead of failing.

Current Behavior

The Ray job fails with pickling errors:

```
Parquet dataset sampling 0:   0%|          | 0.00/3.00 [00:00<?, ? file/s]
Parquet dataset sampling 0:   0%|          | 0.00/3.00 [00:01<?, ? file/s]
Parquet dataset sampling 0:   0%|          | 0.00/3.00 [00:02<?, ? file/s]
Parquet dataset sampling 0:   0%|          | 0.00/3.00 [00:03<?, ? file/s]
Parquet dataset sampling 0:  33%|███▎      | 1.00/3.00 [00:04<00:08, 4.02s/ file]
Parquet dataset sampling 0:  33%|███▎      | 1.00/3.00 [00:04<00:08, 4.02s/ file]
Parquet dataset sampling 0: 100%|██████████| 3.00/3.00 [00:04<00:00, 1.38s/ file]
Parquet dataset sampling 0: 100%|██████████| 3.00/3.00 [00:04<00:00, 1.38s/ file]


Parquet dataset sampling 0: 100%|██████████| 3.00/3.00 [00:04<00:00, 1.65s/ file]
2025-10-16 03:55:04,836 INFO parquet_datasource.py:699 -- Estimated parquet encoding ratio is 0.850.
2025-10-16 03:55:04,837 INFO parquet_datasource.py:759 -- Estimated parquet reader batch size at 383480 rows
Traceback (most recent call last):
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/_private/worker.py", line 852, in put_object
    serialized_value = self.get_serialization_context().serialize(value)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/_private/serialization.py", line 722, in serialize
    return self._serialize_to_msgpack(value)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/_private/serialization.py", line 648, in _serialize_to_msgpack
    pickle5_serialized_object = self._serialize_to_pickle5(
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/_private/serialization.py", line 595, in _serialize_to_pickle5
    raise e
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/_private/serialization.py", line 590, in _serialize_to_pickle5
    inband = pickle.dumps(
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/cloudpickle/cloudpickle.py", line 1479, in dumps
    cp.dump(obj)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/cloudpickle/cloudpickle.py", line 1245, in dump
    return super().dump(obj)
  File "<stringsource>", line 2, in pyarrow._fs.FileSystem.__reduce_cython__
TypeError: self.fs,self.wrapped cannot be converted to a Python object for pickling

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/tmp/ray/session_2025-10-16_03-53-48_703950_1/runtime_resources/working_dir_files/https_github_com_mittachaitu_aks-ray-sample_archive_sai_wi/sample-tuning-setup/direct-blob-access/train_batch_inference_pyarrow.py", line 410, in <module>
    main(args)
  File "/tmp/ray/session_2025-10-16_03-53-48_703950_1/runtime_resources/working_dir_files/https_github_com_mittachaitu_aks-ray-sample_archive_sai_wi/sample-tuning-setup/direct-blob-access/train_batch_inference_pyarrow.py", line 338, in main
    result = train(framework, data_path, num_workers, cpus_per_worker, args.storage_interface)
  File "/tmp/ray/session_2025-10-16_03-53-48_703950_1/runtime_resources/working_dir_files/https_github_com_mittachaitu_aks-ray-sample_archive_sai_wi/sample-tuning-setup/direct-blob-access/train_batch_inference_pyarrow.py", line 276, in train
    result = trainer.fit()
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/train/base_trainer.py", line 664, in fit
    trainable = self.as_trainable()
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/train/base_trainer.py", line 911, in as_trainable
    return tune.with_parameters(trainable_cls, **base_config)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/tune/trainable/util.py", line 107, in with_parameters
    parameter_registry.put(prefix + k, v)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/tune/registry.py", line 306, in put
    self.flush()
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/tune/registry.py", line 318, in flush
    self.references[k] = ray.put(v)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 104, in wrapper
    return func(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/_private/worker.py", line 3047, in put
    object_ref = worker.put_object(
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/_private/worker.py", line 861, in put_object
    raise TypeError(msg) from e
TypeError: Could not serialize the put value RunConfig(name='xgboost_benchmark', storage_path='traineddata/cluster_storage', storage_filesystem=<AzureBlobFilesystem(account=mittassaaccount)>, verbose=1):
================================================================================
Checking Serializability of RunConfig(name='xgboost_benchmark', storage_path='traineddata/cluster_storage', storage_filesystem=<AzureBlobFilesystem(account=mittassaaccount)>, verbose=1)
================================================================================
!!! FAIL serialization: self.fs,self.wrapped cannot be converted to a Python object for pickling
    Serializing '_annotated' RunConfig...
    Serializing '_annotated_api_group' Others...
    Serializing '_annotated_type' AnnotationType.PUBLIC_API...
    Serializing '_repr_html_' <bound method RunConfig._repr_html_ of RunConfig(name='xgboost_benchmark', storage_path='traineddata/cluster_storage', storage_filesystem=<AzureBlobFilesystem(account=mittassaaccount)>, verbose=1)>...
    !!! FAIL serialization: self.fs,self.wrapped cannot be converted to a Python object for pickling
        Serializing '__func__' <function RunConfig._repr_html_ at 0x7f2f6a15a830>...
    WARNING: Did not find non-serializable object in <bound method RunConfig._repr_html_ of RunConfig(name='xgboost_benchmark', storage_path='traineddata/cluster_storage', storage_filesystem=<AzureBlobFilesystem(account=mittassaaccount)>, verbose=1)>. This may be an oversight.
================================================================================
Variable:

        FailTuple(_repr_html_ [obj=<bound method RunConfig._repr_html_ of RunConfig(name='xgboost_benchmark', storage_path='traineddata/cluster_storage', storage_filesystem=<AzureBlobFilesystem(account=mittassaaccount)>, verbose=1)>, parent=RunConfig(name='xgboost_benchmark', storage_path='traineddata/cluster_storage', storage_filesystem=<AzureBlobFilesystem(account=mittassaaccount)>, verbose=1)])

was found to be non-serializable. There may be multiple other undetected variables that were non-serializable.
Consider either removing the instantiation/imports of these variables or moving the instantiation into the scope of the function/class.
================================================================================
Check https://docs.ray.io/en/master/ray-core/objects/serialization.html#troubleshooting for more information.
If you have any suggestions on how to improve this error message, please reach out to the Ray developers on github.com/ray-project/ray/issues/
================================================================================

Traceback (most recent call last):
  File "/home/ray/anaconda3/lib/python3.10/weakref.py", line 667, in _exitfunc
    f()
  File "/home/ray/anaconda3/lib/python3.10/weakref.py", line 591, in __call__
    return info.func(*info.args, **(info.kwargs or {}))
  File "/home/ray/anaconda3/lib/python3.10/site-packages/fsspec/asyn.py", line 103, in sync
    raise return_result
  File "/home/ray/anaconda3/lib/python3.10/site-packages/fsspec/asyn.py", line 56, in _runner
    result[0] = await coro
  File "/home/ray/anaconda3/lib/python3.10/site-packages/adlfs/utils.py", line 85, in close_credential
    await file_obj.credential.close()
TypeError: object NoneType can't be used in 'await' expression
2025-10-16 03:55:14,053 ERR cli.py:73 -- -----------------------------------
2025-10-16 03:55:14,053 ERR cli.py:74 -- Job 'rayjob-tune-gpt2-f92d7' failed
2025-10-16 03:55:14,053 ERR cli.py:75 -- -----------------------------------
2025-10-16 03:55:14,053 INFO cli.py:88 -- Status message: Job entrypoint command failed with exit code 1, last available logs (truncated to 20,000 chars):
    f()
  File "/home/ray/anaconda3/lib/python3.10/weakref.py", line 591, in __call__
    return info.func(*info.args, **(info.kwargs or {}))
  File "/home/ray/anaconda3/lib/python3.10/site-packages/fsspec/asyn.py", line 103, in sync
    raise return_result
  File "/home/ray/anaconda3/lib/python3.10/site-packages/fsspec/asyn.py", line 56, in _runner
    result[0] = await coro
  File "/home/ray/anaconda3/lib/python3.10/site-packages/adlfs/utils.py", line 85, in close_credential
    await file_obj.credential.close()
TypeError: object NoneType can't be used in 'await' expression
```
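
The first TypeError (from pyarrow._fs.FileSystem.__reduce_cython__) is the actual failure: Ray Tune calls ray.put() on the RunConfig, and cloudpickle cannot serialize the storage_filesystem object inside it. The second traceback (adlfs close_credential awaiting a None credential) only fires afterwards, in a weakref finalizer during interpreter shutdown. The snippet below is a hedged way of narrowing the failure down to the filesystem object on its own, using the same Ray helper that produced the "Checking Serializability" block above; the filesystem built here is a stand-in, not the exact object from the job.

```python
# Sketch for isolating the non-picklable attribute. inspect_serializability is
# the Ray helper that produced the "Checking Serializability" output above.
# The filesystem constructed here is a placeholder for whatever the real script
# passes as storage_filesystem.
import adlfs
import pyarrow.fs
from ray.util import inspect_serializability

abfs = adlfs.AzureBlobFileSystem(account_name="mittassaaccount")
fs = pyarrow.fs.PyFileSystem(pyarrow.fs.FSSpecHandler(abfs))

# Prints a tree of attributes and flags whichever one cloudpickle chokes on
inspect_serializability(fs, name="storage_filesystem")
```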

Steps to Reproduce

  • Run the KubeRay job [by updating required values] by deploying it in a Kubernetes setup; a local approximation of the failing step is sketched below.
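
A Kubernetes cluster should not be required to hit the serialization error itself. A minimal local approximation, assuming the RunConfig is built roughly as in the sketch under Description, is:

```python
# Hypothetical local reproduction of the failing step: Ray Tune internally does
# ray.put() on the RunConfig, so putting it by hand should exercise the same
# serialization path. Account name and paths are placeholders from the log.
import adlfs
import pyarrow.fs
import ray
from ray import train

ray.init()

abfs = adlfs.AzureBlobFileSystem(account_name="mittassaaccount")
run_config = train.RunConfig(
    name="xgboost_benchmark",
    storage_path="traineddata/cluster_storage",
    storage_filesystem=pyarrow.fs.PyFileSystem(pyarrow.fs.FSSpecHandler(abfs)),
)

# Expected to raise the same TypeError if the filesystem object is not picklable;
# if this succeeds, the problem is likely specific to the wrapper used in the job.
ray.put(run_config)
```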
