Description
Unable to run a Ray inference job that uses adlfs as the connector for reading from and writing to Azure Blob Storage.
Expected Behavior
The Ray job should succeed.
Current Behavior
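The Ray job fails with pickling errors. Ray cannot serialize the RunConfig passed to the trainer because its storage_filesystem (AzureBlobFilesystem) cannot be pickled. A second error is raised by adlfs while closing its credential at interpreter exit. Full log: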
Parquet dataset sampling 0: 100%|██████████| 3.00/3.00 [00:04<00:00, 1.65s/ file]
2025-10-16 03:55:04,836 INFO parquet_datasource.py:699 -- Estimated parquet encoding ratio is 0.850.
2025-10-16 03:55:04,837 INFO parquet_datasource.py:759 -- Estimated parquet reader batch size at 383480 rows
Traceback (most recent call last):
File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/_private/worker.py", line 852, in put_object
serialized_value = self.get_serialization_context().serialize(value)
File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/_private/serialization.py", line 722, in serialize
return self._serialize_to_msgpack(value)
File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/_private/serialization.py", line 648, in _serialize_to_msgpack
pickle5_serialized_object = self._serialize_to_pickle5(
File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/_private/serialization.py", line 595, in _serialize_to_pickle5
raise e
File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/_private/serialization.py", line 590, in _serialize_to_pickle5
inband = pickle.dumps(
File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/cloudpickle/cloudpickle.py", line 1479, in dumps
cp.dump(obj)
File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/cloudpickle/cloudpickle.py", line 1245, in dump
return super().dump(obj)
File "<stringsource>", line 2, in pyarrow._fs.FileSystem.__reduce_cython__
TypeError: self.fs,self.wrapped cannot be converted to a Python object for pickling
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/tmp/ray/session_2025-10-16_03-53-48_703950_1/runtime_resources/working_dir_files/https_github_com_mittachaitu_aks-ray-sample_archive_sai_wi/sample-tuning-setup/direct-blob-access/train_batch_inference_pyarrow.py", line 410, in <module>
main(args)
File "/tmp/ray/session_2025-10-16_03-53-48_703950_1/runtime_resources/working_dir_files/https_github_com_mittachaitu_aks-ray-sample_archive_sai_wi/sample-tuning-setup/direct-blob-access/train_batch_inference_pyarrow.py", line 338, in main
result = train(framework, data_path, num_workers, cpus_per_worker, args.storage_interface)
File "/tmp/ray/session_2025-10-16_03-53-48_703950_1/runtime_resources/working_dir_files/https_github_com_mittachaitu_aks-ray-sample_archive_sai_wi/sample-tuning-setup/direct-blob-access/train_batch_inference_pyarrow.py", line 276, in train
result = trainer.fit()
File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/train/base_trainer.py", line 664, in fit
trainable = self.as_trainable()
File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/train/base_trainer.py", line 911, in as_trainable
return tune.with_parameters(trainable_cls, **base_config)
File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/tune/trainable/util.py", line 107, in with_parameters
parameter_registry.put(prefix + k, v)
File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/tune/registry.py", line 306, in put
self.flush()
File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/tune/registry.py", line 318, in flush
self.references[k] = ray.put(v)
File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
return fn(*args, **kwargs)
File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 104, in wrapper
return func(*args, **kwargs)
File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/_private/worker.py", line 3047, in put
object_ref = worker.put_object(
File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/_private/worker.py", line 861, in put_object
raise TypeError(msg) from e
TypeError: Could not serialize the put value RunConfig(name='xgboost_benchmark', storage_path='traineddata/cluster_storage', storage_filesystem=<AzureBlobFilesystem(account=mittassaaccount)>, verbose=1):
================================================================================
Checking Serializability of RunConfig(name='xgboost_benchmark', storage_path='traineddata/cluster_storage', storage_filesystem=<AzureBlobFilesystem(account=mittassaaccount)>, verbose=1)
================================================================================
!!! FAIL serialization: self.fs,self.wrapped cannot be converted to a Python object for pickling
Serializing '_annotated' RunConfig...
Serializing '_annotated_api_group' Others...
Serializing '_annotated_type' AnnotationType.PUBLIC_API...
Serializing '_repr_html_' <bound method RunConfig._repr_html_ of RunConfig(name='xgboost_benchmark', storage_path='traineddata/cluster_storage', storage_filesystem=<AzureBlobFilesystem(account=mittassaaccount)>, verbose=1)>...
!!! FAIL serialization: self.fs,self.wrapped cannot be converted to a Python object for pickling
Serializing '__func__' <function RunConfig._repr_html_ at 0x7f2f6a15a830>...
WARNING: Did not find non-serializable object in <bound method RunConfig._repr_html_ of RunConfig(name='xgboost_benchmark', storage_path='traineddata/cluster_storage', storage_filesystem=<AzureBlobFilesystem(account=mittassaaccount)>, verbose=1)>. This may be an oversight.
================================================================================
Variable:
FailTuple(_repr_html_ [obj=<bound method RunConfig._repr_html_ of RunConfig(name='xgboost_benchmark', storage_path='traineddata/cluster_storage', storage_filesystem=<AzureBlobFilesystem(account=mittassaaccount)>, verbose=1)>, parent=RunConfig(name='xgboost_benchmark', storage_path='traineddata/cluster_storage', storage_filesystem=<AzureBlobFilesystem(account=mittassaaccount)>, verbose=1)])
was found to be non-serializable. There may be multiple other undetected variables that were non-serializable.
Consider either removing the instantiation/imports of these variables or moving the instantiation into the scope of the function/class.
================================================================================
Check https://docs.ray.io/en/master/ray-core/objects/serialization.html#troubleshooting for more information.
If you have any suggestions on how to improve this error message, please reach out to the Ray developers on github.com/ray-project/ray/issues/
================================================================================
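The serialization failure above can be reproduced in miniature without Ray or Azure. This is an illustrative sketch only: NativeBackedFS and RebuildableFS are hypothetical stand-ins, and a threading.Lock plays the role of the native handles (self.fs, self.wrapped) that pyarrow's Cython FileSystem cannot pickle. It shows the idea behind Ray's serializability check (try to pickle each attribute to find the culprit) and a common general workaround: pickle only the constructor arguments and rebuild the object on the receiving side. It is not a confirmed fix for Ray or adlfs.

```python
import pickle
import threading


def find_unpicklable_attrs(obj):
    """Return the names of obj's attributes that pickle cannot serialize."""
    bad = []
    for name, value in vars(obj).items():
        try:
            pickle.dumps(value)
        except (TypeError, pickle.PicklingError, AttributeError):
            bad.append(name)
    return bad


class NativeBackedFS:
    """Hypothetical filesystem object that holds a native, unpicklable handle."""

    def __init__(self, account_name):
        self.account_name = account_name
        # Stand-in for a native handle: threading.Lock objects cannot be pickled.
        self.handle = threading.Lock()


class RebuildableFS(NativeBackedFS):
    """Same object, but it pickles only what is needed to recreate it."""

    def __reduce__(self):
        # Serialize the constructor arguments; a fresh handle is created on unpickle.
        return (self.__class__, (self.account_name,))


if __name__ == "__main__":
    fs = NativeBackedFS("myaccount")
    print(find_unpicklable_attrs(fs))  # ['handle']
    try:
        pickle.dumps(fs)
    except TypeError as exc:
        print("pickle failed:", exc)

    restored = pickle.loads(pickle.dumps(RebuildableFS("myaccount")))
    print(restored.account_name)  # myaccount
```

In the real job, the equivalent of the "rebuild on the receiving side" idea would be to create the filesystem inside the worker code rather than capturing a live instance in the RunConfig, which is also what the Ray message above suggests; whether Ray Train accepts that for storage_filesystem is not confirmed here.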
Traceback (most recent call last):
File "/home/ray/anaconda3/lib/python3.10/weakref.py", line 667, in _exitfunc
f()
File "/home/ray/anaconda3/lib/python3.10/weakref.py", line 591, in __call__
return info.func(*info.args, **(info.kwargs or {}))
File "/home/ray/anaconda3/lib/python3.10/site-packages/fsspec/asyn.py", line 103, in sync
raise return_result
File "/home/ray/anaconda3/lib/python3.10/site-packages/fsspec/asyn.py", line 56, in _runner
result[0] = await coro
File "/home/ray/anaconda3/lib/python3.10/site-packages/adlfs/utils.py", line 85, in close_credential
await file_obj.credential.close()
TypeError: object NoneType can't be used in 'await' expression
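The traceback above is raised at interpreter exit, when adlfs's cleanup coroutine awaits credential.close(). The message means close() returned None rather than an awaitable, which is what happens when the credential object has a synchronous close(). A plausible but unconfirmed cause is passing a synchronous azure.identity credential where adlfs expects an async one from azure.identity.aio. The sketch below reproduces the error with hypothetical SyncCredential and AsyncCredential classes, and shows a defensive close that awaits the result only when it is awaitable; it is an illustration, not adlfs's actual code.

```python
import asyncio
import inspect


class SyncCredential:
    """Hypothetical credential whose close() is a plain method (returns None)."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class AsyncCredential:
    """Hypothetical credential whose close() is a coroutine function."""

    def __init__(self):
        self.closed = False

    async def close(self):
        self.closed = True


async def naive_close(credential):
    # Mirrors the failing line: awaits whatever close() returns.
    await credential.close()


async def safe_close(credential):
    # Await the result only if close() actually returned an awaitable.
    result = credential.close()
    if inspect.isawaitable(result):
        await result


if __name__ == "__main__":
    try:
        asyncio.run(naive_close(SyncCredential()))
    except TypeError as exc:
        print("naive close failed:", exc)  # same kind of TypeError as in the log

    for cred in (SyncCredential(), AsyncCredential()):
        asyncio.run(safe_close(cred))
        print(type(cred).__name__, cred.closed)
```

If the sync-credential hypothesis holds, the user-side change would be to construct the credential from azure.identity.aio instead; this secondary error is separate from the RunConfig pickling failure that makes the job exit.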
2025-10-16 03:55:14,053 ERR cli.py:73 -- -----------------------------------
2025-10-16 03:55:14,053 ERR cli.py:74 -- Job 'rayjob-tune-gpt2-f92d7' failed
2025-10-16 03:55:14,053 ERR cli.py:75 -- -----------------------------------
2025-10-16 03:55:14,053 INFO cli.py:88 -- Status message: Job entrypoint command failed with exit code 1, last available logs (truncated to 20,000 chars):
TypeError: object NoneType can't be used in 'await' expression
Steps to Reproduce
- Update the required values in the KubeRay job spec, then deploy the job to a Kubernetes cluster.