Consider this call:
    with metaflow.S3() as s3i:
        result = s3i.info_many(s3_path, return_missing=True)
Can this be put in a metaflow.multicore_utils.parallel_map ?
i.e. parallel_map(wrapper_for_s3_info_many, s3_paths)
When I try, I get this error:
2024-09-30 23:08:29.391 [261693/start/3201226 (pid 1400063)] metaflow.plugins.datatools.s3.s3.MetaflowS3URLException: Specify S3(run=self) when you use S3 inside a running flow. Otherwise you have to use S3 with full s3:// urls.
2024-09-30 23:08:29.391 [261693/start/3201226 (pid 1400063)] Internal error
However, s3_paths=["s3://path/to/something.jpg","s3://path/to/something_else.jpg", ...] and I know 100% that every path in s3_paths starts with "s3://"
Putting run=self in the S3 instantiation within the wrapper yields
@hasush the s3.xx_many calls are already parallelized behind the scenes, so you shouldn't necessarily need parallel_map. Regardless, the error you highlighted looks like a bug that we will address.
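As a possible workaround along the lines of the reply above, here is a minimal sketch that passes the whole list of full s3:// URLs to a single info_many call instead of wrapping it in parallel_map. The helper name s3_info_for_paths and its s3_factory parameter are hypothetical (the parameter exists only so the function can be exercised without real S3 access), and the sketch assumes the returned objects expose url and exists, as metaflow's S3Object does.

```python
def s3_info_for_paths(s3_paths, s3_factory=None):
    """Return {url: exists} for a list of full s3:// URLs.

    Hypothetical helper: makes one info_many call for the whole list,
    relying on the library's own internal parallelism.
    """
    if s3_factory is None:
        # Imported lazily so the helper can be tested without metaflow installed.
        from metaflow import S3
        s3_factory = S3

    with s3_factory() as s3:
        # return_missing=True yields an entry for missing keys instead of raising.
        objs = s3.info_many(s3_paths, return_missing=True)
        # Build the result inside the context manager, while the client is open.
        return {obj.url: obj.exists for obj in objs}
```

Inside a step, calling s3_info_for_paths(s3_paths) with the list from the question would replace parallel_map(wrapper_for_s3_info_many, s3_paths). Whether this avoids the MetaflowS3URLException above has not been verified here.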