DVC can not push file larger than 100MiB due to an upstream bug #10643

Closed
@CNLHC

Bug Report

Description

Because of a bug in the ossfs dependency, DVC cannot push files larger than 100 MiB to an OSS remote.

Reproduce

dvc init
mkfile -n 200m test.blob
dvc add test.blob
dvc remote add foo oss://<oss_bucket>
dvc push -r foo

The output reports that the file was pushed to the remote, but it was not.

...python3.12/site-packages/ossfs/async_oss.py:388: RuntimeWarning: coroutine 'resumable_upload' was never awaited                                                                  
  await self._call_oss(
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
Pushing
1 file pushed
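
The `mkfile` step in the reproduction is a macOS utility. On other platforms, a test file of the same size could be created with a short Python helper. This is only a sketch: `make_sparse_file` is a hypothetical name, and it assumes a file of zero bytes is enough to trigger the large-file upload path, as it is with `mkfile -n`.

```python
import os


def make_sparse_file(path: str, size_bytes: int) -> int:
    """Create (or overwrite) a file of exactly size_bytes without writing data.

    truncate() extends the file with zero bytes; on most filesystems the
    result is sparse, so no real disk space is consumed.
    """
    with open(path, "wb") as f:
        f.truncate(size_bytes)
    return os.path.getsize(path)


# Usage, equivalent in size to `mkfile -n 200m test.blob`:
# make_sparse_file("test.blob", 200 * 1024 * 1024)
```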

Expected

The file should be pushed to OSS, or, if something unexpected happens, the CLI should report an error.

Environment information

Output of dvc doctor:

$ dvc doctor

DVC version: 3.58.0 (pip)
-------------------------
Platform: Python 3.12.7 on macOS-15.1.1-arm64-arm-64bit
Subprojects:
	dvc_data = 3.16.7
	dvc_objects = 5.1.0
	dvc_render = 1.0.2
	dvc_task = 0.40.2
	scmrepo = 3.3.9
Supports:
	http (aiohttp = 3.9.5, aiohttp-retry = 2.9.1),
	https (aiohttp = 3.9.5, aiohttp-retry = 2.9.1),
	oss (ossfs = 2023.12.0),
	s3 (s3fs = 2024.10.0, boto3 = 1.35.36)
Config:
	Global: /Users/liuhancheng/Library/Application Support/dvc
	System: /Library/Application Support/dvc
Cache types: reflink, hardlink, symlink
Cache directory: apfs on /dev/disk3s1s1
Caches: local
Remotes: oss
Workspace directory: apfs on /dev/disk3s1s1
Repo: dvc, git
Repo.site_cache_dir: /Library/Caches/dvc/repo/adbc8e36d46e0788fce6cf0882302974

Additional Information (if any):

The root cause of this issue is a bug in the ossfs package. According to the warning, this line is used to upload large files:
https://github.com/fsspec/ossfs/blob/224e98868f32018fabacdac1eb5daddb16ce419c/src/ossfs/async_oss.py#L388
and this line (L159) ultimately invokes the underlying method that performs the upload:
https://github.com/fsspec/ossfs/blob/224e98868f32018fabacdac1eb5daddb16ce419c/src/ossfs/async_oss.py#L159

When method_name is resumable_upload, method(service, *args, **kwargs) returns a coroutine that is never awaited, which causes this problem.
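
The failure mode can be illustrated in isolation. The sketch below does not use ossfs or aiooss2; `fake_upload` and the other names are hypothetical stand-ins. It shows that calling an async function without `await` only creates a coroutine object and never runs its body, which is why no error is raised and nothing is uploaded.

```python
import asyncio

uploaded = []  # records which "uploads" actually executed


async def fake_upload(name: str) -> str:
    # Hypothetical stand-in for an async upload function.
    uploaded.append(name)
    return name


async def call_without_await(name: str) -> bool:
    out = fake_upload(name)  # creates a coroutine object; the body does not run
    is_coro = asyncio.iscoroutine(out)
    out.close()  # closed explicitly so this demo does not emit the RuntimeWarning
    return is_coro


async def call_with_await(name: str) -> str:
    return await fake_upload(name)  # the body runs and its result is returned


def demo():
    uploaded.clear()
    was_coroutine = asyncio.run(call_without_await("a.blob"))
    after_missing_await = list(uploaded)
    result = asyncio.run(call_with_await("b.blob"))
    return was_coroutine, after_missing_await, result, list(uploaded)
```

Calling `demo()` shows that only the awaited call appends to `uploaded`; the un-awaited call leaves it empty.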

A small, quick fix is to apply this patch:

diff --git a/src/ossfs/async_oss.py b/src/ossfs/async_oss.py
index 8a07b5e..f01b501 100644
--- a/src/ossfs/async_oss.py
+++ b/src/ossfs/async_oss.py
@@ -156,7 +156,10 @@ class AioOSSFileSystem(BaseOSSFileSystem, AsyncFileSystem):
             if not method:
                 method = getattr(aiooss2, method_name)
                 logger.debug("CALL: %s - %s - %s", method.__name__, args, kwargs)
-                out = method(service, *args, **kwargs)
+                if method_name == "resumable_upload":
+                    out = await method(service, *args, **kwargs)
+                else:
+                    out = method(service, *args, **kwargs)
             else:
                 logger.debug("CALL: %s - %s - %s", method.__name__, args, kwargs)
                 out = await method(*args, **kwargs)
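
As an alternative to special-casing one method name, a dispatcher could await any result that turns out to be awaitable. The following is a self-contained sketch of that idea, not ossfs code: `call_and_resolve`, `sync_op`, and `async_op` are hypothetical names, and whether this approach suits ossfs depends on how the other methods' return values are consumed, which is not shown in this report.

```python
import asyncio
import inspect


async def call_and_resolve(method, *args, **kwargs):
    """Call method and, if it returned an awaitable, await it before returning."""
    out = method(*args, **kwargs)
    if inspect.isawaitable(out):
        out = await out
    return out


def sync_op(x: int) -> int:
    # Hypothetical synchronous method: returns a plain value.
    return x + 1


async def async_op(x: int) -> int:
    # Hypothetical asynchronous method: returns a coroutine when called.
    await asyncio.sleep(0)
    return x * 2
```

With this helper, `asyncio.run(call_and_resolve(sync_op, 1))` and `asyncio.run(call_and_resolve(async_op, 3))` both return plain values, so the caller does not need to know which kind of method it dispatched.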

Labels

A: data-sync (Related to dvc get/fetch/import/pull/push), bug (Did we break something?), fs: oss (Related to the Alibaba Cloud OSS filesystem)
