Forward args to _get_remote_config() and honour core/no_scm if present #10719
Codecov Report
@@ Coverage Diff @@
## main #10719 +/- ##
==========================================
+ Coverage 90.68% 91.07% +0.38%
==========================================
Files 504 504
Lines 39795 39953 +158
Branches 3141 3159 +18
==========================================
+ Hits 36087 36386 +299
+ Misses 3042 2939 -103
+ Partials 666 628 -38
# It seems some tests might be passing a 'config' key that is not a dict
if not isinstance(user_config, dict):
    user_config = {}
Looking into this, some tests send kwargs = {'config': None, ...}; this safeguard protects against that.
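A minimal sketch of the safeguard discussed above, assuming the caller's options arrive as a plain kwargs dict. The helper name coerce_user_config is hypothetical and not part of dvc; it only isolates the isinstance check from the diff:

```python
def coerce_user_config(kwargs):
    """Return the caller's 'config' entry, or an empty dict if it is not a dict.

    Hypothetical helper for illustration; dvc does not expose this function.
    """
    user_config = kwargs.get("config")
    # Some callers pass config=None or omit the key; treat both as "no overrides".
    if not isinstance(user_config, dict):
        user_config = {}
    return user_config


print(coerce_user_config({"config": None}))
print(coerce_user_config({"config": {"core": {"no_scm": True}}}))
```

With this in place, later lookups such as user_config.get("core") can be written without first checking for None.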
if no_scm_flag is not None:
    # Honour specific SCM treatment if requested in the call
    repo = Repo(url, config={"core": {"no_scm": no_scm_flag}})
AFAIR, Repo(config=...) should just work.
Suggested change:
- repo = Repo(url, config={"core": {"no_scm": no_scm_flag}})
+ repo = Repo(url, config=kwargs.get("config"))
I don't want to specialize core.no_scm in any way or handle it.
I agree that it doesn't feel ideal to handle core.no_scm itself. Your solution makes sense and works for my specific use case, but it triggers other errors in the dvc test suite, which makes me think that other, non-core.no_scm configuration options are being passed that _get_remote_config() doesn't like.
(I went with the core.no_scm-specific approach to highlight the need.)
These are the errors I get when using repo = Repo(url, config=kwargs.get("config")):
FAILED tests/func/test_import.py::test_import_no_hash[files1-expected_info_calls1] - dvc_data.index.index.DataIndexDirError: failed to load directory ('dir',)
FAILED tests/func/test_repo_index.py::test_data_index - dvc_data.index.index.DataIndexDirError: failed to load directory ('edir',)
FAILED tests/func/repro/test_repro_pull.py::test_repro_pulls_missing_import - dvc.exceptions.ReproductionError: failed to reproduce 'foo.dvc'
FAILED tests/func/test_data_cloud.py::test_pull_external_dvc_imports - FileNotFoundError: [Errno 2] No storage files available: 'foo'
FAILED tests/func/test_data_cloud.py::test_pull_external_dvc_imports_mixed - FileNotFoundError: [Errno 2] No storage files available: 'foo'
FAILED tests/func/repro/test_repro.py::test_repro_pulls_missing_import - dvc.exceptions.ReproductionError: failed to reproduce 'foo.dvc'
FAILED tests/func/test_import.py::test_import_dir - dvc_data.index.index.DataIndexDirError: failed to load directory ('dir',)
FAILED tests/func/test_import.py::test_import_file_from_dir - dvc_data.index.index.DataIndexDirError: failed to load directory ('dir',)
FAILED tests/func/test_import.py::test_import_file_from_dir_to_dir - dvc_data.index.index.DataIndexDirError: failed to load directory ('dir',)
FAILED tests/func/test_import.py::test_import_rev - FileNotFoundError: [Errno 2] No storage files available: 'foo'
FAILED tests/func/test_import.py::test_pull_imported_stage - dvc.exceptions.CheckoutError: Checkout failed for following targets:
FAILED tests/func/test_import.py::test_pull_import_no_download - FileNotFoundError: [Errno 2] No such file or directory: '/private/var/folders/_3/h8jc4f6d5gg6dwk7t6464_z80000gp/T/pytest-of-rodrigo.goya/pytest-17/popen-gw13/test_pull_import_no_download0/.dvc/cache/fs/local/e3501e821bcee8f40107794afbe767d1/.F0VDC8H_fGFnvnEcrt...
FAILED tests/func/test_import.py::test_pull_import_no_download_rev_lock - dvc.exceptions.DownloadError: 1 files failed to download
FAILED tests/func/test_import.py::test_pull_imported_directory_stage[dir] - dvc_data.index.index.DataIndexDirError: failed to load directory ('dir',)
FAILED tests/func/test_import.py::test_pull_imported_directory_stage[dir/] - dvc_data.index.index.DataIndexDirError: failed to load directory ('dir',)
FAILED tests/func/test_import.py::test_pull_wildcard_imported_directory_stage - dvc_data.index.index.DataIndexDirError: failed to load directory ('dir123',)
FAILED tests/func/test_update.py::test_update_import[True] - FileNotFoundError: [Errno 2] No storage files available: 'version'
FAILED tests/func/test_import.py::test_pull_non_workspace - FileNotFoundError: [Errno 2] No storage files available: 'foo'
FAILED tests/func/test_update.py::test_update_import_after_remote_updates_to_dvc - FileNotFoundError: [Errno 2] No storage files available: 'version'
FAILED tests/func/test_import.py::test_import_with_jobs - dvc_data.index.index.DataIndexDirError: failed to load directory ('dir1',)
For example, looking closer at one of the failing tests, tests/func/test_repo_index.py::test_data_index, which gives the error:
FAILED tests/func/test_repo_index.py::test_data_index - dvc_data.index.index.DataIndexDirError: failed to load directory ('edir',)
It seems that there is a competition between cache folders. (Maybe an artifact of how the tests are designed?)
- A call to dvc.repo.imp.py:imp() eventually triggers a call to _get_remote_config() (call stack at the bottom).
- A url is provided, and a **kwargs that contains a config key, which provides information on the cache, including a cache directory.
  url = /private/var/folders/_3/h8jc4f6d5gg6dwk7t6464_z80000gp/T/pytest-of-rgoya/pytest-27/erepo0
  config = {'cache': {'protected': False, 'slow_link_warning': True, 'verify': False, 'dir': '/private/var/folders/_3/h8jc4f6d5gg6dwk7t6464_z80000gp/T/pytest-of-rgoya/pytest-27/test_data_index0/.dvc/cache'}}
- Besides the remote's name, _get_remote_config() returns repo.cache.local_cache_dir.
- The presence of config = {"cache": {"dir": "/path/to/cache"}} changes the return value of repo.cache.local_cache_dir:
  - with Repo(url) alone it is: /private/var/folders/_3/h8jc4f6d5gg6dwk7t6464_z80000gp/T/pytest-of-rgoya/pytest-27/erepo0/.dvc/cache, that is url + /.dvc/cache
  - with Repo(url, config=kwargs.get("config")) it is: /private/var/folders/_3/h8jc4f6d5gg6dwk7t6464_z80000gp/T/pytest-of-rgoya/pytest-27/test_data_index0/.dvc/cache, that is config['cache']['dir'].
- The test was built expecting to receive url + /.dvc/cache, and is now receiving config['cache']['dir'].
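The cache-folder competition described in this comment can be reproduced with a toy model. This is only a sketch: resolve_local_cache_dir is a hypothetical stand-in for repo.cache.local_cache_dir, the paths are shortened placeholders, and the fallback rule (url + /.dvc/cache) is taken from the observations above rather than from dvc's source:

```python
import posixpath


def resolve_local_cache_dir(url, config=None):
    """Toy stand-in for repo.cache.local_cache_dir; not dvc's implementation."""
    # An explicit cache.dir in the call-time config takes precedence ...
    cache_dir = ((config or {}).get("cache") or {}).get("dir")
    if cache_dir:
        return cache_dir
    # ... otherwise the cache lives inside the repository at url.
    return posixpath.join(url, ".dvc", "cache")


# Hypothetical, shortened versions of the two pytest directories.
erepo_url = "/tmp/pytest-27/erepo0"
test_config = {"cache": {"dir": "/tmp/pytest-27/test_data_index0/.dvc/cache"}}

print(resolve_local_cache_dir(erepo_url))               # what the test expects
print(resolve_local_cache_dir(erepo_url, test_config))  # what it gets with config forwarded
```

Under these assumptions, forwarding the whole config makes the lookup return the importing repo's cache directory instead of the external repo's, which matches the "failed to load directory" errors.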
This is a proposed fix for #10608; the code here makes steps 9 and 10 described in the issue work.
Summary:
This change allows a user to access the dvc information in an environment that is disconnected from the original Git backend (e.g. in a deployed container, see #10608), by using something like:
Description:
Mainly, a call to dvc/repo/open_repo.py:open_repo(url, *args, **kwargs) may contain a parameter config in **kwargs. With this config a user might indicate they do not want to access the repo with Git support, by using config={"core": {"no_scm": True}}.

During the execution of dvc/repo/open_repo.py:open_repo(), there is a call to a function dvc/repo/open_repo.py:_get_remote_config() that returns the remote configuration ({"core": {"remote"}}). This is then merged into the user-provided config parameter before calling Repo(url, *args, **kwargs).

dvc/repo/open_repo.py:_get_remote_config(), in turn, does a quick Repo() call to get the remote configuration. However, it does not use any of the parameters requested via dvc/repo/open_repo.py:open_repo() and thus relies entirely on the contents of .dvc/config. This means that even if the user requested no SCM support, it will try to look for a Git repo if .dvc/config says so, and fail if it does not find it.

This PR modifies dvc/repo/open_repo.py:_get_remote_config() to receive *args, **kwargs and honour the request to use or ignore Git support when accessing the dvc repo.

❗ I have followed the Contributing to DVC checklist.
📖 If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.
Thank you for the contribution - we'll try to review it as soon as possible. 🙏
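The before/after behaviour laid out in the description can be sketched with stand-ins. Everything here is hypothetical: FakeRepo, FAKE_DISK, the RuntimeError, and the two helper names are illustrations and not dvc's classes or exceptions; only the forwarding of core.no_scm mirrors the diff under review:

```python
# Hypothetical on-disk state: a deployed copy of a project with no .git directory.
FAKE_DISK = {
    "/app/project": {"has_git": False, "dvc_config": {"core": {"remote": "storage"}}},
}


class FakeRepo:
    """Stand-in for dvc's Repo; models only the Git check and the default remote."""

    def __init__(self, url, config=None):
        state = FAKE_DISK[url]
        core = dict(state["dvc_config"].get("core", {}))
        core.update((config or {}).get("core", {}))  # call-time overrides win
        if not core.get("no_scm", False) and not state["has_git"]:
            raise RuntimeError(f"{url} is not a Git repository")
        self.core = core


def get_remote_config_before(url):
    """Behaviour before the PR, as described: the caller's options are ignored."""
    repo = FakeRepo(url)
    return {"core": {"remote": repo.core["remote"]}}


def get_remote_config_after(url, *args, **kwargs):
    """Behaviour after the PR, as described: only the no_scm request is forwarded."""
    user_config = kwargs.get("config")
    if not isinstance(user_config, dict):
        user_config = {}
    no_scm_flag = user_config.get("core", {}).get("no_scm")
    if no_scm_flag is not None:
        repo = FakeRepo(url, config={"core": {"no_scm": no_scm_flag}})
    else:
        repo = FakeRepo(url)
    return {"core": {"remote": repo.core["remote"]}}


user_kwargs = {"config": {"core": {"no_scm": True}}}

try:
    get_remote_config_before("/app/project")
except RuntimeError as exc:
    print("before:", exc)

print("after:", get_remote_config_after("/app/project", **user_kwargs))
```

In this model the first call fails because the Git check runs regardless of what the caller asked for, while the second succeeds and still returns the remote name stored in .dvc/config.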