Skip to content

Forward args to _get_remote_config() and honour core/no_scm if present #10719

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
19 changes: 16 additions & 3 deletions dvc/repo/open_repo.py
Original file line number Diff line number Diff line change
Expand Up @@ -50,7 +50,7 @@ def open_repo(url, *args, **kwargs):
if os.path.exists(url):
url = os.path.abspath(url)
try:
config = _get_remote_config(url)
config = _get_remote_config(url, *args, **kwargs)
config.update(kwargs.get("config") or {})
kwargs["config"] = config
return Repo(url, *args, **kwargs)
Expand Down Expand Up @@ -97,9 +97,22 @@ def clean_repos():
_remove(path)


def _get_remote_config(url):
def _get_remote_config(url, *args, **kwargs):
try:
repo = Repo(url)
# Get the config dict passed from the parent call, default to empty dict
user_config = kwargs.get("config", {})
# It seems some tests might be passing a 'config' key that is not a dict
if not isinstance(user_config, dict):
user_config = {}
Comment on lines +104 to +106
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking into this, some tests send kwargs = {'config': None, ...; this safeguard protects against this.

no_scm_flag = user_config.get("core", {}).get("no_scm", None)

if no_scm_flag is not None:
# Honour specific SCM treatment if requested in the call
repo = Repo(url, config={"core": {"no_scm": no_scm_flag}})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIR, Repo(config=...) should just work.

Suggested change
repo = Repo(url, config={"core": {"no_scm": no_scm_flag}})
repo = Repo(url, config=kwargs.get("config"))

I don't want to specialize core.no_scm in any way or handle it.

Copy link
Author

@rgoya rgoya Apr 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it doesn't feel ideal to handle core.no_scm itself. Your solution makes sense and it works for my specific use case, but it triggers other errors in the dvc test suite; which makes me think there are other non-core.no_scm configuration options that are being used that _get_remote_config() doesn't like.

(I went with the core.no_scm specific approach to highlight the need.)

These are the errors I get when using repo = Repo(url, config=kwargs.get("config")):

FAILED tests/func/test_import.py::test_import_no_hash[files1-expected_info_calls1] - dvc_data.index.index.DataIndexDirError: failed to load directory ('dir',)
FAILED tests/func/test_repo_index.py::test_data_index - dvc_data.index.index.DataIndexDirError: failed to load directory ('edir',)
FAILED tests/func/repro/test_repro_pull.py::test_repro_pulls_missing_import - dvc.exceptions.ReproductionError: failed to reproduce 'foo.dvc'
FAILED tests/func/test_data_cloud.py::test_pull_external_dvc_imports - FileNotFoundError: [Errno 2] No storage files available: 'foo'
FAILED tests/func/test_data_cloud.py::test_pull_external_dvc_imports_mixed - FileNotFoundError: [Errno 2] No storage files available: 'foo'
FAILED tests/func/repro/test_repro.py::test_repro_pulls_missing_import - dvc.exceptions.ReproductionError: failed to reproduce 'foo.dvc'
FAILED tests/func/test_import.py::test_import_dir - dvc_data.index.index.DataIndexDirError: failed to load directory ('dir',)
FAILED tests/func/test_import.py::test_import_file_from_dir - dvc_data.index.index.DataIndexDirError: failed to load directory ('dir',)
FAILED tests/func/test_import.py::test_import_file_from_dir_to_dir - dvc_data.index.index.DataIndexDirError: failed to load directory ('dir',)
FAILED tests/func/test_import.py::test_import_rev - FileNotFoundError: [Errno 2] No storage files available: 'foo'
FAILED tests/func/test_import.py::test_pull_imported_stage - dvc.exceptions.CheckoutError: Checkout failed for following targets:
FAILED tests/func/test_import.py::test_pull_import_no_download - FileNotFoundError: [Errno 2] No such file or directory: '/private/var/folders/_3/h8jc4f6d5gg6dwk7t6464_z80000gp/T/pytest-of-rodrigo.goya/pytest-17/popen-gw13/test_pull_import_no_download0/.dvc/cache/fs/local/e3501e821bcee8f40107794afbe767d1/.F0VDC8H_fGFnvnEcrt...
FAILED tests/func/test_import.py::test_pull_import_no_download_rev_lock - dvc.exceptions.DownloadError: 1 files failed to download
FAILED tests/func/test_import.py::test_pull_imported_directory_stage[dir] - dvc_data.index.index.DataIndexDirError: failed to load directory ('dir',)
FAILED tests/func/test_import.py::test_pull_imported_directory_stage[dir/] - dvc_data.index.index.DataIndexDirError: failed to load directory ('dir',)
FAILED tests/func/test_import.py::test_pull_wildcard_imported_directory_stage - dvc_data.index.index.DataIndexDirError: failed to load directory ('dir123',)
FAILED tests/func/test_update.py::test_update_import[True] - FileNotFoundError: [Errno 2] No storage files available: 'version'
FAILED tests/func/test_import.py::test_pull_non_workspace - FileNotFoundError: [Errno 2] No storage files available: 'foo'
FAILED tests/func/test_update.py::test_update_import_after_remote_updates_to_dvc - FileNotFoundError: [Errno 2] No storage files available: 'version'
FAILED tests/func/test_import.py::test_import_with_jobs - dvc_data.index.index.DataIndexDirError: failed to load directory ('dir1',)

Copy link
Author

@rgoya rgoya Apr 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, looking closer at one of the failing tests, tests/func/test_repo_index.py::test_data_index, which gives the error:

FAILED tests/func/test_repo_index.py::test_data_index - dvc_data.index.index.DataIndexDirError: failed to load directory ('edir',)

It seems that there is a competition between cache folders. (Maybe an artifact on how tests are designed?)

  • A call to dvc.repo.imp.py:imp() eventually triggers a call to _get_remote_config() (call stack at the bottom)

  • A url is provided, and a **kwargs that contains a config key, which provides information on the cache, including a cache directory.

    • url = /private/var/folders/_3/h8jc4f6d5gg6dwk7t6464_z80000gp/T/pytest-of-rgoya/pytest-27/erepo0
    • config = {'cache': {'protected': False, 'slow_link_warning': True, 'verify': False, 'dir': '/private/var/folders/_3/h8jc4f6d5gg6dwk7t6464_z80000gp/T/pytest-of-rgoya/pytest-27/test_data_index0/.dvc/cache'}}
  • Besides the remote's name, _get_remote_config() returns repo.cache.local_cache_dir.

  • The presence of config = {"cache": {"dir": "/path/to/cache"}} changes the return value of repo.cache.local_cache_dir:

    • with Repo(url) alone it is: /private/var/folders/_3/h8jc4f6d5gg6dwk7t6464_z80000gp/T/pytest-of-rgoya/pytest-27/erepo0/.dvc/cache, that is url + /.dvc/cache
    • with Repo(url, config=kwargs.get("config")) it is: '/private/var/folders/_3/h8jc4f6d5gg6dwk7t6464_z80000gp/T/pytest-of-rgoya/pytest-27/test_data_index0/.dvc/cache, that is config['cache']['dir'].
  • The test was built expecting to receive url + .dvc/cache, and is now receiving config['cache']['dir']

image

else:
# Default to the .dvc/config contents otherwise
repo = Repo(url)

except NotDvcRepoError:
return {}

Expand Down
Loading