
Forward args to _get_remote_config() and honour core/no_scm if present #10719


Open
wants to merge 1 commit into base: main


@rgoya commented Apr 10, 2025

This is a proposed fix for #10608: with these changes, steps 9 and 10 described in the issue work.

Summary:

This change allows a user to access DVC-tracked information in an environment that is disconnected from the original Git backend (e.g. a deployed container, see #10608), with a call such as:

import dvc.api

url = dvc.api.get_url(path,
                      repo,
                      config={"core": {"no_scm": True}})

Description:

A call to dvc/repo/open_repo.py:open_repo(url, *args, **kwargs) may include a config parameter in **kwargs. Through this config, a user can indicate that the repo should be accessed without Git support by passing config={"core": {"no_scm": True}}.

During the execution of dvc/repo/open_repo.py:open_repo(), a call to dvc/repo/open_repo.py:_get_remote_config() returns the remote configuration ({"core": {"remote": ...}}). This is then merged into the user-provided config parameter before calling Repo(url, *args, **kwargs).
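The merge step described above can be sketched with plain dictionaries. This is a minimal sketch under stated assumptions, not DVC's implementation: `open_repo_sketch`, `fake_remote_config`, and `fake_repo` are hypothetical stand-ins, and the merge is assumed to be a shallow `dict.update`. What it illustrates is that the remote-config lookup only ever sees `url`, so nothing the caller passes in `config` can influence it.

```python
from typing import Any, Callable


def open_repo_sketch(
    url: str,
    get_remote_config: Callable[[str], dict],
    repo_factory: Callable[..., Any],
    **kwargs: Any,
) -> Any:
    """Hypothetical outline of the open_repo() flow described above."""
    # 1. The remote-config lookup receives only the URL, not kwargs.
    config = get_remote_config(url)
    # 2. The user-provided config is merged on top (assumed shallow update).
    config.update(kwargs.get("config") or {})
    kwargs["config"] = config
    # 3. The repo is opened with the merged config.
    return repo_factory(url, **kwargs)


def fake_remote_config(url: str) -> dict:
    # Hypothetical: pretend .dvc/config names a default remote.
    return {"core": {"remote": "storage"}}


def fake_repo(url: str, **kwargs: Any) -> dict:
    # Hypothetical: echo back what a Repo would be constructed with.
    return {"url": url, **kwargs}


opened = open_repo_sketch(
    "/app/repo",
    fake_remote_config,
    fake_repo,
    config={"cache": {"dir": "/tmp/cache"}},
)
print(opened["config"])
```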

dvc/repo/open_repo.py:_get_remote_config(), in turn, makes a quick Repo() call to read the remote configuration. However, it does not receive any of the parameters passed to dvc/repo/open_repo.py:open_repo(), and therefore relies entirely on the contents of .dvc/config. This means that even if the user requested no SCM support, it will look for a Git repo if .dvc/config says so, and fail if it does not find one.

This PR modifies dvc/repo/open_repo.py:_get_remote_config() to accept *args and **kwargs, and to honour the request to use or ignore Git support when accessing the DVC repo.
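To make the failure mode and the fix concrete, here is a hedged sketch using a hypothetical `StubRepo` that behaves like a repo inside a container with no `.git` directory and a `.dvc/config` that does not disable SCM. `StubRepo`, `NotAGitRepoError`, and the `storage` remote name are assumptions for illustration; the two functions mirror the old and proposed shapes of `_get_remote_config()`, not its actual body.

```python
from typing import Any, Optional


class NotAGitRepoError(Exception):
    """Hypothetical error raised when Git support is required but missing."""


class StubRepo:
    """Hypothetical stand-in for dvc.repo.Repo in a container with no .git.

    Assumes .dvc/config does not set core.no_scm, so SCM is required
    unless the caller's config explicitly disables it.
    """

    def __init__(self, url: str, config: Optional[dict] = None) -> None:
        core = (config or {}).get("core", {})
        if not core.get("no_scm", False):
            raise NotAGitRepoError(f"{url}: no Git repository found")
        self.url = url
        self.remote = "storage"  # hypothetical remote name from .dvc/config


def get_remote_config_old(url: str) -> dict:
    # The caller's kwargs never reach this point, so no_scm cannot be honoured.
    repo = StubRepo(url)
    return {"core": {"remote": repo.remote}}


def get_remote_config_new(url: str, *args: Any, **kwargs: Any) -> dict:
    # *args is accepted for signature parity with open_repo(); unused here.
    user_config = kwargs.get("config")
    if not isinstance(user_config, dict):
        user_config = {}
    no_scm = user_config.get("core", {}).get("no_scm")
    if no_scm is not None:
        # Honour an explicit SCM request from the caller.
        repo = StubRepo(url, config={"core": {"no_scm": no_scm}})
    else:
        repo = StubRepo(url)
    return {"core": {"remote": repo.remote}}


try:
    get_remote_config_old("/app/repo")
except NotAGitRepoError as exc:
    print("old:", exc)
print("new:", get_remote_config_new("/app/repo", config={"core": {"no_scm": True}}))
```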

Thank you for the contribution - we'll try to review it as soon as possible. 🙏


codecov bot commented Apr 10, 2025

Codecov Report

Attention: Patch coverage is 77.77778% with 2 lines in your changes missing coverage. Please review.

Project coverage is 91.07%. Comparing base (2431ec6) to head (d03f25e).
Report is 36 commits behind head on main.

File with missing lines    Patch %    Lines
dvc/repo/open_repo.py      77.77%     1 missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #10719      +/-   ##
==========================================
+ Coverage   90.68%   91.07%   +0.38%     
==========================================
  Files         504      504              
  Lines       39795    39953     +158     
  Branches     3141     3159      +18     
==========================================
+ Hits        36087    36386     +299     
+ Misses       3042     2939     -103     
+ Partials      666      628      -38     


Comment on lines +104 to +106
# It seems some tests might be passing a 'config' key that is not a dict
if not isinstance(user_config, dict):
    user_config = {}
Author


Looking into this, some tests pass kwargs = {'config': None, ...}; this safeguard protects against that case.
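The reason for the guard can be shown in isolation: calling `.get()` on `None` raises `AttributeError`, so a caller passing `config=None` would otherwise crash the lookup. The helper below, `get_no_scm_flag`, is hypothetical and only sketches how the flag might be read from `open_repo()`-style kwargs.

```python
from typing import Any, Optional


def get_no_scm_flag(kwargs: dict) -> Optional[bool]:
    """Hypothetical helper: read core.no_scm from open_repo()-style kwargs."""
    user_config: Any = kwargs.get("config")
    # Tests may pass config=None (or omit it); None.get() would raise
    # AttributeError, so treat anything that is not a dict as empty.
    if not isinstance(user_config, dict):
        user_config = {}
    return user_config.get("core", {}).get("no_scm")


print(get_no_scm_flag({"config": None}))
print(get_no_scm_flag({"config": {"core": {"no_scm": True}}}))
```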


if no_scm_flag is not None:
    # Honour specific SCM treatment if requested in the call
    repo = Repo(url, config={"core": {"no_scm": no_scm_flag}})
Member


AFAIR, Repo(config=...) should just work.

Suggested change
- repo = Repo(url, config={"core": {"no_scm": no_scm_flag}})
+ repo = Repo(url, config=kwargs.get("config"))

I don't want to specialize core.no_scm in any way or handle it.
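The difference between the two approaches is which keys reach the temporary `Repo()` call. Here is a hedged sketch with two hypothetical helpers: one forwards only `core.no_scm` (as in the PR), the other forwards the caller's config unchanged (as suggested in review). The sample config, including its `cache.dir`, is made up for illustration.

```python
from typing import Optional


def forwarded_config_pr(user_config: Optional[dict]) -> dict:
    """Hypothetical: forward only core.no_scm, as the PR does."""
    cfg = user_config if isinstance(user_config, dict) else {}
    no_scm = cfg.get("core", {}).get("no_scm")
    return {} if no_scm is None else {"core": {"no_scm": no_scm}}


def forwarded_config_review(user_config: Optional[dict]) -> dict:
    """Hypothetical: forward the caller's config unchanged, as suggested."""
    return user_config or {}


caller = {"core": {"no_scm": True}, "cache": {"dir": "/tmp/other/.dvc/cache"}}
print(forwarded_config_pr(caller))      # only the SCM setting survives
print(forwarded_config_review(caller))  # every key, including cache.dir
```

Forwarding the whole dict avoids special-casing `core.no_scm`, but it also carries along settings such as `cache.dir` that the original `_get_remote_config()` never saw.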

Author

@rgoya Apr 10, 2025


I agree that handling core.no_scm explicitly isn't ideal. Your suggestion makes sense and works for my specific use case, but it causes other failures in the DVC test suite, which makes me think some tests pass configuration options other than core.no_scm that _get_remote_config() does not handle well.

(I went with the core.no_scm specific approach to highlight the need.)

These are the errors I get when using repo = Repo(url, config=kwargs.get("config")):

FAILED tests/func/test_import.py::test_import_no_hash[files1-expected_info_calls1] - dvc_data.index.index.DataIndexDirError: failed to load directory ('dir',)
FAILED tests/func/test_repo_index.py::test_data_index - dvc_data.index.index.DataIndexDirError: failed to load directory ('edir',)
FAILED tests/func/repro/test_repro_pull.py::test_repro_pulls_missing_import - dvc.exceptions.ReproductionError: failed to reproduce 'foo.dvc'
FAILED tests/func/test_data_cloud.py::test_pull_external_dvc_imports - FileNotFoundError: [Errno 2] No storage files available: 'foo'
FAILED tests/func/test_data_cloud.py::test_pull_external_dvc_imports_mixed - FileNotFoundError: [Errno 2] No storage files available: 'foo'
FAILED tests/func/repro/test_repro.py::test_repro_pulls_missing_import - dvc.exceptions.ReproductionError: failed to reproduce 'foo.dvc'
FAILED tests/func/test_import.py::test_import_dir - dvc_data.index.index.DataIndexDirError: failed to load directory ('dir',)
FAILED tests/func/test_import.py::test_import_file_from_dir - dvc_data.index.index.DataIndexDirError: failed to load directory ('dir',)
FAILED tests/func/test_import.py::test_import_file_from_dir_to_dir - dvc_data.index.index.DataIndexDirError: failed to load directory ('dir',)
FAILED tests/func/test_import.py::test_import_rev - FileNotFoundError: [Errno 2] No storage files available: 'foo'
FAILED tests/func/test_import.py::test_pull_imported_stage - dvc.exceptions.CheckoutError: Checkout failed for following targets:
FAILED tests/func/test_import.py::test_pull_import_no_download - FileNotFoundError: [Errno 2] No such file or directory: '/private/var/folders/_3/h8jc4f6d5gg6dwk7t6464_z80000gp/T/pytest-of-rodrigo.goya/pytest-17/popen-gw13/test_pull_import_no_download0/.dvc/cache/fs/local/e3501e821bcee8f40107794afbe767d1/.F0VDC8H_fGFnvnEcrt...
FAILED tests/func/test_import.py::test_pull_import_no_download_rev_lock - dvc.exceptions.DownloadError: 1 files failed to download
FAILED tests/func/test_import.py::test_pull_imported_directory_stage[dir] - dvc_data.index.index.DataIndexDirError: failed to load directory ('dir',)
FAILED tests/func/test_import.py::test_pull_imported_directory_stage[dir/] - dvc_data.index.index.DataIndexDirError: failed to load directory ('dir',)
FAILED tests/func/test_import.py::test_pull_wildcard_imported_directory_stage - dvc_data.index.index.DataIndexDirError: failed to load directory ('dir123',)
FAILED tests/func/test_update.py::test_update_import[True] - FileNotFoundError: [Errno 2] No storage files available: 'version'
FAILED tests/func/test_import.py::test_pull_non_workspace - FileNotFoundError: [Errno 2] No storage files available: 'foo'
FAILED tests/func/test_update.py::test_update_import_after_remote_updates_to_dvc - FileNotFoundError: [Errno 2] No storage files available: 'version'
FAILED tests/func/test_import.py::test_import_with_jobs - dvc_data.index.index.DataIndexDirError: failed to load directory ('dir1',)

Author

@rgoya Apr 10, 2025


For example, take a closer look at one of the failing tests, tests/func/test_repo_index.py::test_data_index, which fails with:

FAILED tests/func/test_repo_index.py::test_data_index - dvc_data.index.index.DataIndexDirError: failed to load directory ('edir',)

It seems that two cache directories are competing. (Maybe an artifact of how the tests are designed?)

  • A call to dvc/repo/imp.py:imp() eventually triggers a call to _get_remote_config() (call stack at the bottom).

  • That call receives a url and **kwargs containing a config key with cache settings, including a cache directory:

    • url = /private/var/folders/_3/h8jc4f6d5gg6dwk7t6464_z80000gp/T/pytest-of-rgoya/pytest-27/erepo0
    • config = {'cache': {'protected': False, 'slow_link_warning': True, 'verify': False, 'dir': '/private/var/folders/_3/h8jc4f6d5gg6dwk7t6464_z80000gp/T/pytest-of-rgoya/pytest-27/test_data_index0/.dvc/cache'}}
  • Besides the remote's name, _get_remote_config() returns repo.cache.local_cache_dir.

  • The presence of config = {"cache": {"dir": "/path/to/cache"}} changes the return value of repo.cache.local_cache_dir:

    • with Repo(url) alone it is: /private/var/folders/_3/h8jc4f6d5gg6dwk7t6464_z80000gp/T/pytest-of-rgoya/pytest-27/erepo0/.dvc/cache, that is url + /.dvc/cache
    • with Repo(url, config=kwargs.get("config")) it is: /private/var/folders/_3/h8jc4f6d5gg6dwk7t6464_z80000gp/T/pytest-of-rgoya/pytest-27/test_data_index0/.dvc/cache, that is config['cache']['dir'].
  • The test was built expecting to receive url + /.dvc/cache, and is now receiving config['cache']['dir'].

[Image: call stack]
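The cache-directory divergence described above can be modelled in a few lines. This is a simplified, hypothetical model, not DVC's code: `local_cache_dir` approximates how an explicit `cache.dir` overrides the default `<url>/.dvc/cache`, and `remote_config_for` assumes that the returned remote's URL is that cache directory. The remote name `upstream` and the paths are placeholders.

```python
import os
from typing import Optional


def local_cache_dir(url: str, config: Optional[dict] = None) -> str:
    """Hypothetical model of how repo.cache.local_cache_dir is resolved."""
    cache_dir = ((config or {}).get("cache") or {}).get("dir")
    if cache_dir:
        # An explicit cache.dir in the passed config wins over the default.
        return cache_dir
    # Otherwise fall back to the repo's own <url>/.dvc/cache.
    return os.path.join(url, ".dvc", "cache")


def remote_config_for(url: str, config: Optional[dict] = None) -> dict:
    """Hypothetical: build a remote pointing at the resolved cache dir."""
    name = "upstream"  # placeholder remote name
    return {
        "core": {"remote": name},
        "remote": {name: {"url": local_cache_dir(url, config)}},
    }


erepo = "/tmp/erepo0"
caller_config = {"cache": {"dir": "/tmp/test_data_index0/.dvc/cache"}}

# Like Repo(url) alone: the remote points at erepo0's own cache.
print(remote_config_for(erepo)["remote"]["upstream"]["url"])
# Like Repo(url, config=...): the caller's cache.dir takes over.
print(remote_config_for(erepo, caller_config)["remote"]["upstream"]["url"])
```

In this model, forwarding the caller's config makes the remote point at the calling test's own cache rather than the source repo's, which would be consistent with the "No storage files available" and "failed to load directory" errors listed earlier.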

@rgoya requested a review from skshetry April 10, 2025 07:51

2 participants