Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Git related ploomber placeholders not respected #114

Open
feribg opened this issue Sep 14, 2022 · 4 comments
Open

Git related ploomber placeholders not respected #114

feribg opened this issue Sep 14, 2022 · 4 comments

Comments

@feribg
Copy link
Contributor

feribg commented Sep 14, 2022

Since only the python package is copied in the Docker image, any placeholder that relies on .git folder trigger an error.

For example the following will fail at runtime when executed on AWS Batch

    name: products
    product:
        nb: 'products/{{now}}_{{git_hash}}/

* {{git_hash}}: Ensure git is installed and git repository exists

One suggestion would be to pass those as build variables to the Docker image and expose them as env vars even in the absence of a repo, since they tend to be quite commonly used to track experiments and partition products based on.

@edublancas
Copy link
Contributor

thanks for your feedback! I agree this is important.

I think we could have soopervisor extract the git_hash value (and other placeholder values) and put them in a special file (e.g. .ploomber_placeholders.json), then have ploomber look them up from there. this way {{git_hash}} will work the same when running in ploomber and in soopervisor. thoughts?

this would imply changes to soopervisor and ploomber. I cannot promise a date since we are a small team, so let me know if you'd be interested in helping with this. happy to guide you through the contribution process.

@feribg
Copy link
Contributor Author

feribg commented Sep 28, 2022

That is one possibility @edublancas but do you think we could override those with env vars instead. So if no env var called "git_hash" exists then do the parsing logic, if it exists use the env var, that way we can just change the Docker build process and have it injected at that time, also allowing people to have custom logic, I'm actually taking this path as a temp fix for what we're currently doing

@edublancas
Copy link
Contributor

Ok, I thought a bit more about this. I think there are two problems here.

The first one is regarding environment variables, I'd rather have an explicit expression for reading environment variables. For example:

# pipeline.yaml
name: products
product:
    nb: 'products/{{now}}_{{env["git_hash"]}}'

Then, calling env[something] would read from os.environ; this would be a Ploomber feature.

Now, regarding soopervisor. There's an option in soopervisor.yaml that allows you to include/exclude specific files or directories from the Docker file. Currently, the .git directory is hardcoded to be excluded, hence, {{git_hash}} breaks.

However, we could easily fix this (without introducing any new parameters with something like this:

# soopervisor.yaml
include: [.git]

This would explicitly tell soopervisor to copy the .git directory, hence, {{git_hash}} will work.

thoughts?

@feribg
Copy link
Contributor Author

feribg commented Oct 9, 2022

I agree with both of those.

Just to clarify on the second comment, the include currently doesn't work as the hardcoded exclusion seems to happen after it right? I think any after hardcoded exclusions both for the .git folder as well as things to respect .gitignore should happen before the explicit includes, so then those can be overriden like the way you suggested.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants