Current Situation
If you want to use non-standard Python libraries in an Airflow job, you need to build a custom image, pip install those libraries, and then use your custom image in your cluster.
Preferred Situation
You can configure a requirements.txt, which will then be installed in the Airflow deployment.
Example
E.g. you want to use pandas==2.2.2 in a DAG; currently you would need to set up a CI/CD pipeline for building and deploying a custom Airflow image. The Dockerfile would look like:
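A minimal sketch of such a Dockerfile, assuming a Stackable Airflow base image (the registry path and tag are illustrative, not taken from the original snippet):

```Dockerfile
# Sketch: extend the Stackable Airflow image and bake in the extra dependency.
# Image name and tag are assumptions for illustration.
FROM oci.stackable.tech/sdp/airflow:2.9.2-stackable24.7.0

RUN pip install --no-cache-dir pandas==2.2.2
```

and a configMap, e.g. one shipping the DAG code (its exact contents are not preserved here; this sketch uses placeholder names):

```yaml
# Hypothetical ConfigMap carrying the DAG file; all names are placeholders.
apiVersion: v1
kind: ConfigMap
metadata:
  name: airflow-dags
data:
  pandas_dag.py: |
    # DAG code importing pandas goes here
```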
Although this is fairly easy to do, it implies maintenance and resources. I consider this a fairly common use case, and thus we should think about whether we could cover it in the operator, e.g. (no strong opinion on naming, nor on where it should live in the CRD and how):
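A rough sketch of what this could look like in the CRD; the pythonRequirements field is purely hypothetical and not part of the current AirflowCluster spec:

```yaml
# Hypothetical CRD field; name and placement are illustrative only.
apiVersion: airflow.stackable.tech/v1alpha1
kind: AirflowCluster
metadata:
  name: airflow
spec:
  pythonRequirements: |
    pandas==2.2.2
```

The operator would then be responsible for installing these requirements into the Airflow deployment.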
I think a solution on the operator level would remove the pain of constructing and maintaining a build pipeline for the cluster. It moves the maintenance effort into the Airflow operator, but that already needs attention anyway (Stackable versions, product versions).
However, I can't evaluate how much effort we would need to put in to achieve this and what kind of risks it would imply.
This approach has the major downside that it installs the DAG requirements in Airflow's own virtual environment, which may lead to conflicts and break the Airflow stacklet.
Stackable should make it as easy as possible to use isolated DAG envs and encourage the use of the Python*Operator variants as described here.
Venvs can also be built off-site and provisioned with PEX, venv-pack or conda-pack, as described here.
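For illustration, a minimal sketch of an isolated per-task environment using Airflow's PythonVirtualenvOperator (one of the Python*Operator variants mentioned above); the DAG id, schedule and the pandas pin are placeholders:

```python
# Sketch: pandas is installed into a throwaway virtualenv for this task only,
# so it never touches Airflow's own environment. Names and pins are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonVirtualenvOperator


def transform():
    # Runs inside the task's own virtualenv, not Airflow's environment.
    import pandas as pd

    df = pd.DataFrame({"a": [1, 2, 3]})
    print(df.describe())


with DAG(
    dag_id="pandas_in_venv",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
):
    PythonVirtualenvOperator(
        task_id="transform",
        python_callable=transform,
        requirements=["pandas==2.2.2"],
        system_site_packages=False,
    )
```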