Extend Airflow operator by implementing KubernetesExecutor #2

Closed
adwk67 opened this issue Jan 28, 2022 · 3 comments · Fixed by #311 or stackabletech/docker-images#435
Labels: customer-request, release/23.11.0, release-note/action-required (denotes a PR that introduces potentially breaking changes that require user action), release-note (denotes a PR that will be considered when it comes time to generate release notes)

Comments

@adwk67
Member

adwk67 commented Jan 28, 2022

As a user I want to have the option of running my airflow DAGs with the KubernetesExecutor, so that I have greater control over resource configuration (some settings can be defined per job) and usage (each job runs in its own pod which is created on-demand).

Implementation

  • the Airflow controller must define a pod template according to the specification (for details, see here)
  • this template must be mounted (e.g. via a PVC) at the location defined by AIRFLOW__KUBERNETES__POD_TEMPLATE_FILE (see configuration)
  • if the Airflow resource specifies KubernetesExecutor, then the scheduler recognises this and the KubernetesExecutor requests a worker pod from the Kubernetes API according to the template definition
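
The steps above could be sketched roughly as follows. This is only an illustration, not the operator's implementation: the image name, the mount path and the helper functions `build_pod_template` and `scheduler_env` are hypothetical. It relies on two Airflow conventions: the worker container in the pod template must be named `base`, and the executor is selected via `AIRFLOW__CORE__EXECUTOR`. The template is rendered as JSON, which YAML parsers also accept.

```python
import json

# Hypothetical values for illustration only; they are not taken from the operator.
TEMPLATE_PATH = "/templates/airflow_executor_pod_template.yaml"
IMAGE = "example.org/airflow:latest"


def build_pod_template(image: str) -> dict:
    """Return a minimal pod template for KubernetesExecutor worker pods."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        # Airflow replaces the pod name with a generated one per task.
        "metadata": {"name": "placeholder-name"},
        "spec": {
            # Each task runs once in its own pod and is not restarted.
            "restartPolicy": "Never",
            "containers": [
                # Airflow expects the worker container to be called "base".
                {"name": "base", "image": image},
            ],
        },
    }


def scheduler_env(template_path: str) -> dict:
    """Environment variables the scheduler would need (assumed wiring)."""
    return {
        "AIRFLOW__CORE__EXECUTOR": "KubernetesExecutor",
        "AIRFLOW__KUBERNETES__POD_TEMPLATE_FILE": template_path,
    }


template = build_pod_template(IMAGE)
rendered = json.dumps(template, indent=2)
env = scheduler_env(TEMPLATE_PATH)

print(rendered)
for key, value in env.items():
    print(f"{key}={value}")
```

In a real deployment the rendered template would be written to the mounted volume at the path the environment variable points to; how that volume is provisioned is left open here.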

Background/Context

Currently the airflow-operator implements the CeleryExecutor (the Local- and SequentialExecutors are also supported but are not scalable), whereby webserver and scheduler pods interact with multiple Celery worker pods: Celery reads job data from the external database and queues jobs via an external Redis instance. Other executors are available:

  • KubernetesExecutor
    • each job is spun up in its own pod, which is then destroyed
    • no queue component is needed
    • accessing logs is more complicated
    • not sure if complex jobs can be distributed over multiple workers (as is the case with Celery)

The full list is here: https://airflow.apache.org/docs/apache-airflow/stable/executor/index.html#executor-types

See also #313

@adwk67 adwk67 self-assigned this Jan 28, 2022
@adwk67 adwk67 mentioned this issue Jan 28, 2022
@adwk67
Member Author

adwk67 commented Feb 10, 2022

In the context of this issue it would also make sense to look at taking over management of the product config, rather than overriding the standard one with environment variables.

@adwk67 adwk67 removed their assignment May 13, 2022
@adwk67 adwk67 changed the title from "Extend Airflow operator by adding Celery-Flower frontend service and CeleryKubernetes executor" to "Extend Airflow operator by implementing KubernetesExecutor and CeleryKubernetesExecutor" May 16, 2022
maltesander added a commit that referenced this issue Nov 14, 2022
@adwk67 adwk67 moved this to Development: In Progress in Stackable Engineering Jul 27, 2023
@adwk67 adwk67 linked a pull request Aug 1, 2023 that will close this issue
@lfrancke lfrancke moved this from Next to In Progress in Stackable End-to-End Coordination Aug 2, 2023
@adwk67 adwk67 changed the title from "Extend Airflow operator by implementing KubernetesExecutor and CeleryKubernetesExecutor" to "Extend Airflow operator by implementing KubernetesExecutor" Aug 7, 2023
@adwk67 adwk67 moved this from Development: In Progress to Development: Waiting for Review in Stackable Engineering Aug 14, 2023
@sbernauer sbernauer moved this from Development: Waiting for Review to Development: In Review in Stackable Engineering Aug 15, 2023
@adwk67 adwk67 reopened this Aug 17, 2023
@adwk67 adwk67 moved this from Development: In Review to Development: Done in Stackable Engineering Aug 23, 2023
@adwk67 adwk67 moved this from In Progress to Done in Stackable End-to-End Coordination Aug 29, 2023
@lfrancke lfrancke moved this from Development: Done to Acceptance: In Progress in Stackable Engineering Aug 29, 2023
@lfrancke lfrancke added release/23.11.0 release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed scheduled-for/2023-11 labels Aug 29, 2023
@lfrancke
Member

Was the CRD changed for this?
Any breaking changes?

@adwk67
Member Author

adwk67 commented Aug 29, 2023

Yes, breaking CRD changes were made and approved in the arch meeting.

@lfrancke lfrancke added release-note/action-required Denotes a PR that introduces potentially breaking changes that require user action. changelog/crd-change labels Aug 29, 2023
@lfrancke lfrancke moved this from Acceptance: In Progress to Done in Stackable Engineering Aug 29, 2023
Projects
Archived in project
2 participants