
ephemeral clusters not getting deleted for jobs #162

Open
elmiko opened this issue Jan 4, 2018 · 1 comment
@elmiko (Contributor) commented Jan 4, 2018

While testing the build and job workflows, I've run into a situation where ephemeral clusters are not deleted even with the delete-cluster option set to true.

Steps to reproduce

  1. oc new-project test
  2. oc create -f https://radanalytics.io/resources.yaml
  3. oc create -f pysparkbuild.json
  4. oc create -f pysparkjob.json
  5. oc new-app --template oshinko-pyspark-build -p GIT_URI=https://github.com/radanalyticsio/s2i-integration-test-apps
  6. oc new-app --template oshinko-pyspark-job -p IMAGE=<Docker pull spec here>

Observed result

The cluster created for the job is never cleaned up, and the log output does not recognize the cluster as ephemeral.

Logs

18/01/04 16:07:04 INFO SparkContext: Invoking stop() from shutdown hook
18/01/04 16:07:04 INFO SparkUI: Stopped Spark web UI at http://172.17.0.2:4040
18/01/04 16:07:04 INFO StandaloneSchedulerBackend: Shutting down all executors
18/01/04 16:07:04 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asking each executor to shut down
18/01/04 16:07:04 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
18/01/04 16:07:04 INFO MemoryStore: MemoryStore cleared
18/01/04 16:07:04 INFO BlockManager: BlockManager stopped
18/01/04 16:07:04 INFO BlockManagerMaster: BlockManagerMaster stopped
18/01/04 16:07:04 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
18/01/04 16:07:04 INFO SparkContext: Successfully stopped SparkContext
18/01/04 16:07:04 INFO ShutdownHookManager: Shutdown hook called
18/01/04 16:07:04 INFO ShutdownHookManager: Deleting directory /tmp/spark-05af0c6c-9bb4-49a7-b62d-6a67b83fc749/pyspark-a5c3ba26-8a73-4e22-b314-602f07296267
18/01/04 16:07:04 INFO ShutdownHookManager: Deleting directory /tmp/spark-05af0c6c-9bb4-49a7-b62d-6a67b83fc749
Deleting cluster 'cluster-efdcc4'
cluster is not ephemeral
cluster not deleted 'cluster-efdcc4'

The cluster pods are never deleted:

$ oc get pods
NAME                       READY     STATUS      RESTARTS   AGE
cluster-efdcc4-m-1-l7xds   1/1       Running     0          12m
cluster-efdcc4-w-1-v7kmn   1/1       Running     0          12m
pyspark-m6va-cb82v         0/1       Completed   0          12m
pyspark-y8bl-1-build       0/1       Completed   0          30m

Expected result

All cluster pods should be deleted after the job completes.

Possible cause

I think the way the $ephemeral variable is calculated in this function in the common start script is probably causing the issue here; it likely needs to account for jobs differently than deployments.
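A rough sketch of the suspected logic (purely illustrative, not the actual common start script; the function and variable names are hypothetical, and the label value is passed in rather than fetched with `oc` so the snippet runs stand-alone):

```shell
#!/bin/sh
# Hypothetical sketch of the suspected $ephemeral calculation. In the real
# script the label value would come from something like:
#   oc get dc "$name" -o jsonpath='{.metadata.labels.ephemeral}'
# Here it is passed in so the function runs without a cluster.

is_ephemeral() {
    kind=$1     # kind of the owning resource: "deploymentconfig" or "job"
    label=$2    # value of the ephemeral label, empty if unset

    # Suspected bug: only deploymentconfigs are consulted, so a cluster
    # started by a job never appears ephemeral and is never cleaned up.
    if [ "$kind" = "deploymentconfig" ] && [ -n "$label" ]; then
        echo true
    else
        echo false
    fi
}

is_ephemeral deploymentconfig cluster-efdcc4   # prints: true
is_ephemeral job cluster-efdcc4                # prints: false
```

If this is what happens, the "cluster is not ephemeral" line in the logs above would be expected for any job-driven cluster, regardless of the delete-cluster option.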

elmiko added the bug label Jan 4, 2018
@tmckayus (Collaborator) commented Jan 4, 2018

This is a known limitation, since ephemeral-ness is tracked via labels on deploymentconfigs.

We need another solution for jobs.
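One possible direction (purely illustrative, and not a committed design): carry the same ephemeral label on jobs and have the cleanup path consult both kinds. The lookup here is stubbed so the sketch runs without a cluster; in practice it would be an `oc get ... -o jsonpath` call.

```shell
#!/bin/sh
# Illustrative sketch only: try the deploymentconfig first, then fall back
# to the job, so a cluster started by either kind can be seen as ephemeral.

lookup_label() {
    # Stub standing in for something like:
    #   oc get "$1" "$2" -o jsonpath='{.metadata.labels.ephemeral}'
    # Here we pretend only the job carries the label.
    if [ "$1" = "job" ]; then echo "cluster-efdcc4"; else echo ""; fi
}

ephemeral_cluster() {
    name=$1
    for kind in deploymentconfig job; do
        value=$(lookup_label "$kind" "$name")
        if [ -n "$value" ]; then
            echo "$value"   # cluster is ephemeral; print its name
            return 0
        fi
    done
    return 1                # no ephemeral label found on either kind
}

ephemeral_cluster pyspark-m6va   # prints: cluster-efdcc4
```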

tmckayus added the enhancement label and removed the bug label Apr 11, 2018