
ephemeral clusters not getting deleted for jobs #162

Open
elmiko opened this issue Jan 4, 2018 · 1 comment
@elmiko (Contributor) commented Jan 4, 2018

While testing the build and job workflows, I've run into a situation where ephemeral clusters are not deleted even with the delete-cluster option set to true.

Steps to reproduce

  1. oc new-project test
  2. oc create -f https://radanalytics.io/resources.yaml
  3. oc create -f pysparkbuild.json
  4. oc create -f pysparkjob.json
  5. oc new-app --template oshinko-pyspark-build -p GIT_URI=https://github.com/radanalyticsio/s2i-integration-test-apps
  6. oc new-app --template oshinko-pyspark-job -p IMAGE=<Docker pull spec here>

Observed result

The cluster created for the job is never cleaned up, and the log output does not recognize the cluster as ephemeral.

Logs

18/01/04 16:07:04 INFO SparkContext: Invoking stop() from shutdown hook
18/01/04 16:07:04 INFO SparkUI: Stopped Spark web UI at http://172.17.0.2:4040
18/01/04 16:07:04 INFO StandaloneSchedulerBackend: Shutting down all executors
18/01/04 16:07:04 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asking each executor to shut down
18/01/04 16:07:04 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
18/01/04 16:07:04 INFO MemoryStore: MemoryStore cleared
18/01/04 16:07:04 INFO BlockManager: BlockManager stopped
18/01/04 16:07:04 INFO BlockManagerMaster: BlockManagerMaster stopped
18/01/04 16:07:04 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
18/01/04 16:07:04 INFO SparkContext: Successfully stopped SparkContext
18/01/04 16:07:04 INFO ShutdownHookManager: Shutdown hook called
18/01/04 16:07:04 INFO ShutdownHookManager: Deleting directory /tmp/spark-05af0c6c-9bb4-49a7-b62d-6a67b83fc749/pyspark-a5c3ba26-8a73-4e22-b314-602f07296267
18/01/04 16:07:04 INFO ShutdownHookManager: Deleting directory /tmp/spark-05af0c6c-9bb4-49a7-b62d-6a67b83fc749
Deleting cluster 'cluster-efdcc4'
cluster is not ephemeral
cluster not deleted 'cluster-efdcc4'

The cluster pods are never deleted:

$ oc get pods
NAME                       READY     STATUS      RESTARTS   AGE
cluster-efdcc4-m-1-l7xds   1/1       Running     0          12m
cluster-efdcc4-w-1-v7kmn   1/1       Running     0          12m
pyspark-m6va-cb82v         0/1       Completed   0          12m
pyspark-y8bl-1-build       0/1       Completed   0          30m

Expected result

All cluster pods should be deleted after the job completes.

Possible cause

I think the way the $ephemeral variable is calculated in this function in the common start script is probably causing the issue here; it likely needs to account for jobs differently than deployments.
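A rough sketch of the suspected logic (purely illustrative, not the actual common start script; the function and variable names are hypothetical, and the label value is passed in rather than fetched with `oc` so the snippet runs stand-alone):

```shell
#!/bin/sh
# Hypothetical sketch of the suspected $ephemeral calculation. In the real
# script the label value would come from something like:
#   oc get dc "$name" -o jsonpath='{.metadata.labels.ephemeral}'
# Here it is passed in so the function runs without a cluster.

is_ephemeral() {
    kind=$1     # kind of the owning resource: "deploymentconfig" or "job"
    label=$2    # value of the ephemeral label, empty if unset

    # Suspected bug: only deploymentconfigs are consulted, so a cluster
    # started by a job never appears ephemeral and is never cleaned up.
    if [ "$kind" = "deploymentconfig" ] && [ -n "$label" ]; then
        echo true
    else
        echo false
    fi
}

is_ephemeral deploymentconfig cluster-efdcc4   # prints: true
is_ephemeral job cluster-efdcc4                # prints: false
```

If this is what happens, the "cluster is not ephemeral" line in the logs above would be expected for any job-driven cluster, regardless of the delete-cluster option.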

elmiko added the bug label Jan 4, 2018
@tmckayus (Collaborator) commented Jan 4, 2018

This is a known limitation, since ephemeral-ness is tracked via labels on deploymentconfigs.

We need another solution for jobs.
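One possible direction (purely illustrative, and not a committed design): carry the same ephemeral label on jobs and have the cleanup path consult both kinds. The lookup here is stubbed so the sketch runs without a cluster; in practice it would be an `oc get ... -o jsonpath` call.

```shell
#!/bin/sh
# Illustrative sketch only: try the deploymentconfig first, then fall back
# to the job, so a cluster started by either kind can be seen as ephemeral.

lookup_label() {
    # Stub standing in for something like:
    #   oc get "$1" "$2" -o jsonpath='{.metadata.labels.ephemeral}'
    # Here we pretend only the job carries the label.
    if [ "$1" = "job" ]; then echo "cluster-efdcc4"; else echo ""; fi
}

ephemeral_cluster() {
    name=$1
    for kind in deploymentconfig job; do
        value=$(lookup_label "$kind" "$name")
        if [ -n "$value" ]; then
            echo "$value"   # cluster is ephemeral; print its name
            return 0
        fi
    done
    return 1                # no ephemeral label found on either kind
}

ephemeral_cluster pyspark-m6va   # prints: cluster-efdcc4
```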

tmckayus added the enhancement label and removed the bug label Apr 11, 2018