
spark operator crashes if it fails to hit spark-cluster service #284

Open
erikerlandson opened this issue Feb 19, 2020 · 0 comments


@erikerlandson

I made the mistake of manually deleting some Spark cluster pods that had been spun up by ODH via this operator. During its reconciliation pass, the Spark operator got a REST call failure when it tried to hit the corresponding Spark master service endpoint, and the entire operator crashed. Once that happened, it also stopped responding to new cluster requests coming from ODH, as if it never made it to the "new cluster" part of the reconciliation pass.

Anyway, we should make the various operations during the reconciliation pass error tolerant: if the operator encounters an exception on one operation, it can isolate that in a try/catch and move on to gracefully reconcile all the other Spark clusters it is managing.
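
To make the suggestion concrete, here is a minimal sketch of what per-cluster isolation could look like. It is not taken from the operator's code: `ReconcileSketch`, `ClusterAction`, and `reconcileAll` are hypothetical names, and the lambda in `main` only simulates the failing REST call to the master service.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from the operator's code base.
class ReconcileSketch {

    /** Stand-in for the per-cluster work, e.g. the REST call to the Spark master service. */
    interface ClusterAction {
        void reconcile(String clusterName) throws Exception;
    }

    /**
     * Reconciles every cluster in turn. A failure on one cluster is caught and
     * recorded, so the loop still reaches the remaining clusters.
     * Returns the failures keyed by cluster name, in encounter order.
     */
    static Map<String, Exception> reconcileAll(List<String> clusterNames, ClusterAction action) {
        Map<String, Exception> failures = new LinkedHashMap<>();
        for (String name : clusterNames) {
            try {
                action.reconcile(name);
            } catch (Exception e) {
                // Isolate the failure: record it and move on to the next cluster.
                failures.put(name, e);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        List<String> reconciled = new ArrayList<>();
        Map<String, Exception> failures = reconcileAll(
            List.of("cluster-a", "cluster-b", "cluster-c"),
            name -> {
                // Simulate the master service of one cluster being unreachable.
                if (name.equals("cluster-b")) {
                    throw new IOException("master service unreachable");
                }
                reconciled.add(name);
            });
        System.out.println("reconciled: " + reconciled);
        System.out.println("failed: " + failures.keySet());
    }
}
```

Returning the failures instead of rethrowing would let the caller log them and retry on the next pass, while requests for new clusters are still processed in the same pass.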
