I made the mistake of manually deleting some Spark cluster pods that had been spun up by ODH via this operator. During its reconciliation pass, the Spark operator got a REST call failure when it tried to reach the corresponding Spark master service endpoint, and the entire operator crashed. After that, it also failed to respond to new cluster requests coming from ODH, as if it never reached the "new cluster" part of the reconciliation pass.
Anyway, we should make the individual operations in the reconciliation pass error tolerant: if one operation throws an exception, the operator should isolate it in a try/catch and move on to gracefully reconcile all the other Spark clusters it is managing.
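A minimal sketch of the per-cluster isolation being proposed, written generically rather than against the operator's real code. `reconcile_all`, `reconcile_one`, `ReconcileResult` and `fake_reconcile` are hypothetical names; `reconcile_one` stands in for whatever per-cluster work the operator does (creating pods, calling the Spark master REST endpoint, etc.).

```python
import logging
from dataclasses import dataclass, field

log = logging.getLogger("reconciler")


@dataclass
class ReconcileResult:
    # Hypothetical summary of one reconciliation pass.
    succeeded: list = field(default_factory=list)
    failed: dict = field(default_factory=dict)


def reconcile_all(cluster_names, reconcile_one):
    """Reconcile every cluster, isolating failures per cluster.

    cluster_names: iterable of cluster names (assumed input shape).
    reconcile_one: callable(name) doing the per-cluster work; it may raise.
    """
    result = ReconcileResult()
    for name in cluster_names:
        try:
            reconcile_one(name)
        except Exception as exc:
            # One broken cluster is logged and recorded, not re-raised,
            # so the loop still reaches the remaining (including new) clusters.
            log.warning("reconcile of %s failed: %s", name, exc)
            result.failed[name] = exc
        else:
            result.succeeded.append(name)
    return result


def fake_reconcile(name):
    # Simulates the reported case: a cluster whose pods were deleted by hand.
    if name == "deleted-by-hand":
        raise ConnectionError("spark master service unreachable")


res = reconcile_all(["a", "deleted-by-hand", "new-cluster"], fake_reconcile)
print(res.succeeded)  # ['a', 'new-cluster']
```

Failed clusters would simply be retried on the next pass, so a transient REST failure against one master no longer blocks new cluster requests.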