Terminal crashes on Windows but job completes. #145

Open
andrie opened this issue Apr 12, 2018 · 4 comments

@andrie
Member

andrie commented Apr 12, 2018

This may not be an R issue, but rather something on the CloudML end.

I received a crash report in the terminal, despite the job still running on CloudML.

This happens after submitting:

cloudml::cloudml_train(...)

Terminal output:

INFO    2018-04-11 17:03:18 +0100       master-replica-0                Copying gs://adv-cloudml-test-195616/r-cloudml/cache/ubuntu_16044_lts/r_3_4_4/r/hms.tar...
INFO    2018-04-11 17:03:18 +0100       master-replica-0                Copying gs://adv-cloudml-test-195616/r-cloudml/cache/ubuntu_16044_lts/r_3_4_4/r/cloudml.tar...
INFO    2018-04-11 17:03:18 +0100       master-replica-0                Copying gs://adv-cloudml-test-195616/r-cloudml/cache/ubuntu_16044_lts/r_3_4_4/r/digest.tar...
INFO    2018-04-11 17:03:18 +0100       master-replica-0                / [0/48 files][    0.0 B/ 61.0 MiB]   0% Done
IERROR: gcloud crashed (IOError): [Errno 0] Error

If you would like to report this issue, please run the following command:
  gcloud feedback

To check gcloud for common problems, please run the following command:
  gcloud info --run-diagnostics
>>> Job 'cloudml_2018_04_11_155929102' is currently running -- please wait...
>>> [state: RUNNING; last updated 2018-04-11 17:03:48]
Execution halted
Error in shell.exec(url) :
  'C:/Users/apdev/OneDrive/github/experiments/cloudml-deployment/runs/cloudml_2018_04_11_155929102/tfruns.d/view.html' not found
Calls: <Anonymous> -> shell.exec
Execution halted
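
The final error above comes from `shell.exec()` being pointed at a `view.html` that was never written. A caller-side guard along these lines would at least avoid that second failure. This is only a sketch in base R: `open_report` is a hypothetical helper, not something cloudml or tfruns provides, and it does not address the gcloud crash itself.

```r
# Open a run report only if the file actually exists.
# `opener` defaults to utils::browseURL; it is a parameter so the
# function can be exercised without launching a browser.
open_report <- function(path, opener = utils::browseURL) {
  if (!file.exists(path)) {
    warning("Report not found, skipping viewer: ", path, call. = FALSE)
    return(invisible(FALSE))
  }
  opener(path)
  invisible(TRUE)
}
```

Calling `open_report("runs/<run id>/tfruns.d/view.html")` would then emit a warning rather than halting execution when the report is missing.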
@andrie
Member Author

andrie commented Apr 24, 2018

This still happens. Another terminal dump, in case it helps:

INFO    2018-04-24 14:54:35 +0100       master-replica-0                / [5/48 files][900.0 KiB/ 61.0 MiB]   1% Done
INFO    2018-04-24 14:54:35 +0100       master-replica-0                Copying gs://adv-cloudml-test-195616/r-cloudml/cache/ubuntu_16044_lts/r_3_4_4/r/packrat.tar...
INFO    2018-04-24 14:54:35 +0100       master-replica-0                / [6/48 files][  3.0 MiB/ 61.0 MiB]   4% Done
IERROR: gcloud crashed (IOError): [Errno 0] Error

@javierluraschi
Contributor

Most likely, this is external and we would need a consistent repro to open an issue with Google CloudML. I've seen this a couple times, but I can't hit this consistently.
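
Until there is a consistent repro, one possible workaround is to separate job submission from result collection, so that a crash in the gcloud log stream does not decide whether results get downloaded. The following is a minimal sketch rather than a confirmed fix. `with_retries` is a hypothetical helper written for illustration, and the commented usage assumes that `cloudml_train()` accepts `collect = FALSE` and that `job_collect()` accepts the returned job object and `view = FALSE`.

```r
# Call `fn` up to `attempts` times, sleeping `wait` seconds between tries.
# Returns the first successful result, or stops with the last error message.
with_retries <- function(fn, attempts = 3, wait = 0) {
  stopifnot(attempts >= 1)
  last_error <- NULL
  for (i in seq_len(attempts)) {
    result <- tryCatch(fn(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    last_error <- result
    if (i < attempts) Sys.sleep(wait)
  }
  stop("all ", attempts, " attempts failed: ", conditionMessage(last_error))
}

# Not run: requires the cloudml package and a configured Google Cloud project.
# job <- cloudml::cloudml_train("mnist_mlp.R", collect = FALSE)
# with_retries(function() cloudml::job_collect(job, view = FALSE), wait = 30)
```

Because the job keeps running on CloudML after the local crash, retrying only the collection step avoids resubmitting the training job.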

@philipus

I got the same problem when running mnist_mlp.R (https://github.com/rstudio/keras/blob/master/vignettes/examples/mnist_mlp.R) with cloudml_train on Google Cloud Platform.

I think the download functionality does not work properly. I also do not get a local runs directory, which I would expect from running the mnist_mlp.R script. I suspect job_collect is the problem:

cloudml::job_collect('Project Name', destination = '../runs', view = 'save')

does not copy anything into the destination folder.

Any idea what we can do?

R commands:

library(cloudml)
cloudml_train("mnist_mlp.R", config = "config.yml")

config.yml:

trainingInput:
  scaleTier: BASIC
  runtimeVersion: "2.1"
  pythonVersion: "3.7"
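
To narrow down whether `job_collect` is silently copying nothing, or copying to an unexpected location, one could inspect the destination directory right after the call. A minimal sketch in base R follows; `report_collected` is a hypothetical helper and not part of cloudml.

```r
# List whatever ended up under `destination`, printing the resolved path
# so that a relative path such as "../runs" can be checked against getwd().
report_collected <- function(destination) {
  path <- normalizePath(destination, mustWork = FALSE)
  if (!dir.exists(path)) {
    message("Destination does not exist: ", path)
    return(character(0))
  }
  files <- list.files(path, recursive = TRUE)
  message(length(files), " file(s) found under ", path)
  files
}
```

Running `report_collected("../runs")` after the `job_collect` call shows the absolute path that the relative destination resolves to in the current session, which helps rule out a working-directory mismatch.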

@philipus

> Most likely, this is external and we would need a consistent repro to open an issue with Google CloudML. I've seen this a couple times, but I can't hit this consistently.

Have we made any progress here? I just saw that this issue has been open for a long time.
