master-replica-0 Failed building wheel for cloudml #130

rtjohn opened this issue Feb 27, 2018 · 1 comment

rtjohn commented Feb 27, 2018

I've been following the tutorial from here:
https://tensorflow.rstudio.com/tools/cloudml/articles/getting_started.html

Submitting a job to Google Cloud that works on my local machine produces the following:

> cloudml_train("R/BuildingNetwork.R")
Submitting training job to CloudML...
Job 'cloudml_2018_02_27_220654435' successfully submitted.

View job in the Cloud Console at:
https://console.cloud.google.com/ml/jobs/cloudml_2018_02_27_220654435?project=dogvcat-196520

View logs at:
https://console.cloud.google.com/logs?resource=ml.googleapis.com%2Fjob_id%2Fcloudml_2018_02_27_220654435&project=dogvcat-196520

Check job status with:     job_status("cloudml_2018_02_27_220654435")

Collect job output with:   job_collect("cloudml_2018_02_27_220654435")

After collect, view with:  view_run("runs/cloudml_2018_02_27_220654435")
> job_status("cloudml_2018_02_27_220654435")
 $ createTime    : chr "2018-02-27T22:09:40Z"
 $ endTime       : chr "2018-02-27T22:17:11Z"
 $ errorMessage  : chr "The replica master 0 exited with a non-zero status of 1."
 $ jobId         : chr "cloudml_2018_02_27_220654435"
 $ startTime     : chr "2018-02-27T22:10:04Z"
 $ state         : chr "FAILED"
 $ trainingInput :List of 3
  ..$ jobDir        : chr "gs://dogvcat-196520/r-cloudml/staging"
  ..$ region        : chr "us-central1"
  ..$ runtimeVersion: chr "1.4"
 $ trainingOutput:List of 1
  ..$ consumedMLUnits: num 0.09

View job in the Cloud Console at:
https://console.cloud.google.com/ml/jobs/cloudml_2018_02_27_220654435?project=dogvcat-196520

View logs at:
https://console.cloud.google.com/logs?resource=ml.googleapis.com%2Fjob_id%2Fcloudml_2018_02_27_220654435&project=dogvcat-196520

The logs show a few errors. The first is:

2018-02-27 14:10:43.490 PST
master-replica-0 Failed building wheel for cloudml

The next error is:

master-replica-0 Command "/usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-0yV9J_-build/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-VBGpOM-record/install-record.txt --single-version-externally-managed --compile --user --prefix=" failed with error code 1 in /tmp/pip-0yV9J_-build/

Followed by:

master-replica-0  Command '['pip', 'install', '--user', '--upgrade', '--force-reinstall', '--no-deps', u'cloudml-1.0.0.0.tar.gz']' returned non-zero exit status 1

Followed by:

The replica master 0 exited with a non-zero status of 1.

I'm also getting file-copy errors throughout the logs, such as:

error: can't copy 'cloudml-model/datasmall': doesn't exist or not a regular file

I tried simply removing the directory named in the first error (it wasn't a necessary directory), but then the same error appeared for a different directory.

Any ideas?

@javierluraschi
Contributor

@rtjohn I would first try training the simple MNIST example by running:

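# Create a clean working directory for the test job
dir.create("mnist-train")
# Copy the MNIST training script bundled with the cloudml package
file.copy(system.file("examples/mnist/train.R", package = "cloudml"), "mnist-train")
# Submit the job from inside that directory
setwd("mnist-train")
cloudml::cloudml_train()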

Does the script above train correctly? If it does, I would next move the data you want to copy into "mnist-train" and rerun to confirm that data can also be copied, and then switch to the original training script.
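The second step above (adding the data to the working directory before resubmitting) could be sketched roughly as follows. This is only an illustration using base R: the paths under tempdir() and the placeholder file "sample.txt" are made up so the snippet runs on its own, and "datasmall" simply stands in for the directory named in the error message. It does not diagnose why the wheel build fails; it only shows one way to copy a data directory in and check what is there before submitting.

```r
# Hypothetical locations for illustration only; replace with your own paths.
stage_dir <- file.path(tempdir(), "mnist-train")
data_dir  <- file.path(tempdir(), "datasmall")

# Create a tiny stand-in data directory so this sketch is self-contained.
dir.create(data_dir, showWarnings = FALSE)
writeLines("placeholder", file.path(data_dir, "sample.txt"))

# A clean directory for the job (train.R would be copied here as well).
dir.create(stage_dir, showWarnings = FALSE)

# Copy the whole data directory into the job directory.
# recursive = TRUE is required when the source is a directory.
copied <- file.copy(data_dir, stage_dir, recursive = TRUE)

# List the files now present in the job directory, with relative paths.
staged <- list.files(stage_dir, recursive = TRUE)
print(staged)

# Not run here: submit from inside the directory once the listing looks right.
# setwd(stage_dir)
# cloudml::cloudml_train()
```

If the listing shows the data files under the expected subdirectory, the job can be resubmitted from that directory as in the MNIST test above.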
