bq_auth() fails in GCP AI Platform Notebooks #340

Closed
ZainRizvi opened this issue Jul 23, 2019 · 11 comments

@ZainRizvi

(apologies if this is the incorrect place to file this bug)

When authenticating using bq_auth() / gargle::token_fetch(), the Google redirect URL expects to redirect the browser to a new port on localhost. However, when using a managed service such as GCP AI Platform Notebooks, the localhost redirect means you end up at the wrong location.

Expected behavior: bq_auth() should have an option to authenticate with Google using just the command line.

I suspect there's a bug in gargle's authentication code that prevents it from detecting when it should use the offline auth method. For example, if you run 'gcloud auth login' on your local dev box, the authentication url generated will have redirect_uri set to a port on localhost. But the same command on GCP AI Platform Notebooks will instead set redirect_uri to urn:ietf:wg:oauth:2.0:oob. The auth url for the latter option will result in you getting a string to copy/paste into your command line to complete the authentication process without having to access a second port.
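To make the difference concrete, here is a minimal base-R sketch that pulls the `redirect_uri` parameter out of an auth URL and reports whether it points at a localhost listener (the flow that breaks on a remote VM) or at the out-of-band value `urn:ietf:wg:oauth:2.0:oob`. The helper name `redirect_kind()` and the shortened example URLs are hypothetical; this is not how gargle itself decides which flow to use.

```r
# Hypothetical helper: classify the redirect_uri of a Google OAuth URL.
# Base R only; it never contacts Google.
redirect_kind <- function(auth_url) {
  # Everything after the first "?" is the query string
  query <- sub("^[^?]*\\?", "", auth_url)
  # Split into "key=value" pairs, then split each pair on "="
  pairs <- strsplit(strsplit(query, "&", fixed = TRUE)[[1]], "=", fixed = TRUE)
  keys <- vapply(pairs, function(p) if (length(p) > 0) p[1] else "", character(1))
  vals <- vapply(pairs, function(p) if (length(p) > 1) p[2] else "", character(1))
  idx <- which(keys == "redirect_uri")
  if (length(idx) == 0) return(NA_character_)
  # Undo the percent-encoding, e.g. "http%3A%2F%2F" -> "http://"
  redirect <- utils::URLdecode(vals[idx[1]])
  if (redirect == "urn:ietf:wg:oauth:2.0:oob") return("oob")
  if (grepl("^http://(localhost|127\\.0\\.0\\.1)(:[0-9]+)?(/|$)", redirect)) {
    return("loopback")
  }
  "other"
}

# Shortened versions of the two URLs from this thread (client_id replaced)
loopback_url <- paste0(
  "https://accounts.google.com/o/oauth2/auth?client_id=EXAMPLE",
  "&redirect_uri=http%3A%2F%2Flocalhost%3A1410%2F&response_type=code"
)
oob_url <- paste0(
  "https://accounts.google.com/o/oauth2/auth?client_id=EXAMPLE",
  "&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&response_type=code"
)
redirect_kind(loopback_url)  # "loopback"
redirect_kind(oob_url)       # "oob"
```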

Repro steps:

  1. In the GCP console go to AI Platform -> Notebooks -> New Instance -> R 3.x.x -> Create
  2. Wait for the notebook to be created and then click "OPEN JUPYTERLAB"
  3. Run the following commands to install and load bigrquery:
install.packages("httpuv")
install.packages("gargle")
install.packages("bigrquery")
library(httpuv)
library(gargle)
library(bigrquery)
  4. Try to authenticate yourself:
    bq_auth(email="[email protected]")

  5. The authentication hangs forever. Even if you copy/paste the resulting URL, the redirect will fail, since the 'localhost' that bigrquery is listening on is a VM running in GCP, not your local box.

Warning message in file(txt):
“'raw = FALSE' but '/home/jupyter/.config/gcloud' is not a regular file”
Warning message in open.connection(con, "rb"):
“cannot open file '/home/jupyter/.config/gcloud': it is a directory”
Waiting for authentication in browser...
Press Esc/Ctrl + C to abort
Please point your browser to the following url: 
https://accounts.google.com/o/oauth2/auth?client_id=603366585132-0l3n5tr582q443rnomebdeeo0156b2bc.apps.googleusercontent.com&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fbigquery%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fuserinfo.email&redirect_uri=http%3A%2F%2Flocalhost%3A1410%2F&response_type=code&state=pzQAF2AlxL&login_hint=zainr%40google.com
  6. The above steps will fail even if you try to authenticate from within JupyterLab's terminal.

I suspect that this is the login flow you want to follow: https://developers.google.com/identity/protocols/OAuth2ForDevices

Though it's possible that I simply haven't understood how the library is supposed to be used in this scenario.

Thanks for all your hard work!

@ZainRizvi ZainRizvi changed the title from "Can't bq_auth fails in GCP AI Platform Notebooks" to "bq_auth() fails in GCP AI Platform Notebooks" Jul 23, 2019
@jennybc
Collaborator

jennybc commented Jul 23, 2019

What happens if you use out-of-band auth?

For example, what if you do bq_auth(use_oob = TRUE)?

@jennybc
Collaborator

jennybc commented Jul 23, 2019

@craigcitro In the context @ZainRizvi describes, which credential fetching function should be succeeding and taking care of this? Should we really be falling all the way through to OAuth2?

@ZainRizvi
Author

If I use bq_auth(use_oob = TRUE) in the Jupyter Notebook I get the output:

Warning message in file(txt):
“'raw = FALSE' but '/home/jupyter/.config/gcloud' is not a regular file”
Warning message in open.connection(con, "rb"):
“cannot open file '/home/jupyter/.config/gcloud': it is a directory”
Error: Can't get Google credentials.
Are you running bigrquery in a non-interactive session? Consider:
  * Call `bq_auth()` directly with all necessary specifics.

Traceback:

1. bq_auth(use_oob = TRUE)
2. stop("Can't get Google credentials.\n", "Are you running bigrquery in a non-interactive session? Consider:\n", 
 .     "  * Call `bq_auth()` directly with all necessary specifics.\n", 
 .     call. = FALSE)

If I instead open up R in a terminal (still within JupyterLab) and run that command I get:

> library(httpuv)
> library(gargle)
> library(bigrquery)
> bq_auth(use_oob = TRUE)

1: Yes
2: No

Selection: 

Note that it's asking me to choose between Yes and No but the question that I'm responding yes/no to is not displayed.

I selected "yes" and got the following error:

> library(httpuv)
> library(gargle)
> library(bigrquery)
> bq_auth(use_oob = TRUE)

1: Yes
2: No

Selection: 1
Enter authorization code: /usr/bin/xdg-open: 778: /usr/bin/xdg-open: www-browser: not found
/usr/bin/xdg-open: 778: /usr/bin/xdg-open: links2: not found
/usr/bin/xdg-open: 778: /usr/bin/xdg-open: elinks: not found
/usr/bin/xdg-open: 778: /usr/bin/xdg-open: links: not found
/usr/bin/xdg-open: 778: /usr/bin/xdg-open: lynx: not found
/usr/bin/xdg-open: 778: /usr/bin/xdg-open: w3m: not found
xdg-open: no method available for opening 'https://accounts.google.com/o/oauth2/auth?client_id=603366585132-0l3n5tr582q443rnomebdeeo0156b2bc.apps.googleusercontent.com&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fbigquery%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fuserinfo.email&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&response_type=code'

Copy/pasting that link into a web browser did take me through the appropriate auth workflow and gave me a code to copy/paste into the bigrquery auth process. I pasted it below the error message even though the console hadn't appeared to be asking for an input. Pasting that code and hitting "enter" caused the following error message to appear:

xdg-open: no method available for opening 'https://accounts.google.com/o/oauth2/auth?client_id=603366585132-0l3n5tr582q443rnomebdeeo0156b2bc.apps.googleusercontent.com&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fbigquery%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fuserinfo.email&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&response_type=code'
4/jwG-xxxxxxxxxx-PdExxxxxxxxxxxxxxxxxxxxxxxxek_1xxxU <==== CODE I PASTED IN
Warning messages:
1: In file(txt) :
  'raw = FALSE' but '/home/jupyter/.config/gcloud' is not a regular file
2: In open.connection(con, "rb") :
  cannot open file '/home/jupyter/.config/gcloud': it is a directory
>

@ZainRizvi
Author

Interestingly, some level of auth seems to have succeeded with those previous steps, since now when I run bq_auth(use_oob = TRUE) in the R terminal I get different output. It seems to have learned my email address, at least.

> bq_auth(use_oob = TRUE)
The bigrquery package is requesting access to your Google account. Select a pre-authorised account or enter '0' to obtain a new token. Press Esc/Ctrl + C to abort.

1: [email protected]

Selection: 1
Warning messages:
1: In file(txt) :
  'raw = FALSE' but '/home/jupyter/.config/gcloud' is not a regular file
2: In open.connection(con, "rb") :
  cannot open file '/home/jupyter/.config/gcloud': it is a directory
>

However, if I go back to the Jupyter notebook and try bq_auth(use_oob = TRUE) from there, it still fails with the same error as before. Running bq_auth(email="[email protected]") in the Jupyter notebook also still fails with the same error mentioned in my original comment.

@jennybc
Collaborator

jennybc commented Jul 23, 2019

The Yes/No menu that is printing without its header and the warnings that are leaking through 🙈 have both already been fixed upstream in gargle. So you will need to install dev gargle. I'll do another gargle CRAN release soon that includes those fixes. I'm just working through my release prep for another package (googledrive) first, in case I need to include any other tweaks.

So, please "power cycle" via:

  1. Install dev gargle from GitHub: devtools::install_github("r-lib/gargle")
  2. Delete any ~/.R/gargle directory that the above attempts might have created

And let's resume the troubleshooting for this platform. To be clear, I don't think the above will fix your problems, but it gets us to a saner and less noisy workflow.
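The two "power cycle" steps above can be sketched in R as follows, assuming the default `~/.R/gargle` location. `reset_gargle_dir()` is a hypothetical helper, not part of gargle, and the install line is commented out only so the sketch runs without network access.

```r
# Step 1 (needs network access):
# devtools::install_github("r-lib/gargle")

# Step 2: remove the gargle directory that earlier attempts may have created.
# Hypothetical helper, not a gargle function.
reset_gargle_dir <- function(path = "~/.R/gargle") {
  path <- path.expand(path)
  if (dir.exists(path)) {
    # recursive = TRUE is required to delete a non-empty directory
    unlink(path, recursive = TRUE)
  }
  # TRUE once the directory is gone (or if it never existed)
  !dir.exists(path)
}
```

Calling `reset_gargle_dir()` with no argument removes `~/.R/gargle` along with any OAuth tokens cached under it, so the next `bq_auth()` call should prompt again.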

@ZainRizvi
Author

Sure, here are the results.

TLDR: I still can't use bq_auth(use_oob = TRUE) in the notebook. But I can use it to authenticate myself in the JupyterLab terminal and cache the credentials that way. Then, if I restart the Jupyter notebook kernel (restarting appears to be required), I can use bq_auth(email="[email protected]") to successfully authenticate myself in the notebook. (Still a bit of a wonky workaround, but at least there's a workaround.)

Detailed results:

Setting up the environment (in a new GCP AI Platform Notebook using R 3.x):

install.packages("httpuv")
install.packages("devtools")
devtools::install_github("r-lib/gargle")
install.packages("bigrquery")
install.packages("readr") # To read BigQuery results

library(httpuv)
library(gargle)
library(bigrquery)

Running bq_auth(email="[email protected]") from inside a notebook hangs with the following output (because it's expecting additional user input which the notebook can't provide):

Waiting for authentication in browser...
Press Esc/Ctrl + C to abort
Please point your browser to the following url: 
https://accounts.google.com/o/oauth2/auth?client_id=603366585132-0l3n5tr582q443rnomebdeeo0156b2bc.apps.googleusercontent.com&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fbigquery%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fuserinfo.email&redirect_uri=http%3A%2F%2Flocalhost%3A1410%2F&response_type=code&state=YbjZX4Oj2d&login_hint=zxxxxx%40gmail.com

Running bq_auth(use_oob = TRUE) in the notebook fails with the error:

Error: Can't get Google credentials.
Are you running bigrquery in a non-interactive session? Consider:
  * Call `bq_auth()` directly with all necessary specifics.

Traceback:

1. bq_auth(use_oob = TRUE)
2. stop("Can't get Google credentials.\n", "Are you running bigrquery in a non-interactive session? Consider:\n", 
 .     "  * Call `bq_auth()` directly with all necessary specifics.\n", 
 .     call. = FALSE)

Trying use_oob in the JupyterLab console results in the following output. It seems to be somewhat working, despite the unnecessary warning.

Three usability issues with this step:

  1. Note that I had to copy/paste the url into a new browser window but the console output doesn't inform me of that requirement
  2. The console did not make it obvious that it was expecting a code to be pasted in
  3. No confirmation of successful authentication. I had to attempt to reauthenticate myself to confirm that the auth had worked.

> library(httpuv)
> library(gargle)
> library(bigrquery)
> bq_auth(use_oob = TRUE)
Is it OK to cache OAuth access credentials in the folder '/home/jupyter/.R/gargle/gargle-oauth' between R sessions?

1: Yes
2: No

Selection: 1
Enter authorization code: /usr/bin/xdg-open: 778: /usr/bin/xdg-open: www-browser: not found
/usr/bin/xdg-open: 778: /usr/bin/xdg-open: links2: not found
/usr/bin/xdg-open: 778: /usr/bin/xdg-open: elinks: not found
/usr/bin/xdg-open: 778: /usr/bin/xdg-open: links: not found
/usr/bin/xdg-open: 778: /usr/bin/xdg-open: lynx: not found
/usr/bin/xdg-open: 778: /usr/bin/xdg-open: w3m: not found
xdg-open: no method available for opening 'https://accounts.google.com/o/oauth2/auth?client_id=603366585132-0l3n5tr582q443rnomebdeeo0156b2bc.apps.googleusercontent.com&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fbigquery%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fcloud-platform%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fuserinfo.email&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&response_type=code'
4/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx <===== the GCP auth code I copy/pasted in
> bq_auth(use_oob = TRUE) <===== retrying the auth to see if it worked
The bigrquery package is requesting access to your Google account. Select a pre-authorised account or enter '0' to obtain a new token. Press Esc/Ctrl + C to abort.

1: [email protected]  <===== The auth credentials were cached

Selection: 1
> bq_auth(email="[email protected]")
>

At this point I am able to access BigQuery from the JupyterLab command line. If I restart the notebook kernel, I can now run queries against BigQuery as long as I use the email-based auth to pull the cached credentials (the use_oob = TRUE option still fails with the previous error):

library(httpuv)
library(gargle)
library(bigrquery)

bq_auth(email="[email protected]")

project_id <- 'my-project-id'
test_query_text <- "SELECT * FROM `bigquery-public-data.usa_names.usa_1910_current` LIMIT 10"
test_results <- query_exec(test_query_text, project_id, use_legacy_sql = FALSE)
test_results 
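For reference, `query_exec()` belongs to bigrquery's older interface; a rough equivalent using the newer `bq_project_query()` and `bq_table_download()` functions might look like the sketch below. It assumes `bq_auth()` has already succeeded in the session, the wrapper name `run_test_query()` is hypothetical, and it has not been tried on AI Platform Notebooks.

```r
# Hypothetical wrapper around bigrquery's bq_* interface. The bigrquery::
# prefixes mean it can be defined without attaching the package.
run_test_query <- function(project_id, n = 10) {
  sql <- sprintf(
    "SELECT * FROM `bigquery-public-data.usa_names.usa_1910_current` LIMIT %d",
    as.integer(n)
  )
  # Runs the query as a job billed to project_id and returns a bq_table;
  # standard SQL is the default here, so no use_legacy_sql argument is needed
  tb <- bigrquery::bq_project_query(project_id, sql)
  # Downloads the result table into a data frame
  bigrquery::bq_table_download(tb)
}

# Usage (requires a real project and cached credentials):
# bq_auth(email = "you@example.com")   # hypothetical address
# test_results <- run_test_query("my-project-id")
```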

@jennybc
Collaborator

jennybc commented Jul 24, 2019

Thanks for the update. We will try to make this smoother.

cc @craigcitro Would love to hear any observations. These are flows / contexts I have no personal experience with but that I think you were targeting with your initial work on gargle.

@ZainRizvi
Author

FYI, I've written a blog post describing the workaround. Thinking about it, this issue actually affects any Jupyter notebook that's being executed on a remote machine.

http://zainrizvi.io/blog/authenticating-to-bigrquery-on-gcp-ai-platform-notebooks/

@jennybc
Collaborator

jennybc commented Aug 5, 2019

BTW, the issues I described above as fixed only in dev gargle are now fixed in the CRAN version. I'm sure this flow is not yet fixed, but the ancillary annoyances should be gone.

@jennybc
Collaborator

jennybc commented Aug 5, 2019

Gentle ping again @craigcitro. I'd love your opinion on which flow should kick in here.

@jennybc
Collaborator

jennybc commented May 1, 2020

Now tracking in r-lib/gargle#138.

@jennybc jennybc closed this as completed May 1, 2020