Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bq_auth() fails in GCP AI Platform Notebooks #138

Closed
jennybc opened this issue May 1, 2020 · 19 comments
Closed

bq_auth() fails in GCP AI Platform Notebooks #138

jennybc opened this issue May 1, 2020 · 19 comments

Comments

@jennybc
Copy link
Member

jennybc commented May 1, 2020

Manually transferring from bigrquery, since cross-org transfer is not possible.

This is about auth in GCP AI Platform Notebooks, so a combination of "on GCP" and "in a Jupyter Notebook".

r-dbi/bigrquery#340

@MarkEdmondson1234
Copy link
Contributor

MarkEdmondson1234 commented May 2, 2020

Is this still an issue? googleAuthR/gargle are installed in the R notebooks and credentials_gce(scopes = "https://www.googleapis.com/auth/cloud-platform", service_account = service_account) should work?

I have an example here: https://github.com/MarkEdmondson1234/googleAuthR/blob/master/inst/jupyter/googleAuthR.ipynb

@jennybc
Copy link
Member Author

jennybc commented May 2, 2020

Interesting! I have not looked into this personally. But it would be great to learn that all is well. I’ll leave open for a bit to encourage further comment (or maybe I’ll try it myself). But I consider this semi-closed, then.

@jennybc
Copy link
Member Author

jennybc commented May 2, 2020

@ZainRizvi If you can take a look at @MarkEdmondson1234's example and react here, that would be great. I'm submitting a gargle release to CRAN soon.

@jennybc
Copy link
Member Author

jennybc commented May 3, 2020

It is possible that the same print debugging described in #130 (comment) would reveal useful information here as well and generate some forward momentum.

@snowlover173
Copy link

snowlover173 commented Oct 8, 2020

@jennybc and @MarkEdmondson1234,
We are trying to use big_auth() to authenticate with an email from the AI Platform R notebook or GCE(RStudio) to BigQuery.

based on the documentation we run:
library(bigrquery)
bq_auth(
email = '[email protected]',
path = NULL,
scopes = c("https://www.googleapis.com/auth/bigquery",
"https://www.googleapis.com/auth/cloud-platform"),
cache = gargle::gargle_oauth_cache(),
use_oob = gargle::gargle_oob_default(),
token = NULL
)

sql <- 'SELECT * FROM target_projectid.dataset.table LIMIT 1'
tb <- bq_project_query(source_project, sql)
bq_table_download(tb, max_results = 10)

But it doesn't work... we have tried many versions of the key-values in the big_auth(),
Also tried whatever is suggested in the
r-dbi/bigrquery#340

We don't even get the question to pass the token. it runs without any output.
If we don't use the big_auth() and run the queries. it automatically fetches the service account(attached to the AI notebook) credentials and runs based on its permission. But we don't want this.. because different developers have access to different datasets outside of this project. So we need to make it work with individuals' credentials.

The access logic works with the python SDK. we don't have any problem with this access design in terms of GCP when we run python but when we follow the big_auth() documentation to do the same with R, we don't get prompted to access a url and get credentials and the big_auth(email) seems doesn't work. We tried it with RStudio as well and it didn't work. Because we have 2 types of R projects, with R notebook and with RStudio.

@MarkEdmondson1234
Copy link
Contributor

MarkEdmondson1234 commented Oct 9, 2020

I'm pretty sure bq_auth() with email won't work as it's interactive and jupyter notebooks are in non-interactive realm.

But if you auth instead with the GCE auth (gargle::credentials_gce() ) it should work as it will reuse the auth that comes with the notebooks, or you can upload a json account service key and auth via that.

@snowlover173
Copy link

@MarkEdmondson1234 this is what we dont want to do. AI notebook and Rstudio at the moment use the service acccount which is attached to the instance to interact with datasets in BQ.
But it doesnt work with our use case.
Not everyone has access to all datasets.
People login to gcp with their email and that emails is being used to give access to datasets.
Lets say Aand B are developers and are allowed to work on datasets C and D , repectively.
Those datasets will be shared with their emails accordingly. And they should call the BQ with their own credential to be able to run queries.

This pattern works with python because the python sdk has a client that calls.
Client has a param, named credentials which are the tempo token which we can get from the generated url.

But we need the same for R. The user needs to call the dataset with their email not the service account of the machine.

@MarkEdmondson1234
Copy link
Contributor

Have you tried use_oob=TRUE ?

@snowlover173
Copy link

We tried big_auth with use_oob=TRUE, it didn't do anything on the RStudio running on a GCE, and the R AI notebook. However, when one of our R developers ran it from their RStudio installed on their personal laptop, it worked.
It opened a window to select the email and proceeded with the authentication. It generated an auth file in the gargle OAuth folder. After saving that file on the RStudio-GCE, it worked on GCE as well. But this approach didn't work for the notebook.
Now the problems:
1- not everyone has an Rstudio installed on their laptop because we give them a work environment within GCP, and I am not sure how convenient it is to ask any contractor to install RStudio and go through this process.
2- It doesn't work on AI notebooks.
Is there any workaround?

@ryandaryl
Copy link

ryandaryl commented Oct 26, 2020

The issue identified above by snowlover173 appears to be caused by the following:

  1. The interactive prompt created by httr::oauth2.0_token doesn't respond to input in Jupyter notebooks or the web version of R Studio. If I find the time, I will open a new issue with a reprex in httr repo.
  2. bigrquery::bq_auth(use_oob = True, email = FALSE) doesn't completely override other credential discovery methods called by bq_auth. As a result, when some other credentials exist (I haven't had time to work out exactly which ones), bq_auth(use_oob = True, email = FALSE) uses those credentials, and doesn't initiate the interactive OAuth dance. This is probably an issue with bigrquery, rather than gargle.

For anyone who wants to use the interactive, browser based OAuth dance for user authentication in AI platform notebooks or web-based R Studio, I recommend the following workaround:

Step 1

Complete the OAuth dance using gcloud command line utility. The gcloud interactive prompt works in both Jupyter notebooks and web-based R Studio. (If gcloud is not installed, you can run install.packages('cloudml')to invoke an installer from R).

gcloud oauth login

Step 2

Run the commands below. Make sure you edit it first, so that the email address is the same address you chose in the browser during Step 1.

cd ~/.config/gcloud/legacy_credentials/[email protected]
cp adc.json ../../application_default_credentials.json

This will copy the oauth2.0 token generated by gcloud to a location where gargle expects to find it. As a result, all R libraries that use gargle will be successfully authenticated.

@herambgadgil
Copy link

Getting similar error -
The script works without any issues and without asking for credentials on ai-notebooks cli. But when the same script is passed on as a job submit to ai-platform, it gives out error.

# R-script
library(bigrquery)
library(tidyverse)

# Initial Parameters
strProject <- "cloud-project"
strSQL <- "select * from rssp_test.tbl_iris"

bqPull <- bq_project_query(strProject, strSQL)
tblPull <- bq_table_download(bqPull)

tblTest <- tblPull %>% 
  group_by(species) %>% 
  summarise(
    msl = mean(sepal_length), 
    msw = mean(sepal_width), 
    mpl = mean(petal_length), 
    mpw = mean(petal_width)
  ) %>% 
  select(species, msl, msw, mpl, mpw)

Thanks!
Heramb
bigrquery_error

@herambgadgil
Copy link

++
The way our security and architecture is structured in GCP, having a .json file for credentials in individual script is not feasible. The ai-notebooks are linked to a Service Account which is enabled for each instance.

@abalter
Copy link

abalter commented May 1, 2022

Windows Users

I've been struggling with bigrquery authentication for a long time. Recently, I haven't been able to use it at all.

"Step 2" above completely fixed my problem. Now I don't even need to do any interactive authentication!

I set about determining how to do the same procedure in Windows. Follow the same steps, but you will find the gcloud directory in %AppData% (i.e. C:\Users\username\AppData\Roaming\gcloud).

@jennybc
Copy link
Member Author

jennybc commented Oct 24, 2022

This is a combination of #187 (how do you indicate that you don't want to use the default service account) and how to do an OOB-ish type of flow in GCE-hosted notebooks. Now that we've inlined much more of httr to enable pseudo-OOB, it's possible that the latter is getting more in reach.

@jennybc
Copy link
Member Author

jennybc commented Feb 22, 2023

Progress on #140 may also be relevant here.

@jennybc
Copy link
Member Author

jennybc commented Apr 13, 2023

I think this might be solved through some combination of recent developments (some of these are only in dev gargle but I will release soon):

  • Pseudo-OOB flow: Pseudo-OOB auth flow #214.
  • Improved user flow in Google Colab / Jupyter notebooks: Make interactive user auth work on Google Colab #246. If that doesn't automatically translate into better behaviour in GCP AI Platform Notebooks, I will need someone to help me find an environment variable or similar that can be used to detect we are running in a GCP AI Platform Notebook.
  • Improved ability to explicitly skip the automatic auth with the default service account on GCE:
    cred_funs_add(credentials_gce = NULL) 

I'm going to close this because I am hopeful that it's solved. Even if it's not, I think it might be down to advice on usage for this context. In any case, it would be better to start with a fresh issue, if necessary, based on current gargle.

@jennybc jennybc closed this as completed Apr 13, 2023
@abalter
Copy link

abalter commented Dec 1, 2023

@jennybc --- I just had to do "step 2" again. I have a VM on gcp. Same problem. Then I remembered that I had found a solution. Searched around until I found my post. I copied the file and now I'm authenticating.

Why is this still an issue?

@jennybc
Copy link
Member Author

jennybc commented Dec 1, 2023

If we're going to restart this discussion, it needs to be a fresh issue, confirming that the problem, whatever it may be, is seen with current gargle, and with a full account of what we're talking about.

@abalter
Copy link

abalter commented Dec 1, 2023

Ok.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants