Clarification on S3 Bucket Access options #383

Open
wildintellect opened this issue Mar 6, 2024 · 3 comments

@wildintellect
Collaborator

When accessing a protected bucket that requires Earthdata Login (EDL) authentication to get a short-term S3 session, as all NASA DAAC buckets do, make sure you do not set this environment option:
os.environ['AWS_NO_SIGN_REQUEST'] = 'YES'
This option is only for some public buckets that specifically do not accept account information. If you accidentally include it, tools like rasterio will skip your AWS session and send unsigned requests, so access to the protected bucket fails.
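
A minimal sketch of a guard that could run before opening a protected bucket. The helper name `ensure_signed_requests` is hypothetical, and the list of "false" values assumes GDAL's usual boolean parsing (anything other than NO, FALSE, OFF or 0 counts as true), so treat it as an illustration rather than a definitive check:

```python
import os

# Values that GDAL is assumed to read as "false" for boolean config options.
FALSY_VALUES = {"NO", "FALSE", "OFF", "0"}

def ensure_signed_requests(environ=os.environ):
    """Remove AWS_NO_SIGN_REQUEST when it would disable request signing.

    Returns True if the variable was removed, False if nothing changed.
    """
    value = environ.get("AWS_NO_SIGN_REQUEST")
    if value is not None and value.strip().upper() not in FALSY_VALUES:
        # Signing is disabled; drop the variable so session credentials are used.
        del environ["AWS_NO_SIGN_REQUEST"]
        return True
    return False
```

Calling `ensure_signed_requests()` at the top of a notebook, before any `rio.Env(...)` block, would clear a stray 'YES' left over from an earlier cell that read a public bucket.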

We need to add a note to pages that use rio.env
https://docs.maap-project.org/en/latest/search.html?q=rio.env&check_keywords=yes&area=default

And possibly make a clearer general page. I noticed https://docs.maap-project.org/en/latest/technical_tutorials/access/lpdaac_gedi_access.html only talks about GEDI S3 but we don't have a page about all EarthDataCloud data.

@pahbs can provide some code examples

@wildintellect wildintellect added the documentation Improvements or additions to documentation label Mar 6, 2024
@pahbs

pahbs commented Mar 6, 2024

Here is a reproducible example of how to access an ORNL DAAC dataset:


import os

import boto3
import rasterio as rio
from rasterio.session import AWSSession
from maap.maap import MAAP

maap = MAAP(maap_host='api.maap-project.org')

def get_aws_session_DAAC(creds):
    """Create a rasterio AWSSession from short-term DAAC S3 credentials."""
    # creds is the dict returned by maap.aws.earthdata_s3_credentials(<DAAC s3credentials URL>)
    boto3_session = boto3.Session(
        aws_access_key_id=creds['accessKeyId'],
        aws_secret_access_key=creds['secretAccessKey'],
        aws_session_token=creds['sessionToken'],
        region_name='us-west-2'
    )
    return AWSSession(boto3_session)

# URL of DAAC s3 file
s3_url = 's3://ornl-cumulus-prod-protected/above/DeciduousFractionl_CanopyCover/data/deciduousfraction_2015_prediction.tif'

# Protected buckets need signed requests: set this to 'NO' or leave it unset
os.environ['AWS_NO_SIGN_REQUEST'] = 'NO'

# Request short-term credentials from the ORNL DAAC endpoint
creds = maap.aws.earthdata_s3_credentials('https://data.ornldaac.earthdata.nasa.gov/s3credentials')
rio_env_session = rio.Env(get_aws_session_DAAC(creds))

with rio_env_session:
    # rasterio is imported as rio, so open the file through that alias
    with rio.open(s3_url, mode='r') as dataset:
        print(dataset.profile)

@smk0033 smk0033 modified the milestone: 3.1.5 Mar 8, 2024
@smk0033
Contributor

smk0033 commented Mar 8, 2024

@wildintellect in those notebooks, I'm adding the environment variable set to 'NO', as above, plus a note that it must either be set to 'NO' or left out entirely, otherwise the data cannot be accessed. Is that fine for the users?

@wildintellect
Collaborator Author

We probably need a new page specifically about S3 access, where we discuss when to use this option and when not to. Then we can link any relevant page to it?
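
As a starting point for such a page, here is a rough sketch of the two cases side by side. The helper `aws_session_kwargs` is hypothetical; it only builds keyword arguments that, as far as I know, `rasterio.session.AWSSession` accepts (`aws_unsigned` for public buckets that reject account information, explicit keys and token for protected ones). The credential keys match the ORNL DAAC example earlier in this thread, and the default region is an assumption taken from that example:

```python
def aws_session_kwargs(creds=None, region_name="us-west-2"):
    """Build keyword arguments for rasterio.session.AWSSession.

    creds: dict from a DAAC s3credentials endpoint, or None for a public
    bucket that should be read with unsigned requests.
    """
    if creds is None:
        # Public, unsigned access: the only case where signing is turned off.
        return {"aws_unsigned": True}
    # Protected access: pass the short-term credentials and never disable signing.
    return {
        "aws_access_key_id": creds["accessKeyId"],
        "aws_secret_access_key": creds["secretAccessKey"],
        "aws_session_token": creds["sessionToken"],
        "region_name": region_name,
    }
```

The result would be used as `rio.Env(AWSSession(**aws_session_kwargs(creds)))`, so the choice between signed and unsigned access is made in one place instead of through a global environment variable.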
