commit f3cf2be: 112 changed files with 12,927 additions and 0 deletions.
::

    # Sphinx build info version 1
    # This file records the configuration used when building these files. When it is not found, a full rebuild will be done.
    config: ee2edea7cd0e405e075f4650e6ba2801
    tags: 645f666f9bcd5a90fca523b33c5a78b7
Welcome to LmBISON - RIIS Analysis
======================================

The BISON repository contains data and scripts to annotate GBIF occurrence records
with information regarding geographic location and USGS RIIS status of the record.


Current
------------

.. toctree::
   :maxdepth: 1

   pages/about
   pages/workflow

Setup AWS
------------

.. toctree::
   :maxdepth: 1

   pages/aws/aws_setup
   pages/aws/ec2_setup
   pages/aws/lambda
   pages/aws/roles
   pages/aws/automation

Using BISON
------------

.. toctree::
   :maxdepth: 1

   pages/interaction/debug
   pages/interaction/deploy

Old Stuff
------------

.. toctree::
   :maxdepth: 1

   pages/history/year4_planB
   pages/history/year4_planA
   pages/history/year3
   pages/history/year5
   pages/history/aws_experiments

* :ref:`genindex`
About
========

The `Lifemapper BISON repository <https://github.com/lifemapper/bison>`_ is an open
source project supported by USGS award G19AC00211.

The aim of this repository is to provide a workflow for annotating and analyzing a
large set of United States specimen occurrence records for the USGS BISON project.

.. image:: ../.static/lm_logo.png
   :width: 150
   :alt: Lifemapper
Workflow Automation
#####################################

Lambda Functions For Workflow Steps
=====================================

Overview
----------
Lambda functions:

* can run for at most 15 minutes
* must contain only a single handler function

Therefore, long-running processes and computations that require complex programming
are less suitable for Lambda. The alternative we use for this workflow is to have
Lambda launch an EC2 instance to complete the processing.
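The Lambda-launches-EC2 pattern can be sketched as below. This is a minimal
illustration, not code from the BISON repository: the AMI ID, instance type, and
user-data script are placeholders.

```python
# Sketch of a Lambda handler that launches an EC2 instance for a long-running
# step. The AMI ID, instance type, and user-data script are placeholders.

# Shell script the instance runs at boot: fetch the code and run it in Docker.
USER_DATA = """#!/bin/bash
git clone https://github.com/lifemapper/bison.git /opt/bison
docker build -t bison /opt/bison && docker run --rm bison
"""

def lambda_handler(event, context):
    import boto3  # boto3 is preinstalled in the AWS Lambda Python runtime

    ec2 = boto3.client("ec2")
    response = ec2.run_instances(
        ImageId="ami-placeholder",   # hypothetical AMI ID
        InstanceType="t3.large",     # hypothetical instance size
        MinCount=1,
        MaxCount=1,
        UserData=USER_DATA,
        # Terminate the instance when the boot script shuts it down.
        InstanceInitiatedShutdownBehavior="terminate",
    )
    return {"instance_id": response["Instances"][0]["InstanceId"]}
```

Because the instance terminates itself on shutdown, the Lambda function stays
well under the 15-minute limit while the EC2 instance does the real work.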
For the BISON workflow, the first step is to annotate RIIS records with GBIF accepted
taxa, a process that takes 30-60 minutes to resolve the approximately 15K names in RIIS.

The final step is to build 2D matrices, species by region, from the data, and compute
biogeographic statistics on them. This process requires more complex code, which is
present in the BISON codebase.

In both cases, we install the code onto the newly launched EC2 instance, and build a
Docker container to install all dependencies and run the code.

In future iterations, we will download a pre-built Docker image.

More detailed setup instructions are in the lambda page.
Initiate Workflow on a Schedule
------------------------------------------------

Step 1: Annotate RIIS with GBIF accepted taxa
..............................................

This ensures that we can match RIIS records with the GBIF records that we
will annotate with a RIIS determination. This process requires sending the scientific
name in each RIIS record to the GBIF 'species' API to find the accepted name,
`acceptedScientificName` (and the GBIF identifier, `acceptedTaxonKey`).
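A minimal sketch of one such resolution call, using the public GBIF species-match
endpoint, is shown below. Note this is illustrative, not the repository's code, and
the match API returns fields such as `usageKey` and `acceptedUsageKey`, which differ
slightly from the occurrence-record field names mentioned above.

```python
# Sketch: resolve one RIIS scientific name via the public GBIF species match API.
import json
import urllib.parse
import urllib.request

GBIF_MATCH_URL = "https://api.gbif.org/v1/species/match"

def build_match_url(scientific_name):
    """Build the URL for matching one name against the GBIF backbone."""
    return GBIF_MATCH_URL + "?" + urllib.parse.urlencode({"name": scientific_name})

def resolve_name(scientific_name):
    """Return (taxonKey, scientificName) that GBIF resolves the input name to."""
    with urllib.request.urlopen(build_match_url(scientific_name)) as resp:
        rec = json.load(resp)
    # When GBIF matches a synonym, acceptedUsageKey points to the accepted taxon.
    key = rec.get("acceptedUsageKey") or rec.get("usageKey")
    return key, rec.get("scientificName")
```

Resolving ~15K names one request at a time explains the 30-60 minute runtime of
this step.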
* Create an AWS EventBridge Schedule.

* Create a lambda function for execution when the trigger condition is activated, in
  this case, the time/date in the schedule:
  aws/lambda/bison_s0_annotate_riis_lambda.py

* The lambda function will check whether the data to be created already exist in S3,
  execute if they do not, and return immediately if they do.
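From the CLI, creating such a schedule might look like the following. This is a
hedged example only: the schedule name, account number, ARNs, and cron expression
(here, the second day of each month) are placeholders.

```shell
# Hypothetical EventBridge Scheduler invocation; names and ARNs are placeholders.
aws scheduler create-schedule \
    --name bison-s0-annotate-riis \
    --schedule-expression "cron(0 6 2 * ? *)" \
    --flexible-time-window '{"Mode": "OFF"}' \
    --target '{
        "Arn": "arn:aws:lambda:us-east-1:123456789012:function:bison_s0_annotate_riis",
        "RoleArn": "arn:aws:iam::123456789012:role/bison_redshift_lambda_role"
    }'
```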
Triggering execution
-------------------------
The first step will be executed on a schedule, such as the second day of the month
(GBIF data is deposited on the first day of the month).

Scheduled execution (temporary): each step after the first is also executed on a
schedule, roughly estimating completion of the previous step. These steps with a
dependency on previous outputs will first check for the existence of required inputs,
failing immediately if inputs are not present.

Automatic execution (TODO): the successful deposition of the output of the first
(scheduled) and all following steps into S3 or Redshift triggers subsequent steps.

Both automatic and scheduled execution will require examining the logs to ensure
successful completion.
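The existence check described above can be sketched as follows; the bucket and key
names are placeholders, not values from the BISON configuration.

```python
# Sketch: fail fast (or skip) if required S3 objects are missing or present.
def object_exists(bucket, key):
    """Return True if s3://bucket/key exists, False otherwise."""
    import boto3  # boto3 is preinstalled in the AWS Lambda Python runtime
    import botocore.exceptions

    s3 = boto3.client("s3")
    try:
        s3.head_object(Bucket=bucket, Key=key)
        return True
    except botocore.exceptions.ClientError:
        return False

def lambda_handler(event, context):
    # Placeholder bucket/key; a real step derives these from the data month.
    if not object_exists("bison-example-bucket", "inputs/previous_step_output.csv"):
        return {"status": "failed", "reason": "required input not present"}
    if object_exists("bison-example-bucket", "outputs/this_step_output.csv"):
        return {"status": "skipped", "reason": "output already present"}
    # ... otherwise run (or launch) the step here ...
    return {"status": "started"}
```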
TODO: Create rule to initiate lambda function based on previous step
----------------------------------------------------------------------

* Check for existence of new GBIF data

  * Use a blueprint, python, "Get S3 Object"
  * Function name: bison_find_current_gbif_lambda
  * S3 trigger:

    * Bucket: arn:aws:s3:::gbif-open-data-us-east-1

* Create a rule in EventBridge to use as the trigger

  * Event source: AWS events or EventBridge partner events
  * Sample event, "S3 Object Created", aws/events/test_trigger_event.json
  * Creation method: Use pattern form
  * Event pattern

    * Event Source: AWS services
    * AWS service: S3
    * Event type: Object-Level API Call via CloudTrail
    * Event Type Specifications

      * Specific operation(s): GetObject
      * Specific bucket(s) by name: arn:aws:s3:::bison-321942852011-us-east-1

  * Select target(s)

    * AWS service
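The console choices above correspond roughly to an event pattern like the JSON
below. This is a sketch of the expected shape; the CloudTrail trail that records
the object-level API calls is configured separately and not shown.

```json
{
  "source": ["aws.s3"],
  "detail-type": ["AWS API Call via CloudTrail"],
  "detail": {
    "eventSource": ["s3.amazonaws.com"],
    "eventName": ["GetObject"],
    "requestParameters": {
      "bucketName": ["bison-321942852011-us-east-1"]
    }
  }
}
```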
AWS Resource Setup
###################

Security
********************

Create policies and roles
===========================================================

The :ref:`bison_redshift_lambda_role` allows access to the bison Redshift
namespace/workgroup, lambda functions, EventBridge Scheduler, and S3 data.
The Trusted Relationships on this role allow each of these services to assume it.
The :ref:`bison_ec2_s3_role` allows an EC2 instance to access the public S3 data and
the bison S3 bucket. Its trust relationship grants AssumeRole to the ec2 and s3
services. This role will be assigned to an EC2 instance that will initiate
computations and compute matrices.

The :ref:`bison_redshift_s3_role` allows Redshift to access public S3 data and
the bison S3 bucket, and allows Redshift to perform glue functions. Its trust
relationship grants AssumeRole to the redshift service.

Make sure that the same role granted to the namespace is used for creating an external
schema and lambda functions. When mounting external data as a Redshift table in the
external schema, you may encounter an error indicating that the "dev" database does not
exist. This refers to the external database, and may indicate that the role used by the
command and/or namespace differs from the role granted to the schema upon creation.
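Based on the description above, the trust relationship for bison_ec2_s3_role would
be a standard AssumeRole policy of roughly the following shape. This is a sketch
inferred from the text, not copied from the repository.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": ["ec2.amazonaws.com", "s3.amazonaws.com"]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```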
Create a Security Group for the region
===========================================================

* Test this group!
* Create a security group for the project/region

  * inbound rules allow:

    * Custom TCP, port 8000
    * Custom TCP, port 8080
    * HTTP, port 80
    * HTTPS, port 443
    * SSH, port 22

  * Consider restricting SSH to campus

* or use the launch-wizard-1 security group (created by some EC2 instance creation in 2023)

  * inbound rules IPv4:

    * Custom TCP 8000
    * Custom TCP 8080
    * SSH 22
    * HTTP 80
    * HTTPS 443

  * outbound rules IPv4, IPv6:

    * All traffic, all ports
Redshift Namespace and Workgroup
===========================================================

Namespace and Workgroup
------------------------------

A namespace is storage-related, containing database objects and users. A workgroup is
a collection of compute resources, along with related properties and limitations such
as security groups.
https://docs.aws.amazon.com/redshift/latest/mgmt/serverless-workgroup-namespace.html
External Schema
------------------------
The command below creates an external schema, redshift_spectrum, and also creates a
**new** external database "dev". It appears in the console to be the same "dev"
database that contains the public schema, but it is separate. Also note the IAM role
used to create the schema must match the role attached to the namespace::

    CREATE EXTERNAL SCHEMA redshift_spectrum
    FROM DATA CATALOG
    DATABASE dev
    IAM_ROLE 'arn:aws:iam::321942852011:role/bison_redshift_s3_role'
    CREATE EXTERNAL DATABASE IF NOT EXISTS;