Commit 19f7ed2

Merge branch 'main' of github.com:lifemapper/bison into main

2 parents: f0396a5 + 4223f67

File tree

6 files changed: +199 -167 lines changed


_sphinx_config/index.rst

Lines changed: 11 additions & 6 deletions
@@ -9,7 +9,7 @@ Current
 ------------

 .. toctree::
-   :maxdepth: 2
+   :maxdepth: 1

    pages/about
    pages/workflow
@@ -18,23 +18,28 @@ Setup AWS
 ------------

 .. toctree::
-   :maxdepth: 2
+   :maxdepth: 1

    pages/aws/aws_setup
+   pages/aws/ec2_setup
+   pages/aws/lambda
+   pages/aws/roles
+   pages/aws/automation

 Using BISON
 ------------

 .. toctree::
-   :maxdepth: 2
+   :maxdepth: 1

-   pages/interaction/about
+   pages/interaction/debug
+   pages/interaction/deploy

-History
+Old Stuff
 ------------

 .. toctree::
-   :maxdepth: 2
+   :maxdepth: 1

    pages/history/year4_planB
    pages/history/year4_planA
Lines changed: 64 additions & 105 deletions
@@ -1,27 +1,75 @@
-Create lambda function to initiate processing
+Workflow Automation
+#####################################
+
+Lambda Functions For Workflow Steps
+=====================================
+
+Overview
+----------
+Lambda functions:
+
+* can run for 15 minutes or less
+* must contain only a single function
+
+Therefore, long-running processes and computations that require complex programming
+are less suitable for Lambda. The alternative we use for this workflow is to use
+Lambda to launch an EC2 instance to complete the processing.
+
+For the BISON workflow, the first step is to annotate RIIS records with GBIF accepted
+taxa, a process that takes 30-60 minutes to resolve the approximately 15K names in RIIS.
+
+The final step is to build 2D matrices, species by region, from the data, and compute
+biogeographic statistics on them. This process requires more complex code, which is
+present in the BISON codebase.
+
+In both cases, we install the code onto the newly launched EC2 instance and build a
+Docker container to install all dependencies and run the code.
+
+In future iterations, we will download a pre-built Docker image.
+
+More detailed setup instructions are on the lambda page.
+
+
+Initiate Workflow on a Schedule
 ------------------------------------------------
-* Create a lambda function for execution when the trigger condition is activated,
-  aws/events/bison_find_current_gbif_lambda.py

-* This trigger condition is a file deposited in the BISON bucket
+Step 1: Annotate RIIS with GBIF accepted taxa
+......................................

-* TODO: change to the first of the month
+This step ensures that we can match RIIS records with the GBIF records that we
+will annotate with RIIS determinations. It requires sending the scientific
+name in each RIIS record to the GBIF 'species' API to find the accepted name,
+`acceptedScientificName` (and the GBIF identifier, `acceptedTaxonKey`).

-* The lambda function will delete the new file, and test the existence of
-  GBIF data for the current month
+* Create an AWS EventBridge Schedule

-* TODO: change to mount GBIF data in Redshift, subset, unmount
+* Create a lambda function for execution when the trigger condition is activated, in
+  this case, the time/date in the schedule:
+  aws/lambda/bison_s0_annotate_riis_lambda.py

-Edit the execution role for lambda function
---------------------------------------------
-* Under Configuration/Permissions see the Execution role Role name
-  (bison_find_current_gbif_lambda-role-fb05ks88) automatically created for this function
-* Open in a new window and under Permissions policies, Add permissions
+* The lambda function will check whether the data to be created already exists
+  in S3, execute the step if it does not, and return immediately if it does.

-  * bison_s3_policy
-  * redshift_glue_policy

-Create trigger to initiate lambda function
+
+Triggering execution
+-------------------------
+The first step will be executed on a schedule, such as the second day of the month
+(GBIF data is deposited on the first day of the month).
+
+Scheduled execution (Temporary): Each step after the first is also executed on a
+schedule that roughly estimates the completion of the previous step. These steps,
+which depend on previous outputs, will first check for the existence of required
+inputs, failing immediately if inputs are not present.
+
+Automatic execution (TODO): The successful deposition of the output of the first
+(scheduled) step and all following steps into S3 or Redshift triggers subsequent steps.
+
+Both automatic and scheduled execution will require examining the logs to ensure
+successful completion.
+
+
+TODO: Create rule to initiate lambda function based on previous step
 ------------------------------------------------

 * Check for existence of new GBIF data
@@ -49,92 +97,3 @@ Create trigger to initiate lambda function
   * Select target(s)

     * AWS service
-
-
-Lambda to query Redshift
---------------------------------------------
-
-https://repost.aws/knowledge-center/redshift-lambda-function-queries
-
-https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/redshift-data/client/execute_statement.html
-
-* Connect to a serverless workgroup (bison), namespace (bison), database name (dev)
-
-* When connecting to a serverless workgroup, specify the workgroup name and database
-  name. The database user name is derived from the IAM identity. For example,
-  arn:iam::123456789012:user:foo has the database user name IAM:foo. Also, permission
-  to call the redshift-serverless:GetCredentials operation is required.
-* need redshift:GetClusterCredentialsWithIAM permission for temporary authentication
-  with a role
-
-Lambda to start EC2 for task
---------------------------------------------
-
-Lambda functions must be single-function tasks that run in less than 15 minutes.
-For complex or long-running tasks we start an EC2 instance containing bison code
-and execute it in a docker container.
-
-For each task, the lambda function should create a Spot EC2 instance with a template
-containing userdata that will either 1) pull the Github repo, then build the docker
-image, or 2) pull a docker image directly.
-
-Annotating the RIIS records with GBIF accepted taxa takes about 1 hour and uses
-multiple bison modules.
-
-EC2/Docker setup
-....................
-
-* Create the first EC2 Launch Template as a "one-time" Spot instance, no hibernation
-
-* The Launch template should have the following settings::
-
-     Name: bison_spot_task
-     Application and OS Images: Ubuntu
-     AMI: Ubuntu 24.04 LTS
-     Architecture: 64-bit ARM
-     Instance type: t4g.micro
-     Key pair: bison-task-key
-     Network settings/Select existing security group: launch-wizard-1
-     Configure storage: 8 Gb gp3 (default)
-     Details - encrypted
-     Advanced Details:
-       IAM instance profile: bison_ec2_s3_role
-       Shutdown behavior: Terminate
-       Cloudwatch monitoring: Enable
-       Purchasing option: Spot instances
-       Request type: One-time
-
-* Use the launch template to create a version for each task.
-* The launch template task versions must have the task name in the description, and
-  have the following script in the userdata::
-
-     #!/bin/bash
-     sudo apt-get -y update
-     sudo apt-get -y install docker.io
-     sudo apt-get -y install docker-compose-v2
-     git clone https://github.com/lifemapper/bison.git
-     cd bison
-     sudo docker compose -f compose.test_task.yml up
-     sudo shutdown -h now
-
-
-* For each task **compose.test_task.yml** must be replaced with the appropriate compose file.
-* On EC2 instance startup, the userdata script will execute
-* The compose file sets an environment variable (TASK_APP) containing a python module
-  to be executed from the Dockerfile.
-* Tasks should deposit outputs and logfiles into S3.
-* After completion, the docker container will stop automatically and the EC2 instance
-  will stop because of the shutdown command in the final line of the userdata script.
-* **TODO**: once the workflow is stable, to eliminate Docker build time, create a Docker
-  image and download it in userdata script.
-
-Lambda setup
-....................
-
-Triggering execution
--------------------------
-The first step may be executed on a schedule, such as the second day of the month (since
-GBIF data is deposited on the first day of the month).
-
-Upon successful completion, the deposition of successful output into S3 can trigger
-following steps.
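
The existence check described above (skip a step whose output is already in S3) can be
sketched with boto3. This is a minimal, hypothetical handler; the bucket and key names
are placeholders, not values from the repo::

    # Hypothetical sketch: skip the step if its output already exists in S3.
    import boto3
    from botocore.exceptions import ClientError

    s3 = boto3.client("s3")

    def output_exists(bucket="bison-bucket", key="annotated_riis/riis.csv"):
        try:
            s3.head_object(Bucket=bucket, Key=key)
            return True
        except ClientError as err:
            if err.response["Error"]["Code"] == "404":
                return False
            raise  # permission or other errors should surface

    def lambda_handler(event, context):
        if output_exists():
            return {"status": "skipped", "reason": "output already in S3"}
        # ...otherwise start the processing (see "Lambda to start EC2 for task")
        return {"status": "started"}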

_sphinx_config/pages/aws/aws_setup.rst

Lines changed: 34 additions & 0 deletions
@@ -1,9 +1,13 @@
 AWS Resource Setup
+###################
+
+Security
 ********************

 Create policies and roles
 ===========================================================

+
 The :ref:`_bison_redshift_lambda_role` allows access to the bison Redshift
 namespace/workgroup, lambda functions, EventBridge Scheduler, and S3 data.
 The Trusted Relationships on this policy allow each to
@@ -25,6 +29,36 @@ external schema, you may encounter an error indicating that the "dev" database does not
 exist. This refers to the external database, and may indicate that the role used by the
 command and/or namespace differs from the role granted to the schema upon creation.

+Create a Security Group for the region
+===========================================================
+
+* Test this group!
+* Create a security group for the project/region
+
+  * inbound rules allow:
+
+    * Custom TCP, port 8000
+    * Custom TCP, port 8080
+    * HTTP, port 80
+    * HTTPS, port 443
+    * SSH, port 22
+
+  * Consider restricting SSH to campus
+
+* or use the launch-wizard-1 security group (created by some EC2 instance creation in 2023)
+
+  * inbound rules IPv4:
+
+    * Custom TCP 8000
+    * Custom TCP 8080
+    * SSH 22
+    * HTTP 80
+    * HTTPS 443
+
+  * outbound rules IPv4, IPv6:
+
+    * All traffic, all ports
+
 Redshift Namespace and Workgroup
 ===========================================================
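
The security group above can also be created programmatically. A minimal sketch with
boto3; the group name, VPC id, and campus CIDR are illustrative assumptions::

    # Hypothetical sketch: create the project security group with the inbound
    # rules listed above; names and ids are placeholders.
    import boto3

    ec2 = boto3.client("ec2")

    def tcp_rule(port, cidr="0.0.0.0/0"):
        # Build one inbound TCP rule for a single port.
        return {"IpProtocol": "tcp", "FromPort": port, "ToPort": port,
                "IpRanges": [{"CidrIp": cidr}]}

    group = ec2.create_security_group(
        GroupName="bison-sg",              # assumed name
        Description="BISON project security group",
        VpcId="vpc-0123456789abcdef0",     # assumed VPC
    )

    ec2.authorize_security_group_ingress(
        GroupId=group["GroupId"],
        IpPermissions=[
            tcp_rule(8000),
            tcp_rule(8080),
            tcp_rule(80),
            tcp_rule(443),
            tcp_rule(22, cidr="10.0.0.0/8"),  # e.g. restrict SSH to campus
        ],
    )
    # Outbound "all traffic" is the default for a new security group.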

_sphinx_config/pages/aws/ec2_setup.rst

Lines changed: 47 additions & 0 deletions
@@ -151,3 +151,50 @@ Hop Limit for AWS communication
      --http-endpoint enabled

 * or in console, add metadata tag/value HttpPutResponseHopLimit/2
+
+EC2/Docker setup
+....................
+
+* Create the first EC2 Launch Template as a "one-time" Spot instance, no hibernation
+
+* The Launch template should have the following settings::
+
+     Name: bison_spot_task
+     Application and OS Images: Ubuntu
+     AMI: Ubuntu 24.04 LTS
+     Architecture: 64-bit ARM
+     Instance type: t4g.micro
+     Key pair: bison-task-key
+     Network settings/Select existing security group: launch-wizard-1
+     Configure storage: 8 GB gp3 (default)
+     Details - encrypted
+     Advanced Details:
+       IAM instance profile: bison_ec2_s3_role
+       Shutdown behavior: Terminate
+       Cloudwatch monitoring: Enable
+       Purchasing option: Spot instances
+       Request type: One-time
+
+* Use the launch template to create a version for each task.
+* The launch template task versions must have the task name in the description and
+  have the following script in the userdata::
+
+     #!/bin/bash
+     sudo apt-get -y update
+     sudo apt-get -y install docker.io
+     sudo apt-get -y install docker-compose-v2
+     git clone https://github.com/lifemapper/bison.git
+     cd bison
+     sudo docker compose -f compose.test_task.yml up
+     sudo shutdown -h now
+
+
+* For each task, **compose.test_task.yml** must be replaced with the appropriate compose file.
+* On EC2 instance startup, the userdata script will execute.
+* The compose file sets an environment variable (TASK_APP) containing the python module
+  to be executed from the Dockerfile.
+* Tasks should deposit outputs and logfiles into S3.
+* After completion, the docker container will stop automatically, and the EC2 instance
+  will stop because of the shutdown command in the final line of the userdata script.
+* **TODO**: once the workflow is stable, to eliminate Docker build time, create a Docker
+  image and download it in the userdata script.
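
The per-task template versions above are created in the console, but the same step can
be scripted. A minimal sketch with boto3; the task name and compose file are
illustrative assumptions::

    # Hypothetical sketch: add a per-task version to the bison_spot_task launch
    # template, swapping in the task's compose file via userdata.
    import base64
    import boto3

    ec2 = boto3.client("ec2")

    USERDATA = """#!/bin/bash
    sudo apt-get -y update
    sudo apt-get -y install docker.io
    sudo apt-get -y install docker-compose-v2
    git clone https://github.com/lifemapper/bison.git
    cd bison
    sudo docker compose -f compose.annotate_riis.yml up
    sudo shutdown -h now
    """

    ec2.create_launch_template_version(
        LaunchTemplateName="bison_spot_task",
        SourceVersion="1",                        # inherit the base settings
        VersionDescription="annotate_riis task",  # task name in the description
        LaunchTemplateData={
            # The API expects base64-encoded userdata (the console encodes it).
            "UserData": base64.b64encode(USERDATA.encode()).decode(),
        },
    )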

_sphinx_config/pages/aws/lambda.rst

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
+Create lambda function to initiate processing
+--------------------------------------------
+* Create a lambda function for execution when the trigger condition is activated,
+  e.g. aws/lambda/bison_s0_test_task_lambda.py
+* This trigger condition can be either a schedule (e.g. midnight on the second day of
+  every month) or a rule (e.g. a file matching xxx* deposited in an S3 bucket)
+
+Edit the execution role for lambda function
+--------------------------------------------
+* Under Configuration/Permissions, set the Execution role to the Workflow role
+  (bison_redshift_lambda_role)
+
+
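A minimal sketch of creating such a schedule with boto3; the schedule name, cron
expression, and ARNs are illustrative assumptions, not values from the repo::

    # Hypothetical sketch: an EventBridge Schedule that invokes the step-0
    # lambda at 00:00 UTC on the second day of every month.
    import boto3

    scheduler = boto3.client("scheduler")

    scheduler.create_schedule(
        Name="bison-s0-monthly",                 # assumed schedule name
        ScheduleExpression="cron(0 0 2 * ? *)",  # day 2 of each month, 00:00 UTC
        FlexibleTimeWindow={"Mode": "OFF"},
        Target={
            "Arn": "arn:aws:lambda:us-east-1:123456789012:function:"
                   "bison_s0_test_task_lambda",
            "RoleArn": "arn:aws:iam::123456789012:role/bison_redshift_lambda_role",
        },
    )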
+Lambda to query Redshift
+--------------------------------------------
+
+https://repost.aws/knowledge-center/redshift-lambda-function-queries
+
+https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/redshift-data/client/execute_statement.html
+
+* Connect to a serverless workgroup (bison), namespace (bison), database name (dev)
+
+* When connecting to a serverless workgroup, specify the workgroup name and database
+  name. The database user name is derived from the IAM identity. For example,
+  arn:iam::123456789012:user:foo has the database user name IAM:foo. Also, permission
+  to call the redshift-serverless:GetCredentials operation is required.
+* The redshift:GetClusterCredentialsWithIAM permission is needed for temporary
+  authentication with a role
+
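A minimal sketch of such a query using the Redshift Data API via boto3 (see the
execute_statement documentation linked above); the SQL and polling interval are
illustrative::

    # Hypothetical sketch: run one statement against the bison serverless
    # workgroup and fetch the result, without managing a connection.
    import time
    import boto3

    rs_data = boto3.client("redshift-data")

    def run_query(sql):
        resp = rs_data.execute_statement(
            WorkgroupName="bison", Database="dev", Sql=sql)
        stmt_id = resp["Id"]
        # The Data API is asynchronous; poll until the statement finishes.
        while True:
            desc = rs_data.describe_statement(Id=stmt_id)
            if desc["Status"] in ("FINISHED", "FAILED", "ABORTED"):
                break
            time.sleep(1)
        if desc["Status"] != "FINISHED":
            raise RuntimeError(desc.get("Error", "query failed"))
        return rs_data.get_statement_result(Id=stmt_id)

    # Illustrative table name:
    # result = run_query("SELECT count(*) FROM public.riis;")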
+Lambda to start EC2 for task
+--------------------------------------------
+
+Lambda functions must be single-function tasks that run in less than 15 minutes.
+For complex or long-running tasks we start an EC2 instance containing bison code
+and execute it in a docker container.
+
+For each task, the lambda function should create a Spot EC2 instance with a template
+containing userdata that will either 1) pull the Github repo, then build the docker
+image, or 2) pull a docker image directly.
+
+Annotating the RIIS records with GBIF accepted taxa takes about 1 hour and uses
+multiple bison modules.
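
A minimal sketch of the launch itself; the template version is an assumption, and the
Spot settings, userdata, and instance profile all come from the template::

    # Hypothetical sketch: launch a one-time Spot instance from a task-specific
    # version of the bison_spot_task launch template.
    import boto3

    ec2 = boto3.client("ec2")

    def lambda_handler(event, context):
        resp = ec2.run_instances(
            MinCount=1,
            MaxCount=1,
            LaunchTemplate={
                "LaunchTemplateName": "bison_spot_task",
                "Version": "2",   # assumed: the version built for this task
            },
        )
        # The instance shuts itself down when the task completes (see userdata).
        return {"instance_id": resp["Instances"][0]["InstanceId"]}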
