Workflow Automation
#####################################

Lambda Functions For Workflow Steps
=====================================

Overview
----------
Lambda functions:

* can run for 15 minutes or less
* must contain only a single function

Therefore, long-running processes and computations that require complex programming
are less suitable for Lambda. Instead, this workflow uses Lambda to launch an EC2
instance that completes the processing.

For the BISON workflow, the first step is to annotate RIIS records with GBIF accepted
taxa, a process that takes 30-60 minutes to resolve the approximately 15K names in RIIS.

The final step is to build 2D matrices, species by region, from the data, and to
compute biogeographic statistics on them. This process requires more complex code,
which is present in the BISON codebase.

In both cases, we install the code onto the newly launched EC2 instance and build a
Docker container to install all dependencies and run the code.

In future iterations, we will download a pre-built Docker image instead.

More detailed setup instructions are provided with the lambda code.


Initiate Workflow on a Schedule
------------------------------------------------

Step 1: Annotate RIIS with GBIF accepted taxa
..............................................

This step ensures that we can match RIIS records with the GBIF records that we will
annotate with RIIS determinations. It requires sending the scientific name in each
RIIS record to the GBIF 'species' API to find the accepted name,
`acceptedScientificName` (and the GBIF identifier, `acceptedTaxonKey`).
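
The BISON code performs this resolution for all RIIS records; as a rough illustration
only, the sketch below queries the GBIF species match API for a single name using the
Python standard library. The response field names used in the fallback
(`acceptedUsageKey`, `usageKey`, `scientificName`) are assumptions about the payload,
not a description of the BISON code::

    import json
    import urllib.parse
    import urllib.request

    def resolve_accepted_taxon(scientific_name):
        """Ask the GBIF species match API for the accepted taxon of one name."""
        url = ("https://api.gbif.org/v1/species/match?"
               + urllib.parse.urlencode({"name": scientific_name}))
        with urllib.request.urlopen(url) as response:
            match = json.load(response)
        # For synonyms GBIF reports the accepted taxon separately; otherwise the
        # matched usage is already the accepted one (assumed field names).
        accepted_key = match.get("acceptedUsageKey", match.get("usageKey"))
        return match.get("scientificName"), accepted_key

    print(resolve_accepted_taxon("Sus scrofa Linnaeus, 1758"))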

* Create an AWS EventBridge Schedule

* Create a lambda function for execution when the trigger condition is activated, in
  this case, the time/date in the schedule:
  aws/lambda/bison_s0_annotate_riis_lambda.py

  * The lambda function will make sure the data to be created does not already exist
    in S3, execute if needed, and return immediately if the data is already present.
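
A minimal sketch of that existence check, assuming a hypothetical bucket and output key
(the real handler is aws/lambda/bison_s0_annotate_riis_lambda.py, which starts the
annotation task rather than doing the work inline)::

    import boto3
    from botocore.exceptions import ClientError

    S3 = boto3.client("s3")
    BUCKET = "bison-bucket"            # hypothetical bucket name
    OUTPUT_KEY = "annotated_riis.csv"  # hypothetical output key

    def lambda_handler(event, context):
        """Skip the step if its output already exists; otherwise start the work."""
        try:
            S3.head_object(Bucket=BUCKET, Key=OUTPUT_KEY)
            return {"status": "exists", "key": OUTPUT_KEY}
        except ClientError as err:
            if err.response["Error"]["Code"] not in ("404", "NoSuchKey", "NotFound"):
                raise
        # Output is missing: kick off the processing (e.g. launch the EC2 task).
        return {"status": "started", "key": OUTPUT_KEY}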


Edit the execution role for the lambda function
-------------------------------------------------

* Under Configuration/Permissions, see the Execution role Role name (for example,
  bison_find_current_gbif_lambda-role-fb05ks88) automatically created for this function
* Open it in a new window and, under Permissions policies, Add permissions:

  * bison_s3_policy
  * redshift_glue_policy
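
The same attachment can be scripted; a boto3 sketch, assuming the two customer-managed
policies above already exist and using a placeholder account id::

    import boto3

    iam = boto3.client("iam")
    ROLE = "bison_find_current_gbif_lambda-role-fb05ks88"  # role created for the function
    ACCOUNT = "123456789012"                               # placeholder account id

    for policy in ("bison_s3_policy", "redshift_glue_policy"):
        iam.attach_role_policy(
            RoleName=ROLE,
            PolicyArn=f"arn:aws:iam::{ACCOUNT}:policy/{policy}",
        )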


Triggering execution
-------------------------
The first step will be executed on a schedule, such as the second day of the month
(GBIF data is deposited on the first day of the month).

Scheduled execution (temporary): Each step after the first is also executed on a
schedule that roughly estimates the completion time of the previous step. Steps with a
dependency on previous outputs will first check for the existence of the required
inputs, failing immediately if the inputs are not present.

Automatic execution (TODO): The successful deposition of output from the first
(scheduled) step and all following steps into S3 or Redshift triggers the subsequent
steps.

Both automatic and scheduled execution will require examining the logs to ensure
successful completion.
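
As an illustration of the scheduled trigger, a boto3 sketch that creates an EventBridge
schedule for the second day of each month; the lambda and role ARNs are placeholders::

    import boto3

    scheduler = boto3.client("scheduler")
    scheduler.create_schedule(
        Name="bison-s0-annotate-riis",
        # 6 AM UTC on the second day of every month (GBIF data arrives on the first).
        ScheduleExpression="cron(0 6 2 * ? *)",
        FlexibleTimeWindow={"Mode": "OFF"},
        Target={
            "Arn": "arn:aws:lambda:us-east-1:123456789012:function:"
                   "bison_s0_annotate_riis_lambda",
            "RoleArn": "arn:aws:iam::123456789012:role/bison_scheduler_role",
        },
    )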


TODO: Create rule to initiate lambda function based on previous step
----------------------------------------------------------------------

* Check for existence of new GBIF data

* Select target(s)

* AWS service
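
A sketch of what such a rule might look like once outputs trigger steps automatically:
an EventBridge rule that fires when an object lands in the BISON bucket (bucket name,
key prefix, and target lambda ARN are assumptions)::

    import json

    import boto3

    events = boto3.client("events")
    events.put_rule(
        Name="bison-next-step-on-s3-output",
        EventPattern=json.dumps({
            "source": ["aws.s3"],
            "detail-type": ["Object Created"],
            "detail": {
                "bucket": {"name": ["bison-bucket"]},
                "object": {"key": [{"prefix": "annotated_riis"}]},
            },
        }),
        State="ENABLED",
    )
    events.put_targets(
        Rule="bison-next-step-on-s3-output",
        Targets=[{
            "Id": "next-step-lambda",
            "Arn": "arn:aws:lambda:us-east-1:123456789012:function:bison_s1_lambda",
        }],
    )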


Lambda to query Redshift
--------------------------------------------

https://repost.aws/knowledge-center/redshift-lambda-function-queries

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/redshift-data/client/execute_statement.html

* Connect to a serverless workgroup (bison), namespace (bison), database name (dev)

* When connecting to a serverless workgroup, specify the workgroup name and database
  name. The database user name is derived from the IAM identity. For example,
  arn:iam::123456789012:user:foo has the database user name IAM:foo. Also, permission
  to call the redshift-serverless:GetCredentials operation is required.
* The redshift:GetClusterCredentialsWithIAM permission is needed for temporary
  authentication with a role.
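
A sketch of such a query from Lambda with the Redshift Data API, using the workgroup
and database named above and a placeholder SQL statement::

    import boto3

    rsdata = boto3.client("redshift-data")

    def lambda_handler(event, context):
        """Run one SQL statement against the bison serverless workgroup."""
        response = rsdata.execute_statement(
            WorkgroupName="bison",
            Database="dev",
            Sql="SELECT COUNT(*) FROM public.riis;",  # placeholder query
        )
        # The Data API is asynchronous; poll describe_statement(Id=...) for status.
        return {"statement_id": response["Id"]}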


Lambda to start EC2 for task
--------------------------------------------

Lambda functions must be single-function tasks that run in less than 15 minutes.
For complex or long-running tasks we start an EC2 instance containing the bison code
and execute it in a Docker container.

For each task, the lambda function should create a Spot EC2 instance from a launch
template containing userdata that will either 1) pull the GitHub repo, then build the
Docker image, or 2) pull a Docker image directly.

Annotating the RIIS records with GBIF accepted taxa takes about 1 hour and uses
multiple bison modules.
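
A sketch of the launch call from a lambda function, assuming the launch template
described in the next section and one template version per task::

    import boto3

    ec2 = boto3.client("ec2")

    def start_task_instance(task_version="1"):
        """Launch one Spot instance from the bison_spot_task launch template."""
        response = ec2.run_instances(
            LaunchTemplate={
                "LaunchTemplateName": "bison_spot_task",
                "Version": task_version,  # the version holding this task's userdata
            },
            MinCount=1,
            MaxCount=1,
        )
        return response["Instances"][0]["InstanceId"]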


EC2/Docker setup
....................

* Create the first EC2 Launch Template as a "one-time" Spot instance, no hibernation

* The Launch template should have the following settings::

    Name: bison_spot_task
    Application and OS Images: Ubuntu
    AMI: Ubuntu 24.04 LTS
    Architecture: 64-bit ARM
    Instance type: t4g.micro
    Key pair: bison-task-key
    Network settings/Select existing security group: launch-wizard-1
    Configure storage: 8 GB gp3 (default)
        Details - encrypted
    Advanced Details:
        IAM instance profile: bison_ec2_s3_role
        Shutdown behavior: Terminate
        Cloudwatch monitoring: Enable
        Purchasing option: Spot instances
        Request type: One-time

* Use the launch template to create a version for each task.
* The launch template task versions must have the task name in the description, and
  have the following script in the userdata::

    #!/bin/bash
    sudo apt-get -y update
    sudo apt-get -y install docker.io
    sudo apt-get -y install docker-compose-v2
    git clone https://github.com/lifemapper/bison.git
    cd bison
    sudo docker compose -f compose.test_task.yml up
    sudo shutdown -h now

* For each task, **compose.test_task.yml** must be replaced with the appropriate
  compose file.
* On EC2 instance startup, the userdata script will execute.
* The compose file sets an environment variable (TASK_APP) containing the python module
  to be executed from the Dockerfile.
* Tasks should deposit outputs and logfiles into S3.
* After completion, the Docker container will stop automatically and the EC2 instance
  will stop because of the shutdown command in the final line of the userdata script.
* **TODO**: once the workflow is stable, to eliminate Docker build time, create a
  Docker image and download it in the userdata script.
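
The launch template can also be created programmatically. A rough boto3 sketch of the
settings above, assuming the userdata script is saved locally as userdata.sh and using
a placeholder AMI id for Ubuntu 24.04 LTS on ARM::

    import base64

    import boto3

    ec2 = boto3.client("ec2")

    with open("userdata.sh", "rb") as f:  # the script shown above
        userdata_b64 = base64.b64encode(f.read()).decode()

    ec2.create_launch_template(
        LaunchTemplateName="bison_spot_task",
        LaunchTemplateData={
            "ImageId": "ami-0123456789abcdef0",  # placeholder Ubuntu 24.04 ARM AMI
            "InstanceType": "t4g.micro",
            "KeyName": "bison-task-key",
            "IamInstanceProfile": {"Name": "bison_ec2_s3_role"},
            "InstanceInitiatedShutdownBehavior": "terminate",
            "Monitoring": {"Enabled": True},
            "InstanceMarketOptions": {
                "MarketType": "spot",
                "SpotOptions": {"SpotInstanceType": "one-time"},
            },
            "UserData": userdata_b64,
        },
    )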