Setup Guide
Please follow these steps to set up the Snowplow Indicative Relay on AWS Lambda:
If you do not have an Indicative account, go to http://app.indicative.com/login/#/register to create one.
- If you are a new Indicative user, go to https://app.indicative.com/#/onboarding/snowplow, then select Snowplow and copy the API Key. Save it; you will need it later.
- If you want to send data to an existing project, go to https://app.indicative.com/#/account/projects and copy the API Key of the project you want to send data to.
Your AWS Lambda needs to have an Execution Role that allows it to use the Kinesis Stream and CloudWatch. Open the AWS Management Console and follow these steps:
- Go to IAM Management in the Console, choose Roles from the sidebar, then click Create role.
- For the type of trusted entity select AWS Service, and for the service that will use this role choose Lambda.
- Now you need to choose a permission policy for the role. The Lambda needs to have read access to Kinesis and write access to CloudWatch logs - for that we will choose AWSLambdaKinesisExecutionRole.
- On the next screen provide a name for the newly created role, then click Create role to finish the process. If you prefer to script this part, see the sketch below.
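A minimal sketch of the same role creation using Python and boto3 follows. The role name snowplow-indicative-relay-role is only an illustrative placeholder; the managed policy ARN is the standard one for AWSLambdaKinesisExecutionRole.

```python
import json
import boto3

iam = boto3.client("iam")

# Trust policy that allows the Lambda service to assume this role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "lambda.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

role = iam.create_role(
    RoleName="snowplow-indicative-relay-role",  # placeholder name
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

# Read access to Kinesis plus write access to CloudWatch Logs.
iam.attach_role_policy(
    RoleName="snowplow-indicative-relay-role",
    PolicyArn="arn:aws:iam::aws:policy/service-role/AWSLambdaKinesisExecutionRole",
)

print(role["Role"]["Arn"])  # keep this ARN; the Lambda function will use it
```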
As with the IAM Role, we will be using the AWS Console to get our Lambda function up and running.
- On the Console navigate to the Lambda section and click Create a function. Runtime should be Java 8. In the Role dropdown pick Choose an existing role, then in the dropdown below choose the name of the role you created in the previous part of the guide. Click Create function.
- The Lambda has been created, although it does not do anything yet. We need to provide the code and configure the function. Take a look at the Function code box. In the Handler textbox paste:
com.snowplowanalytics.indicative.LambdaHandler::recordHandler
From the Code entry type dropdown pick Upload a file from Amazon S3. A textbox labeled S3 Link URL will appear. We are hosting the code through our hosted assets. You will need to choose the S3 bucket in the same region as your AWS Lambda function; for example, if your Lambda is in the us-east-1 region, paste the following URL in the textbox:
s3://snowplow-hosted-assets-us-east-1/relays/indicative/indicative-relay-0.4.0.jar
Take a look at this table to pick the right bucket name for your region. A scripted equivalent of this step is sketched below.
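For reference, the same step can be scripted with boto3. This sketch assumes a us-east-1 deployment, the role ARN from the previous part, and a placeholder function name:

```python
import boto3

awslambda = boto3.client("lambda", region_name="us-east-1")

awslambda.create_function(
    FunctionName="snowplow-indicative-relay",  # placeholder name
    Runtime="java8",
    Role="arn:aws:iam::123456789012:role/snowplow-indicative-relay-role",  # placeholder ARN
    Handler="com.snowplowanalytics.indicative.LambdaHandler::recordHandler",
    Code={
        # Use the hosted-assets bucket that matches your Lambda's region.
        "S3Bucket": "snowplow-hosted-assets-us-east-1",
        "S3Key": "relays/indicative/indicative-relay-0.4.0.jar",
    },
)
```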
- Below Function code settings you will find a section called Environment variables. You need to use these environment variables to configure some additional settings for the relay, such as the API key and filters.
3.1. Setting up the API key
In the first row, first column (the key), type INDICATIVE_API_KEY. In the second column (the value), paste your API Key obtained at the beginning of this guide.
3.2. Setting up filters
The relay lets you configure the following filters:
- UNUSED_EVENTS: events that will not be relayed to Indicative;
- UNUSED_ATOMIC_FIELDS: fields of the canonical Snowplow event that will not be relayed to Indicative;
- UNUSED_CONTEXTS: contexts whose fields will not be relayed to Indicative.
Out of the box, the relay is configured to use the following defaults:
| Unused events | Unused atomic fields | Unused contexts |
|---|---|---|
| app_heartbeat | etl_tstamp | application_context |
| app_initialized | collector_tstamp | application_error |
| app_shutdown | dvce_created_tstamp | duplicate |
| app_warning | event | geolocation_context |
| create_event | txn_id | instance_identity_document |
| emr_job_failed | name_tracker | java_context |
| emr_job_started | v_tracker | jobflow_step_status |
| emr_job_status | v_collector | parent_event |
| emr_job_succeeded | v_etl | performance_timing |
| incident | user_fingerprint | timing |
| incident_assign | geo_latitude | |
| incident_notify_of_close | geo_longitude | |
| incident_notify_user | ip_isp | |
| job_update | ip_organization | |
| load_failed | ip_domain | |
| load_succeeded | ip_netspeed | |
| page_ping | page_urlscheme | |
| s3_notification_event | page_urlport | |
| send_email | page_urlquery | |
| send_message | page_urlfragment | |
| storage_write_failed | refr_urlscheme | |
| stream_write_failed | refr_urlport | |
| task_update | refr_urlquery | |
| wd_access_log | refr_urlfragment | |
| | pp_xoffset_min | |
| | pp_xoffset_max | |
| | pp_yoffset_min | |
| | pp_yoffset_max | |
| | br_features_pdf | |
| | br_features_flash | |
| | br_features_java | |
| | br_features_director | |
| | br_features_quicktime | |
| | br_features_realplayer | |
| | br_features_windowsmedia | |
| | br_features_gears | |
| | br_features_silverlight | |
| | br_cookies | |
| | br_colordepth | |
| | br_viewwidth | |
| | br_viewheight | |
| | dvce_ismobile | |
| | dvce_screenwidth | |
| | dvce_screenheight | |
| | doc_charset | |
| | doc_width | |
| | doc_height | |
| | tr_currency | |
| | mkt_clickid | |
| | etl_tags | |
| | dvce_sent_tstamp | |
| | refr_domain_userid | |
| | refr_device_tstamp | |
| | derived_tstamp | |
| | event_vendor | |
| | event_name | |
| | event_format | |
| | event_version | |
| | event_fingerprint | |
| | true_tstamp | |
To change the defaults, you can pass in your own lists of events, atomic fields or contexts to be filtered out. For example:
| Environment variable key | Environment variable value |
|---|---|
| UNUSED_EVENTS | page_ping,file_download |
| UNUSED_ATOMIC_FIELDS | name_tracker,event_vendor |
| UNUSED_CONTEXTS | performance_timing,client_context |
As when setting up the API key, the first column (the key) needs to be set to the specified environment variable name in ALL CAPS. The second column (the value) is your own list, as a comma-separated string with no spaces.
If you only specify the environment variable name but do not provide a list of values, then nothing will be filtered out.
If you do not set any of the environment variables, the defaults will be used.
3.3. Setting up the Indicative API URI
By default, the relay uses the standard URI. To change that, you can set the INDICATIVE_URI environment variable.
3.4. Setting up the field whose value should be used as the event name for struct events
In Snowplow's canonical event model, there is a legacy type of custom structured event, known as a struct or 'structured event'. These are still fairly popular with users; however, the value of the event_name field for those events (which is simply event) can be confusing. To help group similar events, Snowplow users often designate one of their fields (most commonly se_action) to be the 'event name field'.
Since version 0.5.0, se_action is used as the event name field for structured events by default. You can override that by setting the Lambda environment variable STRUCTURED_EVENT_NAME_FIELD to the field whose value you'd rather use, e.g. se_category.
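All of the settings in sections 3.1-3.4 are ordinary environment variables, so they can also be applied with boto3. Note that update_function_configuration replaces the function's whole Variables map, so pass everything you need in a single call; the values below are examples only.

```python
import boto3

awslambda = boto3.client("lambda", region_name="us-east-1")

awslambda.update_function_configuration(
    FunctionName="snowplow-indicative-relay",  # placeholder name
    Environment={
        "Variables": {
            "INDICATIVE_API_KEY": "your-api-key",  # obtained at the start of this guide
            "UNUSED_EVENTS": "page_ping,file_download",              # example filter
            "UNUSED_ATOMIC_FIELDS": "name_tracker,event_vendor",     # example filter
            "UNUSED_CONTEXTS": "performance_timing,client_context",  # example filter
            # Optional overrides from sections 3.3 and 3.4:
            # "INDICATIVE_URI": "https://...",  # hypothetical value
            # "STRUCTURED_EVENT_NAME_FIELD": "se_category",
        }
    },
)
```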
- Scroll down a bit and take a look at the Basic settings box. There you can set memory and timeout limits for the Lambda. We recommend setting 256 MB of memory or higher (on AWS Lambda, CPU performance scales linearly with the amount of memory). The timeout should be set quite high - we recommend a minute and a half - because of so-called JVM cold starts. Cold starts happen when the Lambda function is invoked for the first time on a new instance, and they can take a significant amount of time. A scripted equivalent follows below.
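A sketch of the same settings applied with boto3, under the placeholder function name used earlier:

```python
import boto3

awslambda = boto3.client("lambda", region_name="us-east-1")

awslambda.update_function_configuration(
    FunctionName="snowplow-indicative-relay",  # placeholder name
    MemorySize=256,  # in MB; CPU scales linearly with memory
    Timeout=90,      # in seconds; generous enough to absorb JVM cold starts
)
```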
- Now let's add our enriched Kinesis stream as an event source for the function. From the list of triggers in the Designer configuration up top, choose Kinesis.
Take a look at the Configure triggers section which just appeared below. Choose the Kinesis stream that contains your Snowplow enriched events. Set the batch size to your liking - 100 is a reasonable setting. Note that this is a maximum batch size; the function can be triggered with fewer records. For the starting position we recommend Trim horizon, which starts processing from the oldest available records in the stream. Click the Add button to finish the trigger configuration. Make sure Enable trigger is selected. A scripted equivalent is sketched below.
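In boto3 terms, adding the trigger means creating an event source mapping. A sketch, assuming a placeholder stream ARN and the placeholder function name from earlier:

```python
import boto3

awslambda = boto3.client("lambda", region_name="us-east-1")

awslambda.create_event_source_mapping(
    EventSourceArn="arn:aws:kinesis:us-east-1:123456789012:stream/enriched-good",  # placeholder
    FunctionName="snowplow-indicative-relay",  # placeholder name
    StartingPosition="TRIM_HORIZON",  # start from the oldest available records
    BatchSize=100,  # maximum records per invocation
    Enabled=True,   # equivalent to ticking "Enable trigger"
)
```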
- Save the changes by clicking the Save button in the top-right part of the page.
After a while the events should start flowing into Indicative. You can go to Settings -> Events and Properties to see incoming event types and change their labels, descriptions and categories.