Skip to content

Releases: snowplow-incubator/snowplow-event-recovery

Version 0.7.2

26 Jun 16:25
Compare
Choose a tag to compare

This version adds an iglu-webhook workaround for snowplow-badrows inconsistency in encoding query parameters.

As a reminder, since 0.7.0, recovery comes with a useful cli for simplifying testing and validating configurations.

Version 0.7.1

18 Jun 12:05
Compare
Choose a tag to compare

Fixes recovering failed events for querystring objects containing name-value pair objects with values not encoded as string.

Version 0.7.0

24 Apr 17:42
Compare
Choose a tag to compare

This release changes configuration semantics and adds local testing cli.

Configuration semantics change

Previously, only the first config matching bad row was triggered. Now, all configs for which conditions are fulfilled will be combined
and ran in a sequence. This makes modularising configurations and building reusable scenarios much easier.

Local testing CLI

Previously we used out-of-tree script which was hard to manage and got deprecated along the way. N
The current CLI allows for validating configurations:

snowplow-event-recovery validate --config /opt/scenarios.json
OK! Config valid

And running small-scale tests outputting both good and bad events into files:

snowplow-event-recovery run --config $PWD/scenarios.json -o /tmp -i $PWD/input/ --resolver $PWD/resolver.json
Config loaded
Enriching event...
Enriched event 2dfeb9b7-5a87-4214-8a97-a8b23176856b
...
Total Lines: 156, Recovered: 142

It allows for configuring enrichments (minus ones that depend on upstream storage for now) and setting custom iglu resolver:

snowplow-event-recovery run --config $PWD/scenarios.json -o /tmp -i $PWD/input/ --resolver $PWD/resolver.json --enrichments $PWD/enrichments

Total Lines: 3, Recovered: 0
OK! Successfully ran recovery.

Version 0.6.1

23 Nov 11:42
Compare
Choose a tag to compare

This patch release brings a couple of small improvements to the process of recovering base64 encoded values. It also provides dependency upgrades to mitigate security vulnerabilities.

Changes

Allow filtering base64-encoded fields in conditions (#134)
Upgrade dependencies to mitigate security vulnerabilities (#136)
Handle url-unsafe Base64 encodings (#137)

Version 0.6.0

25 May 19:35
Compare
Choose a tag to compare

This release deprecates Spark runner. Spark Kinesis connection is not reliable enough for data recovery.
Meanwhile we've had overall good experience with Flink's connector. Therefore we bring Flink runner to event-recovery.

Spark runner is unlikely to see any new major releases in future. We advise moving to Flink.

Flink runner

Event recovery jobs in EMR are ran using our dataflow-runner CLI. An example configuration for running is available in this repository.

To run a transient EMR cluster and execute the job, run:

dataflow-runner run-transient \
  --emr-playbook $REPO_DIR/spark-playbook.json.tmpl \
  --emr-config $REPO_DIR/spark-cluster.json.tmpl \
  --vars bucket,$BUCKET_INPUT,region,AWS_$REGION,subnet,$AWS_SUBNET,role,$AWS_IAM_ROLE,keypair,$AWS_KEYPAIR,client,$JOB_OWNER,version,$RECOVERY_VERSION,config,$RECOVERY_CONFIG,resolver,$IGLU_RESOLVER,output,$KINESIS_OUTPUT,inputdir,$BUCKET_INPUT_DIRECTORY,interval,$INTERVAL

Checkpointing

The job conditionally allows checkpointing to allow for job restarts. Kinesis connector which is a part of the checkpointing mechanism guarantees at-least-once delivery semantics and provides backpressure to the job.
To enable checkpointing set --checkpoint-interval and --checkpoint-dir flags.

Monitoring

EMR Flink Job emits a wide range of metrics through Cloudwatch Agent which is installed during cluster bootstrap using a provided script. Custom metrics include:

  • number of unrecoverable events in the run: taskmanager_container_XXXX_XXX__Process_0_events_unrecoverable
  • number of failed recovery attempts taskmanager_container_XXXX_XXX__Process_0_events_failed
  • number of recovered events: taskmanager_container_XXXX_XXX__Process_0_events_recovered
  • total number of events submitted for processing: taskmanager_container_XXXX_XXX__Process_0_numRecordsIn

The metrics are delivered to snowplow/event-recovery Cloudwatch namespace.

Given a one-second aggregation in Cloudwatch the diagram should show an always increasing metrics that should balance themselves up to total sum.

ℹ️ Please note, that Flink's numRecordsOut metrics for sinks does not reflect an actual number of records saved in the sink.

Changes

Set event recovery version dynamically in bad rows (#84)
Add stable Flink kinesis job and deprecate Spark (#132)
Add devenv shell and pre-commit hooks
Add dataflow-runner templates

Version 0.5.2

28 Apr 06:15
Compare
Choose a tag to compare

This is a minor release adding exporting Spark metrics to Cloudwatch.
To configure the export set two new flags --cloudwatch-namespace and --cloudwatch-dimensions.
For example, to use snowplow/event-recovery namespace and customer: com.snplw.eng, job: recovery dimensions, dataflow-runner section for this job may look like this:

{
        "type": "CUSTOM_JAR",
        "name": "snowplow-event-recovery",
        "actionOnFailure": "CANCEL_AND_WAIT",
        "jar": "command-runner.jar",
        "arguments": [
          "spark-submit",
          "--class",
          "com.snowplowanalytics.snowplow.event.recovery.Main",
          "--master",
          "yarn",
          "--deploy-mode",
          "cluster",
          "s3://recovery-test/snowplow-event-recovery-spark-0.5.2.jar",
          "--input",
          "hdfs:///local/to-recover/",
          "--directoryOutput",
          "hdfs:///local/recovered/",
          "--region",
          "eu-central-1",
          "--failedOutput",
          "s3://recovery-test/bad-rows/left",
          "--unrecoverableOutput",
          "s3://ecovery-test/bad-rows/left/unrecoverable",
          "--resolver",
          "eyJzY2hlbWEiOiJpZ2x1OmNvbS5zbm93cGxvd2FuYWx5dGljcy5pZ2x1L3Jlc29sdmVyLWNvbmZpZy9qc29uc2NoZW1hLzEtMC0xIiwiZGF0YSI6eyJjYWNoZVNpemUiOjAsInJlcG9zaXRvcmllcyI6W3sibmFtZSI6ICJJZ2x1IENlbnRyYWwiLCJwcmlvcml0eSI6IDAsInZlbmRvclByZWZpeGVzIjogWyAiY29tLnNub3dwbG93YW5hbHl0aWNzIiBdLCJjb25uZWN0aW9uIjogeyJodHRwIjp7InVyaSI6Imh0dHA6Ly9pZ2x1Y2VudHJhbC5jb20ifX19XX19",
          "--config",
          "eyAic2NoZW1hIjogImlnbHU6Y29tLnNub3dwbG93YW5hbHl0aWNzLnNub3dwbG93L3JlY292ZXJpZXMvanNvbnNjaGVtYS80LTAtMCIsICJkYXRhIjogeyAiaWdsdTpjb20uc25vd3Bsb3dhbmFseXRpY3Muc25vd3Bsb3cuYmFkcm93cy9lbnJpY2htZW50X2ZhaWx1cmVzL2pzb25zY2hlbWEvMS0wLTAiOiBbeyJuYW1lIjogInBhc3N0aHJvdWdoIiwgImNvbmRpdGlvbnMiOiBbXSwgInN0ZXBzIjogW119XX19",
          "--cloudwatch-namespace",
          "snowplow/event-recovery",
          "--cloudwatch-dimensions",
          "customer:com.snplw.eng;job:recovery"
        ]
      }

Changes

Add basic cloudwatch metrics (#128)
Remove flink (#127)
Add CPFV handling for double Base64-encoded payloads (#127)

Version 0.5.1

12 Aug 18:49
Compare
Choose a tag to compare

Bump framework dependencies (#121)
Fix Snyk github action (#119)

Version 0.5.0

15 Apr 15:59
Compare
Choose a tag to compare

A maintenance dependency-update release. Does not include new features or bugfixes.

Changes

Bump cats to 2.7.0 (#112)
Bump circe-optics to 0.14.1 (#109)
Bump circe to 0.14.1 (#108)
Bump cats-effect to 2.5.3 (#107)
Bump monocle-macro to 2.1.0 (#106)
Bump cats-core to 2.6.1 (#105)
Bump snowplow-badrows to 2.1.1 (#104)
Bump iglu-scala-client to 1.1.1 (#103)
Bump spark to 3.1.2 (#102)
Bump jackson-databind to 2.10.5.1 (#101)
Bump slf4j to 1.7.36 (#100)
Bump beam-runners-google-cloud-dataflow-java to 2.36.0 (#99)
Bump scio to 0.11.5 (#98)
Add tag for beam sink (#115)
Add bootstrap script for EMR (#118)
Add detailed check for integration spec (#110)
Change Docker base image to eclipse-temurin:11-jre-focal (#93)

Version 0.4.1

28 Mar 08:30
Compare
Choose a tag to compare

This is a bugfix release addressing too generic handling of ue_px and iglu-webhook logic.

Changes

Only extract iglu-webhook messages to top-level event (#96)

Version 0.4.0

17 Feb 12:32
Compare
Choose a tag to compare

Fixes and improvements release adding ue_pr support, Add step and array multi-matching.

Changes

Bump sbt-native-packager to 1.7.6 (#89)
Bump beam base image to 0.2.0 (#88)
Bump scalafmt to 1.6.1 (#85)
Use github actions for CI (#87)
Raise instead of not modifying when querystring invalid (#86)
Coerce ue_pr to CollectorPayload from RawEvent (#81)
Add check transformation for url-encoded fields (closes #77)
Greedy array matcher (close #91)
Add Add step (#80)