Releases: GoogleCloudPlatform/DataflowTemplates

Dataflow Templates 2023-02-21-00_RC00

21 Feb 22:12

Release Week of 2023-02-21

Note: This release is in the process of rolling out. It may not be in your region yet.

New Templates

  • Pub/Sub to Kafka

Improvements

  • Improve flex-template tutorial to use the plugin

Bug Fixes

  • Import/Export: Support spanner.commit_timestamp columns in PostgreSQL dialect
  • Prevent casting to TemporalAccessor when reading datetimes from MSSQL

Contributors

(Listed alphabetically)

  • andreigurau
  • bvolpato
  • pranavbhandari24

Dataflow Templates 2023-02-07-00_RC01

07 Feb 20:10

Release Week of 2023-02-07

Improvements

  • Improve and clean up README.md, moving plugins documentation up
  • [Avro Import Template] Log schema operations
  • [Datastream To Spanner Template] Add transaction tags to the writes
  • [Performance Tests] Draft of PS Lite to BigTable perf test.
  • [Datastream to BigQuery Template] Add flag and option to use deterministic job id
  • [Templates Plugin] Add conscrypt to classpath before other libraries, exclude shaded JAR containing libconscrypt from common
  • [Integration Tests] waitForConditionAndCancel should trigger job cancellation instead of draining
  • [Syndeo Template] Test for create never behavior on BQ
  • [Integration Tests] Create BigQueryToElasticsearchIT, include GCSToElasticsearchIT ES6 variant
  • Add new license() rules, load statements, and default_applicable_license attributes for root third-party packages
  • [Integration Tests] Increase coverage in Bulk(Compressor|Decompressor)IT
  • [Integration Tests] Create test for FileFormatConversion (+ Avro and Parquet utilities)
  • [Plugin] Improve metadata parent, allow checkstyle check across project
  • [JdbcToBigQuery] ConnectionProperties should not be mandatory
  • [Integration Tests] Initial test for TextImportPipeline (GCS to Spanner) template
  • [Integration Tests] Initial test for PubSubToElasticsearch template
  • [Integration Tests] Change Kafka Bootstrap Server / topics list param to accept commas.
  • [Common] Allow usage of a different project id when doing merge
  • [UDF] Create unit tests for udf-samples
  • [DataStream to BigQuery] Remove deterministic id flag and assign merge jobs to workers by table name and merge concurrency limit instead of randomly

Bug Fixes

  • [Integration Tests] Improve names to use generic Beam terms
  • [Syndeo Template] Generate BQ and PS configs properly
  • [Datastream to BigQuery Template] Fix deterministic uuid generation
  • [Templates Plugin] Fix stage for projects containing ":"
  • [Security] Bump mysql-connector-java to 8.0.30
  • [Syndeo Template] Update proto-java to 3.21.9 to resolve conflicts

Contributors

@bvolpato
@pabloem
@xianhualiu
@rarsan

Dataflow Templates 2023-01-29-00_RC00

30 Jan 21:47

Release Week of 2023-01-29

Improvements

  • [Elasticsearch] Allow specifying a path on the connection url.
  • [Datastore-to-BQ] Flex template with support for BQ Storage Write API
  • Update Templates to Beam 2.44.0 (except Kafka to BigQuery)
  • Add better PostgreSQL support for datastream-to-spanner template
  • A number of integration tests added.

Bug Fixes

  • Fix a bug in the JdbcToBigQuery and DataplexJdbcIngestion templates where the microseconds portion of a timestamp was written incorrectly.

Contributors

@an2x
@bvolpato
@oleg-semenov
@pabloem
@Polber
@pranavbhandari24

Dataflow Templates 2023-01-17-01_RC00

20 Jan 21:00

Release Week of 2023-01-17

Note: This release also includes changes from the release 2023-01-10-00_RC00, which was cancelled. If you're looking for a version that includes a bugfix from 2023-01-10-00_RC00, please use the latest version 2023-01-17-01_RC00 instead.

Improvements

  • [Spanner Change Stream Templates] Support import/export for change streams in Cloud Spanner PostgreSQL-dialect databases.
  • Added JDBC and Spanner sinks to StreamingDataGenerator template.
  • A number of integration tests and resource managers added.

Bug Fixes

  • [Datastream Templates] Fix a bug that sometimes caused duplicate records to be written to the target database.

Contributors

@Abacn
@nancyxu123
@bvolpato
@pranavbhandari24
@nirfi
@Polber

Dataflow Templates 2023-01-10-00_RC00

13 Jan 19:32

Release Week of 2023-01-10

Note: This release has been cancelled and hasn't been fully rolled out to production. Please use version 2023-01-17-01_RC00 or later instead, which includes these changes.

Improvements

  • [Integration Tests] Create Elasticsearch Resource Manager and Create GCS to Elasticsearch integration test
  • [Documentation] Improve documentation for Dataflow CSV import pipeline trailing delimiter.
  • [Flex Templates] Add SecretManagerUtils in v2 common directory

Bug Fixes

  • [Security] Bump postgresql dependency due to CVE-2022-21724 and CVE-2022-31197
  • [Spanner Tests] Fix an invalid column default value in RandomDdlGenerator.

Contributors

@andreigurau
@bvolpato
@oleg-semenov

Dataflow Templates 2023-01-03-00_RC00

10 Jan 18:58

Release Week of 2023-01-03

Improvements

  • [TextIOToBigQuery] Integration test for TextIOToBigQuery flex template
  • [All Templates] Enable available Nashorn engine ES6 support for Teleport templates
  • [Syndeo] An integration test for the Kafka-to-BQ flow in Syndeo
  • [StreamingDataGenerator] Add schema templates to StreamingDataGenerator dataflow template
  • [Flex Templates] Use structured logging for uncaught exceptions, to log them with ERROR severity/level.
  • [Flex Templates] Refactor v2 to remove ValueProviders
  • [Templates in googlecloud-to-googlecloud] Improve dependency management

Bug Fixes

  • [Security] Bump commons-beanutils dependency due to CVE-2019-10086
  • [Security] Bump spring-expression due to CVE-2022-22950
  • [Security] Bump commons-configuration2 dependency due to CVE-2022-33980
  • [Integration Tests] Build plugin dependencies using a specific target folder to prevent ClassLoader issues
  • [Spanner Templates] Unit test fixes
  • [Templates Plugin] Improve plugins documentation + sanitize bucket name arguments
  • [MongoDbToBigQuery] Integration test fixes
  • [Datastore Templates] Do not use @default for Firestore Workers, as it overrides Datastore options

Contributors

@andreigurau
@bvolpato
@oleg-semenov
@pabloem
@Polber
@pranavbhandari24
@nancyxu123
@ryanmadden-google

Dataflow Templates 2022-12-13-00_RC01

14 Dec 00:45

Release Week of 2022-12-13

Note: This release is in the process of rolling out. It may not be in your region yet.

Improvements

  • [BigQuery] Add Storage Write API support to several templates interacting with BigQuery.
  • [DatastreamToSpanner] Support Postgres as a source
  • [Integration Tests] Added several integration / end-to-end tests and resource managers
  • [PubsubAvroToBigQuery] Upgrade PubsubAvroToBigQuery template to support Storage Write API.
  • [Security] Bump dependencies as a response to CVEs
  • [Templates Plugin] Several improvements to metadata annotations and templates plugin

Bug Fixes

  • [All Templates] Add imperative version for Jackson/FasterXML to match Beam 2.43.0
  • [Changestreams] Fix WriteDataChangeRecordsToAvroTest serialization issue
  • [Flex Templates] Make classpath deterministic for Flex Template executions, by always making sure that Conscrypt is loaded first.

Contributors

@Harwayne
@bvolpato
@oleg-semenov
@pranavbhandari24

Dataflow Templates 2022-12-05-00_RC00

02 Dec 21:45

Release Week of 2022-12-05

Improvements

  • [All templates] Introduce metadata annotations.
  • [DataStreamToBigQuery] Expose mergeConcurrency option and rethrow the error when a merge statement fails.
  • Trigger Java PR workflow when any XML is changed + files are deleted.
  • [Classic templates] Support JSONB arrays.
  • [PubSubCdcToBigQuery] Support maxStreamingBatchSize parameter
  • [DatastreamToSpanner] Add changes for the new HarbourBridge session file with tableID and columnID support.
  • [All templates] Upgrade Beam version to 2.43.
  • [Classic templates] Prepare plugin infra to test classic templates + Create BulkCompressionIT
  • [Integration Tests] Do not make artifactBucket mandatory (only if bucketName not provided for ITs)
  • [DataStreamToSpanner] Change default values for dlqRetryMinutes and dlqMaxRetryCount params.
  • [Integration Tests] Avoid Joiner conflict, and improve plugin staging speed
  • [Flex templates] Plain text logging for Flex Templates unit tests
  • [Integration Tests] Improve plugin bucket parameter requirements
  • [SpannerChangeStreamsTemplates] Simplify the code of setting experiments for spanner change streams to BigQuery and spanner change streams to GCS templates.
  • [Integration Tests] Create MongoDB Resource Manager
  • [Integration Tests] Create MongoDBToBigQueryIntegrationTest
  • [Integration Tests] Add TestContainers framework
  • [Integration Tests] Create PubsubAvroToBigQueryIT + prepare profile to run integration tests together
  • [MongoDBToBigQuery] Create udf for MongoDB BigQuery Templates
  • [Security] Update hadoop version affected by CVE-2022-25168
  • Improve Templates Plugin instructions
  • [Syndeo templates] Separating JSON build

Bug Fixes

  • [Classic templates] WindowedFilenamePolicy's dayPattern defaults to dd instead of DD
  • [JDBC templates] Do not log unencrypted values/keys to the console
  • [DataStream templates] Rethrow exception from ExtractGcsFile so that Dataflow will retry the ParDo
  • [Integration Tests] Fix integration tests parameters passing
  • [Flex templates] Fix log dependencies (log4j initialization error)

Contributors

@bvolpato
@oleg-semenov
@pabloem
@Polber
@pranavbhandari24
@theshanbhag

Dataflow Templates 2022-11-16-00_RC00

17 Nov 23:15

Release Week of 2022-11-16

New Templates

  • Text-to-BigQuery Flex template with support for BQ Storage Write API

Improvements

  • [All templates] Upgrade Beam version to 2.42.
  • [Batch Flex template with BQ Sinks] BQ Storage Write API options for Batch Flex Templates
  • [Syndeo] Add Jib to syndeo-template module and bind it to the package phase for the Syndeo template
  • [Syndeo] Handle schemas that are not provided by configuration
  • [Flex templates] Enable structured logging for v2 templates
  • [All templates] Enforce Conscrypt version 2.5.2 (matching Beam 2.42.0)
  • [All templates] Update commons-text
  • [BigTable templates] update bigtable-beam-import version
  • [Spanner templates] Support new value capture types (NEW_VALUES and NEW_ROW).

Bug Fixes

  • Fix pom file name in .github/workflows/prepare-java-cache.yml
  • [Unit tests] Reduce verbosity of unit test logs

Contributors

@bvolpato
@oleg-semenov
@pabloem
@pranavbhandari24
@zhoufek

Dataflow Templates 2022-11-01-00_RC00

03 Nov 14:33

Release Week of 2022-11-01

Note: This release is in the process of rolling out. It may not be in your region yet.

Bug Fixes

  • [CDC] Print exception with "Avro File Read Failure" logs. Fixes #450

Contributors

Bruno Volpato @bvolpato