Releases: GoogleCloudPlatform/DataflowTemplates
Dataflow Templates 2023-02-21-00_RC00
Release Week of 2023-02-21
Note: This release is in the process of rolling out. It may not be in your region yet.
New Templates
- Pub/Sub to Kafka
Improvements
- Improve flex-template tutorial to use the plugin
Bug Fixes
- Import/Export: Support spanner.commit_timestamp columns in PostgreSQL dialect
- Prevent casting to TemporalAccessor when reading datetimes from MSSQL
Contributors
(Listed alphabetically)
- andreigurau
- bvolpato
- pranavbhandari24
Dataflow Templates 2023-02-07-00_RC01
Release Week of 2023-02-07
Improvements
- Improve and cleanup README.md, moving plugins documentation up
- [Avro Import Template] Logging schema operations.
- [Datastream To Spanner Template] Add transactions tags to the writes.
- [Performance Tests] Draft of PS Lite to BigTable perf test.
- [Datastream to BigQuery Template] Add flag and option to use deterministic job id
- [Templates Plugin] Add conscrypt to classpath before other libraries, exclude shaded JAR containing libconscrypt from common
- [Integration Tests] waitForConditionAndCancel should trigger job cancellation instead of draining
- [Syndeo Template] Test for create never behavior on BQ
- [Integration Tests] Create BigQueryToElasticsearchIT, include GCSToElasticsearchIT ES6 variant
- Adds new license() rules, load statements, and default_applicable_license attributes for root third party packages.
- [Integration Tests] Increase coverage in Bulk(Compressor|Decompressor)IT
- [Integration Tests] Create test for FileFormatConversion (+ Avro and Parquet utilities)
- [Plugin] Improve metadata parent, allow checkstyle check across project
- [JdbcToBigQuery] ConnectionProperties should not be mandatory
- [Integration Tests] Initial test for TextImportPipeline (GCS to Spanner) template
- [Integration Tests] Initial test for PubSubToElasticsearch template
- [Integration Tests] Change Kafka Bootstrap Server / topics list param to accept commas.
- [Common] Allow usage of a different project id when doing merge
- [UDF] Create unit tests for udf-samples
- [DataStream to BigQuery] Remove deterministic id flag and assign merge jobs to workers by table name and merge concurrency limit instead of randomly
Bug Fixes
- [Integration Tests] Improve names to use generic Beam terms
- [Syndeo Template] Generate BQ and PS configs properly
- [Datastream to BigQuery Template] Fix deterministic uuid generation
- [Templates Plugin] Fix stage for projects containing ":"
- [Security] Bump mysql-connector-java to 8.0.30
- [Syndeo Template] Update proto-java to 3.21.9 to resolve conflicts
Contributors
Dataflow Templates 2023-01-29-00_RC00
Release Week of 2023-01-29
Improvements
- [Elasticsearch] Allow specifying a path on the connection url.
- [Datastore-to-BQ] Flex template with support for BQ Storage Write API
- Update Templates to Beam 2.44.0 (except kafka to bigquery)
- Add better PostgreSQL support for datastream-to-spanner template
- A number of integration tests added.
Bug Fixes
- Fix bug in JdbcToBigQuery and DataplexJdbcIngestion templates where the microseconds portion of a timestamp was written incorrectly.
Contributors
@an2x
@bvolpato
@oleg-semenov
@pabloem
@Polber
@pranavbhandari24
Dataflow Templates 2023-01-17-01_RC00
Release Week of 2023-01-17
Note: This release also includes changes from the release 2023-01-10-00_RC00, which was cancelled. If you're looking for a version that includes a bugfix from 2023-01-10-00_RC00, please use the latest version 2023-01-17-01_RC00 instead.
Improvements
- [Spanner Change Stream Templates] Support import/export for change streams in Cloud Spanner PostgreSQL-dialect databases.
- Added JDBC and Spanner sinks to StreamingDataGenerator template.
- A number of integration tests and resource managers added.
Bug Fixes
- [Datastream Templates] Fix a bug that sometimes caused duplicate records to be written to the target database.
Contributors
@Abacn
@nancyxu123
@bvolpato
@pranavbhandari24
@nirfi
@Polber
Dataflow Templates 2023-01-10-00_RC00
Release Week of 2023-01-10
Note: This release has been cancelled and hasn't been fully rolled out to production. Please use version 2023-01-17-01_RC00 or later instead, which includes these changes.
Improvements
- [Integration Tests] Create Elasticsearch Resource Manager and Create GCS to Elasticsearch integration test
- [Documentation] Improve documentation for Dataflow CSV import pipeline trailing delimiter.
- [Flex Templates] Add SecretManagerUtils in v2 common directory
Bug Fixes
- [Security] Bump postgresql dependency due to CVE-2022-21724 and CVE-2022-31197
- [Spanner Tests] Fix an invalid column default value in RandomDdlGenerator.
Contributors
Dataflow Templates 2023-01-03-00_RC00
Release Week of 2023-01-03
Improvements
- [TextIOToBigQuery] Integration test for TextIOToBigQuery flex template
- [All Templates] Enable available Nashorn engine ES6 support for Teleport templates
- [Syndeo] An integration test for the Kafka-to-BQ flow in Syndeo
- [StreamingDataGenerator] Add schema templates to StreamingDataGenerator dataflow template
- [Flex Templates] Use structured logging for uncaught exceptions, to log them with ERROR severity/level.
- [Flex Templates] Refactor v2 to remove ValueProviders
- [Templates in googlecloud-to-googlecloud] Improve dependency management
Bug Fixes
- [Security] Bump commons-beanutils dependency due to CVE-2019-10086
- [Security] Bump spring-expression due to CVE-2022-22950
- [Security] Bump commons-configuration2 dependency due to CVE-2022-33980
- [Integration Tests] Build plugin dependencies using a specific target folder to prevent ClassLoader issues
- [Spanner Templates] Unit test fixes
- [Templates Plugin] Improve plugins documentation + sanitize bucket name arguments
- [MongoDbToBigQuery] Integration test fixes
- [Datastore Templates] Do not use @default for Firestore Workers, as it overrides Datastore options
Contributors
@andreigurau
@bvolpato
@oleg-semenov
@pabloem
@Polber
@pranavbhandari24
@nancyxu123
@ryanmadden-google
Dataflow Templates 2022-12-13-00_RC01
Release Week of 2022-12-13
Note: This release is in the process of rolling out. It may not be in your region yet.
Improvements
- [BigQuery] Add support to Storage Write API to several templates interacting with BigQuery.
- [DatastreamToSpanner] Support Postgres as a source
- [Integration Tests] Added several integration / end-to-end tests and resource managers
- [PubsubAvroToBigQuery] Upgrade PubsubAvroToBigQuery template to support Storage Write API.
- [Security] Bump dependencies as a response to CVEs
- [Templates Plugin] Several improvements to metadata annotations and templates plugin
Bug Fixes
- [All Templates] Add imperative version for Jackson/FasterXML to match Beam 2.43.0
- [Changestreams] Fix WriteDataChangeRecordsToAvroTest serialization issue
- [Flex Templates] Make classpath deterministic for Flex Template executions, by always making sure that Conscrypt is loaded first.
Contributors
Dataflow Templates 2022-12-05-00_RC00
Release Week of 2022-12-05
Improvements
- [All templates] Introduce metadata annotations.
- [DataStreamToBigQuery] Expose mergeConcurrency option and re-throw error on merge statement failure.
- Trigger Java PR workflow when any XML is changed + files are deleted.
- [Classic templates] Support JSONB arrays.
- [PubSubCdcToBigQuery] Support maxStreamingBatchSize parameter
- [DatastreamToSpanner] Add changes for the new HarbourBridge session file with tableID and columnID support.
- [All templates] Upgrade Beam version to 2.43.
- [Classic templates] Prepare plugin infra to test classic templates + Create BulkCompressionIT
- [Integration Tests] Do not make artifactBucket mandatory (only if bucketName not provided for ITs)
- [DataStreamToSpanner] Change default values for dlqRetryMinutes and dlqMaxRetryCount params.
- [Integration Tests] Avoid Joiner conflict, and improve plugin staging speed
- [Flex templates] Plain text logging for Flex Templates unit tests
- [Integration Tests] Improve plugin bucket parameter requirements
- [SpannerChangeStreamsTemplates] Simplify the code that sets experiments for the Spanner change streams to BigQuery and Spanner change streams to GCS templates.
- [Integration Tests] Create MongoDB Resource Manager
- [Integration Tests] Create MongoDBToBigQueryIntegrationTest
- [Integration Tests] Add TestContainers framework
- [Integration Tests] Create PubsubAvroToBigQueryIT + prepare profile to run integration tests together
- [MongoDBToBigQuery] Create udf for MongoDB BigQuery Templates
- [Security] Update hadoop version affected by CVE-2022-25168
- Improve Templates Plugin instructions
- [Syndeo templates] Separating JSON build
Bug Fixes
- [Classic templates] WindowedFilenamePolicy's dayPattern defaults to dd instead of DD
- [JDBC templates] Do not log unencrypted values/keys to the console
- [DataStream templates] Rethrow exception from ExtractGcsFile so that Dataflow will retry the ParDo
- [Integration Tests] Fix integration tests parameters passing
- [Flex templates] Fix log dependencies (log4j initialization error)
Contributors
@bvolpato
@oleg-semenov
@pabloem
@Polber
@pranavbhandari24
@theshanbhag
Dataflow Templates 2022-11-16-00_RC00
Release Week of 2022-11-16
New Templates
- Text-to-BigQuery Flex template with support for BQ Storage Write API
Improvements
- [All templates] Upgrade Beam version to 2.42.
- [Batch Flex template with BQ Sinks] BQ Storage Write API options for Batch Flex Templates
- [Syndeo] Add Jib to syndeo-template module and bind it to the package phase for the Syndeo template
- [Syndeo] Handle schemas that are not provided by configuration
- [Flex templates] Enable structured logging for v2 templates
- [All templates] Enforce Conscrypt version 2.5.2 (matching Beam 2.42.0)
- [All templates] Update commons-text
- [BigTable templates] Update bigtable-beam-import version
- [Spanner templates] Support new value capture types (NEW_VALUES and NEW_ROW).
Bug Fixes
- pom file name fix in .github/workflows/prepare-java-cache.yml
- [Unit tests] Reducing verbosity of unit test logs