Skip to content

Exclude all rows or limit n rows per stream #13

@aaronsteers

Description

@aaronsteers

Sometimes it's useful to run an EL pipeline with zero rows in order to create the target tables but not yet load data into them.

In the SDK for taps, we recently added --test=schema to emit only the SCHEMA messages without emitting any RECORD messages.

For non-SDK taps, it would be helpful to have a similar option that excludes all rows or all rows past the first n records.

Note:

  • For safety, we should probably also not pass along any STATE messages when running in this mode. Since we'd be dropping records intentionally, we would not want to pass a bookmark that implied records had been written which were not actually passed.
  • To the extent that the tap is still having to process records, there's still a performance hit from having to read all the records from source. However, most pipelines are constrained on target performance, so by skipping the write process, there should still be significant gains for many/most use cases.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions