
Incremental replication doesn't respect the current state #196

Open
@emishas

Description


The tap doesn't respect the existing replication state: it does not filter out records older than the replication key value stored in the state.

How to reproduce

GitHub tap configuration (meltano.yml):

  - name: tap-github-repos
    inherit_from: tap-github
    pip_url: git+https://github.com/MeltanoLabs/tap-github.git
    config:
      user_agent: ''
      start_date: '2023-01-01T00:00:00Z'
      searches:
      - name: All repos
        query: apache/*
    variant: meltanolabs
    select:
    - repositories.*
    metadata:
      repositories:
        replication-method: INCREMENTAL

Run a sync that produces 1000 records (the limit for the 'repositories' stream) and a state message:

meltano run tap-github-repos target-jsonl

Run the same sync again:

meltano run tap-github-repos target-jsonl

Result: the target JSONL file now contains 2000 records, and each of the 1000 records from the first run appears twice, fully duplicated.
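To confirm the duplication by key rather than by raw line count, a small Python sketch like the one below can tally records in the output file. It assumes target-jsonl writes one JSON object per line and that each repository record carries GitHub's numeric id field; the helper name duplicate_summary is hypothetical.

```python
import json
from collections import Counter
from typing import Iterable, Tuple


def duplicate_summary(lines: Iterable[str], key: str = "id") -> Tuple[int, int]:
    """Return (total_records, distinct_keys) for an iterable of JSONL lines.

    Assumes every non-blank line is a single JSON record containing `key`.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = json.loads(line)
        counts[record[key]] += 1  # count occurrences of each key value
    return sum(counts.values()), len(counts)
```

For example, opening the target file (the path depends on the target-jsonl settings) and passing the file handle to duplicate_summary would, in the situation described above, report 2000 total records against 1000 distinct ids.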

The issue can be reproduced on the repositories stream.
I couldn't reproduce this on the issues stream.
I haven't tested other streams.

If the GitHub API doesn't allow fetching data starting from a specific replication point (at least for the repositories stream), the tap should filter out the older records itself instead of sending them down the pipeline.
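A minimal sketch of that client-side filtering, written as standalone Python so it runs without the tap. It assumes the replication key is an ISO 8601 timestamp such as updated_at (an assumption; the stream's actual key may differ) and that the bookmark is the replication key value from the state, falling back to start_date on the first run. The helper names parse_ts and filter_by_bookmark are hypothetical.

```python
from datetime import datetime, timezone
from typing import Iterable, Iterator, Optional


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as '2023-05-01T12:00:00Z'.

    datetime.fromisoformat() only accepts a trailing 'Z' from Python 3.11,
    so it is normalized to '+00:00' first.
    """
    dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    # Treat naive timestamps as UTC so naive and aware values never get compared.
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def filter_by_bookmark(
    records: Iterable[dict],
    replication_key: str,  # hypothetical, e.g. "updated_at"
    bookmark: Optional[str],
) -> Iterator[dict]:
    """Yield records whose replication key value is at or after the bookmark."""
    if bookmark is None:
        # No state and no start_date: nothing to compare against.
        yield from records
        return
    threshold = parse_ts(bookmark)
    for record in records:
        value = record.get(replication_key)
        if value is None:
            # No replication key value: pass through rather than drop silently.
            yield record
        elif parse_ts(value) >= threshold:
            # Inclusive comparison: rows equal to the bookmark are re-sent.
            yield record
```

The comparison is inclusive, so records that share the bookmark timestamp may be emitted again; that keeps delivery at-least-once while still dropping the full re-sync seen here. If the tap is built on the Meltano Singer SDK, the same check could plausibly live in the stream's post_process method (returning None excludes a record), with the threshold taken from get_starting_timestamp(context); that placement is a suggestion, not the tap's current behavior.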
