Open
Description
The tap doesn't respect existing replication state: it does not filter out data older than the replication key value stored in the state.
How to reproduce
Github tap configuration
- name: tap-github-repos
  inherit_from: tap-github
  pip_url: git+https://github.com/MeltanoLabs/tap-github.git
  config:
    user_agent: ''
    start_date: '2023-01-01T00:00:00Z'
    searches:
    - name: All repos
      query: apache/*
  variant: meltanolabs
  select:
  - repositories.*
  metadata:
    repositories:
      replication-method: INCREMENTAL
Run a sync that produces 1000 records (the limit for the 'repositories' stream) and a state record.
meltano run tap-github-repos target-jsonl
Run the same sync one more time
meltano run tap-github-repos target-jsonl
Result: the target JSON file contains 2000 records, and every record is duplicated in full.
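For anyone who wants to confirm the duplication on their side, here is a rough sketch that counts repeated records in JSONL output. The helper name `count_duplicates` and the use of `id` as the identifying field are my assumptions, not something taken from the tap or the target.

```python
import json
from collections import Counter


def count_duplicates(lines, key="id"):
    """Return {key_value: occurrences} for every key value seen more than once.

    `lines` is any iterable of JSON strings (for example an open JSONL file).
    `key` is assumed to uniquely identify a record; "id" is a guess.
    """
    counts = Counter(json.loads(line)[key] for line in lines if line.strip())
    return {value: n for value, n in counts.items() if n > 1}
```

Passing the open target file to `count_duplicates` after the second run should return one entry per record, each with a count of 2, if the behaviour described above holds. The file location depends on how target-jsonl is configured.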
The issue can be reproduced on the repositories stream.
I couldn't reproduce this on the issues stream.
I haven't tested other streams.
If the GitHub APIs do not allow fetching data from a specific replication point (at least for the repositories stream), then the tap should filter out those older records itself instead of sending them down the pipeline.
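To illustrate the kind of client-side filtering I have in mind, here is a minimal sketch. It is not the tap's actual code: the function names are hypothetical, and I'm assuming the replication key is an ISO 8601 timestamp field such as `updated_at`.

```python
from datetime import datetime


def parse_ts(value):
    """Parse an ISO 8601 timestamp such as '2023-01-01T00:00:00Z'."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def filter_new_records(records, bookmark, replication_key="updated_at"):
    """Yield only records at or after the bookmark stored in state.

    `bookmark` is the replication key value from the previous run,
    or None on the first run (in which case nothing is filtered).
    `replication_key` defaults to "updated_at", which is an assumption.
    """
    if bookmark is None:
        yield from records
        return
    threshold = parse_ts(bookmark)
    for record in records:
        # Records strictly older than the bookmark are dropped.
        if parse_ts(record[replication_key]) >= threshold:
            yield record
```

The comparison is `>=` rather than `>`, so records whose timestamp equals the bookmark are emitted again. That avoids missing records that share the bookmark timestamp, at the cost of a few repeats that the target would have to deduplicate by key.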