-
Notifications
You must be signed in to change notification settings - Fork 17
Description
I am currently fixing the incremental replications. I think the current state of this part of the code is quite broken.
https://github.com/singer-io/tap-chargify/blob/master/tap_chargify/streams.py#L104-L111
Some streams are currently not ordered (ex Subscriptions) and the bookmark is updated for each record. Then before yielding a record we check to see if it is newer than the bookmark. As the stream is not ordered, it could be that the record is older than the bookmark, whereas it should legitimately write this record.
To make it clear let's take an example.
Say that the state for the Subscription stream is "updated_at": "2021-04-01T02:06:58.147Z"
The first query on Chargify API will be
https://subdomain.chargify.com/subscriptions.json?page=1&per_page=200&start_datetime=2021-04-01T02%3A06%3A58.147Z&date_field=updated_at&direction=asc
it will return unsorted records such as (all fields removed but id and updated_at)
{"id": 38025480, "updated_at": "2021-04-01T12:21:37.000000Z"}
{"id": 11255022, "updated_at": "2021-04-01T12:21:37.000000Z"}
{"id": 30812776, "updated_at": "2021-04-01T13:04:02.000000Z"}
{"id": 37986658, "updated_at": "2021-04-01T12:21:37.000000Z"}
{"id": 23329748, "updated_at": "2021-04-01T13:04:03.000000Z"}
{"id": 37925200, "updated_at": "2021-04-01T13:06:28.000000Z"}
{"id": 39472331, "updated_at": "2021-04-01T13:06:28.000000Z"}
{"id": 37925203, "updated_at": "2021-04-01T13:06:29.000000Z"}
{"id": 32061301, "updated_at": "2021-04-01T13:06:28.000000Z"}The first step is of course to sort the stream, which is quite easy thanks to Chargify API.
However, it is not sufficient. As you can notice there are several records with the same updated_at value (the example above is from real data). With the current code, subsequent records with a value lower than the bookmark are discarded. As we are filtering the API with the value of the bookmark I am confident with the fact that all the records returned by the API should be written to be consumed by the target.
I have a PR that fixed the issue for Subscriptions. I will open a PR that fixes all the streams to make them incremental where it is possible.
On a side note, I see that Chargify is now in beta in Stitch. If this is this with this version of the code, I think this is really not ready for production, not even beta.