Incremental replications

I am currently fixing the incremental replications. I think the current state of this part of the code is quite broken.

https://github.com/singer-io/tap-chargify/blob/master/tap_chargify/streams.py#L104-L111


Some streams are currently not ordered (ex Subscriptions) and the bookmark is updated for each record. Then before yielding a record we check to see if it is newer than the bookmark. As the stream is not ordered, it could be that the record is older than the bookmark, whereas it should legitimately write this record.

To make it clear let's take an example.

Say that the state for the Subscription stream is `"updated_at":  "2021-04-01T02:06:58.147Z"`
The first query on Chargify API will be 
```
https://subdomain.chargify.com/subscriptions.json?page=1&per_page=200&start_datetime=2021-04-01T02%3A06%3A58.147Z&date_field=updated_at&direction=asc
```
it will return *unsorted* records such as (all fields removed but id and updated_at)

```json
{"id": 38025480, "updated_at": "2021-04-01T12:21:37.000000Z"}
{"id": 11255022, "updated_at": "2021-04-01T12:21:37.000000Z"}
{"id": 30812776, "updated_at": "2021-04-01T13:04:02.000000Z"}
{"id": 37986658, "updated_at": "2021-04-01T12:21:37.000000Z"}
{"id": 23329748, "updated_at": "2021-04-01T13:04:03.000000Z"}
{"id": 37925200, "updated_at": "2021-04-01T13:06:28.000000Z"}
{"id": 39472331, "updated_at": "2021-04-01T13:06:28.000000Z"}
{"id": 37925203, "updated_at": "2021-04-01T13:06:29.000000Z"}                                                        
{"id": 32061301, "updated_at": "2021-04-01T13:06:28.000000Z"}
```

The first step is of course to sort the stream, which is quite easy thanks to Chargify API.
However, it is not sufficient. As you can notice there are several records with the same `updated_at` value (the example above is from real data). With the current code, subsequent records with a value lower than the bookmark are discarded. As we are filtering the API with the value of the bookmark I am confident with the fact that all the records returned by the API should be written to be consumed by the target. 
I have a PR that fixed the issue for Subscriptions. I will open a PR that fixes all the streams to make them incremental where it is possible.

On a side note, I see that Chargify is now in beta in Stitch. If this is this with this version of the code, I think this is really not ready for production, not even beta.




Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Incremental replications #35

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Incremental replications #35

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions