Skip to content

Incremental replications #35

@cirotix

Description

@cirotix

I am currently fixing the incremental replications. I think the current state of this part of the code is quite broken.

https://github.com/singer-io/tap-chargify/blob/master/tap_chargify/streams.py#L104-L111

Some streams are currently not ordered (ex Subscriptions) and the bookmark is updated for each record. Then before yielding a record we check to see if it is newer than the bookmark. As the stream is not ordered, it could be that the record is older than the bookmark, whereas it should legitimately write this record.

To make it clear let's take an example.

Say that the state for the Subscription stream is "updated_at": "2021-04-01T02:06:58.147Z"
The first query on Chargify API will be

https://subdomain.chargify.com/subscriptions.json?page=1&per_page=200&start_datetime=2021-04-01T02%3A06%3A58.147Z&date_field=updated_at&direction=asc

it will return unsorted records such as (all fields removed but id and updated_at)

{"id": 38025480, "updated_at": "2021-04-01T12:21:37.000000Z"}
{"id": 11255022, "updated_at": "2021-04-01T12:21:37.000000Z"}
{"id": 30812776, "updated_at": "2021-04-01T13:04:02.000000Z"}
{"id": 37986658, "updated_at": "2021-04-01T12:21:37.000000Z"}
{"id": 23329748, "updated_at": "2021-04-01T13:04:03.000000Z"}
{"id": 37925200, "updated_at": "2021-04-01T13:06:28.000000Z"}
{"id": 39472331, "updated_at": "2021-04-01T13:06:28.000000Z"}
{"id": 37925203, "updated_at": "2021-04-01T13:06:29.000000Z"}                                                        
{"id": 32061301, "updated_at": "2021-04-01T13:06:28.000000Z"}

The first step is of course to sort the stream, which is quite easy thanks to Chargify API.
However, it is not sufficient. As you can notice there are several records with the same updated_at value (the example above is from real data). With the current code, subsequent records with a value lower than the bookmark are discarded. As we are filtering the API with the value of the bookmark I am confident with the fact that all the records returned by the API should be written to be consumed by the target.
I have a PR that fixed the issue for Subscriptions. I will open a PR that fixes all the streams to make them incremental where it is possible.

On a side note, I see that Chargify is now in beta in Stitch. If this is this with this version of the code, I think this is really not ready for production, not even beta.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions