Stream_bath - pollWindow size in case of restart client application #8

Open
AdamStawarz opened this issue Dec 8, 2021 · 6 comments
Assignees
Labels
bug Something isn't working

Comments

@AdamStawarz

Issue description

We are trying to implement progress storage in our cdc-consumer-service using the built-in progress manager (provided by scylla-cdc-go). But after stopping the cdc-consumer-service and restarting it, we no longer receive any CDC updates.
However, according to the logs the progress is loaded for the last_timestamp(check screenshot attached).

On further checking the library code, we noticed that this could be happening because, inside the getPollWindow() function in stream_batch.go, queryWindowRightEnd and confidenceWindowStart are very far apart. Printing queryWindowRightEnd showed it to be equal to the current_generation value (2021-08-04 05:42:32.878000+0000; progress table screenshot attached), while confidenceWindowStart = time.Now().
Because this window is so large, we are not receiving CDC updates.

When we hard-code this window to a small value (say, 1 second), it works fine.
So I believe the value of queryWindowRightEnd in the getPollWindow() function is not being set to the appropriate value. Are we missing something here?

@fee-mendes
Member

We have seen similar behavior reported before: https://scylladb-users.slack.com/archives/C2NLNBXLN/p1632379210266400

In short, after a consumer that uses a progress table is restarted, unprocessed stream_ids start polling from the generation's start. With very small QueryTimeWindowSize values, this can take a VERY LONG TIME to catch up, especially with a very old stream generation.

One workaround is to increase QueryTimeWindowSize to something like 24h, but that may not be an ideal approach for everybody.

I think the request here should be for a programmatic way to specify the poll start time (at the user's own risk), instead of polling from the start of the generation for unprocessed stream_ids. This would help development and non-write-heavy workloads catch up faster when polling unprocessed stream IDs.

FYI @piodul @haaawk

@haaawk

haaawk commented Dec 8, 2021

FYI @avelanarius.

@piodul and @avelanarius, let's have a meeting to find a common solution for both Golang and Java.

@hartmut-co-uk
Contributor

Please note I've addressed the issue with persisting CDC log reader state + restart behaviour in #10.

@hartmut-co-uk
Contributor

I would also be interested in helping with the Java lib and scylla-cdc-source-connector, where I suspect similar issues (but even harder to debug/analyse...).

@piodul
Collaborator

piodul commented Oct 20, 2022

This should be fixed by #13. Now, when the library starts reading for the first time (without any saved progress), the default progress manager saves the timestamp it originally started reading from and, on restart, makes sure not to read changes older than that.

Users can also implement the Empty() method in their consumers so that progress can be saved even while there are no rows on a stream.

@dkropachev
Collaborator

@AdamStawarz, could you please confirm whether your issue was resolved by #13?

@dkropachev dkropachev added the bug Something isn't working label Aug 17, 2024
@dkropachev dkropachev self-assigned this Aug 17, 2024