[FSTORE-862][APPEND] Don't provide the current offsets by default when starting job #1116

Open: wants to merge 5 commits into master

@bubriks (Contributor) commented Sep 13, 2023

This PR adds/fixes/changes...

  • please summarize your changes to the code
  • and make sure to include all changes to user-facing APIs

JIRA Issue: -

Priority for Review: -

Related PRs: -

How Has This Been Tested?

  • Unit Tests
  • Integration Tests
  • Manual Tests on VM

Checklist For The Assigned Reviewer:

- [ ] Checked if merge conflicts with master exist
- [ ] Checked if stylechecks for Java and Python pass
- [ ] Checked if all docstrings were added and/or updated appropriately
- [ ] Ran spellcheck on docstring
- [ ] Checked if guides & concepts need to be updated
- [ ] Checked if naming conventions for parameters and variables were followed
- [ ] Checked if private methods are properly declared and used
- [ ] Checked if hard-to-understand areas of code are commented
- [ ] Checked if tests are effective
- [ ] Built and deployed changes on dev VM and tested manually
- [x] (Checked if all type annotations were added and/or updated appropriately)

@bubriks requested a review from @SirOibaf on September 13, 2023 at 12:46

@SirOibaf (Contributor) commented:

Are you sure this PR is correct? It looks to me like we always end up in the first branch of the if statement, because initial_check_point is never an empty string; we set it here:
https://github.com/logicalclocks/feature-store-api/blame/25cfcd57ad792a3b6a732570943692c49b406fbc/python/hsfs/engine/python.py#L956

There doesn't seem to be a way to control the skip_offset parameter in the first branch, so the offsets are always skipped.

Additionally, the skip_offset parameter is not documented anywhere in the APIs. Please add proper documentation to the insert method.

* rename skip_offsets -> use_current_offsets
* add documentation
@bubriks (Contributor, Author) commented Sep 18, 2023

I think everything is correct.

initial_check_point will be empty if the topic doesn't exist (for example, after an upgrade).

In the first if statement (here: https://github.com/logicalclocks/feature-store-api/blame/25cfcd57ad792a3b6a732570943692c49b406fbc/python/hsfs/engine/python.py#L1016), we always run the materialization job with the initial offset set to 0 for every partition of the topic. Since the topic didn't exist before, the job should start from the beginning (done here: https://github.com/logicalclocks/feature-store-api/blame/25cfcd57ad792a3b6a732570943692c49b406fbc/python/hsfs/engine/python.py#L1028).
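The branching described in this reply can be sketched in Python as below. This is a hypothetical, simplified illustration, not the hsfs code in `python/hsfs/engine/python.py`: the function name `start_offsets_for_job`, its arguments, and the `topic,partition:offset,...` checkpoint-string format are assumptions made for the example. It shows the three cases under discussion: a missing topic (empty `initial_check_point`) always starts from offset 0, the caller can opt in to the current offsets with `use_current_offsets`, and by default (per the PR title) no offsets are passed to the job.

```python
from typing import Optional


def start_offsets_for_job(
    topic_name: str,
    num_partitions: int,
    initial_check_point: str,
    use_current_offsets: bool = False,
) -> Optional[str]:
    """Hypothetical helper: pick the starting offsets for a materialization job.

    Assumes initial_check_point is "" when the topic did not exist before the
    insert, and otherwise holds a string like "topic,0:5,1:7" (assumed format).
    """
    if not initial_check_point:
        # Topic is new (e.g. right after an upgrade): read every partition
        # from offset 0, regardless of use_current_offsets.
        return topic_name + "," + ",".join(f"{p}:0" for p in range(num_partitions))
    if use_current_offsets:
        # Opt-in: start from the offsets captured just before the write.
        return initial_check_point
    # Default: provide no offsets and let the job choose its own starting point.
    return None


print(start_offsets_for_job("fg_topic", 3, ""))
print(start_offsets_for_job("fg_topic", 3, "fg_topic,0:5,1:7,2:2"))
print(start_offsets_for_job("fg_topic", 3, "fg_topic,0:5,1:7,2:2", use_current_offsets=True))
```

The three calls print `fg_topic,0:0,1:0,2:0`, `None`, and `fg_topic,0:5,1:7,2:2` respectively. Note that in this sketch `use_current_offsets` has no effect in the empty-checkpoint branch, which matches the behavior described above for a topic that did not previously exist.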

@bubriks (Contributor, Author) commented Sep 18, 2023

@SirOibaf I also renamed the skip_offset parameter to use_current_offsets, as I think it's more descriptive.
