Skip to content

[FSTORE-1382] fix for async concurrency issues for sql client #1343

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 28 commits into from
Jun 19, 2024

Conversation

dhananjay-mk
Copy link
Contributor

This PR adds/fixes/changes...

  • please summarize your changes to the code
  • and make sure to include all changes to user-facing APIs

JIRA Issue: -

Priority for Review: -

Related PRs: -

How Has This Been Tested?

  • Unit Tests
  • Integration Tests
  • Manual Tests on VM

Checklist For The Assigned Reviewer:

- [ ] Checked if merge conflicts with master exist
- [ ] Checked if stylechecks for Java and Python pass
- [ ] Checked if all docstrings were added and/or updated appropriately
- [ ] Ran spellcheck on docstring
- [ ] Checked if guides & concepts need to be updated
- [ ] Checked if naming conventions for parameters and variables were followed
- [ ] Checked if private methods are properly declared and used
- [ ] Checked if hard-to-understand areas of code are commented
- [ ] Checked if tests are effective
- [ ] Built and deployed changes on dev VM and tested manually
- [x] (Checked if all type annotations were added and/or updated appropriately)

@dhananjay-mk dhananjay-mk requested review from kennethmhc and vatj June 7, 2024 09:45
@@ -59,6 +57,3 @@ def get_sdk_info():


__all__ = ["connection", "disable_usage_logging", "get_sdk_info"]
# running async code in jupyter throws "RuntimeError: This event loop is already running"
# with tornado 6. This fixes the issue without downgrade to tornado==4.5.3
nest_asyncio.apply()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure that we want to stop providing support for older version of tornado? Or is it enough to move it to the sql engine?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After digging deeper, I realised some of the problems were caused by nest_asyncio as it makes the actual async errors difficult to debug. So I tried to remove its dependency, but its still needed in case of jupyter, otherwise the user has to explicitly call nest_asyncio.apply() in each notebook. It is not needed for python runtimes. So it is needed for below cases:

  1. jupyter -> added check in init_async methods
  2. locust -> added into the locustfile

@@ -205,18 +209,30 @@ def _parametrize_prepared_statements(
return prepared_statements_dict

def init_async_mysql_connection(self, options=None):
def is_runtime_notebook():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a local function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved to the utils class

from typing import Any, Dict, List, Optional, Set, Tuple, Union

import nest_asyncio
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make this an optional import by importing only if is_runtime_notebook?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

)
self._async_pool = await util.create_async_engine(
online_connector, self._external, default_min_size, options=options
async def _get_connection_pool(self, default_min_size: int, loop=None) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

loop is not used by any caller

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Loop argument is not strictly required as the engine calls its asyncio built-in method, but good to keep it as optional

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I understand why we would not always pass loop?

loop = asyncio.get_running_loop()
except RuntimeError as er:
raise RuntimeError(
"Event loop is not running. Please provide an event loop to create the engine."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how can user provide the loop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it gets the loop by the asyncio.get_running_loop so there should be running loop first to call this co-routine. I modified the error message a bit to make it clear. Loop argument is not strictly required but good to keep it as optional

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no loop argument in init_serving or get_feature_vector

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed loop argument to avoid confusion @kennethmhc

Copy link
Contributor

@vatj vatj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to understand the optional loop use case. It feels like always having a per serving FV loop is the way to go and keeping it optional is error-prone/not clear to me

@@ -698,3 +726,11 @@ def feature_view_api(self) -> feature_view_api.FeatureViewApi:
@property
def storage_connector_api(self) -> storage_connector_api.StorageConnectorApi:
return self._storage_connector_api

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about online_connector and connection_pool?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

)
self._async_pool = await util.create_async_engine(
online_connector, self._external, default_min_size, options=options
async def _get_connection_pool(self, default_min_size: int, loop=None) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I understand why we would not always pass loop?

self._hostname = util.get_host_name() if self._external else None

if util.is_runtime_notebook():
_logger.debug("Running in Jupyter notebook, applying nest_asyncio")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a warning rather than debug?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think debug is fine as it is not really something user needs to do

@dhananjay-mk dhananjay-mk marked this pull request as draft June 11, 2024 15:21
@dhananjay-mk dhananjay-mk marked this pull request as ready for review June 12, 2024 08:16
@dhananjay-mk dhananjay-mk merged commit 02291fb into logicalclocks:master Jun 19, 2024
12 checks passed
dhananjay-mk added a commit to dhananjay-mk/feature-store-api that referenced this pull request Jul 12, 2024
…lclocks#1343)

* init-working

* add lock changes

* move lock to execute method

* update lock

* pass pool as arg

* loop changes and refactoring

* make new connection pool for each call

* make exec_prep async and add lock

* use context manager and refactor init connection

* use create task

* add semaphore

* remove nest_async and use manual loop

* make nest asyncio only if jupyter

* refactoring and move hostname retrieving to init serving

* minor clean

* minor cleanup

* add unit test

* remove return of pool

* revert locust changes and refactoring

* revert locust changes and refactoring

* fix review comments

* fix review comments

* remove loop method argument

---------

Co-authored-by: Victor Jouffrey <[email protected]>
vatj added a commit that referenced this pull request Aug 7, 2024
…#1372)

* init-working

* add lock changes

* move lock to execute method

* update lock

* pass pool as arg

* loop changes and refactoring

* make new connection pool for each call

* make exec_prep async and add lock

* use context manager and refactor init connection

* use create task

* add semaphore

* remove nest_async and use manual loop

* make nest asyncio only if jupyter

* refactoring and move hostname retrieving to init serving

* minor clean

* minor cleanup

* add unit test

* remove return of pool

* revert locust changes and refactoring

* revert locust changes and refactoring

* fix review comments

* fix review comments

* remove loop method argument

---------

Co-authored-by: Victor Jouffrey <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants