[New connector] Onelake connector #3057

Draft: wants to merge 22 commits into main

Delacrobix

Closes #3051

Added the OneLake connector files: the connector's code, tests, requirements, and the reference to the connector in the sources list (connectors/config.py).

Checklists

Pre-Review Checklist

  • this PR does NOT contain credentials of any kind, such as API keys or username/passwords (double check config.yml.example)
  • this PR has a meaningful title
  • this PR links to all relevant github issues that it fixes or partially addresses
  • if there is no GH issue, please create it. Each PR should have a link to an issue
  • this PR has a thorough description
  • Covered the changes with automated tests
  • Tested the changes locally
  • Added a label for each target release version (example: v7.13.2, v7.14.0, v8.0.0)
  • Considered corresponding documentation changes
  • Contributed any configuration settings changes to the configuration reference
  • if you added or changed Rich Configurable Fields for a Native Connector, you made a corresponding PR in Kibana

cla-checker-service bot commented Dec 24, 2024

💚 CLA has been signed


for path in doc_paths:
    file_name = path.name.split("/")[-1]
    field_client = await self._get_file_client(file_name)
Member

Looking at the code, multiple clients are created for each file. Is there a way to reuse the clients between calls? I can see how this can become a problem with RAM.

Author

Hello Artem, thank you so much for your feedback.

Regarding this comment: each file client represents a specific file and is initialized with the file name, so it’s not possible to reuse the client. I believe the garbage collector should remove unused clients, but that’s just an assumption.
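One possible middle ground, sketched here under assumptions: keep a single parent client for the whole sync and close each per-file client explicitly with `async with`, rather than relying on the garbage collector. `FakeDirectoryClient` and `FakeFileClient` below are hypothetical stand-ins written for this example, not the Azure SDK classes or the connector's code.

```python
import asyncio


class FakeFileClient:
    """Hypothetical stand-in for a per-file client; not an SDK class."""

    def __init__(self, name):
        self.name = name
        self.closed = False

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        # Release the client deterministically instead of waiting for GC.
        self.closed = True

    async def read(self):
        return f"contents of {self.name}".encode()


class FakeDirectoryClient:
    """Hypothetical stand-in for the parent client that hands out file clients."""

    def __init__(self):
        self.created = []

    def get_file_client(self, name):
        client = FakeFileClient(name)
        self.created.append(client)
        return client


async def read_files(directory_client, names):
    """Read each file using one shared parent client and short-lived file clients."""
    results = {}
    for name in names:
        async with directory_client.get_file_client(name) as file_client:
            results[name] = await file_client.read()
    return results


directory = FakeDirectoryClient()  # created once, reused for every file
contents = asyncio.run(read_files(directory, ["a.txt", "b.txt"]))
```

With this shape, at most one file client is open at a time, and the more expensive parent client is the only object that lives for the whole run.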

Comment on lines 92 to 99
def _get_account_url(self):
    """Get the account URL for OneLake

    Returns:
        str: Account URL
    """

    return f"https://{self.configuration['account_name']}.dfs.fabric.microsoft.com"
Member

This can just become a field of the class that's set during init.
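A minimal sketch of that suggestion, assuming the configuration is already final when the constructor runs. `OneLakeSourceSketch` is a hypothetical, trimmed-down class used only for illustration; the URL format is taken from the snippet under review.

```python
class OneLakeSourceSketch:
    """Hypothetical stand-in for the connector's data source class."""

    def __init__(self, configuration):
        self.configuration = configuration
        # Computed once at construction time instead of on every call.
        self.account_url = (
            f"https://{configuration['account_name']}.dfs.fabric.microsoft.com"
        )


source = OneLakeSourceSketch({"account_name": "example"})
print(source.account_url)
```

If the configuration can change after construction, a method or property would still be the safer choice.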

@Delacrobix
Author

Hello!

I made changes to the async handling. I converted the synchronous methods to asynchronous ones using asyncio, based on the implementation of the Google Drive connector.

https://github.com/Delacrobix/connectors/blob/78627c81e17ed666bdf1d30a9fd8cc24740893df/connectors/sources/onelake.py#L244-L247

https://github.com/Delacrobix/connectors/blob/78627c81e17ed666bdf1d30a9fd8cc24740893df/connectors/sources/onelake.py#L364
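The linked lines are not reproduced here, so the following is not the PR's code. It is a generic sketch of one common way to expose a blocking call as a coroutine with asyncio: offloading it to the default thread pool via `loop.run_in_executor`. `list_paths_blocking` is a hypothetical function standing in for a synchronous SDK call.

```python
import asyncio
import time
from functools import partial


def list_paths_blocking(directory):
    """Hypothetical blocking call standing in for a synchronous SDK method."""
    time.sleep(0.01)  # simulate network latency
    return [f"{directory}/file_{i}.txt" for i in range(3)]


async def list_paths(directory):
    """Run the blocking call in a worker thread so the event loop stays free."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(
        None, partial(list_paths_blocking, directory)
    )


paths = asyncio.run(list_paths("docs"))
```

The event loop can keep serving other tasks while the worker thread waits, which is the main benefit over calling the blocking function directly inside a coroutine.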
