-
Notifications
You must be signed in to change notification settings - Fork 977
Add pyhive connection timeout #7300
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds an optional connection_timeout parameter to PyHive's Hive connection class to prevent connections from hanging indefinitely when issues occur. The timeout is specified in milliseconds and applies to both HTTP and TSocket transport types.
Changes:
- Added
connection_timeoutparameter to theConnection.__init__method signature and documentation - Implemented timeout setting for both HTTP (THttpClient) and TSocket transport paths
- Added integration test to verify connections work with the timeout parameter
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| python/pyhive/hive.py | Added connection_timeout parameter to Connection class, implemented setTimeout calls for both HTTP and socket transports, and added parameter documentation |
| python/pyhive/tests/test_hive.py | Added integration test to verify connection works with timeout parameter |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| Incompatible with host, port, auth, kerberos_service_name, and password. | ||
| :param ssl_context: A custom SSL context to use for HTTPS connections. If provided, | ||
| this overrides check_hostname and ssl_cert parameters. | ||
| :param connection_timeout: Millisecond timeout for Thrift connections. |
Copilot
AI
Jan 11, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation should clarify that connection_timeout is ignored when using a custom thrift_transport. Users need to configure the timeout on their custom transport object directly. Consider adding a note to the docstring like: "Note: connection_timeout is ignored when thrift_transport is provided. Configure timeout on the custom transport object instead."
| :param connection_timeout: Millisecond timeout for Thrift connections. | |
| :param connection_timeout: Millisecond timeout for Thrift connections when using the | |
| built-in HTTP/HTTPS transport. | |
| Note: connection_timeout is ignored when thrift_transport is provided. Configure | |
| timeout on the custom transport object instead. |
| def test_connection_timeout(self): | ||
| """Test that a connection timeout is set without error.""" | ||
| with contextlib.closing(hive.connect( | ||
| host=_HOST, | ||
| port=10000, | ||
| connection_timeout=10 * 1000 | ||
| )) as connection: | ||
| with contextlib.closing(connection.cursor()) as cursor: | ||
| # Use the same query pattern as other tests | ||
| cursor.execute('SELECT 1 FROM one_row') | ||
| self.assertEqual(cursor.fetchall(), [(1,)]) |
Copilot
AI
Jan 11, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test only verifies that a connection can be established with the timeout parameter, but doesn't actually test timeout behavior. Consider adding a test that verifies the timeout is properly set on the underlying socket/transport. This could be done by checking the socket's timeout value after connection, or by testing that a connection attempt to an unreachable host times out as expected.
| thrift_transport=None, | ||
| ssl_context=None | ||
| ssl_context=None, | ||
| connection_timeout=None, |
Copilot
AI
Jan 11, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The connection_timeout parameter should be validated to ensure it's a positive value. Negative or zero timeout values could cause unexpected behavior. Consider adding validation like:
if connection_timeout is not None and connection_timeout <= 0:
raise ValueError("connection_timeout must be a positive value")
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #7300 +/- ##
======================================
Coverage 0.00% 0.00%
======================================
Files 698 698
Lines 43646 43646
Branches 5894 5894
======================================
Misses 43646 43646 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Why are the changes needed?
Adds an optional parameter to pyhive hive connection class init to allow setting the socket connection timeout. By default, thrift socket connections have no timeout which causes connections to hang indefinitely when something goes wrong. This helps us in dbt by allowing for multi-threaded management of connections while continuing to rely on pyhive thrift setup.
Reference thrift lines:
https://github.com/apache/thrift/blob/0d18fb2e97a00fc56997fa059d6819e54cdff64b/lib/py/src/transport/THttpClient.py#L130
https://github.com/apache/thrift/blob/0d18fb2e97a00fc56997fa059d6819e54cdff64b/lib/py/src/transport/TSocket.py#L111
How was this patch tested?
Manually on TSocket path and added unit test
Was this patch authored or co-authored using generative AI tooling?
No