You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Do you use a PostgreSQL SaaS? If so, which? Can you reproduce
the issue with a local PostgreSQL install?: Can reproduce locally
Python version: 3.11, but should not matter much
Platform: macOS, Linux
Do you use pgbouncer?: No
Can the issue be reproduced under both asyncio and uvloop?: Yes
I think the same or similar issue was already raised before, but it was closed so I decided to create a new one and provide a bit more details and reproducers.
I opened a discussion about this almost a year ago which has some details as well, but I never followed up. I would really love to get some attention to this now as we'll go through the same database upgrade steps again soon and I would like to avoid patching the library locally.
In a nutshell, the issue is that the existing timeout does not work well in certain conditions with multi-host setup: when the first host can be resolved, as in given it's DNS name, one can get IP address, but the host is nevertheless unreachable and connection cannot be established. In this case the whole timeout might be used up waiting for a connection to the first host and there would never be an attempt to connect to the second host.
In the linked ticket I described one case when this actually works by configuring a fairly large timeout of 80 seconds. In this case the first connection attempt times out after 75 seconds and there's still 5 more seconds to attempt to connect to the second host. After some googling I found that these 75 seconds seem to be TCP SYN wait-timer default.
I have made a reproducer that tests both cases and does not require actual database:
importasyncioimporttimeimportasyncpgfromasyncpgimportConnectionasyncdefconnect(timeout: int) ->Connection:
returnawaitasyncpg.connect(
"postgresql://",
user="postgres",
password="",
host=[
# Non-routable IP address to simulate connect timeout"10.255.255.1",
# It does not really matter if we have working database# for the same of the test as long as we see that there was an attempt to connect"unknown-host-name",
],
port=[5432, 45432],
database="postgres",
server_settings={
"search_path": "public",
"statement_timeout": "60000",
},
timeout=timeout,
)
asyncdefmain(timeout: int):
print("Connecting with timeout", timeout)
start=time.time()
try:
awaitconnect(timeout)
exceptTimeoutError:
end=time.time()
print("Issue reproduced: connection timeout")
print(f"Elapsed time: {end-start}\n\n")
exceptOSError:
end=time.time()
print("No issue: attempted to connect to the second host as expected")
print(f"Elapsed time: {end-start}\n\n")
asyncio.run(main(timeout=5))
asyncio.run(main(timeout=80))
I have a similar test that uses psycopg2 (though it uses SQLAlchemy, but I can rewrite it with pure psycopg2 if needed) that demonstrates how this case is handled there:
fromtimeimporttimefromsqlalchemyimportcreate_engine, textfromsqlalchemy.ormimportsessionmakerdefcreate_session(timeout: int):
engine=create_engine(
f"postgresql://", # Only specify which driver we want to useconnect_args={
"user": "postgres",
"dbname": "postgres",
"host": "10.255.255.1,10.255.255.1", # We specify multiple hosts here, both unreachable"port": "5432,5432", # We specify multiple ports here: a port for each host above"connect_timeout": timeout,
},
)
returnsessionmaker(autoflush=False, bind=engine, expire_on_commit=False)()
defmain(timeout: int):
print("Connecting with timeout", timeout)
start=time()
try:
withcreate_session(2) assession:
# This will print the Docker container ID, because it's used as the hostname inside Docker networkprint(
"Database host:",
session.execute(text("select pg_read_file('/etc/hostname') as hostname;")).one()[0].strip(),
)
end=time()
print("Elapsed time", end-start)
exceptExceptionase:
end=time()
print("Failed to connect")
print("Elapsed time", end-start)
main(2)
This case is slightly different, because I have specified both host as unreachable for the same of demonstrating that the overall time it takes to attempt a connect is twice the timeout specified. This means that psycopg2 applies the same timeout when connecting to each host, not an overall timeout.
Whether this is desirable or not is hard to say, I can imagine having an overall connection timeout might be a good thing as well, so maybe we could have two timeouts.
I'm happy to help with the PR to resolve this once it's clear how we want to have this fixed.
The text was updated successfully, but these errors were encountered:
I have realised that there are now two timeouts - somehow I missed that and noticed just now when I started looking into that again.
The new timeout now seems to work on the whole connection attempt, while new command_timeout is set for individual host, but it's still confusing as docs mentions that it is a timeout for an operation and I take it means for any SQL statement as well? How does it relate to statement_timeout that can be set for a session?
I feel like I would still like to be able to distinguish between actual connect timeout per host and operation timeout.
the issue with a local PostgreSQL install?: Can reproduce locally
uvloop?: Yes
I think the same or similar issue was already raised before, but it was closed so I decided to create a new one and provide a bit more details and reproducers.
I opened a discussion about this almost a year ago which has some details as well, but I never followed up. I would really love to get some attention to this now as we'll go through the same database upgrade steps again soon and I would like to avoid patching the library locally.
In a nutshell, the issue is that the existing timeout does not work well in certain conditions with multi-host setup: when the first host can be resolved, as in given it's DNS name, one can get IP address, but the host is nevertheless unreachable and connection cannot be established. In this case the whole timeout might be used up waiting for a connection to the first host and there would never be an attempt to connect to the second host.
In the linked ticket I described one case when this actually works by configuring a fairly large timeout of 80 seconds. In this case the first connection attempt times out after 75 seconds and there's still 5 more seconds to attempt to connect to the second host. After some googling I found that these 75 seconds seem to be TCP SYN wait-timer default.
I have made a reproducer that tests both cases and does not require actual database:
I have a similar test that uses
psycopg2
(though it uses SQLAlchemy, but I can rewrite it with purepsycopg2
if needed) that demonstrates how this case is handled there:This case is slightly different, because I have specified both host as unreachable for the same of demonstrating that the overall time it takes to attempt a connect is twice the timeout specified. This means that
psycopg2
applies the same timeout when connecting to each host, not an overall timeout.Whether this is desirable or not is hard to say, I can imagine having an overall connection timeout might be a good thing as well, so maybe we could have two timeouts.
I'm happy to help with the PR to resolve this once it's clear how we want to have this fixed.
The text was updated successfully, but these errors were encountered: