Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timeout configuration for multi-host config #1155

Open
inikolaev opened this issue May 18, 2024 · 1 comment
Open

Timeout configuration for multi-host config #1155

inikolaev opened this issue May 18, 2024 · 1 comment

Comments

@inikolaev
Copy link

  • asyncpg version: 0.29.0
  • PostgreSQL version: ANY
  • Do you use a PostgreSQL SaaS? If so, which? Can you reproduce
    the issue with a local PostgreSQL install?
    : Can reproduce locally
  • Python version: 3.11, but should not matter much
  • Platform: macOS, Linux
  • Do you use pgbouncer?: No
  • Can the issue be reproduced under both asyncio and
    uvloop?
    : Yes

I think the same or similar issue was already raised before, but it was closed so I decided to create a new one and provide a bit more details and reproducers.

I opened a discussion about this almost a year ago which has some details as well, but I never followed up. I would really love to get some attention to this now as we'll go through the same database upgrade steps again soon and I would like to avoid patching the library locally.

In a nutshell, the issue is that the existing timeout does not work well in certain conditions with multi-host setup: when the first host can be resolved, as in given it's DNS name, one can get IP address, but the host is nevertheless unreachable and connection cannot be established. In this case the whole timeout might be used up waiting for a connection to the first host and there would never be an attempt to connect to the second host.

In the linked ticket I described one case when this actually works by configuring a fairly large timeout of 80 seconds. In this case the first connection attempt times out after 75 seconds and there's still 5 more seconds to attempt to connect to the second host. After some googling I found that these 75 seconds seem to be TCP SYN wait-timer default.

I have made a reproducer that tests both cases and does not require actual database:

import asyncio
import time

import asyncpg
from asyncpg import Connection


async def connect(timeout: int) -> Connection:
    return await asyncpg.connect(
        "postgresql://",
        user="postgres",
        password="",
        host=[
            # Non-routable IP address to simulate connect timeout
            "10.255.255.1",
            # It does not really matter if we have working database
            # for the same of the test as long as we see that there was an attempt to connect
            "unknown-host-name",
        ],
        port=[5432, 45432],
        database="postgres",
        server_settings={
            "search_path": "public",
            "statement_timeout": "60000",
        },
        timeout=timeout,
    )


async def main(timeout: int):
    print("Connecting with timeout", timeout)
    start = time.time()
    try:
        await connect(timeout)
    except TimeoutError:
        end = time.time()
        print("Issue reproduced: connection timeout")
        print(f"Elapsed time: {end - start}\n\n")
    except OSError:
        end = time.time()
        print("No issue: attempted to connect to the second host as expected")
        print(f"Elapsed time: {end - start}\n\n")


asyncio.run(main(timeout=5))
asyncio.run(main(timeout=80))

I have a similar test that uses psycopg2 (though it uses SQLAlchemy, but I can rewrite it with pure psycopg2 if needed) that demonstrates how this case is handled there:

from time import time

from sqlalchemy import create_engine, text
from sqlalchemy.orm import sessionmaker


def create_session(timeout: int):
    engine = create_engine(
        f"postgresql://",  # Only specify which driver we want to use
        connect_args={
            "user": "postgres",
            "dbname": "postgres",
            "host": "10.255.255.1,10.255.255.1",  # We specify multiple hosts here, both unreachable
            "port": "5432,5432",  # We specify multiple ports here: a port for each host above
            "connect_timeout": timeout,
        },
    )

    return sessionmaker(autoflush=False, bind=engine, expire_on_commit=False)()


def main(timeout: int):
    print("Connecting with timeout", timeout)
    start = time()
    try:
        with create_session(2) as session:
            # This will print the Docker container ID, because it's used as the hostname inside Docker network
            print(
                "Database host:",
                session.execute(text("select pg_read_file('/etc/hostname') as hostname;")).one()[0].strip(),
            )
        end = time()
        print("Elapsed time", end - start)
    except Exception as e:
        end = time()
        print("Failed to connect")
        print("Elapsed time", end - start)


main(2)

This case is slightly different, because I have specified both host as unreachable for the same of demonstrating that the overall time it takes to attempt a connect is twice the timeout specified. This means that psycopg2 applies the same timeout when connecting to each host, not an overall timeout.

Whether this is desirable or not is hard to say, I can imagine having an overall connection timeout might be a good thing as well, so maybe we could have two timeouts.

I'm happy to help with the PR to resolve this once it's clear how we want to have this fixed.

@inikolaev
Copy link
Author

I have realised that there are now two timeouts - somehow I missed that and noticed just now when I started looking into that again.

The new timeout now seems to work on the whole connection attempt, while new command_timeout is set for individual host, but it's still confusing as docs mentions that it is a timeout for an operation and I take it means for any SQL statement as well? How does it relate to statement_timeout that can be set for a session?

I feel like I would still like to be able to distinguish between actual connect timeout per host and operation timeout.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant