Flaky SO_REUSEPORT/SO_REUSEADDR client socket #126539

Closed
Ousret opened this issue Nov 7, 2024 · 6 comments
Labels
extension-modules C modules in the Modules dir type-bug An unexpected behavior, bug, or error

Comments

@Ousret

Ousret commented Nov 7, 2024

Bug report

Bug description:

I am unable to reliably reuse an outgoing port with a client socket.
Depending on the interpreter version and/or the OS, it sometimes fails and sometimes succeeds.

import niquests

if __name__ == "__main__":

    with niquests.Session(source_address=("0.0.0.0", 8755)) as s:
        print(s.get("https://1.1.1.1"))

    with niquests.Session(source_address=("0.0.0.0", 8755)) as s:
        print(s.get("https://1.1.1.1"))

import niquests
import asyncio

async def main():
    async with niquests.AsyncSession(source_address=("0.0.0.0", 8755)) as s:
        print(await s.get("https://1.1.1.1"))

    async with niquests.AsyncSession(source_address=("0.0.0.0", 8755)) as s:
        print(await s.get("https://1.1.1.1"))

if __name__ == "__main__":
    asyncio.run(main())

You should get an intermittent [Errno 99] Cannot assign requested address, or a similar error depending on the OS.

I ran the scripts above on:

  • Windows
  • MacOS
  • Linux

Across Python 3.7 -- 3.13

Here are the results:

  • Linux 3.7 (sync OK, async OK)

  • Windows 3.7 (sync OK, async OK)

  • Linux 3.11 (sync OK, async KO)

  • Windows 3.11 (sync OK, async KO)

  • Windows 3.10 (sync KO, async KO)

  • MacOS 3.8+ (sync KO, async KO)

Curiously, consider this script:

import niquests

if __name__ == "__main__":

    with niquests.Session(source_address=("0.0.0.0", 8755)) as s:
        print(s.get("https://1.1.1.1"))

Running it in a fresh interpreter twice or more (python sample.py several times in a row) works as many times as needed. Does something happen at interpreter shutdown that should happen earlier?

At the low level, socket.SO_REUSEPORT is applied when available; otherwise socket.SO_REUSEADDR is used instead.
sock.bind((addr, port)) is called after setting the socket options and before connecting to the remote peer.

As it seems to work flawlessly on Python 3.7, I expected it to work on later versions also.

See the minimal code to reproduce this (sync only):

import socket

def cpython_bug_bind_so_reuseport():
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    except (AttributeError, OSError):  # Windows branch or old OS
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)

    sock.bind(("0.0.0.0", 5784))
    sock.connect(("1.1.1.1", 443))

    sock.shutdown(socket.SHUT_RD)
    sock.close()


if __name__ == "__main__":
    cpython_bug_bind_so_reuseport()
    cpython_bug_bind_so_reuseport()

I posted the higher-level code because sometimes the execution does not raise "Cannot assign requested address" but times out instead. In that case bind and connect both succeed, but the socket is unusable. You may have to retry a few times to see this behavior.

Did I miss something? The official docs do not clearly mention using .bind(..) with a client-side socket, so we're in a grey area.

Regards,

CPython versions tested on:

3.9, 3.10, 3.11, 3.12, 3.13

Operating systems tested on:

Linux, macOS, Windows

@Ousret Ousret added the type-bug An unexpected behavior, bug, or error label Nov 7, 2024
@picnixz picnixz added the extension-modules C modules in the Modules dir label Nov 7, 2024
@Zheaoli
Contributor

Zheaoli commented Nov 7, 2024

First, [Errno 99] Cannot assign requested address is raised when you create a lot of connections and the system does not reclaim them in time.

You need to create a connection-pool-like mechanism to reuse the connection.

Second, SO_REUSEPORT is not safe here: you may bind a port that is already in use (for example by a connection in ESTABLISHED rather than TIME_WAIT), which can cause timeouts when you send or receive a packet. This depends on the system's behavior.

In my opinion, you should not use REUSEPORT/REUSEADDR on the client side; a connection-pool-like mechanism would be better.
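
A minimal sketch of the connection-reuse idea, using only the standard library. SingleConnectionPool and its methods are hypothetical names made up for illustration, not an existing API; a real pool would also need liveness checks, locking, and eviction.

```python
import socket


class SingleConnectionPool:
    """Hypothetical helper: keeps one open client socket per (source, remote) pair."""

    def __init__(self):
        self._conns = {}

    def get(self, source, remote):
        # Reuse the already-connected socket instead of re-binding the source port.
        key = (source, remote)
        sock = self._conns.get(key)
        if sock is None:
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            try:
                sock.bind(source)
                sock.connect(remote)
            except OSError:
                sock.close()
                raise
            self._conns[key] = sock
        return sock

    def close(self):
        # Close every pooled socket and forget it.
        for sock in self._conns.values():
            sock.close()
        self._conns.clear()
```

With such a pool, two consecutive requests from ("0.0.0.0", 8755) to the same peer would share one socket, so the port is bound only once.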

@Zheaoli
Contributor

Zheaoli commented Nov 7, 2024

@picnixz I don't think this is a bug in the stdlib; maybe we can remove the label.

@Zheaoli
Contributor

Zheaoli commented Nov 7, 2024

BTW, if you still think this is a problem, you can use https://github.com/cilium/pwru to trace the timed-out packets in your Linux environment. Post the details here and I can help you dig further.

@Ousret
Author

Ousret commented Nov 7, 2024

Unfortunately, you misunderstood the point.
SO_REUSEPORT is perfectly usable for client-side connections, and major kernels document it as possible.

As it happens, we found the solution: adding a further option to the socket that tells the OS to release it sooner.
Somehow this was already set appropriately in Python 3.7 and was dropped afterward.

C PoC:

#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <arpa/inet.h>

void cpython_bug_bind_so_reuseport() {
    int sockfd = 0, n = 0;
    struct sockaddr_in serv_addr, local_addr;

      /* a socket is created through call to socket() function */
      if((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0)
      {
	      printf("\n Error : Could not create socket \n");
      }

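      /* zero both address structs (0, not the character '0') */
      memset(&serv_addr, 0, sizeof(serv_addr));
      memset(&local_addr, 0, sizeof(local_addr));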

      serv_addr.sin_family = AF_INET;
      serv_addr.sin_port = htons(443);
      serv_addr.sin_addr.s_addr = inet_addr("1.1.1.1");

      local_addr.sin_family = AF_INET;
      local_addr.sin_addr.s_addr = INADDR_ANY;
      local_addr.sin_port = htons(12010);

      int enable = 1;
      setsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, (char*)&enable, sizeof(enable));

      local_addr.sin_addr.s_addr = inet_addr("192.168.1.12");

      if (bind(sockfd, (struct sockaddr*) &local_addr, sizeof(struct sockaddr_in)) != 0) {
          printf("\n Error : bind error");
      }

      if( connect(sockfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr)) < 0)
      {
          printf("\n Error : Connect Failed \n");
      } else {
          printf("\n Info : Connect Ok \n");
      }
      
      const struct linger opt = { .l_onoff = 1, .l_linger = 0 };

      setsockopt(sockfd, SOL_SOCKET, SO_LINGER, &opt, sizeof opt);
      
      shutdown(sockfd, SHUT_WR);
      close(sockfd);
}

int main(int argc, char *argv[])
{
	
    cpython_bug_bind_so_reuseport();
    cpython_bug_bind_so_reuseport();
    
	return 0;
}

Now it works reliably in sync mode. The async part still doesn't work, but I suppose the opts aren't applied correctly either.
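
For reference, a rough Python sketch of the same SO_LINGER idea as the C PoC. The function names are hypothetical, and the struct.pack("ii", ...) layout assumes a POSIX struct linger made of two ints (Windows uses a different layout), so treat this as an illustration rather than a portable fix.

```python
import socket
import struct


def connect_from_port(source, remote, timeout=5.0):
    """Open a TCP client socket bound to a fixed local (host, port) pair."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        try:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        except (AttributeError, OSError):  # Windows branch or old OS
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.settimeout(timeout)
        sock.bind(source)
        sock.connect(remote)
    except BaseException:
        sock.close()
        raise
    return sock


def close_without_time_wait(sock):
    """Close with l_onoff=1, l_linger=0 so the OS drops the socket right away."""
    # Assumption: POSIX layout of struct linger (two C ints).
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    sock.close()
```

Calling connect_from_port(("0.0.0.0", 5784), ("1.1.1.1", 443)) followed by close_without_time_wait(...) twice in a row mirrors the two calls in the PoC. Note that a zero linger timeout aborts the connection instead of closing it gracefully, so any unsent data is discarded.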

@Ousret Ousret closed this as not planned Won't fix, can't repro, duplicate, stale Nov 7, 2024
@Zheaoli
Contributor

Zheaoli commented Nov 7, 2024

import asyncio 
import socket
import struct

async def cpython_bug_bind_so_reuseport():
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind(("0.0.0.0", 5784))
    loop = asyncio.get_event_loop()
    await loop.sock_connect(sock, ("1.1.1.1", 443))
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                             struct.pack('ii', 1, 0))
    sock.shutdown(socket.SHUT_RDWR)
    sock.close()

async def run():
    await cpython_bug_bind_so_reuseport()
    await cpython_bug_bind_so_reuseport()
    


if __name__ == "__main__":
    asyncio.run(run())

This code works fine in my environment. I'm not sure whether it matches the async code in your description.

@Ousret
Author

Ousret commented Nov 8, 2024

As I said,

"but I suppose the opts aren't applied correctly either."

So, yes, it is working.
