Skip to content

fixed traffic stats for http connection reuse #572

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

peonone
Copy link
Contributor

@peonone peonone commented Jan 24, 2025

Description of the issue

For plain HTTP requests using the forward method, it establishes an HTTP connection to the upstream proxy (or target website if no upstream proxy is specified) using the HTTP client library http.request.

So here the target is the TCP connection created or reused by the http.request call.
It relies on target.once('close'), target.bytesRead and target.bytesWritten for traffic stats, but it does not work well for certain cases, because the TCP connection can be reused, but it's not considered for the traffic stats logic.

Environment to reproduce the issue

proxy-chain version: 2.5.6

Let's use two URLs:

Node.js server script
const ProxyChain = require("proxy-chain");

(async () => {
    const proxyServer = new ProxyChain.Server({
        port: 6001,
        verbose: true,
        prepareRequestFunction: () => {
            return {
                upstreamProxyUrl: "http://<my-upstream-proxy-address>/",
            };
        },
    });
    proxyServer.on('connectionClosed', ({stats}) => console.log('stats from connectionClosed', stats));
    await proxyServer.listen();
    await new Promise((r) => {});
})();

case 1 - target socket reused by multiple requests is closed before the source is closed

Client python script to reproduce
import time
import requests

s = requests.Session()
proxies = {'http': 'http://localhost:6001'}

s.get('http://httpbin.org/flasgger_static/lib/jquery.min.js', proxies=proxies)
s.get('http://httpbin.org/ip', proxies=proxies)


time.sleep(10)
server output
ProxyServer[6001]: Listening...
ProxyServer[6001]: !!! Handling GET http://httpbin.org/flasgger_static/lib/jquery.min.js HTTP/1.1
ProxyServer[6001]: Using upstream proxy http://<my-upstream-proxy-addr>/
ProxyServer[6001]: Using forward()
ProxyServer[6001]: !!! Handling GET http://httpbin.org/ip HTTP/1.1
ProxyServer[6001]: Using upstream proxy http://<my-upstream-proxy-addr>/
ProxyServer[6001]: Using forward()
stats from connectionClosed {
  srcTxBytes: 86257,
  srcRxBytes: 355,
  trgTxBytes: 986,
  trgRxBytes: 172422
}

The trgRxBytes and trgTxBytes are doubled, because it registers socket.once('close') for every request, and when it's triggered, the 2 requests are finished using the same target socket, and the traffic of two requests is counted twice.

case 2 - target socket reused by multiple requests using different source

Client curl commands to reproduce
curl -x http://localhost:6001/ http://httpbin.org/flasgger_static/lib/jquery.min.js > /dev/null
curl  -x http://localhost:6001/ http://httpbin.org/ip > /dev/null
server output
ProxyServer[6001]: 1 | !!! Handling GET http://httpbin.org/flasgger_static/lib/jquery.min.js HTTP/1.1
ProxyServer[6001]: 1 | Using upstream proxy http://<my-upstream-proxy-addr>/
ProxyServer[6001]: 1 | Using forward()
stats from connectionClosed {
  srcTxBytes: 85998,
  srcRxBytes: 156,
  trgTxBytes: 249,
  trgRxBytes: 85975
}
ProxyServer[6001]: 2 | !!! Handling GET http://httpbin.org/ip HTTP/1.1
ProxyServer[6001]: 2 | Using upstream proxy http://<my-upstream-proxy-addr>/
ProxyServer[6001]: 2 | Using forward()
stats from connectionClosed {
  srcTxBytes: 259,
  srcRxBytes: 125,
  trgTxBytes: 467,
  trgRxBytes: 86211
}

The trgRxBytes and trgTxBytes for the second request include the bytes from the first request, because target is reused and target.bytesRead includes the bytes from the first request.

The fix

The fix is to track the number of bytes which is already there, when a socket is being allocated to an HTTP request, reduce it when calculating the stats, and use response.once('close') instead of target.once('close') to track the stats in the request life cycle.

@jirimoravcik
Copy link
Member

Hello @peonone,
I would like to thank you for the contribution. I will review the PR and get back to you as soon as possible.

@jirimoravcik
Copy link
Member

Hey @peonone,
I managed to reproduce this locally, thanks for the report.
I'll be testing the behavior with other modes than forward, after I managed to test all cases, I'll try to create a separate PR and link it here. Thank you

@peonone
Copy link
Contributor Author

peonone commented Feb 13, 2025

Just pushed the fix for HTTPS, as the same issue also happens to HTTPS requests, the target connection is not reused among multiple HTTPS requests, but the one established from a plain HTTP can be reused for an HTTPS request.

It can be reproduced by the below curl commands

curl request
curl -x http://localhost:6001/ http://httpbin.org/flasgger_static/lib/jquery.min.js > /dev/null
curl -x http://localhost:6001/ https://httpbin.org/ip > /dev/null
server output
ProxyServer[6001]: Listening...
ProxyServer[6001]: !!! Handling GET http://httpbin.org/flasgger_static/lib/jquery.min.js HTTP/1.1
ProxyServer[6001]: Using upstream proxy http://<my-upstream-proxy-addr>/
ProxyServer[6001]: Using forward()
stats from connectionClosed {
  srcTxBytes: 85998,
  srcRxBytes: 156,
  trgTxBytes: 249,
  trgRxBytes: 85975
}
ProxyServer[6001]: 1 | !!! Handling CONNECT httpbin.org:443 HTTP/1.1
ProxyServer[6001]: 1 | Using upstream proxy http://<my-upstream-proxy-addr>/
ProxyServer[6001]: 1 | Using chain() => httpbin.org:443
stats from connectionClosed {
  srcTxBytes: 5834,
  srcRxBytes: 851,
  trgTxBytes: 1138,
  trgRxBytes: 91809
}

The traffic stats for the second request includes the bytes for the first request.

@peonone peonone force-pushed the fix-http-traffic-stats branch from bbe7cee to 03df862 Compare February 13, 2025 11:34
@jirimoravcik
Copy link
Member

Just pushed the fix for HTTPS, as the same issue also happens to HTTPS requests, the target connection is not reused among multiple HTTPS requests, but the one established from a plain HTTP can be reused for an HTTPS request.

It can be reproduced by the below curl commands

curl request
server output
The traffic stats for the second request includes the bytes for the first request.

Thank you for this, great job.
I'll have to do this for SOCKS protocols as well and then we can release a new version.

@jirimoravcik
Copy link
Member

Hello @peonone,
the PR is available at #576

Could you please check it out? Thanks

@peonone
Copy link
Contributor Author

peonone commented Mar 10, 2025

Thnaks @jirimoravcik , it looks good.

@peonone peonone closed this Mar 10, 2025
jirimoravcik added a commit that referenced this pull request Mar 10, 2025
This PR attempts to fix incorrect stats due to the reuse of target sockets for HTTP(S) protocols. 

Based on #572

Note: I was forced to upgrade `actions/cache` as `v2` was deprecated and it wouldn't run with it.
I also had to edit eslint config to run with `.` instead of `src` and excluded `tests`, because with `src` the CI was failing (no idea why).
@jirimoravcik
Copy link
Member

This has been released in v2.5.7

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants