
Potential memory leak? #43

Open · lancecarlson opened this issue Oct 26, 2022 · 14 comments

@lancecarlson

lancecarlson commented Oct 26, 2022

I'm running this on the client side:

require 'async'
require 'async/http/endpoint'
require 'async/websocket/client'

URL = ARGV.pop || "http://127.0.0.1:7070"

Async do |task|
  endpoint = Async::HTTP::Endpoint.parse(URL)

  Async::WebSocket::Client.connect(endpoint) do |connection|
    1000.times do
      connection.send_text("Hello World")
    end
    connection.flush

    while message = connection.read
      p message
    end
  end
end

I'm running the server from the example at the bottom of the guide here:

https://github.com/socketry/async-websocket/tree/main/guides/getting-started

I'm running 3 different clients, closing them, then running them again, and each time I check the memory of the Ruby process spawned by the server. It seems to go up every time without ever being garbage collected, which could wreak havoc on any long-running service. Do you have any idea where the leak is coming from? I tried messing with the $connections global to see if the Set was the problem, but I don't think it is. It might be related to some weird string allocation happening when the incoming message comes through?
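
For reference, a minimal sketch (not part of the original report) of one way to sample the server process's RSS, assuming a Linux host and that you pass the server's PID as the first argument:

# Sample the resident set size (RSS) of a process by reading /proc/<pid>/status.
# Linux-only; the PID is supplied on the command line.
def rss_kb(pid)
  File.readlines("/proc/#{pid}/status")
      .find { |line| line.start_with?("VmRSS:") }[/\d+/]
      .to_i
end

pid = Integer(ARGV.fetch(0))

loop do
  puts "#{Time.now} RSS: #{rss_kb(pid)} kB"
  sleep 5
end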

@ioquatix ioquatix self-assigned this Oct 26, 2022
@lancecarlson
Author

@ioquatix were you able to replicate this?

@ioquatix
Member

I have not had a chance to look at it yet but will soon.

@zephleggett

zephleggett commented Nov 11, 2022

@ioquatix I have also noticed ever-growing memory when subscribing to a websocket. I will investigate this some more.

I really appreciate all the work you have done.

@ioquatix
Member

Do you mind telling me how you are measuring memory? RSS?

@ioquatix
Member

ioquatix commented Feb 3, 2023

There was at least one bug in async-http which was not correctly releasing memory, do you mind trying again and reporting back with the latest releases? v0.60.1 of async-http and v0.23.0 of async-websocket.

@zephleggett

@ioquatix thanks for looking into this. I'm sorry I didn't get around to investigating this further. I will try the new version and report back.

@kaspergrubbe

@zephleggett Any updates?

@yard

yard commented Jun 26, 2024

Hey there! We are observing a similar issue; we are on 0.66.3 of async-http and 0.26.1 of async-websocket.
Our setup is a bit more involved (we are actually using websockets from within Rails), but the tests show memory growing indefinitely.

We have only started the investigation today, so we're not really sure what the source of the leak is, but it does seem to be connected to websockets (regular HTTP requests served by Falcon don't exhibit such behaviour).

We will report our findings as soon as there are any, but if someone has ideas on how to narrow down the search, they would be highly appreciated.

@ioquatix
Member

@yard Are you handling HTTP/1 or HTTP/2 requests?

@ioquatix
Member

Are you able to share a reproduction?

If not, can you please do the following:

At the start of your test server, run the following:

ObjectSpace.trace_object_allocations_start

Then make connections which leak objects. Ensure all the connections are closed/finished, and then run the following:

def dump_all
  3.times{GC.start}

  File.open("heap.json", "w") do |file|
    ObjectSpace.dump_all(output: file)
  end
end

Then upload heap.json somewhere private and share it with me.
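
As a rough illustration (not from the original comment), one way to summarise heap.json is to count live objects by allocation site, which ObjectSpace.dump_all records because trace_object_allocations_start was enabled earlier:

require 'json'

# heap.json is newline-delimited JSON, one object per line.
counts = Hash.new(0)

File.foreach("heap.json") do |line|
  object = JSON.parse(line)
  site = object["file"] ? "#{object['file']}:#{object['line']}" : object["type"]
  counts[site] += 1
end

# Print the twenty most common allocation sites among retained objects.
counts.sort_by { |_, count| -count }.first(20).each do |site, count|
  puts "#{count}\t#{site}"
end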

@yard

yard commented Jun 27, 2024

Hi @ioquatix, good to see you again :)

Yeah, so we are currently trying to decipher heap dumps – we started with the "real" ones from the application, but they are pretty big (leaking around 400kb per ws connection doing a simple 3-message exchange and then going away), so we decided to spin up a minimal example and check whether the leak will repro there.

Our current theory is that a relatively big object graph gets retained in the real app (we use Rails there, so a lot is going on during a single request lifespan), but the cause is probably just a single object that captures an instance of the Rack env, a request or something else, essentially magnifying the leak. And of course regular HTTP requests do not leak in the app, so all the signs are currently pointing at websockets.

Re protocols – we are using HTTP/1.1 (can you even do WS with HTTP/2? I am not sure, but not our case anyway).

I will keep you posted as we progress and will obv propose a fix once we find the root cause.

@ioquatix
Member

ioquatix commented Jun 27, 2024

Re protocols – we are using HTTP/1.1 (can you even do WS with HTTP/2? I am not sure, but not our case anyway).

Yes, and it's fully supported.

I will keep you posted as we progress and will obv propose a fix once we find the root cause.

Awesome, sounds great. Yes, holding onto the request (e.g. in a global/thread-local cache) will cause problems. If you dump the heap and then work backwards from the Async::WebSocket::Connection object (look at what's referencing it), you should be able to identify the leak.

If that's not helping, try looking at something holding onto the Protocol::HTTP1::Connection or Protocol::HTTP::Headers - both can cause ugly leaks if retained accidentally.
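
A rough sketch (not part of the original comment) of working backwards like this: given the address of a suspect object found in heap.json, list the entries whose references arrays point at it, i.e. its retainers:

require 'json'

# Pass the address of the suspect object, e.g. the Async::WebSocket::Connection
# entry you found in heap.json (the address shown here is hypothetical).
target = ARGV.fetch(0) # e.g. "0x7f9c1c0b2a38"

File.foreach("heap.json") do |line|
  object = JSON.parse(line)
  references = object["references"] || []
  puts line if references.include?(target)
end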

Give https://github.com/Shopify/heap-profiler a try too.

@yard

yard commented Jun 27, 2024

OK, so the issue turned out to be almost boring: ActiveRecord query cache internals. What's even more boring is that it probably won't be an issue with Rails 7.2 due to refactoring of the whole thing.

It took us some time to figure out, as all the usual suspects (connection objects, sockets, etc.) were getting properly released (we even started adding finalizers to them to watch GC collect them). The bug was due to https://github.com/rails/rails/blob/v7.1.3.4/activerecord/lib/active_record/connection_adapters/abstract/query_cache.rb#L38 – it simply kept a reference to every fiber ever used with Rails (with the value true), which led to the server leaking quite a significant amount of memory for obvious reasons.
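
For illustration only (a contrived sketch, not the actual Rails code): a long-lived Hash keyed by the current fiber retains every fiber, and everything those fibers reference, for the life of the process:

# Analogue of the per-fiber query-cache flag described above: keys are fibers.
QUERY_CACHE_ENABLED = {}

1_000.times do
  Fiber.new do
    QUERY_CACHE_ENABLED[Fiber.current] = true # never deleted when the fiber finishes
  end.resume
end

GC.start
# The finished fibers cannot be collected because the hash still holds them as keys.
puts ObjectSpace.each_object(Fiber).count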

We are still gonna be running more stress tests, but so far it seems that at least the memory usage is totally fine.

@ioquatix
Member

Thanks for investigating! @byroot FYI.
