How to fix Traefik v2 / v3 (v2.11.x / v3.x.x) stopping Immich uploads that go longer than 60 seconds (502 Bad Gateway) #8872
-
Great job! We had the same issue. Rolling back Traefik to version 2.11.1 fixed it.
-
@othyn
-
Thanks! I'm using Traefik 3.0.0 and I also had to change the timeout. This is how it looks in docker-compose.yml if your entry point is called websecure (command section):
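The snippet from this comment did not survive, so here is a minimal sketch of what such a command section could look like. The service name, image tag, port, and the 600s value are assumptions for illustration, not the commenter's actual settings; only the flag path comes from the Traefik entry point docs.

```yaml
# docker-compose.yml (sketch; service layout and values are assumed)
services:
  traefik:
    image: traefik:v3.0
    command:
      # Assumed entry point definition; use whatever your setup already has.
      - "--entryPoints.websecure.address=:443"
      # Raise the read timeout for this entry point. 600s is a placeholder;
      # 0s would disable the timeout entirely.
      - "--entryPoints.websecure.transport.respondingTimeouts.readTimeout=600s"
```

A finite value keeps some protection against stalled connections, while still giving large uploads more than 60 seconds to finish.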
-
If you use TrueNAS SCALE and install Immich from TrueCharts (with their Traefik), then you can add "Extra Args" in the GUI for the Traefik chart with the value
-
Do you need to add this under both the http and https entry points if you have them, or is http enough?
-
I am on traefik 3.1 and it was driving me crazy that large uploads kept receiving gateway errors. Just pasted this into my
-
Thanks for the above. I also ran into this problem. It needed timeout lines for both the web and websecure entry points to fix it (works with Traefik 3.1), as follows:
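The original lines are missing from this comment. As a rough sketch, assuming the CLI-flag form in a docker-compose command section and a value of 0s (both assumptions), setting the timeout on both entry points might look like this:

```yaml
# Fragment of a Traefik service in docker-compose.yml (sketch; values assumed)
command:
  # The timeout is configured per entry point, so each one gets its own line.
  - "--entryPoints.web.transport.respondingTimeouts.readTimeout=0s"
  - "--entryPoints.websecure.transport.respondingTimeouts.readTimeout=0s"
```

Since the option lives under each entry point, an entry point without its own line keeps the default read timeout.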
-
The Traefik docs say the default value for 'readTimeout' is '0s'. Anyway, setting this value fixes the upload 502 Bad Gateway issue. I decided to set this in the config instead of labels, as there are a number of proxied endpoints with download/upload options.
-
TLDR, here's the fix:
Add the following to your Traefik config and restart the Traefik instance. For more examples, see the docs: https://doc.traefik.io/traefik/routing/entrypoints/#respondingtimeouts
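The config block itself is missing here. A minimal sketch of the static configuration, assuming a file-based setup and an entry point named websecure on port 443 (both assumptions), would be along these lines:

```yaml
# traefik.yml, static configuration (sketch; entry point name and port assumed)
entryPoints:
  websecure:
    address: ":443"
    transport:
      respondingTimeouts:
        # 0s disables the read timeout, which was the default before v2.11.2.
        readTimeout: 0s
```

This is static configuration, so it only takes effect after Traefik is restarted.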
What's going on?
I was having issues with Immich not being able to upload large files. The upload would stall partway through a large file, whilst the app itself remained responsive. This is new behaviour, as I've been able to upload to my heart's content for months. So what's changed? My hunch was it was network related, so I transferred the same file onto my desktop to see if the upload still failed via a browser. It did.
Digging deeper into the issue, the first thought was that it was buffer related, as these things usually are. But I'm using Traefik, which by default doesn't impose or define any buffer or upload file size limitations, as it tries to behave as much like a transparent reverse proxy as possible, so that seemed unlikely.
Doing the file transfer on a desktop was also handy for one main reason: the network inspector. Popping that open revealed that requests were being terminated prematurely with a `502`, but didn't reveal anything else of value, not even the request timings (which will be important later), as they weren't that accurate and included the transport and execution times from the browser as well. So nothing stood out as being the problem.
Next port of call was enabling `DEBUG` level logging and access logging on the Traefik instance to try and debug what on earth was going on. After doing that, all became clear:
The important thing here is it revealed that it was indeed Traefik killing the connection, and crucially why Traefik killed the connection: the `readfrom` config value was set to `60s` and was therefore triggering the `i/o timeout`. But this is new, no? It's never done this before. Why is it doing it now?
Burned by Watchtower. It automatically upgraded to Traefik `v2.11.2`, which has a breaking change in a semver patch release, a big no-no in the software game. There is an ongoing discussion as to why exactly this change came in, and it's led to a lot of heated discussion on their repo. It also revealed why the issue was so hard to track down information for online, as the issue was only opened less than a week ago.
Finding the fix!
Initial bug thread: traefik/traefik#10592
The culprit: traefik/traefik#10594 (comment)
The wider discussion thread: traefik/traefik#10598
Tucked away right at the bottom of the changelog, https://doc.traefik.io/traefik/migration/v2/#v2112, is this little doozy:
And there we go: the `i/o timeout` error is linked back to a change in the default `EntryPoint.Transport.RespondingTimeouts.ReadTimeout` value.
Checking the docs for this section, we can now apply a fix (notice that the docs now say it defaults to 60 seconds instead of being disabled, further confirming this as the culprit): https://doc.traefik.io/traefik/routing/entrypoints/#respondingtimeouts
Setting `readTimeout` to `0s` reverts it to its previous default of being disabled, instead of 60 seconds, and solves the issue.
Happy uploading!