
updateFile - some files may get stuck in is_downloading_active in poor network conditions #2585

Open
vincentneo opened this issue Aug 22, 2023 · 11 comments


@vincentneo
Contributor

Observed on watchOS.

I will send a log (verbosity level 10) to the bot shortly after submitting this; the relevant ids shared below correspond to that log.

Using downloadFile with synchronous = false and priority 30 to download a file of 23520 bytes, the file does not seem to be downloaded at all: after waiting for about 5 minutes, downloaded_size of localFile remains 0 throughout all updateFile calls. This problematic file's id is 1376.
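For reference, the call was the equivalent of the following downloadFile request through TDLib's JSON interface (a minimal sketch; how the JSON string is actually sent to the client is app-specific and not shown here):

```python
import json

def build_download_file_request(file_id: int, priority: int = 30) -> str:
    """Build a TDLib downloadFile request as a JSON string.

    With synchronous=False, TDLib answers immediately and reports
    progress through updateFile updates instead of blocking until
    the download finishes.
    """
    request = {
        "@type": "downloadFile",
        "file_id": file_id,
        "priority": priority,   # 1..32; higher priority downloads sooner
        "offset": 0,            # download from the start of the file
        "limit": 0,             # 0 = no limit, download the whole file
        "synchronous": False,
    }
    return json.dumps(request)

# The problematic photo from this report:
print(build_download_file_request(1376, priority=30))
```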

I believe the poor network conditions used in this testing are partly the cause, but what seems odd to me is that, after those 5 minutes, I tried to download a different file, this time of size 19364 bytes, which successfully downloaded in about a minute with the same settings and the same network conditions as above. This file's id is 1369.

Both files are type m, width 320 photo files from two different messagePhoto messages.

@levlam
Contributor

levlam commented Aug 22, 2023

Your network connection seems to be extremely poor. The connections are closed by timeout all the time, because they don't receive a response to a request within 10-30 seconds.

@vincentneo
Contributor Author

Ah. In the case where connections are timed out, am I right to say that there will not be any new updateFile emitted, and is_downloading_active will be true for the remainder of the entire session?
Regardless, if I call downloadFile again after x seconds, will that override the stalled download and retry?
Is this yet another side effect of #2538 and its relevant shortcomings?

I am using macOS's Network Link Conditioner to simulate slow networks. I did this because I have personally, frequently observed stalled image downloads on the watch when I am out. Watch networking always seems much slower than on the phone for whatever reason, even when both phone and watch are on the same cellular network, or when the watch is sharing the phone's connectivity.

@levlam
Contributor

levlam commented Aug 22, 2023

is_downloading_active will be true for the remainder of the entire session?

It will be true until the file is successfully downloaded. There are no "stalled" downloads. The download is in progress and it should eventually succeed, but currently it can't succeed due to the extremely poor network. It fails to download 20 kilobytes of data in 10 seconds.

@vincentneo
Contributor Author

There are no "stalled" downloads. The download is in progress and it should eventually succeed

The simulated network, as far as I know, limits the bandwidth in a fixed manner, with the following setting:
[Screenshot: Network Link Conditioner, "Very Bad Network" profile]

If the same network is able to download 19kb in a minute (file 1369), wouldn't it signify that something is wrong with 1376 only?

@levlam
Contributor

levlam commented Aug 23, 2023

No, it only means that 1369 was very lucky to have no lost packets, even though each packet is lost with 10% probability. I wouldn't expect file downloads to work with 10% packet loss. That is an almost broken network for TCP.

@vincentneo
Contributor Author

I have re-tested downloading of the same file (of 23520 bytes), same message. This time, the file id is 5237.
Downloading did complete, but it took about 37 minutes to finish!

To eliminate the variable of packet loss, and to test the theory that id 1369 was only able to download within a minute by luck, I have set 0% probability of packet loss and 0 ms of artificial delay, while maintaining the same bandwidth.
[Screenshot: Network Link Conditioner set to 1 Mbps bandwidth, 0% dropped packets, 0 ms delay, up/down]

Outside of the simulation, the network where this test is conducted is a stable 1 Gbps fiber network, connected via Ethernet, with an average 4 ms ping to the Telegram server (149.154.171.5).

This proves the following:

  • The file does indeed eventually complete downloading.
  • The theory that 1369 was lucky to have no packet loss, and hence downloaded faster, is disproven.

What is odd:

  • A 23kb file took 37 minutes to download, yet a 19kb file took about a minute to download.

What is interesting:

I am seeing the following in the log, which I don't recall seeing last time. Repair file reference?

[ 3][t 4][1692783654.816176176][FileManager.cpp:2536][#1][!FileManager][&file_references]	Do not have valid file_reference for file 5237(0)
[ 3][t 4][1692783654.816184282][FileReferenceManager.cpp:432][#1][!FileReferenceManager][&file_references]	Repair file reference for file 5237(0)/5237(0)
[ 3][t 4][1692783654.816191196][FileReferenceManager.cpp:439][#1][!FileReferenceManager][&file_references]	Create new file reference repair query with generation 3
[ 3][t 4][1692783654.816197156][FileReferenceManager.cpp:263][#1][!FileReferenceManager][&file_references]	Trying to repair file reference for file 5237(0)
[ 3][t 4][1692783654.816203117][FileReferenceManager.cpp:293][#1][!FileReferenceManager][&file_references]	Send file reference repair query for file 5237(0) with generation 3 from FileSourceId(99)
[ 3][t 4][1692783658.020383119][FileReferenceManager.cpp:394][#1][!FileReferenceManager][&file_references]	Receive result of file reference repair query for file 5237(0) with generation 3 from FileSourceId(99): OK 0
[ 3][t 4][1692783658.020387172][FileManager.cpp:2552][#1][!FileReferenceManager][&file_references]	Receive result from FileSourceManager for file 5237(0): [Error : 0 : FILE_DOWNLOAD_RESTART_WITH_FILE_REFERENCE]

The log will be provided shortly after posting this.

@levlam
Contributor

levlam commented Aug 23, 2023

The connections are closed and reopened by TDLib, because no data has been received through them for 7-44 seconds.

@vincentneo
Contributor Author

@levlam, I am very sorry for the waste of resources and the opening of this issue.

I had another look into the log, namely the last log I sent to the bot, and I realised that the testing was in fact flawed: a messageAnimation, a 4.83 MB file (id 5820), was downloading as well, without my knowledge at the time of conducting the tests, and I assume it used basically all of the already limited bandwidth. That was, in fact, the main problem.

However, before I close this issue, may I ask you to verify my latest understanding of how downloadFile's priority works? As an accidental discovery, I realised id 5820 is downloading with a priority of 5, while the earlier file id 5237 is downloading with a priority of 30.

Am I right to say, with the above context, that as long as a download has started, priority no longer matters? I realise 5820's downloadFile was called before 5237's, so my understanding is that 5820's download will need to finish before 5237 starts to download. Thank you.

@levlam
Contributor

levlam commented Aug 24, 2023

Files are downloaded in small chunks. Priority always matters for file part requests that are not yet sent, but ongoing requests aren't canceled when a new download with a higher priority is started. So a part of the animation will be downloaded, then the photo, and then the rest of the animation.
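This scheduling rule can be illustrated with a small simulation (purely illustrative, not TDLib's actual implementation; part counts and timings are made up): parts are dispatched highest-priority first, but a part already in flight always runs to completion.

```python
import heapq

def completion_order(events):
    """Simulate priority-based file part scheduling.

    events: list of (arrival_time, name, priority, num_parts) download
    requests. One part is in flight at a time, each part takes 1 time
    unit, and when a part finishes, the next part is taken from the
    pending download with the highest priority. An in-flight part is
    never cancelled by a later, higher-priority request.
    Returns the order in which parts complete.
    """
    events = sorted(events)
    pending = []          # heap of (-priority, arrival_index, name, parts_left)
    order = []
    t = 0
    arrival_index = 0
    while events or pending:
        # Admit download requests that have arrived by now.
        while events and events[0][0] <= t:
            _, name, prio, parts = events.pop(0)
            heapq.heappush(pending, (-prio, arrival_index, name, parts))
            arrival_index += 1
        if pending:
            negp, idx, name, parts = heapq.heappop(pending)
            t += 1                      # the part runs to completion
            order.append(name)
            if parts - 1 > 0:           # remaining parts compete again
                heapq.heappush(pending, (negp, idx, name, parts - 1))
        else:
            t += 1                      # idle until the next arrival
    return order

# Animation (priority 5, 3 parts) starts at t=0; the photo (priority 30,
# 1 part) is requested at t=1, while an animation part is in flight.
print(completion_order([(0, "animation", 5, 3), (1, "photo", 30, 1)]))
# → ['animation', 'photo', 'animation', 'animation']
```

The output matches the description above: one animation part finishes first (it was already in flight), then the higher-priority photo, then the rest of the animation.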

@vincentneo
Contributor Author

In the case of the final log, I observed the following:

(Numbers refer to timestamps)
5820
Start Download: 1692783528.315485000
Complete Download: 1692786663.937427997

5237
Start Download: 1692783572.393924236
Complete Download: 1692785832.999131917

Since 5820 continued to receive updateFile updates during 5237's download, can I verify with you that the updateFile calls are in response to the requests sent before 1692783572?

@vincentneo
Contributor Author

can I verify with you that, the updateFile calls are in response to the requests sent before 1692783572?

I assume this cannot be known from the log? If so, I will close this issue.
