Describe the bug
The partition_file() call works fine on its own, but wrapping it in a with vcr.use_cassette() block causes it to hang indefinitely.
To Reproduce
Try a script like this:
import vcr
...
with vcr.use_cassette("/tmp/vcr_cassette.yaml"):
    with open(pdf, "rb") as fp:
        partitioned_doc = partition_file(fp, ARYN_API_KEY)
Expected behavior
The script terminates.
Additional context
I used tcpdump to capture network traffic. Without vcr, I see FIN packets, indicating that the connection is closed. They originate from the client side, which appears to close the connection once it has finished reading the response. With vcr, there are no FINs. I think the behavior is in the iterator here:
sycamore/lib/aryn-sdk/aryn_sdk/partition/partition.py
Line 122 in e6e9877
I suspected that vcr doesn't handle streaming responses very well. I changed stream from True to False in this line:
sycamore/lib/aryn-sdk/aryn_sdk/partition/partition.py
Line 112 in e6e9877
And now vcr works fine and I see FIN packets. I'm not suggesting that we disable streaming, but it seems like useful diagnostic information.
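To make the difference between the two modes concrete, here is a minimal sketch of how a requests-style response is read with and without streaming. It is not the code from partition.py: read_body and FakeResponse are hypothetical names, and FakeResponse only stands in for requests.Response so the sketch runs without a network.

```python
class FakeResponse:
    """Hypothetical stand-in for requests.Response (no network needed)."""

    def __init__(self, chunks):
        self._chunks = list(chunks)
        self.closed = False

    def iter_content(self, chunk_size=None):
        # With stream=True, requests yields the body piece by piece.
        yield from self._chunks

    @property
    def content(self):
        # With stream=False, requests has already buffered the whole body.
        return b"".join(self._chunks)

    def close(self):
        self.closed = True


def read_body(resp, streamed):
    """Return the full body and always release the connection afterwards."""
    try:
        if streamed:
            # Streaming path: the body is pulled through an iterator.
            return b"".join(resp.iter_content(chunk_size=None))
        # Non-streaming path: no iterator is involved at all.
        return resp.content
    finally:
        resp.close()
```

With the real library, resp would come from something like requests.post(..., stream=True). The non-streaming branch never touches the iterator, which is consistent with the hang disappearing when stream is set to False.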
Another thing I tried was to keep stream=True and set the HTTP header "Connection: close" here:
sycamore/lib/aryn-sdk/aryn_sdk/partition/partition.py
Line 111 in e6e9877
The end result was still a hang. This time I did see FIN packets, but they originated on the server side. I also noticed the Python process using 100% of a CPU while hanging.
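For anyone who wants to repeat the "Connection: close" experiment, a rough sketch of building the request headers follows. build_headers is a hypothetical helper, and the Authorization header format is an assumption for illustration, not taken from the SDK source.

```python
def build_headers(api_key, close_connection=False):
    """Build request headers, optionally asking for a non-persistent connection."""
    # Assumption: bearer-token auth; the real SDK may format this differently.
    headers = {"Authorization": f"Bearer {api_key}"}
    if close_connection:
        # Signals that the connection should be closed after this exchange.
        headers["Connection"] = "close"
    return headers
```

The resulting dict would be passed as the headers argument of the HTTP call while keeping stream=True.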
I suspect the best workaround here is to disable streaming when vcr is involved. When I think about it, streaming doesn't make much sense in the context of cached responses. The reasons we use streaming are twofold: (1) provide timely feedback on progress, and (2) avoid idle connections that a firewall might shut down. If the client disables streaming, I think we lose (1) but keep (2), since the server always sends messages periodically.
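One possible shape for that workaround, as a sketch only: let the caller opt out of streaming through an environment variable. The variable name ARYN_DISABLE_STREAMING and both helpers are hypothetical; nothing like this exists in the SDK as far as this report is concerned.

```python
import os


def streaming_enabled(env=os.environ):
    """Return False when the hypothetical ARYN_DISABLE_STREAMING flag is set."""
    value = env.get("ARYN_DISABLE_STREAMING", "")
    return value.strip().lower() not in {"1", "true", "yes"}


def request_options(env=os.environ):
    """Keyword arguments to merge into the HTTP call."""
    # stream=True stays the default so progress feedback is unchanged
    # for normal (non-vcr) use.
    return {"stream": streaming_enabled(env)}
```

A test that records with vcr would set ARYN_DISABLE_STREAMING=1 before calling partition_file(), and the SDK would pass **request_options() to its HTTP call. An explicit keyword argument on partition_file() would be an alternative to the environment variable.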