feat: add option to parallelize pdfs #17
Comments
This should likely be the default. IMO, Unstructured-IO/unstructured#2461 is a blocker.
Note that #32 has established the pattern for hooking into the autogenerated code. Our custom logic will live in the new decorator dir, and we'll wrap the endpoint function. When Speakeasy makes a new PR, we'll just add the one-liners back (we'll have a script to do this soon).
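For illustration, the wrapping pattern described above could look roughly like the sketch below. It is not the code from #32: `split_pdf_hook`, the stand-in `partition` function, and the `handled_by_hook` marker are all hypothetical names chosen for the example.

```python
import functools


def split_pdf_hook(endpoint):
    """Hypothetical decorator that runs custom client logic around an endpoint."""

    @functools.wraps(endpoint)
    def wrapper(request, *args, **kwargs):
        # Custom pre-processing would go here, e.g. deciding whether to split a PDF.
        # Here we only tag the request so the effect of the hook is visible.
        request = dict(request, handled_by_hook=True)
        return endpoint(request, *args, **kwargs)

    return wrapper


# Stand-in for a function in the autogenerated client (assumption, not real SDK code).
@split_pdf_hook  # the one-liner that has to be re-added after each regeneration
def partition(request):
    return {"received": request}


result = partition({"file": "doc.pdf"})
```

Keeping the custom logic in the decorator means the generated file only needs the single decorator line restored after each regeneration.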
It's not easy with a new client parameter. I looked for ways of hiding the endpoint param in the OpenAPI spec, but it's not supported even for the more basic case of endpoints (OAI/OpenAPI-Specification#433). Perhaps some clever hacking would work here, but it might not be pretty. So I wanted to ask if it's OK for now (or forever) to have a parameter that works only on the client side and is ignored on the backend? We could rename it to
Yeah, that's a good question. Adding something on the server for this does feel messy. I wonder if we could establish a better pattern for configuring these sorts of things on the client: some sort of Configuration object that we pass in to the SDK constructor. We can see what Speakeasy thinks about this, or figure out how to support the behavior on our own. I think, for now, we can add the logic and assume that it will be the default behavior. We could add configuration via an env variable if users really want to be able to turn it off.
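A minimal sketch of what such a client-side configuration could look like, assuming splitting is on by default and an env variable can turn it off. `ClientConfig` and the variable name `UNSTRUCTURED_SPLIT_PDF_PAGE` are made up for this example and are not part of the SDK.

```python
import os
from dataclasses import dataclass

TRUTHY = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class ClientConfig:
    """Hypothetical client-only settings passed to the SDK constructor."""

    split_pdf_page: bool = True  # assumed default: splitting enabled

    @classmethod
    def from_env(cls, environ=None):
        # Allow an explicit mapping so the behaviour is easy to test.
        environ = os.environ if environ is None else environ
        raw = environ.get("UNSTRUCTURED_SPLIT_PDF_PAGE")  # hypothetical name
        if raw is None:
            return cls()  # variable unset: keep the default
        return cls(split_pdf_page=raw.strip().lower() in TRUTHY)


default_config = ClientConfig.from_env({})
disabled_config = ClientConfig.from_env({"UNSTRUCTURED_SPLIT_PDF_PAGE": "false"})
```

Nothing in this object is sent to the server, which keeps the OpenAPI spec untouched.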
OK, I asked Speakeasy support and they suggested a way to add the parameter only on the client side, through an overlay. This is relatively simple and looks like the best solution. The draft implementation is already in a PR.
This is an implementation of #17. It adds a boolean `split_pdf_page` to PartitionParameters which, if True, causes the PDF to be split on the client side into 1-page chunks that are sent to the API. The returned elements are joined into a single result list.
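The fan-out and join step described above can be sketched as follows. This is not the PR's implementation: the pages are assumed to be already split into per-page byte strings (a PDF library would be needed for that part), and `partition_pages`, `send_page`, and `max_concurrency` are hypothetical names.

```python
import asyncio


async def partition_pages(pages, send_page, max_concurrency=5):
    """Send each page concurrently and join the results in page order."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def send_one(page):
        # Cap the number of requests that are in flight at once.
        async with semaphore:
            return await send_page(page)

    # gather() returns results in the order the awaitables were passed,
    # so the joined list follows the original page order.
    per_page = await asyncio.gather(*(send_one(page) for page in pages))
    return [element for elements in per_page for element in elements]


async def fake_send_page(page):
    # Stand-in for the real API call; returns one element per page.
    await asyncio.sleep(0)
    return [{"text": page.decode()}]


elements = asyncio.run(
    partition_pages([b"page 1", b"page 2", b"page 3"], fake_send_page)
)
```

Because the per-page results are flattened in page order, the caller sees a single element list, as if the whole document had been sent in one request.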
We aren’t planning to bring server-side parallel mode back, but this can be a game changer for big documents. We should have the client optionally break a PDF into pages and send these off asynchronously. See this example.
We need to see if it's possible to hook into the autogenerated code for something like this.