Use Max Num Pages also in MS Word and Powerpoint #689
Comments
@JamMaster1999 I don't think max_num_pages is used to load the PDF only up to that page number. It errors out because of this condition in document.py: if not self.page_count <= self.limits.max_num_pages. When the document passes that check, the pipeline runs on all the pages. |
Thanks @trinanjan12. So I have to slice the file beforehand? Also, this is for docx; I haven't tried PDF. |
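The behaviour described in the comment above can be illustrated with a small standalone sketch. It is not docling's code: `Limits`, `is_within_limits`, and `pages_to_process` are hypothetical names, and only the comparison itself is taken from the condition quoted from document.py. The point is that the limit accepts or rejects the whole document and never shortens it.

```python
import sys
from dataclasses import dataclass


@dataclass
class Limits:
    # Hypothetical stand-in for the limits object referenced in the comment.
    max_num_pages: int = sys.maxsize


def is_within_limits(page_count: int, limits: Limits) -> bool:
    # Same comparison as the quoted condition: the whole document passes or fails.
    return page_count <= limits.max_num_pages


def pages_to_process(page_count: int, limits: Limits) -> list[int]:
    # Assumed reject-only behaviour: an oversized document is refused outright,
    # an accepted document is processed in full (1-based page numbers).
    if not is_within_limits(page_count, limits):
        raise ValueError(
            f"document has {page_count} pages, limit is {limits.max_num_pages}"
        )
    return list(range(1, page_count + 1))
```

Under this reading, a 3-page file with a limit of 10 is processed on pages 1 to 3, while an 11-page file with the same limit is rejected rather than cut down to 10 pages.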
@JamMaster1999 I guess so, for now. |
I opted to do this for PDFs, but slicing pages out of docx and pptx files is not possible with open-source libraries. If docling could enforce max_num_pages for docx and pptx, it would be AMAZING!
|
maybe @dolfim-ibm can comment on this. |
The |
I will flag this issue as a feature request for using the parameter also in the other backends. |
@dolfim-ibm Got it! Is there a way to make max_num_pages truncate the document instead? I am thinking of something more like a page range, where it processes a specific set of pages in the document rather than skipping the document entirely. |
@JamMaster1999 We will track this in a clean new feature request here: #845 |
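The page-range idea requested above could look roughly like the following sketch. It is only an illustration of the requested semantics, not an existing docling API: `select_pages` and its `first`/`last` parameters are hypothetical names chosen here.

```python
from typing import Optional


def select_pages(page_count: int, first: int = 1, last: Optional[int] = None) -> list[int]:
    # Hypothetical helper: return the 1-based page numbers to process,
    # clamping the requested range to the document instead of rejecting it.
    if first < 1:
        raise ValueError("first must be >= 1")
    if last is None or last > page_count:
        last = page_count
    return list(range(first, last + 1))
```

With this shape, truncation to a maximum number of pages is the special case `select_pages(page_count, 1, max_pages)`, and a range that starts past the end of the document simply yields no pages.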
Bug
I am setting max_num_pages in my convert method, but the conversion still runs on all the pages.
Steps to reproduce
`
from typing import Optional

def init_docling(
    input_file: str,
    output_folder: str,
    max_pages: Optional[int] = None,
):
    # Imported inside the function, as in the original report.
    from docling.document_converter import DocumentConverter, PdfFormatOption, WordFormatOption
`
Docling version
2.14.0
Python version
3.12.8