localdocs: avoid cases where batch can make no progress #3094
Signed-off-by: Jared Van Bortel <[email protected]>
Recording this metadata once avoids the need to open the PDF document every time we enter scanQueue.
Signed-off-by: Jared Van Bortel <[email protected]>
78a3cfb to 61dc351
I was able to confirm that for large PDFs with many high-resolution images, opening the document to read its metadata can reliably take more than 100ms, which before this PR would prevent any words from being read.
Before (current main):
Before the docx change, we would always read at least one page, even if it meant exceeding 100ms.
After (this PR):
one minor nit
It has been brought to my attention that some users are still experiencing hangs in LocalDocs indexing after v3.4.1 that they did not see in v3.3.x or earlier. This is likely a result of #2986. Although the duration timer itself has been in place since #2396 without much issue, previously we would at least process one page of a PDF if beginning the transaction took just under 100ms. Now we may not even process a single word.
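To illustrate the failure mode, here is a minimal sketch of a time-budgeted batch loop that always makes progress. It is not the project's code: `runBatch`, `step`, and the `Clock` alias are hypothetical names, and the sketch assumes the fix amounts to checking the deadline only after a unit of work has run, so that slow setup before the loop cannot consume the whole budget.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>

using Clock = std::chrono::steady_clock;

// Hypothetical helper, not the LocalDocs implementation.
// `step` processes one unit of work (a word, a page) and returns false
// when there was nothing left to process.
// The deadline is checked only AFTER a step has run, so even if setup
// already exceeded the budget, one unit of work still gets done.
// Returns the number of units processed in this batch.
std::size_t runBatch(Clock::time_point batchStart,
                     std::chrono::milliseconds budget,
                     const std::function<bool()> &step)
{
    assert(budget.count() >= 0);
    std::size_t done = 0;
    for (;;) {
        if (!step())
            break; // no more work in this document
        ++done;
        if (Clock::now() - batchStart >= budget)
            break; // budget spent, but at least one step has run
    }
    return done;
}
```

If `batchStart` is taken before an expensive open, a loop that tests the deadline before the first step can return having processed nothing on every call; moving the test after the step bounds the worst case at one unit per batch rather than zero.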
I've now realized that the QPdfDocument we use to get metadata may also contribute to this, which this PR doesn't entirely fix - we may still only get a single word per iteration.
TODO: This should be changed such that we do not need to open the PDF document every iteration. Done.
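The idea of recording metadata once can be sketched as follows. This is an assumption-laden illustration, not the code in this PR: `PdfMetadata`, `QueuedDocument`, `openAndReadMetadata`, and `scanOnePage` are hypothetical stand-ins, and the expensive QPdfDocument open is simulated by a counter.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for the metadata read via QPdfDocument.
struct PdfMetadata {
    std::string title;
    int pageCount = 0;
};

int g_openCount = 0; // counts simulated expensive document opens

// Simulates the slow open; a real version would open the PDF here.
PdfMetadata openAndReadMetadata(const std::string &path)
{
    ++g_openCount;
    return PdfMetadata{"title of " + path, 3};
}

// Hypothetical queue entry that remembers metadata across iterations.
struct QueuedDocument {
    std::string path;
    std::optional<PdfMetadata> metadata; // filled on first use only
    int nextPage = 0;

    const PdfMetadata &meta()
    {
        if (!metadata)
            metadata = openAndReadMetadata(path);
        return *metadata;
    }
};

// One scan iteration: processes a single page.
// Returns false when the document has no pages left.
bool scanOnePage(QueuedDocument &doc)
{
    const PdfMetadata &m = doc.meta();
    assert(doc.nextPage <= m.pageCount);
    if (doc.nextPage == m.pageCount)
        return false;
    ++doc.nextPage;
    return true;
}
```

With the metadata stored on the queue entry, repeated scan iterations over the same document pay the open cost once instead of on every entry into the scan loop.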