A much faster mergePage function #27

Open
Averell7 opened this issue Jun 18, 2011 · 5 comments

@Averell7

The mergePage function is slow. Needing more speed, I have written a modified version, mergePage3, which is much faster when merging pages from the same file (up to 200x faster) and also faster when merging pages from different files.
The basic idea: mergePage uses ContentStream to get the content of a page. But this class always runs its parseContentStream function, even when parsing is not needed, and this function is time-consuming.
mergePage3 parses the content only when really needed. The result:

On a 55-page test file, placing two pages per sheet (booklet) takes 34 seconds with mergePage and 0.4 seconds with mergePage3. (This counts only the time spent merging, not the generation of the output file.)

If you are interested, I can share the code.
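
To illustrate the idea of parsing only when needed, here is a minimal sketch. It is not the mergePage3 code and does not use pyPdf; `LazyContent` and `merge_raw` are hypothetical names, and the "parser" is a toy that just splits lines. The point is only that two content streams can be joined as raw bytes, with parsing deferred until someone actually asks for the parsed operations.

```python
class LazyContent:
    """Hypothetical wrapper around a page content stream that defers parsing."""

    def __init__(self, raw_bytes):
        self._raw = raw_bytes
        self._operations = None  # parsed form, built only on first access
        self.parse_count = 0     # counts how often the expensive step ran

    def raw_data(self):
        # Cheap: returns the unparsed bytes without touching the parser.
        return self._raw

    @property
    def operations(self):
        # Expensive step runs once, and only if a caller needs parsed data.
        if self._operations is None:
            self._operations = self._parse(self._raw)
        return self._operations

    def _parse(self, raw):
        # Toy stand-in for a real content-stream parser (assumption).
        self.parse_count += 1
        return raw.split(b"\n")


def merge_raw(first, second):
    """Join two content streams as bytes, without parsing either one.

    Each stream is wrapped in q ... Q, the PDF operators that save and
    restore the graphics state, so one page's state does not leak into
    the other. Whether mergePage3 does exactly this is an assumption.
    """
    merged = b"q\n" + first.raw_data() + b"\nQ\nq\n" + second.raw_data() + b"\nQ"
    return LazyContent(merged)
```

With this shape, merging many pages costs only byte concatenation; the parser runs later, and only for pages whose operations are actually inspected or rewritten.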

@Averell7
Author

I am new to GitHub and don't know the best way to propose my version on this site. Thanks for any advice.
Since a fork has been created and I don't know how to delete it, I can post my version here.

@Averell7
Author

Averell7 commented Jul 9, 2011

I posted all the code in the Averell7 fork. Since it is fully compatible with the present code, let us hope it will one day be integrated into pyPdf.

@whitemice

Will this patch get mainlined?

@vnakk

vnakk commented Nov 20, 2012

Hi,

I have tried this patch. The merging step is indeed much faster than before. But I have the impression that the file-saving step has slowed down more (10x) than the merging step has sped up (3x).
What do you think?

Thank you

@Averell7
Author

Hi vnakk,
Sorry, I was unaware of your reply. I don't really see what you mean. The next version of PdfBooklet (already on GitHub but not yet released as I write) has two options: fast (with my mergePage3 function) and slow (with the standard mergePage). This was implemented because we were informed of one case where artifacts appeared with mergePage3 that were not present with mergePage. This is the only known case (with 1500 monthly downloads of PdfBooklet).

Source file: full text, 520 A4 pages; output: A3 booklet, 260 pages.
mergePage (slow mode):

  • create Pdf = 271.2735161781311
  • save Pdf = 0.35102081298828125

mergePage3 (fast mode):

  • create Pdf = 1.0970630645751953 (gain: 248x)
  • save Pdf = 0.17000985145568848 (gain: 2x)

With more sophisticated pages (graphics and so on), the gain is 10x for creating the PDF and 2x for saving.
So even saving the file is faster. (I don't know why, but that is what I measure.)

Note that PdfBooklet is in beta and, in slow mode, does not work with all PDF files.
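
For anyone who wants to check the create and save steps separately on their own files, here is a minimal timing sketch. It is not the PdfBooklet code: `timed`, `compare_phases` and `gain` are hypothetical helpers, and `create` and `save` stand for whatever callables build the merged document and write it out.

```python
import time


def timed(func, *args, **kwargs):
    """Run func and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def compare_phases(create, save):
    """Time the two phases separately.

    create: callable with no arguments that returns the merged document.
    save:   callable that takes that document and writes it out.
    Both are placeholders supplied by the caller (assumption).
    """
    document, create_seconds = timed(create)
    _, save_seconds = timed(save, document)
    return {"create": create_seconds, "save": save_seconds}


def gain(slow_seconds, fast_seconds):
    """Speed-up factor of the fast run relative to the slow run."""
    return slow_seconds / fast_seconds
```

Running `compare_phases` once per mode on the same input, then calling `gain` on each phase, shows whether a faster merge is being paid for by a slower save.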
