Description
I don't have any experience with multiple threads in Python.
I think it should not be that hard to split the (HTML) file generation part into multiple threads: at that point the data is already collected and static, so articles can be written independently of each other once the populate functions have run. I still have to check how directories get created and whether some data structures are populated along the way during generation.
Here's the output of my current blog generation:
INFO • Parsing Org mode files …
INFO Parsed 22 Org-mode files with 1031079 lines (in 1.44 seconds)
INFO • Generating articles …
INFO • Building index of files …
INFO Built index for 518638 files (in 1.64 seconds)
INFO Generated 827 articles: 41 persistent, 706 temporal, 79 tag-pages, the entry page, and scaled 0 images (in 80.60 seconds)
It seems there's not much to gain in the parsing phase, as it is fairly fast (1.44 seconds). The generating phase, however, accounts for almost all of the run time (80.60 seconds).
With no particular knowledge, I'd guess that moving each individual "generate" function into its own thread doesn't scale well: each one only generates a single entry, so the threading overhead might add significant time.
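To make the idea concrete, here is a minimal sketch of handing independent per-article work to a small, fixed-size worker pool instead of starting one thread per entry, which should keep the overhead low. It is not taken from lazyblorg: `generate_article`, `generate_all`, the entry dictionaries and the file layout are all hypothetical stand-ins. Note that in CPython threads mainly help with I/O-bound work because of the global interpreter lock. If generation turns out to be CPU-bound, trying `ProcessPoolExecutor` from the same module would be the obvious variation.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial
from pathlib import Path


def generate_article(target_dir, entry):
    """Hypothetical stand-in for one per-article 'generate' function."""
    path = Path(target_dir) / f"{entry['id']}.html"
    path.write_text(f"<h1>{entry['title']}</h1>\n", encoding="utf-8")
    return path


def generate_all(entries, target_dir, max_workers=4):
    """Write all entries using a fixed-size pool of worker threads."""
    # Create the output directory up front so workers never race on mkdir.
    Path(target_dir).mkdir(parents=True, exist_ok=True)
    worker = partial(generate_article, target_dir)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order and re-raises worker exceptions
        # when the results are consumed.
        return list(pool.map(worker, entries))
```

This only works as written if the entries are really read-only during generation. Any data structure that gets populated along the way would need a lock or would have to be filled in before the pool starts.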
I guess that running a profiler would be the best way to determine which parts to move into threads. I don't have experience with this either. Without doing the profiler analysis, I'd maybe move the following parts of https://github.com/novoid/lazyblorg/blob/master/lib/htmlizer.py into threads:
- the general pages: entry page, tag cloud, ...
- all instances of:
  - _scale_and_write_image_file()
  - _copy_image_file_without_exif()
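As for the profiler run mentioned above, a minimal sketch using the standard library's `cProfile` and `pstats` modules could look like this. The helper name `profile_call` is made up for illustration. The function passed to it would be whatever entry point drives the generating phase.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=15, **kwargs):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time so the most expensive call paths come first,
    # and limit the report to the first `top` entries.
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()
```

The functions at the top of the report would be the first candidates to look at. Without touching any code, the whole program can also be profiled by starting it with `python -m cProfile -s cumulative` followed by the usual script name and arguments.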
Maybe you do have experience and are able to run a quick test of whether this is a quick-win task?