Use multiple threads for generating the content to profit from multi-core CPUs #115

@novoid

I don't have any experience with multiple threads in Python.

I think it shouldn't be too hard to split the (HTML) file-generation phase into multiple threads: once the populate functions have run, the data is collected and static, so articles can be written independently of each other. (I still need to check how directories are created, and whether any data structures get populated along the way.)
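A minimal sketch of the "collected and static" idea, assuming the populate functions end up with a dict keyed by article ID. All names here (`blog_data`, `write_article`, `write_all`, ...) are hypothetical, not lazyblorg's API. It freezes the populated data, creates all directories sequentially up front, and only then lets worker threads write files. `MappingProxyType` makes accidental writes to the top-level mapping raise `TypeError`; it is shallow, so nested data stays mutable.

```python
import html
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from types import MappingProxyType


def freeze_blog_data(blog_data: dict) -> MappingProxyType:
    """Read-only view of the populated data; writes to it raise TypeError.
    Note: the view is shallow, nested dicts remain mutable."""
    return MappingProxyType(blog_data)


def prepare_directories(article_ids, output_dir: Path) -> None:
    """Create every article directory sequentially, before any worker starts."""
    for article_id in article_ids:
        (output_dir / article_id).mkdir(parents=True, exist_ok=True)


def write_article(article_id: str, blog_data, output_dir: Path) -> Path:
    """Worker: only reads the shared data and writes its own file."""
    title = html.escape(blog_data[article_id]["title"])
    target = output_dir / article_id / "index.html"
    target.write_text(f"<h1>{title}</h1>\n", encoding="utf-8")
    return target


def write_all(blog_data: dict, output_dir: Path, workers: int = 4) -> list:
    """Freeze the data, create directories, then write articles in threads."""
    frozen = freeze_blog_data(blog_data)
    prepare_directories(frozen, output_dir)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, not completion order
        return list(pool.map(lambda aid: write_article(aid, frozen, output_dir), frozen))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = {"2024-01-01-hello": {"title": "Hello"}, "2024-02-01-bye": {"title": "Bye"}}
        for path in write_all(data, Path(tmp)):
            print(path)
```

Doing all directory creation before the parallel phase means the workers never race on `mkdir`; each one only touches its own file.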

Here's the output of my current blog generation:

INFO     • Parsing Org mode files …
INFO     Parsed 22 Org-mode files with 1031079 lines (in 1.44 seconds)
INFO     • Generating articles …
INFO     • Building index of files …
INFO     Built index for 518638 files (in 1.64 seconds)
INFO     Generated 827 articles: 41 persistent, 706 temporal, 79 tag-pages, the entry page, and scaled 0 images (in 80.60 seconds)

Apparently there's not much to gain in the parsing phase, since it is fairly fast (1.44 seconds). The generating phase, however, accounts for by far the most time (80.60 seconds).
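A back-of-the-envelope estimate with Amdahl's law (a general rule, not something from lazyblorg) gives an upper bound on what parallelizing could gain. It assumes parsing (1.44 s) stays serial and that the whole 80.60 s generation phase parallelizes perfectly, which is optimistic: I/O, directory creation and any shared state will eat into this. Under those assumptions, 4 workers give at most about a 3.8× speedup.

```python
def estimated_speedup(serial_s: float, parallel_s: float, workers: int) -> float:
    """Amdahl's law: only the parallelizable part shrinks with more workers."""
    before = serial_s + parallel_s
    after = serial_s + parallel_s / workers
    return before / after


if __name__ == "__main__":
    # Timings from the log: parsing (kept serial) and article generation
    # (assumed to parallelize perfectly, i.e. an optimistic upper bound)
    for workers in (2, 4, 8):
        print(f"{workers} workers: {estimated_speedup(1.44, 80.60, workers):.2f}x")
```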

Without any particular knowledge, I'd guess that moving each "generate" function call into a thread of its own wouldn't scale well: each call generates only a single entry, so the threading overhead might add significant time.
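One caveat for the multi-core goal: in CPython, the global interpreter lock (GIL) prevents threads from executing Python bytecode in parallel, so CPU-bound HTML generation is more likely to use several cores with processes, e.g. `concurrent.futures.ProcessPoolExecutor`. The per-entry overhead concern can be addressed by batching: `Executor.map()` takes a `chunksize` argument that sends several entries to a worker at once (it only has an effect with process pools). A hedged sketch; `generate_article()` and the entry dict layout are hypothetical stand-ins, not lazyblorg's actual functions.

```python
import html
import os
import tempfile
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from functools import partial
from pathlib import Path
from typing import Optional


def generate_article(entry: dict, output_dir: str) -> str:
    """Hypothetical stand-in for one "generate" call: render a single
    entry to HTML and write it to its own file."""
    page = f"<html><body><h1>{html.escape(entry['title'])}</h1></body></html>\n"
    target = Path(output_dir) / f"{entry['id']}.html"
    target.write_text(page, encoding="utf-8")
    return str(target)


def generate_all(entries: list, output_dir: str, use_processes: bool = True,
                 workers: Optional[int] = None) -> list:
    """Render all entries in parallel; processes sidestep the GIL."""
    workers = workers or os.cpu_count() or 1
    # Send each worker a batch of entries rather than one at a time, so the
    # per-task overhead is paid once per chunk instead of once per entry.
    # (chunksize only has an effect with ProcessPoolExecutor.)
    chunksize = max(1, len(entries) // (workers * 4))
    # partial() of a module-level function can be pickled for worker processes
    job = partial(generate_article, output_dir=output_dir)
    executor_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    with executor_cls(max_workers=workers) as pool:
        return list(pool.map(job, entries, chunksize=chunksize))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        entries = [{"id": f"post{i}", "title": f"Post {i}"} for i in range(827)]
        paths = generate_all(entries, tmp)
        print(f"wrote {len(paths)} files")
```

The `if __name__ == "__main__":` guard matters here: on platforms that start worker processes by re-importing the main module, code outside the guard would run again in every worker.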

Running a profiler is probably the best way to determine which parts to move into threads, but I have no experience with that either. Without a profiler analysis, I'd consider moving the following parts of https://github.com/novoid/lazyblorg/blob/master/lib/htmlizer.py into threads:

  • the general pages: entry page, tag cloud, ...
  • all instances of:
    • _scale_and_write_image_file()
    • _copy_image_file_without_exif()
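The candidates above are independent tasks of different kinds, so a single thread pool that accepts `(function, args)` pairs could run them all. A sketch under assumptions: `copy_image()` and `write_page()` are hypothetical stand-ins for `_copy_image_file_without_exif()` and the general-page generators, whose real signatures may differ. Threads help here mainly to the extent that the work is I/O or happens in code that releases the GIL; whether `_scale_and_write_image_file()` does depends on how the scaling is implemented.

```python
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def copy_image(src: Path, dst: Path) -> Path:
    """Hypothetical stand-in for _copy_image_file_without_exif(); this
    version only copies the file and does NOT strip EXIF data."""
    shutil.copyfile(src, dst)
    return dst


def write_page(path: Path, page: str) -> Path:
    """Hypothetical stand-in for a general page (entry page, tag cloud)."""
    path.write_text(page, encoding="utf-8")
    return path


def run_tasks(tasks, workers: int = 4) -> list:
    """Run independent (function, args) tasks in a thread pool.

    Results come back in completion order; future.result() re-raises any
    exception from a worker, so failures are not silently dropped.
    """
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(func, *args) for func, args in tasks]
        for future in as_completed(futures):
            results.append(future.result())
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp)
        src = out / "photo.jpg"
        src.write_bytes(b"not really a JPEG")
        tasks = [(copy_image, (src, out / f"copy{i}.jpg")) for i in range(3)]
        tasks.append((write_page, (out / "index.html", "<h1>Entry page</h1>")))
        for path in run_tasks(tasks):
            print(path.name)
```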

Maybe you have some experience with this and could run a quick test to see whether this is a quick win?
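For such a quick test, the standard library's `cProfile` and `pstats` modules are enough to see where the 80 seconds go. A hedged sketch: `generate_everything()` and `render_entry()` are hypothetical stand-ins for lazyblorg's generation phase. Alternatively, a whole run can be profiled from the command line with `python -m cProfile -s cumulative` followed by the usual lazyblorg script and arguments.

```python
import cProfile
import io
import pstats


def render_entry(n: int) -> str:
    """Hypothetical stand-in for generating one article."""
    return "".join(f"<p>{i}</p>" for i in range(n))


def generate_everything(count: int = 200) -> int:
    """Hypothetical stand-in for the whole generation phase."""
    return sum(len(render_entry(500)) for _ in range(count))


def profile_report(func, *args, limit: int = 10) -> str:
    """Run func under cProfile and return the top `limit` functions,
    sorted by cumulative time (time spent in a function and its callees)."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func(*args)
    finally:
        profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(limit)
    return stream.getvalue()


if __name__ == "__main__":
    print(profile_report(generate_everything))
```

Functions near the top of the cumulative listing that are called once per article or image are the natural candidates for a thread or process pool.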
