Support font subsetting to reduce size of pdf #103

@Yang-Xijie

Describe the bug

I want to add Chinese and Japanese text to a PDF. The Chinese and Japanese characters (は哈) render correctly, but output.pdf is far too large (14 MB).

I read the example doc and found chapter 8.6.2, Composite fonts. I only want to embed the characters that are actually used, that is, extract the font data for each of those characters and package just that in the PDF file. How can I achieve this with borb? Is there a configuration option for it?
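To make the request concrete, here is a minimal sketch of the kind of subsetting I mean, done outside borb as a pre-processing step. It assumes the third-party fontTools package is installed (`pip install fonttools`); the helper names `needed_codepoints` and `subset_font` are hypothetical and are not part of borb. I have not verified that borb loads the resulting file without problems.

```python
from pathlib import Path
from typing import Iterable


def needed_codepoints(texts: Iterable[str]) -> list[int]:
    """Return the sorted, de-duplicated Unicode code points used in texts."""
    return sorted({ord(ch) for text in texts for ch in text})


def subset_font(src: Path, dst: Path, texts: Iterable[str]) -> None:
    """Write a copy of the font at src that keeps only the glyphs needed for texts.

    Assumption: fontTools is available. This is a workaround sketch,
    not a borb feature.
    """
    # Imported lazily so needed_codepoints() works without fontTools.
    from fontTools import subset

    options = subset.Options()
    font = subset.load_font(str(src), options)
    subsetter = subset.Subsetter(options=options)
    subsetter.populate(unicodes=needed_codepoints(texts))
    subsetter.subset(font)
    subset.save_font(font, str(dst), options)
```

For the example in this issue, `subset_font(Path("font/Microsoft Yahei.ttf"), Path("font/yahei_subset.ttf"), ["はははは哈哈"])` would keep glyphs for two code points only, and the smaller file could then be passed to `TrueTypeFont.true_type_font_from_file`. Ideally borb would do something equivalent internally when writing the PDF.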

To Reproduce

Steps to reproduce the behaviour:

Download Microsoft Yahei.ttf at https://github.com/dolbydu/font/blob/master/unicode/Microsoft%20Yahei.ttf

import time
from pathlib import Path

from borb.pdf.canvas.font.simple_font.true_type_font import TrueTypeFont
from borb.pdf.canvas.layout.page_layout.multi_column_layout import SingleColumnLayout
from borb.pdf.canvas.layout.text.paragraph import Paragraph
from borb.pdf.document.document import Document
from borb.pdf.page.page import Page
from borb.pdf.pdf import PDF


def print_current_time():
    # Print a timestamp so the duration of each step can be read off the output.
    print(time.strftime("%Y-%m-%d %H:%M:%S", time.localtime()))


if __name__ == "__main__":

    print_current_time()

    # Load the full TrueType font (21 MB on disk).
    font_path = Path(__file__).parent / "font" / "Microsoft Yahei.ttf"
    custom_font = TrueTypeFont.true_type_font_from_file(font_path)

    print_current_time()

    # Build a one-page document containing a single short paragraph.
    doc = Document()
    page = Page()
    doc.append_page(page)
    layout = SingleColumnLayout(page)
    layout.add(Paragraph("はははは哈哈", font=custom_font))

    print_current_time()

    # Write the PDF to pdf/<timestamp>.pdf, creating the directory if needed.
    timestamp = time.strftime("%Y_%m_%d_%H_%M_%S", time.localtime())
    pdf_name = timestamp + ".pdf"
    pdf_path = Path(__file__).parent / "pdf" / pdf_name
    pdf_path.parent.mkdir(exist_ok=True)
    with open(pdf_path, "wb") as pdf_file_handle:
        PDF.dumps(pdf_file_handle, doc)

    print_current_time()

Output:

2022-05-27 21:19:11
2022-05-27 21:19:26
2022-05-27 21:19:27
2022-05-27 21:20:02

Directory listing:

[ 288]  .
├── [  97]  README.md
├── [ 128]  font
│   ├── [ 21M]  Microsoft Yahei.ttf
│   └── [ 74M]  PingFang.ttc
├── [1.3K]  main.py
└── [  96]  pdf
    └── [ 14M]  2022_05_27_20_49_11.pdf
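
For reference, the four timestamps printed by the script can be turned into per-step durations. The snippet below is only a small helper for reading that output; the function name `elapsed_seconds` and the step labels are mine, chosen to match the three stages between the `print_current_time()` calls.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

# Timestamps copied from the script output in this issue.
STAMPS = [
    "2022-05-27 21:19:11",
    "2022-05-27 21:19:26",
    "2022-05-27 21:19:27",
    "2022-05-27 21:20:02",
]

# Stages between consecutive print_current_time() calls.
STEPS = ["load font", "build document and layout", "write PDF"]


def elapsed_seconds(stamps):
    """Return the whole seconds elapsed between consecutive timestamps."""
    times = [datetime.strptime(s, FMT) for s in stamps]
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


for step, seconds in zip(STEPS, elapsed_seconds(STAMPS)):
    print(f"{step}: {seconds} s")
```

So loading the font takes 15 s, building the layout 1 s, and writing the PDF 35 s. Almost all of the run time is spent handling the full font, in addition to the 14 MB output size.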

Expected behaviour

The PDF file should be smaller than 1 MB, since it contains only six characters.

Desktop (please complete the following information):

  • OS: macOS 12.3
  • borb version 2.0.26
  • Python 3.9.5
