Stateless HTTP API to convert HTML to PDF
A dockerized HTTP service that generates PDF files from HTML using WeasyPrint. The primary use case is generating documents, such as invoices, from developer-controlled templates. It is not meant as a general webpage-to-PDF converter. The service expects input HTML and other resources to be safe and does none of the hardening or sandboxing that arbitrary inputs would require. Please consult the security section of this document.
Run the docker image mormahr/pdf-service and POST the HTML to /generate on port 8080.
Consult the API section for details about supported features and how to use them. See the deployment section (security in particular) for best practices in production environments.
docker run --rm -d --name pdf -p 8080:8080 mormahr/pdf-service
curl \
-H "Content-Type: text/html" \
--data '<p>Hello World!</p>' \
http://localhost:8080/generate \
> hello_world.pdf
docker stop pdf
Make a POST request to /generate with the HTML file you want to render as the body.
The response will be the PDF file.
curl \
-H "Content-Type: text/html" \
--data '<p>Hello World!</p>' \
https://pdf.example.com/generate \
> hello_world.pdf
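The same request can be made from application code. The following is a minimal sketch using only the Python standard library; the function names build_generate_request and generate_pdf are made up for this example and are not part of the service.

```python
import urllib.request


def build_generate_request(base_url: str, html: str) -> urllib.request.Request:
    """Build a POST request to /generate with the HTML as the request body."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/generate",
        data=html.encode("utf-8"),
        headers={"Content-Type": "text/html"},
        method="POST",
    )


def generate_pdf(base_url: str, html: str, timeout: float = 30.0) -> bytes:
    """Send the request and return the response body, which is the PDF file."""
    request = build_generate_request(base_url, html)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


# Usage, assuming the service is reachable at this URL:
#   pdf = generate_pdf("http://localhost:8080", "<p>Hello World!</p>")
#   open("hello_world.pdf", "wb").write(pdf)
```

urllib raises urllib.error.HTTPError for non-2xx responses, so a failed render surfaces as an exception rather than as an invalid PDF.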
Make a POST request to /generate with a Content-Type of multipart/form-data. Provide your HTML input as index.html and add any other required assets. The assets can be referenced in the HTML either as an absolute URL like /image.png or a relative one like image.png. Relative URLs are resolved against /. Omit the leading slash for the multipart/form-data name attribute.
curl \
-F index.html=@index.html \
-F image.png=@image.png \
-F sub-path/image.png=@sub-path/image.png \
https://pdf.example.com/generate \
> hello_world.pdf
<!-- index.html -->
<p>With an image:</p>
<img src="/image.png" />
<img src="/sub-path/image.png" />
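For clients that cannot shell out to curl, the multipart body can be assembled by hand. This is a sketch using only the Python standard library; encode_multipart and build_multipart_request are hypothetical helper names. Following the rule above, the form field names carry no leading slash. Setting the filename to the last path component mirrors what curl sends, but whether the service looks at it is an assumption.

```python
import mimetypes
import urllib.request
import uuid


def encode_multipart(files: dict[str, bytes]) -> tuple[bytes, str]:
    """Encode files as multipart/form-data and return (body, content_type).

    Keys are form field names such as "index.html" or "sub-path/image.png".
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, content in files.items():
        mime = mimetypes.guess_type(name)[0] or "application/octet-stream"
        filename = name.rsplit("/", 1)[-1]
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            f"Content-Type: {mime}\r\n\r\n"
        )
        parts.append(header.encode("utf-8") + content + b"\r\n")
    # The closing delimiter ends the multipart body.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_multipart_request(base_url: str, files: dict[str, bytes]) -> urllib.request.Request:
    """Build a POST request to /generate carrying index.html and its assets."""
    body, content_type = encode_multipart(files)
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/generate",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )


# Usage, assuming the files exist locally:
#   files = {
#       "index.html": open("index.html", "rb").read(),
#       "image.png": open("image.png", "rb").read(),
#   }
#   request = build_multipart_request("https://pdf.example.com", files)
#   pdf = urllib.request.urlopen(request).read()
```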
The docker image is tagged as mormahr/pdf-service.
We follow semver as well as possible, including visual changes when we detect them.
As such, we also tag release versions like :1.1.0. We support semver major (:1) or minor (:1.1) tags that use the latest minor or patch release version.
Images of the current development version are continuously pushed to the :edge tag.
We strongly recommend that you use a release version instead of :edge.
The service code is licensed under the MIT license. WeasyPrint, the underlying PDF generator library, is licensed under the BSD license. The prebuilt production container image contains a variety of licenses, including GPLv2 and GPLv3 code.
Allowing untrusted HTML input is not recommended. Use trusted HTML templates and sanitize user inputs.
Fetching of external assets is prohibited as of now. You can add internal assets with the multipart API.
If your instance is exposed publicly, I recommend using a reverse proxy to terminate TLS connections and require authentication. You could use HTTP Basic Auth and then pass the pdf-service URL to your client software via an environment variable. This way auth information can be embedded like this: https://API_USER:API_TOKEN@pdf.example.com/generate, where API_USER and API_TOKEN are the credentials you set up in the reverse proxy.
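Not every HTTP client sends credentials embedded in a URL automatically; Python's urllib, for one, does not. The sketch below splits the credentials out of the URL and turns them into a Basic Auth header. The function name request_from_service_url and the environment variable name PDF_SERVICE_URL are assumptions made for this example.

```python
import base64
import urllib.request
from urllib.parse import unquote, urlsplit, urlunsplit


def request_from_service_url(service_url: str, html: str) -> urllib.request.Request:
    """Build a POST request from a URL that may embed user:password credentials."""
    parts = urlsplit(service_url)
    # Drop the "user:password@" prefix from the network location.
    netloc = parts.netloc.rpartition("@")[2]
    clean_url = urlunsplit((parts.scheme, netloc, parts.path, parts.query, ""))

    headers = {"Content-Type": "text/html"}
    if parts.username is not None:
        credentials = f"{unquote(parts.username)}:{unquote(parts.password or '')}"
        token = base64.b64encode(credentials.encode("utf-8")).decode("ascii")
        headers["Authorization"] = f"Basic {token}"

    return urllib.request.Request(
        clean_url, data=html.encode("utf-8"), headers=headers, method="POST"
    )


# Usage, assuming PDF_SERVICE_URL (a hypothetical variable name) holds the full URL:
#   import os
#   request = request_from_service_url(os.environ["PDF_SERVICE_URL"], "<p>Hello World!</p>")
#   pdf = urllib.request.urlopen(request).read()
```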
- WORKER_COUNT (default: 4): Sets the worker pool size of the gunicorn server executing pdf_service.
- HOST: If the hostname isn't set on the container, pass it as an environment variable to identify the service in Sentry.
- SENTRY_DSN: Enable the Sentry integration and use this DSN to submit data.
- SENTRY_TRACES_SAMPLE_RATE (0.0...1.0): If the Sentry integration is enabled, this controls the tracing sample rate. It defaults to 1.0. Set it to 0.0 to disable tracing.
- SENTRY_ENVIRONMENT: This sets the environment sent to Sentry. Defaults to development.
- SENTRY_RELEASE: This sets the release sent to Sentry. We set this to the current git SHA and you normally shouldn't need to override it.
- SENTRY_TAG_*: Set a tag to a specific value for all transactions. For example, to set the tag test to abc, set the environment variable SENTRY_TAG_TEST=abc.
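To illustrate the SENTRY_TAG_* naming convention, here is a sketch of how prefixed variables could be mapped to tag names. It is not the service's actual code: the function name is hypothetical, and lowercasing the tag name is inferred only from the SENTRY_TAG_TEST=abc example above.

```python
import os
from typing import Mapping

PREFIX = "SENTRY_TAG_"


def sentry_tags_from_env(environ: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Collect SENTRY_TAG_* variables as {lowercased tag name: value}.

    Assumption: the tag name is the part after the prefix, lowercased.
    """
    return {
        key[len(PREFIX):].lower(): value
        for key, value in environ.items()
        # Skip a bare "SENTRY_TAG_" with no tag name after the prefix.
        if key.startswith(PREFIX) and len(key) > len(PREFIX)
    }
```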
The service has a /health endpoint that will respond with a 200 status code if the service is running. This endpoint is also configured as a docker HEALTHCHECK.
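A client or deployment script can poll this endpoint before sending work. The following is a minimal sketch using the Python standard library; is_healthy is a hypothetical helper name, and treating every network error as "unhealthy" is a choice made for this example.

```python
import urllib.error
import urllib.request


def is_healthy(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if GET /health answers with status 200, False otherwise."""
    try:
        with urllib.request.urlopen(base_url.rstrip("/") + "/health", timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


# Usage, assuming the container from the quick start is running:
#   is_healthy("http://localhost:8080")
```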
The docker image supports the linux/amd64 (regular Intel and AMD 64-bit processors on x86_64) and linux/arm64 (Apple Silicon, AWS Graviton, etc.) architectures.
Image sizes and other information that vary between architectures are taken from the linux/amd64 variant.
If you need a different architecture, please open an issue with your use-case.
Native Windows docker images are not supported. The linux image can be run on Windows using Docker Desktop.
- Set up a python venv: pip install -r requirements.txt -r requirements-dev.txt (or: pip install -e '.[dev]')
- Install docker and docker-compose to run tests. Tests run in docker to ensure render output doesn't differ based on platform.
- Run the development server with python -m pdf_service
- Run tests with ./test or ./test-watch
  - Tests are executed within docker to ensure render results are identical to the containerized version. The image contains external dependencies, but code and test files will be mounted from the project source. If you want to rebuild the dev image, add --build to the end of the command. This will instruct docker-compose to rebuild the image.
e2e/data contains reference inputs *.html and corresponding output .png.
The e2e test will render the HTML files and compare the output with the reference images to ensure no changes slipped in.
To update reference images or add new test cases, run ./regenerate-e2e-references.