Comparison of PDF Creation Options from a Seasoned Developer


In this post, Dmytro Sichkar, a Senior Backend Engineer & Tech Lead at Django Stars, explores the practical side of generating PDF documents for various web service tasks. He draws on his hands-on experience with three commonly used PDF generation tools (Puppeteer, React-PDF, and wkhtmltopdf) and covers their strengths, limitations, features, and suitable applications.

This can serve as a valuable guide for developers seeking the most appropriate PDF generation tool for their specific project needs.


Generating PDFs is a task that developers encounter at nearly every turn, whether for reports, invoices, web page screenshots, ebooks, or other documents. In this post, I'll share practical experience with some tools that handle this task efficiently, to help you make informed choices based on your industry, goals, and tech stack.

So, as stories like this often begin, we stand before several doors, each leading to a different solution to the indicated problem. Let's look at three of them:

  1. Puppeteer, or its counterpart Pyppeteer for true Python developers, usually behind a web framework such as Express.js or FastAPI (which is not necessary when running on AWS Lambda)
  2. React-PDF + Express.js or some other backend framework (or, once again, AWS Lambda)
  3. Wkhtmltopdf, which I wouldn't recommend for real projects because of its poor support for new JS features (you might have to rewrite everything for one of the two previous options later)

Now, let's look at each option in more detail.

Puppeteer: Pulling Too Many Strings

This Node.js library provides a high-level API for controlling headless Chrome or Chromium. It simplifies browser automation tasks and can serve as an alternative to Selenium. Puppeteer lets you open Headless Chrome and then print a PDF file from the page, just as you would in a regular browser.
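As a rough illustration of that flow, here is a minimal sketch using Pyppeteer, the Python port mentioned earlier. It assumes `pyppeteer` is installed and can find or download Chromium. The helper names `build_pdf_options` and `render_url_to_pdf`, the default values, and any URL you pass in are placeholders for illustration, not code from the projects described in this post.

```python
import asyncio
from typing import Any, Dict


def build_pdf_options(path: str, page_format: str = "A4") -> Dict[str, Any]:
    """Options passed to page.pdf(); the values are illustrative defaults."""
    return {
        "path": path,             # where the PDF file is written
        "format": page_format,    # paper size, e.g. "A4" or "Letter"
        "printBackground": True,  # keep CSS backgrounds in the output
    }


async def render_url_to_pdf(url: str, path: str) -> bytes:
    """Open the page in headless Chromium and print it to PDF."""
    # Imported lazily so build_pdf_options works without pyppeteer installed.
    from pyppeteer import launch

    browser = await launch(headless=True)
    try:
        page = await browser.newPage()
        # Wait until the network is idle so client-side rendering has finished.
        await page.goto(url, {"waitUntil": "networkidle0"})
        return await page.pdf(build_pdf_options(path))
    finally:
        # Always close the browser, even if navigation or printing fails.
        await browser.close()


# Example call (placeholder URL and path):
# asyncio.run(render_url_to_pdf("https://example.com", "example.pdf"))
```

The Node.js version follows the same steps: launch, open a page, navigate, and print.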

Deployment and development:

Some cloud providers, e.g., GCP, install Puppeteer directly into their runtimes (but only the JS ones), which makes the service easy to set up. However, local execution has its nuances. (For example, you have to install the Functions Framework to run your GCP functions locally.)

It's also possible to use Puppeteer with Docker, but it's more complex to set up. Besides, Headless Chrome (for some reason) requires additional capabilities to work. Plus, a container size of up to 100MB certainly doesn't make this approach suitable for all.

How it works:

Typically, after receiving a URL to generate a PDF, Puppeteer makes requests to the main system (represented as a monolith in my examples), which serves the page and templates. The problem with this approach lies in the number of requests needed to generate the document.

Overall it works, but it adds a lot of overhead and leads to significant latency. Additionally, it requires two threads for each user.

Pros:

  • It's not limited to React: it can print any page the browser can render, which gives you a wide choice of ways to build the PDF.

Cons:

  • Large container size, due to Headless Chrome.
  • High latency, due to multiple factors (Headless Chrome, a high number of requests that take longer outside of the monolith, etc.).
  • Medium service complexity.

React-PDF: Baking in a Special Furnace

React-PDF is a JavaScript library that allows developers to generate PDF documents with dynamic content, such as text, images, and charts, directly from React components. Instead of using a browser, React-PDF generates PDFs using Node.js magic under the hood.


NOTE:

  • Despite its name, React-PDF is not limited to React projects, because it does not generate PDFs from your application or any of its components. It uses React exclusively to describe the "layout" of the PDF, which makes it possible to generate nice PDFs even for a multi-technology project, without the HTML/CSS rendering that options 1 and 3 rely on.


Deployment and development:

It's possible to run React-PDF on AWS Lambda or in Docker (as a regular Express.js service) with no additional capabilities required. The container size depends mostly on the complexity of the templates, while the service itself weighs very little.

How it works:

In this case, the templates are stored within the React-PDF service. You only need to provide the data to be rendered in the templates. While this is also possible in the previous case, I haven't done it that way because it wouldn't reduce the container size or remove the need for extra capabilities.
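From the caller's side, this means the main system sends only JSON and receives PDF bytes back. Below is a minimal Python sketch of such a call using only the standard library. The endpoint URL, the `template` and `data` field names, and the function names are all hypothetical; the real contract is whatever your React-PDF service defines.

```python
import json
import urllib.request
from typing import Any, Dict

# Hypothetical address of the internal React-PDF rendering service.
PDF_SERVICE_URL = "http://pdf-service.internal/render"


def build_render_request(template: str, data: Dict[str, Any]) -> urllib.request.Request:
    """Build a POST request carrying only the template name and its data."""
    body = json.dumps({"template": template, "data": data}).encode("utf-8")
    return urllib.request.Request(
        PDF_SERVICE_URL,
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/pdf"},
        method="POST",
    )


def fetch_pdf(template: str, data: Dict[str, Any], timeout: float = 30.0) -> bytes:
    """Send the request and return the rendered PDF as bytes."""
    request = build_render_request(template, data)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

A single request per document replaces the back-and-forth page and template requests of the browser-based approach.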

Pros:

  • Relatively small container size.
  • Low service complexity.
  • Very fast rendering.

Cons:

  • Works only with React templates (there may be some alternatives for other frameworks, but it might not be as straightforward to build).

Wkhtmltopdf: Walking a Fine Line

Wkhtmltopdf is a Command Line Interface (CLI) tool used to convert web pages and HTML documents into PDF format. It is compatible with multiple platforms and generates PDF files from HTML, CSS, and images. Although its capabilities are broadly comparable to Puppeteer's, wkhtmltopdf has limited support and relies on an outdated Qt WebKit rendering engine.
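Because it is a CLI tool, wkhtmltopdf can be driven from a backend with a plain subprocess call. The sketch below assumes the `wkhtmltopdf` binary is available on the PATH; the function names and the default page size and margins are my own illustrative choices. It passes `-` for both input and output, so the HTML goes in on stdin and the PDF comes back on stdout.

```python
import subprocess
from typing import List


def build_wkhtmltopdf_command(page_size: str = "A4", margin: str = "10mm") -> List[str]:
    """Command line that reads HTML from stdin and writes the PDF to stdout."""
    return [
        "wkhtmltopdf",
        "--quiet",              # suppress progress output
        "--encoding", "utf-8",  # treat the input HTML as UTF-8
        "--page-size", page_size,
        "--margin-top", margin,
        "--margin-bottom", margin,
        "-",                    # input: stdin
        "-",                    # output: stdout
    ]


def html_to_pdf(html: str) -> bytes:
    """Convert an HTML string to PDF bytes; raises CalledProcessError on failure."""
    result = subprocess.run(
        build_wkhtmltopdf_command(),
        input=html.encode("utf-8"),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,
    )
    return result.stdout
```

Wrapper libraries typically hide a call like this behind a friendlier API, but the binary still has to be installed alongside your backend.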


NOTE:

  • I'd like to add that wkhtmltopdf is not bad at all. It performs quite well with straightforward tasks like creating invoices or in situations where advanced data processing is not needed. For instance, it has been in use for over 10 years in one of our projects.


Pros:

  • It can be installed as a backend dependency (there is a Python wrapper available).
  • It works with any HTML, CSS, and JS, with some exceptions: when new features appear in these technologies, wkhtmltopdf may take years to support them.

Cons:

  • Limited support.
  • Generated files are 2-3 times larger than those produced by the other solutions.
  • Highly sensitive to JavaScript, with minimal support for new standards.

Takeaways

Over the past 3 years, I have walked through each of the above 3 doors. If you ask me, I prefer the first two. Yet, the choice depends on the specific tasks.

On a project whose PDF templates are written in AngularJS, Django templates, or Bootstrap + jQuery, React-PDF is unlikely to be suitable as a universal solution. Most likely, moving these templates to a separate service (to send only the data for rendering) won't be considered. In this case, go through the first door. For example, we use this solution at a large mortgage platform, where switching from wkhtmltopdf to Puppeteer solved a lot of problems with PDF generation.

However, for a new project using React in the frontend, I would definitely consider a solution with React-PDF. For example, this is exactly what we use in a Dutch e-commerce project, and we are quite pleased with the result. Thanks to Mikhailo Kokadii for his recommendation of this library, because my first pick was Puppeteer.

I hope this experience will be valuable to everyone encountering similar tasks in their projects.




