Comparison of PDF Creation Options from a Seasoned Developer
In this post, Dmytro Sichkar, a Senior Backend Engineer & Tech Lead at Django Stars, explores the practical side of generating PDF documents for web services. He draws on hands-on experience with three commonly used PDF generation tools (Puppeteer, React-PDF, and wkhtmltopdf) and covers their strengths, limitations, features, and suitable applications.
This can serve as a valuable guide for developers seeking the most appropriate PDF generation tool for their specific project needs.
Generating PDFs is a task that developers encounter at nearly every turn, whether for reports, invoices, web page screenshots, ebooks, or other documents. In this post, I'll share practical experience with some tools that handle this task efficiently, to help you make informed choices based on your industry, goals, and tech stack.
So, as stories like this often begin, we stand before several doors, each leading to a different solution to the problem. Let's look at three of them: Puppeteer, React-PDF, and wkhtmltopdf.
Now, let's look at each option in more detail.
Puppeteer: Pulling Too Many Strings
Puppeteer is a Node.js library that provides a high-level API for controlling headless Chrome or Chromium. It simplifies browser automation tasks and can serve as an alternative to Selenium. With Puppeteer, you open a page in headless Chrome and then print it to a PDF file, just as you would in a regular browser.
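As a rough illustration of the "open the page, then print it" flow, here is a minimal sketch. It assumes the `puppeteer` npm package is installed. The function names `buildPdfOptions` and `renderUrlToPdf` and the chosen margins are made up for this example and are not taken from the projects described in this post.

```javascript
// Sketch only: assumes `npm install puppeteer`. All names are illustrative.

// Build the options object passed to page.pdf(). Kept pure so it is easy to test.
function buildPdfOptions(overrides = {}) {
  return {
    format: "A4",
    printBackground: true, // keep CSS backgrounds, which printing drops by default
    margin: { top: "20mm", right: "15mm", bottom: "20mm", left: "15mm" },
    ...overrides, // caller-supplied values win over the defaults
  };
}

// Open the page in headless Chrome and resolve with the PDF bytes.
async function renderUrlToPdf(url, overrides = {}) {
  // Loaded lazily so buildPdfOptions() can be used without Puppeteer installed.
  const puppeteer = require("puppeteer");
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Wait until network activity settles so late-loading assets get rendered.
    await page.goto(url, { waitUntil: "networkidle0" });
    return await page.pdf(buildPdfOptions(overrides));
  } finally {
    await browser.close(); // always release the browser process
  }
}
```

A call such as `renderUrlToPdf("https://example.com/report", { landscape: true })` would resolve with the PDF bytes, which can then be written to disk or sent in an HTTP response.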
Deploy and development:
Some cloud providers, such as GCP, make Puppeteer available directly in their runtimes (but only the JavaScript ones), which keeps the service easy to deploy. Local execution has its own nuances, though. For example, you have to install the Functions Framework to run your GCP functions locally.
It's also possible to run Puppeteer in Docker, but the setup is more complex. Headless Chrome (for some reason) requires additional container capabilities to work. Plus, a container size of up to 100MB makes this approach unsuitable for some projects.
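The extra capabilities are most likely tied to Chrome's sandbox. Puppeteer's Docker guidance describes two routes: grant the container the `SYS_ADMIN` capability, or launch Chrome with the sandbox disabled. The sketch below picks launch options accordingly. The environment variable `RUNNING_WITH_SYS_ADMIN` and both function names are hypothetical, invented for this example.

```javascript
// Sketch only: assumes `npm install puppeteer`.
// RUNNING_WITH_SYS_ADMIN is a hypothetical variable you would set yourself
// when starting the container with `docker run --cap-add=SYS_ADMIN ...`.
function buildLaunchOptions(env = process.env) {
  const hasSysAdmin = env.RUNNING_WITH_SYS_ADMIN === "1";
  return {
    headless: true,
    // Without the capability the sandbox typically cannot start, so it is
    // switched off. Only do this when the service renders trusted pages.
    args: hasSysAdmin ? [] : ["--no-sandbox", "--disable-setuid-sandbox"],
  };
}

async function launchInContainer() {
  // Loaded lazily so buildLaunchOptions() can be used without Puppeteer installed.
  const puppeteer = require("puppeteer");
  return puppeteer.launch(buildLaunchOptions());
}
```

Disabling the sandbox trades isolation for convenience, so it fits best when the service only renders pages from your own system.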
How it works:
Typically, after receiving a URL to generate a PDF, Puppeteer makes requests to the main system (represented as a monolith in my examples), which hosts the pages and templates. The problem with this approach is the number of requests needed to generate a single document.
Overall it works, but it creates a lot of overhead, which leads to significant latency. Additionally, it requires two threads for each user.
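The flow above can be sketched as a small HTTP service that receives a page path, resolves it against the monolith, and hands the URL to a renderer. This is a sketch under assumptions, not the setup used in the projects mentioned here: the origin `https://monolith.example.com`, the `/pdf?path=...` route, and the function names are all hypothetical.

```javascript
const http = require("node:http");

// Hypothetical address of the main system that hosts pages and templates.
// Must be a bare origin with no trailing slash.
const MONOLITH_ORIGIN = "https://monolith.example.com";

// Turn the incoming request URL into the monolith page Chrome should open.
// Returns null when `path` is missing or points outside the monolith.
function resolveTargetUrl(requestUrl, monolithOrigin = MONOLITH_ORIGIN) {
  const incoming = new URL(requestUrl, "http://localhost");
  const path = incoming.searchParams.get("path");
  if (!path) return null;
  const target = new URL(path, monolithOrigin);
  return target.origin === monolithOrigin ? target.href : null;
}

// `render` is any async function (url) => PDF bytes, e.g. a Puppeteer wrapper.
function createPdfServer(render) {
  return http.createServer(async (req, res) => {
    const target = resolveTargetUrl(req.url);
    if (!target) {
      res.writeHead(400).end("missing or invalid path");
      return;
    }
    try {
      // From here on the browser calls back into the monolith for the page
      // and everything the page loads, which is where the overhead comes from.
      const pdf = await render(target);
      res.writeHead(200, { "Content-Type": "application/pdf" }).end(pdf);
    } catch {
      res.writeHead(502).end("rendering failed");
    }
  });
}
```

Passing the renderer in as a parameter keeps the HTTP layer independent of Puppeteer, so the same server could front a different rendering backend.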
Pros:
Cons:
React-PDF: Baking in a Special Furnace
React-PDF is a JavaScript library that allows developers to generate PDF documents with dynamic content, such as text, images, and charts, directly from React components. Instead of using a browser, React-PDF generates PDFs using Node.js magic under the hood.
Deploy and development:
It's possible to combine React-PDF with AWS Lambda and Docker (as a regular Express.js service) with no additional capabilities required. The container size depends mostly on the complexity of the templates, while the service itself weighs very little.
How it works:
In this case, the templates are stored within the React-PDF service, so you only need to send the data to be rendered. The same is possible with Puppeteer, but I haven't done it that way because it wouldn't reduce the container size or remove the need for extra capabilities.
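To make "send only the data" concrete, here is a minimal sketch of a template that lives in the service and is rendered from a plain object. It assumes the `react` and `@react-pdf/renderer` npm packages. The invoice shape and the names `summarizeInvoice` and `renderInvoice` are hypothetical. `React.createElement` stands in for JSX so that the example runs without a build step.

```javascript
// Sketch only: assumes `npm install react @react-pdf/renderer`.
// The invoice shape { number, customer, items: [{ name, price }] } is made up.

// Pure helper: compute the values the template displays.
function summarizeInvoice(invoice) {
  const total = invoice.items.reduce((sum, item) => sum + item.price, 0);
  return { title: `Invoice ${invoice.number}`, total: total.toFixed(2) };
}

// Render the template stored in the PDF service from plain data.
async function renderInvoice(invoice) {
  // Dynamic import() works whether a package ships CommonJS or ES modules.
  const React = (await import("react")).default;
  const { Document, Page, Text, View, renderToStream } =
    await import("@react-pdf/renderer");
  const h = React.createElement; // stands in for JSX
  const { title, total } = summarizeInvoice(invoice);

  const doc = h(Document, null,
    h(Page, { size: "A4", style: { padding: 40 } },
      h(Text, { style: { fontSize: 20, marginBottom: 12 } }, title),
      h(Text, null, `Customer: ${invoice.customer}`),
      h(View, { style: { marginTop: 12 } },
        ...invoice.items.map((item) =>
          h(Text, null, `${item.name}: ${item.price.toFixed(2)}`))),
      h(Text, { style: { marginTop: 12 } }, `Total: ${total}`)));

  return renderToStream(doc); // resolves with a Node.js readable stream
}
```

In an Express.js service, the resolved stream could be piped straight into the response, so the caller never sees the template, only the resulting PDF.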
Pros:
Cons:
Wkhtmltopdf: Walking a Fine Line
Wkhtmltopdf is a command-line interface (CLI) tool that converts web pages and HTML documents to PDF. It runs on multiple platforms and generates PDF files from HTML, CSS, and images. Although its capabilities are broadly comparable to Puppeteer's, wkhtmltopdf has limited support and relies on an outdated Qt WebKit rendering engine.
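Because wkhtmltopdf is a CLI tool, a backend usually calls it as a child process. The sketch below shows one way to do that from Node.js, assuming the `wkhtmltopdf` binary is installed and on `PATH`. The function names and the choice of options are illustrative only.

```javascript
const { execFile } = require("node:child_process");
const { promisify } = require("node:util");

const execFileAsync = promisify(execFile);

// Build the argument list: options first, then the input, then the output file.
function buildWkhtmltopdfArgs(input, output, { pageSize = "A4", landscape = false } = {}) {
  const args = ["--quiet", "--page-size", pageSize];
  if (landscape) args.push("--orientation", "Landscape");
  args.push(input, output);
  return args;
}

// Convert an HTML file or URL to a PDF file and resolve with the output path.
// Assumes the `wkhtmltopdf` binary is installed and available on PATH.
async function htmlToPdf(input, output, options) {
  // execFile passes arguments as an array, so no shell quoting is involved.
  await execFileAsync("wkhtmltopdf", buildWkhtmltopdfArgs(input, output, options));
  return output;
}
```

For example, `htmlToPdf("https://example.com", "page.pdf", { landscape: true })` would run the equivalent of `wkhtmltopdf --quiet --page-size A4 --orientation Landscape https://example.com page.pdf`.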
Pros:
Cons:
Takeaways
Over the past three years, I have walked through each of the three doors above. If you ask me, I prefer the first two, but the choice depends on the specific task.
On a project whose PDF templates are written in AngularJS, Django templates, or Bootstrap + jQuery, React-PDF is unlikely to be suitable as a universal solution. Moving those templates to a separate service (so that only the data is sent for rendering) most likely won't be an option. In this case, go through the first door: Puppeteer. For example, we use this solution on a large mortgage platform, where switching from wkhtmltopdf to Puppeteer solved a lot of problems with PDF generation.
However, for a new project that uses React on the frontend, I would definitely consider a solution with React-PDF. This is exactly what we use in a Dutch e-commerce project, and we are quite pleased with the result. Thanks to Mikhailo Kokadii for recommending this library, because my first pick was Puppeteer.
I hope this experience will be valuable to everyone encountering similar tasks in their projects.