Module pdf
ballerina/pdf Ballerina library
The ballerina/pdf module provides functionality to convert HTML content to PDF documents and read data from existing PDFs. It processes HTML strings — including full documents, fragments, and messy real-world markup — and produces PDF byte arrays suitable for writing to files or sending over the network.
All processing is done locally with no external service dependencies.
Quickstart
To use the pdf connector in your Ballerina application, modify the .bal file as follows:
Step 1: Import the module
Import the pdf module.
import ballerina/pdf;
Step 2: Invoke module functions
Extract text from a PDF
byte[] pdfBytes = check io:fileReadBytes("document.pdf"); string[] pages = check pdf:extractText(pdfBytes); foreach int i in 0 ..< pages.length() { io:println("Page ", i + 1, ": ", pages[i]); }
Convert PDF pages to images
byte[] pdfBytes = check io:fileReadBytes("document.pdf"); string[] base64Images = check pdf:toImages(pdfBytes); // Each element is a Base64-encoded PNG string (one per page)
Convert html to pdf
Convert an HTML string to a PDF document.
byte[] pdfBytes = check pdf:parseHtml("<h1>Hello World</h1><p>Generated with Ballerina.</p>"); check io:fileWriteBytes("output.pdf", pdfBytes);
Convert with custom options
string html = check io:fileReadString("report.html"); byte[] pdfBytes = check pdf:parseHtml(html, fallbackFontSize = 10.0, pageSize = pdf:LETTER, margins = {top: 72, right: 54, bottom: 72, left: 54}, additionalCss = "body { font-family: sans-serif; } .container { width: 100% !important; }" ); check io:fileWriteBytes("report.pdf", pdfBytes);
Step 3: Run the Ballerina application
bal run
Examples
The pdf module provides practical examples illustrating usage in various scenarios. Explore these examples.
- HTML to PDF conversion — Reads an HTML report file and converts it to PDF.
Known limitations (HTML-to-PDF conversion)
The parseHtml function uses a custom HTML/CSS renderer that supports CSS 2.1 core layout (block, inline, float, table, absolute/relative positioning) but has gaps compared to browser rendering. Key limitations:
- Layout: No flexbox, CSS Grid, or multi-column layout. No
position: fixedorposition: sticky. - Tables: No
rowspan, no<caption>, notable-layout: fixedalgorithm. - Text: No
text-align: justify, no hyphenation, notext-indent, notext-overflow: ellipsis. - CSS features: No
::before/::afterpseudo-elements, no CSS counters, nocalc(), no custom properties (var()), no@import, no@mediaqueries (print/all media types are supported). - Visual: No CSS gradients, no
text-shadow, no CSS transforms, no SVG rendering. Onlysolidborder style is supported. - Fonts: No
@font-face(use thecustomFontsoption instead). No OpenType features. Bundled fonts: Liberation Sans and Liberation Serif (metrically compatible with Arial and Times New Roman). - Page control: No
page-break-inside: avoid, no orphans/widows control, no@pagemargin boxes.
Import
import ballerina/pdf;Other versions
0.9.0