{"slug": "pdf-make-pdf-generation-extraction-and-modification", "title": "PDF::Make - PDF Generation, Extraction and Modification.", "summary": "A developer built PDF::Make, a Perl toolkit for generating, extracting, and modifying PDF files. The toolkit provides low-level PDF object manipulation and a high-level builder API, and supports post-processing existing PDFs by extracting structured text and drawing annotations. A practical example demonstrates creating a PDF and then highlighting matched terms using the extract_structured method.", "body_md": "I’ve always been fascinated by PDFs. They look simple on the surface, just a document you can open anywhere, but underneath they’re a full layout engine, object graph, drawing model, and archival format all at once. I enjoy that mix of precision and complexity, and that is exactly what led me to build `PDF::Make` (and yes, I had some help from Claude LLM). I wanted a fully featured toolkit that could both generate PDFs and let me inspect and edit them programmatically.\n\nAt the low level, `PDF::Make` exposes the raw building blocks of the format: PDF objects, pages, the drawing canvas, a parser/reader, and import/merge primitives. This is the layer you reach for when you need fine-grained control or want to work with the structure of a document directly.\n\nFor everyday document creation, `PDF::Make::Builder` sits on top of that foundation and provides a higher-level API. It handles the boilerplate of page setup, fonts, text flow, and layout so you can produce a polished PDF in just a few lines of Perl.\n\nThe same toolkit is also designed for post-processing. 
You can open an existing PDF, extract structured text along with its coordinates, and then draw annotations or overlays back onto the page, making it straightforward to build review, QA, or markup workflows on top of documents you didn’t originally generate.\n\nThis post shows a practical two-step flow: first create a source PDF with `PDF::Make::Builder`, then re-open it and highlight matched terms.\n\nScript:\n\n``` perl\n#!/usr/bin/perl\nuse strict;\nuse warnings;\n\nuse PDF::Make::Builder;\n\n# Create the document and set the default body font\nmy $pdf = PDF::Make::Builder->new(\n    file_name => 'source_demo.pdf',\n    configure => {\n        text => {\n            font => { family => 'Helvetica', size => 12, colour => '#222222' },\n        },\n    },\n);\n\n# One Letter page: a heading plus a few lines containing the target terms\n$pdf->add_page(page_size => 'Letter')\n    ->add_h1(text => 'PDF::Make blog demo')\n    ->add_text(text => 'PDF::Make builds and edits PDF files directly from Perl.')\n    ->add_text(text => 'In the next step we extract text coordinates and highlight matches.')\n    ->add_text(text => 'Target terms: PDF::Make, extract_structured, highlight.')\n    ->add_text(text => 'This line repeats PDF::Make so multiple boxes are drawn around matches.')\n    ->save;\n\nprint \"Created source_demo.pdf\\n\";\n```\n\nThat gives us a baseline document to post-process.\n\nNow we re-open that document, call `extract_structured` page by page, and draw a red box around every matching word:\n\n``` perl\n#!/usr/bin/perl\nuse strict;\nuse warnings;\nuse PDF::Make::Builder;\n\n# Input PDF, output PDF, and the pattern to highlight (all optional)\nmy $in  = $ARGV[0] // 'source_demo.pdf';\nmy $out = $ARGV[1] // 'source_demo_highlighted.pdf';\nmy $re  = $ARGV[2] // 'PDF::Make';\n\nmy $b = PDF::Make::Builder->open_existing($in, file_name => $out);\nmy $pad = 1.5;\nmy $page_count = $b->page_count;\nfor my $idx (0 .. 
$page_count - 1) {\n    # extract_structured is called with the 0-based $idx, open_page with $idx + 1\n    my $res = $b->extract_structured($in, page => $idx, invisible => 1);\n    my $blocks = $res->data || [];\n\n    $b->open_page($idx + 1);\n    my $canvas = $b->page->canvas;\n\n    # Walk blocks -> lines -> words and test each word against the pattern\n    for my $block (@$blocks) {\n        my $lines = $block->{lines} || [];\n        for my $line (@$lines) {\n            my $words = $line->{words} || [];\n            for my $w (@$words) {\n                my $text = $w->{text} // '';\n                next unless $text =~ /$re/i;\n\n                # Word bounding box, grown by $pad on every side\n                my ($x0, $y0, $x1, $y1) = @{$w}{qw/x0 y0 x1 y1/};\n                my $rx  = $x0 - $pad;\n                my $ry  = $y0 - $pad;\n                my $rw  = ($x1 - $x0) + (2 * $pad);\n                my $rh  = ($y1 - $y0) + (2 * $pad);\n                # border highlight (red stroke, no fill)\n                $canvas->q->w(0.8)->RG(1, 0, 0)->re($rx, $ry, $rw, $rh)->S->Q;\n            }\n        }\n    }\n}\n\n$b->save;\nprint \"Created $out\\n\";\n```\n\nI would recommend reading the full documentation on CPAN to get the most out of the toolkit: [PDF::Make](https://metacpan.org/pod/PDF::Make) and `PDF::Make::Builder`.", "url": "https://wpnews.pro/news/pdf-make-pdf-generation-extraction-and-modification", "canonical_source": "https://dev.to/lnation/pdfmake-pdf-generation-extraction-and-modification-4696", "published_at": "2026-06-28 09:39:01+00:00", "updated_at": "2026-06-28 10:04:11.766237+00:00", "lang": "en", "topics": ["developer-tools", "natural-language-processing"], "entities": ["PDF::Make", "PDF::Make::Builder", "CPAN", "Helvetica", "Claude LLM"], "alternates": {"html": "https://wpnews.pro/news/pdf-make-pdf-generation-extraction-and-modification", "markdown": "https://wpnews.pro/news/pdf-make-pdf-generation-extraction-and-modification.md", "text": "https://wpnews.pro/news/pdf-make-pdf-generation-extraction-and-modification.txt", "jsonld": "https://wpnews.pro/news/pdf-make-pdf-generation-extraction-and-modification.jsonld"}}