Useful commands for manipulating PDFs on Fedora (any Linux really)

So it’s not much of a secret that the Red Hat expenses system is truly terrible. Not as well known is that the EMEA accounts team still requires what I call “Arts and Crafts sessions” (all receipts attached to bits of paper and scanned as a whole), even though there’s no longer any legal requirement for paper receipts to be provided in the UK/Ireland/EU!

Anyway, the system regularly routes the emailed PDFs to /dev/null for no apparent reason, and then you have to scratch your head and try to work out what’s wrong.

Size: this is the regular issue. Basically, if the PDF is larger than a few MB it barfs. Thankfully Ghostscript comes to the rescue here.

gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen -sOutputFile=new_file.pdf original_file.pdf

The -dPDFSETTINGS setting has a few options:

  • /screen selects low-resolution output and hence the lowest file-size.
  • /ebook selects medium-resolution output with a medium file-size.
  • /printer selects high-resolution output, mainly intended for printing PDFs.
  • /prepress is similar to /printer but gives you the largest files.
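If you don't know in advance which preset gets you under the limit, one approach is to try them from best quality downwards and stop at the first one that fits. This is only a sketch: shrink_pdf is a made-up helper name, and the 2000000 byte default is my guess at "a few MB", not a documented limit of the expenses system.

```shell
#!/bin/sh
# shrink_pdf SRC DST [LIMIT_BYTES]
# Hypothetical helper: re-write SRC with Ghostscript, starting with the
# highest-quality preset and stepping down until DST fits under LIMIT_BYTES.
# The default limit is an assumption, adjust it to whatever your system accepts.
shrink_pdf() {
    src=$1
    dst=$2
    limit=${3:-2000000}
    for preset in printer ebook screen; do
        gs -dNOPAUSE -dBATCH -dQUIET -sDEVICE=pdfwrite \
           -dCompatibilityLevel=1.4 "-dPDFSETTINGS=/$preset" \
           "-sOutputFile=$dst" "$src" || return 1
        # Arithmetic expansion strips any padding wc adds around the number.
        size=$(($(wc -c < "$dst")))
        if [ "$size" -le "$limit" ]; then
            echo "$preset: $size bytes"
            return 0
        fi
    done
    echo "still $size bytes even with /screen" >&2
    return 1
}
```

Something like `shrink_pdf original_file.pdf new_file.pdf` would then leave the best-looking version that still fits in new_file.pdf, or complain if even /screen is too big.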

Too many pages: Airline bookings are the great ones here, they add pages of ads to a one-page receipt. The two options are either to “print” just the page you need, or to use oowriter (LibreOffice Writer) to open it, delete the pages and export as PDF again.
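A command-line route that isn't one of the two above, but should do the same job, is Ghostscript's -dFirstPage and -dLastPage switches, which limit the output to a page range. A minimal sketch, where keep_pages is a hypothetical wrapper name of my own:

```shell
#!/bin/sh
# keep_pages SRC DST FIRST [LAST]
# Hypothetical wrapper: copy only pages FIRST..LAST of SRC into DST.
# If LAST is omitted, a single page is kept.
keep_pages() {
    src=$1
    dst=$2
    first=$3
    last=${4:-$3}
    gs -dNOPAUSE -dBATCH -dQUIET -sDEVICE=pdfwrite \
       "-dFirstPage=$first" "-dLastPage=$last" \
       "-sOutputFile=$dst" "$src"
}
```

For example, `keep_pages booking.pdf receipt.pdf 1` would keep just the first page (the file names here are placeholders).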

Multiple PDFs: In theory the system can handle multiple docs. My mileage has varied a LOT here. An easy fix comes from the poppler-utils package:

pdfunite doc-1.pdf doc-2.pdf doc-3.pdf out-doc.pdf
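Because pdfunite treats its last argument as the output file, a stray glob can end up pointing the output at one of your inputs. A small wrapper that takes the output name first makes that harder to get wrong. This is a sketch only, and merge_pdfs is a hypothetical name, not something poppler-utils ships:

```shell
#!/bin/sh
# merge_pdfs OUTPUT.pdf INPUT1.pdf INPUT2.pdf [...]
# Hypothetical wrapper around pdfunite: output name comes first, at least
# two inputs are required, and an existing output file is never replaced.
merge_pdfs() {
    if [ "$#" -lt 3 ]; then
        echo "usage: merge_pdfs OUTPUT.pdf INPUT1.pdf INPUT2.pdf [...]" >&2
        return 2
    fi
    dst=$1
    shift
    if [ -e "$dst" ]; then
        echo "refusing to overwrite $dst" >&2
        return 1
    fi
    # Inputs are merged in the order given; pdfunite wants the output last.
    pdfunite "$@" "$dst"
}
```

With that in place, `merge_pdfs out-doc.pdf doc-*.pdf` merges the matching files in the shell's sort order.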

PDF versions: I have found 1.4 to be the most effective. Ghostscript comes to the rescue again:

gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen -dNOPAUSE -dQUIET -dBATCH -sOutputFile=output.pdf input.pdf

Adjust -dCompatibilityLevel to the version you need.
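To see which version a file claims to be before and after converting, you can read its header: a PDF normally starts with "%PDF-" followed by the version number. The sketch below relies on that, with pdf_version being a made-up helper name. It is a quick check rather than a definitive answer, since newer PDFs can override the header version inside the document; pdfinfo from poppler-utils also reports the version if you'd rather trust a real parser.

```shell
#!/bin/sh
# pdf_version FILE
# Hypothetical helper: print the version from the "%PDF-x.y" header at the
# very start of FILE. Prints nothing if the file does not start that way.
pdf_version() {
    head -c 8 "$1" | sed -n 's/^%PDF-\([0-9]\.[0-9]\).*/\1/p'
}
```

Running `pdf_version output.pdf` after the gs command above is a cheap way to confirm the conversion did what you asked.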