{"slug": "strip-pdf-metadata", "title": "Strip PDF Metadata", "summary": "The article describes a shell script (`clean_pdf.sh`) that recursively finds PDF files in a specified directory (or the current directory) and strips their top-level metadata using `exiftool` and `qpdf`. For each PDF, the script writes a stripped copy named with the suffix `_clean.pdf` next to the original and linearizes that copy in place; the original file is left untouched. It notes that only file-level metadata is removed, not metadata within embedded images. The code is provided as-is with no guarantees of safety or effectiveness.", "body_md": "clean_pdf.sh\n\n# --------------------------------------------------------------------\n\n# Recursively find PDFs from the directory given as the first argument,\n\n# otherwise search the current directory.\n\n# Use exiftool and qpdf (both must be installed and locatable on $PATH)\n\n# to strip all top-level metadata from PDFs.\n\n#\n\n# Note - This only removes file-level metadata, not any metadata\n\n# in embedded images, etc.\n\n#\n\n# Code is provided as-is, I take no responsibility for its use,\n\n# and I make no guarantee that this code works\n\n# or makes your PDFs \"safe,\" whatever that means to you.\n\n#\n\n# You may need to enable execution of this script before using,\n\n# e.g. chmod +x clean_pdf.sh\n\n#\n\n# example:\n\n# clean current directory:\n\n# >>> ./clean_pdf.sh\n\n#\n\n# clean specific directory:\n\n# >>> ./clean_pdf.sh some/other/directory\n\n# --------------------------------------------------------------------\n\n# Color Codes so that warnings/errors stick out\n\nGREEN=\"\\e[32m\"\n\nRED=\"\\e[31m\"\n\nCLEAR=\"\\e[0m\"\n\n# loop through all PDFs in first argument ($1),\n\n# or use '.' 
(this directory) if not given\n\nDIR=\"${1:-.}\"\n\necho \"Cleaning PDFs in directory $DIR\"\n\n# use find to locate files, pipe to while read to get the\n\n# whole line instead of space-delimited words\n\n# Note: this finds PDFs recursively!\n\n# \"$DIR\" is quoted so directory names containing spaces work\n\nfind \"$DIR\" -type f -name \"*.pdf\" | while IFS= read -r i\n\ndo\n\n  # output file as original filename with suffix _clean.pdf\n\n  TMP=\"${i%.*}_clean.pdf\"\n\n  # remove the temporary file if it already exists\n\n  if [ -f \"$TMP\" ]; then\n\n      rm \"$TMP\"\n\n  fi\n\n  # write a copy with all metadata tags cleared; the original is not modified\n\n  exiftool -q -q -all:all= \"$i\" -o \"$TMP\"\n\n  # rewrite the copy in place as a linearized PDF\n\n  qpdf --linearize --deterministic-id --replace-input \"$TMP\"\n\n  # pass filenames as printf arguments so spaces and % in names print literally\n\n  printf \"${GREEN}Processed ${RED}%s ${CLEAR}as ${GREEN}%s${CLEAR}\\n\" \"$i\" \"$TMP\"\n\ndone", "url": "https://wpnews.pro/news/strip-pdf-metadata", "canonical_source": "https://gist.github.com/sneakers-the-rat/172e8679b824a3871decd262ed3f59c6", "published_at": "2022-01-26 02:33:17+00:00", "updated_at": "2026-05-23 06:05:01.714852+00:00", "lang": "en", "topics": ["developer-tools", "open-source", "cybersecurity"], "entities": [], "alternates": {"html": "https://wpnews.pro/news/strip-pdf-metadata", "markdown": "https://wpnews.pro/news/strip-pdf-metadata.md", "text": "https://wpnews.pro/news/strip-pdf-metadata.txt", "jsonld": "https://wpnews.pro/news/strip-pdf-metadata.jsonld"}}