PDF Automation Station

Extracting PDF Text Using Markup Tools

PDF text can be extracted using the highlighter, strikethrough, underline, and replace text commenting tools.

David Dagley
May 26, 2025
Photo by Aaron Burden on Unsplash

Important Setting

Both Acrobat and Reader have a preference that lets you extract text from a PDF with the markup tools listed in the subtitle of this article. Press Ctrl + K to open the Preferences window and select the Comments category at the top of the list. Under the Making Comments section, select "Copy selected text into Highlight, Strikethrough, Underline and Replace Text comment pop-ups". Once this setting is enabled, any text you select with one of those markup tools becomes the contents of the annotation and appears in that annotation's pop-up note.
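Because the copied text ends up in each annotation's contents, a short script can collect it. The sketch below is not the author's code: `collectMarkupText` is a hypothetical helper, and it assumes the copied text lives in annotations whose `type` is "Highlight", "StrikeOut" or "Underline" (check how your files store Replace Text edits, which may also involve a "Caret" annotation). In Acrobat the input would come from `this.getAnnots()`; the helper itself only needs an array of objects with `type` and `contents` properties, so it can be tried outside Acrobat too.

```javascript
// Hypothetical helper (not from the article): return the text that Acrobat copied
// into markup annotations when the "Copy selected text..." preference is enabled.
// Input: an array of objects with type and contents properties, such as the
// Annotation objects returned by this.getAnnots(). ES5 syntax for Acrobat's engine.
function collectMarkupText(annots) {
  // Assumed type names; add "Caret" if your Replace Text comments use it.
  var markupTypes = ["Highlight", "StrikeOut", "Underline"];
  var lines = [];
  if (!annots) {
    return lines; // getAnnots() returns nothing when the document has no annotations
  }
  for (var i = 0; i < annots.length; i++) {
    var a = annots[i];
    // Skip sticky notes and other comment types, and markups with no copied text.
    if (markupTypes.indexOf(a.type) !== -1 && a.contents) {
      // Collapse line breaks and repeated spaces picked up from the page layout.
      lines.push(a.contents.replace(/\s+/g, " ").trim());
    }
  }
  return lines;
}

// Possible use in the Acrobat JavaScript console:
// this.syncAnnotScan();
// console.println(collectMarkupText(this.getAnnots()).join("\n"));
```

The helper keeps the annotations in the order it receives them; if order matters, sort the array (for example by page) before passing it in.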

Extracting Contents With JavaScript

PDFs can be converted to Excel spreadsheets by selecting File > Export To > Spreadsheet > Microsoft Excel Workbook. While the result may resemble the PDF visually, the conversion is far from perfect, and the data is often not organized into usable rows and columns. This is especially true of scanned documents that have been OCR'd (processed with Recognize Text). Consider a bank or credit card statement from which you need to extract transactions. Suppose you need data from four columns:

  1. Date

  2. Transaction description

  3. Funds out

  4. Funds in
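Once the statement lines are in annotation contents, they still have to be split into those four columns. The sketch below is one possible approach, not necessarily the one this article goes on to describe. It rests on a hypothetical convention: outgoing transactions were marked with Highlight and incoming ones with Underline, and each annotation holds one line in the form "date description amount", for example "05/02 Grocery Store 54.10". `transactionsToCsv`, `csvField` and `TXN_PATTERN` are illustrative names, and the date and amount pattern will need adjusting for real statements.

```javascript
// Hypothetical convention: Highlight = funds out, Underline = funds in, and each
// annotation's contents holds one line such as "05/02 Grocery Store 54.10".
// Groups: 1 = date (MM/DD or MM/DD/YYYY), 2 = description, 3 = amount.
var TXN_PATTERN = /^(\d{1,2}\/\d{1,2}(?:\/\d{2,4})?)\s+(.+?)\s+\$?([\d,]+\.\d{2})$/;

// Quote a CSV field when it contains a comma, double quote or line break.
function csvField(value) {
  var s = String(value);
  return /[",\r\n]/.test(s) ? '"' + s.replace(/"/g, '""') + '"' : s;
}

// Build CSV text with Date, Description, Funds Out and Funds In columns.
function transactionsToCsv(annots) {
  var rows = [["Date", "Description", "Funds Out", "Funds In"]];
  for (var i = 0; i < annots.length; i++) {
    var a = annots[i];
    if (a.type !== "Highlight" && a.type !== "Underline") continue;
    var line = (a.contents || "").replace(/\s+/g, " ").trim();
    var m = TXN_PATTERN.exec(line);
    if (!m) continue; // skip text that does not look like a transaction
    var amount = m[3].replace(/,/g, ""); // drop thousands separators
    rows.push([
      m[1],
      m[2],
      a.type === "Highlight" ? amount : "",
      a.type === "Underline" ? amount : ""
    ]);
  }
  return rows.map(function (r) { return r.map(csvField).join(","); }).join("\n");
}
```

In Acrobat the input could again be `this.getAnnots()`, and the returned text could be printed with `console.println` and pasted into a .csv file for Excel. Using the markup type to pick the column avoids guessing whether an amount is a debit or a credit from flattened text, at the cost of marking the two kinds of transactions differently.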

© 2025 David Dagley