On 1/28/2025 1:23 PM, Glen Walpert wrote:
Good info, thanks. Took me a while to check it out, but I found that
pdftk can add or revise PageLabels. Using the sample file at:
As I said elsewhere, a big part of the problem is simply KNOWING
what can be done.
A PDF is actually a container in much the same way that many multimedia
files (e.g., "movies") are containers -- the actual CODEC to be used
isn't always apparent from the file extension.
So, you can store thumbnails, page labels, cross references,
"hidden" text files, multimedia content, etc. in the same document.
Which of these you can actually enter depends on the authoring tool;
which of them get "displayed" depends on the reading tool.
E.g., I have now taken to documenting "things" with PDFs and bundling
everything related to that issue *in* the PDF. Want to know the difference
(sound) between a front vowel and back vowel? Play this audio clip.
Want to see the source code that implements this algorithm?
Extract/detach this ("hidden, embedded") text file.
For example, I have a novelty book entitled "Mouth Sounds":
<https://www.amazon.com/MouthSounds-Fred-Newman/dp/0894801287>
It includes a (phonograph) "record" with sound samples (which makes
it MUCH easier to understand a sound that is being described in prose).
I've been digitizing my library so, after scanning each of the
pages in the book, I stored the audio from the "record" IN the
PDF file so that it would remain available after I discarded the
physical version of the book/record.
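
For anyone without Acrobat, pdftk offers a rough command-line analogue
through its attach_files and unpack_files operations. This is not the tool
I used above; it is only a minimal sketch that builds the command lines,
and the file names (MouthSounds.pdf, side_a.wav, extracted/) are
hypothetical placeholders.

```python
# Sketch: build pdftk command lines for embedding and extracting attachments.
# ASSUMPTION: pdftk (or pdftk-java) is installed; all file names are made up.
import shlex


def attach_command(source, attachments, output):
    """Command that embeds the given files in a copy of the source PDF."""
    return ["pdftk", source, "attach_files", *attachments, "output", output]


def unpack_command(source, out_dir):
    """Command that extracts every embedded file into out_dir."""
    return ["pdftk", source, "unpack_files", "output", out_dir]


if __name__ == "__main__":
    # Print the commands rather than run them, so nothing is touched on disk.
    print(shlex.join(attach_command("MouthSounds.pdf", ["side_a.wav"],
                                    "MouthSounds_with_audio.pdf")))
    print(shlex.join(unpack_command("MouthSounds_with_audio.pdf", "extracted")))
```

To actually run one of these, pass the returned list to subprocess.run().
Whether a given reader will play or even list the attachment is, again,
up to the reader.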
<https://github.com/pdf-association/pdf-differences/blob/main/PageLabels-UX/PageLabelsTest.pdf>
you can extract the file data with:
dump_data
Reads a single input PDF file and reports its metadata, bookmarks
(a/k/a outlines), page metrics (media, rotation and
labels), data embedded by STAMPtk (see STAMPtk's embed
option) and other data to the given output filename or (if no
output is given) to stdout. Non-ASCII characters are encoded
as XML numerical entities. Does not create a new PDF.
pdftk PageLabelsTest.pdf dump_data > orig.info
Edit or add page labels in orig.info, save as revised.info, then:
pdftk PageLabelsTest.pdf update_info revised.info output RevisedLabelTest.pdf
This produced a file showing my revisions.
Adobe's product has a graphical tool to do this. It shows the
"new" document as a markup of the old. It is indispensable
when preparing legal documents as those are often large and
difficult to wade through; you wouldn't want to have to
reread EVERY page *each* time a revision was made (by either
party).
Using that .info file as an example I was able to add page labels to a
file which contained none, and the labels display correctly with Ubuntu
Document Viewer.
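
A minimal sketch of generating the page-label entries for such a .info
file, rather than typing them by hand. The key names (PageLabelBegin,
PageLabelNewIndex, PageLabelStart, PageLabelPrefix, PageLabelNumStyle) and
the style values are as I recall them from dump_data output, so treat them
as an assumption and compare against the orig.info your own pdftk produces
before relying on this.

```python
# Sketch: build PageLabel stanzas to append to a pdftk update_info file.
# ASSUMPTION: key names and style values match what your pdftk's dump_data
# prints; check them against a real orig.info first.


def page_label_stanza(new_index, start=1,
                      num_style="DecimalArabicNumerals", prefix=None):
    """Return one stanza: labels begin at physical page new_index."""
    lines = [
        "PageLabelBegin",
        f"PageLabelNewIndex: {new_index}",
        f"PageLabelStart: {start}",
    ]
    if prefix is not None:
        lines.append(f"PageLabelPrefix: {prefix}")
    lines.append(f"PageLabelNumStyle: {num_style}")
    return "\n".join(lines) + "\n"


def append_page_labels(info_text, stanzas):
    """Append stanzas to existing dump_data text, keeping line endings sane."""
    if info_text and not info_text.endswith("\n"):
        info_text += "\n"
    return info_text + "".join(stanzas)


if __name__ == "__main__":
    # Hypothetical layout: front matter i..iv, body numbered from 1 at page 5.
    orig = "NumberOfPages: 12\n"
    revised = append_page_labels(orig, [
        page_label_stanza(1, num_style="LowercaseRomanNumerals"),
        page_label_stanza(5),
    ])
    print(revised, end="")
```

Save the result as revised.info and feed it to update_info as shown
earlier.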
Pdftk is also available for Windows. I use it mostly to assemble
documents from the individual single-page PDF files a CAD system
produces, but it has a lot of other functionality.
The Adobe tool lets you just select a group of files (in Windows Explorer)
and right-click "combine files in Acrobat..."
Within Acrobat, one can extract, rearrange, rotate, etc. individual
pages (or groups) and see the result (GUI), live. I use this for
processing bank, credit card, investment, etc. statements that often
change page orientation in the middle of a document.
So, I can scan several documents into one PDF. Then, go through and
selectively rotate pages, as needed; delete superfluous pages;
rearrange page order, etc. -- just by looking at the thumbnails of
the pages.
Then, I will go through and selectively DELETE the pages that are
not of interest for a particular PDF (i.e., all of the pages
that are NOT Visa statements) before saving the remainder *AS*
a particular file name (VISA.pdf). Repeat the process with the
original "collection PDF" for AMEX, KEOGH, IRA, etc.
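
For comparison, the same split can be scripted with pdftk's cat operation
once you know which pages belong to which statement. A minimal sketch
follows; the page ranges and the collection.pdf name are hypothetical, and
the "east" suffix (rotate that page to 90 degrees) assumes a reasonably
recent pdftk. You lose the thumbnails, so you still have to look at the
pages somewhere to decide the ranges.

```python
# Sketch: build one "pdftk ... cat ..." command per output statement file.
# ASSUMPTION: page ranges and file names below are invented for illustration.
import shlex


def cat_command(source, ranges, output):
    """Command that copies the listed page ranges of source into output."""
    return ["pdftk", source, "cat", *ranges, "output", output]


# Which pages of the scanned collection go into which file.
plan = {
    "VISA.pdf": ["1-3", "9east"],   # page 9 was scanned sideways
    "AMEX.pdf": ["4-8"],
}

for out_name, ranges in plan.items():
    # Print instead of executing; hand the list to subprocess.run() to run it.
    print(shlex.join(cat_command("collection.pdf", ranges, out_name)))
```

The source file is left untouched, so the "repeat with the original
collection" step is just another entry in the plan.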
As with any tool, a lot depends on how often you need to use it
to offset the cost of learning HOW to use it.