[Apologies if duplicate; apparently some issues with ES?]
On 12/24/2024 6:51 AM, legg wrote:
The thing I'm working on now is 422 pages with a 5000 line index.
I'm not going through THAT, manually inserting links.
I can "fix" (and have fixed) the page numbers that appear in the PDF viewer
(at least Adobe's products; I don't use any of the "knock off" tools)
But, adding hyperlinks is something that you would either have
to do manually or write a script (and hope for the best).
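A rough sketch of the "write a script" route, in Python. Everything here is
an assumption for illustration: the index-line layout ("term, 12, 45-47"),
the function names, and the idea that printed page N sits at a fixed offset
from the PDF's physical page index. A real index (terms containing commas,
"see also" entries, roman-numeral pages) would need more care, hence
"hope for the best".

```python
import re

# Hypothetical index-line layout: "term, 12, 45-47".
# Matches a single page number or a hyphenated page range.
PAGE_REF = re.compile(r"\b(\d+)(?:-(\d+))?\b")


def parse_index_line(line):
    """Split 'term, 12, 45-47' into ('term', [12, 45, 46, 47])."""
    # Everything before the first comma is treated as the term;
    # this breaks on terms that themselves contain commas.
    term, _, refs = line.partition(",")
    pages = []
    for match in PAGE_REF.finditer(refs):
        first = int(match.group(1))
        last = int(match.group(2)) if match.group(2) else first
        pages.extend(range(first, last + 1))
    return term.strip(), pages


def link_targets(line, first_body_page_index):
    """Map printed page numbers to zero-based physical page indices.

    first_body_page_index is the zero-based index of the sheet that is
    printed as page "1" (an assumed, document-specific value).
    """
    term, pages = parse_index_line(line)
    return term, [first_body_page_index + p - 1 for p in pages]
```

The resulting (term, page index) pairs would still have to be handed to
whatever tool actually writes the link annotations into the PDF.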
When I build a document with an index (or other cross references),
I place markers in the text and the DTP tool uses these to determine
the final "rendering place" of each reference, which it then places
in the index, etc.
It similarly builds the hyperlinks for the PDF.
Expecting a tool to do this /ex post factum/ (e.g., to a scanned/OCRed
image) might be wishful thinking.
[When I scan documents, I don't even bother with the OCR as my goal
is simply to replace the paper document with an electronic version
having all the capabilities (content) and _limitations_ (e.g., no
automated search) of the original. This makes it a LOT easier to
process paper! (I've scanned over 100,000 sheets -- both sides -- of
such documents in the past year)]
A preliminary draft (there are already 40 pages updated for
the next draft) is available for comment and criticism.
At 100 MB (but getting smaller) it may be a slow download.
Few will find the actual content of interest.
http://ve3ute.ca/prairie_gold/sample_241220.zip
I made the changes that I mentioned, upthread, and posted a revised
copy at:
<https://mega.nz/file/I75REDIA#RFxRfVNXa_jhw3jcyv2_UeTGJ1iRwlUDC4hHk6x15bc>
(ignore the viewer and click on the DOWNLOAD button on the lower right)
I've appended 3 screenshots to the document (you should remove them;
I just used the PDF as a convenient "package" to transport them)
showing how three "significant" pages appear in Acrobat: note the
front cover appears as having a "blank" page LABEL -- yet appears
described as "(1 of...)" while page "I" shows the label "I" with
"(9 of ...)" and page "1" appears with the label "1" and "(15 of ...)"
[Note that "..." is 3 pages longer than your original for the reason
mentioned above]
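For anyone curious how a viewer arrives at those labels: PDF page labels
are defined as ranges, each starting at a physical page index and carrying
a numbering style. The Python sketch below mimics that lookup. The range
table is an assumption reconstructed from the three screenshots (blank
label from the cover onward, roman numerals from physical page 9, decimal
from physical page 15); the labels of the pages in between were not
checked, and the names are made up for illustration.

```python
def to_roman(n):
    """Convert a positive integer to an uppercase roman numeral."""
    numerals = [
        (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
        (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
        (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
    ]
    out = []
    for value, symbol in numerals:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


# (first zero-based page index, style, first number in the range)
# style None = no numeric label, "R" = uppercase roman, "D" = decimal.
# ASSUMED layout: physical pages 1-8 unlabeled, 9-14 roman, 15+ decimal.
LABEL_RANGES = [(0, None, 1), (8, "R", 1), (14, "D", 1)]


def page_label(index, ranges=LABEL_RANGES):
    """Return the label shown for the page at zero-based index."""
    # The governing range is the last one starting at or before index.
    start, style, first = max(
        (r for r in ranges if r[0] <= index), key=lambda r: r[0]
    )
    number = first + (index - start)
    if style == "R":
        return to_roman(number)
    if style == "D":
        return str(number)
    return ""
```

With that table, index 0 (shown as "1 of ...") has an empty label, index 8
("9 of ...") is labeled "I", and index 14 ("15 of ...") is labeled "1".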
I'd be more than happy to do this again -- for revisions of this
document or others.
[Perhaps the only effort more tedious than scanning documents is
scanning *film*! I applaud your effort.]
Keep in mind that this is a replica - all original spelling,
formatting and layout (including original errors) is intentionally
preserved. Unless you've got an original hard copy, it may be
hard to tell what's wrong - but overlapping images/text,
or ocr source errors (rn / m , 1/l/I CG e/a) still abound.
OCR on this was so bad that the output was often useless as
a textual contribution, but could sometimes be edited and
reformatted manually. PDF-OCR scanned page file size was
also 3 to 30 times larger than a PDF published from a
text-corrected doc.
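One way to hunt for the confusions listed above is to test each unknown
word against a word list after swapping a single confusable sequence. The
Python sketch below is only an illustration: the confusion table covers
just the rn/m, 1/l/I and e/a pairs, the vocabulary is whatever word list
you supply, and the function name is hypothetical.

```python
# Pairs of (what OCR produced, what it may have been), both directions.
CONFUSIONS = [
    ("rn", "m"), ("m", "rn"),
    ("1", "l"), ("l", "1"),
    ("1", "I"), ("I", "1"),
    ("l", "I"), ("I", "l"),
    ("e", "a"), ("a", "e"),
]


def suggest(word, vocabulary):
    """Return known words reachable from word by one confusion swap."""
    if word in vocabulary:
        return []  # already a known word, nothing to suggest
    found = set()
    for wrong, right in CONFUSIONS:
        start = word.find(wrong)
        while start != -1:
            # Replace only this one occurrence and test the result.
            candidate = word[:start] + right + word[start + len(wrong):]
            if candidate in vocabulary:
                found.add(candidate)
            start = word.find(wrong, start + 1)
    return sorted(found)
```

This only proposes candidates; deciding whether an oddity is an OCR error
or an original error to be preserved still takes a human and the hard copy.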
As I mentioned, I just store high resolution TIFFs with the thought
that SOMEONE, SOMEDAY may opt to OCR the "imaged pages". I am
content to flip through virtual pages as I would a paper book...
especially if I don't have to make space for those books!
[When I moved here, I had *80* 10-ream "photocopier paper cartons"
of paperbacks. Plus all my textbooks, references, etc. I am now
down to less than one carton of paperbacks (the titles that I
simply insist on holding in my hands while reading) and a dozen
cartons of textbooks. The technical papers, standards,
legal documents, etc. have all found their way onto a pair of 2.5"
disk drives]
I get pdfs either from a scan or through printing utilities.
No intention of giving Adobe money for a format that was developed
for common usage.
I buy tools that solve problems. I don't really care who gets
paid for them as long as there is a net value added in the
products that I produce with them. Adobe did, after all, create
PostScript.
I think you'll find that you are misled about Adobe's role in
the original development of the pdf and eps file formats.
I said nothing of PDF or EPS origins. Rather, that "Adobe created
PostScript" -- of which EPS and PDF (as well as DPS) are offshoots.
PostScript has been openly documented for decades. And, folks have
been able to create PostScript *interpreters* -- much like they
could for any programming language, though Adobe still claims
PS as a registered trademark.
If you were doing DTP in the 80's, you quickly realized that using
any other "printer format" would lead to headaches as a document's
LAYOUT would appear differently based on the printer to which it was
rendered: page breaks would change, column fill/feathering,
hyphenation, etc.
Rendering to PostScript gave you independence from the physical
constraints of the particular printer AND f*ckups in the printer
drivers. Particularly important if you wanted to move from some
SOHO printer to something used at a professional publisher
(e.g., Linotype): "Why do my pages look different than my proofs?"
NeXT used DPS in its rendering engines from the late 80's. Adobe
eventually released a PostScript interpreter to run on workstations
to provide PostScript capabilities to "dumb(er)" printers (like those
that only speak HP's PCL).
PDF was created as a document interchange format. The specification
was originally "made available" free of charge in the early 90's
but was still held under Adobe's control for more than a decade. In
2008, it was published as an OPEN standard (ISO 32000-1) and control
of it was transferred to a separate body (ISO).
[So, I suspect YOU have been misled about its "original development"]
There has never (from Adobe's point of view) been an Open Source version
of the interpreter/renderer. Many of the knockoff products are still
closed source. And, most (?) fail to implement all of the features
defined in the standard -- assuming, instead, that users will have no
need for them in their VIEWED documents (of course, the AUTHOR of the
document decides what features will be needed by the viewer! :> )
For folks thinking PDFs are just "electronic books", that's likely
an acceptable tradeoff. OTOH, if you want to explain the difference
between the different speaking accents of New Yorkers, Bostonians,
Chicagoans, etc., it's much easier to embed three audio clips
and let the "reader" HEAR them instead of trying to describe them
textually. Similarly, instead of publishing N different views of
a 3D object to give the reader an idea of how it is constructed,
it's much easier to embed a 3D model IN the document and let the
reader explore it in whatever manner HE deems appropriate.
These are things that you can't do with paper books.
They were always intended to be open source.
There is no money in this work, or for anybody involved.
Thought it might take if I processed it through html formatting,
but that didn't seem to 'take' either.
I suspect this is a common enough problem that there are other
(non-Adobe) solutions out there. In a pinch (for a "one off"),
you could directly edit the EPS.
[I do this in reverse; I use Illustrator to create drawings
and then extract the specific PS commands to paste into other
documents. Saves me the trouble of having to develop a
"drawing application".]
RL