Re: How good is Linux OCR?

Subject: Re: How good is Linux OCR?
From: rich (at) *nospam* example.invalid (Rich)
Newsgroups: sci.crypt
Date: 09 Jun 2025, 00:03:14
Organization: A noiseless patient Spider
Message-ID: <10254ri$38nl$1@dont-email.me>
User-Agent: tin/2.6.1-20211226 ("Convalmore") (Linux/5.15.139 (x86_64))

Stefan Claas <stefan@mailchuck.com> wrote:
> Rich wrote:
>> Note that Tesseract will (I think) compile for Windows too, so if you
>> wanted to know "how well tesseract worked" you could just install the
>> Windows version and see for yourself.
>
> I tried tesseract under Linux. It is horrible, because of too many errors.

Fair enough.  The Windows version will do the same.

Two other options I'm aware of for Linux:

http://slackbuilds.org/repository/15.0/office/gocr/

http://slackbuilds.org/repository/15.0/libraries/cuneiform/

I have never used either, so I can't comment on how well they work.

Your original image, however, is one that will be hard to OCR, so it is
quite amazing that whatever OCR engine MS supplies is actually able to
convert it with some accuracy.

If your goal is to store binary data (keys/messages) as these text
strings, then you also need to consider that many OCR engines confuse
similar characters.  I've seen 5 (five) become S (letter ess) and
1 (one) become I (letter eye).  I'm not sure I've seen I become 1, but
it is possible, especially with a font that has little to no
difference between those glyphs.

O (letter oh) and 0 (numeral zero) are often confused for each other as
well.
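A quick way to see whether a candidate alphabet has this problem is to check it against the confused pairs above. The Python sketch below uses only the three pairs mentioned in this post; the table and function names are hypothetical, for illustration only. Standard Base64, for example, contains all three pairs.

```python
import string

# Hypothetical table: only the OCR confusions reported in this post.
CONFUSABLE_PAIRS = [("5", "S"), ("1", "I"), ("O", "0")]


def confusable_pairs_in(alphabet: str) -> list[tuple[str, str]]:
    """Return each confusable pair whose two glyphs both occur in alphabet."""
    chars = set(alphabet)
    return [(a, b) for a, b in CONFUSABLE_PAIRS if a in chars and b in chars]


# The standard Base64 alphabet (RFC 4648): A-Z, a-z, 0-9, '+', '/'.
BASE64_ALPHABET = (
    string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
)

if __name__ == "__main__":
    print(confusable_pairs_in(BASE64_ALPHABET))
```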

So you might want to restrict your character set to exclude the
"easy to confuse" pairs.  If they never appear on the "printouts",
they can't be confused for each other.
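A minimal sketch of such a restricted character set, in Python. The alphabet (`SAFE_ALPHABET`), the `encode`/`decode` names and the big-integer base-N conversion are illustrative assumptions, not an established format. The idea is similar in spirit to Bitcoin's Base58, which drops 0, O, I and l, and to Crockford's Base32, which drops I, L, O and U.

```python
# Hypothetical restricted alphabet: digits plus uppercase letters, minus the
# glyphs this post reports OCR engines confusing (0/O, 1/I, 5/S).
SAFE_ALPHABET = "".join(
    c for c in "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ" if c not in "0O1I5S"
)
BASE = len(SAFE_ALPHABET)  # 30 symbols
ZERO = SAFE_ALPHABET[0]    # symbol for digit value 0 (here "2")


def encode(data: bytes) -> str:
    """Encode bytes as a big-endian base-30 string over SAFE_ALPHABET."""
    # Leading zero bytes vanish in the integer conversion, so emit one
    # ZERO symbol per leading zero byte to keep the round trip lossless.
    n_leading = len(data) - len(data.lstrip(b"\x00"))
    num = int.from_bytes(data, "big")
    digits = []
    while num > 0:
        num, rem = divmod(num, BASE)
        digits.append(SAFE_ALPHABET[rem])
    return ZERO * n_leading + "".join(reversed(digits))


def decode(text: str) -> bytes:
    """Invert encode(); raises ValueError on any symbol outside SAFE_ALPHABET."""
    n_leading = len(text) - len(text.lstrip(ZERO))
    num = 0
    for ch in text:
        num = num * BASE + SAFE_ALPHABET.index(ch)  # ValueError if unknown
    body = num.to_bytes((num.bit_length() + 7) // 8, "big")
    return b"\x00" * n_leading + body
```

Since no encoded string can contain 0, O, 1, I, 5 or S, misreads between those particular pairs cannot occur, though other OCR errors remain possible. A stray symbol outside the alphabet makes `decode` raise `ValueError` rather than silently returning different bytes.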

As an alternative, there are also the "OCR-A"
(https://en.wikipedia.org/wiki/OCR-A) and "OCR-B"
(https://en.wikipedia.org/wiki/OCR-B) fonts, which were designed to be
easy for early OCR engines to read.  Either might still be "easier to
read" today, even though OCR engines have progressed since those fonts
were created.
