Fixing OCR support in gscan2pdf on Ubuntu 14.04 & derivatives

Edit: Jeffrey Ratcliffe, the very active developer of gscan2pdf, has released an update that fixes this bug. Ubuntu users can get it via his PPA (see below).

In this post the other day I talked about my relatively painless experience upgrading to Xubuntu 14.04. Since then, I have discovered a couple of bugs in some OCR software I use fairly regularly.

Here is a solution to a slightly annoying regression in gscan2pdf, an otherwise great little PDF scanning, clean-up and OCR solution.

In Ubuntu 14.04, gscan2pdf has a bug in its tesseract OCR support: it appears to OCR the document, but once it completes no text is added to the OCR layer. Although the bug does not affect the gocr OCR engine, tesseract (originally developed at HP Labs, with development later sponsored by Google) is a much better engine and the one I prefer to use.

My first attempt at rectifying the problem was to upgrade gscan2pdf from 1.2.3-1 to the latest version, 1.2.4, which doesn’t seem to have made it into the Ubuntu 14.04 repos. That is a shame, considering Trusty is an LTS release. On the upside, Jeffrey Ratcliffe, gscan2pdf’s developer, has a PPA that contains the latest version, so upgrading was relatively painless. The process is well documented here on the RCLUBLINUX blog.

Unfortunately, the bug is not fixed in gscan2pdf 1.2.4 so the upgrade didn’t fix my problem.

A little poking about on the gscan2pdf Sourceforge page, however, turned up this bug report and a patch to fix the problem, contributed by user tzieg (Thomas Zieg?).

After applying the patch and firing up gscan2pdf, I was glad to see tesseract working as expected again. Thanks, Thomas!

Problem: After upgrading to Xubuntu 14.04 the tesseract OCR engine no longer worked in gscan2pdf.

Solution: Patch gscan2pdf using the patch supplied by Thomas Zieg.

Procedure: Download a copy of the patch from gscan2pdf’s Sourceforge bugtracker.

Copy the patch to the gscan2pdf directory.

sudo cp /usr/share/perl5/Gscan2pdf/

Change to the gscan2pdf directory.

cd /usr/share/perl5/Gscan2pdf/

Apply the patch.

sudo patch -p0 <

OCR with tesseract should now work as expected. Easy.
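If you are nervous about patching files under /usr/share, it may help to rehearse the dry-run-then-apply pattern on scratch files first. The sketch below is only an illustration: Example.pm and example.patch are hypothetical stand-ins I made up, not the real module or the real patch, and it assumes GNU patch (for the --dry-run and --backup options) and diff are installed. Everything happens in a temporary directory.

```shell
#!/bin/sh
# Demo of "check first, then apply with a backup", using scratch files only.
# Example.pm and example.patch are hypothetical stand-ins, not the real files.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical one-line stand-in for the file being patched.
printf '%s\n' "my \$suffix = '.html';" > Example.pm

# Build a hypothetical patch that changes the suffix.
sed "s/\.html/.hocr/" Example.pm > Example.pm.new
diff -u Example.pm Example.pm.new > example.patch || true  # diff exits 1 when files differ
rm Example.pm.new

# 1. Dry run: reports whether the hunks would apply, changes nothing.
patch -p0 --dry-run < example.patch

# 2. Apply for real, keeping the original as Example.pm.orig.
patch -p0 --backup < example.patch

cat Example.pm
```

On the real system the same two patch commands would be run with sudo from the gscan2pdf directory, with the downloaded patch file in place of example.patch. If the dry run reports a failed hunk, the installed file probably does not match the version the patch was written for, and it is safer to stop there.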


Right, now to figure out why OCRFeeder crashes when exporting to PDF.


9 thoughts on “Fixing OCR support in gscan2pdf on Ubuntu 14.04 & derivatives”

    • Great! Good to know it helped.

      I do, however, recommend you install the developer’s latest version from the PPA linked in my post. PPAs are a great way of installing software that is newer than, or has not been packaged for, your version of Ubuntu.

      Find out more about PPAs here and here

  1. Hey, do you have any idea what might be happening here? When I apply all your commands, I get this response:

    Hunk #1 FAILED at 186.
    1 out of 1 hunk FAILED -- saving rejects to file

    When I go to that file, it gives me these results:

    *** 2014-03-26 15:00:09.000000000 +0100
    --- 2014-05-09 15:53:01.407303746 +0200
    *** 186,192 ****

    # Temporary filename for output
    my $suffix =
    ! version->parse("v$version") >= version->parse("v3") ? '.html' : '.txt';
    my $txt = File::Temp->new( SUFFIX => $suffix );
    ( $name, $path, undef ) = fileparse( $txt, $suffix );

    --- 186,192 ----

    # Temporary filename for output
    my $suffix =
    ! version->parse("v$version") >= version->parse("v3") ? '.hocr' : '.txt';
    my $txt = File::Temp->new( SUFFIX => $suffix );
    ( $name, $path, undef ) = fileparse( $txt, $suffix );

    Any idea what I might be doing wrong?
