Fixing OCR support in gscan2pdf on Ubuntu 14.04 & derivatives

Edit: Jeffrey Ratcliffe, the very active developer of gscan2pdf, has released an update that fixes this bug. Ubuntu users can access it via his PPA (see below).

In this post the other day I talked about my relatively painless experience upgrading to Xubuntu 14.04. Since then, I have discovered a couple of bugs in some OCR software I use fairly regularly.

Here is a solution to a slightly annoying regression in gscan2pdf, an otherwise great little PDF scanning, clean-up and OCR solution.

In Ubuntu 14.04 gscan2pdf has a bug in its tesseract OCR support: it appears to OCR the document, but once completed no text is added to the OCR layer. Although the bug does not affect the gocr OCR engine, tesseract (which was developed by HP Labs) is a much better engine and the one I prefer to use.

My first attempt at rectifying the problem was to upgrade gscan2pdf to the latest version (from 1.2.3-1 to 1.2.4), which doesn’t seem to have made it into the Ubuntu 14.04 repos, a shame considering Trusty is an LTS release. On the upside, Jeffrey Ratcliffe, gscan2pdf’s developer, has a PPA that contains the latest version, so upgrading was relatively painless. The process is well documented here on the RCLUBLINUX blog.

Unfortunately, the bug is not fixed in gscan2pdf 1.2.4 so the upgrade didn’t fix my problem.

A little poking about on the gscan2pdf Sourceforge page, however, showed this bug report, and also a patch to fix the problem contributed by user tzieg (Thomas Zieg?).

After applying the patch and firing up gscan2pdf I was glad to see tesseract again worked as expected. Thanks, Thomas!

Problem: After upgrading to Xubuntu 14.04 the tesseract OCR engine no longer worked in gscan2pdf.

Solution: Patch gscan2pdf using the patch supplied by Thomas Zieg.

Procedure: Download a copy of the patch from gscan2pdf’s Sourceforge bugtracker.

Copy the patch to the gscan2pdf directory.

sudo cp Tesseract.pm.patch /usr/share/perl5/Gscan2pdf/

Change to the gscan2pdf directory.

cd /usr/share/perl5/Gscan2pdf/

Apply the patch.

sudo patch -p0 < Tesseract.pm.patch

OCR with tesseract should now work as expected. Easy.
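The procedure above can be made a little more cautious by testing the patch before applying it. What follows is a minimal sketch, assuming GNU patch is installed; the function name apply_patch_safely is made up for illustration and is not part of gscan2pdf. It does a dry run first and, only if that succeeds, applies the patch while keeping a backup of every file it changes.

```shell
#!/bin/sh
# Hypothetical helper (not part of gscan2pdf): check a patch, then apply it.
# Usage: apply_patch_safely DIRECTORY PATCHFILE
# PATCHFILE must be an absolute path or a path relative to DIRECTORY.
apply_patch_safely() {
    dir=$1
    patchfile=$2
    (
        cd "$dir" || exit 1
        # --dry-run reports what would happen without touching any file.
        patch --dry-run -p0 < "$patchfile" || exit 1
        # -b keeps the original of each patched file; -z sets the backup suffix.
        patch -b -z .orig -p0 < "$patchfile"
    )
}
```

For the directory used in this post the function would have to run as root, for example from a root shell: apply_patch_safely /usr/share/perl5/Gscan2pdf Tesseract.pm.patch. If something goes wrong, the patch can be undone with patch -R -p0 < Tesseract.pm.patch, or by copying the .orig backup back over the patched file.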

 

Right, now to figure out why OCRFeeder crashes when exporting to PDF.

Xubuntu 14.04 – Notification Area Missing Icons

Yesterday I bit the bullet and upgraded my fairly stable Xubuntu install from 13.10 Saucy Salamander to 14.04 Trusty Tahr.

I had no pressing need to upgrade (aside from the occasional reminder when I logged in that a new release was available) but since Trusty had been out for a few weeks I figured any show-stopping bugs would be ironed out by now.

First, I have to comment on how painless the upgrade procedure has become: a couple of clicks and it was away. After about an hour or so spent downloading and installing updates, a reboot and a slightly extended initial login, everything seemed to be right where I left it. No longer are we faced with fixing a bunch of small things that go awry during the upgrade process.

I did, however, find one minor annoyance. No longer did all my running apps (the ones that I want to anyway) show up in the notification area I have in the top left of my screen.

[Image: notification area with missing icons]

Conspicuously missing were Network Manager, Dropbox, Spideroak, KeePass and perhaps a few more, leaving me with just the volume control and power indicator icons showing. This was true even though each of my apps appeared to be running after being correctly started at login.

[Image: Indicator Plugin]

After a bit of poking around in the XFCE panel preferences I found that replacing the Notification Area applet with the Indicator Plugin applet restored all my application icons.

This, however, left me with another dilemma, as Indicator Plugin also includes a bunch of icons for mail, bluetooth and keyboard that, although I could hide, I couldn’t easily remove. What I really wanted was for Notification Area to work the way it did before the upgrade.

[Image: notification area with icons]

After further investigation and a little google-fu, I found that by killing indicator-application-service my icons would reappear. A quick delve into the ‘Session and Startup’ settings in XFCE’s Settings Manager showed that this service (listed as Indicator Application) was started on login, and that by unticking the box next to it I could tell it not to start. Problem solved. Now my notification area looks the way I like it, with grey and black icons showing and the more out-of-place-looking coloured icons nicely hidden away.

[Image: Session and Startup properties]

Problem: After upgrading to Xubuntu 14.04 some application icons no longer show in the notification area.

Solution: Stop indicator-application-service from starting at login.

Procedure:

  • Open XFCE Settings Manager and navigate to Session and Startup preferences.
  • Click on the Application Autostart tab and scroll down to Indicator Application.
  • Untick the tickbox.
  • Click close, log out and log back in again.
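The same result can probably be reached from a terminal. The sketch below assumes the session follows the XDG autostart convention, where a file of the same name in the user's autostart directory overrides the system-wide one and Hidden=true tells the session to ignore the entry. The function name disable_autostart is made up for illustration, and the file name of the Indicator Application entry is not given in this post, so look it up first (for example with ls /etc/xdg/autostart).

```shell
#!/bin/sh
# Hypothetical helper: disable one autostart entry for the current user only.
# Usage: disable_autostart ENTRY.desktop [SYSTEM_DIR] [USER_DIR]
disable_autostart() {
    entry=$1
    system_dir=${2:-/etc/xdg/autostart}
    user_dir=${3:-${XDG_CONFIG_HOME:-$HOME/.config}/autostart}

    if [ ! -f "$system_dir/$entry" ]; then
        echo "no such entry: $system_dir/$entry" >&2
        return 1
    fi
    mkdir -p "$user_dir" || return 1

    # Copy the entry into the user's directory, dropping any existing Hidden=
    # line and adding Hidden=true right after the [Desktop Entry] header.
    awk '/^Hidden=/ { next }
         { print }
         /^\[Desktop Entry\]/ { print "Hidden=true" }' \
        "$system_dir/$entry" > "$user_dir/$entry"
}
```

Nothing under /etc is modified, so deleting the copy in the user's autostart directory restores the original behaviour. To make the icons reappear without logging out, the running service can be stopped with pkill -f indicator-application-service.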

 

Why Tim Wilson is wrong about “n______”

pseudomorph:

This article on the referential nature of language and our ‘Freedom Commissioner’s’ failure to understand it is excellent. I highly recommend you all read it.

Originally posted on Castan Centre for Human Rights Law:

By  Patrick Emerton

A little over a week ago, Human Rights Commissioner Tim Wilson stated that he objects to current laws governing racially offensive behaviour because they allow members of particular communities to refer to one another using words that outsiders may not:

Asked whether he was referring to the word “n–––“, Mr Wilson said: “I won’t say it, but that’s right.”

Wilson then argued that repealing the relevant legislation – section 18C of the Racial Discrimination Act 1975 (Cth) – would restore “equality” to Australia’s discrimination laws.

This objection is radically mistaken. It rests upon a confusion about the nature of language, which on this occasion feeds a misguided political agenda.

Philosophers and cultural theorists have written a lot about the nature of language, expressing different views and coming from different perspectives. Racial, racialised and racist language is a particularly contentious matter. This blog adopts the approach of Hilary…

Annotating PDF with Okular

pseudomorph:

Native PDF annotating under Linux has long been a bugbear of mine and something I’d almost given up hope of ever being properly supported. Until, that is, I stumbled across this post describing the process in new versions of Okular.

Discovering this also led me to look deeper, and to discover that Evince also supports PDF annotations, and has done for quite some time! See this post for more information on Evince.

With luck, we’ll soon see the ability to simply add annotations and save, rather than requiring saving annotated PDFs as new documents in order for changes to remain.

Originally posted on groak@{subjects of research}:

Once in a while I look around to see whether there is finally a way to properly annotate PDFs in Linux. The answer was no until a couple of months ago. But I think it is still little known.

Even this post, whose comments made me take a close look again, did not see the option of embedding annotations into the PDF. The comments, however, point to Okular, which has been a very good reader for quite some time, and a more or less recent version of poppler, the PDF library.

The way to go is to make annotations with Okular (use the review tool (F6)) and then save the PDF with “save as”. Now the annotations are embedded into the pdf file. I tested the annotations with the Adobe Android reader and I can view them and alter them with it.

Unfortunately this information is hidden in the Okular handbook and…

Recommended Reading: My Country is a Horror Show

Today’s recommended reading comes from David Simon, the creator of one of my favourite TV series, The Wire. Published by the Guardian, it is an edited version of an impromptu speech given at the Festival of Dangerous Ideas in Sydney. The subject of the speech is the role unrestrained capitalism has played in creating the widening divide between the class of Americans who feel that their society is willing and able to meet their needs, and those who do not. Capitalism, Simon contends, has lost its social compact and we are all the poorer because of it.

My Country is a Horror Show is a speech that stands as a stark warning, not only to America but to all the west and beyond.

Do take the time to read it.

The Great Asylum Silence

pseudomorph:

Me, writing for the AusOpinion blog on Immigration Minister Scott Morrison’s Operation Sovereign Silence.

Originally posted on AusOpinion:

If a refugee drowns in the ocean trying to reach Australia and the Government decides not to tell us, will their soul still haunt a Senator’s dreams?

On Monday, Scott Morrison, the new Minister for Immigration and Border Protection, held the first weekly update of the Coalition Government’s laughably named ‘Operation Sovereign Borders’.

The press conference, for that’s what it was, was fronted by Morrison and his newly minted ‘three star’ Lieutenant General Angus Campbell, a man whose job I do not envy for a second.

Fronting the cameras, Morrison was keen to continue the Coalition’s pre-election lines. The issue was one of border ‘security’ and ‘protection’. People arriving by boat remained ‘illegal’ and the Government’s resolve to stop the boats was “genuine” and “absolute”.

More important, however, was Morrison’s focus on ‘operational matters’. Not only did he confirm the Government’s decision to provide a single weekly update confirming the…

Comments on everything: The incivility of internet communication

I wanted to write something that was triggered by a brief Google+ conversation and an even briefer Twitter conversation that I’ve recently had. What follows are my unrefined thoughts on the topic of the incivility of internet communications.

Do tell me – civilly because I do, and will, moderate your comments – what you think below.

Nasty internet comments and conversation is something I’ve been thinking about for a while and I’d like to try to get some of my thoughts in writing. I guess here is as good a place as any to do that. I apologise in advance if this is a bit long or sounds a bit ranty.

In response to a post I made on Google+, Lars wrote:

I think you are on the point, the main problem is the sense of anonymity you have online, which makes it easier to ignore the social norms.
I fear this will probably never be solved as it’s almost impossible to, online, recreate the “being watched” feeling that usually keeps people in line with the social norms.
The only internal guides left are empathy and the fundamental respect for others which, sadly, a lot of people, especially the young, seem to be lacking.

I don’t think I’m as pessimistic about the future of online relationships as is Lars, though there is plenty of reason to be concerned.

Let’s look at the concern briefly. There is much evidence to show that online communication, by and large, has lost much of the civility that we expect in face-to-face communication. All one need do is read the comments on just about any online article, especially, in these days of hyper-partisanship, those to do with politics.

A recent example from my country was some of the terrible slander our (recently deposed) prime minister was subject to – being our first female PM, much of it was very very sexist and absolutely not anything anyone would dare say in public. Anne Summers has a good run down on it here: http://annesummers.com.au/speeches/her-rights-at-work-r-rated/

In our own community, all we need do is look at some of the fanboyism that sparked this very conversation which all too easily turns from a difference of opinion to outright attack.

More worrying (to me at least) is that many people seem happy to put their real names and images to these awful comments. A browse of some Facebook hate pages that spring up periodically is all one needs to see truly awful comments alongside people’s names and photos. To me, this says the problem runs deeper than perceived online anonymity.

I would, however, like to consider the issue from a different perspective.

I mentioned in my last post that online communication is still relatively new. And, despite the internet being around for some time now, I actually believe this to be the case when we hold it up against other forms of communication.

Take me, for instance. I’ve been chatting online since the early(ish) days of the internet in the mid 1990s, and had been chatting on local BBS systems for a number of years before that. Compared to some of these young whippersnappers (get off my lawn) I could almost be considered an old hand.

Except I don’t represent a generation, or perhaps even half a generation.

When we talk about kids these days having no respect for social norms, what we’re talking about is a generation that is, by and large, finding their way in a communication medium that their parents haven’t even experienced.

Stay with me now, I know this is long but there is a payoff at the end. I promise.

My point is, despite its ubiquitous nature, the internet remains a new frontier for communication.

In this world, many of the social pressures we use to enforce norms of polite communication don’t exist, or don’t seem to exist, and people feel free to flout them.

What I do not think, is that this is a cause for too much alarm. After all, theories of social decline have been with us for generations, and we are yet to completely implode as a species.

What I want to propose is that before online communication becomes both normalised, and beholden to strict social norms, it will take at least a generation and a half, probably two.

In my view, what it will take for new social norms around internet communication to take hold is for the current generation (those of any age who feel free to make nasty comments) to begin to feel the ramifications of such actions.

For people to lose their jobs and their livelihoods because they thought they were anonymous; for teenagers to find that they cannot simply delete their comments and be done with it, and for people who feel free to make hurtful comments to feel what it is like to be on the receiving end.

This generation will then be equipped to ensure those mistakes are not repeated, creating in the process a new social pressure to ensure peaceful communication.

And here’s the payoff.

There is hope, we are all humans and deep down, we all want the same thing. Getting there just may take a little more time than we’d like.