More a note to myself than a blog post, but others using desktop Linux might find it useful.
So you scan a book (say, an antiquarian piece of obscure liturgy) but the flatbed glass is much larger than the book, so you get a big black box where the book ends. That’s a problem for two reasons: should I ever want to print out a copy of the PDF, the text will be small, and the black box will use up tons of toner. Got to crop each page to the size of the book.
I’ve used command-line tools, but I’ve misplaced the recipe — should I find it or rediscover it, I’ll put it right here.
But a graphical interface can be good too. So I used PDF-Shuffler. It’s in the repositories/software center if you use Ubuntu Linux.
After I loaded the PDF I wanted to trim, I right-clicked on one page (they all needed the same trim) and by trial and error decided what the right amount should be: the crop is given as a percentage of the original page to remove. Then I moused over the other pages, selecting them, and made the same crop. Don’t worry about double-cropping a page; it only crops a percentage of the original size. Then, and this is not obvious, I exported the newly cropped doc with a new name.
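For the curious, here is a rough sketch of the arithmetic as I understand it. This is not PDF-Shuffler's actual code, and the function name `crop_box` and its parameters are made up for illustration. It assumes the page size is in PDF points with the origin at the bottom-left corner, and that each trim is a percentage of the original page, which is why applying the same crop twice changes nothing.

```python
def crop_box(width, height, left=0.0, right=0.0, top=0.0, bottom=0.0):
    """Return (x0, y0, x1, y1) in points for a page of the given ORIGINAL size.

    Each trim is a percentage of the original width or height to remove.
    Hypothetical helper for illustration only.
    """
    for pct in (left, right, top, bottom):
        if not 0 <= pct < 100:
            raise ValueError("each trim must be at least 0 and below 100")
    if left + right >= 100 or top + bottom >= 100:
        raise ValueError("these trims would remove the whole page")

    x0 = width * left / 100            # new left edge
    x1 = width * (100 - right) / 100   # new right edge
    y0 = height * bottom / 100         # new bottom edge
    y1 = height * (100 - top) / 100    # new top edge
    return (x0, y0, x1, y1)


# Example: a 600 x 800 pt scan where the book fills the top-left of the glass,
# so we trim 30% off the right and 20% off the bottom.
print(crop_box(600, 800, right=30, bottom=20))
```

Because the result depends only on the original page size and the percentages, calling it again with the same numbers gives the same box, which matches the "no double cropping" behavior described above.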
Easy peasy.


Hey, sweetie. What OCR software do you use? I’d be interested in hearing how much cleanup you have to do.
In short, I don’t OCR. But I’ve heard good things about the Tesseract OCR engine. It dates back to the 1980s, when it started at Hewlett-Packard, and then went dormant. Later Google started sponsoring its development, so I gather. See http://en.wikipedia.org/wiki/Tesseract_(software) for a list of GUIs.
*nods* I had heard of Tesseract, but when I looked into it, the installation seemed a bit hinky. S’all right, though. It’s more of a curiosity than a need.