Installation
sudo apt-get install tesseract-ocr
Install ImageMagick to convert PDF pages to TIFF images
sudo apt-get install imagemagick
Install poppler-utils (which provides pdfinfo) to check the number of pages in a PDF
sudo apt-get update
sudo apt-get install poppler-utils
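Once the packages are installed, it can help to confirm that the commands are on the PATH and that pdfinfo reports a page count. The snippet below is a minimal sketch; the helper names missing_tools and page_count are made up for illustration, and input.pdf is a placeholder file name.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every given command
# that cannot be found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Hypothetical helper: read pdfinfo output on stdin and print the
# number from its "Pages:" line.
page_count() {
    awk '/^Pages:/ { print $2 }'
}

# Typical use (assumes input.pdf exists):
#   missing_tools tesseract convert pdfinfo
#   pdfinfo input.pdf | page_count
```

If missing_tools prints nothing, all three commands were found.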
Install Other Languages
Download the language data you need from https://code.google.com/p/tesseract-ocr/downloads/list
Put the extracted .traineddata file in /usr/share/tesseract-ocr/tessdata
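The copy step can be sketched as a small shell function. This assumes the download has already been unpacked to a single .traineddata file; install_lang is a hypothetical name, and deu.traineddata in the usage line is only an example. The second argument exists so the function can be tried against a scratch directory first.

```shell
#!/bin/sh
# Hypothetical helper: copy a .traineddata file into a tessdata directory.
# The destination defaults to the path given above; pass a second
# argument to use a different directory.
install_lang() {
    src=$1
    dest=${2:-/usr/share/tesseract-ocr/tessdata}
    case "$src" in
        *.traineddata) ;;
        *) echo "not a .traineddata file: $src" >&2; return 1 ;;
    esac
    [ -f "$src" ] || { echo "no such file: $src" >&2; return 1; }
    cp "$src" "$dest"/
}

# Typical use (run as root, since the default directory is system-owned):
#   install_lang deu.traineddata
```

On Tesseract versions that support it, tesseract --list-langs should then show the new language.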
Shell Script to OCR PDF
#!/bin/sh
STARTPAGE=1 # set to the page number of the first PDF page you wish to convert
RESOLUTION=600 # set to the resolution the scanner used (the higher, the better)
dumphelp(){
echo "sudo $0
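For orientation, here is a rough sketch of how the three tools installed above might be combined page by page. It is not the script from this post: ocr_pdf, its arguments, the default language eng and the page-N temporary file names are all assumptions made for illustration. It reuses the STARTPAGE and RESOLUTION settings shown above.

```shell
#!/bin/sh
# Sketch only: ocr_pdf and the temporary file names are hypothetical.
STARTPAGE=1      # first page to convert
RESOLUTION=600   # density (dpi) used when rendering each page

ocr_pdf() {
    pdf=$1
    lang=${2:-eng}          # assumed default language
    out=${pdf%.pdf}.txt     # input.pdf -> input.txt

    # pdfinfo prints a line such as "Pages:   12"; take the number.
    pages=$(pdfinfo "$pdf" | awk '/^Pages:/ { print $2 }')
    [ -n "$pages" ] || { echo "could not read page count of $pdf" >&2; return 1; }

    : > "$out"
    page=$STARTPAGE
    while [ "$page" -le "$pages" ]; do
        # ImageMagick numbers pages from 0, so page N is index N-1.
        convert -density "$RESOLUTION" "${pdf}[$((page - 1))]" -depth 8 "page-$page.tif" || return 1
        # tesseract adds the .txt suffix to the output base name itself.
        tesseract "page-$page.tif" "page-$page" -l "$lang" || return 1
        cat "page-$page.txt" >> "$out"
        rm -f "page-$page.tif" "page-$page.txt"
        page=$((page + 1))
    done
}

# Typical use:
#   ocr_pdf input.pdf eng
```

Converting one page at a time keeps the temporary TIFF files small, which matters at 600 dpi.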