
October 7, 2014 at 5:25 am

I have the PowerShell code below, which OCRs a TIFF image and saves the text to a table in SQL Server. The OCR only works on the first page; it does not loop through the rest of the pages in the TIFF. Can someone help me change the code so it loops over every page of the image?

#Functions

#OCR Function

#param – imagepath(path the image to ocr)

Function OCR($imagepath) {

#create a new modi object

$modidoc = new-object -comobject modi.document

$modidoc.create($imagepath)

try{

#call the ocr method

$modidoc.ocr()

#single page document so I only need the item(0).layout text

$modidoc.images.item(0).layout.text

}

catch{

#catch the error and go on

return "Error"

}

Finally

{

#clean up the object

$modidoc = ""

}

} # end OCR function

#Function to update the fulltext field in imageskeyvalues table

#param ID(ID of table)

function SaveImagesKeyValues($ID, $fulltext)

{

$cmd = new-object system.data.sqlclient.sqlcommand

$cmd.connection = $sqlconnection

$s = "update dbo.Images_Local set fulltext = '" + ($fulltext.tostring()).replace("'","''") + "' where ID = " + $ID.tostring()

$cmd.commandtext = $s

$a = $cmd.executenonquery()

} #end SaveImagesKeyValues

#Function to get the list of records to OCR

function GetImagesKeyValues()

{

$sqlda = new-object system.data.sqlclient.sqldataadapter

$datatable = new-object system.data.dataset

$sqlcommandselect = new-object system.data.sqlclient.sqlcommand

$sqlcommandselect.commandtext =

"select ID, FullImagePath, fulltext from images_Local

where (fulltext is null or fulltext like 'error%') order by ID "

#and (fulltext is null or fulltext like 'error%')

#"select b.batch_number, b.sequence_number,a.claimnumber, a.potentialamount,a.buyer, b.[document type],b.fulltext, c.imagelastmodified, c.image_path

#from cip.dbo.cip_mastertable a

#inner join imageskeyvalues b on a.claimnumber = substring(b.[claim number],2,6)

#inner join images c on b.batch_number = c.batch_number and b.sequence_number = c.sequence_number

#where (a.buyer = '006' or (a.buyer = '007' and a.potentialamount > 500000))

#and c.imagelastmodified >= '1/1/13'

#and b.[document type] = '2'

#order by c.imagelastmodified"

$sqlcommandselect.connection = $sqlconnection

$sqlda.selectcommand = $sqlcommandselect

#Fill the datatable and store the output in variable otherwise it shows in the output.

$trap = $sqlda.fill($datatable)

$datatable.tables[0]

}

#end GetImagesKeyValues

#End Functions

#Main

clear

#set the parent path to the working directory

$parentpath = "C:\Data\Portugal PRG\Images\Contratos SONAE 08-11\Imdex\"

#Create new sql connection

$sqlconnection = new-object system.data.sqlclient.sqlconnection

#Assign the connectionstring

$sqlconnection.connectionstring = "Server=ATL01L20969\SQLEXPRESS;Database=Sonae;integrated security=True"

#Open the connection

$sqlconnection.open()

#get the list of records that need ocr'd

$imageskeyvalues = getimageskeyvalues

#iterate through the list

foreach ($t in $imageskeyvalues){

#$completepath = $parentpath + $t.image_path

$completepath = $t.FullImagePath

#call the ocr function and put the results in the fulltext property

$t.fulltext = OCR $completepath

#give some bread crumbs to monitor the script

write-host "Saving " $t.ID

#update the database fulltext field

Saveimageskeyvalues $t.ID $t.fulltext

}#end Main

October 7, 2014 at 6:01 pm

I didn't have much luck finding documentation on that modi.document COM class; it apparently doesn't ship with Office anymore, either.
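
That said, the comment in your OCR function says it only reads item(0), which would explain why you only get the first page. Here is a minimal sketch of looping over every page instead. I'm assuming the Images collection exposes Count and Item(index) and that each image has Layout.Text once OCR() has run; I can't test this against MODI here, so treat it as a starting point. Get-AllPageText is just a helper name I made up.

```powershell
# Hypothetical helper: collects the OCR text of every page in the document.
# Assumes $Document.Images has Count and Item(index), and that each image
# exposes Layout.Text after OCR() has been called on the document.
function Get-AllPageText($Document) {
    $pages = @()
    for ($i = 0; $i -lt $Document.Images.Count; $i++) {
        $pages += $Document.Images.Item($i).Layout.Text
    }
    # Join the per-page text with a blank line between pages
    return ($pages -join "`r`n`r`n")
}

Function OCR($imagepath) {
    $modidoc = New-Object -ComObject MODI.Document
    try {
        $modidoc.Create($imagepath)
        # Run OCR on the whole document, then gather the text of all pages
        $modidoc.OCR()
        Get-AllPageText $modidoc
    }
    catch {
        # Same behaviour as the original: flag the record and move on
        return "Error"
    }
    finally {
        # Release the COM object instead of just overwriting the variable
        [void][System.Runtime.InteropServices.Marshal]::ReleaseComObject($modidoc)
    }
}
```

The rest of your script should not need to change, since OCR still returns a single string for the fulltext column.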

October 10, 2014 at 12:07 pm

Does your company use a recent version of MS Office? OneNote may be an option. It can also perform OCR, and its API is much better documented, with lots of examples across the community.

http://dev.onenote.com/