Looping thru tif images using OCR

This topic contains 2 replies, has 3 voices, and was last updated by Tim Pringle 2 years, 1 month ago.

  • #19469
    Alex Benitez
    Participant

    I have the PowerShell code below, which OCRs a TIF image and saves the text to a table in SQL Server. The OCR only works on the first page; it does not loop through the remaining pages of the TIF. Can someone help me change the code so that it loops over every page of my TIF image?

    #Functions

    #OCR Function
    #param - imagepath (path of the image to OCR)
    Function OCR($imagepath) {
        #create a new modi object
        $modidoc = new-object -comobject modi.document
        $modidoc.create($imagepath)
        try {
            #call the ocr method
            $modidoc.ocr()
            #single page document so I only need the item(0).layout text
            $modidoc.images.item(0).layout.text
        }
        catch {
            #catch the error and go on
            return "Error"
        }
        finally {
            #clean up the object
            $modidoc = ""
        }
    } # end OCR function

    #Function to update the fulltext field in imageskeyvalues table
    #param ID(ID of table)
    function SaveImagesKeyValues($ID, $fulltext)
    {
        $cmd = new-object system.data.sqlclient.sqlcommand
        $cmd.connection = $sqlconnection
        #double up single quotes so the text cannot break the SQL string literal
        $s = "update dbo.Images_Local set fulltext = '" + ($fulltext.tostring()).replace("'","''") + "' where ID = " + $ID.tostring()
        $cmd.commandtext = $s
        $a = $cmd.executenonquery()
    } #end SaveImagesKeyValues

    #Function to get the list of records to OCR
    function GetImagesKeyValues()
    {
        $sqlda = new-object system.data.sqlclient.sqldataadapter
        $datatable = new-object system.data.dataset
        $sqlcommandselect = new-object system.data.sqlclient.sqlcommand
        $sqlcommandselect.commandtext = "select ID, FullImagePath, fulltext from images_Local
        where (fulltext is null or fulltext like 'error%') order by ID "
        #and (fulltext is null or fulltext like 'error%')
        #"select b.batch_number, b.sequence_number,a.claimnumber, a.potentialamount,a.buyer, b.[document type],b.fulltext, c.imagelastmodified, c.image_path
        #from cip.dbo.cip_mastertable a
        #inner join imageskeyvalues b on a.claimnumber = substring(b.[claim number],2,6)
        #inner join images c on b.batch_number = c.batch_number and b.sequence_number = c.sequence_number
        #where (a.buyer = '006' or (a.buyer = '007' and a.potentialamount > 500000))
        #and c.imagelastmodified >= '1/1/13'
        #and b.[document type] = '2'
        #order by c.imagelastmodified"
        $sqlcommandselect.connection = $sqlconnection
        $sqlda.selectcommand = $sqlcommandselect
        #Fill the datatable and store the output in variable otherwise it shows in the output.
        $trap = $sqlda.fill($datatable)
        $datatable.tables[0]
    }
    #end GetImagesKeyValues

    #End Functions

    #Main
    clear

    #set the parent path to the working directory
    $parentpath = "C:\Data\Portugal PRG\Images\Contratos SONAE 08-11\Imdex\"

    #Create new sql connection
    $sqlconnection = new-object system.data.sqlclient.sqlconnection

    #Assign the connectionstring
    $sqlconnection.connectionstring = "Server=ATL01L20969\SQLEXPRESS;Database=Sonae;integrated security=True"

    #Open the connection
    $sqlconnection.open()

    #get the list of records that need to be OCR'd
    $imageskeyvalues = getimageskeyvalues

    #iterate through the list
    foreach ($t in $imageskeyvalues) {
        #$completepath = $parentpath + $t.image_path
        $completepath = $t.FullImagePath
        #call the ocr function and put the results in the fulltext property
        $t.fulltext = OCR $completepath
        #give some bread crumbs to monitor the script
        write-host "Saving " $t.ID
        #update the database fulltext field
        Saveimageskeyvalues $t.ID $t.fulltext
    } #end Main

  • #19502
    Dave Wyatt
    Moderator

    I didn't have much luck finding documentation on that modi.document COM class; it apparently doesn't ship with Office anymore, either.
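
    For what it's worth, the original function only ever reads images.item(0), which is the first page. If the Images collection also exposes a Count property (an assumption on my part, since I can't confirm it against documentation), something like the sketch below might pick up every page. Get-ModiPageText is a hypothetical helper name, and none of this has been run against a real MODI install.

```powershell
# Hypothetical helper: joins the OCR text of every page of a document that
# has already been OCR'd. Assumes $Document.Images exposes Count and
# Item(index), mirroring the images.item(0) call in the original script.
function Get-ModiPageText {
    param($Document)

    $pages = @()
    for ($i = 0; $i -lt $Document.Images.Count; $i++) {
        # Each entry in the Images collection is treated as one page of the TIF
        $pages += $Document.Images.Item($i).Layout.Text
    }

    # Separate pages with a blank line so they stay distinguishable
    return ($pages -join "`r`n`r`n")
}

# Same flow as the original OCR function, but it reads every page
function OCR($imagepath) {
    $modidoc = New-Object -ComObject MODI.Document
    try {
        $modidoc.Create($imagepath)
        $modidoc.OCR()
        return (Get-ModiPageText $modidoc)
    }
    catch {
        # Keep the original behaviour: flag the row so it can be retried later
        return "Error"
    }
    finally {
        # Release the COM object instead of just overwriting the variable
        [void][System.Runtime.InteropServices.Marshal]::ReleaseComObject($modidoc)
    }
}
```

    The main loop can stay as it is, since it still calls OCR with the image path; the only difference is that the returned text would cover all pages, separated by blank lines.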

  • #19618
    Tim Pringle
    Participant

    Does your company use a recent version of MS Office? OneNote may be an option. It can also perform OCR, and its API is much better documented, with lots of examples across the community.

    http://dev.onenote.com/
