Looping thru tif images using OCR


  • #19469

    Participant
    Points: 0
    Rank: Member

    I have the PowerShell code below, which OCRs a TIF image and saves the text to a table in SQL Server. The OCR only works on the first page and does not loop through the remaining pages of the TIF. Can someone help me change the code so that it loops over every page of my TIF image?

    #Functions

    #OCR Function
    #param - imagepath (path of the image to OCR)
    Function OCR($imagepath) {
        #create a new MODI object
        $modidoc = new-object -comobject modi.document
        $modidoc.create($imagepath)
        try {
            #call the ocr method
            $modidoc.ocr()
            #single page document so I only need the item(0).layout text
            $modidoc.images.item(0).layout.text
        }
        catch {
            #catch the error and go on
            return "Error"
        }
        finally {
            #clean up the object
            $modidoc = ""
        }
    } # end OCR function

    #Function to update the fulltext field in the Images_Local table
    #params - ID (ID of the row to update), fulltext (OCR text to store)
    function SaveImagesKeyValues($ID, $fulltext)
    {
        $cmd = new-object system.data.sqlclient.sqlcommand
        $cmd.connection = $sqlconnection
        #double any single quotes so the text does not break the SQL string literal
        $s = "update dbo.Images_Local set fulltext = '" + ($fulltext.tostring()).replace("'","''") + "' where ID = " + $ID.tostring()
        $cmd.commandtext = $s
        $a = $cmd.executenonquery()
    } #end SaveImagesKeyValues
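
OCR output often contains apostrophes and other characters that are awkward to splice into a SQL string. As a hedged alternative to the string concatenation in SaveImagesKeyValues, the update can be sent with parameters. This is only a sketch: the function name `New-FullTextUpdateCommand` is hypothetical, and it assumes the same `dbo.Images_Local` table with `fulltext` and `ID` columns used above.

```powershell
# Hypothetical helper: builds a parameterized UPDATE for dbo.Images_Local.
# Assumes the fulltext and ID columns from the script above.
function New-FullTextUpdateCommand($Connection, $ID, $FullText) {
    $cmd = New-Object System.Data.SqlClient.SqlCommand
    $cmd.Connection = $Connection
    $cmd.CommandText = 'update dbo.Images_Local set fulltext = @fulltext where ID = @id'
    # The values travel as parameters, so quotes in the OCR text need no escaping.
    [void]$cmd.Parameters.AddWithValue('@fulltext', [string]$FullText)
    [void]$cmd.Parameters.AddWithValue('@id', $ID)
    return $cmd
}
```

With an open connection in `$sqlconnection`, the call inside the main loop would look something like `$cmd = New-FullTextUpdateCommand $sqlconnection $t.ID $t.fulltext` followed by `[void]$cmd.ExecuteNonQuery()`.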

    #Function to get the list of records to OCR
    function GetImagesKeyValues()
    {
        $sqlda = new-object system.data.sqlclient.sqldataadapter
        $datatable = new-object system.data.dataset
        $sqlcommandselect = new-object system.data.sqlclient.sqlcommand
        $sqlcommandselect.commandtext =
        "select ID, FullImagePath, fulltext from images_Local
        where (fulltext is null or fulltext like 'error%') order by ID "
        #and (fulltext is null or fulltext like 'error%')
        #"select b.batch_number, b.sequence_number,a.claimnumber, a.potentialamount,a.buyer, b.[document type],b.fulltext, c.imagelastmodified, c.image_path
        #from cip.dbo.cip_mastertable a
        #inner join imageskeyvalues b on a.claimnumber = substring(b.[claim number],2,6)
        #inner join images c on b.batch_number = c.batch_number and b.sequence_number = c.sequence_number
        #where (a.buyer = '006' or (a.buyer = '007' and a.potentialamount > 500000))
        #and c.imagelastmodified >= '1/1/13'
        #and b.[document type] = '2'
        #order by c.imagelastmodified"
        $sqlcommandselect.connection = $sqlconnection
        $sqlda.selectcommand = $sqlcommandselect
        #Fill the datatable and store the output in variable otherwise it shows in the output.
        $trap = $sqlda.fill($datatable)
        $datatable.tables[0]
    }
    #end GetImagesKeyValues

    #End Functions

    #Main
    clear
    #set the parent path to the working directory
    $parentpath = "C:\Data\Portugal PRG\Images\Contratos SONAE 08-11\Imdex\"
    #Create new sql connection
    $sqlconnection = new-object system.data.sqlclient.sqlconnection
    #Assign the connectionstring
    $sqlconnection.connectionstring = "Server=ATL01L20969\SQLEXPRESS;Database=Sonae;integrated security=True"
    #Open the connection
    $sqlconnection.open()
    #get the list of records that need to be OCR'd
    $imageskeyvalues = getimageskeyvalues
    #iterate through the list
    foreach ($t in $imageskeyvalues) {
        #$completepath = $parentpath + $t.image_path
        $completepath = $t.FullImagePath
        #call the ocr function and put the results in the fulltext property
        $t.fulltext = OCR $completepath
        #give some bread crumbs to monitor the script
        write-host "Saving " $t.ID
        #update the database fulltext field
        Saveimageskeyvalues $t.ID $t.fulltext
    }#end Main
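
The OCR function above only ever reads `images.item(0)`, which is why text comes back for the first page alone. A minimal sketch of a multi-page variant follows. It assumes the MODI document's `Images` collection exposes `Count` and a zero-based `Item(index)`, and that `Close($false)` closes the document without saving. The names `Get-AllPageText` and `OCRAllPages` are hypothetical and not part of the original script, and this has not been run against MODI here.

```powershell
# Hypothetical helper: joins the text of every page of an already-OCR'd document.
# Assumes $Document.Images has Count and Item(index), with page text at Layout.Text.
function Get-AllPageText($Document) {
    $pages = @()
    for ($i = 0; $i -lt $Document.Images.Count; $i++) {
        $pages += $Document.Images.Item($i).Layout.Text
    }
    return ($pages -join "`r`n")
}

# Hypothetical multi-page variant of the OCR function from the post.
function OCRAllPages($imagepath) {
    $modidoc = New-Object -ComObject MODI.Document
    try {
        $modidoc.Create($imagepath)
        # OCR the whole document, then collect the text page by page.
        $modidoc.OCR()
        Get-AllPageText $modidoc
    }
    catch {
        # Same convention as the original: the select query retries rows like 'error%'.
        "Error"
    }
    finally {
        # Close without saving (assumed signature), then release the COM reference.
        try { $modidoc.Close($false) } catch { }
        [void][System.Runtime.InteropServices.Marshal]::ReleaseComObject($modidoc)
    }
}
```

In the main loop, the call would then become `$t.fulltext = OCRAllPages $completepath`. Pages are separated by a line break here; any other separator would work just as well.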

  • #19502

    Member
    Points: 0
    Rank: Member

    I didn't have much luck finding documentation on that modi.document COM class; it apparently doesn't ship with Office anymore, either.

  • #19618

    Participant
    Points: 60
    Rank: Member

    Does your company use a recent version of MS Office? OneNote may be an option. It can also perform OCR, and its API is much better documented, with lots of examples across the community.

    http://dev.onenote.com/
