Comparing Large Files

This topic contains 4 replies, has 3 voices, and was last updated by  Hil 1 month ago.

  • #89531

    Hil
    Participant

    Hi,
    I am able to compare large files, but I only need the lines that are new in the CurrentDwnld.txt file.
    (Meaning that if there is an extra line in EarlierDwnld.txt, I do not want it in the final output.)
    I also want to avoid looping through each line in the file, as it is huge.
    Any thoughts would be really appreciated.

    $File_Path = "C:\temp\"
    $File_CurrentDwnld = $File_Path  + "File_CurrentDwnld.txt"
    $File_EarlierDwnld = $File_Path  + "File_EarlierDwnld.txt"
    $Compare_Download = compare-object (get-content $File_CurrentDwnld) (get-content $File_EarlierDwnld)
    $Compare_Download = $Compare_Download.InputObject
    
    $File_Difference = $File_Path  + "File_Difference.txt"
    $Compare_Download > $File_Difference
    
  • #89534

    Don Jones
    Keymaster

    Keep in mind that running Get-Content on a huge file is going to consume a lot of memory. It might be worth putting up with the slower speed of reading the file line by line.

    That said, with large files, this is a case where I'd turn to an outside utility, not use Compare-Object. There are better, and far faster, text file comparison tools that are written in C++ and offer a lot more flexibility.

    • #89540

      Hil
      Participant

      Don, I appreciate your input, although the script above is working fine (even for the large files I am using).
      All I need is the new lines in the output file. Please see my earlier post for details.
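
One way to keep only the new lines while staying with Compare-Object is to filter on the SideIndicator property of its output. This is a minimal sketch, assuming the current file's lines are passed as -ReferenceObject, as in the original script, so that '<=' marks lines found only in the current file. The arrays $current and $earlier are hypothetical stand-ins for the two Get-Content results.

```powershell
# Hypothetical sample data; in the real script these would come from
# Get-Content $File_CurrentDwnld and Get-Content $File_EarlierDwnld.
$current = 'alpha', 'bravo', 'charlie', 'delta'
$earlier = 'alpha', 'bravo', 'xray'

# '<=' means the line exists only in the reference set (the current file).
# Lines that exist only in the earlier file are marked '=>' and are dropped.
$newLines = Compare-Object -ReferenceObject $current -DifferenceObject $earlier |
    Where-Object { $_.SideIndicator -eq '<=' } |
    ForEach-Object { $_.InputObject }

$newLines
```

Note that Compare-Object compares strings case-insensitively unless -CaseSensitive is given, and it still needs both files in memory, so this only changes what ends up in the output, not the memory cost.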

  • #89561

    Sam Boutros
    Participant
    # Make test files
    Remove-Item '.\test1.txt','.\test2.txt' -Force -EA 0 
    $FilePath1 = '.\test1.txt'
    $FilePath2 = '.\test2.txt'
    1..1000 | % { "Exhale completely through your mouth, making a whoosh sound $_" | Out-File $FilePath1 -Append } 
    5..1005 | % { "Exhale completely through your mouth, making a whoosh sound $_" | Out-File $FilePath2 -Append } 
    # Note that creating the test files is the slowest part of the process by far - Out-File -Append reopens the file for every line
    
    # Read File 1 using COM Object which is faster than DotNet
    $fso = New-Object -ComObject 'Scripting.FileSystemObject'
    $FileObj1 = $fso.OpenTextFile($((Get-Item $FilePath1).FullName),1)   # 1 = open for reading
    $File1Lines = while (! $FileObj1.AtEndOfStream ) { $FileObj1.ReadLine() }
    $FileObj1.Close()
    
    # Read each line of file 2 and compare to file 1 lines, recording lines that do NOT match
    $FileObj2 = $fso.OpenTextFile($((Get-Item $FilePath2).FullName),1)
    $LinesIn2ButNotIn1 = while (! $FileObj2.AtEndOfStream ) { 
        $Line = $FileObj2.ReadLine()
        if ($Line -notin $File1Lines ) { $Line } 
    }
    $FileObj2.Close()   # close before the file is reopened below
    $LinesIn2ButNotIn1
    
    # To get lines in 1 but not in 2, you do the opposite:
    $FileObj2 = $fso.OpenTextFile($((Get-Item $FilePath2).FullName),1)
    $File2Lines = while (! $FileObj2.AtEndOfStream ) { $FileObj2.ReadLine() }
    $FileObj2.Close()
    $FileObj1 = $fso.OpenTextFile($((Get-Item $FilePath1).FullName),1)
    $LinesIn1ButNotIn2 = while (! $FileObj1.AtEndOfStream ) { 
        $Line = $FileObj1.ReadLine()
        if ($Line -notin $File2Lines ) { $Line } 
    }
    $FileObj1.Close()
    $LinesIn1ButNotIn2
    
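
The -notin test in the script scans the whole array of lines once for every line it checks, so the work grows with the product of the two file sizes. A possible alternative, not part of Sam's script, is to load the earlier file into a .NET HashSet and stream the current file against it. This is a sketch assuming PowerShell 5 or later; the file names and the variables $earlierPath, $currentPath and $diffPath are hypothetical, and the comparer is chosen to mirror the case-insensitive behaviour of -notin.

```powershell
# Hypothetical test files in the temp directory. Full paths matter here:
# .NET methods resolve relative paths against the process working directory,
# which is not necessarily the current PowerShell location.
$dir         = [System.IO.Path]::GetTempPath()
$earlierPath = Join-Path $dir 'hashset_demo_earlier.txt'
$currentPath = Join-Path $dir 'hashset_demo_current.txt'
$diffPath    = Join-Path $dir 'hashset_demo_difference.txt'
1..1000 | ForEach-Object { "line $_" } | Set-Content -Path $earlierPath
5..1005 | ForEach-Object { "line $_" } | Set-Content -Path $currentPath

# Load the earlier file into a hash set; lookups no longer scan every line.
# OrdinalIgnoreCase mimics -notin; use Ordinal for a case-sensitive match.
$seen = [System.Collections.Generic.HashSet[string]]::new(
    [System.StringComparer]::OrdinalIgnoreCase)
foreach ($line in [System.IO.File]::ReadLines($earlierPath)) {
    [void]$seen.Add($line)
}

# Stream the current file and keep only lines absent from the earlier file.
$newLines = foreach ($line in [System.IO.File]::ReadLines($currentPath)) {
    if (-not $seen.Contains($line)) { $line }
}

$newLines | Set-Content -Path $diffPath
```

Only the earlier file is held in memory (as a set of its distinct lines); the current file is read one line at a time, and each line is still visited exactly once.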
    
    • #89564

      Hil
      Participant

      Thanks, Sam.
      I really appreciate the script.
