How to split txt files into parts of the same size


This topic contains 2 replies, has 3 voices, and was last updated 6 months ago.

  • #102206

    Participant
    Points: 0
    Rank: Member

    Hi All,

    We have a requirement to split a 1 GB txt file into multiple files of 300 MB each:

    1. 1st file: 300 MB
    2. 2nd file: 300 MB
    3. 3rd file: 300 MB
    4. 4th file: 100 MB

    Below is the script we have used to split the file into multiple files of 300 MB each. The script does split the file, but some of the output files come out at 250 MB and others at 300 MB.

    Can anyone help me meet the above requirement, so that the first 3 files all have the same size?

    Script

    # Split test: writes a fixed number of lines to each output file.
    # Line lengths vary, so the output files will not all be the same size.
    $sw = New-Object System.Diagnostics.Stopwatch
    $sw.Start()
    $filename = "E:\So\bkp.txt"
    $rootName = "E:\Ta\"
    $ext = "txt"   # no leading dot; the format string below adds it

    $linesperFile = 300000   # 300k lines per output file
    $filecount = 1
    $reader = $null
    $writer = $null
    try {
        $reader = [io.file]::OpenText($filename)
        try {
            "Creating file number $filecount"
            $writer = [io.file]::CreateText("{0}{1}.{2}" -f ($rootName, $filecount.ToString("000"), $ext))
            $filecount++
            $linecount = 0

            while ($reader.EndOfStream -ne $true) {
                "Reading $linesperFile"
                # Copy lines until the per-file limit or the end of the input.
                while (($linecount -lt $linesperFile) -and ($reader.EndOfStream -ne $true)) {
                    $writer.WriteLine($reader.ReadLine())
                    $linecount++
                }

                # More input left: close this file and start the next one.
                if ($reader.EndOfStream -ne $true) {
                    "Closing file"
                    $writer.Dispose()

                    "Creating file number $filecount"
                    $writer = [io.file]::CreateText("{0}{1}.{2}" -f ($rootName, $filecount.ToString("000"), $ext))
                    $filecount++
                    $linecount = 0
                }
            }
        } finally {
            if ($writer) { $writer.Dispose() }
        }
    } finally {
        if ($reader) { $reader.Dispose() }
    }
    $sw.Stop()

    Write-Host "Split complete in $($sw.Elapsed.TotalSeconds) seconds"

    thanks in advance

  • #102433

    Participant
    Points: 0
    Rank: Member

    Since you want to split the file by size, I believe the code should count the bytes copied rather than the text lines copied. Here is an example. You can save it to a file (e.g. Out-FileChunks.ps1) and then run:

    . .\Out-FileChunks.ps1 -Path "SourceFile" -OutputPath "C:\SomeOutputFolder" -ChunkSizeBytes 300MB

    [CmdletBinding()]
    PARAM 
    (
        [Parameter(Mandatory=$true, ValueFromPipeline=$true)]
        [string] $Path,
    
        [Parameter(Mandatory=$true)]
        [string] $OutputPath,
    
        [Parameter()]
        [long] $ChunkSizeBytes = 300MB
    )
    
    $bufferSize = 24 * 1024   # 24 KB read buffer
    $buffer = New-Object byte[] $bufferSize
    
    # Create the output folder if it doesn't exist.
    if ( -not (Test-Path $OutputPath))
    {
        $null = New-Item -Path $OutputPath -ItemType Directory -Force
    }
    
    $fileExtension = [System.IO.Path]::GetExtension($Path)
    $fileNameRoot = $Path | Split-Path -Leaf
    $outputFileNameRoot = [System.IO.Path]::GetFileNameWithoutExtension($fileNameRoot)
    
    try
    {
        $inputStream = [System.IO.File]::OpenRead($Path)
        $fileCount = 0;
    
        # Loop through the entire file.
        while ($inputStream.Position -lt $inputStream.Length)
        {
            $outputFileName = [string]::Format("{0}{1}{2}", $outputFileNameRoot, $fileCount.ToString("000"), $fileExtension)
            $outputFilePath = Join-Path $OutputPath $outputFileName
    
            Write-Progress "Writing file chunk $outputFilePath"
            # Create output files up to the chunk size.
            $outputStream = [System.IO.File]::Create($outputFilePath)
                
            $chunkBytesRemaining = $ChunkSizeBytes
    
            while ($chunkBytesRemaining -gt 0)
            {
                $bytesRead = $inputStream.Read($buffer, 0, [System.Math]::Min($chunkBytesRemaining, $bufferSize))
    
                if ( $bytesRead -le 0 )
                {
                    # nothing left to read so done writing all chunks.
                    break;
                }
    
                $outputStream.Write($buffer, 0, $bytesRead);
                $chunkBytesRemaining -= $bytesRead;
            }
                
            $outputStream.Dispose();
            $outputStream = $null;
    
            Write-Progress "Completed writing file chunk $outputFilePath"
            $fileCount++;
        }
    }
    finally
    {
        if ( $inputStream -ne $null )
        {
            $inputStream.Dispose();
            $inputStream = $null;
        }
    
        if ( $outputStream -ne $null )
        {
            $outputStream.Dispose();
            $outputStream = $null;
        }
    }
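
One caveat with a pure byte split is that a chunk boundary can fall in the middle of a line, or even in the middle of a multi-byte character. If the parts need to stay readable as text, a line-aware variant is one option. The following is only a sketch and is not part of the script above: the function name Split-TextFileBySize and its parameters are made up for illustration, and it assumes a UTF-8 file whose line endings match the platform default. Each part ends on a line boundary, so parts come out at or just under the limit rather than at exactly 300 MB.

```powershell
# Hypothetical helper: splits a text file into parts of at most $MaxChunkBytes,
# breaking only between lines. Assumes UTF-8 text and platform-default newlines.
function Split-TextFileBySize {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory = $true)] [string] $Path,
        [Parameter(Mandatory = $true)] [string] $OutputPath,
        [long] $MaxChunkBytes = 300MB
    )

    $encoding = New-Object System.Text.UTF8Encoding -ArgumentList $false
    $newLineBytes = $encoding.GetByteCount([Environment]::NewLine)

    # Resolve to full paths, since .NET ignores the PowerShell working directory.
    $null = New-Item -Path $OutputPath -ItemType Directory -Force
    $sourcePath = (Resolve-Path -Path $Path).ProviderPath
    $targetDir = (Resolve-Path -Path $OutputPath).ProviderPath
    $baseName = [System.IO.Path]::GetFileNameWithoutExtension($sourcePath)
    $extension = [System.IO.Path]::GetExtension($sourcePath)

    $reader = New-Object System.IO.StreamReader -ArgumentList $sourcePath, $encoding
    $writer = $null
    $fileCount = 0
    [long] $bytesInChunk = 0
    try {
        while ($null -ne ($line = $reader.ReadLine())) {
            $lineBytes = $encoding.GetByteCount($line) + $newLineBytes

            # Start a new part if this line would push the current one over the limit.
            # A single line longer than the limit still gets a part of its own.
            $wouldOverflow = ($bytesInChunk -gt 0) -and (($bytesInChunk + $lineBytes) -gt $MaxChunkBytes)
            if (($null -eq $writer) -or $wouldOverflow) {
                if ($writer) { $writer.Dispose() }
                $name = "{0}{1}{2}" -f $baseName, $fileCount.ToString("000"), $extension
                $chunkPath = Join-Path $targetDir $name
                $writer = New-Object System.IO.StreamWriter -ArgumentList $chunkPath, $false, $encoding
                $fileCount++
                $bytesInChunk = 0
            }

            $writer.WriteLine($line)
            $bytesInChunk += $lineBytes
        }
    }
    finally {
        if ($writer) { $writer.Dispose() }
        $reader.Dispose()
    }
}
```

Usage would look something like Split-TextFileBySize -Path "E:\So\bkp.txt" -OutputPath "E:\Ta" -MaxChunkBytes 300MB. If the parts must be exactly 300 MB, the byte-based script above is the better fit.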
    
  • #102640

    Participant
    Points: 0
    Rank: Member

    You can also split a text file into multiple smaller text files from the command line with GNU split:
    split -l 5000 -d --additional-suffix=.txt $FileName file
    -l 5000: split the file into files of 5,000 lines each.
    -d: use numeric suffixes. This makes the suffix go from 00 to 99 by default instead of aa to zz.
    --additional-suffix: lets you specify an extra suffix, here the extension.
    $FileName: name of the file to be split.
    file: prefix to add to the names of the resulting files.
    As always, check out man split for more details.

    On a Mac, the default version of split apparently has fewer options. You can install the GNU version using the following command:
    brew install coreutils
    You can then run the above command, replacing split with gsplit. Check out man gsplit for details.
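
Whichever approach does the split, it is worth checking the resulting sizes before relying on them. Below is a rough PowerShell sketch for that. The function name Get-ChunkSizeReport and its parameters are hypothetical, and it assumes the output folder contains only the parts of one source file. It lists each part with its size and warns when the parts do not add up to the source size. A mismatch is expected if a line-based tool rewrote the line endings.

```powershell
# Hypothetical helper: reports the size of every part in a folder and
# warns if the parts do not add up to the size of the source file.
function Get-ChunkSizeReport {
    param(
        [Parameter(Mandatory = $true)] [string] $SourcePath,
        [Parameter(Mandatory = $true)] [string] $ChunkFolder
    )

    $chunks = @(Get-ChildItem -Path $ChunkFolder -File | Sort-Object Name)
    $total = ($chunks | Measure-Object -Property Length -Sum).Sum
    $sourceLength = (Get-Item -Path $SourcePath).Length

    # One row per part, with the size in MB for easy reading.
    $chunks | ForEach-Object {
        [pscustomobject]@{
            Name   = $_.Name
            Bytes  = $_.Length
            SizeMB = [math]::Round($_.Length / 1MB, 2)
        }
    }

    # A byte-exact split should add up to the source size.
    if ($total -ne $sourceLength) {
        Write-Warning "Parts total $total bytes, but the source is $sourceLength bytes."
    }
}
```

For example, Get-ChunkSizeReport -SourcePath "E:\So\bkp.txt" -ChunkFolder "E:\Ta" | Format-Table would print one row per part.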
